
2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Thu May 22, 2008 12:00 am
by parkut
This particular work unit simply won't start; it immediately exits with a SIGTERM.

model name : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
cpu MHz : 1998.000
cache size : 4096 KB
Memory: 975.99 MB physical, 1.94 GB virtual
...
Current Work Unit
-----------------
Name: Protein in POPC
Tag: P2605R0C540G46
Download time: May 21 22:54:05
Due time: May 25 22:54:05
Progress: 0% [__________]

./fah6 -verbosity 9 -smp

Note: Please read the license agreement (fah6 -license). Further 
use of this software requires that you have read and accepted this agreement.

2 cores detected


--- Opening Log file [May 21 23:48:51] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 6.02beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /root/fah6
Executable: ./fah6
Arguments: -verbosity 9 -smp 

[23:48:51] - Ask before connecting: No
[23:48:51] - User name: parkut (Team 4)
[23:48:51] - User ID: 3DAEED787A87E4F8
[23:48:51] - Machine ID: 1
[23:48:51] 
[23:48:51] Loaded queue successfully.
[23:48:51] 
[23:48:51] + Processing work unit
[23:48:51] Core required: FahCore_a1.exe
[23:48:51] Core found.
[23:48:51] - Autosending finished units...
[23:48:51] Trying to send all finished work units
[23:48:51] + No unsent completed units remaining.
[23:48:51] - Autosend completed
[23:48:51] Working on Unit 06 [May 21 23:48:51]
[23:48:51] + Working ...
[23:48:51] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 06 -checkpoint 15 -verbose -lifeline 10178 -version 602'

[23:48:51] 
[23:48:51] *------------------------------*
[23:48:51] Folding@Home Gromacs SMP Core
[23:48:51] Version 1.74 (November 27, 2006)
[23:48:51] 
[23:48:51] Preparing to commence simulation
[23:48:51] - Ensuring status. Please wait.
[23:48:51] 
[23:48:51] Project: 2605 (Run 0, Clone 540, Gen 46)
[23:48:51] 
[23:48:51] Assembly optimizations on if available.
[23:48:51] Entering M.D.
[23:49:08] - Expanded 2420849 -> 12854153 (decompressed 530.9 percent)
[23:49:08] 
[23:49:08] Project: 2605 (Run 0, Clone 540, Gen 46)
[23:49:08] 
[23:49:08] Entering M.D.
NNODES=4, MYRANK=3, HOSTNAME=conroe7.parkut.com
NNODES=4, MYRANK=0, HOSTNAME=conroe7.parkut.com
NNODES=4, MYRANK=2, HOSTNAME=conroe7.parkut.com
NNODES=4, MYRANK=1, HOSTNAME=conroe7.parkut.com
NODEID=2 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
NODEID=3 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

(single precision)
starting mdrun 'Protein in POPC'
500000 steps,   1000.0 ps.

[23:49:16] s
[23:49:16] otein: ProteExtra SSE boost OK.
[23:49:16] ocal files
[23:49:16] Extra SSE boost OK.
[23:49:16] Finalizing output
[23:49:16] nt)
[23:49:16] 
[23:49:16] Folding@home Core Shutdown: INTERRUPTED
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[23:49:20] CoreStatus = 66 (102)
[23:49:20] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[23:49:20] Killing all core threads

Folding@Home Client Shutdown.



Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Thu May 22, 2008 2:42 am
by parkut
After 16 restart attempts, each ending in the same immediate SIGTERM,
I deleted the work files, queue.dat, and machinedependent.dat,
then restarted and picked up a different 2605, which is now running normally.
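
For anyone who wants to script the same reset, here is a rough Python sketch of the cleanup described above. It is only an illustration under assumptions: the function name `reset_client_state` is made up, the `work` directory name comes from the `-dir work/` argument in the log, and the state file names are the ones mentioned in the post (the exact spelling on your install may differ). Stop the client before running anything like this, since it discards the current work unit.

```python
import shutil
from pathlib import Path


def reset_client_state(client_dir,
                       state_files=("queue.dat", "machinedependent.dat"),
                       work_dir="work"):
    """Empty the work directory and delete the client state files.

    Hypothetical helper: file and directory names are assumptions based
    on the forum post, not an official client interface.
    Returns the list of paths (relative to client_dir) that were removed.
    """
    root = Path(client_dir)
    removed = []

    # Delete everything inside work/ but keep the directory itself.
    work = root / work_dir
    if work.is_dir():
        for child in work.iterdir():
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
            removed.append(str(child.relative_to(root)))

    # Delete the queue and machine-specific state files if present.
    for name in state_files:
        path = root / name
        if path.is_file():
            path.unlink()
            removed.append(name)

    return removed
```

Calling `reset_client_state("/root/fah6")` would match the launch directory shown in the log; on the next start the client should request a fresh assignment.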

Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Thu May 22, 2008 2:50 am
by sortofageek
That WU has not been returned by anyone as of this time.

Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Wed Jul 02, 2008 2:30 am
by Xilikon
Another report of this WU, on one box where the client even shut down after 3 tries (each try stopped with a 0x0 error):

[21:50:25] Writing local files
[21:50:25] Completed 410000 out of 500000 steps  (82 percent)
[22:02:07] Writing local files
[22:02:07] Completed 415000 out of 500000 steps  (83 percent)
[22:13:48] Writing local files
[22:13:48] Completed 420000 out of 500000 steps  (84 percent)
[22:25:30] Writing local files
[22:25:30] Completed 425000 out of 500000 steps  (85 percent)
[22:37:18] Writing local files
[22:37:18] Completed 430000 out of 500000 steps  (86 percent)
[22:49:06] Writing local files
[22:49:06] Completed 435000 out of 500000 steps  (87 percent)
[23:00:54] Writing local files
[23:00:54] Completed 440000 out of 500000 steps  (88 percent)
[23:12:41] Writing local files
[23:12:41] Completed 445000 out of 500000 steps  (89 percent)
[23:24:20] Writing local files
[23:24:20] Completed 450000 out of 500000 steps  (90 percent)
[23:36:09] Writing local files
[23:36:09] Completed 455000 out of 500000 steps  (91 percent)
[23:47:59] Writing local files
[23:47:59] Completed 460000 out of 500000 steps  (92 percent)
[23:59:49] Writing local files
[23:59:49] Completed 465000 out of 500000 steps  (93 percent)
[00:11:39] Writing local files
[00:11:39] Completed 470000 out of 500000 steps  (94 percent)
[00:23:29] Writing local files
[00:23:29] Completed 475000 out of 500000 steps  (95 percent)
[00:35:18] Writing local files
[00:35:19] Completed 480000 out of 500000 steps  (96 percent)
[00:44:21] - Autosending finished units...
[00:44:21] Trying to send all finished work units
[00:44:21] + No unsent completed units remaining.
[00:44:21] - Autosend completed
[00:47:08] Writing local files
[00:47:08] Completed 485000 out of 500000 steps  (97 percent)
[00:58:56] Writing local files
[00:58:56] Completed 490000 out of 500000 steps  (98 percent)
[01:10:45] Writing local files
[01:10:45] Completed 495000 out of 500000 steps  (99 percent)
[01:22:35] Writing local files
[01:22:35] Completed 500000 out of 500000 steps  (100 percent)
[01:22:35] Writing final coordinates.
[01:22:35] Past main M.D. loop
[01:22:35] Will end MPI now
[01:23:35] 
[01:23:35] Finished Work Unit:
[01:23:35] - Reading up to 3718704 from "work/wudata_09.arc": Read 3718704
[01:23:35] - Reading up to 1770828 from "work/wudata_09.xtc": Read 1770828
[01:23:35] goefile size: 0
[01:23:35] logfile size: 16912
[01:23:35] Leaving Run
[01:23:40] - Writing 5510844 bytes of core data to disk...
[01:23:40]   ... Done.
[01:23:46] - Shutting down core
[01:23:46] 
[01:23:46] Folding@home Core Shutdown: FINISHED_UNIT
[01:24:01] CoreStatus = 64 (100)
[01:24:01] Unit 9 finished with 79 percent of time to deadline remaining.
[01:24:01] Updated performance fraction: 0.792123
[01:24:01] Sending work to server


[01:24:01] + Attempting to send results
[01:24:01] - Reading file work/wuresults_09.dat from core
[01:24:01]   (Read 5510844 bytes from disk)
[01:24:01] Connecting to http://171.64.65.56:8080/
[01:25:00] Posted data.
[01:25:00] Initial: 0000; - Uploaded at ~89 kB/s
[01:25:01] - Averaged speed for that direction ~89 kB/s
[01:25:01] + Results successfully sent
[01:25:01] Thank you for your contribution to Folding@Home.
[01:25:01] + Number of Units Completed: 9

[01:29:08] - Warning: Could not delete all work unit files (9): Core returned invalid code
[01:29:08] Trying to send all finished work units
[01:29:08] + No unsent completed units remaining.
[01:29:08] - Preparing to get new work unit...
[01:29:08] + Attempting to get work packet
[01:29:08] - Will indicate memory of 768 MB
[01:29:08] - Connecting to assignment server
[01:29:08] Connecting to http://assign.stanford.edu:8080/
[01:29:08] Posted data.
[01:29:08] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[01:29:08] + News From Folding@Home: Welcome to Folding@Home
[01:29:08] Loaded queue successfully.
[01:29:08] Connecting to http://171.64.65.56:8080/
[01:29:11] Posted data.
[01:29:11] Initial: 0000; - Receiving payload (expected size: 2421361)
[01:29:14] - Downloaded at ~788 kB/s
[01:29:14] - Averaged speed for that direction ~729 kB/s
[01:29:14] + Received work.
[01:29:14] Trying to send all finished work units
[01:29:14] + No unsent completed units remaining.
[01:29:14] + Closed connections
[01:29:14] 
[01:29:14] + Processing work unit
[01:29:14] Core required: FahCore_a1.exe
[01:29:14] Core found.
[01:29:14] Working on Unit 00 [July 2 01:29:14]
[01:29:14] + Working ...
[01:29:14] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 00 -priority 96 -checkpoint 20 -verbose -lifeline 5410 -version 602'

[01:29:14] 
[01:29:14] *------------------------------*
[01:29:14] Folding@Home Gromacs SMP Core
[01:29:14] Version 1.74 (November 27, 2006)
[01:29:14] 
[01:29:14] Preparing to commence simulation
[01:29:14] - Ensuring status. Please wait.
[01:29:15] - Starting from initial work packet
[01:29:15] 
[01:29:15] Project: 2605 (Run 0, Clone 540, Gen 46)
[01:29:15] 
[01:29:15] Assembly optimizations on if available.
[01:29:15] Entering M.D.
[01:29:32]  percent)
[01:29:32] - Starting from initial work packet
[01:29:32] 
[01:29:32] Project: 2605 (Run 0, Clone 540, Gen 46)
[01:29:32] 
[01:29:32] Entering M.D.
[01:29:39] Protein: Protein in POPC
[01:29:39] Writing local files
[01:29:39] Extra SSE boost OK.
[01:29:44] CoreStatus = 0 (0)
[01:29:44] Client-core communications error: ERROR 0x0
[01:29:44] Deleting current work unit & continuing...
[01:34:08] - Warning: Could not delete all work unit files (0): Core returned invalid code
[01:34:08] Trying to send all finished work units
[01:34:08] + No unsent completed units remaining.
[01:34:08] - Preparing to get new work unit...
[01:34:08] + Attempting to get work packet
[01:34:08] - Will indicate memory of 768 MB
[01:34:08] - Connecting to assignment server
[01:34:08] Connecting to http://assign.stanford.edu:8080/
[01:34:08] Posted data.
[01:34:08] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[01:34:08] + News From Folding@Home: Welcome to Folding@Home
[01:34:08] Loaded queue successfully.
[01:34:08] Connecting to http://171.64.65.56:8080/
[01:34:11] Posted data.
[01:34:11] Initial: 0000; - Receiving payload (expected size: 2421361)
[01:34:14] - Downloaded at ~788 kB/s
[01:34:14] - Averaged speed for that direction ~741 kB/s
[01:34:14] + Received work.
[01:34:14] + Closed connections
[01:34:19] 
[01:34:19] + Processing work unit
[01:34:19] Core required: FahCore_a1.exe
[01:34:19] Core found.
[01:34:19] Working on Unit 01 [July 2 01:34:19]
[01:34:19] + Working ...
[01:34:19] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -priority 96 -checkpoint 20 -verbose -lifeline 5410 -version 602'

[01:34:19] 
[01:34:19] *------------------------------*
[01:34:19] Folding@Home Gromacs SMP Core
[01:34:19] Version 1.74 (November 27, 2006)
[01:34:19] 
[01:34:19] Preparing to commence simulation
[01:34:19] - Ensuring status. Please wait.
[01:34:36] - Looking at optimizations...
[01:34:36] - Working with standard loops on this execution.
[01:34:36] - Previous termination of core was improper.
[01:34:36] - Going to use standard loops.
[01:34:36] - Files status OK
[01:34:37] (decompressed 530.9 percent)
[01:34:37] - Starting from initial work packet
[01:34:37] 
[01:34:37] Project: 2605 (Run 0, Clone 540, Gen 46)
[01:34:37] 
[01:34:37] Entering M.D.
[01:34:37] ne 540, Gen 46)
[01:34:37] 
[01:34:37] Entering M.D.
[01:34:44] g local files
[01:34:44] boost OK.
[01:34:44] boost OK.
[01:34:44] ocal files
[01:34:44] Extra SSE boost OK.
[01:34:45] Finalizing output
[01:34:45] cent)
[01:34:49] CoreStatus = 0 (0)
[01:34:49] Client-core communications error: ERROR 0x0
[01:34:49] Deleting current work unit & continuing...
[01:39:13] - Warning: Could not delete all work unit files (1): Core returned invalid code
[01:39:13] Trying to send all finished work units
[01:39:13] + No unsent completed units remaining.
[01:39:13] - Preparing to get new work unit...
[01:39:13] + Attempting to get work packet
[01:39:13] - Will indicate memory of 768 MB
[01:39:13] - Connecting to assignment server
[01:39:13] Connecting to http://assign.stanford.edu:8080/
[01:39:13] Posted data.
[01:39:13] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[01:39:13] + News From Folding@Home: Welcome to Folding@Home
[01:39:13] Loaded queue successfully.
[01:39:13] Connecting to http://171.64.65.56:8080/
[01:39:16] Posted data.
[01:39:16] Initial: 0000; - Receiving payload (expected size: 2421361)
[01:39:19] - Downloaded at ~788 kB/s
[01:39:19] - Averaged speed for that direction ~750 kB/s
[01:39:19] + Received work.
[01:39:19] + Closed connections
[01:39:24] 
[01:39:24] + Processing work unit
[01:39:24] Core required: FahCore_a1.exe
[01:39:24] Core found.
[01:39:24] Working on Unit 02 [July 2 01:39:24]
[01:39:24] + Working ...
[01:39:24] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 02 -priority 96 -checkpoint 20 -verbose -lifeline 5410 -version 602'

[01:39:24] 
[01:39:24] *------------------------------*
[01:39:24] Folding@Home Gromacs SMP Core
[01:39:24] Version 1.74 (November 27, 2006)
[01:39:24] 
[01:39:24] Preparing to commence simulation
[01:39:24] - Ensuring status. Please wait.
[01:39:41] - Looking at optimizations...
[01:39:41] - Working with standard loops on this execution.
[01:39:41] - Previous termination of core was improper.
[01:39:41] - Going to use standard loops.
[01:39:41] - Files status OK
[01:39:42] (decompressed 530.9 percent)
[01:39:42] - Starting from initial work pa- Starting from initial work packet
[01:39:42] 
[01:39:42] Project: 2605 (Run 0, Clone 540, Gen 46)
[01:39:42] 
[01:39:42] Entering M.D.
[01:39:49] g local files
[01:39:49] in in POPC
[01:39:49] Writing local files
[01:39:49] Extra SSE boost OK.
[01:39:50] 0000 steps  (0 percent)
[01:39:50] 
[01:39:50] Folding@home Core Shutdown: INTERRUPTED
[01:39:55] CoreStatus = 66 (102)
[01:39:55] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[01:39:55] Killing all core threads

Folding@Home Client Shutdown.
This is on a known stable box.

Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Wed Jul 02, 2008 1:29 pm
by rbrandman
Thanks for your post. I'll alert the researcher in charge of this project, Peter Kasson, so he can check into it.

Relly

2605 (Run 0, Clone 540, Gen 46)

Posted: Tue Jul 08, 2008 2:14 pm
by Flathead74
I have received this WU multiple times, and on different machines, in the past few days.

The result is always the same:

[20:09:16] Project: 2605 (Run 0, Clone 540, Gen 46)
[20:09:16] 
[20:09:16] Assembly optimizations on if available.
[20:09:16] Entering M.D.
[20:09:22] Rejecting checkpoint
[20:09:23] OPC
[20:09:23] Writing local files
[20:09:23] Extra SSE boost OK.
[20:09:23] 
[20:09:23] Extra SSE boost OK.
[20:09:23] Writing local files
[20:09:23] Completed 0 out of 500000 steps  (0 percent)
[20:09:28] CoreStatus = 0 (0)
[20:09:28] Client-core communications error: ERROR 0x0
[20:09:28] Deleting current work unit & continuing...
As this WU has been tried by many, and with a great variety of hardware,
perhaps the time is right to remove this time waster from circulation.
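
To check whether the same WU keeps failing across restarts or machines, a log can be scanned for the `Project:` and `CoreStatus` lines that appear in the excerpts in this thread. The following is a minimal sketch, not a client feature: the function name `tally_core_status` and the regular expressions are my own, written against the line formats quoted above.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def tally_core_status(log_lines):
    """Count CoreStatus codes per (project, run, clone, gen).

    Each CoreStatus line is attributed to the most recent Project line
    seen before it; status lines with no preceding Project line are
    skipped because the WU they belong to is unknown.
    """
    current = None
    results = Counter()
    for line in log_lines:
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
            continue
        match = STATUS_RE.search(line)
        if match and current is not None:
            results[(current, match.group(1).upper())] += 1
    return results


if __name__ == "__main__":
    sample = [
        "[20:09:16] Project: 2605 (Run 0, Clone 540, Gen 46)",
        "[20:09:28] CoreStatus = 0 (0)",
        "[01:39:42] Project: 2605 (Run 0, Clone 540, Gen 46)",
        "[01:39:55] CoreStatus = 66 (102)",
    ]
    for (wu, status), count in tally_core_status(sample).items():
        print(wu, "CoreStatus", status, "x", count)
```

The Project line is printed more than once per attempt, so the tally counts CoreStatus lines rather than Project lines; that way each attempt is counted once.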

Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Tue Jul 08, 2008 2:29 pm
by rbrandman
Thanks, I'll let Peter know that there are still issues with the WU.

Relly

Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Tue Jul 08, 2008 2:35 pm
by kasson
WU stopped--shouldn't be assigned any further.

Re: 2605 (Run 0, Clone 540, Gen 46) CoreStatus = 66 (102)

Posted: Tue Jul 08, 2008 2:48 pm
by Flathead74
kasson wrote: WU stopped--shouldn't be assigned any further.
Thank you, ever so much.