
Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Fri Feb 26, 2010 5:43 pm
by Phantom
Got a strange one here... Recommend someone keep an eye out for it... It only uses 75% of a dual C2D Mac Pro running the latest Snow Leopard, 10.6.2.

It never gets to 1% frame completion -- it just stays at 0% for hours, even though frame times on this machine are normally around 7 minutes... I manually terminated it at 16:55:37 in the log below.

Code:

[10:42:29] - Preparing to get new work unit...
[10:42:29] Cleaning up work directory
[10:42:29] + Attempting to get work packet
[10:42:29] Passkey found
[10:42:29] - Will indicate memory of 8192 MB
[10:42:29] - Connecting to assignment server
[10:42:29] Connecting to http://assign.stanford.edu:8080/
[10:42:30] Posted data.
[10:42:30] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[10:42:30] + News From Folding@Home: Welcome to Folding@Home
[10:42:30] Loaded queue successfully.
[10:42:30] Sent data
[10:42:30] Connecting to http://171.64.65.54:8080/
[10:42:30] Posted data.
[10:42:30] Initial: 0000; - Receiving payload (expected size: 1768098)
[10:42:32] - Downloaded at ~863 kB/s
[10:42:32] - Averaged speed for that direction ~750 kB/s
[10:42:32] + Received work.
[10:42:32] Trying to send all finished work units
[10:42:32] + No unsent completed units remaining.
[10:42:32] + Closed connections
[10:42:32] 
[10:42:32] + Processing work unit
[10:42:32] Core required: FahCore_a3.exe
[10:42:32] Core found.
[10:42:32] Working on queue slot 04 [February 26 10:42:32 UTC]
[10:42:32] + Working ...
[10:42:32] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 04 -np 4 -checkpoint 15 -forceasm -verbose -lifeline 284 -version 629'

[10:42:32] 
[10:42:32] *------------------------------*
[10:42:32] Folding@Home Gromacs SMP Core
[10:42:32] Version 2.13 (Dec 9 2009)
[10:42:32] 
[10:42:32] Preparing to commence simulation
[10:42:32] - Assembly optimizations manually forced on.
[10:42:32] - Not checking prior termination.
[10:42:33] - Expanded 1767586 -> 1967109 (decompressed 111.2 percent)
[10:42:33] Called DecompressByteArray: compressed_data_size=1767586 data_size=1967109, decompressed_data_size=1967109 diff=0
[10:42:33] - Digital signature verified
[10:42:33] 
[10:42:33] Project: 6023 (Run 1, Clone 36, Gen 23)
[10:42:33] 
[10:42:33] Assembly optimizations on if available.
[10:42:33] Entering M.D.
[10:42:39] Completed 0 out of 500000 steps  (0%)
[14:26:42] - Autosending finished units... [February 26 14:26:42 UTC]
[14:26:42] Trying to send all finished work units
[14:26:42] + No unsent completed units remaining.
[14:26:42] - Autosend completed
[16:55:37] ***** Got a SIGTERM signal (15)
[16:55:37] Killing all core threads

Folding@Home Client Shutdown.
I've stopped the WU, restarted the system, and resumed the WU; however, it still doesn't move past 0%... No special indications in the log.

FahCore_a3.exe was running with 7 threads and consumed only 75% of my CPUs (it showed as 300% in Activity Monitor). Most of the other FahCore_a3 WUs run with 8 threads and consume nearly 100% of each CPU on my system (showing as >390% in Activity Monitor).
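
(For anyone who wants to verify the thread count without eyeballing Activity Monitor, here's a minimal sketch -- assuming Python with the third-party psutil package installed; the process name comes from the log above.)

Code:

import psutil

# List every running FahCore process with its thread count and CPU usage.
# (psutil is a third-party package: pip install psutil)
for proc in psutil.process_iter(["pid", "name"]):
    name = proc.info["name"] or ""
    if "FahCore" in name:
        print(f"{name} (pid {proc.info['pid']}): "
              f"{proc.num_threads()} threads, "
              f"{proc.cpu_percent(interval=1.0):.0f}% CPU")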

Dumped WU after no progress. New WU was assigned.

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Fri Feb 26, 2010 6:36 pm
by Wrish
I'm a little confused. You have 4 cores (two dual-core processors), but A3 spawns 7 or 8 threads? I'm unfamiliar with Snow Leopard, but I wonder how you know it's 7 threads. Do check that when your client is stopped, there is no A3 process still hanging around. A3 should spawn 4 threads unless you force it with -smp 8, and if no other process is hogging a CPU core, it should use close to 400% CPU.

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Fri Feb 26, 2010 11:20 pm
by bruce
FahCore_a3 will spawn one worker thread per CPU core (unless you specify otherwise), plus several other threads that do almost nothing... except wait for a timer to tell them to run a brief task and go back to sleep. For example, there's a thread that waits 6 hours before checking the queue to see if there are any WUs that still need to be uploaded.

In *nix, it's easy to see the threads. In Windows, the threads are normally hidden within a smaller number of processes.
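
(To illustrate the layout described above -- one busy worker per core plus near-idle timer threads -- here's a minimal Python sketch. It mirrors only the thread structure, not the real core: the actual FahCore is native code, and pure-Python threads share the GIL, so these workers won't truly run in parallel.)

Code:

import os
import threading
import time

def md_worker(core_id):
    # Stand-in for the simulation inner loop that keeps one core busy.
    while True:
        pass

def upload_checker():
    # Near-idle housekeeping thread: sleep 6 hours, wake briefly to check
    # for unsent results, then go back to sleep.
    while True:
        time.sleep(6 * 60 * 60)
        print("checking queue for unsent work units")

# One worker thread per CPU core (the default), plus the timer thread.
for core in range(os.cpu_count()):
    threading.Thread(target=md_worker, args=(core,), daemon=True).start()
threading.Thread(target=upload_checker, daemon=True).start()
time.sleep(10)  # let the demo run briefly; daemon threads exit with the program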

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Sun Feb 28, 2010 9:39 pm
by Phantom
The evil WU was recently reassigned to me... Exact same symptoms... Stuck at 0% completion and using only 300% of an otherwise idle Mac Pro... Unfortunately, I wasn't around to catch it earlier and only just noticed the repeat.

Dumping that bad WU again. (I hope it will complete successfully on some other, slightly different configuration -- or that it gets removed from the work queue...)

Fold on!

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Mon Mar 01, 2010 4:02 am
by bruce
Was that the same machine or another one? If it's the same one, what other WUs were completed between your first report and this one?

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Mon Mar 01, 2010 9:11 am
by Phantom
Bruce -- Yes. Same machine. Between the two instances of the evil WU, the machine processed the following WUs:

Assigned @ [February 26 17:35:48 UTC] Project: 6012 (Run 0, Clone 145, Gen 73) --> CoreStatus = 64
(Note that this WU was erroneously reassigned to this same machine below, even though it had been successfully completed and the results returned)

Assigned @ [February 27 05:45:50 UTC] Project: 6015 (Run 0, Clone 53, Gen 56) --> CoreStatus = 0
(Note that this WU was correctly reassigned to this same machine below, AND was then successfully completed and the results returned)

Assigned @ [February 27 05:48:37 UTC] Project: 6012 (Run 0, Clone 145, Gen 73) --> CoreStatus = 0
(Strange, seeing that this WU had already been correctly completed and returned above)

Assigned @ [February 27 05:51:32 UTC] Project: 6015 (Run 0, Clone 53, Gen 56) --> CoreStatus = 64

Assigned @ [February 27 18:18:09 UTC] Project: 6023 (Run 1, Clone 36, Gen 23) *** 2nd assignment of subject WU

Do these times correspond to any WU assignment issues that you've noticed on your end?

Other than this little apparent hiccup in the subject WU and these "interesting" WU assignments, the machine has been continually crunching through a mix of A1, A2, and A3 WUs.

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Tue Mar 02, 2010 11:47 am
by ikerekes
Got this killer WU for the second time in 2 days (:
Had to delete it and move forward.

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Tue Mar 02, 2010 12:41 pm
by toTOW
I marked this WU as a bad one.

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Sat Mar 13, 2010 3:33 pm
by hrsetrdr
I just got a P6023 (Run 1, Clone 119, Gen 37) which looks to be progressing normally, on a C2D E6300 @ 2.0 GHz, Linux kernel 2.6.31-14. TPF = 16 min 23 sec.
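
(For perspective, here's the back-of-the-envelope math a TPF implies -- a rough sketch in Python, assuming the 500,000-step count from the log above and the usual one-frame-per-1% reporting, i.e. 100 frames per WU.)

Code:

# Rough TPF (time per frame) arithmetic for this project.
tpf_seconds = 16 * 60 + 23      # 16 min 23 sec per frame
frames_per_wu = 100             # one frame per 1% of the WU
steps_per_frame = 500_000 // frames_per_wu
total_hours = tpf_seconds * frames_per_wu / 3600
print(f"{steps_per_frame} steps per frame")   # 5000
print(f"~{total_hours:.1f} hours per WU")     # ~27.3 hours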

Re: Project 6023 (Run 1, Clone 36, Gen 23)

Posted: Sat Mar 13, 2010 6:30 pm
by Phantom
I also fold P6023 successfully -- just not the evil WU in question. Currently, I've got Project 6023 (Run 0, Clone 81, Gen 45) folding on one of my Mac Minis (C2D T7600 @ 2.33 GHz) with TPF = 15 min 45 sec.

Beware the evil WU! ;-)