Project: 6041 (Run 0, Clone 132, Gen 176)

Posted: Sun Jul 31, 2011 9:26 am
by bollix47
Possible bad WU. Failed immediately.

This client has completed almost 2000 SMP WUs successfully and the CPU is liquid-cooled, so it's unlikely that temperature was a problem.

Some data was returned and the next WU is folding fine.


[07:41:17] + Processing work unit
[07:41:17] Core required: FahCore_a3.exe
[07:41:17] Core found.
[07:41:17] Working on queue slot 07 [July 31 07:41:17 UTC]
[07:41:17] + Working ...
[07:41:17] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 07 -np 8 -priority 96 -checkpoint 30 -verbose -lifeline 1732 -version 634'

[07:41:17] 
[07:41:17] *------------------------------*
[07:41:17] Folding@Home Gromacs SMP Core
[07:41:17] Version 2.27 (Dec. 15, 2010)
[07:41:17] 
[07:41:17] Preparing to commence simulation
[07:41:17] - Looking at optimizations...
[07:41:17] - Created dyn
[07:41:17] - Files status OK
[07:41:18] - Expanded 7882668 -> 10126021 (decompressed 128.4 percent)
[07:41:18] Called DecompressByteArray: compressed_data_size=7882668 data_size=10126021, decompressed_data_size=10126021 diff=0
[07:41:18] - Digital signature verified
[07:41:18] 
[07:41:18] Project: 6041 (Run 0, Clone 132, Gen 176)
[07:41:18] 
[07:41:18] Assembly optimizations on if available.
[07:41:18] Entering M.D.
[07:41:24] Mapping NT from 8 to 8 
[07:41:25] Completed 0 out of 250000 steps  (0%)
[07:41:26] mdrun returned 255
[07:41:26] Going to send back what have done -- stepsTotalG=250000
[07:41:26] Work fraction=755914309632.0000 steps=250000.
[07:41:30] logfile size=12441 infoLength=12441 edr=25 trr=1
[07:41:30] logfile size: 12441 info=12441 bed=25 hdr=1
[07:41:30] - Writing 12979 bytes of core data to disk...
[07:41:30]   ... Done.
[07:41:30] 
[07:41:30] Folding@home Core Shutdown: UNSTABLE_MACHINE
[07:41:30] CoreStatus = 7A (122)
[07:41:30] Sending work to server
[07:41:30] Project: 6041 (Run 0, Clone 132, Gen 176)


[07:41:30] + Attempting to send results [July 31 07:41:30 UTC]
[07:41:30] - Reading file work/wuresults_07.dat from core
[07:41:30]   (Read 12979 bytes from disk)
[07:41:30] Connecting to http://171.64.65.54:8080/
[07:41:31] Posted data.
[07:41:31] Initial: 0000; - Uploaded at ~13 kB/s
[07:41:31] - Averaged speed for that direction ~96 kB/s
[07:41:31] + Results successfully sent
[07:41:31] Thank you for your contribution to Folding@Home.
[07:41:31] Trying to send all finished work units
[07:41:31] + No unsent completed units remaining.
[07:41:31] - Preparing to get new work unit...
[07:41:31] Cleaning up work directory
[07:41:31] + Attempting to get work packet
[07:41:31] Passkey found
[07:41:31] - Will indicate memory of 3954 MB
[07:41:31] - Connecting to assignment server
[07:41:31] Connecting to http://assign.stanford.edu:8080/
[07:41:31] Posted data.
[07:41:31] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[07:41:31] + News From Folding@Home: Welcome to Folding@Home
[07:41:32] Loaded queue successfully.
[07:41:32] Sent data
[07:41:32] Connecting to http://171.64.65.54:8080/
[07:41:33] Posted data.
[07:41:33] Initial: 0000; - Receiving payload (expected size: 1763579)
[07:41:34] - Downloaded at ~1722 kB/s
[07:41:34] - Averaged speed for that direction ~1084 kB/s
[07:41:34] + Received work.
[07:41:34] Trying to send all finished work units
[07:41:34] + No unsent completed units remaining.
[07:41:34] + Closed connections
[07:41:39] 
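
For anyone triaging a log like this one, the failure signature can be pulled out mechanically. What follows is a minimal Python sketch and not part of the FAH client: the function name `summarize_log` and the regular expressions are assumptions based only on the lines shown in the log above, and other client or core versions may word these lines differently.

```python
import re

# Patterns mirror lines seen in the log above (hypothetical helpers, not a FAH API).
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
MDRUN_RE = re.compile(r"mdrun returned (\d+)")
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def summarize_log(lines):
    """Collect the WU identity and the failure markers from client log lines."""
    summary = {"prcg": None, "mdrun_code": None, "shutdown": None, "core_status": None}
    for line in lines:
        m = PRCG_RE.search(line)
        if m and summary["prcg"] is None:
            # Keep the first Project/Run/Clone/Gen seen in the excerpt.
            summary["prcg"] = tuple(int(g) for g in m.groups())
        m = MDRUN_RE.search(line)
        if m:
            summary["mdrun_code"] = int(m.group(1))
        m = SHUTDOWN_RE.search(line)
        if m:
            summary["shutdown"] = m.group(1)
        m = STATUS_RE.search(line)
        if m:
            # The log prints the status in hex, then in decimal in parentheses.
            summary["core_status"] = int(m.group(1), 16)
    return summary


# A few lines copied from the log above.
sample = [
    "[07:41:18] Project: 6041 (Run 0, Clone 132, Gen 176)",
    "[07:41:25] Completed 0 out of 250000 steps  (0%)",
    "[07:41:26] mdrun returned 255",
    "[07:41:30] Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "[07:41:30] CoreStatus = 7A (122)",
]
summary = summarize_log(sample)
print(summary)
```

On these lines the sketch reports the WU as (6041, 0, 132, 176), an mdrun return code of 255, and an UNSTABLE_MACHINE shutdown with core status 0x7A, which is 122 in decimal and matches the value the log prints in parentheses.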

Re: Project: 6041 (Run 0, Clone 132, Gen 176)

Posted: Sun Jul 31, 2011 9:34 am
by PantherX
It was a bad one for many, many donors:
The WU (P6041,R0,C132,G176) has been reported as a bad WU.
Thanks a lot for your report.

Re: Project: 6041 (Run 0, Clone 132, Gen 176)

Posted: Mon Aug 01, 2011 10:03 pm
by ChasR
18 straight instant EUEs and a 24-hour pause on Project: 6041 (Run 0, Clone 132, Gen 176).

Re: Project: 6041 (Run 0, Clone 132, Gen 176)

Posted: Mon Aug 01, 2011 11:41 pm
by bruce
bollix47 and PantherX beat you to it.

Using PDT (the local time at Stanford), bollix47 reported the problem at 2:26 am, PantherX notified the servers at 2:34 am, and the servers suspended new assignments of the WU at 8:00, all on Sun Jul 31. When was the WU assigned to you, ChasR?

Re: Project: 6041 (Run 0, Clone 132, Gen 176)

Posted: Tue Aug 02, 2011 2:15 am
by ChasR
I was out of town from Thursday until this afternoon. Once I had gone to the effort of reporting it as bad, I figured I might as well post about it, though it's of little use. I got the 24-hour pause notice from HFM at 0258 PDT Sunday July 31, so the assignment couldn't have been very long before then. I can't currently view the log to see the exact assignment time on the machine in question.
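
The times quoted in this thread mix UTC and PDT, so a short Python sketch can put them on one clock. It assumes the forum post times are displayed in UTC, which is consistent with the 2:26 am and 2:34 am PDT figures given earlier, and it uses a fixed UTC-7 offset for PDT, which holds for these July dates. The helper name `utc_to_pdt` is made up for illustration. The 8:00 suspension time is left out because the post does not say am or pm.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Fixed UTC-7 offset: an assumption that is valid for summer (daylight) dates only.
PDT = timezone(timedelta(hours=-7), "PDT")


def utc_to_pdt(year, month, day, hour, minute):
    """Convert a UTC wall-clock time to PDT (hypothetical helper)."""
    return datetime(year, month, day, hour, minute, tzinfo=UTC).astimezone(PDT)


events = {
    "bollix47's WU fails (client log, UTC)": utc_to_pdt(2011, 7, 31, 7, 41),
    "bollix47 posts (forum time, assumed UTC)": utc_to_pdt(2011, 7, 31, 9, 26),
    "PantherX reports (forum time, assumed UTC)": utc_to_pdt(2011, 7, 31, 9, 34),
    "ChasR's pause notice (given in PDT)": datetime(2011, 7, 31, 2, 58, tzinfo=PDT),
}

# Print the events in chronological order on the PDT clock.
for label, when in sorted(events.items(), key=lambda item: item[1]):
    print(when.strftime("%Y-%m-%d %H:%M"), when.tzname(), label)
```

Under those assumptions the two forum posts land at 02:26 and 02:34 PDT, the failure in the posted log at 00:41 PDT, and the pause notice at 02:58 PDT, shortly after the WU was reported.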