Hardware configuration: HP xw4600 workstation (4GB) + Q9650 + Sapphire Vapor-X HD4890; HP Z600 workstation (4GB) + 2x Xeon E5540 + Sapphire HD5770; HP ML350 server (4GB) + 2x Xeon E5520 + Diamond HD3850
This WU fails reproducibly. At last count, it has aborted with error 0x77 (UNKNOWN_ERROR) 6 times in a row, each time 20 minutes past the 7% completion mark. (The other 7 CPU clients running on the same machine show no problems at all.)
Thanks for reporting the problem. If you haven't deleted it already, go ahead (run the client with the -delete xx flag, where xx is its queue position) and try for another one.
I can confirm this. The same unit has been haunting me for some time now. Even if I delete the queue/work unit files, I keep getting it again...
[edit] Yep, I have deleted the same work unit from the queue several times now, and the server still tries to give it to me. Annoying.
[edit] After several tries the server finally gave me something else to crunch. Let's see if the same unit comes to haunt me again later.
I wish Stanford would delete this rogue unit. One of our members has suffered multiple failures at 7% with it.
[16:40:54] Project: 2499 (Run 191, Clone 3, Gen 1)
[16:40:54]
[16:40:55] Assembly optimizations on if available.
[16:40:55] Entering M.D.
[16:41:02] Protein: Translocon_ALX2
[16:41:02]
[16:41:03] Writing local files
[16:41:05] Extra SSE boost OK.
[16:41:06] Writing local files
[16:41:06] Completed 0 out of 500000 steps (0%)
[17:26:37] Writing local files
[17:26:37] Completed 5000 out of 500000 steps (1%)
[18:14:30] Writing local files
[18:14:30] Completed 10000 out of 500000 steps (2%)
[19:04:46] Writing local files
[19:04:46] Completed 15000 out of 500000 steps (3%)
[20:07:10] Writing local files
[20:07:10] Completed 20000 out of 500000 steps (4%)
[21:22:23] Writing local files
[21:22:24] Completed 25000 out of 500000 steps (5%)
[22:42:57] Writing local files
[22:42:59] Completed 30000 out of 500000 steps (6%)
[00:08:07] Writing local files
[00:08:08] Completed 35000 out of 500000 steps (7%)
[01:04:42]
[01:04:43] Folding@home Core Shutdown: UNKNOWN_ERROR
[01:04:46] CoreStatus = 77 (119)
[01:04:46] Client-core communications error: ERROR 0x77
[01:04:46] This is a sign of more serious problems, shutting down.
It always fails at 7%, and even if he stops it early and restarts it, it throws the same error.
This "rogue" unit has been in circulation for OVER 2 months now. Can't it be REMOVED from the server instead of being handed out again and failing repeatedly, causing immense frustration? On my advice, the user eventually removed the -advmethods flag and picked up a different unit.