proj 6805 R 3626 c3 g14 - NANs detected on GPU
Posted: Wed Mar 09, 2011 11:02 pm
[21:00:34] Project: 6805 (Run 3626, Clone 3, Gen 14)
[21:00:34]
[21:00:34] Assembly optimizations on if available.
[21:00:34] Entering M.D.
[21:00:36] Tpr hash work/wudata_03.tpr: 2009117313 622594832 3856663995 3903849362 195187923
[21:00:36] Working on ALZHEIMER'S DISEASE AMYLOID
[21:00:36] Client config found, loading data.
[21:00:36] Starting GUI Server
[21:00:36] Setting checkpoint frequency: 500000
[21:00:36] Setting checkpoint frequency: 500000
[21:01:50] Completed 500000 out of 50000000 steps (1%).
[21:03:04] Completed 1000000 out of 50000000 steps (2%).
[21:04:18] Completed 1500000 out of 50000000 steps (3%).
[21:05:33] Completed 2000000 out of 50000000 steps (4%).
[21:06:47] Completed 2500000 out of 50000000 steps (5%).
[21:08:01] Completed 3000000 out of 50000000 steps (6%).
[21:09:15] Completed 3500000 out of 50000000 steps (7%).
[21:10:31] Completed 4000000 out of 50000000 steps (8%).
[21:11:46] Completed 4500000 out of 50000000 steps (9%).
[21:13:01] Completed 5000000 out of 50000000 steps (10%).
[21:14:16] Completed 5500000 out of 50000000 steps (11%).
[21:15:30] Completed 6000000 out of 50000000 steps (12%).
[21:16:45] Completed 6500000 out of 50000000 steps (13%).
[21:18:00] Completed 7000000 out of 50000000 steps (14%).
[21:18:00] mdrun_gpu returned 52
[21:18:00] NANs detected on GPU
[21:18:00]
[21:18:00] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:18:03] CoreStatus = 7A (122)
[21:18:03] Sending work to server
[21:18:03] Project: 6805 (Run 3626, Clone 3, Gen 14)
[21:18:03] - Error: Could not get length of results file work/wuresults_03.dat
[21:18:03] - Error: Could not read unit 03 file. Removing from queue.
[21:18:03] - Preparing to get new work unit...
[21:18:03] Cleaning up work directory
Another one on the same project.
I had a NaN error with another 6805 WU before.
I just did 9 in a row with no issues until this one,
and the WU right after this one works fine.
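
For anyone comparing several of these failures, here is a rough Python sketch that pulls the PRCG, the last completed step, and the shutdown reason out of a log like the one above. The function name `summarize_log` and the regular expressions are my own assumptions based only on the lines in this post; they are not part of the client, and other client versions may word these lines differently.

```python
import re

# Patterns are guesses derived from the log lines quoted in this post.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_log(lines):
    """Collect the PRCG, last reported step, and failure info from log lines."""
    summary = {
        "prcg": None,         # (project, run, clone, gen)
        "last_step": 0,       # last "Completed N out of M" value seen
        "total_steps": None,
        "shutdown": None,     # e.g. UNSTABLE_MACHINE
        "nan": False,         # True if the NaN message appears
    }
    for line in lines:
        m = PRCG_RE.search(line)
        if m and summary["prcg"] is None:
            summary["prcg"] = tuple(int(x) for x in m.groups())
        m = PROGRESS_RE.search(line)
        if m:
            summary["last_step"] = int(m.group(1))
            summary["total_steps"] = int(m.group(2))
        m = SHUTDOWN_RE.search(line)
        if m:
            summary["shutdown"] = m.group(1)
        if "NANs detected on GPU" in line:
            summary["nan"] = True
    return summary


# A few lines copied from the log above.
sample = [
    "[21:00:34] Project: 6805 (Run 3626, Clone 3, Gen 14)",
    "[21:16:45] Completed 6500000 out of 50000000 steps (13%).",
    "[21:18:00] Completed 7000000 out of 50000000 steps (14%).",
    "[21:18:00] NANs detected on GPU",
    "[21:18:00] Folding@home Core Shutdown: UNSTABLE_MACHINE",
]

result = summarize_log(sample)
print(result)
```

Running it over each failed WU's log makes it easier to see whether the NaN always shows up at a similar point in the run or on the same project.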