Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Thu Sep 10, 2009 11:54 pm
by SidVicious
Another borked WU:
Code:
[22:26:29] Project: 5744 (Run 0, Clone 34, Gen 351)
[22:26:29]
[22:26:29] Assembly optimizations on if available.
[22:26:29] Entering M.D.
[22:26:35] Tpr hash work/wudata_09.tpr: 946666329 2926698113 3632645470 3862744873 578111165
[22:26:35] Working on Protein
[22:26:36] Client config found, loading data.
[22:26:36] Starting GUI Server
[22:26:40] mdrun_gpu returned
[22:26:40] SHAKE violations on GPU
[22:26:40]
[22:26:40] Folding@home Core Shutdown: UNSTABLE_MACHINE
[22:26:44] CoreStatus = 7A (122)
[22:26:44] Sending work to server
[22:26:44] Project: 5744 (Run 0, Clone 34, Gen 351)
[22:26:44] - Read packet limit of 540015616... Set to 524286976.
[22:26:44] - Error: Could not get length of results file work/wuresults_09.dat
[22:26:44] - Error: Could not read unit 09 file. Removing from queue.
[22:26:44] EUE limit exceeded. Pausing 24 hours.
Re: Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Sat Sep 12, 2009 7:11 pm
by valleton
Thank you, Folding@home, for giving me this broken WU too. I have to babysit this thing more and more often. I hope I don't have to suspect failing hardware, since I'm not the only one.
Code:
[17:28:11] Project: 5744 (Run 0, Clone 34, Gen 351)
[17:28:11]
[17:28:11] Assembly optimizations on if available.
[17:28:11] Entering M.D.
[17:28:17] Tpr hash work/wudata_06.tpr: 946666329 2926698113 3632645470 3862744873 578111165
[17:28:18] Working on Protein
[17:28:18] Client config found, loading data.
[17:28:18] Starting GUI Server
[17:28:19] mdrun_gpu returned
[17:28:19] SHAKE violations on GPU
[17:28:19]
[17:28:19] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:28:23] CoreStatus = 7A (122)
[17:28:23] Sending work to server
[17:28:23] Project: 5744 (Run 0, Clone 34, Gen 351)
[17:28:23] - Read packet limit of 540015616... Set to 524286976.
[17:28:23] - Error: Could not get length of results file work/wuresults_06.dat
[17:28:23] - Error: Could not read unit 06 file. Removing from queue.
[17:28:23] EUE limit exceeded. Pausing 24 hours.
Re: Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Sun Sep 13, 2009 2:24 am
by SidVicious
There's not much we can do besides reporting these corrupted WUs. Unfortunately, this one was reported two weeks ago in the wrong forum: viewtopic.php?f=51&t=11230#p111715
toTOW commented twice but didn't take any action; at the very least, the thread should have been relocated here.
I wonder how many days' worth of folding were trashed because that WU triggered the 24-hour pause and was then reissued to another unsuspecting victim...
Re: Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Thu Oct 01, 2009 12:01 am
by Zagen30
I got this one, too. It's a good thing I have remote access to some of my computers, because it would've been stuck for a while had I not checked.
Re: Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Thu Oct 01, 2009 4:29 pm
by jrweiss
SidVicious wrote: Not much we can do beside reporting those corrupted WU, unfortunately, this one was reported in two weeks ago, in the wrong forum : viewtopic.php?f=51&t=11230#p111715
toTOW commented twice but didn't take any action, the thread should have been relocated here at the very least.
Apparently even the report in the "right" forum didn't trigger a remedy...
Besides, once a "SuperModerator" responds without asking for a repost or move, I can only assume the info will be passed to the proper people.
Re: Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Mon Nov 02, 2009 11:15 pm
by bapriebe
jrweiss wrote: Besides, once a "SuperModerator" responds without asking for a repost or move, I can only assume the info will be passed to the proper people.
This broken WU is still alive and kicking over a month later. It just disabled one of my GPU clients for 24 hours.
EDIT: Just got this thing back again on November 3. Another GPU client shut down. Can someone at Stanford please take this WU out of circulation?
Re: Project: 5744 (Run 0, Clone 34, Gen 351)
Posted: Wed Nov 04, 2009 8:35 pm
by muziqaz
Since I don't want to create another topic for another WU from the same project, I thought I'd post it here:
Code:
[15:02:07] *------------------------------*
[15:02:07] Folding@Home GPU Core - Beta
[15:02:07] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[15:02:07]
[15:02:07] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[15:02:07] Build host: amoeba
[15:02:07] Board Type: AMD
[15:02:07] Core :
[15:02:07] Preparing to commence simulation
[15:02:07] - Looking at optimizations...
[15:02:07] - Created dyn
[15:02:07] - Files status OK
[15:02:07] - Expanded 68466 -> 357580 (decompressed 522.2 percent)
[15:02:07] Called DecompressByteArray: compressed_data_size=68466 data_size=357580, decompressed_data_size=357580 diff=0
[15:02:07] - Digital signature verified
[15:02:07]
[15:02:07] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:02:07]
[15:02:07] Assembly optimizations on if available.
[15:02:07] Entering M.D.
[15:02:13] Tpr hash work/wudata_00.tpr: 3036552476 1826283706 2930555241 3274968556 2291782333
[15:02:21] Working on Protein
[15:02:21] Client config found, loading data.
[15:02:21] Starting GUI Server
[15:02:25] mdrun_gpu returned
[15:02:25] SHAKE violations on GPU
[15:02:25]
[15:02:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:02:27] CoreStatus = 7A (122)
[15:02:27] Sending work to server
[15:02:27] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:02:27] - Read packet limit of 540015616... Set to 524286976.
[15:02:27] - Error: Could not get length of results file work/wuresults_00.dat
[15:02:27] - Error: Could not read unit 00 file. Removing from queue.
[15:02:27] Trying to send all finished work units
[15:02:27] + No unsent completed units remaining.
[15:02:27] - Preparing to get new work unit...
[15:02:27] + Attempting to get work packet
[15:02:27] - Will indicate memory of 8190 MB
[15:02:27] - Connecting to assignment server
[15:02:27] Connecting to http://assign-GPU.stanford.edu:8080/
[15:02:28] Posted data.
[15:02:28] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[15:02:28] + News From Folding@Home: Welcome to Folding@Home
[15:02:28] Loaded queue successfully.
[15:02:28] Connecting to http://171.64.65.102:8080/
[15:02:29] Posted data.
[15:02:29] Initial: 0000; - Receiving payload (expected size: 68978)
[15:02:30] - Downloaded at ~67 kB/s
[15:02:30] - Averaged speed for that direction ~61 kB/s
[15:02:30] + Received work.
[15:02:30] Trying to send all finished work units
[15:02:30] + No unsent completed units remaining.
[15:02:30] + Closed connections
[15:02:35]
[15:02:35] + Processing work unit
[15:02:35] Core required: FahCore_11.exe
[15:02:35] Core found.
[15:02:35] Working on queue slot 01 [November 4 15:02:35 UTC]
[15:02:35] + Working ...
[15:02:35] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 3528 -version 623'
[15:02:36]
[15:02:36] *------------------------------*
[15:02:36] Folding@Home GPU Core - Beta
[15:02:36] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[15:02:36]
[15:02:36] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[15:02:36] Build host: amoeba
[15:02:36] Board Type: AMD
[15:02:36] Core :
[15:02:36] Preparing to commence simulation
[15:02:36] - Looking at optimizations...
[15:02:36] - Created dyn
[15:02:36] - Files status OK
[15:02:36] - Expanded 68466 -> 357580 (decompressed 522.2 percent)
[15:02:36] Called DecompressByteArray: compressed_data_size=68466 data_size=357580, decompressed_data_size=357580 diff=0
[15:02:36] - Digital signature verified
[15:02:36]
[15:02:36] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:02:36]
[15:02:36] Assembly optimizations on if available.
[15:02:36] Entering M.D.
[15:02:42] Tpr hash work/wudata_01.tpr: 3036552476 1826283706 2930555241 3274968556 2291782333
[15:02:45] Working on Protein
[15:02:45] Client config found, loading data.
[15:02:46] Starting GUI Server
[15:02:49] mdrun_gpu returned
[15:02:49] SHAKE violations on GPU
[15:02:49]
[15:02:49] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:02:52] CoreStatus = 7A (122)
[15:02:52] Sending work to server
[15:02:52] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:02:52] - Read packet limit of 540015616... Set to 524286976.
[15:02:52] - Error: Could not get length of results file work/wuresults_01.dat
[15:02:52] - Error: Could not read unit 01 file. Removing from queue.
[15:02:52] Trying to send all finished work units
[15:02:52] + No unsent completed units remaining.
[15:02:52] - Preparing to get new work unit...
[15:02:52] + Attempting to get work packet
[15:02:52] - Will indicate memory of 8190 MB
[15:02:52] - Connecting to assignment server
[15:02:52] Connecting to http://assign-GPU.stanford.edu:8080/
[15:02:53] Posted data.
[15:02:53] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[15:02:53] + News From Folding@Home: Welcome to Folding@Home
[15:02:53] Loaded queue successfully.
[15:02:53] Connecting to http://171.64.65.102:8080/
[15:02:53] Posted data.
[15:02:53] Initial: 0000; - Receiving payload (expected size: 68978)
[15:02:55] - Downloaded at ~33 kB/s
[15:02:55] - Averaged speed for that direction ~56 kB/s
[15:02:55] + Received work.
[15:02:55] Trying to send all finished work units
[15:02:55] + No unsent completed units remaining.
[15:02:55] + Closed connections
[15:03:00]
[15:03:00] + Processing work unit
[15:03:00] Core required: FahCore_11.exe
[15:03:00] Core found.
[15:03:00] Working on queue slot 02 [November 4 15:03:00 UTC]
[15:03:00] + Working ...
[15:03:00] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 3528 -version 623'
[15:03:00]
[15:03:00] *------------------------------*
[15:03:00] Folding@Home GPU Core - Beta
[15:03:00] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[15:03:00]
[15:03:00] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[15:03:00] Build host: amoeba
[15:03:00] Board Type: AMD
[15:03:00] Core :
[15:03:00] Preparing to commence simulation
[15:03:00] - Looking at optimizations...
[15:03:00] - Created dyn
[15:03:00] - Files status OK
[15:03:00] - Expanded 68466 -> 357580 (decompressed 522.2 percent)
[15:03:00] Called DecompressByteArray: compressed_data_size=68466 data_size=357580, decompressed_data_size=357580 diff=0
[15:03:00] - Digital signature verified
[15:03:00]
[15:03:00] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:03:00]
[15:03:00] Assembly optimizations on if available.
[15:03:00] Entering M.D.
[15:03:06] Tpr hash work/wudata_02.tpr: 3036552476 1826283706 2930555241 3274968556 2291782333
[15:03:07] Working on Protein
[15:03:07] Client config found, loading data.
[15:03:07] Starting GUI Server
[15:03:10] mdrun_gpu returned
[15:03:10] SHAKE violations on GPU
[15:03:10]
[15:03:10] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:03:12] CoreStatus = 7A (122)
[15:03:12] Sending work to server
[15:03:12] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:03:12] - Read packet limit of 540015616... Set to 524286976.
[15:03:12] - Error: Could not get length of results file work/wuresults_02.dat
[15:03:12] - Error: Could not read unit 02 file. Removing from queue.
[15:03:12] Trying to send all finished work units
[15:03:12] + No unsent completed units remaining.
[15:03:12] - Preparing to get new work unit...
[15:03:12] + Attempting to get work packet
[15:03:12] - Will indicate memory of 8190 MB
[15:03:12] - Connecting to assignment server
[15:03:12] Connecting to http://assign-GPU.stanford.edu:8080/
[15:03:13] Posted data.
[15:03:13] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[15:03:13] + News From Folding@Home: Welcome to Folding@Home
[15:03:13] Loaded queue successfully.
[15:03:13] Connecting to http://171.64.65.102:8080/
[15:03:14] Posted data.
[15:03:14] Initial: 0000; - Receiving payload (expected size: 68978)
[15:03:15] - Downloaded at ~67 kB/s
[15:03:15] - Averaged speed for that direction ~58 kB/s
[15:03:15] + Received work.
[15:03:15] Trying to send all finished work units
[15:03:15] + No unsent completed units remaining.
[15:03:15] + Closed connections
[15:03:20]
[15:03:20] + Processing work unit
[15:03:20] Core required: FahCore_11.exe
[15:03:20] Core found.
[15:03:20] Working on queue slot 03 [November 4 15:03:20 UTC]
[15:03:20] + Working ...
[15:03:20] - Calling '.\FahCore_11.exe -dir work/ -suffix 03 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 3528 -version 623'
[15:03:20]
[15:03:20] *------------------------------*
[15:03:20] Folding@Home GPU Core - Beta
[15:03:20] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[15:03:20]
[15:03:20] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[15:03:20] Build host: amoeba
[15:03:20] Board Type: AMD
[15:03:20] Core :
[15:03:20] Preparing to commence simulation
[15:03:20] - Looking at optimizations...
[15:03:20] - Created dyn
[15:03:20] - Files status OK
[15:03:20] - Expanded 68466 -> 357580 (decompressed 522.2 percent)
[15:03:20] Called DecompressByteArray: compressed_data_size=68466 data_size=357580, decompressed_data_size=357580 diff=0
[15:03:20] - Digital signature verified
[15:03:20]
[15:03:20] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:03:20]
[15:03:20] Assembly optimizations on if available.
[15:03:20] Entering M.D.
[15:03:26] Tpr hash work/wudata_03.tpr: 3036552476 1826283706 2930555241 3274968556 2291782333
[15:03:30] Working on Protein
[15:03:30] Client config found, loading data.
[15:03:30] Starting GUI Server
[15:03:34] mdrun_gpu returned
[15:03:34] SHAKE violations on GPU
[15:03:34]
[15:03:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:03:37] CoreStatus = 7A (122)
[15:03:37] Sending work to server
[15:03:37] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:03:37] - Read packet limit of 540015616... Set to 524286976.
[15:03:37] - Error: Could not get length of results file work/wuresults_03.dat
[15:03:37] - Error: Could not read unit 03 file. Removing from queue.
[15:03:37] Trying to send all finished work units
[15:03:37] + No unsent completed units remaining.
[15:03:37] - Preparing to get new work unit...
[15:03:37] + Attempting to get work packet
[15:03:37] - Will indicate memory of 8190 MB
[15:03:37] - Connecting to assignment server
[15:03:37] Connecting to http://assign-GPU.stanford.edu:8080/
[15:03:37] Posted data.
[15:03:37] Initial: 40AB; - Successful: assigned to (171.64.65.102).
[15:03:37] + News From Folding@Home: Welcome to Folding@Home
[15:03:37] Loaded queue successfully.
[15:03:37] Connecting to http://171.64.65.102:8080/
[15:03:38] Posted data.
[15:03:38] Initial: 0000; - Receiving payload (expected size: 68978)
[15:03:39] - Downloaded at ~67 kB/s
[15:03:39] - Averaged speed for that direction ~60 kB/s
[15:03:39] + Received work.
[15:03:39] Trying to send all finished work units
[15:03:39] + No unsent completed units remaining.
[15:03:39] + Closed connections
[15:03:44]
[15:03:44] + Processing work unit
[15:03:44] Core required: FahCore_11.exe
[15:03:44] Core found.
[15:03:44] Working on queue slot 04 [November 4 15:03:44 UTC]
[15:03:44] + Working ...
[15:03:44] - Calling '.\FahCore_11.exe -dir work/ -suffix 04 -priority 96 -nocpulock -checkpoint 3 -verbose -lifeline 3528 -version 623'
[15:03:44]
[15:03:44] *------------------------------*
[15:03:44] Folding@Home GPU Core - Beta
[15:03:44] Version 1.24 (Mon Feb 9 11:00:12 PST 2009)
[15:03:44]
[15:03:44] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[15:03:44] Build host: amoeba
[15:03:44] Board Type: AMD
[15:03:44] Core :
[15:03:44] Preparing to commence simulation
[15:03:44] - Looking at optimizations...
[15:03:44] - Created dyn
[15:03:44] - Files status OK
[15:03:44] - Expanded 68466 -> 357580 (decompressed 522.2 percent)
[15:03:44] Called DecompressByteArray: compressed_data_size=68466 data_size=357580, decompressed_data_size=357580 diff=0
[15:03:44] - Digital signature verified
[15:03:44]
[15:03:44] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:03:44]
[15:03:44] Assembly optimizations on if available.
[15:03:44] Entering M.D.
[15:03:50] Tpr hash work/wudata_04.tpr: 3036552476 1826283706 2930555241 3274968556 2291782333
[15:03:55] Working on Protein
[15:03:55] Client config found, loading data.
[15:03:55] Starting GUI Server
[15:03:59] mdrun_gpu returned
[15:03:59] SHAKE violations on GPU
[15:03:59]
[15:03:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:04:03] CoreStatus = 7A (122)
[15:04:03] Sending work to server
[15:04:03] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:04:03] - Read packet limit of 540015616... Set to 524286976.
[15:04:03] - Error: Could not get length of results file work/wuresults_04.dat
[15:04:03] - Error: Could not read unit 04 file. Removing from queue.
[15:04:03] EUE limit exceeded. Pausing 24 hours.
[15:59:54] - Autosending finished units... [November 4 15:59:54 UTC]
[15:59:54] Trying to send all finished work units
[15:59:54] + No unsent completed units remaining.
[15:59:54] - Autosend completed
[15:59:54] + Working...
[20:27:50] ***** Got a SIGTERM signal (2)
[20:27:50] Killing all core threads
The machine is STABLE, though I am folding with a 5870.
Come on, bring GPU3 out already!
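The logs in this thread share one pattern: the core exits with "SHAKE violations on GPU" and "UNSTABLE_MACHINE", and the same Project/Run/Clone/Gen (with an identical Tpr hash) keeps coming back until the client hits the EUE limit and pauses for 24 hours. The following is a minimal sketch for spotting that pattern in a client log. It assumes the line format shown in the excerpts above, and the helper name `find_repeat_failures` and its `threshold` parameter are hypothetical, not part of any Folding@home tool.

```python
import re
from collections import Counter

# Matches lines like "[15:02:07] Project: 5744 (Run 1, Clone 6, Gen 574)".
# Assumption: the log format is the one pasted in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def find_repeat_failures(lines, threshold=2):
    """Hypothetical helper: count UNSTABLE_MACHINE shutdowns per work unit.

    Returns {(project, run, clone, gen): failures} for every work unit
    that failed at least `threshold` times.
    """
    current = None        # WU most recently announced in the log
    failures = Counter()
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            # Remember which WU the following core messages belong to.
            current = tuple(int(g) for g in m.groups())
        elif "Core Shutdown: UNSTABLE_MACHINE" in line and current is not None:
            failures[current] += 1
    return {wu: n for wu, n in failures.items() if n >= threshold}

log = """\
[15:02:07] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:02:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[15:02:36] Project: 5744 (Run 1, Clone 6, Gen 574)
[15:02:49] Folding@home Core Shutdown: UNSTABLE_MACHINE
"""
print(find_repeat_failures(log.splitlines()))  # {(5744, 1, 6, 574): 2}
```

A WU that fails repeatedly on an otherwise stable machine, and that other donors also report, is more likely a bad unit than failing hardware, which is the point raised earlier in the thread.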