Project: 5764 (Run 14, Clone 29, Gen 88) - UM at various %

Posted: Fri Jan 09, 2009 9:21 pm
by toTOW
Here's another one:

[17:21:49] Project: 5764 (Run 14, Clone 29, Gen 88)
[17:21:49] 
[17:21:49] Assembly optimizations on if available.
[17:21:49] Entering M.D.
[17:21:56] Working on Protein
[17:21:58] Client config found, loading data.
[17:21:58] Starting GUI Server
[17:22:58] Completed 1%
[...]
[18:34:35] Completed 73%
[18:34:35] mdrun_gpu returned 
[18:34:35] NANs detected on GPU
[18:34:35] 
[18:34:35] Folding@home Core Shutdown: UNSTABLE_MACHINE
[18:34:38] CoreStatus = 7A (122)
[18:34:38] Sending work to server
[18:34:38] Project: 5764 (Run 14, Clone 29, Gen 88)
[18:34:38] - Read packet limit of 540015616... Set to 524286976.
[18:34:38] - Error: Could not get length of results file work/wuresults_03.dat
[18:34:38] - Error: Could not read unit 03 file. Removing from queue.
[18:34:38] Trying to send all finished work units
[18:34:38] + No unsent completed units remaining.
[18:34:38] - Preparing to get new work unit...
[18:34:38] + Attempting to get work packet
[18:34:38] - Will indicate memory of 1022 MB
[18:34:38] - Connecting to assignment server
[18:34:38] Connecting to http://assign-GPU.stanford.edu:8080/
[18:34:39] Posted data.
[18:34:39] Initial: 40AB; - Successful: assigned to (171.64.65.106).
[18:34:39] + News From Folding@Home: GPU folding beta
[18:34:39] Loaded queue successfully.
[18:34:39] Connecting to http://171.64.65.106:8080/
[18:34:40] Posted data.
[18:34:40] Initial: 0000; - Receiving payload (expected size: 70632)
[18:34:41] - Downloaded at ~68 kB/s
[18:34:41] - Averaged speed for that direction ~51 kB/s
[18:34:41] + Received work.
[18:34:41] Trying to send all finished work units
[18:34:41] + No unsent completed units remaining.
[18:34:41] + Closed connections
[18:34:46] 
[18:34:46] + Processing work unit
[18:34:46] Core required: FahCore_11.exe
[18:34:46] Core found.
[18:34:46] Working on queue slot 04 [January 9 18:34:46 UTC]
[18:34:46] + Working ...
[18:34:46] - Calling '.\FahCore_11.exe -dir work/ -suffix 04 -priority 96 -checkpoint 15 -verbose -lifeline 1808 -version 623'

[18:34:46] 
[18:34:46] *------------------------------*
[18:34:46] Folding@Home GPU Core - Beta
[18:34:46] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[18:34:46] 
[18:34:46] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[18:34:46] Build host: amoeba
[18:34:46] Board Type: Nvidia
[18:34:46] Core      : 
[18:34:46] Preparing to commence simulation
[18:34:46] - Looking at optimizations...
[18:34:46] - Created dyn
[18:34:46] - Files status OK
[18:34:46] - Expanded 70120 -> 360060 (decompressed 513.4 percent)
[18:34:46] Called DecompressByteArray: compressed_data_size=70120 data_size=360060, decompressed_data_size=360060 diff=0
[18:34:46] - Digital signature verified
[18:34:46] 
[18:34:46] Project: 5764 (Run 14, Clone 29, Gen 88)
[18:34:46] 
[18:34:46] Assembly optimizations on if available.
[18:34:46] Entering M.D.
[18:34:53] Working on Protein
[18:34:55] Client config found, loading data.
[18:34:55] Starting GUI Server
[18:35:55] Completed 1%
[...]
[19:38:27] Completed 64%
[19:38:27] mdrun_gpu returned 
[19:38:27] NANs detected on GPU
[19:38:27] 
[19:38:27] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:38:31] CoreStatus = 7A (122)
[19:38:31] Sending work to server
[19:38:31] Project: 5764 (Run 14, Clone 29, Gen 88)
[19:38:31] - Read packet limit of 540015616... Set to 524286976.
[19:38:31] - Error: Could not get length of results file work/wuresults_04.dat
[19:38:31] - Error: Could not read unit 04 file. Removing from queue.
[19:38:31] Trying to send all finished work units
[19:38:31] + No unsent completed units remaining.
[19:38:31] - Preparing to get new work unit...
[19:38:31] + Attempting to get work packet
[19:38:31] - Will indicate memory of 1022 MB
[19:38:31] - Connecting to assignment server
[19:38:31] Connecting to http://assign-GPU.stanford.edu:8080/
[19:38:33] Posted data.
[19:38:33] Initial: 40AB; - Successful: assigned to (171.64.65.106).
[19:38:33] + News From Folding@Home: GPU folding beta
[19:38:33] Loaded queue successfully.
[19:38:33] Connecting to http://171.64.65.106:8080/
[19:38:34] Posted data.
[19:38:34] Initial: 0000; - Receiving payload (expected size: 70632)
[19:38:35] - Downloaded at ~68 kB/s
[19:38:35] - Averaged speed for that direction ~54 kB/s
[19:38:35] + Received work.
[19:38:35] Trying to send all finished work units
[19:38:35] + No unsent completed units remaining.
[19:38:35] + Closed connections
[19:38:40] 
[19:38:40] + Processing work unit
[19:38:40] Core required: FahCore_11.exe
[19:38:40] Core found.
[19:38:40] Working on queue slot 05 [January 9 19:38:40 UTC]
[19:38:40] + Working ...
[19:38:40] - Calling '.\FahCore_11.exe -dir work/ -suffix 05 -priority 96 -checkpoint 15 -verbose -lifeline 1808 -version 623'

[19:38:40] 
[19:38:40] *------------------------------*
[19:38:40] Folding@Home GPU Core - Beta
[19:38:40] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:38:40] 
[19:38:40] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[19:38:40] Build host: amoeba
[19:38:40] Board Type: Nvidia
[19:38:40] Core      : 
[19:38:40] Preparing to commence simulation
[19:38:40] - Looking at optimizations...
[19:38:40] - Created dyn
[19:38:40] - Files status OK
[19:38:40] - Expanded 70120 -> 360060 (decompressed 513.4 percent)
[19:38:40] Called DecompressByteArray: compressed_data_size=70120 data_size=360060, decompressed_data_size=360060 diff=0
[19:38:40] - Digital signature verified
[19:38:40] 
[19:38:40] Project: 5764 (Run 14, Clone 29, Gen 88)
[19:38:40] 
[19:38:40] Assembly optimizations on if available.
[19:38:40] Entering M.D.
[19:38:47] Working on Protein
[19:38:49] Client config found, loading data.
[19:38:49] Starting GUI Server
[19:39:49] Completed 1%
[...]
[19:54:30] Completed 16%
[19:54:30] mdrun_gpu returned 
[19:54:30] NANs detected on GPU
[19:54:30] 
[19:54:30] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:54:35] CoreStatus = 7A (122)
[19:54:35] Sending work to server
[19:54:35] Project: 5764 (Run 14, Clone 29, Gen 88)
[19:54:35] - Read packet limit of 540015616... Set to 524286976.
[19:54:35] - Error: Could not get length of results file work/wuresults_05.dat
[19:54:35] - Error: Could not read unit 05 file. Removing from queue.
[19:54:35] Trying to send all finished work units
[19:54:35] + No unsent completed units remaining.
[19:54:35] - Preparing to get new work unit...
[19:54:35] + Attempting to get work packet
[19:54:35] - Will indicate memory of 1022 MB
[19:54:35] - Connecting to assignment server
[19:54:35] Connecting to http://assign-GPU.stanford.edu:8080/
[19:54:36] Posted data.
[19:54:36] Initial: 40AB; - Successful: assigned to (171.64.65.106).
[19:54:36] + News From Folding@Home: GPU folding beta
[19:54:36] Loaded queue successfully.
[19:54:36] Connecting to http://171.64.65.106:8080/
[19:54:37] Posted data.
[19:54:37] Initial: 0000; - Receiving payload (expected size: 70632)
[19:54:39] - Downloaded at ~34 kB/s
[19:54:39] - Averaged speed for that direction ~50 kB/s
[19:54:39] + Received work.
[19:54:39] Trying to send all finished work units
[19:54:39] + No unsent completed units remaining.
[19:54:39] + Closed connections
[19:54:44] 
[19:54:44] + Processing work unit
[19:54:44] Core required: FahCore_11.exe
[19:54:44] Core found.
[19:54:44] Working on queue slot 06 [January 9 19:54:44 UTC]
[19:54:44] + Working ...
[19:54:44] - Calling '.\FahCore_11.exe -dir work/ -suffix 06 -priority 96 -checkpoint 15 -verbose -lifeline 1808 -version 623'

[19:54:44] 
[19:54:44] *------------------------------*
[19:54:44] Folding@Home GPU Core - Beta
[19:54:44] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:54:44] 
[19:54:44] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
[19:54:44] Build host: amoeba
[19:54:44] Board Type: Nvidia
[19:54:44] Core      : 
[19:54:44] Preparing to commence simulation
[19:54:44] - Looking at optimizations...
[19:54:44] - Created dyn
[19:54:44] - Files status OK
[19:54:44] - Expanded 70120 -> 360060 (decompressed 513.4 percent)
[19:54:44] Called DecompressByteArray: compressed_data_size=70120 data_size=360060, decompressed_data_size=360060 diff=0
[19:54:44] - Digital signature verified
[19:54:44] 
[19:54:44] Project: 5764 (Run 14, Clone 29, Gen 88)
[19:54:44] 
[19:54:44] Assembly optimizations on if available.
[19:54:44] Entering M.D.
[19:54:50] Working on Protein
[19:54:52] Client config found, loading data.
[19:54:52] Starting GUI Server
[19:55:52] Completed 1%
[...]
[21:11:24] Completed 77%
[21:11:24] mdrun_gpu returned 
[21:11:24] NANs detected on GPU
[21:11:24] 
[21:11:24] Folding@home Core Shutdown: UNSTABLE_MACHINE
[21:11:28] CoreStatus = 7A (122)
[21:11:28] Sending work to server
[21:11:28] Project: 5764 (Run 14, Clone 29, Gen 88)
[21:11:28] - Read packet limit of 540015616... Set to 524286976.
[21:11:28] - Error: Could not get length of results file work/wuresults_06.dat
[21:11:28] - Error: Could not read unit 06 file. Removing from queue.

There's no data in the DB for this WU, and it fails at a different percentage on each attempt (73%, 64%, 16%, then 77%) ... I have a bad feeling about my GPU :?

Re: Project: 5764 (Run 14, Clone 29, Gen 88) - UM at various %

Posted: Sat Jan 10, 2009 12:39 am
by toTOW
I was finally able to complete it on the next try ... weird.

Re: Project: 5764 (Run 14, Clone 29, Gen 88) - UM at various %

Posted: Sat Jan 10, 2009 12:56 am
by P5-133XL
I am finding that a periodic reboot seems to be necessary for both my ATI and my Nvidia cards. Otherwise they start having problems that continue until the reboot. I also find that they seem to act up faster the more they are used for other purposes, such as non-3D games like solitaire. They really prefer being dedicated folders with a weekly reboot.

I would like to pin down the problem, but so far I haven't seen a clearer pattern.

Re: Project: 5764 (Run 14, Clone 29, Gen 88) - UM at various %

Posted: Sat Jan 10, 2009 1:03 am
by toTOW
I did a complete shutdown of the machine, including switching the power off with the button on the rear of the PSU, but it didn't help. (The report in this thread is from after the shutdown, and the one in the other thread was from before.)