I'm also seeing every Project 5801 WU hit an EUE (Early Unit End) before it even reaches 1%.
The same WU, Project: 5801 (Run 7, Clone 176, Gen 0), is reassigned after every EUE, so the client runs this exact (R,C,G) bad WU 5 times in a row until it hits "EUE limit exceeded. Pausing 24 hours.", which takes it offline for 24 hours.
I understand the need to keep clients from trashing too many WUs, but there's a similar need to keep a bad WU from trashing too many clients! Please revise the algorithm the next time you work on the client or server software responsible for this.
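To make the request concrete, here is a rough sketch of the kind of bookkeeping I have in mind. I have no idea how the real assignment/work server is implemented, so every name here (`AssignmentPolicy`, `report_early_failure`, `quarantine_after`, and so on) is made up for illustration. The idea: never hand a WU back to a client that already failed it, and pull the WU from circulation once several different clients have failed it.

```python
from collections import defaultdict


class AssignmentPolicy:
    """Hypothetical server-side bookkeeping; NOT the actual Folding@home logic."""

    def __init__(self, quarantine_after=3):
        # Assumed threshold: distinct clients that must fail before a WU is pulled.
        self.quarantine_after = quarantine_after
        # Maps a WU key (project, run, clone, gen) to the set of clients that EUE'd on it.
        self.failed_clients = defaultdict(set)

    def report_early_failure(self, wu, client_id):
        """Record that client_id returned an EUE for this WU."""
        self.failed_clients[wu].add(client_id)

    def is_quarantined(self, wu):
        """A WU is suspect once enough distinct clients have failed it."""
        return len(self.failed_clients.get(wu, ())) >= self.quarantine_after

    def may_assign(self, wu, client_id):
        """Refuse to re-send a WU to a client that already failed it."""
        if client_id in self.failed_clients.get(wu, ()):
            return False
        return not self.is_quarantined(wu)

    def pick(self, candidates, client_id):
        """Return the first assignable WU from candidates, or None."""
        for wu in candidates:
            if self.may_assign(wu, client_id):
                return wu
        return None
```

With something like this, my client would have seen (5801, 7, 176, 0) once instead of five times, and the EUE limit would only trip if several different WUs failed on my machine, which is a much better hint that the hardware is the problem.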
Folding@Home Client Version 6.20
[17:28:02] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:28:02]
[17:28:02] Assembly optimizations on if available.
[17:28:02] Entering M.D.
[17:28:08] Working on p5801_supervillin_e1
[17:28:09] Client config found, loading data.
[17:28:09] Starting GUI Server
[17:28:25] mdrun_gpu returned
[17:28:25] NANs detected on GPU
[17:28:25]
[17:28:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:28:28] CoreStatus = 7A (122)
[17:28:28] Sending work to server
As others have pointed out, this is a bad WU, not an UNSTABLE_MACHINE. All 5 of these units in a row failed with the exact same "NANs detected on GPU" error about 16 seconds after "Starting GUI Server".
# Windows GPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.20
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: C:\Users\GatoPrimo\AppData\Roaming\Folding@home-gpu2
Arguments: -gpu 1 -verbosity 9
[17:27:04] - Ask before connecting: No
[17:27:04] - User name: [EV]Aptera (Team 104636)
[17:27:04] - User ID: 3AB3EB432D76466E
[17:27:04] - Machine ID: 4
[17:27:04]
[17:27:04] Loaded queue successfully.
[17:27:04] Initialization complete
[17:27:04] - Preparing to get new work unit...
[17:27:04] + Attempting to get work packet
[17:27:04] - Autosending finished units... [17:27:04]
[17:27:04] Trying to send all finished work units
[17:27:04] + No unsent completed units remaining.
[17:27:04] - Autosend completed
[17:27:04] - Will indicate memory of 3070 MB
[17:27:04] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[17:27:04] - Connecting to assignment server
[17:27:04] Connecting to http://assign-GPU.stanford.edu:8080/
[17:27:04] Posted data.
[17:27:04] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:27:04] + News From Folding@Home: GPU folding beta
[17:27:05] Loaded queue successfully.
[17:27:05] Connecting to http://171.67.108.11:8080/
[17:27:08] Posted data.
[17:27:08] Initial: 0000; + Could not connect to Work Server
[17:27:08] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[17:27:18] + Attempting to get work packet
[17:27:18] - Will indicate memory of 3070 MB
[17:27:18] - Connecting to assignment server
[17:27:18] Connecting to http://assign-GPU.stanford.edu:8080/
[17:27:18] Posted data.
[17:27:18] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:27:18] + News From Folding@Home: GPU folding beta
[17:27:18] Loaded queue successfully.
[17:27:18] Connecting to http://171.67.108.11:8080/
[17:27:18] Posted data.
[17:27:18] Initial: 0000; + Could not connect to Work Server
[17:27:18] - Attempt #2 to get work failed, and no other work to do.
Waiting before retry.
[17:27:34] + Attempting to get work packet
[17:27:34] - Will indicate memory of 3070 MB
[17:27:34] - Connecting to assignment server
[17:27:34] Connecting to http://assign-GPU.stanford.edu:8080/
[17:27:35] Posted data.
[17:27:35] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:27:35] + News From Folding@Home: GPU folding beta
[17:27:35] Loaded queue successfully.
[17:27:35] Connecting to http://171.67.108.11:8080/
[17:27:35] Posted data.
[17:27:35] Initial: 0000; + Could not connect to Work Server
[17:27:35] - Attempt #3 to get work failed, and no other work to do.
Waiting before retry.
[17:28:01] + Attempting to get work packet
[17:28:01] - Will indicate memory of 3070 MB
[17:28:01] - Connecting to assignment server
[17:28:01] Connecting to http://assign-GPU.stanford.edu:8080/
[17:28:01] Posted data.
[17:28:01] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:28:01] + News From Folding@Home: GPU folding beta
[17:28:01] Loaded queue successfully.
[17:28:01] Connecting to http://171.67.108.11:8080/
[17:28:01] Posted data.
[17:28:01] Initial: 0000; - Receiving payload (expected size: 43404)
[17:28:02] - Downloaded at ~42 kB/s
[17:28:02] - Averaged speed for that direction ~66 kB/s
[17:28:02] + Received work.
[17:28:02] + Closed connections
[17:28:02]
[17:28:02] + Processing work unit
[17:28:02] Core required: FahCore_11.exe
[17:28:02] Core found.
[17:28:02] Working on queue slot 08 [October 28 17:28:02 UTC]
[17:28:02] + Working ...
[17:28:02] - Calling '.\FahCore_11.exe -dir work/ -suffix 08 -priority 96 -checkpoint 30 -verbose -lifeline 4028 -version 620'
[17:28:02]
[17:28:02] *------------------------------*
[17:28:02] Folding@Home GPU Core - Beta
[17:28:02] Version 1.15 (Mon Oct 13 11:11:30 PDT 2008)
[17:28:02]
[17:28:02] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:28:02] Build host: amoeba
[17:28:02] Board Type: Nvidia
[17:28:02] Core :
[17:28:02] Preparing to commence simulation
[17:28:02] - Looking at optimizations...
[17:28:02] - Created dyn
[17:28:02] - Files status OK
[17:28:02] - Expanded 42892 -> 246265 (decompressed 574.1 percent)
[17:28:02] Called DecompressByteArray: compressed_data_size=42892 data_size=246265, decompressed_data_size=246265 diff=0
[17:28:02] - Digital signature verified
[17:28:02]
[17:28:02] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:28:02]
[17:28:02] Assembly optimizations on if available.
[17:28:02] Entering M.D.
[17:28:08] Working on p5801_supervillin_e1
[17:28:09] Client config found, loading data.
[17:28:09] Starting GUI Server
[17:28:25] mdrun_gpu returned
[17:28:25] NANs detected on GPU
[17:28:25]
[17:28:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:28:28] CoreStatus = 7A (122)
[17:28:28] Sending work to server
[17:28:28] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:28:28] - Read packet limit of 540015616... Set to 524286976.
[17:28:28] - Error: Could not get length of results file work/wuresults_08.dat
[17:28:28] - Error: Could not read unit 08 file. Removing from queue.
[17:28:28] Trying to send all finished work units
[17:28:28] + No unsent completed units remaining.
[17:28:28] - Preparing to get new work unit...
[17:28:28] + Attempting to get work packet
[17:28:28] - Will indicate memory of 3070 MB
[17:28:28] - Connecting to assignment server
[17:28:28] Connecting to http://assign-GPU.stanford.edu:8080/
[17:28:28] Posted data.
[17:28:28] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:28:28] + News From Folding@Home: GPU folding beta
[17:28:28] Loaded queue successfully.
[17:28:28] Connecting to http://171.67.108.11:8080/
[17:28:28] Posted data.
[17:28:28] Initial: 0000; - Receiving payload (expected size: 43404)
[17:28:29] - Downloaded at ~42 kB/s
[17:28:29] - Averaged speed for that direction ~61 kB/s
[17:28:29] + Received work.
[17:28:29] Trying to send all finished work units
[17:28:29] + No unsent completed units remaining.
[17:28:29] + Closed connections
[17:28:34]
[17:28:34] + Processing work unit
[17:28:34] Core required: FahCore_11.exe
[17:28:34] Core found.
[17:28:34] Working on queue slot 09 [October 28 17:28:34 UTC]
[17:28:34] + Working ...
[17:28:34] - Calling '.\FahCore_11.exe -dir work/ -suffix 09 -priority 96 -checkpoint 30 -verbose -lifeline 4028 -version 620'
[17:28:34]
[17:28:34] *------------------------------*
[17:28:34] Folding@Home GPU Core - Beta
[17:28:34] Version 1.15 (Mon Oct 13 11:11:30 PDT 2008)
[17:28:34]
[17:28:34] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:28:34] Build host: amoeba
[17:28:34] Board Type: Nvidia
[17:28:34] Core :
[17:28:34] Preparing to commence simulation
[17:28:34] - Looking at optimizations...
[17:28:34] - Created dyn
[17:28:34] - Files status OK
[17:28:34] - Expanded 42892 -> 246265 (decompressed 574.1 percent)
[17:28:34] Called DecompressByteArray: compressed_data_size=42892 data_size=246265, decompressed_data_size=246265 diff=0
[17:28:34] - Digital signature verified
[17:28:34]
[17:28:34] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:28:34]
[17:28:34] Assembly optimizations on if available.
[17:28:34] Entering M.D.
[17:28:40] Working on p5801_supervillin_e1
[17:28:41] Client config found, loading data.
[17:28:41] Starting GUI Server
[17:28:57] mdrun_gpu returned
[17:28:57] NANs detected on GPU
[17:28:57]
[17:28:57] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:29:00] CoreStatus = 7A (122)
[17:29:00] Sending work to server
[17:29:00] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:29:00] - Read packet limit of 540015616... Set to 524286976.
[17:29:00] - Error: Could not get length of results file work/wuresults_09.dat
[17:29:00] - Error: Could not read unit 09 file. Removing from queue.
[17:29:00] Trying to send all finished work units
[17:29:00] + No unsent completed units remaining.
[17:29:00] - Preparing to get new work unit...
[17:29:00] + Attempting to get work packet
[17:29:00] - Will indicate memory of 3070 MB
[17:29:00] - Connecting to assignment server
[17:29:00] Connecting to http://assign-GPU.stanford.edu:8080/
[17:29:00] Posted data.
[17:29:00] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:29:00] + News From Folding@Home: GPU folding beta
[17:29:00] Loaded queue successfully.
[17:29:00] Connecting to http://171.67.108.11:8080/
[17:29:01] Posted data.
[17:29:01] Initial: 0000; - Receiving payload (expected size: 43404)
[17:29:01] Conversation time very short, giving reduced weight in bandwidth avg
[17:29:01] - Downloaded at ~84 kB/s
[17:29:01] - Averaged speed for that direction ~64 kB/s
[17:29:01] + Received work.
[17:29:01] Trying to send all finished work units
[17:29:01] + No unsent completed units remaining.
[17:29:01] + Closed connections
[17:29:06]
[17:29:06] + Processing work unit
[17:29:06] Core required: FahCore_11.exe
[17:29:06] Core found.
[17:29:06] Working on queue slot 00 [October 28 17:29:06 UTC]
[17:29:06] + Working ...
[17:29:06] - Calling '.\FahCore_11.exe -dir work/ -suffix 00 -priority 96 -checkpoint 30 -verbose -lifeline 4028 -version 620'
[17:29:06]
[17:29:06] *------------------------------*
[17:29:06] Folding@Home GPU Core - Beta
[17:29:06] Version 1.15 (Mon Oct 13 11:11:30 PDT 2008)
[17:29:06]
[17:29:06] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:29:06] Build host: amoeba
[17:29:06] Board Type: Nvidia
[17:29:06] Core :
[17:29:06] Preparing to commence simulation
[17:29:06] - Looking at optimizations...
[17:29:06] - Created dyn
[17:29:06] - Files status OK
[17:29:06] - Expanded 42892 -> 246265 (decompressed 574.1 percent)
[17:29:06] Called DecompressByteArray: compressed_data_size=42892 data_size=246265, decompressed_data_size=246265 diff=0
[17:29:06] - Digital signature verified
[17:29:06]
[17:29:06] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:29:06]
[17:29:06] Assembly optimizations on if available.
[17:29:06] Entering M.D.
[17:29:12] Working on p5801_supervillin_e1
[17:29:13] Client config found, loading data.
[17:29:13] Starting GUI Server
[17:29:30] mdrun_gpu returned
[17:29:30] NANs detected on GPU
[17:29:30]
[17:29:30] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:29:32] CoreStatus = 7A (122)
[17:29:32] Sending work to server
[17:29:32] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:29:32] - Read packet limit of 540015616... Set to 524286976.
[17:29:32] - Error: Could not get length of results file work/wuresults_00.dat
[17:29:32] - Error: Could not read unit 00 file. Removing from queue.
[17:29:32] Trying to send all finished work units
[17:29:32] + No unsent completed units remaining.
[17:29:32] - Preparing to get new work unit...
[17:29:32] + Attempting to get work packet
[17:29:32] - Will indicate memory of 3070 MB
[17:29:32] - Connecting to assignment server
[17:29:32] Connecting to http://assign-GPU.stanford.edu:8080/
[17:29:33] Posted data.
[17:29:33] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:29:33] + News From Folding@Home: GPU folding beta
[17:29:33] Loaded queue successfully.
[17:29:33] Connecting to http://171.67.108.11:8080/
[17:29:33] Posted data.
[17:29:33] Initial: 0000; - Receiving payload (expected size: 43404)
[17:29:33] Conversation time very short, giving reduced weight in bandwidth avg
[17:29:33] - Downloaded at ~84 kB/s
[17:29:33] - Averaged speed for that direction ~66 kB/s
[17:29:33] + Received work.
[17:29:33] Trying to send all finished work units
[17:29:33] + No unsent completed units remaining.
[17:29:33] + Closed connections
[17:29:38]
[17:29:38] + Processing work unit
[17:29:38] Core required: FahCore_11.exe
[17:29:38] Core found.
[17:29:38] Working on queue slot 01 [October 28 17:29:38 UTC]
[17:29:38] + Working ...
[17:29:38] - Calling '.\FahCore_11.exe -dir work/ -suffix 01 -priority 96 -checkpoint 30 -verbose -lifeline 4028 -version 620'
[17:29:39]
[17:29:39] *------------------------------*
[17:29:39] Folding@Home GPU Core - Beta
[17:29:39] Version 1.15 (Mon Oct 13 11:11:30 PDT 2008)
[17:29:39]
[17:29:39] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:29:39] Build host: amoeba
[17:29:39] Board Type: Nvidia
[17:29:39] Core :
[17:29:39] Preparing to commence simulation
[17:29:39] - Looking at optimizations...
[17:29:39] - Created dyn
[17:29:39] - Files status OK
[17:29:39] - Expanded 42892 -> 246265 (decompressed 574.1 percent)
[17:29:39] Called DecompressByteArray: compressed_data_size=42892 data_size=246265, decompressed_data_size=246265 diff=0
[17:29:39] - Digital signature verified
[17:29:39]
[17:29:39] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:29:39]
[17:29:39] Assembly optimizations on if available.
[17:29:39] Entering M.D.
[17:29:45] Working on p5801_supervillin_e1
[17:29:46] Client config found, loading data.
[17:29:46] Starting GUI Server
[17:30:02] mdrun_gpu returned
[17:30:02] NANs detected on GPU
[17:30:02]
[17:30:02] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:30:05] CoreStatus = 7A (122)
[17:30:05] Sending work to server
[17:30:05] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:30:05] - Read packet limit of 540015616... Set to 524286976.
[17:30:05] - Error: Could not get length of results file work/wuresults_01.dat
[17:30:05] - Error: Could not read unit 01 file. Removing from queue.
[17:30:05] Trying to send all finished work units
[17:30:05] + No unsent completed units remaining.
[17:30:05] - Preparing to get new work unit...
[17:30:05] + Attempting to get work packet
[17:30:05] - Will indicate memory of 3070 MB
[17:30:05] - Connecting to assignment server
[17:30:05] Connecting to http://assign-GPU.stanford.edu:8080/
[17:30:05] Posted data.
[17:30:05] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[17:30:05] + News From Folding@Home: GPU folding beta
[17:30:05] Loaded queue successfully.
[17:30:05] Connecting to http://171.67.108.11:8080/
[17:30:05] Posted data.
[17:30:05] Initial: 0000; - Receiving payload (expected size: 43404)
[17:30:06] - Downloaded at ~42 kB/s
[17:30:06] - Averaged speed for that direction ~61 kB/s
[17:30:06] + Received work.
[17:30:06] Trying to send all finished work units
[17:30:06] + No unsent completed units remaining.
[17:30:06] + Closed connections
[17:30:11]
[17:30:11] + Processing work unit
[17:30:11] Core required: FahCore_11.exe
[17:30:11] Core found.
[17:30:11] Working on queue slot 02 [October 28 17:30:11 UTC]
[17:30:11] + Working ...
[17:30:11] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -checkpoint 30 -verbose -lifeline 4028 -version 620'
[17:30:11]
[17:30:11] *------------------------------*
[17:30:11] Folding@Home GPU Core - Beta
[17:30:11] Version 1.15 (Mon Oct 13 11:11:30 PDT 2008)
[17:30:11]
[17:30:11] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:30:11] Build host: amoeba
[17:30:11] Board Type: Nvidia
[17:30:11] Core :
[17:30:11] Preparing to commence simulation
[17:30:11] - Looking at optimizations...
[17:30:11] - Created dyn
[17:30:11] - Files status OK
[17:30:11] - Expanded 42892 -> 246265 (decompressed 574.1 percent)
[17:30:11] Called DecompressByteArray: compressed_data_size=42892 data_size=246265, decompressed_data_size=246265 diff=0
[17:30:11] - Digital signature verified
[17:30:11]
[17:30:11] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:30:11]
[17:30:11] Assembly optimizations on if available.
[17:30:11] Entering M.D.
[17:30:17] Working on p5801_supervillin_e1
[17:30:18] Client config found, loading data.
[17:30:18] Starting GUI Server
[17:30:34] mdrun_gpu returned
[17:30:34] NANs detected on GPU
[17:30:34]
[17:30:34] Folding@home Core Shutdown: UNSTABLE_MACHINE
[17:30:37] CoreStatus = 7A (122)
[17:30:37] Sending work to server
[17:30:37] Project: 5801 (Run 7, Clone 176, Gen 0)
[17:30:37] - Read packet limit of 540015616... Set to 524286976.
[17:30:37] - Error: Could not get length of results file work/wuresults_02.dat
[17:30:37] - Error: Could not read unit 02 file. Removing from queue.
[17:30:37] EUE limit exceeded. Pausing 24 hours.
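For anyone who wants to check their own log for the same pattern, the following is a small sketch that pulls out each core shutdown along with the WU it belonged to and the delay between "Starting GUI Server" and "NANs detected on GPU". It relies only on the line formats visible in the log above; the function name `summarize_failures` is my own, and it assumes a single attempt does not cross midnight.

```python
import re
from datetime import datetime

# "[HH:MM:SS] message" as printed by the client log above.
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")
PROJECT = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def summarize_failures(log_text):
    """Return a list of (wu, shutdown_reason, seconds_from_gui_start_to_nan)."""
    failures = []
    current_wu = None
    gui_start = None
    nan_delay = None
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # banner lines, "Waiting before retry.", etc.
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        text = match.group(2)
        project = PROJECT.search(text)
        if project:
            current_wu = tuple(int(x) for x in project.groups())
        elif text.startswith("Starting GUI Server"):
            gui_start = stamp
        elif text.startswith("NANs detected on GPU") and gui_start is not None:
            nan_delay = int((stamp - gui_start).total_seconds())
        elif text.startswith("Folding@home Core Shutdown:"):
            reason = text.split(":", 1)[1].strip()
            failures.append((current_wu, reason, nan_delay))
            gui_start = None
            nan_delay = None
    return failures
```

If every entry in the returned list carries the same (project, run, clone, gen) tuple and a near-identical delay, that points at the WU rather than at the machine.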