Project 5767 (Run 5- Clone 173- Gen 1013)
Moderators: Site Moderators, FAHC Science Team
- Posts: 6
- Joined: Sat Mar 14, 2009 1:22 am
Project 5767 (Run 5- Clone 173- Gen 1013)
This WU has been assigned to me repeatedly and has failed on each of the 5 GPUs it was assigned to, GPUs that have had no problems processing other 576x projects. Could this simply be a bad WU? If so, is there a way to prevent it from being reassigned to me?
- Site Moderator
- Posts: 6359
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
Re: Project 5767 (Run 5- Clone 173- Gen 1013)
Could you post the log of a failure?
- Posts: 6
- Joined: Sat Mar 14, 2009 1:22 am
Re: Project 5767 (Run 5- Clone 173- Gen 1013)
This is the log from the most recent failure. My other fahlog-prev.txt files only go back a few days, so I have none of the earlier ones to post, though the error message was exactly the same each time. I keep a list of every WU that fails so that I have a record to compare against if another WU fails. In most cases I can process a failed WU after stopping and restarting the client. I have had no such luck with this one.
Please keep in mind that this particular WU has failed on several different cards, including my two GTS250s, which process these 576x WUs regularly without failing. In fact, those two cards have been so stable that I pay special attention when a WU does fail on them; it is very, very rare. I do not have any temperature problems on any of the cards: they stay between 68°C and 75°C while folding these WUs in an air-conditioned room.
Here's the Log:
#######################################################################################################################
[10:49:01] *------------------------------*
[10:49:01] Folding@Home GPU Core - Beta
[10:49:01] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[10:49:01]
[10:49:01] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[10:49:01] Build host: amoeba
[10:49:01] Board Type: Nvidia
[10:49:01] Core :
[10:49:01] Preparing to commence simulation
[10:49:01] - Looking at optimizations...
[10:49:01] - Created dyn
[10:49:01] - Files status OK
[10:49:01] - Expanded 46576 -> 252912 (decompressed 543.0 percent)
[10:49:01] Called DecompressByteArray: compressed_data_size=46576 data_size=252912, decompressed_data_size=252912 diff=0
[10:49:01] - Digital signature verified
[10:49:01]
[10:49:01] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:01]
[10:49:01] Assembly optimizations on if available.
[10:49:01] Entering M.D.
[10:49:08] Working on Protein
[10:49:08] Client config found, loading data.
[10:49:08] mdrun_gpu returned
[10:49:08] NANs detected on GPU
[10:49:08]
[10:49:08] Folding@home Core Shutdown: UNSTABLE_MACHINE
[10:49:11] CoreStatus = 7A (122)
[10:49:11] Sending work to server
[10:49:11] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:11] - Read packet limit of 540015616... Set to 524286976.
[10:49:11] - Error: Could not get length of results file work/wuresults_03.dat
[10:49:11] - Error: Could not read unit 03 file. Removing from queue.
[10:49:11] - Preparing to get new work unit...
[10:49:11] + Attempting to get work packet
[10:49:11] - Connecting to assignment server
[10:49:12] - Successful: assigned to (171.67.108.11).
[10:49:12] + News From Folding@Home: Welcome to Folding@Home
[10:49:12] Loaded queue successfully.
[10:49:13] + Closed connections
[10:49:18]
[10:49:18] + Processing work unit
[10:49:18] Core required: FahCore_11.exe
[10:49:18] Core found.
[10:49:18] Working on queue slot 04 [September 22 10:49:18 UTC]
[10:49:18] + Working ...
[10:49:18]
[10:49:18] *------------------------------*
[10:49:18] Folding@Home GPU Core - Beta
[10:49:18] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[10:49:18]
[10:49:18] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[10:49:18] Build host: amoeba
[10:49:18] Board Type: Nvidia
[10:49:18] Core :
[10:49:18] Preparing to commence simulation
[10:49:18] - Looking at optimizations...
[10:49:18] - Created dyn
[10:49:18] - Files status OK
[10:49:18] - Expanded 46576 -> 252912 (decompressed 543.0 percent)
[10:49:18] Called DecompressByteArray: compressed_data_size=46576 data_size=252912, decompressed_data_size=252912 diff=0
[10:49:18] - Digital signature verified
[10:49:18]
[10:49:18] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:18]
[10:49:18] Assembly optimizations on if available.
[10:49:18] Entering M.D.
[10:49:25] Working on Protein
[10:49:25] Client config found, loading data.
[10:49:25] mdrun_gpu returned
[10:49:25] NANs detected on GPU
[10:49:25]
[10:49:25] Folding@home Core Shutdown: UNSTABLE_MACHINE
[10:49:28] CoreStatus = 7A (122)
[10:49:28] Sending work to server
[10:49:28] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:28] - Read packet limit of 540015616... Set to 524286976.
[10:49:28] - Error: Could not get length of results file work/wuresults_04.dat
[10:49:28] - Error: Could not read unit 04 file. Removing from queue.
[10:49:28] - Preparing to get new work unit...
[10:49:28] + Attempting to get work packet
[10:49:28] - Connecting to assignment server
[10:49:29] - Successful: assigned to (171.67.108.11).
[10:49:29] + News From Folding@Home: Welcome to Folding@Home
[10:49:29] Loaded queue successfully.
[10:49:30] + Closed connections
[10:49:35]
[10:49:35] + Processing work unit
[10:49:35] Core required: FahCore_11.exe
[10:49:35] Core found.
[10:49:35] Working on queue slot 05 [September 22 10:49:35 UTC]
[10:49:35] + Working ...
[10:49:35]
[10:49:35] *------------------------------*
[10:49:35] Folding@Home GPU Core - Beta
[10:49:35] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[10:49:35]
[10:49:35] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[10:49:35] Build host: amoeba
[10:49:35] Board Type: Nvidia
[10:49:35] Core :
[10:49:35] Preparing to commence simulation
[10:49:35] - Looking at optimizations...
[10:49:35] - Created dyn
[10:49:35] - Files status OK
[10:49:35] - Expanded 46576 -> 252912 (decompressed 543.0 percent)
[10:49:35] Called DecompressByteArray: compressed_data_size=46576 data_size=252912, decompressed_data_size=252912 diff=0
[10:49:35] - Digital signature verified
[10:49:35]
[10:49:35] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:35]
[10:49:35] Assembly optimizations on if available.
[10:49:35] Entering M.D.
[10:49:42] Working on Protein
[10:49:42] Client config found, loading data.
[10:49:42] mdrun_gpu returned
[10:49:42] NANs detected on GPU
[10:49:42]
[10:49:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[10:49:45] CoreStatus = 7A (122)
[10:49:45] Sending work to server
[10:49:45] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:45] - Read packet limit of 540015616... Set to 524286976.
[10:49:45] - Error: Could not get length of results file work/wuresults_05.dat
[10:49:45] - Error: Could not read unit 05 file. Removing from queue.
[10:49:45] - Preparing to get new work unit...
[10:49:45] + Attempting to get work packet
[10:49:45] - Connecting to assignment server
[10:49:46] - Successful: assigned to (171.67.108.11).
[10:49:46] + News From Folding@Home: Welcome to Folding@Home
[10:49:46] Loaded queue successfully.
[10:49:47] + Closed connections
[10:49:52]
[10:49:52] + Processing work unit
[10:49:52] Core required: FahCore_11.exe
[10:49:52] Core found.
[10:49:52] Working on queue slot 06 [September 22 10:49:52 UTC]
[10:49:52] + Working ...
[10:49:52]
[10:49:52] *------------------------------*
[10:49:52] Folding@Home GPU Core - Beta
[10:49:52] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[10:49:52]
[10:49:52] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[10:49:52] Build host: amoeba
[10:49:52] Board Type: Nvidia
[10:49:52] Core :
[10:49:52] Preparing to commence simulation
[10:49:52] - Looking at optimizations...
[10:49:52] - Created dyn
[10:49:52] - Files status OK
[10:49:52] - Expanded 46576 -> 252912 (decompressed 543.0 percent)
[10:49:52] Called DecompressByteArray: compressed_data_size=46576 data_size=252912, decompressed_data_size=252912 diff=0
[10:49:52] - Digital signature verified
[10:49:52]
[10:49:52] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:49:52]
[10:49:52] Assembly optimizations on if available.
[10:49:52] Entering M.D.
[10:49:59] Working on Protein
[10:49:59] Client config found, loading data.
[10:49:59] mdrun_gpu returned
[10:49:59] NANs detected on GPU
[10:49:59]
[10:49:59] Folding@home Core Shutdown: UNSTABLE_MACHINE
[10:50:02] CoreStatus = 7A (122)
[10:50:02] Sending work to server
[10:50:02] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:50:02] - Read packet limit of 540015616... Set to 524286976.
[10:50:02] - Error: Could not get length of results file work/wuresults_06.dat
[10:50:02] - Error: Could not read unit 06 file. Removing from queue.
[10:50:02] - Preparing to get new work unit...
[10:50:02] + Attempting to get work packet
[10:50:02] - Connecting to assignment server
[10:50:03] - Successful: assigned to (171.67.108.11).
[10:50:03] + News From Folding@Home: Welcome to Folding@Home
[10:50:03] Loaded queue successfully.
[10:50:04] + Closed connections
[10:50:09]
[10:50:09] + Processing work unit
[10:50:09] Core required: FahCore_11.exe
[10:50:09] Core found.
[10:50:09] Working on queue slot 07 [September 22 10:50:09 UTC]
[10:50:09] + Working ...
[10:50:09]
[10:50:09] *------------------------------*
[10:50:09] Folding@Home GPU Core - Beta
[10:50:09] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[10:50:09]
[10:50:09] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[10:50:09] Build host: amoeba
[10:50:09] Board Type: Nvidia
[10:50:09] Core :
[10:50:09] Preparing to commence simulation
[10:50:09] - Looking at optimizations...
[10:50:09] - Created dyn
[10:50:09] - Files status OK
[10:50:09] - Expanded 46576 -> 252912 (decompressed 543.0 percent)
[10:50:09] Called DecompressByteArray: compressed_data_size=46576 data_size=252912, decompressed_data_size=252912 diff=0
[10:50:09] - Digital signature verified
[10:50:09]
[10:50:09] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:50:09]
[10:50:09] Assembly optimizations on if available.
[10:50:09] Entering M.D.
[10:50:16] Working on Protein
[10:50:16] Client config found, loading data.
[10:50:16] mdrun_gpu returned
[10:50:16] NANs detected on GPU
[10:50:16]
[10:50:16] Folding@home Core Shutdown: UNSTABLE_MACHINE
[10:50:19] CoreStatus = 7A (122)
[10:50:19] Sending work to server
[10:50:19] Project: 5767 (Run 5, Clone 173, Gen 1013)
[10:50:19] - Read packet limit of 540015616... Set to 524286976.
[10:50:19] - Error: Could not get length of results file work/wuresults_07.dat
[10:50:19] - Error: Could not read unit 07 file. Removing from queue.
[10:50:19] EUE limit exceeded. Pausing 24 hours.
[16:39:33] + Working...
Folding@Home Client Shutdown.
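
For anyone who also keeps a list of failed WUs, the bookkeeping can be automated from the log text. What follows is only a rough sketch in Python, assuming the log lines look like the ones pasted above (a "Project: ... (Run ..., Clone ..., Gen ...)" line followed later by a "Core Shutdown: UNSTABLE_MACHINE" line). The function name `failed_work_units` and the sample text are hypothetical illustrations, not part of any Folding@home tool.

```python
import re
from collections import Counter

# Matches log lines such as "Project: 5767 (Run 5, Clone 173, Gen 1013)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Marker text copied from the log above; other shutdown reasons are ignored.
FAILURE_MARKER = "Folding@home Core Shutdown: UNSTABLE_MACHINE"


def failed_work_units(log_text):
    """Count UNSTABLE_MACHINE shutdowns per (project, run, clone, gen)."""
    failures = Counter()
    current = None  # most recent WU announced in the log
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current = tuple(int(group) for group in match.groups())
        elif FAILURE_MARKER in line and current is not None:
            failures[current] += 1
    return failures


if __name__ == "__main__":
    # Hypothetical excerpt in the same format as the log above.
    sample = (
        "[10:49:01] Project: 5767 (Run 5, Clone 173, Gen 1013)\n"
        "[10:49:08] NANs detected on GPU\n"
        "[10:49:08] Folding@home Core Shutdown: UNSTABLE_MACHINE\n"
    )
    for wu, count in failed_work_units(sample).items():
        print(wu, count)
```

To run it against a real log, read the file's contents into a string and pass that to `failed_work_units`. A WU that shows up with a high count across several cards is a reasonable candidate to report as a possibly bad WU.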