P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 1:21 am
by rickoic
Running on eVga 460 O/C @ 806.

Suddenly began getting NANs with GPU 1. Tried several times to run the work unit but got 5 failures.

--- Opening Log file [September 9 01:11:42 UTC] 


# Windows GPU Systray Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30r2

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: c:\gpu1
Arguments: -gpu 1 

[01:11:42] - Ask before connecting: No
[01:11:42] - User name: rickoic (Team 15)
[01:11:42] - 
[01:11:42] - Machine ID: 3
[01:11:42] 
[01:11:42] Gpu type=3 species=0.
[01:11:42] Loaded queue successfully.
[01:11:42] Initialization complete
[01:11:42] 
[01:11:42] + Processing work unit
[01:11:42] Core required: FahCore_15.exe
[01:11:42] Core found.
[01:11:42] Working on queue slot 06 [September 9 01:11:42 UTC]
[01:11:42] + Working ...
[01:11:43] 
[01:11:43] *------------------------------*
[01:11:43] Folding@Home GPU Core -- Beta
[01:11:43] Version 2.09 (Thu May 20 11:58:42 PDT 2010)
[01:11:43] 
[01:11:43] Build host: SimbiosNvdWin7
[01:11:43] Board Type: Nvidia
[01:11:43] Core      : 
[01:11:43] Preparing to commence simulation
[01:11:43] - Looking at optimizations...
[01:11:43] - Files status OK
[01:11:43] sizeof(CORE_PACKET_HDR) = 512 file=<>
[01:11:43] - Expanded 28902 -> 163067 (decompressed 564.2 percent)
[01:11:43] Called DecompressByteArray: compressed_data_size=28902 data_size=163067, decompressed_data_size=163067 diff=0
[01:11:43] - Digital signature verified
[01:11:43] 
[01:11:43] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:11:43] 
[01:11:43] Assembly optimizations on if available.
[01:11:43] Entering M.D.
[01:11:49] Tpr hash work/wudata_06.tpr:  4106429482 201827291 1542027732 544844961 2111337814
[01:11:50] Working on 582 p2750_N68H_AM03
[01:11:50] Client config found, loading data.
[01:11:50] Starting GUI Server
[01:11:54] mdrun_gpu returned 
[01:11:54] NANs detected on GPU
[01:11:54] 
[01:11:54] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:11:57] CoreStatus = 7A (122)
[01:11:57] Sending work to server
[01:11:57] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:11:57] - Error: Could not get length of results file work/wuresults_06.dat
[01:11:57] - Error: Could not read unit 06 file. Removing from queue.
[01:11:57] - Preparing to get new work unit...
[01:11:57] Cleaning up work directory
[01:11:57] + Attempting to get work packet
[01:11:57] Passkey found
[01:11:57] Gpu type=3 species=0.
[01:11:57] - Connecting to assignment server
[01:12:02] - Successful: assigned to (171.67.108.20).
[01:12:02] + News From Folding@Home: Welcome to Folding@Home
[01:12:02] Loaded queue successfully.
[01:12:02] Gpu type=3 species=0.
[01:12:13] + Closed connections
[01:12:18] 
[01:12:18] + Processing work unit
[01:12:18] Core required: FahCore_15.exe
[01:12:18] Core found.
[01:12:18] Working on queue slot 07 [September 9 01:12:18 UTC]
[01:12:18] + Working ...
[01:12:18] 
[01:12:18] *------------------------------*
[01:12:18] Folding@Home GPU Core -- Beta
[01:12:18] Version 2.09 (Thu May 20 11:58:42 PDT 2010)
[01:12:18] 
[01:12:18] Build host: SimbiosNvdWin7
[01:12:18] Board Type: Nvidia
[01:12:18] Core      : 
[01:12:18] Preparing to commence simulation
[01:12:18] - Looking at optimizations...
[01:12:18] DeleteFrameFiles: successfully deleted file=work/wudata_07.ckp
[01:12:18] - Created dyn
[01:12:18] - Files status OK
[01:12:18] sizeof(CORE_PACKET_HDR) = 512 file=<>
[01:12:18] - Expanded 28902 -> 163067 (decompressed 564.2 percent)
[01:12:18] Called DecompressByteArray: compressed_data_size=28902 data_size=163067, decompressed_data_size=163067 diff=0
[01:12:18] - Digital signature verified
[01:12:18] 
[01:12:18] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:12:18] 
[01:12:18] Assembly optimizations on if available.
[01:12:18] Entering M.D.
[01:12:24] Tpr hash work/wudata_07.tpr:  4106429482 201827291 1542027732 544844961 2111337814
[01:12:24] Working on 582 p2750_N68H_AM03
[01:12:24] Client config found, loading data.
[01:12:25] Starting GUI Server
[01:12:29] mdrun_gpu returned 
[01:12:29] NANs detected on GPU
[01:12:29] 
[01:12:29] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:12:33] CoreStatus = 7A (122)
[01:12:33] Sending work to server
[01:12:33] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:12:33] - Error: Could not get length of results file work/wuresults_07.dat
[01:12:33] - Error: Could not read unit 07 file. Removing from queue.
[01:12:33] - Preparing to get new work unit...
[01:12:33] Cleaning up work directory
[01:12:33] + Attempting to get work packet
[01:12:33] Passkey found
[01:12:33] Gpu type=3 species=0.
[01:12:33] - Connecting to assignment server
[01:12:33] - Successful: assigned to (171.67.108.20).
[01:12:33] + News From Folding@Home: Welcome to Folding@Home
[01:12:33] Loaded queue successfully.
[01:12:33] Gpu type=3 species=0.
[01:12:34] + Closed connections
[01:12:39] 
[01:12:39] + Processing work unit
[01:12:39] Core required: FahCore_15.exe
[01:12:39] Core found.
[01:12:39] Working on queue slot 08 [September 9 01:12:39 UTC]
[01:12:39] + Working ...
[01:12:39] 
[01:12:39] *------------------------------*
[01:12:39] Folding@Home GPU Core -- Beta
[01:12:39] Version 2.09 (Thu May 20 11:58:42 PDT 2010)
[01:12:39] 
[01:12:39] Build host: SimbiosNvdWin7
[01:12:39] Board Type: Nvidia
[01:12:39] Core      : 
[01:12:39] Preparing to commence simulation
[01:12:39] - Looking at optimizations...
[01:12:39] DeleteFrameFiles: successfully deleted file=work/wudata_08.ckp
[01:12:39] - Created dyn
[01:12:39] - Files status OK
[01:12:39] sizeof(CORE_PACKET_HDR) = 512 file=<>
[01:12:39] - Expanded 28902 -> 163067 (decompressed 564.2 percent)
[01:12:39] Called DecompressByteArray: compressed_data_size=28902 data_size=163067, decompressed_data_size=163067 diff=0
[01:12:39] - Digital signature verified
[01:12:39] 
[01:12:39] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:12:39] 
[01:12:39] Assembly optimizations on if available.
[01:12:39] Entering M.D.
[01:12:45] Tpr hash work/wudata_08.tpr:  4106429482 201827291 1542027732 544844961 2111337814
[01:12:45] Working on 582 p2750_N68H_AM03
[01:12:45] Client config found, loading data.
[01:12:46] Starting GUI Server
[01:12:50] mdrun_gpu returned 
[01:12:50] NANs detected on GPU
[01:12:50] 
[01:12:50] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:12:54] CoreStatus = 7A (122)
[01:12:54] Sending work to server
[01:12:54] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:12:54] - Error: Could not get length of results file work/wuresults_08.dat
[01:12:54] - Error: Could not read unit 08 file. Removing from queue.
[01:12:54] - Preparing to get new work unit...
[01:12:54] Cleaning up work directory
[01:12:54] + Attempting to get work packet
[01:12:54] Passkey found
[01:12:54] Gpu type=3 species=0.
[01:12:54] - Connecting to assignment server
[01:12:54] - Successful: assigned to (171.67.108.20).
[01:12:54] + News From Folding@Home: Welcome to Folding@Home
[01:12:54] Loaded queue successfully.
[01:12:54] Gpu type=3 species=0.
[01:12:55] + Closed connections
[01:13:00] 
[01:13:00] + Processing work unit
[01:13:00] Core required: FahCore_15.exe
[01:13:00] Core found.
[01:13:00] Working on queue slot 09 [September 9 01:13:00 UTC]
[01:13:00] + Working ...
[01:13:00] 
[01:13:00] *------------------------------*
[01:13:00] Folding@Home GPU Core -- Beta
[01:13:00] Version 2.09 (Thu May 20 11:58:42 PDT 2010)
[01:13:00] 
[01:13:00] Build host: SimbiosNvdWin7
[01:13:00] Board Type: Nvidia
[01:13:00] Core      : 
[01:13:00] Preparing to commence simulation
[01:13:00] - Looking at optimizations...
[01:13:00] DeleteFrameFiles: successfully deleted file=work/wudata_09.ckp
[01:13:00] - Created dyn
[01:13:00] - Files status OK
[01:13:00] sizeof(CORE_PACKET_HDR) = 512 file=<>
[01:13:00] - Expanded 28902 -> 163067 (decompressed 564.2 percent)
[01:13:00] Called DecompressByteArray: compressed_data_size=28902 data_size=163067, decompressed_data_size=163067 diff=0
[01:13:00] - Digital signature verified
[01:13:00] 
[01:13:00] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:13:00] 
[01:13:00] Assembly optimizations on if available.
[01:13:00] Entering M.D.
[01:13:06] Tpr hash work/wudata_09.tpr:  4106429482 201827291 1542027732 544844961 2111337814
[01:13:06] Working on 582 p2750_N68H_AM03
[01:13:06] Client config found, loading data.
[01:13:07] Starting GUI Server
[01:13:12] mdrun_gpu returned 
[01:13:12] NANs detected on GPU
[01:13:12] 
[01:13:12] Folding@home Core Shutdown: UNSTABLE_MACHINE
[01:13:14] CoreStatus = 7A (122)
[01:13:14] Sending work to server
[01:13:14] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:13:14] - Error: Could not get length of results file work/wuresults_09.dat
[01:13:14] - Error: Could not read unit 09 file. Removing from queue.
[01:13:14] - Preparing to get new work unit...
[01:13:14] Cleaning up work directory
[01:13:14] + Attempting to get work packet
[01:13:14] Passkey found
[01:13:14] Gpu type=3 species=0.
[01:13:14] - Connecting to assignment server
[01:13:15] - Successful: assigned to (171.67.108.20).
[01:13:15] + News From Folding@Home: Welcome to Folding@Home
[01:13:15] Loaded queue successfully.
[01:13:15] Gpu type=3 species=0.
[01:13:16] + Closed connections
[01:13:21] 
[01:13:21] + Processing work unit
[01:13:21] Core required: FahCore_15.exe
[01:13:21] Core found.
[01:13:21] Working on queue slot 00 [September 9 01:13:21 UTC]
[01:13:21] + Working ...
[01:13:21] 
[01:13:21] *------------------------------*
[01:13:21] Folding@Home GPU Core -- Beta
[01:13:21] Version 2.09 (Thu May 20 11:58:42 PDT 2010)
[01:13:21] 
[01:13:21] Build host: SimbiosNvdWin7
[01:13:21] Board Type: Nvidia
[01:13:21] Core      : 
[01:13:21] Preparing to commence simulation
[01:13:21] - Looking at optimizations...
[01:13:21] DeleteFrameFiles: successfully deleted file=work/wudata_00.ckp
[01:13:21] - Created dyn
[01:13:21] - Files status OK
[01:13:21] sizeof(CORE_PACKET_HDR) = 512 file=<>
[01:13:21] - Expanded 28902 -> 163067 (decompressed 564.2 percent)
[01:13:21] Called DecompressByteArray: compressed_data_size=28902 data_size=163067, decompressed_data_size=163067 diff=0
[01:13:21] - Digital signature verified
[01:13:21] 
[01:13:21] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:13:21] 
[01:13:21] Assembly optimizations on if available.
[01:13:21] Entering M.D.

GPU 0 in the main slot was running just fine.
Stopped the PC and swapped the GPUs. The former GPU 1 (now GPU 0) continues its current WU just fine, without errors.
The former GPU 0 (now GPU 1) began getting NANs immediately and timed out.

Tks
Rick
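
A log like the one above is easier to judge once the failures are tallied per work unit. The following is only a minimal sketch, assuming the client log keeps the `Project: N (Run R, Clone C, Gen G)` and `NANs detected on GPU` line formats shown in this post; the function name `count_nan_failures` and the `SAMPLE` text are made up for illustration and are not part of the Folding@Home client.

```python
import re
from collections import Counter

# Line formats copied from the log in this thread; other client versions may differ.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
NAN_MARKER = "NANs detected on GPU"


def count_nan_failures(log_text):
    """Count NAN shutdowns per (project, run, clone, gen) seen in a client log."""
    failures = Counter()
    current = None  # most recent work unit announced in the log
    for line in log_text.splitlines():
        match = PRCG_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
        elif NAN_MARKER in line and current is not None:
            failures[current] += 1
    return failures


# Hypothetical shortened excerpt in the same format as the log above.
SAMPLE = """\
[01:11:43] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:11:54] NANs detected on GPU
[01:11:57] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:12:18] Project: 10632 (Run 66, Clone 44, Gen 3)
[01:12:29] NANs detected on GPU
"""

if __name__ == "__main__":
    for (p, r, c, g), n in count_nan_failures(SAMPLE).items():
        print(f"P{p} (R{r}, C{c}, G{g}): {n} NAN failure(s)")
```

If every failure lands on the same project, run, clone and gen, that points at one repeatedly reassigned work unit rather than a mix of different ones.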

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 1:27 am
by COOLDUDEGAMER
Hello rickoic,

Did you try deleting the core and/or the WU and then running it again? If so, I would probably set up the FAH directory again from scratch.

These are just basic questions from me as I have limited knowledge with multi-GPU systems, but someone else should be here to help soon.

I hope I helped a little.

Signed,

COOLDUDEGAMER

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 1:48 am
by rickoic
I did try that. After deleting all files except the client and the DLLs, the same WU was downloaded again, with the same results.

Tks
Rick

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 2:00 am
by gwildperson
I'm not sure I understand all the numbers in your sig. Are you saying that both GPUs are identical hardware (and therefore interchangeable as far as drivers are concerned)? If so, it's clearly some kind of hardware failure (e.g., one GPU won't take the same overclock as the other).

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 2:26 am
by rickoic
Both GPUs are eVga 460s, so yes, interchangeable.
The motherboard has three 16X slots, so I also tried slots 1 and 2, as well as 1 and 3, with the same results.
I thought of the OC factor, so I lowered it to 794, which didn't help.
Like I said, the GPU that was GPU #1 (which was producing NANs) worked just fine when moved to GPU #0.
The GPU that was working fine as GPU #0 began giving NANs when moved to GPU #1, whether in slot 2 or 3.

Tks
Rick

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 2:30 am
by gwildperson
When you interchanged #0 and #1, did that mean pci-e slots or did it mean data directories or did it mean both?

So is it a bad WU or has it been true for a whole series of WUs?

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 3:05 am
by PantherX
This is what I can understand: (I hope that I am correct; if not, please correct me)
GTX 460 A in PCI-E Slot 1 = Folding
GTX 460 B in PCI-E Slot 1 = Folding
GTX 460 A in PCI-E Slot 2 = NANs
GTX 460 B in PCI-E Slot 2 = NANs
GTX 460 A in PCI-E Slot 3 = NANs
GTX 460 B in PCI-E Slot 3 = NANs

So here are some questions/suggestions:
Is your PSU powerful enough to support 2 X GTX 460?
Rather than reducing the OC, return them to stock or lower to see if the problem goes away
Have you tweaked any unknown BIOS settings?
What driver version are you using?
Are you using SLI? If no then enable it. If yes then disable it.
Use GPU A only in the system and test each PCI-E Slot, repeat with GPU B. If there are no errors, then it might be a motherboard issue
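
The single-card test suggested above amounts to filling in a small card-by-slot table and seeing whether failures follow a card or a slot. Here is a rough sketch of that bookkeeping, not a diagnostic tool: the names `single_card_test_plan` and `summarize` are hypothetical, and `PANTHERX_TABLE` simply transcribes the tentative table at the top of this post, which has not been confirmed by measurement.

```python
from itertools import product

CARDS = ["A", "B"]   # the two GTX 460s
SLOTS = [1, 2, 3]    # the three PCI-E slots


def single_card_test_plan(cards=CARDS, slots=SLOTS):
    """List every (card, slot) pair to try with only one card installed."""
    return list(product(cards, slots))


def summarize(results):
    """Return (bad_cards, bad_slots): those that failed in every test they were part of.

    results maps (card, slot) -> True if the WU folded, False if it hit NANs.
    """
    cards = sorted({card for card, _ in results})
    slots = sorted({slot for _, slot in results})
    bad_cards = [c for c in cards
                 if all(not ok for (card, _), ok in results.items() if card == c)]
    bad_slots = [s for s in slots
                 if all(not ok for (_, slot), ok in results.items() if slot == s)]
    return bad_cards, bad_slots


# The tentative table from this post: slot 1 folds, slots 2 and 3 give NANs.
PANTHERX_TABLE = {
    ("A", 1): True, ("B", 1): True,
    ("A", 2): False, ("B", 2): False,
    ("A", 3): False, ("B", 3): False,
}

if __name__ == "__main__":
    print(single_card_test_plan())
    print(summarize(PANTHERX_TABLE))
```

With that table, no card fails everywhere while slots 2 and 3 always fail, which would point away from the cards themselves. It cannot separate a physical slot problem from one tied to the second GPU index or its client directory.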

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 4:40 am
by bruce
I'm not sure if the information is useful, but Project: 10632 (Run 66, Clone 44, Gen 3) has been successfully completed by someone, so it's NOT a bad WU.

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 1:51 pm
by rickoic
Well, it may not be a WU problem for everyone, but P10632 (Run 66, Clone 44, Gen 3) was a problem for me.
I erased the WU once again, then added the -advmethods flag to my shortcut and started the GPU client again.
This time it downloaded P10936 (Run 1, Clone 0, Gen 6) and is happily folding along. I know it's early yet, but it has gotten beyond just failing and is actually folding, having completed 4% so far with no errors.

Fold on
Rick

Re: P10632 (R66, C44, G3)

Posted: Thu Sep 09, 2010 2:48 pm
by rickoic
PantherX wrote: This is what I can understand: (I hope that I am correct; if not, please correct me)
GTX 460 A in PCI-E Slot 1 = Folding
GTX 460 B in PCI-E Slot 1 = Folding
GTX 460 A in PCI-E Slot 2 = NANs
GTX 460 B in PCI-E Slot 2 = NANs
GTX 460 A in PCI-E Slot 3 = NANs
GTX 460 B in PCI-E Slot 3 = NANs

So here are some questions/suggestions:
Is your PSU powerful enough to support 2 X GTX 460? [rickoic: Power supply is a Stablepower 1000W]
Rather than reducing the OC, return them to stock or lower to see if the problem goes away
Have you tweaked any unknown BIOS settings? [rickoic: The last time I changed the BIOS was to restore default settings, which was 74 WUs ago]
What driver version are you using? [rickoic: 8.17.12.5896]
Are you using SLI? If no then enable it. If yes then disable it.
Use GPU A only in the system and test each PCI-E Slot, repeat with GPU B. If there are no errors, then it might be a motherboard issue

As indicated in my other post, the GPU is now folding nicely on a P10936 WU and is 57% complete, with no problems whatsoever.

Fold on
Rick

Re: P10632 (R66, C44, G3)

Posted: Mon Sep 13, 2010 6:48 pm
by COOLDUDEGAMER
I am glad that you got the 2nd GPU folding. :)