Weird Issue

Posted: Thu Oct 14, 2010 4:15 pm
by tank1023
Okay, I fired up a small dedicated folding rig the other day and have been running one SMP client and one GPU client on it.

My GPU is an EVGA 260 at stock clocks.

The GPU client has failed every 353 WU (projects 5765, 5766, 5767 and 5768), 131 in all, and will sometimes pause my client for 2 hours.

I have been able to complete 353 WU 5769 and up.

All other WU have completed with no failures at all.

Why am I only failing the above WU??


I used GPU Tracker to set up my clients, FYI.

Re: Weird Issue

Posted: Thu Oct 14, 2010 4:30 pm
by PantherX
Please post your FAHlog so we can help you. Also make sure that you use the -verbosity 9 flag to make the FAHlog more detailed which will help us in troubleshooting.

Re: Weird Issue

Posted: Thu Oct 14, 2010 4:41 pm
by tank1023
Here is the last one that failed. This is the first time I've used GPU Tracker; I normally use the console version, so I'm not too sure how it's all set up.
I think it's odd that only certain projects are failing, ones I consider easy. The "harder" ones have never failed.


--- Opening Log file [October 10 08:07:53 UTC]


# Windows GPU Console Edition #################################################
###############################################################################

Folding@Home Client Version 6.23

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\Montella\Documents\FAH_GPU_Tracker_V2[1]\FAH GPU Tracker V2\GPU0
Executable: C:\Users\Montella\Documents\FAH_GPU_Tracker_V2[1]\FAH GPU Tracker V2\FAH_GPU2.exe
Arguments: -oneunit -forceasm -advmethods -gpu 0

Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.

[08:07:53] - Ask before connecting: No
[08:07:53] - User name: tank1023 (Team 111065)
[08:07:53] - User ID:
[08:07:53] - Machine ID: 3
[08:07:53]
[08:07:53] Work directory not found. Creating...
[08:07:53] Could not open work queue, generating new queue...
[08:07:53] - Preparing to get new work unit...
[08:07:53] + Attempting to get work packet
[08:07:53] - Connecting to assignment server
[08:07:53] - Successful: assigned to (171.67.108.11).
[08:07:53] + News From Folding@Home: Welcome to Folding@Home
[08:07:54] Loaded queue successfully.
[08:07:54] + Closed connections
[08:07:54]
[08:07:54] + Processing work unit
[08:07:54] Core required: FahCore_11.exe
[08:07:54] Core found.
[08:07:54] Working on queue slot 01 [October 10 08:07:54 UTC]
[08:07:54] + Working ...
[08:07:54]
[08:07:54] *------------------------------*
[08:07:54] Folding@Home GPU Core
[08:07:54] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[08:07:54]
[08:07:54] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[08:07:54] Build host: amoeba
[08:07:54] Board Type: Nvidia
[08:07:54] Core :
[08:07:54] Preparing to commence simulation
[08:07:54] - Assembly optimizations manually forced on.
[08:07:54] - Not checking prior termination.
[08:07:54] - Expanded 46648 -> 252912 (decompressed 542.1 percent)
[08:07:54] Called DecompressByteArray: compressed_data_size=46648 data_size=252912, decompressed_data_size=252912 diff=0
[08:07:54] - Digital signature verified
[08:07:54]
[08:07:54] Project: 5768 (Run 11, Clone 99, Gen 850)
[08:07:54]
[08:07:54] Assembly optimizations on if available.
[08:07:54] Entering M.D.
[08:08:00] Tpr hash work/wudata_01.tpr: 4206722245 975669206 4132916699 2085918582 336443692
[08:08:00]
[08:08:00] Calling fah_main args: 14 usage=100
[08:08:00]
[08:08:01] Working on Protein
[08:08:01] Client config found, loading data.
[08:08:01] mdrun_gpu returned
[08:08:01] NANs detected on GPU
[08:08:01]
[08:08:01] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:08:04] CoreStatus = 7A (122)
[08:08:04] Sending work to server
[08:08:04] Project: 5768 (Run 11, Clone 99, Gen 850)
[08:08:04] - Read packet limit of 540015616... Set to 524286976.
[08:08:04] - Error: Could not get length of results file work/wuresults_01.dat
[08:08:04] - Error: Could not read unit 01 file. Removing from queue.
[08:08:04] + -oneunit flag given and have now finished a unit. Exiting.
Folding@Home Client Shutdown.
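
Since the question is whether only certain projects fail, one way to check is to tally core shutdown reasons per project across the saved logs. The following is a minimal Python sketch, not part of any F@h tool. The function name `tally_shutdowns` is hypothetical, and the two patterns are assumptions based only on the `Project:` and `Core Shutdown:` lines shown in the log above; other client or core versions may word these lines differently.

```python
import re
from collections import Counter

# Patterns assumed from the log lines quoted in this thread only.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def tally_shutdowns(log_text):
    """Count core shutdown reasons, keyed by (project number, reason)."""
    counts = Counter()
    current_project = None
    for line in log_text.splitlines():
        project = PROJECT_RE.search(line)
        if project:
            # Remember the most recent project so a later shutdown
            # line can be attributed to it.
            current_project = int(project.group(1))
            continue
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown and current_project is not None:
            counts[(current_project, shutdown.group(1))] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = """\
[08:07:54] Project: 5768 (Run 11, Clone 99, Gen 850)
[08:08:01] NANs detected on GPU
[08:08:01] Folding@home Core Shutdown: UNSTABLE_MACHINE
[08:08:04] CoreStatus = 7A (122)
[08:08:04] Project: 5768 (Run 11, Clone 99, Gen 850)
"""

for (project, reason), n in sorted(tally_shutdowns(sample).items()):
    print(f"Project {project}: {reason} x{n}")
```

Feeding it the full text of each FAHlog file in place of `sample` would show whether the `UNSTABLE_MACHINE` shutdowns really are confined to projects 5765 through 5768.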

Re: Weird Issue

Posted: Thu Oct 14, 2010 5:11 pm
by 7im
Have you ever run the MemtestG80 test program on your GPU?

Re: Weird Issue

Posted: Thu Oct 14, 2010 10:11 pm
by tank1023
7im wrote:Have you ever run the MemtestG80 test program on your GPU?
No, I've used ATI to test my cards before.

Re: Weird Issue

Posted: Thu Oct 14, 2010 10:56 pm
by codysluder
Although this request is specifically addressed to owners of ATI 4000/5000 GPUs, the same software works for your GPU. I recommend that you try it.
viewtopic.php?f=51&t=16119

I know you said you were running at stock clocks, but reducing the clock rate still might help, particularly if you have low voltages or too much heat in your case.

Better cooling might help, too. What's your fan speed?

Re: Weird Issue

Posted: Fri Oct 15, 2010 4:16 am
by tank1023
Temps are 56C; this 260 runs way cooler than my 275s.
I posted this problem over at EVGA's forums. I didn't realise this, but a lot of people are having issues with these WUs and the 200 series cards. Is this issue known to Stanford?

Re: Weird Issue

Posted: Fri Oct 15, 2010 4:34 am
by PantherX
tank1023 wrote:...I didn't realise this, but a lot of people are having issues with these WUs and the 200 series cards. Is this issue known to Stanford?
If you mean that Project 5768 is "problematic" for 200 Series GPUs, then I have folded ~104 of these WUs on my GTX 260 without any issues.

Re: Weird Issue

Posted: Fri Oct 15, 2010 6:40 pm
by tank1023
PantherX wrote:
tank1023 wrote:...I didn't realise this, but a lot of people are having issues with these WUs and the 200 series cards. Is this issue known to Stanford?
If you mean that Project 5768 is "problematic" for 200 Series GPUs, then I have folded ~104 of these WUs on my GTX 260 without any issues.

Then that confuses me further. I've only been folding for a year and a half, but I've never run into this issue before. This card folded for months without any issues; then I put it in a different rig, and now I'm having issues.
Do you think it could be a driver issue, since I just downloaded the latest NVIDIA beta drivers?

Re: Weird Issue

Posted: Sat Oct 16, 2010 1:52 am
by bruce
There seem to be almost constant changes to GPU drivers right now. Many of those changes are game-specific and don't change anything for FAH.

We do NOT recommend running beta drivers. There are good reasons to wait for a fully tested and approved set of drivers. What you do is ultimately your own choice but we have seen more problems with beta drivers and I don't ever remember a FAH problem that was solved by them.

Re: Weird Issue

Posted: Sat Oct 16, 2010 2:29 am
by PantherX
Well, I have no knowledge of GPU Tracker, so I am not aware if or how it interacts with the F@h Client. You have stated that this is your first time with GPU Tracker, so maybe you're doing something wrong. BTW, if the F@h Client were to pause, it would be for 24 hours, not 2 hours.

To begin with, what is different between your previous rig and this new one? Corrupted drivers? Incorrectly installed software? Faulty hardware? Also, since you're using a GTX 260 and the rig is dedicated, you ought to be using the 258.96 WHQL drivers, since they are A) WHQL and not beta, and B) recommended for folding. Please note that your GPU can run the GPU2 or the GPU3 BETA Client, and both are available in Console Versions (the GPU3 Console version can be downloaded via my GPU Guide; link in signature). Also, the -forceasm flag is useless for GPUs, as they don't use it.
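
As a rough illustration of the flag advice in this thread, the Python sketch below takes the argument string from the posted log, drops -forceasm, and appends -verbosity 9. The helper name `cleaned_args` is hypothetical, and only flags mentioned in this thread are used. Whether GPU Tracker offers a place to edit the client arguments is not something this thread establishes.

```python
def cleaned_args(args):
    """Return the client arguments without -forceasm and with -verbosity 9."""
    # Drop the flag that the thread says GPUs do not use.
    result = [a for a in args if a != "-forceasm"]
    # Ask for the more detailed FAHlog requested earlier in the thread.
    if "-verbosity" not in result:
        result += ["-verbosity", "9"]
    return result


# Argument string copied from the "Arguments:" line of the posted log.
logged = "-oneunit -forceasm -advmethods -gpu 0".split()
print(" ".join(cleaned_args(logged)))
```

The order of the remaining flags is preserved, so the result can be compared directly against the "Arguments:" line the client prints at startup.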

Re: Weird Issue

Posted: Sat Oct 16, 2010 2:35 am
by tank1023
Thanks Panther, I'll try your suggestions. I use the console version on all my other GPUs, so it may be time to reinstall the drivers and the console client. Thanks for the input, guys!