GTX 780 Bad Work Units?

If you think it might be a driver problem, see viewforum.php?f=79

s/j
Posts: 16
Joined: Sun Sep 20, 2009 2:53 pm

Post by s/j »

I still have the issue from a prior, now-closed thread in the Drivers forum, described below. Does this still seem to be an AS issue? Should I just continue to wait for Stanford to refine the AS? Thoughts?

I have a new build that folded great for a week; now one of the GPUs (the Zotac GTX 780) fails folding every time. The GTX 970 folds great.
I have tried swapping PCIe x16 positions, as well as pausing folding, deleting the work folder, and rebooting.
Using MSI Afterburner, from day 1 the GPUs have been underclocked by 120 MHz, and GPU temps are consistently in the low 70s °C.
Build is AMD FX-8350, 16 GB RAM, Gigabyte 990FXA-UD3 motherboard, 1000 W Rosewill PSU. GPU 1 is an EVGA GTX 970, GPU 2 is a Zotac GTX 780 OC. Driver version is 9.18.13.4411 (344.16).
Here is part of the log from a GPU failure:
Log Started 2014-10-06T19:46:01Z ***********************
19:46:02:WU03:FS01:0x17:Project: 13001 (Run 236, Clone 3, Gen 11)
19:46:02:WU03:FS01:0x17:Unit: 0x0000001d538b3db7532892a3432b10e4
19:46:02:WU03:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:46:02:WU03:FS01:0x17:Machine: 1
19:46:02:WU03:FS01:0x17:Reading tar file state.xml
19:46:02:WU03:FS01:0x17:Reading tar file system.xml
19:46:03:WU03:FS01:0x17:Reading tar file integrator.xml
19:46:03:WU03:FS01:0x17:Reading tar file core.xml
19:46:03:WU03:FS01:0x17:Digital signatures verified
19:46:03:WU03:FS01:0x17:Folding@home GPU core17
19:46:03:WU03:FS01:0x17:Version 0.0.52
19:49:10:WU02:FS00:0xa4:Completed 390000 out of 500000 steps (78%)
19:49:55:WU03:FS01:0x17:ERROR:exception: Force RMSE error of 453.966 with threshold of 5
19:49:55:WU03:FS01:0x17:Saving result file logfile_01.txt
19:49:55:WU03:FS01:0x17:Saving result file log.txt
19:49:55:WU03:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
19:49:56:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:49:56:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13001 run:236 clone:3 gen:11 core:0x17 unit:0x0000001d538b3db7532892a3432b10e4
19:49:56:WU03:FS01:Uploading 2.26KiB to 140.163.4.231
19:49:56:WU03:FS01:Connecting to 140.163.4.231:8080
19:49:56:WU03:FS01:Upload complete
19:49:56:WU03:FS01:Server responded WORK_ACK (400)
Any thoughts?
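For anyone who wants to pull this kind of failure out of a longer log without scrolling, here is a minimal Python sketch. It assumes the core17 error line always has the wording shown in the log above ("Force RMSE error of X with threshold of Y"). The name find_rmse_errors and the regular expression are illustrative only and are not part of the FAH client.

```python
import re

# Assumed line shape, copied from the log above:
# 19:49:55:WU03:FS01:0x17:ERROR:exception: Force RMSE error of 453.966 with threshold of 5
RMSE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):.*"
    r"Force RMSE error of (?P<rmse>\d+(?:\.\d+)?) "
    r"with threshold of (?P<threshold>\d+(?:\.\d+)?)"
)


def find_rmse_errors(log_lines):
    """Return one dict per 'Force RMSE error' line found in log_lines."""
    hits = []
    for line in log_lines:
        m = RMSE_RE.search(line)
        if m:
            hits.append({
                "time": m.group("time"),
                "wu": int(m.group("wu")),        # work unit queue id (WU03 -> 3)
                "slot": int(m.group("slot")),    # folding slot (FS01 -> 1)
                "rmse": float(m.group("rmse")),
                "threshold": float(m.group("threshold")),
            })
    return hits


sample = [
    "19:49:10:WU02:FS00:0xa4:Completed 390000 out of 500000 steps (78%)",
    "19:49:55:WU03:FS01:0x17:ERROR:exception: Force RMSE error of 453.966 with threshold of 5",
]
for hit in find_rmse_errors(sample):
    print(hit)
```

The slot number (FS01 here) is the useful part when two GPUs are folding, since it shows which folding slot reported the error.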
rwh202
Posts: 410
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Post by rwh202 »

Are you sure which GPU is folding and which is failing?
That failure is typical of Maxwell on core_17, which would be the 970. The client often mixes up the slots on a mixed-GPU system, so it might be worth checking the temps/usage of each GPU to be sure which is running. I had that problem with a 660 and a 750 Ti in the same system.
If that's the issue, then it will be a case of manually setting the slot/CUDA/OpenCL IDs.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

That WU has been reissued to 5 different people and in each case, it failed. There's always a chance of bad WUs and the only way to identify them is to process them. They're reissued a few times and either completed or taken out of circulation. The assignments of this WU were all on 2014-10-05 and it was withdrawn from circulation.

It truly is a BAD_WORK_UNIT and this has NOTHING to do with the AS. Asking about a single failure after this much time has elapsed doesn't help anybody identify a problem which can be solved.
s/j
Posts: 16
Joined: Sun Sep 20, 2009 2:53 pm

Post by s/j »

RWH, I wondered the same thing so I un-installed F@H and re-installed with one GPU to verify, as I also thought perhaps it was actually the 970 failing. I confirmed it is the 780.

Bruce, as stated, I have tried deleting the work folder and rebooting, and I have tried this multiple times per day. I get that it is not a single bad WU; I'm just saying that is the log message. I could post multiple logs from today with the exact same message, but I don't see how that contributes anything either. I thought my post made it apparent this is not a single failure.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

Please list the project/run/clone/gen of several more WUs that are failing.
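If the log is long, collecting those identifiers by hand gets tedious. The following is a rough Python sketch that gathers the project/run/clone/gen of every unit the client reported as FAULTY. It assumes the "Sending unit results" line format shown in the first post; the function name faulty_units is made up for illustration.

```python
import re

# Assumed line shape, taken from the log in the first post:
# ...Sending unit results: id:03 state:SEND error:FAULTY project:13001 run:236 clone:3 gen:11 core:0x17 ...
PRCG_RE = re.compile(
    r"error:FAULTY project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)


def faulty_units(log_lines):
    """Return unique (project, run, clone, gen) tuples, in order of first appearance."""
    seen = []
    for line in log_lines:
        m = PRCG_RE.search(line)
        if m:
            prcg = tuple(int(g) for g in m.groups())
            if prcg not in seen:
                seen.append(prcg)
    return seen


sample = [
    "19:49:56:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "19:49:56:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY "
    "project:13001 run:236 clone:3 gen:11 core:0x17 "
    "unit:0x0000001d538b3db7532892a3432b10e4",
]
for project, run, clone, gen in faulty_units(sample):
    print(f"project:{project} run:{run} clone:{clone} gen:{gen}")
```

To run it against a real log, pass the open log file in place of the sample list, for example faulty_units(open("log.txt")), where the file name is whatever your client's log is actually called.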
Breach
Posts: 204
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Post by Breach »

13001 had a similar failure on Maxwells, except that there all WUs would fail instantly.

viewtopic.php?f=18&t=26807&start=60#p269635

This failure looks more similar to the failures of 10470-10473:

viewtopic.php?f=66&t=26528&start=60#p269314

There the fault would happen during folding (with only some WUs, in my experience).
s/j
Posts: 16
Joined: Sun Sep 20, 2009 2:53 pm

Post by s/j »

Will do Bruce. (It's going to suck if this GTX 780 has died after one week).

Here are a few from today:
18:56:41:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:56:41:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:69 clone:3 gen:42 core:0x17 unit:0x0000006c538b3db75328634d9a354a91

17:24:53:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:24:53:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13001 run:15 clone:8 gen:31 core:0x17 unit:0x00000040538b3db753285433c0ce9452

17:13:35:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:13:35:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13000 run:960 clone:1 gen:19 core:0x17 unit:0x00000025538b3db75310aabe24976893

As RWH suggested, if I didn't know better, I would swear the system has the 970 and 780 confused.

Thanks.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

project:13001 run:69 clone:3 gen:42. Bad WU. Failed repeatedly.
project:13001 run:15 clone:8 gen:31. Indeterminate until later. Failed only for you.
project:13000 run:960 clone:1 gen:19. Bad WU. Failed repeatedly.
Kjetil
Posts: 175
Joined: Sat Apr 14, 2012 5:56 pm
Location: Stavanger Norway

Post by Kjetil »

It is the same error I have on Maxwell. They are running now, but very slowly: 980-PPD 17H, 750Ti 1D 12H on core 18 P1047x.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

This may or may not help, but it can't hurt.

In a part of the log just before the part that you posted, you'll find a message something like
Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" xyz/FahCore_17.exe ...(arguments)

Carefully copy the entire directory path, including whatever actually appears where I've abbreviated a long path as xyz.
Stop your client.
Delete the file xyz/FahCore_17.exe
Restart the client.
(A new copy of xyz/FahCore_17.exe will download and work should resume.)
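
To avoid copying the path by hand, the core location can be pulled out of that log line. This is only a sketch: it assumes the "Running FahCore:" line names the core executable as FahCore_NN.exe, either quoted or unquoted, and the function name core_exe_from_log_line is hypothetical. It only prints the path; stopping the client and deleting the file are still done manually as in the steps above.

```python
import re
from pathlib import PureWindowsPath

# Matches the core executable either inside double quotes (paths with spaces)
# or as a bare token. The wrapper (FAHCoreWrapper.exe) does not match because
# the pattern requires "FahCore_" followed by the core number.
CORE_EXE_RE = re.compile(r'"([^"]*FahCore_\w+\.exe)"|(\S*FahCore_\w+\.exe)')


def core_exe_from_log_line(line):
    """Return the FahCore executable path named in a log line, or None."""
    m = CORE_EXE_RE.search(line)
    if m is None:
        return None
    return m.group(1) or m.group(2)


# The abbreviated example from the post above; "xyz" stands for the real path.
line = (r'Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" '
        r'xyz/FahCore_17.exe ...(arguments)')
exe = core_exe_from_log_line(line)
if exe is not None:
    path = PureWindowsPath(exe)
    print("core directory:", path.parent)
    print("file to delete:", path.name)
```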

Let me know if this helps.
s/j
Posts: 16
Joined: Sun Sep 20, 2009 2:53 pm

Post by s/j »

Bruce, no joy. I verified new core download:
20:06:00:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AM ... ore_17.fah
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Post by 7im »

What version was downloaded? v52 or v55?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
s/j
Posts: 16
Joined: Sun Sep 20, 2009 2:53 pm

Post by s/j »

7im: Where do I get that? I don't see it in the core file properties.
Here is what the log says:
20:06:04:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Eds Sled/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 3276 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
20:06:04:WU00:FS01:Started FahCore on PID 3788
bollix47
Posts: 2965
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Post by bollix47 »

Navigate to the directory with FahCore_17.exe in it and open a command prompt.

Type:

FahCore_17.exe --info

The version should be just under the ******Build****** line.

EDIT: The following is much easier.
[Image]
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Post by 7im »

Or in the log file when the FAHCore and work unit starts up...
19:46:02:WU03:FS01:0x17:Unit: 0x0000001d538b3db7532892a3432b10e4
19:46:02:WU03:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:46:02:WU03:FS01:0x17:Machine: 1
19:46:02:WU03:FS01:0x17:Reading tar file state.xml
19:46:02:WU03:FS01:0x17:Reading tar file system.xml
19:46:03:WU03:FS01:0x17:Reading tar file integrator.xml
19:46:03:WU03:FS01:0x17:Reading tar file core.xml
19:46:03:WU03:FS01:0x17:Digital signatures verified
19:46:03:WU03:FS01:0x17:Folding@home GPU core17
19:46:03:WU03:FS01:0x17:Version 0.0.52
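
The same lookup can be scripted. Below is a small Python sketch that reports the core version per folding slot, assuming the startup lines look like the ones quoted above (slot, then the core id such as 0x17, then "Version x.y.z"). The name core_versions is illustrative, not something the client provides.

```python
import re

# Assumed line shape, from the log excerpt above:
# 19:46:03:WU03:FS01:0x17:Version 0.0.52
VERSION_RE = re.compile(
    r"WU\d+:FS(?P<slot>\d+):(?P<core>0x[0-9a-fA-F]+):"
    r"Version (?P<version>\d+(?:\.\d+)*)"
)


def core_versions(log_lines):
    """Return unique (slot, core, version) tuples in order of first appearance."""
    found = []
    for line in log_lines:
        m = VERSION_RE.search(line)
        if m:
            entry = (int(m.group("slot")), m.group("core"), m.group("version"))
            if entry not in found:
                found.append(entry)
    return found


sample = [
    "19:46:03:WU03:FS01:0x17:Folding@home GPU core17",
    "19:46:03:WU03:FS01:0x17:Version 0.0.52",
]
for slot, core, version in core_versions(sample):
    print(f"slot {slot}: core {core} version {version}")
```

The "-version 704" argument on the "Running FahCore" command line is not picked up by this pattern, which only matches the "Version" line printed by the core itself.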