GTX 780 Bad Work Units?
I still have the issue from a prior, now closed thread in the Drivers forum, described below. Does this still look like an Assignment Server (AS) issue? Should I just keep waiting for Stanford to refine the AS? Thoughts?
I have a new build that folded great for a week; now one of the GPUs (a Zotac GTX 780) fails folding every time. The GTX 970 folds great.
I have tried swapping the cards between the PCIe x16 slots, as well as pausing folding, deleting the work folder and rebooting.
Using MSI Afterburner, the GPUs have been underclocked by 120 MHz since day 1, and GPU temperatures are consistently in the low 70s °C.
The build is an AMD FX-8350, 16 GB RAM, a Gigabyte 990FXA-UD3 motherboard and a 1000 W Rosewill PSU. GPU 1 is an EVGA GTX 970, GPU 2 is a Zotac GTX 780 OC. Driver version is 9.18.13.4411 (344.16).
Here is the log from one GPU failure:
Log Started 2014-10-06T19:46:01Z ***********************
19:46:02:WU03:FS01:0x17:Project: 13001 (Run 236, Clone 3, Gen 11)
19:46:02:WU03:FS01:0x17:Unit: 0x0000001d538b3db7532892a3432b10e4
19:46:02:WU03:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:46:02:WU03:FS01:0x17:Machine: 1
19:46:02:WU03:FS01:0x17:Reading tar file state.xml
19:46:02:WU03:FS01:0x17:Reading tar file system.xml
19:46:03:WU03:FS01:0x17:Reading tar file integrator.xml
19:46:03:WU03:FS01:0x17:Reading tar file core.xml
19:46:03:WU03:FS01:0x17:Digital signatures verified
19:46:03:WU03:FS01:0x17:Folding@home GPU core17
19:46:03:WU03:FS01:0x17:Version 0.0.52
19:49:10:WU02:FS00:0xa4:Completed 390000 out of 500000 steps (78%)
19:49:55:WU03:FS01:0x17:ERROR:exception: Force RMSE error of 453.966 with threshold of 5
19:49:55:WU03:FS01:0x17:Saving result file logfile_01.txt
19:49:55:WU03:FS01:0x17:Saving result file log.txt
19:49:55:WU03:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
19:49:56:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:49:56:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13001 run:236 clone:3 gen:11 core:0x17 unit:0x0000001d538b3db7532892a3432b10e4
19:49:56:WU03:FS01:Uploading 2.26KiB to 140.163.4.231
19:49:56:WU03:FS01:Connecting to 140.163.4.231:8080
19:49:56:WU03:FS01:Upload complete
19:49:56:WU03:FS01:Server responded WORK_ACK (400)
Any thoughts?
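The key line is the exception: the core reports a force RMSE of 453.966 against a threshold of 5. The sketch below illustrates that kind of check, assuming the core compares forces computed on the GPU against a reference set and aborts when their root-mean-square difference exceeds the threshold. This is an illustration under that assumption, not core17's actual code; `force_rmse` and `check_forces` are hypothetical names, and the log does not state units.

```python
import math

def force_rmse(forces, reference):
    """Root-mean-square difference between two equal-length lists of 3-D force vectors."""
    if not forces or len(forces) != len(reference):
        raise ValueError("force lists must be non-empty and the same length")
    total = 0.0
    for f, r in zip(forces, reference):
        # Squared Euclidean distance between the two force vectors on one atom.
        total += sum((a - b) ** 2 for a, b in zip(f, r))
    return math.sqrt(total / len(forces))

def check_forces(forces, reference, threshold=5.0):
    """Raise if the RMSE exceeds the threshold, using wording modelled on the log line."""
    rmse = force_rmse(forces, reference)
    if rmse > threshold:
        raise RuntimeError(f"Force RMSE error of {rmse:.3f} with threshold of {threshold:g}")
    return rmse
```

A value of 453.966 against a threshold of 5 is roughly 90 times over the limit.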
-
- Posts: 410
- Joined: Mon Nov 15, 2010 8:51 pm
- Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
- Location: South Coast, UK
Re: GTX 780 Bad Work Units?
Are you sure which GPU is folding and which is failing?
That failure is typical of Maxwell cards on core_17, which would be the 970. The client often mixes up the slots on a mixed-GPU system, so it might be worth checking the temperature and usage of each GPU to be sure which one is running. I had that problem with a 660 and a 750 Ti in the same system.
If that's the issue, the fix is to set the slot's GPU, CUDA and OpenCL indices manually.
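One way to see which card a slot is actually driving is to pull the `-gpu` argument that the client passes to the core from the `Running FahCore` lines (the poster's log later in this thread shows FS01 started with `-gpu 0`) and compare it with the card that shows load in a monitoring tool. A minimal sketch, assuming the log line format shown in this thread; `slot_gpu_indices` is a hypothetical helper.

```python
import re

# Captures the slot (FSxx) and the number passed to "-gpu" on a "Running FahCore" line.
# Requiring whitespace after "-gpu" keeps "-gpu-vendor" from matching.
RUNNING_RE = re.compile(r":(FS\d+):Running FahCore:.*\s-gpu\s+(\d+)(?:\s|$)")

def slot_gpu_indices(log_lines):
    """Return {slot: gpu index} using the most recent Running FahCore line per slot."""
    mapping = {}
    for line in log_lines:
        m = RUNNING_RE.search(line)
        if m:
            mapping[m.group(1)] = int(m.group(2))
    return mapping
```

Because the client, CUDA and OpenCL may number the cards differently, a slot started with `-gpu 0` is not guaranteed to be the card you expect; confirming it against load and temperature is the point of the check.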
Re: GTX 780 Bad Work Units?
That WU was issued to 5 different people and it failed in every case. There's always a chance of bad WUs, and the only way to identify them is to process them: they're reissued a few times and then either completed or taken out of circulation. All assignments of this WU were on 2014-10-05, and it has been withdrawn from circulation.
It truly is a BAD_WORK_UNIT and this has NOTHING to do with the AS. Asking about a single failure after this much time has elapsed doesn't help anybody identify a problem that can be solved.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: GTX 780 Bad Work Units?
RWH, I wondered the same thing, so I uninstalled F@H and reinstalled it with one GPU to verify, since I also thought it might actually be the 970 failing. I confirmed it is the 780.
Bruce, as stated, I have tried deleting the work folder and rebooting, and I try this multiple times per day. I understand it is not a single bad WU; I was just quoting the log message. I could post multiple logs from today with the exact same message, but I don't see how that would contribute anything either. I thought my post made it clear that this is not a single failure.
Re: GTX 780 Bad Work Units?
Please list the project/run/clone/gen of several more WUs that are failing.
Re: GTX 780 Bad Work Units?
Project 13001 had a similar failure on Maxwell cards, except that there every WU failed instantly.
viewtopic.php?f=18&t=26807&start=60#p269635
This failure looks more like the failures of projects 10470-10473:
viewtopic.php?f=66&t=26528&start=60#p269314
where, in my experience, the fault happened during folding and only with some WUs.
Windows 11 x64 / 5800X@5Ghz / 32GB DDR4 3800 CL14 / 4090 FE / Creative Titanium HD / Sennheiser 650 / PSU Corsair AX1200i
Re: GTX 780 Bad Work Units?
Will do, Bruce. (It's going to suck if this GTX 780 has died after one week.)
Here are a few from today:
18:56:41:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:56:41:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:69 clone:3 gen:42 core:0x17 unit:0x0000006c538b3db75328634d9a354a91
17:24:53:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:24:53:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:13001 run:15 clone:8 gen:31 core:0x17 unit:0x00000040538b3db753285433c0ce9452
17:13:35:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:13:35:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13000 run:960 clone:1 gen:19 core:0x17 unit:0x00000025538b3db75310aabe24976893
As RWH suggested, if I didn't know better I would swear the system has the 970 and 780 confused.
Thanks.
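Pulling the project/run/clone/gen of every faulty unit out of a longer log can be scripted rather than copied by hand. A minimal sketch, assuming the `Sending unit results` line format shown above; `faulty_prcg` is a hypothetical helper.

```python
import re

# Captures project, run, clone and gen from a result line flagged error:FAULTY.
FAULTY_RE = re.compile(
    r"Sending unit results:.*error:FAULTY\s+project:(\d+)\s+run:(\d+)\s+clone:(\d+)\s+gen:(\d+)"
)

def faulty_prcg(log_lines):
    """List (project, run, clone, gen) tuples for every unit returned as FAULTY."""
    results = []
    for line in log_lines:
        m = FAULTY_RE.search(line)
        if m:
            results.append(tuple(int(g) for g in m.groups()))
    return results
```

Each tuple can then be printed as `project:P run:R clone:C gen:G`, the same form used in the log and in replies on this forum.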
Re: GTX 780 Bad Work Units?
project:13001 run:69 clone:3 gen:42. Bad WU. Failed repeatedly.
project:13001 run:15 clone:8 gen:31. Indeterminate until later. Failed only for you.
project:13000 run:960 clone:1 gen:19. Bad WU. Failed repeatedly.
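The triage above follows a simple rule: a unit that has failed for several different donors is a bad WU, while one that has so far failed only for a single donor can't be judged yet. A minimal sketch of that rule, assuming hypothetical failure records of the form ((project, run, clone, gen), donor); the server-side data used for the real verdicts is not shown in this thread, and the "more than one donor" cutoff is an assumption.

```python
from collections import defaultdict

def classify_units(failures):
    """Label each unit from (prcg, donor_id) failure records.

    A unit that failed for more than one distinct donor is labelled "bad WU";
    one that failed for a single donor only is "indeterminate".
    """
    donors = defaultdict(set)
    for prcg, donor in failures:
        donors[prcg].add(donor)
    return {
        prcg: "bad WU" if len(who) > 1 else "indeterminate"
        for prcg, who in donors.items()
    }
```

Counting distinct donors rather than raw failures keeps one faulty machine, retrying the same unit, from condemning it on its own.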
Re: GTX 780 Bad Work Units?
It is the same error I had on Maxwell. Now they run, but very slowly: 980-PPD 17H, 750 Ti 1D 12H on core_18, P1047x.
Re: GTX 780 Bad Work Units?
This may or may not help, but it can't hurt.
In the part of the log just before the part that you posted, you'll find a message something like
Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" xyz/FahCore_17.exe ...(arguments)
Carefully copy the entire directory path, including whatever actually appears where I've abbreviated a long path as xyz.
Stop your client.
Delete the file xyz/FahCore_17.exe
Restart the client.
(A new copy of xyz/FahCore_17.exe will download and work should resume.)
Let me know if this helps.
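The steps above can be sketched in Python, assuming the quoted `Running FahCore` format that appears in the poster's own log later in this thread (the abbreviated example above has no quotes and would not match). `core_path_from_log` and `remove_core` are hypothetical helpers, and the client must be stopped before the file is deleted.

```python
import re
from pathlib import Path

# The core executable is the second quoted string on a "Running FahCore" line.
CORE_RE = re.compile(r'Running FahCore: "[^"]*" "([^"]*FahCore_\w+\.exe)"')

def core_path_from_log(line):
    """Return the FahCore executable path from a 'Running FahCore' log line, or None."""
    m = CORE_RE.search(line)
    return Path(m.group(1)) if m else None

def remove_core(path):
    """Delete the core executable so the client downloads a fresh copy on restart.

    Only call this while FAHClient is stopped. Returns True if a file was deleted.
    """
    path = Path(path)
    if path.exists():
        path.unlink()
        return True
    return False
```

Restarting the client afterwards should trigger a `Downloading core from ...` line for the slot.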
Re: GTX 780 Bad Work Units?
Bruce, no joy. I verified new core download:
20:06:00:WU00:FS01:Downloading core from http://web.stanford.edu/~pande/Win32/AM ... ore_17.fah
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: GTX 780 Bad Work Units?
What version was downloaded, v52 or v55?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: GTX 780 Bad Work Units?
7im: Where do I find that? I don't see it in the core file's properties.
Here is what the log says:
20:06:04:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:/Users/Eds Sled/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe" -dir 00 -suffix 01 -version 704 -lifeline 3276 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
20:06:04:WU00:FS01:Started FahCore on PID 3788
Re: GTX 780 Bad Work Units?
Navigate to the directory with FahCore_17.exe in it and open a command prompt.
Type:
FahCore_17.exe --info
The version should be just under the ******Build****** line.
EDIT: The following is much easier.
Re: GTX 780 Bad Work Units?
Or check the log file when the FahCore and work unit start up:
19:46:02:WU03:FS01:0x17:Unit: 0x0000001d538b3db7532892a3432b10e4
19:46:02:WU03:FS01:0x17:CPU: 0x00000000000000000000000000000000
19:46:02:WU03:FS01:0x17:Machine: 1
19:46:02:WU03:FS01:0x17:Reading tar file state.xml
19:46:02:WU03:FS01:0x17:Reading tar file system.xml
19:46:03:WU03:FS01:0x17:Reading tar file integrator.xml
19:46:03:WU03:FS01:0x17:Reading tar file core.xml
19:46:03:WU03:FS01:0x17:Digital signatures verified
19:46:03:WU03:FS01:0x17:Folding@home GPU core17
19:46:03:WU03:FS01:0x17:Version 0.0.52
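The `Version` line can also be extracted with a short script, for example to check whether a slot is still running 0.0.52 (v52) rather than a newer build such as the v55 mentioned above. A minimal sketch, assuming the `time:WUxx:FSxx:0xNN:Version x.y.z` layout shown above; `core_versions` is a hypothetical helper.

```python
import re

# Captures work unit, slot, core type (hex) and version from a core start-up "Version" line.
VERSION_RE = re.compile(r":(WU\d+):(FS\d+):0x([0-9a-f]+):Version (\d+(?:\.\d+)+)")

def core_versions(log_lines):
    """Map (work unit, slot) to (core type, version) as reported when the core starts."""
    found = {}
    for line in log_lines:
        m = VERSION_RE.search(line)
        if m:
            wu, slot, core, version = m.groups()
            found[(wu, slot)] = (f"0x{core}", version)
    return found
```

Keying by work unit and slot keeps CPU and GPU slots apart when both are logging at the same time.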