9401 fails on GM107 but not GK106 {Hopefully fixed}

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

bfromcolo
Posts: 56
Joined: Fri Mar 01, 2013 1:12 am

Re: 9401 fails on 750ti

Post by bfromcolo »

Same issue in Ubuntu 12.04 with the latest NVIDIA Linux driver 334.21. If there is any additional information I can provide let me know.

12:53:33:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
12:53:33:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:53:33:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9401 run:439 clone:0 gen:20 core:0x17 unit:0x0000001e6652edaf52eaf025adc7f1ce
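For anyone who wants to pick these failures out of a log with a script, here is a minimal Python sketch. It is not part of the FAH client; the function name `parse_core_return` and the regular expression are assumptions made for illustration. It pulls the status name and both codes out of the "FahCore returned" line above and shows that 114 and 0x72 are the same value written in two bases.

```python
import re

# Matches lines such as: "FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
# The pattern is an assumption based only on the log line quoted above.
CORE_RETURN = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")


def parse_core_return(line):
    """Return (status, decimal_code, hex_code_as_int), or None if no match."""
    match = CORE_RETURN.search(line)
    if match is None:
        return None
    status, decimal_text, hex_text = match.groups()
    return status, int(decimal_text), int(hex_text, 16)


if __name__ == "__main__":
    sample = "12:53:33:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
    status, decimal_code, hex_code = parse_core_return(sample)
    # The two codes should always agree; a mismatch would mean a garbled line.
    print(status, decimal_code, hex_code, decimal_code == hex_code)
```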
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti

Post by bruce »

Windows monitors various hardware devices, including the GPU, and makes sure it's working. (You've probably seen something like their messages from IE saying site xxx is not responding when bringing up a new page takes longer than they're expecting, sometimes followed immediately by a display of that page.) Windows issues a GPU_RESET which is supposed to get the computer working again and reports it in the event log. FAH generally reports BAD_WORK_UNIT (114=0x72).

Does anybody know if Linux resets the GPU if it fails to respond within some predefined time-limit? Does Linux have a log of such events? If it does, was there an event recorded at that same time? I have not researched how Linux deals with a hung GPU, but I'd be surprised if it didn't have something similar to what MS is doing. (I thought I'd ask you guys first.)
Freightanimal
Posts: 9
Joined: Tue Mar 11, 2014 11:12 pm
Location: Pennsylvania

Re: 9401 fails on 750ti

Post by Freightanimal »

That answers my question then. I was hoping that would be a workaround so I could at least get this puppy folding. Does anyone know if there is a way to exclude core 17 projects so I could at least work on core 15 work units?
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512MB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for approx. 70K PPD
Location: Salem. OR USA

Re: 9401 fails on 750ti

Post by P5-133XL »

You can exclude Core_17 by running the v6 client. To my knowledge, that is the only way, currently.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: 9401 fails on 750ti

Post by PantherX »

Does the WU fail immediately, or does it run for X time and then fail? Is your GPU operating within normal temperature range? I do hope that you are using the 64-bit version of Ubuntu, as it is required for GPU folding.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
uddarts
Posts: 21
Joined: Sun Feb 26, 2012 3:10 pm

Re: 9401 fails on 750ti

Post by uddarts »

PantherX wrote:Does the WU fail immediately, or does it run for X time and then fail? Is your GPU operating within normal temperature range? I do hope that you are using the 64-bit version of Ubuntu, as it is required for GPU folding.
the last one i attempted to run, i monitored with gpu-z and the core never showed more than an occasional 1% blip. it registered as failed after about 15 minutes.


ud
win7 64bit / amd 630 cpu 3 / 750ti - gm 107 / 337.88 drivers / v7.4.4
Meiz
Posts: 9
Joined: Thu Mar 13, 2014 6:44 am

Re: 9401 fails on 750ti

Post by Meiz »

Has there been any news of an update for core 17? I didn't want to leave my GPU idling, so I removed the Advanced flag and have been lucky in getting nothing but core 15 work units since. It just got its first batch of p8900s and, as expected, they all failed. I hope they come up with a fix soon; with core 17 released to full F@H, I can't fold at all until they do.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 9401 fails on 750ti

Post by 7im »

I don't see anything posted on the news blog about a core update, so no news.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
tofuwombat
Posts: 19
Joined: Mon Nov 22, 2010 4:06 pm

Re: 9401 fails on 750ti plus one here

Post by tofuwombat »

For me, Maxwell hardware (GTX 750 Ti FTW) does NOT fold on v7.
(With or without client-type advanced. "Runs" for several minutes before failing. No steps complete. Same behavior when heavily underclocked.)

Good temps.

Even when overclocking (100% power target in Precision),
it tests OK:
Explicit: 25.4167 ns/day
Implicit: 104.916 ns/day
"Verify Accuracy" selected

Card: evga P/N: 02G-P4-3757-KR
OS: Win7 pro 64 bit sp1
Nvidia Drivers: 335.23
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti plus one here

Post by bruce »

tofuwombat wrote:For me, Maxwell hardware (GTX 750 Ti FTW) does NOT fold on v7.
(With or without client-type advanced. "Runs" for several minutes before failing. No steps complete. Same behavior when heavily underclocked.)
The Pande Group is aware of the fact that Maxwell is not yet supported, but there's no ETA for a fix.

In my experience, FahCore_17 spends several minutes doing CPU calculations before it actually starts using the GPU. That would have nothing to do with the type of GPU you have or whether it's supported yet.

Notice that between 20:36:56 and 20:42:16, some 5.3 minutes passed initializing (before starting the first frame!)
Notice that between 20:42:16 and 20:47:11, some 4.9 minutes passed doing the first frame.
After that, frames 2 through 5 each take roughly 4.3 minutes (between 4:15 and 4:17).

The actual times will depend on the project and on your hardware, of course, but the 5.3 minutes of initialization seems to be based on CPU speed, and the things I'm calling frame times depend mostly on GPU speed. From that, I predict an unsupported GPU would have died at about 20:42:16.


20:36:56:WU00:FS01:0x17:Reading tar file core.xml
20:36:56:WU00:FS01:0x17:Digital signatures verified
20:36:56:WU00:FS01:0x17:Folding@home GPU core17
20:42:16:WU00:FS01:0x17:Completed 0 out of 1000000 steps (0%)
20:42:16:WU00:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
20:47:11:WU00:FS01:0x17:Completed 10000 out of 1000000 steps (1%)
20:51:27:WU00:FS01:0x17:Completed 20000 out of 1000000 steps (2%)
20:55:44:WU00:FS01:0x17:Completed 30000 out of 1000000 steps (3%)
20:59:59:WU00:FS01:0x17:Completed 40000 out of 1000000 steps (4%)
21:04:14:WU00:FS01:0x17:Completed 50000 out of 1000000 steps (5%)
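The initialization and frame times discussed above can be recomputed from the timestamps. The following is only a rough Python sketch: the helper name `intervals_in_seconds` is made up for this example, and it assumes every line starts with an `HH:MM:SS` stamp and that the log crosses midnight at most once between two consecutive lines.

```python
from datetime import datetime

# Selected lines from the log excerpt above: core start, then each progress report.
LOG_LINES = [
    "20:36:56:WU00:FS01:0x17:Folding@home GPU core17",
    "20:42:16:WU00:FS01:0x17:Completed 0 out of 1000000 steps (0%)",
    "20:47:11:WU00:FS01:0x17:Completed 10000 out of 1000000 steps (1%)",
    "20:51:27:WU00:FS01:0x17:Completed 20000 out of 1000000 steps (2%)",
    "20:55:44:WU00:FS01:0x17:Completed 30000 out of 1000000 steps (3%)",
    "20:59:59:WU00:FS01:0x17:Completed 40000 out of 1000000 steps (4%)",
    "21:04:14:WU00:FS01:0x17:Completed 50000 out of 1000000 steps (5%)",
]


def intervals_in_seconds(lines):
    """Return the elapsed seconds between each pair of consecutive log lines."""
    stamps = [datetime.strptime(line[:8], "%H:%M:%S") for line in lines]
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        seconds = int((later - earlier).total_seconds())
        if seconds < 0:
            # The log carries no date, so a negative gap means midnight passed.
            seconds += 24 * 60 * 60
        gaps.append(seconds)
    return gaps


if __name__ == "__main__":
    labels = ["initialization"] + [f"frame {n}" for n in range(1, 6)]
    for label, seconds in zip(labels, intervals_in_seconds(LOG_LINES)):
        print(f"{label}: {seconds // 60}:{seconds % 60:02d}")
```

On a card that fails, the interesting number is the first gap: if the failure lands right after it, the work unit died at about the point where the GPU would have started doing the work.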
Freightanimal
Posts: 9
Joined: Tue Mar 11, 2014 11:12 pm
Location: Pennsylvania

Re: 9401 fails on 750ti

Post by Freightanimal »

Meiz wrote:Has there been any news of an update for core 17? I didn't want to leave my GPU idling, so I removed the Advanced flag and have been lucky in getting nothing but core 15 work units since. It just got its first batch of p8900s and, as expected, they all failed. I hope they come up with a fix soon; with core 17 released to full F@H, I can't fold at all until they do.
I have had some of that luck in getting some core 15 work units now also. It made me wonder if there was anything that could be done server side to give out only the core 15 work units to this card in particular.
tofuwombat
Posts: 19
Joined: Mon Nov 22, 2010 4:06 pm

Re: 9401 fails on 750ti

Post by tofuwombat »

First, thanks for the quick response, I always find good help here.
bruce wrote: The Pande Group is aware of the fact that Maxwell is not yet supported, but there's no ETA for a fix.

In light of the above, why is 750ti in the whitelist???

0x10de:0x1380:2:4:GM107 [GeForce GTX 750 Ti]
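For readers unfamiliar with that whitelist line, here is a small Python sketch that splits it into its colon-separated fields. The first two fields are the PCI vendor and device IDs (0x10de is NVIDIA's vendor ID). The labels `gpu_type` and `species` for the third and fourth fields, and the helper `parse_entry` itself, are assumptions for illustration rather than anything confirmed in this thread.

```python
from typing import NamedTuple


class GpuEntry(NamedTuple):
    vendor_id: int    # PCI vendor ID, e.g. 0x10de for NVIDIA
    device_id: int    # PCI device ID
    gpu_type: int     # assumed label for the third field
    species: int      # assumed label for the fourth field
    description: str  # free-text chip and product name


def parse_entry(line):
    """Split one whitelist line into its five colon-separated fields."""
    # Split on the first four colons only, so the description stays intact.
    vendor, device, gpu_type, species, description = line.strip().split(":", 4)
    return GpuEntry(
        vendor_id=int(vendor, 16),
        device_id=int(device, 16),
        gpu_type=int(gpu_type),
        species=int(species),
        description=description.strip(),
    )


if __name__ == "__main__":
    entry = parse_entry("0x10de:0x1380:2:4:GM107 [GeForce GTX 750 Ti]")
    print(hex(entry.vendor_id), hex(entry.device_id), entry.description)
```

Note that nothing in the line says which cores work on the card, which is the gap the question above is pointing at.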
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 9401 fails on 750ti

Post by 7im »

tofuwombat wrote:First, thanks for the quick response, I always find good help here.
bruce wrote: The Pande Group is aware of the fact that Maxwell is not yet supported, but there's no ETA for a fix.

In light of the above, why is 750ti in the whitelist???

0x10de:0x1380:2:4:GM107 [GeForce GTX 750 Ti]

By luck, or whatever, Fahcore_15 seems to fold on Maxwell without any modifications needed. Seems the CUDA core was written well enough to keep working with new CUDA hardware.

OpenCL in Fahcore_17 does not appear to be the same.

If that hardware were blacklisted, they couldn't fold any WUs. And folding some is better than none.

There is a reason they call the newest hardware the bleeding edge. Always check the requirements section of the install guide to see if the new hardware is supported. If that is unclear there, ask here!
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 9401 fails on 750ti

Post by bruce »

Another reason: Suppose Development is working on a change that would fix the problem with Maxwell (maybe currently true, maybe not). They'd have to test it ... so it would have to be whitelisted.

Suppose there's a serious bug in nVidia's OpenCL support that's not in their CUDA support .... Nothing can be done by Stanford until nVidia fixes OpenCL. (I have no facts supporting this assumption, nor any proving it's not true either.)
tofuwombat
Posts: 19
Joined: Mon Nov 22, 2010 4:06 pm

Re: 9401 fails on 750ti

Post by tofuwombat »

bruce wrote: . . .have to test it ... so it would have to be whitelisted.
The need for development is clear, hopefully they have their own lists.

I thought the point of the public whitelist was to identify supported hardware.
Is there some other list that tells which cards "just work"???

Perhaps there is room for a "shiny" list that ACTUALLY works, without failing on V7.

Easier to supply than the much wanted flag to only pick up core_blah . . .