Failing units, low ppd, and returned units.
Moderators: Site Moderators, FAHC Science Team
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
It would take me a while to look back. I wonder what the 96xx projects are for, research-wise. If they are specific to, say, Alzheimer's, maybe we can set our clients to fold for something else so that they won't pull the projects that fail. Let me go look at the project list and see what it returns. It may take a bit on my phone.
9637 Disease type: unspecified
9629 Disease type: unspecified
B, can you designate your units to fold for cancer specifically, and see if it picks those up?
It will be in the advanced settings menu; I can't remember where. I will suggest this on our forums as well.
Re: Failing units, low ppd, and returned units.
Scarlet, have you made it home to check your logs for errors yet? I suggest HFM + Dropbox to access the logs while away.
My own GTX970 sometimes gets Core 18 WUs that are like 1:30 TPF and at other times gets Core 21 WUs that are 4m TPF. That could easily account for the WU count change. There is a pretty good difference in PPD as well between the WU types.
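As a rough check on that, here is a small Python sketch that turns TPF into WUs per day. It assumes TPF is the time per frame as HFM reports it and that a WU is 100 frames (1% of progress each); `wus_per_day` is a hypothetical helper, not part of FAHClient or HFM.

```python
# Rough estimate of how many WUs a GPU can finish per day from its TPF.
# Assumption: a work unit is 100 frames (1% progress each), as HFM reports TPF.

SECONDS_PER_DAY = 24 * 60 * 60


def wus_per_day(tpf_seconds: float, frames_per_wu: int = 100) -> float:
    """Return the approximate number of WUs completed per day at a given TPF."""
    seconds_per_wu = tpf_seconds * frames_per_wu
    return SECONDS_PER_DAY / seconds_per_wu


# Figures from the post: Core 18 at about 1:30 TPF, Core 21 at about 4m TPF.
core18 = wus_per_day(90)    # 1 min 30 s per frame
core21 = wus_per_day(240)   # 4 min per frame

print(f"Core 18: {core18:.1f} WUs/day")  # Core 18: 9.6 WUs/day
print(f"Core 21: {core21:.1f} WUs/day")  # Core 21: 3.6 WUs/day
```

Under those assumptions the same card finishes fewer than half as many Core 21 WUs per day, which alone could explain the change in WU count.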
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
Replying to mmonnin: As frustrating as it is, I am still in Arizona. I get home Saturday night and will be setting up HFM, and I will keep Dropbox in mind as well.
I am going to try to get a laptop running and set up a remote connection so that, from now on, I can control my PC while away. I didn't have the money right away, but I will have it right after I arrive home.
- Posts: 474
- Joined: Sat Aug 01, 2015 1:34 am
- Hardware configuration: 1. 2 each E5-2630 V3 processors, 64 GB RAM, GTX980SC GPU, and GTX980 GPU running on windows 8.1 operating system.
2. I7-6950X V3 processor, 32 GB RAM, 1 GTX980tiFTW, and 2 each GTX1080FTW GPUs running on windows 8.1 operating system. - Location: New Mexico
Re: Failing units, low ppd, and returned units.
I don't believe I have had any 96xx projects run on my system. I have noticed that almost all of my issues were on a factory-overclocked GTX 960. The GTX 980 that I fold on is a bottom-of-the-line model, and it had no issues that I can recall.
Re: Failing units, low ppd, and returned units.
It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Failing units, low ppd, and returned units.
Third BAD_WORK_UNIT Core 21 project:9631 GTX 980HC
11:51:20:WU01:FS04:0x21:Completed 880000 out of 2000000 steps (44%)
11:52:46:WU01:FS04:0x21:Completed 900000 out of 2000000 steps (45%)
11:52:56:WU01:FS04:0x21:Bad State detected... attempting to resume from last good checkpoint
11:52:56:WU01:FS04:0x21:Max number of retries reached. Aborting.
11:52:56:WU01:FS04:0x21:ERROR:Max Retries Reached
11:52:56:WU01:FS04:0x21:Saving result file logfile_01.txt
11:52:56:WU01:FS04:0x21:Saving result file log.txt
11:52:56:WU01:FS04:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
11:52:57:WARNING:WU01:FS04:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
11:52:57:WU01:FS04:Sending unit results: id:01 state:SEND error:FAULTY project:9631 run:1 clone:77 gen:8 core:0x21 unit:0x0000000cab436c9b5609bee204fd8294
11:52:57:WU01:FS04:Uploading 13.00KiB to 171.67.108.155
11:52:57:WU01:FS04:Connecting to 171.67.108.155:8080
11:52:57:WU01:FS04:Upload complete
11:52:57:WU01:FS04:Server responded WORK_ACK (400)
11:52:57:WU01:FS04:Cleaning up
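When reporting a failure like this, the project, run, clone and gen (PRCG) are the key details. A minimal Python sketch for pulling them out of a "Sending unit results" line might look like the following; the regular expression is written against the line format shown in this log only, and `parse_result_line` is a hypothetical helper.

```python
import re

# Hypothetical helper: extract the PRCG and error state from a FAHClient
# "Sending unit results" line in the format shown in the log above.
RESULT_RE = re.compile(
    r"Sending unit results: id:(?P<id>\d+) state:(?P<state>\w+) "
    r"error:(?P<error>\w+) project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+) core:(?P<core>0x[0-9a-fA-F]+)"
)


def parse_result_line(line: str):
    """Return a dict of fields, or None if the line is not a result upload."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the numeric PRCG fields to integers; keep id and core as text.
    for key in ("project", "run", "clone", "gen"):
        fields[key] = int(fields[key])
    return fields


line = ("11:52:57:WU01:FS04:Sending unit results: id:01 state:SEND error:FAULTY "
        "project:9631 run:1 clone:77 gen:8 core:0x21 "
        "unit:0x0000000cab436c9b5609bee204fd8294")
info = parse_result_line(line)
print(f"P{info['project']} (R{info['run']}, C{info['clone']}, G{info['gen']}) "
      f"error={info['error']}")
# prints: P9631 (R1, C77, G8) error=FAULTY
```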
US Army Retired | Folding@EVGA The Number One Team in the Folding@Home Community.
Re: Failing units, low ppd, and returned units.
Replying to bruce: I have removed the client-type flag altogether on both GTX 980 rigs.
So now, as usual, we will have to wait several hours to see what comes down the pike on both rigs.
With no flag, I get beta projects: ZETA_DEV under Core 18, and UNKNOWN_ENUM under Core 21: P9704 (R11, C8, G109).
This is the next one to fail.
Project 9641 failed (OPENMM_21, Core 21, P9641); this is not one of the UNKNOWN_ENUM core projects.
Fourth BAD_WORK_UNIT Core 21 project:9641 GTX 980HB
Last edited by bcavnaugh on Wed Nov 18, 2015 3:52 pm, edited 4 times in total.
Re: Failing units, low ppd, and returned units.
Fourth BAD_WORK_UNIT Core 21 project:9641 GTX 980HB
15:30:29:WU02:FS00:0x21:Completed 780000 out of 2000000 steps (39%)
15:31:56:WU02:FS00:0x21:Completed 800000 out of 2000000 steps (40%)
15:32:05:WU02:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
15:32:05:WU02:FS00:0x21:Max number of retries reached. Aborting.
15:32:05:WU02:FS00:0x21:ERROR:Max Retries Reached
15:32:05:WU02:FS00:0x21:Saving result file logfile_01.txt
15:32:05:WU02:FS00:0x21:Saving result file log.txt
15:32:05:WU02:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:32:05:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:32:05:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9641 run:0 clone:37 gen:31 core:0x21 unit:0x0000002eab436c9b5609bee4be719abe
15:32:05:WU02:FS00:Uploading 12.50KiB to 171.67.108.155
15:32:05:WU02:FS00:Connecting to 171.67.108.155:8080
15:32:06:WU02:FS00:Upload complete
15:32:06:WU02:FS00:Server responded WORK_ACK (400)
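Both failures so far aborted partway through (about 45% and 40%). A minimal Python sketch for finding how far a WU got before the core gave up, assuming the "Completed X out of Y steps" and "Max number of retries reached" wording shown above; `progress_at_abort` is a hypothetical helper, and it does not separate interleaved slots (such as FS00 and FS04) in a multi-GPU log.

```python
import re

# Hypothetical helper: remember the last "Completed X out of Y steps" line
# seen before "Max number of retries reached" and report it.
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def progress_at_abort(lines):
    """Return (steps_done, total_steps) at the abort, or None if no abort was seen."""
    last = None
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            last = (int(m.group(1)), int(m.group(2)))
        elif "Max number of retries reached" in line:
            return last
    return None


log = """\
15:30:29:WU02:FS00:0x21:Completed 780000 out of 2000000 steps (39%)
15:31:56:WU02:FS00:0x21:Completed 800000 out of 2000000 steps (40%)
15:32:05:WU02:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
15:32:05:WU02:FS00:0x21:Max number of retries reached. Aborting.
"""
done, total = progress_at_abort(log.splitlines())
print(f"Aborted after {done} of {total} steps ({done / total:.0%})")
# prints: Aborted after 800000 of 2000000 steps (40%)
```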
- Posts: 7
- Joined: Mon Mar 18, 2013 3:19 pm
Re: Failing units, low ppd, and returned units.
Replying to bruce: I've managed to lower, but not stop, failure rates from what I can tell. I applied every fix I could find that was suggested might help: downclocking my factory-OC GTX 980 to reference speeds (1266 MHz down to 1126 MHz), underclocking my memory another 500 MHz (1000 MHz effective) for the P2 state that these projects run at, and setting PhysX to the CPU. (I'm not sure why the projects run in P2 or what keeps the memory from running at full speed, but at least the core stays at full speed.)
I'm also running my client without any flags on it.
I'm glad that Maxwell GPUs aren't being sent as many of those WUs, given that they seem to have trouble with them, until some resolution can be worked out.
I'm also glad to see that the development team is working toward better solutions. Underclocking isn't my favorite solution, since it slows things down, of course, but it's doable until a better solution is found.
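For scale, a quick Python calculation of how large that core downclock is. `percent_reduction` is a hypothetical helper, and since the starting memory clock isn't given in the post, only the core figures are used.

```python
# Rough arithmetic on the core clock change described above
# (factory OC 1266 MHz down to reference 1126 MHz).


def percent_reduction(before_mhz: float, after_mhz: float) -> float:
    """Return how much lower after_mhz is than before_mhz, in percent."""
    return (before_mhz - after_mhz) / before_mhz * 100


core = percent_reduction(1266, 1126)
print(f"Core clock reduced by {core:.1f}%")  # Core clock reduced by 11.1%
```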
Re: Failing units, low ppd, and returned units.
Replying to z999z3mystorys: This may be somewhat true, but not for the normal user.
Not all users use overclocking or downclocking software to set their graphics cards.
Re: Failing units, low ppd, and returned units.
Round 1 Log Files
https://docs.google.com/document/d/1kgN ... sp=sharing
https://docs.google.com/document/d/1i64 ... sp=sharing
My computers are going back to their default overclocked settings, with the GPU set back to 1500 MHz.
Re: Failing units, low ppd, and returned units.
At this time, the GPU VRAM clock rate seems to be more important than the core clock rate.
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
bruce wrote: At this time, the GPU VRAM clock rate seems to be more important than the core clock rate.
Bcavnaugh is showing that the VRAM has been lowered from the stock 7000 MHz to 6000 MHz and below: well below stock speeds, and even below last-generation speeds.
The problems persist.
He lowered all clocks drastically, removing any factory overclock and reducing all memory clocks. This shows that, although not everyone is experiencing it, lowering the clocks does not fix the issue in any way, shape, or form.
Since it is spread over multiple systems, and usually involves 96xx-series work units as well as some 97xx work units, maybe they just aren't compatible with Maxwell cards?
P.S. This isn't a witch hunt, as you stated before. These are facts observed across multiple systems, with core clocks lowered to stock and memory clocks lowered well below stock. The issue persists, and these guys are burning a lot of electricity trying to find a cure so Stanford can find more cures.
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Failing units, low ppd, and returned units.
Yes, the problem persists. Lowering the memory clocks was never a solution, simply a workaround that has helped some people finish more work units while we wait for Stanford to revise and improve the core.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.