Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 3:24 am
by Scarlet-Tech
It would take me a while to look back. I wonder what the 96xx projects are for, research-wise? If they are specific to, say, Alzheimer's, maybe we can designate our projects to fold for something else so that it will not pull the projects that fail. Let me go look at the project list and see what it returns. It may take a bit on my phone.
9637 Disease type: unspecified
9629 Disease type: unspecified
B, can you designate your units to fold for Cancer specifically, and see if it picks those up?
It will be in the advanced settings menu; I can't remember exactly where. I will suggest this on our forums as well.
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 3:37 am
by mmonnin
Scarlet, have you made it home to check your logs for errors yet? I suggest HFM + dropbox to access the logs while away.
My own GTX970 sometimes gets Core 18 WUs that are like 1:30 TPF and at other times gets Core 21 WUs that are 4m TPF. That could easily account for the WU count change. There is a pretty good difference in PPD as well between the WU types.
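To put rough numbers on that, here is a minimal Python sketch that converts TPF into hours per WU and WUs per day. It assumes the usual 100 frames per WU (each 1% "Completed ... steps" line in the client log is one frame) and ignores download/upload gaps; the function names are hypothetical.

```python
# Sketch: estimate time per work unit (WU) and WUs per day from TPF.
# Assumption: a WU is 100 frames (1% each), so time per WU = 100 * TPF.
# Upload/download time between WUs is ignored.


def wu_hours(tpf_seconds: float, frames: int = 100) -> float:
    """Hours needed to finish one WU at the given time per frame."""
    return tpf_seconds * frames / 3600.0


def wus_per_day(tpf_seconds: float, frames: int = 100) -> float:
    """Approximate number of WUs finished per 24 hours."""
    return 24.0 / wu_hours(tpf_seconds, frames)


if __name__ == "__main__":
    # The two cases from the post: 1:30 TPF (90 s) and 4m TPF (240 s).
    for label, tpf in [("Core 18, 1:30 TPF", 90), ("Core 21, 4m TPF", 240)]:
        print(f"{label}: {wu_hours(tpf):.2f} h/WU, {wus_per_day(tpf):.1f} WUs/day")
```

On those two TPFs it reports 2.50 h/WU (about 9.6 WUs/day) versus 6.67 h/WU (about 3.6 WUs/day), which is the kind of swing in WU count described above.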
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 4:10 am
by Scarlet-Tech
mmonnin wrote:Scarlet, have you made it home to check your logs for errors yet? I suggest HFM + dropbox to access the logs while away.
My own GTX970 sometimes gets Core 18 WUs that are like 1:30 TPF and at other times gets Core 21 WUs that are 4m TPF. That could easily account for the WU count change. There is a pretty good difference in PPD as well between the WU types.
As frustrating as it is, I am still in Arizona. I get home Saturday night and will be setting up HFM, and I will keep Dropbox in mind as well.
I am going to try to set up a laptop and a connection so that, from now on, I can remotely control my PC while away. I didn't have the money right away, but I will have it right after I arrive home.
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 11:41 am
by Ricky
I don't believe I have had any 96xx projects run on my system. I have noticed that almost all of my issues were on a factory-overclocked GTX 960. The GTX 980 that I fold on is a bottom-of-the-line model, and it had no issues that I can recall.
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 2:50 pm
by bruce
It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 3:36 pm
by bcavnaugh
Third BAD_WORK_UNIT Core 21 project:9631 GTX 980HC
11:51:20:WU01:FS04:0x21:Completed 880000 out of 2000000 steps (44%)
11:52:46:WU01:FS04:0x21:Completed 900000 out of 2000000 steps (45%)
11:52:56:WU01:FS04:0x21:Bad State detected... attempting to resume from last good checkpoint
11:52:56:WU01:FS04:0x21:Max number of retries reached. Aborting.
11:52:56:WU01:FS04:0x21:ERROR:Max Retries Reached
11:52:56:WU01:FS04:0x21:Saving result file logfile_01.txt
11:52:56:WU01:FS04:0x21:Saving result file log.txt
11:52:56:WU01:FS04:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
11:52:57:WARNING:WU01:FS04:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
11:52:57:WU01:FS04:Sending unit results: id:01 state:SEND error:FAULTY project:9631 run:1 clone:77 gen:8 core:0x21 unit:0x0000000cab436c9b5609bee204fd8294
11:52:57:WU01:FS04:Uploading 13.00KiB to 171.67.108.155
11:52:57:WU01:FS04:Connecting to 171.67.108.155:8080
11:52:57:WU01:FS04:Upload complete
11:52:57:WU01:FS04:Server responded WORK_ACK (400)
11:52:57:WU01:FS04:Cleaning up
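A minimal Python sketch for tallying these failures across a whole log, so counts per project can be compared between rigs. The regular expression is written only against the "Sending unit results" lines quoted in this thread (e.g. error:FAULTY project:9631 run:1 clone:77 gen:8 core:0x21); other client versions may format the line differently, and the function names are hypothetical.

```python
import re
from collections import Counter

# Written against the client line quoted above:
# "...:WU01:FS04:Sending unit results: id:01 state:SEND error:FAULTY project:9631 run:1 clone:77 gen:8 core:0x21 unit:..."
RESULT_RE = re.compile(
    r"(?P<slot>FS\d+):Sending unit results: .*?"
    r"error:(?P<error>\w+) project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+) core:(?P<core>0x[0-9a-fA-F]+)"
)


def faulty_units(log_text: str) -> list[dict[str, str]]:
    """Return the parsed fields of every result line reported with error:FAULTY."""
    units = []
    for line in log_text.splitlines():
        m = RESULT_RE.search(line)
        if m and m.group("error") == "FAULTY":
            units.append(m.groupdict())
    return units


def failures_by_project(log_text: str) -> Counter:
    """Count FAULTY results per project number."""
    return Counter(unit["project"] for unit in faulty_units(log_text))
```

Run over the two logs posted in this thread, it would count one FAULTY result each for projects 9631 and 9641.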
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 3:41 pm
by bcavnaugh
bruce wrote:It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
I have removed the client-type flag altogether on both 980 rigs.
So now, as usual, we will have to wait several hours to see what comes down the pike on both rigs.
With no flag I get beta projects: ZETA_DEV shows under Core 18, and P9704 (R11, C8, G109) shows as UNKNOWN_ENUM under Core 21.
This is the next one to fail.
Project 9641 failed (OPENMM_21, Core 21, P9641); this is not one of the UNKNOWN_ENUM core projects.
Fourth BAD_WORK_UNIT Core 21 project:9641 GTX 980HB
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 3:46 pm
by bcavnaugh
Fourth BAD_WORK_UNIT Core 21 project:9641 GTX 980HB
15:30:29:WU02:FS00:0x21:Completed 780000 out of 2000000 steps (39%)
15:31:56:WU02:FS00:0x21:Completed 800000 out of 2000000 steps (40%)
15:32:05:WU02:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
15:32:05:WU02:FS00:0x21:Max number of retries reached. Aborting.
15:32:05:WU02:FS00:0x21:ERROR:Max Retries Reached
15:32:05:WU02:FS00:0x21:Saving result file logfile_01.txt
15:32:05:WU02:FS00:0x21:Saving result file log.txt
15:32:05:WU02:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:32:05:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:32:05:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9641 run:0 clone:37 gen:31 core:0x21 unit:0x0000002eab436c9b5609bee4be719abe
15:32:05:WU02:FS00:Uploading 12.50KiB to 171.67.108.155
15:32:05:WU02:FS00:Connecting to 171.67.108.155:8080
15:32:06:WU02:FS00:Upload complete
15:32:06:WU02:FS00:Server responded WORK_ACK (400)
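To compare where in the run each unit dies (44-45% and 39-40% in the two logs above), a rough Python sketch could pair the last "Completed X out of Y steps" line with the following "Core Shutdown: BAD_WORK_UNIT" line for the same WU/slot. As with any log scraping, the patterns assume the line format shown here, and the function name is hypothetical.

```python
import re
from typing import Optional

# Patterns assume the FahCore log lines shown above, e.g.
# "15:31:56:WU02:FS00:0x21:Completed 800000 out of 2000000 steps (40%)"
PROGRESS_RE = re.compile(
    r":(WU\d+):(FS\d+):0x[0-9a-fA-F]+:Completed (\d+) out of (\d+) steps"
)
SHUTDOWN_RE = re.compile(
    r":(WU\d+):(FS\d+):0x[0-9a-fA-F]+:Folding@home Core Shutdown: (\w+)"
)


def progress_at_failure(log_text: str) -> list[tuple[str, str, Optional[float]]]:
    """Return (WU id, slot, fraction of steps done) for each BAD_WORK_UNIT shutdown."""
    last_progress: dict[tuple[str, str], float] = {}
    failures = []
    for line in log_text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            wu, slot, done, total = m.groups()
            last_progress[(wu, slot)] = int(done) / int(total)
            continue
        m = SHUTDOWN_RE.search(line)
        if m:
            wu, slot, status = m.groups()
            # Forget the progress on any shutdown, since WU ids get reused.
            fraction = last_progress.pop((wu, slot), None)
            if status == "BAD_WORK_UNIT":
                failures.append((wu, slot, fraction))
    return failures
```

Fed the 9641 log above, it returns [("WU02", "FS00", 0.4)]; the WARNING line does not match, so each failure is counted once.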
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 6:15 pm
by z999z3mystorys
bruce wrote:It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
I've managed to lower, but not stop, the failure rate from what I can tell. I used all the fixes I could find that were suggested might help: downclocking my factory-OC GTX 980 to reference speeds (1266 down to 1126 MHz), underclocking my memory another 500 MHz (1000 MHz effective) in the P2 state that the projects run at (not sure why that is or what makes it do that instead of running the memory at full speed, but at least the core stays at full speed), and setting PhysX to CPU.
I'm also running my client without any flags on it.
I'm glad that Maxwell GPUs aren't being sent as many of those WUs, given that they seem to have trouble with them, until some resolution can be worked out.
Also glad to see that the development team is working toward better solutions, as underclocking isn't my favorite solution since it slows things down, but it's doable until a better solution is found.
Re: Failing units, low ppd, and returned units.
Posted: Wed Nov 18, 2015 9:12 pm
by bcavnaugh
z999z3mystorys wrote:bruce wrote:It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
I've managed to lower, but not stop, the failure rate from what I can tell. I used all the fixes I could find that were suggested might help: downclocking my factory-OC GTX 980 to reference speeds (1266 down to 1126 MHz), underclocking my memory another 500 MHz (1000 MHz effective) in the P2 state that the projects run at (not sure why that is or what makes it do that instead of running the memory at full speed, but at least the core stays at full speed), and setting PhysX to CPU.
I'm also running my client without any flags on it.
I'm glad that Maxwell GPUs aren't being sent as many of those WUs, given that they seem to have trouble with them, until some resolution can be worked out.
Also glad to see that the development team is working toward better solutions, as underclocking isn't my favorite solution since it slows things down, but it's doable until a better solution is found.
This may be somewhat true, but not for the normal user.
Not all users use overclocking or downclocking software to set their graphics cards.
Re: Failing units, low ppd, and returned units.
Posted: Thu Nov 19, 2015 3:19 am
by bcavnaugh
Round 1 Log Files
https://docs.google.com/document/d/1kgN ... sp=sharing
https://docs.google.com/document/d/1i64 ... sp=sharing
My computers are going back to their default overclocked settings, and I am setting the GPU back to 1500 MHz.
Re: Failing units, low ppd, and returned units.
Posted: Thu Nov 19, 2015 4:29 am
by bruce
At this time, the GPU VRAM clock rate seems to be more important than the core clock rate.
Re: Failing units, low ppd, and returned units.
Posted: Thu Nov 19, 2015 4:41 am
by Scarlet-Tech
bruce wrote:At this time, the GPU VRAM clock rate seems to be more important than the core clock rate.
bcavnaugh is showing that the VRAM has been lowered from the stock 7000 MHz to 6000 MHz and below, well under stock speeds and even under last-generation speeds.
The problems persist.
He lowered all clocks drastically, removing any factory overclock and reducing all memory clocks. This shows that, although not everyone is experiencing it, lowering the clocks does not fix the issue in any way, shape or form.
Since it is spread over multiple systems, and it is usually 96xx-series work units as well as some 97xx work units, maybe they just aren't compatible with Maxwell cards?
P.S. This isn't a witch hunt, as you stated before. This is fact, spread over multiple systems with core clocks lowered to stock and memory clocks lowered well below stock. The issue persists, and these guys are burning a lot of electricity trying to find a cure so Stanford can find more cures.
Re: Failing units, low ppd, and returned units.
Posted: Thu Nov 19, 2015 4:53 am
by 7im
Yes, the problem persists. Lowering the memory clocks was never a solution, simply a workaround that has helped some people finish more work units while we wait for Stanford to revise and improve the core.