Failing units, low ppd, and returned units.
Moderators: Site Moderators, FAHC Science Team
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Failing units, low ppd, and returned units.
I am not new to Folding, I am new to the forums. I have been on EVGA forums and folding team for 2 years now.
My system upgrades have always been about improving the number of work units completed and building a beautiful system. When I set it up, I make sure that my system is folding and working well when I am not gaming.
Well, I am currently out of state, and I am watching my PPD crash, work units fail constantly, and my system continue to burn electricity like there is no tomorrow. Normally, I get 1.42-1.54m PPD and complete 20-30 work units a day. Lately, I am completing 14 or so work units, and my PPD has dropped to 1.2m at the high point. This all started after the EVGA folding challenge for November began. Lots of failed units are being reported, and I was wondering if Stanford has any update as to what is going on.
I will be frank. I have avoided these forums since I started folding, after hearing that Stanford loves to blame Nvidia for failed units. Since I have been using the same system and the problem arose suddenly without my changing anything, especially nothing from Nvidia, it would seem that Stanford is sending out tons of bad work units. I have never had more than a couple of failed units on the drivers I am using now, yet Stanford has still been pointing the finger away from themselves.
Could Stanford please look into this issue? It's obviously not Nvidia, since nothing on my system changed, only the work units received.
Since Stanford is getting free use of thousands of computers, looking into this would be beneficial, so that they can get more research completed. This is for everyone's benefit, but Stanford gets to use our hardware for free. They can pull the obviously bad and failing units, find what is causing them to fail, and send out units with fewer issues.
Also, I hear the mods like to send warnings for threads like this on these forums. If you feel a warning needs to be sent, please include a good explanation of what I have done wrong. I will copy and post this on the other forums that deal with folding and see if they can provide insight as well.
If Stanford wants free hardware to do the work for them, they should invest the time to make sure everything is smooth for those that are helping them.
System (everything is 100% stock; no overclocks, thanks to units failing constantly):
I7 5960x, Rampage V Extreme
Four 980 K|ngp|ns driver 355 (not new, downgraded from 357)
FAH Client 4.4
Re: Failing units, low ppd, and returned units.
I don't see any evidence that Stanford is "sending out tons of bad work units" -- only that something has happened to your system that we can only guess about. (Without the information requested in my sig, all we can do is guess.)
I don't know if this information will be useful, but scarlet_tech apparently is running 30 slots. (Counting twice for recent reinstalls.) The last WU returned from 27 of them seems to have earned reasonable points; the last WU from the other three received 0 points.
2015-09-05 22:05:50 p10495 r30 c2 g33
2015-09-28 04:07:00 p9835 r68 c6 g9
2015-10-06 18:07:55 p9430 r56 c2 g111
Only the last one appears to be recent. That particular WU was reassigned and successfully completed by someone else about 9.6 hours later, so it's not a bad WU.
What information can you provide to explain which work units or systems are failing "constantly" and what kind of failures are they?
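For reports like the above, the unambiguous way to identify a WU is its project/run/clone/gen (PRCG) tuple, which the client prints in its "Sending unit results" lines. A minimal sketch (my own hypothetical helper, not part of any FAH tool) for pulling it out of a log line:

```python
import re

# Regex for the PRCG fields as FAHClient prints them in "Sending unit results" lines.
PRCG_RE = re.compile(r"project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)")

def extract_prcg(line):
    """Return (project, run, clone, gen) as ints, or None if the line has no PRCG."""
    m = PRCG_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None

line = ("12:21:14:WU02:FS01:Sending unit results: id:02 state:SEND "
        "error:FAULTY project:9630 run:1 clone:23 gen:46")
print(extract_prcg(line))  # (9630, 1, 23, 46)
```

Quoting that tuple, plus the FahCore version and system details, is what turns a vague "units are failing" report into something a moderator can actually look up.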
Posting FAH's log:
How to provide enough info to get helpful support.
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
Hi Bruce,
Currently I am in Arizona (PC in Delaware).
Folding 4 slots, all 980's.
My issues come from checking here: http://folding.extremeoverclocking.com/ ... =&u=654307
Notice that 1 week ago, I was completing many more units than I am now. I am trying to determine if there is a hardware failure or if this is the work units themselves.
Links are limited for new users, so I won't be able to post more just yet.
Currently, the EVGA folding team is trying to break 3 billion points in one month, and the forums have been bustling with many users getting lots of failed units. When they started posting this information, my PPD started dropping as mentioned above.
I can see, just at this latest update, that 3 work units completed over the 3-hour time frame. This is a good thing, but going from 20+ completed work units per day to 10-14 is concerning for me.
Once I get home, 11 days from now, I will be able to post a long log of errors, as my roommate is unable to figure out how to do it. I apologize for not being able to provide the logs, but since there are many users on the EVGA forums that are providing the information with the same hardware, I will try to keep an eye open for the exact work units they are experiencing errors with.
*edit/addition* could you provide the link where you are able to see completed and reassigned units so I would be able to watch that?
- Posts: 474
- Joined: Sat Aug 01, 2015 1:34 am
- Hardware configuration: 1. 2 each E5-2630 V3 processors, 64 GB RAM, GTX980SC GPU, and GTX980 GPU running on windows 8.1 operating system.
2. I7-6950X V3 processor, 32 GB RAM, 1 GTX980tiFTW, and 2 each GTX1080FTW GPUs running on windows 8.1 operating system.
- Location: New Mexico
Re: Failing units, low ppd, and returned units.
Scarlet_tech,
I went down to driver 347.88 and have had fewer problems. I have not detected a bad WU in the 10 days that I have been folding with this driver.
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
Ricky, I am on Windows 10, so 352 is the earliest driver I can go back to, unfortunately. I may try to get a new win 7 key when I get home if that will provide stability to the folding system.
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
bruce wrote: I don't see any evidence that Stanford is "sending out tons of bad work units" -- only that something has happened to your system that we can only guess about. (Without the information requested in my sig, all we can do is guess.)
I don't know if this information will be useful, but scarlet_tech apparently is running 30 slots. (Counting twice for recent reinstalls.) The last WU returned from 27 of them all seem to have earned reasonable points. The last WU from three of them have received 0 points.
2015-09-05 22:05:50 p10495 r30 c2 g33
2015-09-28 04:07:00 p9835 r68 c6 g9
2015-10-06 18:07:55 p9430 r56 c2 g111
Only the last one appears to be recent. That particular WU was reassigned and successfully completed by someone else about 9.6 hours later so it's not a bad WU
Could you please provide a link where you see this information?
Re: Failing units, low ppd, and returned units.
scarlet_tech wrote: Could you please provide a link where you see this information?
Sorry, no. The Pande Group has restricted that data to forum Moderators only.
Posting FAH's log:
How to provide enough info to get helpful support.
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
bruce wrote: Sorry, no. The Pande Group has restricted that data to forum Moderators only.
So, it is OK to post stuff like that, but not share the link to it. Makes sense. Wouldn't want the truth out there, I guess.
I will share some posts from EVGA, since they aren't hidden and may be helpful.
Since forum Moderators can look up information, my name here does not match my folding name. My folding name is Scarlet-Tech, user 654307 according to Extreme Overclocking. The results you pulled were for scarlet_tech. I mistyped when entering my forum name.
Scott over at bjorn3d is even reporting having to go back multiple drivers in an attempt to find a stable one.
Mekhed wrote: I'm gonna say that you're not wrong. Both of my machines have been rock solid for months folding. I had what I expected the first 3 days of the challenge, and on day 4 I also dropped about 300k PPD. The last 7 days have been a struggle just to get WUs to finish and not be returned as "bad work units". I've changed drivers and lowered video card memory speeds and am still having problems. You're not wrong, Scarlet; something changed on day 4.
Here is another user with the same hardware as me, who can report his errors that are occurring.
*********************** Log Started 2015-11-09T20:53:12Z ***********************
20:54:09:WU00:FS01:0x21:ERROR:Potential energy error of 805.531, threshold of 10
20:54:09:WU00:FS01:0x21:ERROR:Reference Potential Energy: -1.23368e+006 | Given Potential Energy: -1.23287e+006
20:54:10:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
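The "Potential energy error" lines above come from a sanity check: the core compares the potential energy it computed against a reference value and rejects the result when the difference exceeds a threshold. A rough sketch of that check (illustrative only; the names and threshold handling are my assumptions, not FahCore internals):

```python
def check_potential_energy(reference, given, threshold=10.0):
    """Reject a result whose potential energy strays too far from the reference."""
    error = abs(reference - given)
    if error > threshold:
        raise ValueError(
            f"Potential energy error of {error:g}, threshold of {threshold:g}")
    return error

# Rounded values from the log above: a discrepancy near 810 dwarfs the threshold of 10.
try:
    check_potential_energy(-1.23368e6, -1.23287e6)
except ValueError as e:
    print(e)
```

Either bad input data or unstable arithmetic on the GPU can trip a check like this; the check itself can't tell which.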
I can go through 30 pages of conversation and copy and paste all of the information, but I cannot post links, or I would just link the thread and the pages to view.
The information is available to show that it isn't just one or two people experiencing the issue; our team's numbers dropped substantially with the same number of folders pushing out units. This started on November 4th and has been continuous since then.
I understand you are a moderator, and that I cannot provide my own stats, but the evidence, from more than one team, is overwhelming, and I am just raising the issue as it needs to be corrected. Since we have 20 people continuously trying to find an actual solution, it would be good to have Stanford's support in this venture.
I will continue to post edits to this thread and provide more failed units from other members as they post them, so that it cannot be ignored, since our entire team and other teams are experiencing this issue:
11:25:26:WU02:FS01:0x21:Completed 1300000 out of 2000000 steps (65%)
11:25:33:WU02:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
11:25:33:WU02:FS01:0x21:Max number of retries reached. Aborting.
11:25:33:WU02:FS01:0x21:ERROR:Max Retries Reached
11:25:33:WU02:FS01:0x21:Saving result file logfile_01.txt
11:25:33:WU02:FS01:0x21:Saving result file log.txt
11:25:33:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
11:25:34:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
11:25:34:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9625 run:1 clone:1 gen:44
21:39:06:WU00:FS01:0x21:ERROR:Bad platformId size.
21:39:07:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
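The sequence above ("Bad State detected... attempting to resume from last good checkpoint", then "Max number of retries reached. Aborting.") is a bounded retry loop: roll back to the last checkpoint on a bad state, and give up with BAD_WORK_UNIT after too many failed resumes. A simplified sketch of that control flow (my own illustration, not FahCore source):

```python
# Simplified sketch of the retry behavior shown in the log: on a bad state,
# roll back to the last good checkpoint; after too many failed resumes,
# abort with BAD_WORK_UNIT instead of looping forever.
def run_work_unit(step_fn, total_steps, checkpoint_interval=100000, max_retries=2):
    checkpoint = 0   # last step known to be good
    retries = 0
    step = 0
    while step < total_steps:
        try:
            step_fn(step)                    # one simulation step; raises on a bad state
            step += 1
            if step % checkpoint_interval == 0:
                checkpoint = step            # save a checkpoint
        except RuntimeError:
            print("Bad State detected... attempting to resume from last good checkpoint")
            if retries >= max_retries:
                print("Max number of retries reached. Aborting.")
                return "BAD_WORK_UNIT"
            retries += 1
            step = checkpoint                # resume from the checkpoint
    return "FINISHED_UNIT"

def flaky_step(step):
    if step == 5:                            # unstable hardware fails at the same point
        raise RuntimeError("bad state")

print(run_work_unit(flaky_step, total_steps=10, checkpoint_interval=2))  # BAD_WORK_UNIT
```

In the real client the checkpoint interval is configurable and the core distinguishes several error types; the point here is only that repeated bad states at the same spot end in BAD_WORK_UNIT rather than an endless resume loop.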
23:11:11:WU02:FS03:0x21:ERROR:exception: Error downloading array velm: clEnqueueReadBuffer (-5)
23:11:11:WU02:FS03:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
23:11:12:WARNING:WU02:FS03:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:11:12:WU02:FS03:Sending unit results: id:02 state:SEND error:FAULTY project:9704 run:64 clone:18 gen:72 core:0x21 unit:0x00000063ab404162553ec5d398a809a5
980HC
*********************** Log Started 2015-11-10T01:01:06Z ***********************
******************************* Date: 2015-11-10 *******************************
10:20:52:WU02:FS03:Upload 99.93%
10:20:52:WU02:FS03:Upload complete
10:20:52:WU02:FS03:Server responded WORK_QUIT (404)
10:20:52:WARNING:WU02:FS03:Server did not like results, dumping
10:20:52:WU02:FS03:Cleaning up
980HB
*********************** Log Started 2015-11-10T01:01:47Z ***********************
******************************* Date: 2015-11-10 *******************************
11:53:58:WU00:FS01:Upload 95.92%
11:54:03:WU00:FS01:Upload complete
11:54:03:WU00:FS01:Server responded WORK_QUIT (404)
11:54:03:WARNING:WU00:FS01:Server did not like results, dumping
11:54:03:WU00:FS01:Cleaning up
12:14:03:WU02:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
12:15:27:WU02:FS01:0x21:Completed 120000 out of 2000000 steps (6%)
12:16:52:WU02:FS01:0x21:Completed 140000 out of 2000000 steps (7%)
12:18:16:WU02:FS01:0x21:Completed 160000 out of 2000000 steps (8%)
12:19:41:WU02:FS01:0x21:Completed 180000 out of 2000000 steps (9%)
12:21:06:WU02:FS01:0x21:Completed 200000 out of 2000000 steps (10%)
12:21:14:WU02:FS01:0x21:Bad State detected... attempting to resume from last good checkpoint
12:21:14:WU02:FS01:0x21:Max number of retries reached. Aborting.
12:21:14:WU02:FS01:0x21:ERROR:Max Retries Reached
12:21:14:WU02:FS01:0x21:Saving result file logfile_01.txt
12:21:14:WU02:FS01:0x21:Saving result file log.txt
12:21:14:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
12:21:14:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:21:14:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9630 run:1 clone:23 gen:46 core:0x21 unit:0x00000041ab436c9b5609bee22119aafb
12:21:14:WU02:FS01:Uploading 9.50KiB to 171.67.108.155
12:21:14:WU02:FS01:Connecting to 171.67.108.155:8080
12:21:14:WU02:FS01:Upload complete
12:21:14:WU02:FS01:Server responded WORK_ACK (400)
12:21:14:WU02:FS01:Cleaning up
12:22:00:WU00:FS00:0x21:Completed 100000 out of 2000000 steps (5%)
12:22:07:WU00:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
12:22:07:WU00:FS00:0x21:Max number of retries reached. Aborting.
12:22:07:WU00:FS00:0x21:ERROR:Max Retries Reached
12:22:07:WU00:FS00:0x21:Saving result file logfile_01.txt
12:22:07:WU00:FS00:0x21:Saving result file log.txt
12:22:07:WU00:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
12:22:08:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:22:08:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:9629 run:0 clone:23 gen:37 core:0x21 unit:0x0000002fab436c9b5609bee23824a870
12:22:08:WU00:FS00:Uploading 8.50KiB to 171.67.108.155
12:22:08:WU00:FS00:Connecting to 171.67.108.155:8080
12:22:08:WU00:FS00:Upload complete
12:22:08:WU00:FS00:Server responded WORK_ACK (400)
12:22:08:WU00:FS00:Cleaning up
15:03:21:WU00:FS00:0x21:Completed 300000 out of 2000000 steps (15%)
15:03:28:WU00:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
15:03:28:WU00:FS00:0x21:Max number of retries reached. Aborting.
15:03:28:WU00:FS00:0x21:ERROR:Max Retries Reached
15:03:28:WU00:FS00:0x21:Saving result file logfile_01.txt
15:03:28:WU00:FS00:0x21:Saving result file log.txt
15:03:28:WU00:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:03:29:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:03:29:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:9634 run:1 clone:40 gen:14 core:0x21 unit:0x00000015ab436c9b5609bee3eb2b8f6f
15:03:29:WU00:FS00:Uploading 10.00KiB to 171.67.108.155
15:03:29:WU00:FS00:Connecting to 171.67.108.155:8080
15:03:29:WU00:FS00:Upload complete
15:03:29:WU00:FS00:Server responded WORK_ACK (400)
15:03:29:WU00:FS00:Cleaning up
These are all just a tiny sample of the errors that are occurring now, and I am trying to get all EVGA folders on board to post every single bad unit that is received across all platforms. The above listed platforms are nearly identical to my system.
Re: Failing units, low ppd, and returned units.
scarlet_tech wrote: I will share some posts from EVGA, since they aren't hidden and may be helpful.
Some are helpful, many are not.
scarlet_tech wrote: Since forum Moderators can look up information, my name here does not match my folding name. My folding name is Scarlet-Tech, user 654307 according to Extreme Overclocking. The results you pulled were for scarlet_tech. I mistyped when entering my forum name.
There's no requirement that your names match, but when I found numerous reports from the name you gave me, I made a (reasonable?) assumption. My bad.
If you want me to correct your mis-typed name, send me a PM.
Mekhed wrote: I'm gonna say that you're not wrong. Both of my machines have been rock solid for months folding. I had what I expected the first 3 days of the challenge and on day 4 I also dropped about 300k ppd. The last 7 days have been a struggle just to get WU's to finish and not be returned as "bad work units". I've changed drivers and lowered video card memory speeds and still having problems. You're not wrong Scarlet, something changed on day 4
"Rock solid for months" is NOT the same as not overclocked; words to that effect suggest that the machine is overclocked but has been stable on previous assignments. If the GPU is overclocked, you're responsible for it, not Stanford, as they do not support overclocking. Some of the new projects do use the hardware more effectively, leading to a higher than normal failure rate for machines which are overclocked -- especially overclocked VRAM on Maxwell hardware.
Several of the new projects have been intentionally restricted to client-type=beta. [The reason they're identified as beta is because they're more likely to encounter instabilities. That warning has not changed, even though the failure rate can change.] If the projects that you're reporting are beta WUs and they happen to be unstable, remove beta from your configuration until you figure out how to keep your hardware stable.
*********************** Log Started 2015-11-09T20:53:12Z ***********************
20:54:09:WU00:FS01:0x21:ERROR:Potential energy error of 805.531, threshold of 10
20:54:09:WU00:FS01:0x21:ERROR:Reference Potential Energy: -1.23368e+006 | Given Potential Energy: -1.23287e+006
20:54:10:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
Reports like this are essentially meaningless. I can only guess at the missing information, since no WU is identified, no FahCore is identified, and the system being used is not documented per my previous instructions. Yes, an instability has been encountered, but I hesitate to speculate about the cause beyond what I've already said.
scarlet_tech wrote: The information is available to show that isn't just one or two people experiencing the issue, but our team number dropped substantially with the same number of folders pushing out units. This started on November 4th, and has been continuous since then. I will continue to post edits to this thread and provide more failed units from other members as they post them, so that it can't be ignored since our entire team and other teams are experiencing this issue.
If there's a competition between one team whose hardware is overclocked and another team whose hardware is stable, guess which one will win.
I have discarded all reports where I can't even determine which project or server is associated with the report (let alone having any information about the system being used).
11:25:34:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
11:25:34:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9625 run:1 clone:1 gen:44
23:11:11:WU02:FS03:0x21:ERROR:exception: Error downloading array velm: clEnqueueReadBuffer (-5)
23:11:12:WARNING:WU02:FS03:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:11:12:WU02:FS03:Sending unit results: id:02 state:SEND error:FAULTY project:9704 run:64 clone:18 gen:72 core:0x21
There are reports of this sort of error with 171.64.65.56. Assignments from that server have been suspended until the problem can be resolved. (I'm assuming this was the server involved ... if that's not true, then I have no explanation.)
11:53:58:WU00:FS01:Upload 95.92%
11:54:03:WU00:FS01:Upload complete
11:54:03:WU00:FS01:Server responded WORK_QUIT (404)
11:54:03:WARNING:WU00:FS01:Server did not like results, dumping
11:54:03:WU00:FS01:Cleaning up
Summary:
12:21:14:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:21:14:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9630 run:1 clone:23 gen:46 core:0x21
12:21:14:WU02:FS01:Uploading 9.50KiB to 171.67.108.155
12:21:14:WU02:FS01:Connecting to 171.67.108.155:8080
12:22:08:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:22:08:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:9629 run:0 clone:23 gen:37 core:0x21
15:03:29:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:03:29:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:9634 run:1 clone:40 gen:14 core:0x21
scarlet_tech wrote: These are all just a tiny example of errors that are occurring now, and I am trying to get all EVGA folders on board to post every single bad unit that is received across all platforms. The above listed platforms are nearly identical to my system.
FAH has no way to deal with unstable hardware except to reassign the WU to someone else. When a project is aborted, the error is noted and often partial credit is awarded. When the reissued WU is successfully completed, full credit is granted and FAH moves on to the next WU (assuming that the first machine was unstable and the one who completed it was stable.) Here's what I see for the WUs mentioned above.
project:9625 run:1 clone:1 gen:44
Error reported by Team: 111065 (partial credit) and successfully completed by Team: 86565 for full credit
project:9704 run:64 clone:18 gen:72
Partial credit (error) to Team: 111065 and Team: 13531. Full credit (no error) awarded to Team: 161747 (Third try)
project:9630 run:1 clone:23 gen:46
Partial points awarded to Team: 111065 and to Team: 111065 and full points awarded to Team: 32
project:9629 run:0 clone:23 gen:37
Partial points awarded to Team: 37651 and full points awarded to Team: 111065
project:9634 run:1 clone:40 gen:14
Partial points awarded to Team: 37651 and to Team: 111065 and full points awarded to Team: 224497
I've only reported the team numbers. Some of the names associated with the failures seem to be repeated, but I won't report that unless the person themselves asks.
This research has taken me almost an hour, but it does seem to indicate that several machines are marginally stable and they can't handle the increased utilization that these projects are seeking. This certainly is not the first time that FAH has created a more stressful benchmark than the benchmark routines commonly used by overclockers.
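The per-WU histories above follow a simple pattern: each faulty return earns partial credit and triggers a reissue, and the first clean return earns full credit and retires the WU. A toy sketch of that settlement flow (the base credit and 25% partial fraction are invented for illustration; the real credit and bonus rules differ):

```python
# Minimal sketch (assumed flow, not Pande Group server code) of WU settlement:
# each faulty return gets partial credit and the WU is reissued; the first
# clean return gets full credit and retires the WU.
def settle_wu(returns, base_credit=1000, partial_fraction=0.25):
    """returns: list of (team, completed_ok) in arrival order -> {team: credit}."""
    credits = {}
    for team, ok in returns:
        if ok:
            credits[team] = credits.get(team, 0) + base_credit
            break                                    # WU completed; no more reissues
        credits[team] = credits.get(team, 0) + int(base_credit * partial_fraction)
    return credits

# project:9704 run:64 clone:18 gen:72 above: two faulty returns, success on the third try.
print(settle_wu([(111065, False), (13531, False), (161747, True)]))
# {111065: 250, 13531: 250, 161747: 1000}
```

This is why a team full of marginally stable machines loses points twice over: its own returns earn only partial credit, and the full credit goes to whoever completes the reissue.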
Posting FAH's log:
How to provide enough info to get helpful support.
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
bruce wrote: This research has taken me almost an hour, but it does seem to indicate that several machines are marginally stable and they can't handle the increased utilization that these projects are seeking. This certainly is not the first time that FAH has created a more stressful benchmark than the benchmark routines commonly used by overclockers.
None of my hardware is overclocked.
I think the one with the most information was the one that had previously been overclocked, but they have lowered everything back to stock, so I will talk to them and see if they can lower things even more.
Again, the only thing I had overclocked on my system was the CPU, as there is little or no point in overclocking 4 GPUs on a daily system. The CPU overclock has already been removed. I do understand that overclocking causes system instability.
I have requested the users to post log files that we can update, as this is the only public information they have passed on at this time.
I know links are limited for new members, so I am breaking this up so it will come through to a Google Drive link.
drive(.)google(.)com/folderview?id=0BylHzRH2Ab3FTUtXN0tfeGRPYXM&usp=sharing
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
15:13:02:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
15:13:02:WU02:FS01:0x21:Version 0.0.12
15:13:04:WU00:FS01:Upload 5.07%
15:13:10:WU00:FS01:Upload 10.15%
15:13:16:WU00:FS01:Upload 15.22%
15:13:23:WU00:FS01:Upload 21.31%
15:13:29:WU00:FS01:Upload 27.40%
15:13:35:WU00:FS01:Upload 32.48%
15:13:41:WU00:FS01:Upload 37.55%
15:13:41:WU02:FS01:0x21:ERROR:exception: bad allocation
15:13:41:WU02:FS01:0x21:Saving result file logfile_01.txt
15:13:41:WU02:FS01:0x21:Saving result file log.txt
15:13:41:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:13:42:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:13:42:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9205 run:16 clone:52 gen:7 core:0x21 unit:0x00000026664f2dd055d4d0238de55484
15:13:42:WU02:FS01:Uploading 2.27KiB to 171.64.65.104
15:13:42:WU02:FS01:Connecting to 171.64.65.104:8080
15:13:42:WU03:FS01:Connecting to 171.67.108.45:80
15:13:43:WU02:FS01:Upload complete
15:13:43:WU02:FS01:Server responded WORK_ACK (400)
15:13:59:WU03:FS01:0x21:Folding@home GPU Core21 Folding@home Core
15:13:59:WU03:FS01:0x21:Version 0.0.12
15:14:01:WU00:FS01:Upload 55.82%
15:14:07:WU00:FS01:Upload 60.89%
15:14:13:WU00:FS01:Upload 65.97%
15:14:19:WU00:FS01:Upload 71.04%
15:14:25:WU00:FS01:Upload 76.11%
15:14:32:WU00:FS01:Upload 82.20%
15:14:39:WU00:FS01:Upload 88.29%
15:14:39:WU03:FS01:0x21:ERROR:exception: bad allocation
15:14:39:WU03:FS01:0x21:Saving result file logfile_01.txt
15:14:39:WU03:FS01:0x21:Saving result file log.txt
15:14:39:WU03:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:14:40:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:14:40:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:9205 run:3 clone:33 gen:9 core:0x21 unit:0x00000039664f2dd055d4c97cfcd240bc
15:14:40:WU03:FS01:Uploading 2.27KiB to 171.64.65.104
15:14:40:WU03:FS01:Connecting to 171.64.65.104:8080
15:14:40:WU02:FS01:Connecting to 171.67.108.45:80
15:14:40:WU03:FS01:Upload complete
15:14:41:WU03:FS01:Server responded WORK_ACK (400)
20:36:46:WU02:FS02:0x21:Folding@home GPU Core21 Folding@home Core
20:36:46:WU02:FS02:0x21:Version 0.0.12
20:36:49:WU01:FS02:Upload 5.03%
20:36:55:WU01:FS02:Upload 9.34%
20:37:01:WU01:FS02:Upload 14.37%
20:37:07:WU01:FS02:Upload 19.40%
20:37:13:WU01:FS02:Upload 23.71%
20:37:19:WU01:FS02:Upload 28.02%
20:37:20:WU02:FS02:0x21:ERROR:exception: bad allocation
20:37:20:WU02:FS02:0x21:Saving result file logfile_01.txt
20:37:20:WU02:FS02:0x21:Saving result file log.txt
20:37:20:WU02:FS02:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
20:37:20:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:37:20:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:9206 run:0 clone:1351 gen:11 core:0x21 unit:0x00000040664f2dd056202ac9970f0f5c
20:37:20:WU02:FS02:Uploading 2.27KiB to 171.64.65.104
20:37:20:WU02:FS02:Connecting to 171.64.65.104:8080
20:37:20:WU03:FS02:Connecting to 171.67.108.45:80
20:37:21:WU03:FS02:Assigned to work server 171.64.65.58
20:37:21:WU03:FS02:Requesting new work unit for slot 02: READY gpu:1:GK104 [GeForce GTX 770] from 171.64.65.58
20:37:21:WU03:FS02:Connecting to 171.64.65.58:8080
20:37:22:WU03:FS02:Downloading 883.89KiB
20:37:24:WU02:FS02:Upload complete
20:37:24:WU02:FS02:Server responded WORK_ACK (400)
06:26:04:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
06:26:04:WU00:FS01:0x21:Version 0.0.12
06:26:05:WU01:FS01:Upload 6.31%
06:26:11:WU01:FS01:Upload 12.62%
06:26:17:WU01:FS01:Upload 18.93%
06:26:23:WU01:FS01:Upload 25.24%
06:26:29:WU01:FS01:Upload 31.55%
06:26:35:WU01:FS01:Upload 37.87%
06:26:41:WU01:FS01:Upload 44.18%
06:26:47:WU01:FS01:Upload 50.49%
06:26:53:WU01:FS01:Upload 57.85%
06:26:59:WU01:FS01:Upload 64.16%
06:27:05:WU01:FS01:Upload 70.47%
06:27:10:WU00:FS01:0x21:ERROR:exception: bad allocation
06:27:10:WU00:FS01:0x21:Saving result file logfile_01.txt
06:27:10:WU00:FS01:0x21:Saving result file log.txt
06:27:10:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
06:27:10:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:27:10:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9207 run:0 clone:22 gen:32 core:0x21 unit:0x00000038664f2dd055e91e2ca7f835bb
06:27:11:WU00:FS01:Uploading 2.28KiB to 171.64.65.104
06:27:11:WU00:FS01:Connecting to 171.64.65.104:8080
06:27:11:WU02:FS01:Connecting to 171.67.108.45:80
06:27:11:WU01:FS01:Upload 76.78%
06:27:11:WU00:FS01:Upload complete
06:27:11:WU00:FS01:Server responded WORK_ACK (400)
03:54:56:WU02:FS01:0x21:Folding@home GPU Core21 Folding@home Core
03:54:56:WU02:FS01:0x21:Version 0.0.12
03:55:01:WU00:FS01:Upload 12.89%
03:55:07:WU00:FS01:Upload 19.33%
03:55:13:WU00:FS01:Upload 25.78%
03:55:19:WU00:FS01:Upload 32.22%
03:55:25:WU00:FS01:Upload 38.67%
03:55:31:WU00:FS01:Upload 46.19%
03:55:37:WU00:FS01:Upload 52.63%
03:55:43:WU00:FS01:Upload 59.08%
03:55:49:WU00:FS01:Upload 65.52%
03:55:55:WU00:FS01:Upload 71.97%
03:56:00:WU02:FS01:0x21:ERROR:exception: bad allocation
03:56:00:WU02:FS01:0x21:Saving result file logfile_01.txt
03:56:00:WU02:FS01:0x21:Saving result file log.txt
03:56:00:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
03:56:00:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
03:56:00:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9209 run:0 clone:50 gen:15 core:0x21 unit:0x00000025664f2dd055edef6ef0fe2de2
03:56:00:WU02:FS01:Uploading 2.27KiB to 171.64.65.104
03:56:00:WU02:FS01:Connecting to 171.64.65.104:8080
03:56:01:WU03:FS01:Connecting to 171.67.108.45:80
03:56:01:WU02:FS01:Upload complete
03:56:01:WU02:FS01:Server responded WORK_ACK (400)
13:52:20:WU02:FS01:0x18:Folding@home Core Shutdown: FINISHED_UNIT
13:52:21:WU02:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
13:52:21:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9430 run:212 clone:9 gen:20 core:0x18 unit:0x00000016ab40413855475025d9ebb6d9
13:52:21:WU02:FS01:Uploading 24.02MiB to 171.64.65.56
13:52:21:WU02:FS01:Connecting to 171.64.65.56:8080
13:52:21:WU00:FS01:Starting
13:52:21:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Download/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 00 -suffix 01 -version 704 -lifeline 2016 -checkpoint 3 -gpu 0 -gpu-vendor nvidia
13:52:21:WU00:FS01:Started FahCore on PID 9584
13:52:21:WU00:FS01:Core PID:7240
13:52:21:WU00:FS01:FahCore 0x18 started
13:52:22:WU00:FS01:0x18:*********************** Log Started 2015-11-10T13:52:22Z ***********************
13:52:22:WU00:FS01:0x18:Project: 10486 (Run 0, Clone 22, Gen 56)
13:52:22:WU00:FS01:0x18:Unit: 0x0000005c538b3dbb54aec97e35fdfd8b
13:52:22:WU00:FS01:0x18:CPU: 0x00000000000000000000000000000000
13:52:22:WU00:FS01:0x18:Machine: 1
13:52:22:WU00:FS01:0x18:Reading tar file state.xml
13:52:23:WU00:FS01:0x18:Reading tar file system.xml
13:52:24:WU00:FS01:0x18:Reading tar file integrator.xml
13:52:24:WU00:FS01:0x18:Reading tar file core.xml
13:52:24:WU00:FS01:0x18:Digital signatures verified
13:52:24:WU00:FS01:0x18:Folding@home GPU core18
13:52:24:WU00:FS01:0x18:Version 0.0.4
13:52:27:WU02:FS01:Upload 23.15%
13:52:33:WU02:FS01:Upload 58.01%
13:52:39:WU02:FS01:Upload 94.96%
13:52:40:WU02:FS01:Upload complete
13:52:40:WU02:FS01:Server responded WORK_QUIT (404)
13:52:40:WARNING:WU02:FS01:Server did not like results, dumping
Re: Failing units, low ppd, and returned units.
The error message "bad allocation" is new to me, so I'd like to gather as much information as possible to pass on to the developers.
Here's an edited transcript of what you posted.
A) An unknown issue that repeats: exception: bad allocation
scarlet_tech wrote:
15:13:02:WU02:FS01:0x21:Version 0.0.12
15:13:41:WU02:FS01:0x21:ERROR:exception: bad allocation
15:13:41:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:13:42:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9205 run:16 clone:52 gen:7 core:0x21
15:14:39:WU03:FS01:0x21:ERROR:exception: bad allocation
15:14:39:WU03:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:14:40:WU03:FS01:Sending unit results: id:03 state:SEND error:FAULTY project:9205 run:3 clone:33 gen:9 core:0x21
20:37:20:WU02:FS02:0x21:ERROR:exception: bad allocation
20:37:20:WU02:FS02:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
20:37:20:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:9206 run:0 clone:1351 gen:11 core:0x21
06:27:10:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
06:27:10:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:27:10:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:9207 run:0 clone:22 gen:32 core:0x21
03:56:00:WU02:FS01:0x21:ERROR:exception: bad allocation
03:56:00:WU02:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
03:56:00:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:9209 run:0 clone:50 gen:15 core:0x21
13:52:40:WU02:FS01:Upload complete
13:52:40:WU02:FS01:Server responded WORK_QUIT (404)
13:52:40:WARNING:WU02:FS01:Server did not like results, dumping
13:52:20:WU02:FS01:0x18:Folding@home Core Shutdown: FINISHED_UNIT
13:52:21:WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:9430 run:212 clone:9 gen:20 core:0x18
13:52:22:WU00:FS01:0x18:Project: 10486 (Run 0, Clone 22, Gen 56)
13:52:24:WU00:FS01:0x18:Version 0.0.4
1: FAULTY project:9205 run:16 clone:52 gen:7 core:0x21
2: FAULTY project:9205 run:3 clone:33 gen:9 core:0x21
3: FAULTY project:9206 run:0 clone:1351 gen:11 core:0x21
4: FAULTY project:9207 run:0 clone:22 gen:32 core:0x21
5: FAULTY project:9209 run:0 clone:50 gen:15 core:0x21
B) Some other unknown issue
03:56:00:WU02:FS01:0x21:ERROR:exception: bad allocation
FAULTY project:9209 run:0 clone:50 gen:15 core:0x21
Server did not like results, dumping
C) Some WUs do complete successfully
NO_ERROR project:9430 run:212 clone:9 gen:20 core:0x18
D) A new WU has started and may or may not complete successfully.
Project: 10486 (Run 0, Clone 22, Gen 56)
Questions (you may have already answered some of these, but please confirm them in one place):
What OS are you running ... including 32-bit or 64-bit?
What else is running?
Provide a detailed description of the hardware being used, including clock rates.
Is there any possibility of unexpected conditions (like disk-full)?
Which drivers are being used?
A1: Several failures. One completion.
A2: Several failures. Still being redistributed.
A3: Several failures. One completion.
A4: Several failures. One completion.
A5: Several failures. Still being redistributed.
Same or different?
B1: Several failures. Still being redistributed.
These WUs all show too high a failure rate. Development will be interested in attempting to reproduce and diagnose the failures.
C & D: something is strange here. More investigation is needed.
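For reference, an edited transcript like the one above can be produced mechanically. This is a rough Python sketch, not an official Folding@home tool; it assumes the v7 client's "Sending unit results" line format shown in this thread:

```python
import re

# Matches the result-summary lines the v7 client writes when returning a WU, e.g.
# "...Sending unit results: id:02 state:SEND error:FAULTY project:9205 run:16 clone:52 gen:7 core:0x21 unit:0x..."
RESULT = re.compile(
    r"error:(?P<error>\w+) project:(?P<project>\d+) "
    r"run:(?P<run>\d+) clone:(?P<clone>\d+) gen:(?P<gen>\d+) core:(?P<core>0x\w+)"
)

def edited_transcript(log_text):
    """Return numbered 'FAULTY project:... run:...' lines, like the list above."""
    lines = []
    for m in RESULT.finditer(log_text):
        if m.group("error") == "FAULTY":
            lines.append(
                "%d: FAULTY project:%s run:%s clone:%s gen:%s core:%s"
                % (len(lines) + 1, m.group("project"), m.group("run"),
                   m.group("clone"), m.group("gen"), m.group("core"))
            )
    return lines
```

Running the log posted above through this yields the five numbered FAULTY entries listed earlier.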
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 37
- Joined: Tue Nov 10, 2015 9:54 pm
Re: Failing units, low ppd, and returned units.
I am trying to make sure I get these into Google Docs and such so that I don't completely clog the forum. I appreciate you passing the info along.
Re: Failing units, low ppd, and returned units.
Scarlet-Tech wrote: So, it is OK to post stuff like that, but not share the link to it. Makes sense. Wouldn't want the truth out there, I guess.
You must spend a lot of your time looking for conspiracies under every bush. As you can see from my posts, there's no problem getting the truth from a server that's overburdened.
If it were opened to everyone, it would be swamped with requests from a multitude of Donors. Collectively, the Mods submit perhaps 30 transactions a week; open to everybody, it would be getting perhaps 30 hits per hour, and it simply isn't designed to handle that kind of load. Even so, I've seen it take several minutes to respond to a fairly simple request -- but I don't choose to gripe about the minor inconveniences in life.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Failing units, low ppd, and returned units.
Is there an easy tool available to parse log files? I don't have the time to check every WU on my machines every day, but I'd gladly drop the logs into a parsing tool and submit those results. I've not caught any WU failures that weren't my fault, but that's not to say I haven't missed any. I take it from the discussion above (and the fact that beta testers exist) that even though the information might be available on the server, it's not being reviewed or reported for analysis? In any case, if there's an easy way I can help with that, I'd be glad to.
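In case it helps, a daily tally doesn't need a dedicated tool; something like the following Python sketch would do a first pass. It is my own rough approach (not an established parser), and it assumes the v7 "Sending unit results" log format quoted earlier in this thread:

```python
import re
from collections import Counter

# Pull the error code and project number out of each returned-WU line.
RESULT_RE = re.compile(r"Sending unit results: .*error:(\w+) project:(\d+)")

def tally_results(path):
    """Count returned WUs per (project, error) pair, e.g. ('9205', 'FAULTY') -> 2."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = RESULT_RE.search(line)
            if m:
                counts[(m.group(2), m.group(1))] += 1
    return counts
```

Pointing it at the client's log.txt and printing the counter would flag any project whose FAULTY count is out of line with its NO_ERROR count, without reading every WU by hand.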
Ryzen 5900x 12T - RTX 4070 TI