Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 2:22 am
by Grandpa_01
I was browsing through the stats today and noticed that one of the top producers' production had dropped a bit recently, so I checked it out. It appears that either Stanford had a server code change or fac changed their settings on or about 8/20/2012. Prior to that date fac was doing around 260 WUs per update, with a value of around 3700 points per WU, around 1800 per week. Since that date they have been doing between 2 and 10 WUs per update, with an avg of 28 per day and an avg of 196 per week, with an avg point value of 22,000 (which is the approximate value of a bigadv completed past the preferred deadline). Just from looking at the chart I would say that fac was switched to bigadv, and normally I would say that is OK.
But there appears to be a problem: fac cannot complete the 8101s on time most of the time, or at all. It is probably completing the 8102s in time, but it is failing to meet the deadline far more often than it is making it. This cannot be good for the science if this is what is happening. How many WUs do you think I may be running that were reassigned to me because this donor did not meet the deadline? From looking at the donor's stats, I would guess a lot of the bigadv folders are running redundant WUs that were reassigned because they were not turned in by the preferred deadline.
Anyway, I think somebody should look into this and see if my suspicions are correct. I am also wondering if I should just shut down my rigs until this gets ironed out; rerunning WUs seems like a bit of a waste of 4500 watts of electricity on my part.
Link to donor's stats
http://kakaostats.com/usum.php?u=1804246
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 3:44 am
by 7im
What would you suggest PG do about this when each folder is responsible for their own actions?
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 10:31 am
by bollix47
FWIW
I can confirm Grandpa_01's suspicions. Around the end of August most (maybe all) of their CPUs switched to bigadv. They've done many (guessing ~150) since then and, according to the information I can see, not one of them completed before the preferred deadline. All the P8101s took around a day longer. Most of the WUs during that time have been P8101, but there were a few P8102 and they didn't make the deadline either, going slightly over.
Today I notice there's a regular SMP returned for the first time since they made the switch, so perhaps they have noticed the problem and are reconfiguring again, but it will be a few days before any conclusions can be drawn.
Just reporting the facts.
Personally, I won't be shutting anything down.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 12:03 pm
by Grandpa_01
I do not know what Stanford can do other than fix the server code, if they changed it on the date in question. I do know it is creating a very large waste of resources. There simply are not that many bigadv-capable rigs out there, and when a donor that large is failing to meet the deadline, a very large portion of the bigadv folders are going to be doing redundant, useless work. If nothing else, block the donor; they should notice the problem and fix it if they are not getting any points.
Some kind of attempt should be made to correct the issue. Some people would not consider this very green or efficient, and I am sure PG is among them.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 1:51 pm
by 7im
Your team's BA folders monitor BA folding closely, so if there was a change on the BA servers, I expect someone would have mentioned it already.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 2:05 pm
by Jesse_V
Wouldn't PG notice something like this? Seems likely to me that it'd be one more thing they'd keep track of.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 2:12 pm
by 7im
Why would they notice? You obviously do not grasp the scope of the project. A few hundred workunits getting completed in 3 days instead of 2 days is not a giant red flag like you think.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 2:44 pm
by Jesse_V
7im wrote: Why would they notice? You obviously do not grasp the scope of the project. A few hundred workunits getting completed in 3 days instead of 2 days is not a giant red flag like you think.
Yes. The posts seemed to imply that there was a larger problem at hand here, particularly with 8101/8102. If those projects were making very little progress because they kept timing out, then that would be the red flag. Other than that, yes you're right.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 3:09 pm
by 7im
Look closer. The WUs ARE getting completed before the (final) deadline and the science still moves forward. And with new BA systems getting added all the time, a few dupe WUs do not create a noticeable drag.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 4:37 pm
by Grandpa_01
7im wrote: Why would they notice? You obviously do not grasp the scope of the project. A few hundred workunits getting completed in 3 days instead of 2 days is not a giant red flag like you think.
It may be bigger than you think. There are only a limited number of bigadv-capable machines in the donor base; I doubt there are more than a couple hundred. If one machine is failing to meet the preferred deadline on a hundred WUs a week, the effective production of the donor-based machines is probably being cut by 50% or more. When a WU is not turned in by the preferred deadline, it is reassigned to some number of other machines. Just from history I would guess it is assigned to 2 to 3 other machines, thus 100 failed = 300 redundant WUs. One large donor's machine failures could potentially be tying up the resources of all the rest of the bigadv rigs doing redundant, useless work.
That would be a rather large problem in my opinion.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 5:00 pm
by 7im
1 time out = 1 reassignment. That is the default. To assume otherwise without data to back it up is only a guess. Maybe a mod could sample check a few of his WUs?
This user averages ~30 WUs a day. Even at 3x, that's less than 100 WUs a day. Out of ~1000 BA WUs returned in a day, still not a big red flag.
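A rough sketch of this back-of-the-envelope estimate, in Python. The function name is made up for illustration, and the inputs are only the guesses from this thread (~30 late WUs a day, 1 to 3 reassignments each, ~1000 BA WUs returned a day), not measured server data. It also assumes the redundant copies are counted within the ~1000 daily returns.

```python
def redundant_share(late_per_day, reassignments_per_late_wu, returned_per_day):
    """Fraction of a day's returned WUs that are redundant copies.

    All three inputs are estimates quoted in the thread, not measurements.
    """
    redundant = late_per_day * reassignments_per_late_wu
    return redundant / returned_per_day


# Try the default (1 reassignment) up to the pessimistic guess (3).
for copies in (1, 2, 3):
    share = redundant_share(30, copies, 1000)
    print(f"{copies} reassignment(s) per late WU: {share:.0%} of daily BA returns")
```

With these guesses the redundant share runs from about 3% to about 9% of daily BA returns, depending on how many reassignments each late WU triggers.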
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 5:13 pm
by Joe_H
FWIW
Your estimates are off. Most of the WU's listed as done by fac have only been assigned to one other folder. Out of the thousands of WU's done in those projects during that time period they are a small percentage. Your WAG of 50% reduction in productivity is off by at least a factor of 10.
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 5:35 pm
by 7im
Also a guess, but maybe a mod could check... if this was a server setting change, the pattern of the WUs being returned would be distinct.
smp, smp, smp, smp, ba, smp, ba, smp, ba, ba, ba, ba, ba. The transition would be gradual, over a few days' time, as each client returned a WU and then got a different type of WU.
However, if this was a donor change, the pattern might be different.
smp, smp, smp, smp, smp, ba, ba, ba, ba, ba, ba.
Also depends on the time frame of the change over. Did it happen all at once, or was it a gradual change?
If this turns out to be a donor change, there is little Pande Group can do about it. And since this user is a team of 1, with no forum links, we are running out of options. Yes, a problem to be solved, if possible. So far, no workable solution suggested...
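The two return patterns above could be told apart with a check along these lines. This is only a heuristic sketch in Python: the function name and labels are invented here, the input is assumed to be a donor's WU types listed in return order, and an "abrupt" result merely hints at a donor-side change rather than proving one.

```python
def classify_transition(returns):
    """Label a sequence of returned WU types ("smp" or "ba").

    "gradual": smp WUs keep arriving after the first ba WU (interleaved).
    "abrupt":  no smp WU is returned after the first ba WU.
    """
    if "ba" not in returns:
        return "no switch"
    after_first_ba = returns[returns.index("ba"):]
    return "gradual" if "smp" in after_first_ba else "abrupt"


# The two example sequences from the post above.
server_pattern = ["smp"] * 4 + ["ba", "smp", "ba", "smp"] + ["ba"] * 5
donor_pattern = ["smp"] * 5 + ["ba"] * 6

print(classify_transition(server_pattern))
print(classify_transition(donor_pattern))
```

A donor running many clients could still show some interleaving after a config change, since each client finishes its current WU at a different time, so the result would need to be read alongside the time frame of the changeover.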
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 9:11 pm
by tear
PG could contact the donor by e-mail (team e-mail used in registration) with a friendly note.
Or even better -- passkey-associated e-mail.
One could even automate that by means of a simple script that could run weekly or so.
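A minimal sketch of what the selection step of such a script might look like, in Python. Everything here is hypothetical: the `WorkUnit` record, its field names, and the thresholds are invented for illustration and do not reflect how the assignment servers actually store or expose this data. Sending the e-mail itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WorkUnit:
    # Hypothetical record layout; real server logs may look nothing like this.
    donor: str
    assigned: datetime
    returned: datetime
    preferred_deadline: timedelta


def donors_to_notify(work_units, min_units=10, max_miss_rate=0.5):
    """Return donors who missed the preferred deadline on most of their WUs.

    min_units avoids flagging donors with too few results to judge;
    max_miss_rate is the miss fraction above which a note would be sent.
    Both thresholds are arbitrary placeholders.
    """
    totals, misses = {}, {}
    for wu in work_units:
        totals[wu.donor] = totals.get(wu.donor, 0) + 1
        if wu.returned - wu.assigned > wu.preferred_deadline:
            misses[wu.donor] = misses.get(wu.donor, 0) + 1
    return sorted(
        donor
        for donor, count in totals.items()
        if count >= min_units and misses.get(donor, 0) / count > max_miss_rate
    )
```

Run weekly over the past week's returns, the resulting list would be the donors to receive the friendly note.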
Re: Possible CPU cycles being wasted
Posted: Mon Sep 10, 2012 9:18 pm
by orion
And here I thought (from things that transpired in the past) that one WU missing the preferred deadline even by one minute slowed the project down since it was reassigned to another folder who could have been folding a new WU instead.
But now it seems that it doesn't matter if one misses 1 or 10 or however many preferred deadlines, since it is no big deal and doesn't really slow the project down.