WU allocation

Posted: Wed Apr 23, 2008 12:09 pm
by keith_MM
Hi, I was interested in how the servers allocate work units. I can understand allocating work units according to the client, CPU, and advmethods flags, but I mean more in terms of what happens after a WU is returned. What happens then?
For instance, does somebody look at the result file that is sent back before it is reissued for someone else to continue the next stage, or is the next stage computed automatically and put back on the server? Are WUs served up in the order that they have been received, or do WUs that are nearer completion get priority, with low-priority WUs available as a fallback? How long does a WU tend to sit waiting to be allocated to somebody?

Re: WU allocation

Posted: Wed Apr 23, 2008 12:31 pm
by bruce
Whenever a WU is returned, a new one is automatically generated from it (at least until an individual trajectory reaches some maximum Gen number). I believe that those segments are connected into a single trajectory before they are looked at. This may help: http://fahwiki.net/index.php/Runs%2C_Clones_and_Gens
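
As a rough illustration of the Run/Clone/Gen chaining described above, here is a minimal Python sketch. It is not the actual FAH server code: the WorkUnit class, the next_work_unit function, and the MAX_GEN value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

MAX_GEN = 3  # hypothetical cap; real projects set their own maximum Gen


@dataclass(frozen=True)
class WorkUnit:
    """Identifies one segment of one trajectory (all fields illustrative)."""
    project: int
    run: int
    clone: int
    gen: int


def next_work_unit(returned: WorkUnit, max_gen: int = MAX_GEN) -> Optional[WorkUnit]:
    """Return the follow-on WU for a returned one, or None if the trajectory is done."""
    if returned.gen >= max_gen:
        return None  # trajectory has reached its maximum Gen
    # Same project, run, and clone; only the generation number advances.
    return WorkUnit(returned.project, returned.run, returned.clone, returned.gen + 1)


# Follow a single trajectory from Gen 0 until no further WU is generated.
chain = [WorkUnit(project=1234, run=0, clone=5, gen=0)]
while (nxt := next_work_unit(chain[-1])) is not None:
    chain.append(nxt)

print([wu.gen for wu in chain])
```

The point of the sketch is only that each Gen depends on the previous one being returned, which is why turn-around time matters for a single trajectory.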

How long a WU sits before being allocated to somebody varies depending on the project. I don't know the actual values or any easy way to determine them, but the high-performance clients that are currently in beta test are specifically designed to minimize the turnaround time between successive generations. To do that, it's just as important that the server re-issue the next generation as soon as possible after the previous one is returned as it is for you to return the result as soon as possible after it is issued to you. I'm sure they're taking steps to minimize the time a WU sits on a server.

Projects for the classic (uniprocessor) client have much longer deadlines so it is reasonable if they spend longer on the server.

In any case, it's important that the servers have enough WUs to distribute to donors, even if somebody connects with an unstable machine that causes a series of WUs to error out rapidly. I'm sure that keeping all that in balance is a challenge.

Re: WU allocation

Posted: Wed Apr 23, 2008 3:41 pm
by torswin
I've always wondered how they can verify that what they have is 100% correct. For instance, what if one client, because of something weird, calculates one thing wrong, which doesn't result in an EUE, and it continues to go along? Won't that affect the validity of the WU as it goes along, since it is based on some flawed calculations?

Re: WU allocation

Posted: Wed Apr 23, 2008 4:22 pm
by 7im
The whole process has data checks and error correction checks. Even the client does checking while folding each WU. If the parameters get crazy, the work unit errors out, and may or may not even send back any data.

Folding@home has been running 5+ years, and has things like this figured out already. For security reasons they do not disclose all the many details involved.

Re: WU allocation

Posted: Wed Apr 23, 2008 5:32 pm
by bruce
There are a number of methods to verify the data after it's uploaded. The runs/clones are often statistical variations similar to others and if the final results are significantly different, they'll show up. The temperature of a real protein establishes certain random motions and that randomness provides a statistical check of the computed results.

I'm sure there are other ways to spot an error, too.
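
To make the idea of comparing statistically similar runs/clones concrete, here is a small Python sketch of one generic outlier test (a median-absolute-deviation score). This is purely an assumption for illustration, not a description of how the Pande Group validates results: the flag_outliers function, the per-clone summary values, and the 3.5 threshold are all hypothetical.

```python
from statistics import median
from typing import Dict, List


def flag_outliers(results: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return the names of results that sit far from the median of the group.

    Uses the modified z-score 0.6745 * |x - median| / MAD, a common
    robust outlier measure. The threshold is an illustrative default.
    """
    values = list(results.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All typical values are identical; anything different stands out.
        return [name for name, v in results.items() if v != med]
    return [
        name
        for name, v in results.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical summary number reported for each clone of the same run.
final_values = {
    "clone0": 10.1,
    "clone1": 9.8,
    "clone2": 10.3,
    "clone3": 10.0,
    "clone4": 25.0,  # a result that disagrees with its siblings
}

print(flag_outliers(final_values))
```

A check like this only works because many clones explore similar conditions, so one badly computed trajectory has siblings to be compared against.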

Re: WU allocation

Posted: Thu Apr 24, 2008 4:35 pm
by keith_MM
Thanks Bruce, it's certainly interesting to know what happens to a WU after it departs back to the servers. As a side thought, for some of the SMP WUs, if someone is regularly turning WUs round in, say, less than a third of the maximum deadline, could they be issued a high-priority WU where the client in effect folds 3 or 4 generations, sending the results back after each generation, but instead of downloading a new WU, continues from the point where it finished the previous WU?

Re: WU allocation

Posted: Thu Apr 24, 2008 10:51 pm
by bruce
I'm sure such a process COULD be done, but I doubt there would be enough of a benefit compared to the changes that would need to be made to FAH to make it worthwhile.

I can think of a lot more possible disadvantages and very few advantages. (I don't make the decisions, however, so when someone from the Pande Group reads this discussion, they might decide I'm wrong and decide to implement your suggestion.)

The WU would still need to be uploaded (for the records), so the most that would be saved is the time to download a new WU. There's some (minor?) processing to convert a finished WU into a new WU, and the client would need to be changed so it knew how to do that part. The servers would need to be changed to know that they didn't need to send you a new WU, but they would still have to update the record of which WU is currently assigned to you. If you happened to be assigned Gen N, where only N generations were required, the client would need to know to stop. If WUs of a certain project are in short supply, other donors would complain that you were hogging that particular type.

Overall, I think it makes sense to keep that functionality on the server. If there's a reason to change the logic, it's a lot easier to update the servers than to release a new version of the clients, and the updates can be scheduled by the Pande Group, depending on the priority of the required change.

Re: WU allocation

Posted: Thu Apr 24, 2008 10:55 pm
by 7im
keith_MM wrote: Thanks Bruce, it's certainly interesting to know what happens to a WU after it departs back to the servers. As a side thought, for some of the SMP WUs, if someone is regularly turning WUs round in, say, less than a third of the maximum deadline, could they be issued a high-priority WU where the client in effect folds 3 or 4 generations, sending the results back after each generation, but instead of downloading a new WU, continues from the point where it finished the previous WU?
You never know, the SMP client might already be doing the equivalent of the 3-4 generations in each work unit. The SMP work units are somewhat larger than CPU work units. ;)