
R/C/G

Posted: Fri May 02, 2008 2:51 am
by RAH
This has been discussed numerous times. IIRC the finished WU gets sent back, and the new WU is configured from it.
Is this done automatically, by an existing program, say by putting the final coordinates of Gen 0 into a WU that
becomes Gen 1?

If it is automatic, couldn't the program become part of the client, whereby the results are sent back to Pande
while the new WU is generated and worked on at the same time, by the same machine?

Gen 0 is completed and sent back, the program generates Gen 1, and it gets crunched.

Re: R/C/G

Posted: Fri May 02, 2008 3:43 am
by anandhanju
Interesting.

Hazarding several guesses here: I think the next-gen WU creation process is an automated server task that runs until the project's WUs reach a predetermined gen (MAX_GENS=10, for example). Six months later, while finalizing results for a paper or a study, researchers could ask for two more gens if needed.
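If I had to guess at the shape of that server task, it would be something like the sketch below. Every name in it is my own invention for illustration, not actual FAH server code.

Code:
MAX_GENS = 10          # predetermined cap; researchers could raise it later
assignment_queue = []  # WUs waiting to be handed out to clients

def on_result_returned(project, run, clone, gen, final_coordinates):
    """Called when a finished WU arrives back at the work server."""
    if gen + 1 > MAX_GENS:
        return  # trajectory done for now; more gens can be requested later
    # the final coordinates of Gen N become the starting point of Gen N+1
    assignment_queue.append({
        "project": project, "run": run, "clone": clone,
        "gen": gen + 1, "start_coordinates": final_coordinates,
    })

# e.g. a Gen 0 result arrives and Gen 1 is queued automatically:
on_result_returned(project=1234, run=3, clone=7, gen=0,
                   final_coordinates=[0.0, 0.0, 0.0])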

While we do not know the computational abilities of the work servers (there might be 8 pink e-bunnies instead of cores, for all I know), their CPU usage is typically modest. How long would it take a normal volunteer's computer to create the next gen? I don't know.

Another thought to consider is the complexity of the logic involved in creating a WU. Can it be bundled into a client and made to run on X platforms? What about updates to the gen-creation logic? Wouldn't those need a corresponding client update, and every volunteer to upgrade, before the new code could be used to create WUs?

Lastly, I'd assume that priorities change. A WU might be gen 50 of a planned 100-gen batch of a particular project, but a more interesting and urgently required project might be released with a higher priority that needs to be processed sooner. So gen 51 of the old project will hibernate until the researchers open the taps again. This sort of control would be difficult to have if a client is autonomous in gen creation. Would FAH want 100,000 R+C combinations to lock up so many clients for X gens? Probably not.

Like I said, interesting idea.

Re: R/C/G

Posted: Fri May 02, 2008 4:13 am
by RAH
Yes. I was not figuring on a full generation run.
If it could be done at all, a WU could cover, say, Gen 0 through Gen 9.
Each generation could be sent back as it finishes, or all 10 generations at once.

The downloading of a WU would then happen once every 10 days (give or take).

Re: R/C/G [push WU creation out to the client level?]

Posted: Fri May 02, 2008 5:30 am
by 7im
The results of the returned WU are needed back at Stanford to create the next work unit in line.

Besides, the next WU in line in the project you just completed may not be the most helpful WU to work on next. A more important WU is probably waiting for you at Stanford. Upload the current WU, and then start on the next one, whatever Stanford feels is best for you to do next. Moving the WU creation function out to the client is an interesting idea, but may not be practical.

Remember the name Storage@Home.

Re: R/C/G

Posted: Fri May 02, 2008 6:34 am
by bruce
The logic to generate the next Gen is not the same for every core. It's more complicated than just continuing from the same point.

This same request was made a week or two ago.

What is really saved other than a few extra downloads? That's a pretty small advantage for such a major change.

There are a number of disadvantages. The priority issue mentioned by 7im is one drawback; updating all the clients with the logic required by each core is another; and a fairness issue comes up if you get assigned to 0.0.0.0 because somebody else is already working on all the WUs -- allowing them to keep working rather than giving you a chance to snag the next WU.

Reprogramming costs and debugging costs are high and the benefit to FAH is tiny. (How many thousands of CPUs worth of extra processing power would this add compared to what could be gained by some other programming task of equal magnitude?)

Re: R/C/G [push WU creation out to the client level?]

Posted: Fri May 02, 2008 11:43 am
by RAH
7im wrote:The results of the returned WU data is needed back at Stanford to create the next work unit in line.
Why? If it is an automated program, it could be put anywhere.
7im wrote:Besides, the WU next in line in the project WU you just completed may not be the most helpful WU to work on next. A more important WU is probably waiting for you at Stanford.
With the sheer number of folders, this is more than likely old school, and a moot point by now.
7im wrote:Upload the current WU, and then start on the next WU, whatever Stanford feels is the best for you to do next. Moving the WU creation function out to the client is an interesting idea, but may not be practical.
Then again, it might be.
7im wrote:Remember the name Storage@Home.
Yes. Might happen some day.
bruce wrote:The logic to generate the next Gen is not the same for every core. It's more complicated than just continuing from the same point.
Irrelevant if it can become part of the client.
bruce wrote:This same request was made a week or two ago.
I'll look it up.
bruce wrote:What is really saved other than a few extra downloads? That's a pretty small advantage for such a major change.
And maybe hundreds of hours.
bruce wrote:There are a number of disadvantages. The priority issue mentioned by 7im is a drawback, updating all the clients with the logic required by each core is a drawback, and the fairness issue comes up if you get assigned to 0.0.0.0 because somebody else is already working on all the WUs -- allowing them to keep working rather than giving you a chance to snag the next WU.

Reprogramming costs and debugging costs are high and the benefit to FAH is tiny. (How many thousands of CPUs worth of extra processing power would this add compared to what could be gained by some other programming task of equal magnitude?)
Old school issues, and if it can be done, the reprogramming should be minimal. This is 2008; things are supposed to be better than in 2004.


But all in all, still just some thinking.

Re: R/C/G

Posted: Fri May 02, 2008 4:52 pm
by 7im
Programming should be minimal? You assume too much.

Please read Bruce's response again, and then think through your suggestion again. With a little effort, I'm sure you will find a few more holes and see why this wouldn't work out very well. And then later, if you have suggestions to fill those holes, we will entertain another response.

What we will not entertain are more suggestions defended by bashing the project and claiming it is too old school to implement them. Suggestions need to be supported by facts, not criticisms.

Re: R/C/G

Posted: Fri May 02, 2008 6:22 pm
by RAH
I have read bruce's post. Have you ever read the client log files? They are all minimal (small), since Pande has
already made the necessary changes to the core files. The files needed to input data are not large.
They aren't starting from scratch.

What will you entertain? I have not bashed anyone, or anything.
The facts are not coming from you any more than from me.

I am assuming, I will admit this, but you are assuming also.

In fact, it would just be best if, on this type of question, no one but the people in the know answered.
Otherwise, just give your opinion, like the poster did.

Never mind, I keep forgetting: you're Pande.

Re: R/C/G

Posted: Fri May 02, 2008 6:36 pm
by 7im
Still waiting for some facts...

Re: R/C/G

Posted: Fri May 02, 2008 9:15 pm
by Ren02
Stanford needs to know that the donor is still there. It could be made to work, though.
Say you complete Gen 0 and upload the results, yet at the same time you are already crunching Gen 1. If Stanford gets notified that Gen 0 is complete and sends back confirmation that processing Gen 1 is OK, then you have saved quite a bit of time. If Stanford instead sends a message to cancel Gen 1 and download something else, then it takes exactly the same time as it does now, so no harm done.
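In rough code terms, the handshake I have in mind would look something like this sketch. All names in it are invented for illustration; none of this is real FAH client or server code.

Code:
import random

class FakeServer:
    """Stand-in for the Stanford work server."""
    def upload_result(self, result):
        # the server's verdict: keep the speculative Gen, or switch projects
        return random.choice(["CONTINUE", "CANCEL"])
    def download_new_wu(self):
        return "a higher-priority WU"

def finish_wu(server, gen0_result):
    speculative_gen = gen0_result + 1   # stand-in for "Gen 1 built from Gen 0"
    print("already crunching Gen", speculative_gen)  # no idle gap here
    verdict = server.upload_result(gen0_result)      # upload runs in parallel
    if verdict == "CONTINUE":
        print("server confirmed -- the upload/download gap was saved")
    else:
        # no worse than today: abort and fetch, same cost as the current scheme
        print("aborted; now folding", server.download_new_wu())

finish_wu(FakeServer(), gen0_result=0)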

It does not look like Stanford particularly values this time between WUs, though. :evil: If they did, they could do something really easy: download a new WU and start processing it before uploading the results of the previous one. So many donors have DSL with decent download and abysmal upload speeds. About 10-20 minutes of folding time are lost every time I complete a WU. :x

Re: R/C/G [push WU creation out to the client level?]

Posted: Sat May 03, 2008 7:14 pm
by bruce
RAH wrote:
bruce wrote:The logic to generate the next Gen is not the same for every core. It's more complicated than just continuing from the same point.
Irrelevant if it can become part of the client.
It can't. See below.
bruce wrote:What is really saved other than a few extra downloads? That's a pretty small advantage for such a major change.
And maybe hundreds of hours.
And where does "hundreds of hours" appear on this chart? http://fah-web.stanford.edu/cgi-bin/mai ... pe=osstats
Some of the current programming projects are working on stability issues for things like Win-SMP and GPU2. These will add orders of magnitude more processing power than "hundreds of hours". Other projects are working on changes to several of the FahCore_xx which will increase their speed quite significantly or increase their scientific capability ("priceless").

None of us are saying that your suggestion is totally bad, but neither is it totally good, and compared to the other things that might be changed, it's an expensive change with a particularly small benefit to FAH so it's not likely to be incorporated in V7. So far, you're only looking at the good side of the suggestion, and we're trying to point out that there is another side to the issue.

Facts:

Some projects are LIFO, some are FIFO. That means that on some projects, avoiding the upload/download processing would, in fact, save some time. On other projects it would be bad. I can think of two examples of badness.

1) In some projects, all Clones for Gen X are collected and processed before Gen X+1 is generated. Important data is interchanged between Gens at every step of the way. Clearly these projects must be processed on the server, not in the client.

2) A project is not complete until a certain number of Gens reach a certain stage related to the Markov properties. If WUs are assigned in blocks of 10 rather than blocks of 1, statistics say that some blocks of 10 will be done entirely by someone with a fast computer and other blocks entirely by someone with a slow computer. If WUs are assigned individually, there is a high probability that each trajectory will be done partly by fast computers and partly by slow ones, so there will be much less variation at the end of 10 WUs. Keeping the various clones "together" is a distinct advantage, as the simulation below illustrates.
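A quick simulation makes the spread in point 2 visible. The machine speeds below are invented; only the statistics matter.

Code:
import random, statistics

random.seed(1)
SPEEDS = [0.5, 1.0, 2.0]   # WUs per day for slow, average, and fast machines
CLONES, GENS = 1000, 10

def finish_times(block_size):
    """Days for each clone to reach GENS gens, drawing a new machine per block."""
    times = []
    for _ in range(CLONES):
        days = sum(block_size / random.choice(SPEEDS)
                   for _ in range(GENS // block_size))
        times.append(days)
    return times

for size in (1, 10):
    t = finish_times(size)
    print(f"blocks of {size:2}: stdev {statistics.stdev(t):4.1f} days, "
          f"slowest clone {max(t):5.1f} days")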

Re: R/C/G

Posted: Sat May 03, 2008 8:02 pm
by John Naylor
Ren02 wrote:It does not look like Stanford particularly values this time between the WUs though. :evil: If they did, they could do something real easy and download a new WU and start processing before uploading the results of a previous one. So many donors have DSL with decent download and abysmal upload speeds. About 10-20 minutes of folding time are lost every time I complete a WU. :x
What about that suggestion, though? Program the client to start downloading a new unit when the current unit reaches 99%, and to start processing the new unit while uploading the old one, and there would be no time lag at all. Assuming that all users are on DSL (yes, I know that isn't true) and that it takes 10 minutes to do the upload/download: 270,000 machines × 10 minutes is 2.7 million minutes lost each time the entire system gets a new unit. Even accounting for old machines and saying that the system refreshes itself on average every 5 days, that's still 540,000 processing minutes lost per day, or 187.5 units per day (assuming every unit takes 2 days to finish). Over a year that is >68,000 units lost. Whichever way you look at it, that is a lot of wasted time.
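For anyone who wants to check the arithmetic, here it is spelled out (same assumed numbers as above):

Code:
machines      = 270_000
lag_minutes   = 10      # upload + download per WU swap
refresh_days  = 5       # average days between swaps, fleet-wide
days_per_unit = 2

lost_per_cycle = machines * lag_minutes           # 2,700,000 minutes
lost_per_day   = lost_per_cycle / refresh_days    # 540,000 minutes per day
units_per_day  = lost_per_day / (days_per_unit * 24 * 60)
print(units_per_day, round(units_per_day * 365))  # 187.5 units/day, ~68,438/year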

Re: R/C/G

Posted: Sat May 03, 2008 8:36 pm
by 7im
And the work Stanford did over the last year on the PS3 client has added 30,000 active clients, each capable of doing 2 WUs a day at 20x the speed of PCs. How many more WUs does that add, versus how many were not folded because of upload/download delays?

Like Bruce said, it's not that they don't value the lost time, but that they have bigger fish to fry at the moment.

Also, even if Pande Group never has the time to work on reducing the upload/download delay, the problem is slowly going away / getting fixed on its own. People continue to upgrade their network connections over time. On most new DSL connections, that 10-minute lag is more like 1 minute. And the new cable TV internet connections are even faster. So there is little incentive to fix a problem that is fixing itself, especially when there are bigger improvements possible in other areas.

Re: R/C/G

Posted: Sat May 03, 2008 9:24 pm
by bruce
John Naylor wrote:What about that suggestion, though? Program the client to start downloading a new unit when the current unit reaches 99%, and to start processing the new unit while uploading the old one, and there would be no time lag at all. Assuming that all users are on DSL (yes, I know that isn't true) and that it takes 10 minutes to do the upload/download: 270,000 machines × 10 minutes is 2.7 million minutes lost each time the entire system gets a new unit. Even accounting for old machines and saying that the system refreshes itself on average every 5 days, that's still 540,000 processing minutes lost per day, or 187.5 units per day (assuming every unit takes 2 days to finish). Over a year that is >68,000 units lost. Whichever way you look at it, that is a lot of wasted time.
There was a long discussion of this option a year or two ago (on the old forum). Fundamentally, I think the idea of anticipatory downloading is an excellent one, but even if it made it onto the wish-list, it's still pretty low priority, so we may never see it.

As an aside, the FAH client does measure your upload and download rates. I don't know if it uses those numbers or whether they were added based on plans that are still somewhere in the future, but they could do better than picking 99% as the time to start a download. There's no way to predict the size of the download, so some kind of guess needs to be made, but that guess divided by your measured speed gives an estimate of how long the download should take. Compare that to a projected time when the current WU will be finished (not just 100%, but including something for the time needed to complete the end-of-WU processing), and the client can predict when the download should start. With a dial-up line, it needs to start much earlier than on DSL/cable.
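In other words, something along these lines. The size guess and the overhead value are placeholders of mine, not anything from the real client.

Code:
def download_start(projected_finish, guessed_bytes, measured_rate_Bps,
                   end_of_wu_overhead=120):
    """Seconds-since-epoch when the anticipatory download should begin."""
    est_download = guessed_bytes / measured_rate_Bps
    return projected_finish - est_download - end_of_wu_overhead

# the same 5 MB guess means very different lead times on dial-up vs. DSL:
for name, bits_per_sec in (("56k dial-up", 56_000), ("3 Mbit DSL", 3_000_000)):
    lead = 5_000_000 / (bits_per_sec / 8)
    print(f"{name}: start the download {lead / 60:.1f} minutes before the WU ends")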

Re: R/C/G

Posted: Sat May 03, 2008 10:32 pm
by John Naylor
7im wrote:And the work Stanford did over the last year on the PS3 client has added 30,000 active clients, each capable of doing 2 WUs a day, at 20x the speed of PCs. How many more WUs does that add, vs. how many were not folded by upload/download delays?
I appreciate your point about priorities, and I want as much as the next contributor to see all of the betas go Gold, but your point about PS3s, whilst counteracting the problem by increasing the speed at which units are completed, also adds to it. I don't know how big PS3 units are, but the increased number of units handled means a lot more delays while uploads and downloads are dealt with between PS3 units. I would guess the same applies to GPU and SMP units as well...

As for shortening time lags, my upload speed is theoretically 448 kilobytes/s, but I only actually get around 50KB/s for F@H (it's around 400KB/s for UK websites and uploads). For people within the US/Canada(/Mexico?), upload times may be closer to the actual maximums of their wires/cables, but those of us outside North America are limited by the intercontinental links' speeds and so would get more benefit from pre-emptive uploads. (Unsurprisingly, the same applies for downloads, but the extra speed reduces this problem, as F@H downloads are almost always smaller than uploads - I get 170KB/s for F@H versus 850KB/s for UK-based websites/downloads... yet bizarrely, Microsoft's US-based servers achieve the same speed as UK servers, so maybe that is more F@H-server-related than intercontinental speeds?)