Thoughts on optimizing the assignment process

Post by **sptn.** » Wed Mar 17, 2021 1:04 pm

sekramer10 wrote:[...] every computer produced should have F@H pre-installed on it and it runs without users even knowing it's there. [...]

You have to understand that a normal computer/notebook at idle does not consume nearly as much energy as it would running under full load with FAH. There are reasons other than disinterest: noise level, the cost of energy, heat, and so on.

In my opinion FAH will be run mostly by people who are interested in computers or science in general. So you should try to reach scientific or computer newspapers/magazines. That's how I found out about distributed computing 15 years ago.
Furthermore, you have to point out what has been achieved so far. A lot of people do not spend money on something with no near-term goal to reach.

For example: I emailed a widely known radio station that runs several science and knowledge programs. I did not even get an answer. IMO the topic is too specific and not catchy enough for the wider public... unfortunately.

Edit by Mod:
These posts were split off of another topic where the discussion diverged from the original topic being discussed.

Post by **ajm** » Wed Mar 17, 2021 1:20 pm

sptn. wrote:
sekramer10 wrote:[...] every computer produced should have F@H pre-installed on it and it runs without users even knowing it's there. [...]
You have to understand that a normal computer/notebook at idle does not consume nearly as much energy as it would running under full load with FAH. There are reasons other than disinterest: noise level, the cost of energy, heat, and so on.

If it ever gets close to pre-installing FAH on "normal" computers, there will be solutions for these problems, developed by companies like Intel, AMD, nVidia, Microsoft, Apple, etc.
From where we are, I'd say the biggest hurdle is to gather some trust for this system outside of the geek microcosm. And this means information of the kind the OP would like to spread.

Post by **bruce** » Fri Mar 19, 2021 7:59 am

FAH will always be a geeky alternative. You'll never get the kind of support you'd need from Microsoft/Apple. They both tend to deprecate tools (like OpenCL) and anything that involves heavy computing (i.e., anything that tends to produce heat) in favor of things that actively depend on a person who sits in front of a screen for hours at a time with their hand on the mouse, ready to click on the next product being advertised.

[Yes, OpenCL is old, but it's still useful as a Compute Language.]

Post by **sptn.** » Fri Mar 19, 2021 9:03 am

ajm wrote:
sptn. wrote:
sekramer10 wrote:[...]
If it ever gets close to pre-installing FAH on "normal" computers, there will be solutions for these problems, [...]

And what do you think these solutions will look like? I mean, the "information" is "created" with electricity. So less heat means less electricity and therefore less information. Unless the semiconductors get more efficient, the emitted heat will always be the same... IMO. Please correct me if I am wrong!

Post by **ajm** » Fri Mar 19, 2021 9:34 am

Well, I'm not sure I'm the one to ask. But first there are the tweaks that some dream of implementing in the present version of FAH to better modulate how it works. Say, a series of options that would allow FAH to run, or to send specific WUs, only for certain users at certain times or in certain situations (time of day, day of the week, vacations, activity of other software, ambient temperature, etc.), all in accordance with the capacity of the users' system(s). This is intricate and requires a lot of work and testing. It is almost out of reach for the current FAH team, but hardly so for a dev team within a large company.

Then there are solutions that the chip makers could implement, to regulate the wattage/heat depending on several factors, some automatic, some user-defined, when using FAH (or OpenCL/CUDA). I don't have a clue how this would work, sorry, but I'm confident that it is at least possible.

Then there are the users' demands. If enough regular people (or say regular business owners, or sysadmins) do want to use their hardware to "save the world" with FAH, implementing ways to satisfy them could become a selling point, at which moment we would have Microsoft and Apple on board.

Post by **bruce** » Fri Mar 19, 2021 8:34 pm

Looking at it from another perspective, less heat means the result may not be turned in by the deadline so it will be discarded and zero information is nothing but wasted electricity.

Post by **ajm** » Fri Mar 19, 2021 8:59 pm

bruce wrote:Looking at it from another perspective, less heat means the result may not be turned in by the deadline so it will be discarded and zero information is nothing but wasted electricity.

With the present rather rigid system, yes, definitely.
But let's imagine we have 1,000 times more machines available, or 1,000,000 times more, and a really evolved system, with several large dev teams, able to distribute and collect work according to local effective capacities, taking into account lesser or only intermittent output, and on the other hand easily grouping whole farms of very powerful kits when idle. Close to all deadlines would be met and science would be immensely richer. We really would have THE supercomputer, whose capacity would simply grow together with the world's computing capacity.

Maybe it's just a dream, simply not attainable. But it certainly is a good direction to point to when deciding how to further develop the system, look for new partners, and communicate broadly about the whole thing.

Post by **bruce** » Fri Mar 19, 2021 9:53 pm

FAH cannot use 1000 machines on the same WU. Each WU that is completed generates a new one. The WUs must be processed serially.

Suppose you have 1000 people enter a relay race and each person quits after running 500 meters. The race will never be completed.

FAH builds trajectories and your lap starts from the point that somebody else finished their assignment but we won't know where that will be until they finish their segment of the trajectory. You have to simulate ALL of the atoms since varying forces exist between every pair of atoms. We can't give you 0.001 of the atoms and you can't redistribute that portion of the protein to all 1000 other people who want to work on it.
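The relay-race point can be put in numbers. What follows is only a minimal sketch under a made-up model: every WU takes the same turnaround time, and a trajectory needs a fixed number of WUs, one after another. The function name and all figures are illustrative assumptions, not anything FAH publishes.

```python
import math

def project_wall_clock_days(trajectories, wus_per_trajectory, days_per_wu, machines):
    """Rough calendar time to finish a project (hypothetical model).

    Each trajectory is a chain: WU k+1 cannot start until WU k is returned,
    so one trajectory can keep at most one machine busy at a time.
    """
    # Machines beyond the number of trajectories have nothing to work on.
    busy = min(machines, trajectories)
    # Trajectories are processed in "waves" of `busy` chains at a time.
    waves = math.ceil(trajectories / busy)
    return waves * wus_per_trajectory * days_per_wu

# 100 trajectories of 50 WUs each, one day per WU (all assumed numbers).
print(project_wall_clock_days(100, 50, 1.0, machines=10))
print(project_wall_clock_days(100, 50, 1.0, machines=100))
print(project_wall_clock_days(100, 50, 1.0, machines=100_000))
```

In this model, going from 10 to 100 machines cuts the calendar time tenfold, but going from 100 to 100,000 changes nothing: the extra machines can only help if there are more trajectories to run, not by sharing a single WU.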

Post by **ajm** » Sat Mar 20, 2021 6:55 am

But assuming that FAH knows the users' kits and their kind of utilization, as dynamically as possible, slower or only intermittently available systems would only get WUs, or projects, adapted to their effective possibilities.

As of right now, for the projects currently running, the timeout ranges from 1 to 15 and the deadline from 2 to 20. If WUs are distributed according to the real local capacities and availabilities, more deadlines will be met, all the more so if the system can choose among a much greater number of available kits and thus optimize the chances of running smoothly.
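As a rough illustration of what distributing WUs "according to the real local capacities and availabilities" could mean, here is a sketch that estimates whether a kit would return a WU before its timeout. The field names, the idea of sizing a WU in hours on a reference machine, and reading the timeout figures as days are all assumptions made for the example; this is not FAH's actual assignment logic.

```python
from dataclasses import dataclass

@dataclass
class Kit:
    # Hypothetical description of a donor machine.
    relative_speed: float   # 1.0 = the reference machine used to size the WU
    hours_per_day: float    # how long the kit actually folds each day

def estimated_days(kit, wu_reference_hours):
    """Calendar days the kit needs for a WU sized in reference-machine hours."""
    compute_hours = wu_reference_hours / kit.relative_speed
    return compute_hours / kit.hours_per_day

def can_meet(kit, wu_reference_hours, timeout_days):
    """True if the estimate fits inside the timeout."""
    return estimated_days(kit, wu_reference_hours) <= timeout_days

laptop = Kit(relative_speed=0.5, hours_per_day=4)     # slow, evenings only
desktop = Kit(relative_speed=2.0, hours_per_day=24)   # fast, always on

for name, kit in (("laptop", laptop), ("desktop", desktop)):
    # Same 12-hour WU, checked against a short and a long timeout.
    print(name, can_meet(kit, 12, timeout_days=1), can_meet(kit, 12, timeout_days=15))
```

With these assumed numbers the laptop needs about six calendar days for the WU, so it only fits projects with a long timeout, while the desktop fits both.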

Then, those availabilities will also be known to the researchers, including in advance (vacations, etc.), so that they can tweak their projects accordingly. Thus, as an example, urgent projects would run only on very powerful kits, and long-term projects would rather make use of slower, but more numerous kits. I'm sure they will be able to figure out how to optimize the use of those resources if the information flows more fluidly.

Same thing for the servers.

If you at least aim at such a solution, you have better chances to see it emerge.

Post by **bruce** » Sun Mar 21, 2021 12:27 am

It sounds simple, but it's not. Suppose two kits are identical except for local tweaks (power limits, overclocking, etc.). They won't produce identical performance even though they might be "close enough."

Now suppose one person runs 24x7 and the other runs N hours per day (N<<24) or maybe one is shut down on weekends. FAH certainly cannot assume those systems should get assignments from the same pool of projects. There still will be non-optimum WUs assigned. All that really can be accomplished is to minimize the assignment of WUs that will miss the deadline.

Then, too, if my system has a problem requiring FAH to be unexpectedly off-line, something is probably going to be dumped.

Post by **ajm** » Sun Mar 21, 2021 7:26 am

All true, but again, a new and enhanced FAH would allow users to indicate (options) that their systems are only intermittently available, or only at n% of their nominal capacities. And that new FAH would also be able to learn that by experience if users don't bother.
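One simple way such "learning by experience" might work is an exponential moving average of the hours a kit actually folds per day, starting from whatever the user declared. This is only a sketch: the function names and the smoothing factor are assumptions, and nothing like this is known to exist in the current client.

```python
def update_availability(previous_estimate, observed_hours, smoothing=0.2):
    """Blend yesterday's estimate with today's observed folding hours."""
    return (1 - smoothing) * previous_estimate + smoothing * observed_hours

def learn_availability(declared_hours, observed_days, smoothing=0.2):
    """Start from the user's declared hours/day and correct it day by day."""
    estimate = declared_hours
    for hours in observed_days:
        estimate = update_availability(estimate, hours, smoothing)
    return estimate

# A user who never touched the options (assumed default: 24 h/day)
# but whose machine really folds about 8 hours a day.
estimate = learn_availability(24, [8] * 30)
print(round(estimate, 1))
```

After a month of observations the estimate has drifted from the declared 24 hours to very nearly the real 8, without the user doing anything. A larger smoothing factor reacts faster to changed habits but is also more easily thrown off by a single unusual day.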

Then, thanks to this, the deadlines and timeouts for time-sensitive projects could be much more precise. For example, a high-end GPU working at 100% 24/7 can handle WUs much faster than the present deadlines, say in an hour, or half an hour. If the deadlines are adapted accordingly, it would take, say, an hour at most to dump a WU when a kit is in trouble, instead of at least a whole day at present. Then kits with problems would not get time-sensitive WUs until the user has at least acknowledged the issue and taken measures. Or FAH could send a less time-sensitive WU and check whether the kit has regained its nominal (or announced) capacity, and act accordingly. And so on.

This could be automated in large part, too. That is, there would be regular handshakes between FAH and the kit, with an exchange of the information essential for folding. If we can work on that with chip makers and OS developers, the sky's the limit.

Post by **ajm** » Sun Mar 21, 2021 8:18 am

Then for slower systems, there should be a possibility to transfer an initiated WU using checkpoints when it appears that the machine won't make it on time, for some reason, instead of just waiting for the deadline, dumping and reassigning.
Assuming we'll still have projects that run on more than a day's work per WU, that function would help save a lot of time and power. Such scenarios are likely on laptops, most of which won't be used 100%. But laptops are extremely common computers nowadays, so that we really should use them. Such partial WUs could be exchanged more freely on a local network, for example, without cluttering FAH's servers. And people with broadband Internet (in my region, 10Gbit/s at home already is a thing) could offer some sort of swapping space on their system for such situations, a bit like they now offer CPUs and GPUs.

Post by **bruce** » Sun Mar 21, 2021 11:21 pm

ajm wrote:Then for slower systems, there should be a possibility to transfer an initiated WU using checkpoints when it appears that the machine won't make it on time, for some reason, instead of just waiting for the deadline, dumping and reassigning.
Assuming we'll still have projects that run on more than a day's work per WU, that function would help save a lot of time and power. Such scenarios are likely on laptops, most of which won't be used 100%. But laptops are extremely common computers nowadays, so that we really should use them. Such partial WUs could be exchanged more freely on a local network, for example, without cluttering FAH's servers. And people with broadband Internet (in my region, 10Gbit/s at home already is a thing) could offer some sort of swapping space on their system for such situations, a bit like they now offer CPUs and GPUs.

All assignments are considered to be time-sensitive. FAH wants every result returned as quickly as possible. The QRB bonuses are designed specifically to reward anybody who can do better than they've been doing. Moreover, they don't want to waste any more of anybody's processing time if it can be avoided. Also consider the fact that you're volunteering your resources and you can change your operating hours any time you feel like it or dump a WU any time you feel like it or simply quit. Now design a system that can meet all those requirements ... but also try to optimize the assignments.

A whole series of segments of each trajectory must be completed, each as soon as the previous segment is returned. How long should it take to maximize the length of the trajectory before the results must be inspected? WUs should be assigned only once and every one of them has to be completed.

The timeout should reassign every WU that's lost, but it forces the work to be unnecessarily repeated if the WU is just "late." Thus a precise assignment can only be achieved if the number of hours per day that your system processes the WU can be predicted. The cost in wasted/duplicated work becomes a tradeoff forced on FAH when that prediction turns out to be wrong.
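The tradeoff can be made concrete with a little bookkeeping. The sketch below assumes a toy data set of how long each first assignment took to come back (with `None` for WUs that never returned) and counts the two costs of a given timeout; the function name, the data and the units (days) are invented for illustration.

```python
def timeout_tradeoff(return_days, timeout_days):
    """Count the two costs of a timeout (hypothetical bookkeeping).

    return_days: days each first assignment took to come back, None if never.
    """
    duplicated = 0   # reassigned although the original was merely late
    lost = 0         # truly lost; each one stalls its trajectory until the timeout
    for days in return_days:
        if days is None:
            lost += 1
        elif days > timeout_days:
            duplicated += 1
    return {"duplicated_wus": duplicated, "stall_days": lost * timeout_days}

# Six assignments: two merely late, two never returned (made-up data).
returns = [0.5, 1.2, 3.0, None, 0.8, None]
print(timeout_tradeoff(returns, timeout_days=1))
print(timeout_tradeoff(returns, timeout_days=4))
```

With the short timeout, the two late WUs are done twice but the lost ones only stall their trajectories for two days in total. With the long timeout, nothing is duplicated but the stall grows to eight days. Picking the right value depends on predicting which case a given machine will fall into.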

Suppose I have a precise benchmark for your machine and my prediction becomes imprecise because you happen to alter your operating hours. (We can't force you to adhere to my prediction.) What happens after I make the prediction, if you change your mind about something?

At this point, we don't have a field to record a prediction for each of your slots and it's probably not worth creating one. This makes the benchmark information only slightly better than simply using random assignments because the timeout/reassign method has to accommodate whatever corrections are necessary.

Post by **bruce** » Sun Mar 21, 2021 11:36 pm

Oh, I forgot to mention: How many trajectories should I create?

If I know that there are (N +/- K) people who will be working on my project and K is a small number, I will create exactly N trajectories with a few extra. When each WU is returned to the server, it will be immediately reassigned to that same person. Otherwise there will be a lot of idle WUs sitting on the server, wasting good calendar time.

In other words, the project owner has to predict how many folks will be looking for work at the same time. I have little doubt that the variations, K, are impossible to predict, donor statistics notwithstanding.
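A toy model shows why the size of K matters. It assumes that, at any instant, each trajectory can offer at most one WU, so a mismatch between the number of trajectories and the number of donors asking for work leaves either WUs or donors idle. The function name and the numbers are illustrative assumptions only.

```python
def idle_counts(trajectories, donors_now):
    """Return (idle WUs on the server, donors left without a WU from this project)."""
    idle_wus = max(0, trajectories - donors_now)      # trajectories waiting, losing calendar time
    idle_donors = max(0, donors_now - trajectories)   # donors this project cannot feed
    return idle_wus, idle_donors

# Project sized for N = 1000 donors "with a few extra" trajectories (assumed: 1020).
trajectories = 1020
for donors in (950, 1000, 1050, 1500):
    print(donors, idle_counts(trajectories, donors))
```

When turnout stays close to N, only a handful of WUs or donors sit idle. When it swings widely, as in the last case, hundreds of donors find nothing to do on this project, and a swing the other way would leave as many trajectories stalled on the server.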

Post by **bruce** » Mon Mar 22, 2021 2:30 am

ajm wrote:Then for slower systems, there should be a possibility to transfer an initiated WU using checkpoints when it appears that the machine won't make it on time, for some reason, instead of just waiting for the deadline, dumping and reassigning.

That sounds like a great idea, but as an enhancement, I doubt FAH's development department will work on it. FAH is designed to be a set-it-and-forget-it activity, and manipulating files is not going to be supported. In fact, if a WU starts on one computer and is returned from a different computer, then somebody is trying to hack somebody else's points (even if you own both computers).

It used to be called "sneakernetting" before the advent of home-based LANs.
