Spread the word...

Posted: Tue Feb 14, 2023 5:12 am
by vahid.rakhshan
Perhaps the most crucial help to these disease-fighting supercomputers (F@H, Rosetta, GPUGRID, etc.) is to spread the word as much as possible.

I mentioned them on ResearchGate.

Consider leaving a comment there and sharing the link with others (or share any other link you see fit). Perhaps Twitter, Facebook, Instagram, and other networks would reach a much broader audience. I know the FAH community is already doing this, but the fact that none of my friends and colleagues knew about it shows that there is a lot of room for improvement.

I imagine a day when all library computers, school computers, and other public machines that are usually used for mundane tasks such as reading documents (and hence have nearly 100% spare CPU capacity) join this noble cause and help fight diseases. It would be much nicer if all computers around the world (public or private) could become a part of these supercomputers. Of course, this should come only after taking into consideration the environmental cost and other limitations of such an expansion.

Re: Spread the word...

Posted: Tue Feb 14, 2023 5:30 am
by vahid.rakhshan
But aside from spreading the word, I think FAH should go easier on donors by making the computations more user-friendly. I have the following suggestions for this (recapping my previous suggestions):

1. More flexible algorithms that allow a processed work unit to be used even if it is not 100% finished by its deadline. For example, if my CPU couldn't finish the task but reached 86%, that computation should not be discarded and the next person should not start crunching from scratch; instead, she should continue from 86%. This seems very useful for preventing the waste of unfinished computations and allowing ANY CPUs and GPUs (even the weak ones) to contribute.

Currently, only powerful-enough CPUs/GPUs can be used. And even with them, many users would need to keep their computers always on. This makes the whole process not user-friendly anymore. Many donors would prefer to live their routine daily life and help science in parallel; and not to disrupt their life routines by babysitting the program to make sure it finishes its assigned Work Unit on time.

2. Besides that, I think it is technically possible to break down the current Work Units into smaller pieces. So please implement it and help weaker computers join the grid. By smaller, I don't mean smaller sizes of files to be sent to FAH clients over the internet. I mean smaller amounts of computations needed to finish a task (Work Unit).

3. Also expanding the deadlines would help a lot. It would again allow the user to relax a little bit more and live his routine life, without worrying about finishing the task on time.

4. It would be very good if both the CPU and GPU (or all the processor units within one computer or within one computer network) could process the very same Work Unit together.
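To make suggestion 1 concrete, here is a minimal sketch of what "continue from 86%" could look like. Everything in it is hypothetical: the `Checkpoint` class, `run_steps`, and the toy update rule are illustrations only and are not how the FAH client or its folding cores actually work. It only shows the idea that a deterministic computation resumed from a saved state ends up in the same place as an uninterrupted run.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical snapshot of a partially processed work unit."""
    wu_id: str
    steps_done: int
    total_steps: int
    state: float  # stand-in for the real simulation state


def run_steps(cp: Checkpoint, budget: int) -> Checkpoint:
    """Advance the work unit by at most `budget` steps; return a new checkpoint."""
    steps = min(budget, cp.total_steps - cp.steps_done)
    state = cp.state
    for _ in range(steps):
        state = state * 0.5 + 1.0  # toy deterministic update, not real folding math
    return Checkpoint(cp.wu_id, cp.steps_done + steps, cp.total_steps, state)


def percent_done(cp: Checkpoint) -> float:
    """Progress of the work unit as a percentage."""
    return 100.0 * cp.steps_done / cp.total_steps


start = Checkpoint("wu-demo", 0, 100, 0.0)
first_donor = run_steps(start, 86)           # runs out of time at 86%
second_donor = run_steps(first_donor, 1000)  # picks up the hand-off and finishes
single_donor = run_steps(start, 1000)        # one donor doing the whole unit
```

In this toy model the two-donor result and the single-donor result are identical. Whether a real hand-off is practical would depend on the size of the state that has to be uploaded and on how the servers schedule work, which this sketch does not model.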

Re: Spread the word...

Posted: Tue Feb 14, 2023 3:17 pm
by kiore
I concur with your first post, yes share the news of how these projects can contribute to science.
For your second post, I understand wanting to make this as broad a base as possible to encourage contributions, but I think it is more effective to let the researchers set the size of the information packets they require to move on to the next steps. This will be self-limiting if the pool of contributors can't manage them.

Re: Spread the word...

Posted: Tue Feb 14, 2023 7:06 pm
by vahid.rakhshan
kiore wrote: Tue Feb 14, 2023 3:17 pm For your second post, I understand wanting to make this as broad a base as possible to encourage contributions, but I think it is more effective to let the researchers set the size of the information packets they require to move on to the next steps. This will be self-limiting if the pool of contributors can't manage them.
Thanks kiore. Let's spread the word. :)

Regarding your disagreements, thanks. My suggestion of lowering the demand was not only for the sake of broadening the donor base. It was also to make donors happier and more comfortable, or to make the project more friendly (which is not much now).

I am somewhat aware of (some of) the limitations the researchers may have, from their website. But at the same time, I know that if a very large task can be broken into hundreds or thousands of Work Units (hence, the term "distributed computing"), it should also be possible to break down many of those more demanding Work Units into yet smaller sub-Work Units.

I think the only reason this has not been implemented yet is the lack of adequate funding and paid programmers and not any technical (e.g., computational or mathematical) difficulties.

Even if it was something mathematical-wise (which might not be really the case), I think even in that case too, some compromise would be definitely necessary. The researchers should at least find a middle ground. The current level of demanding computations compounded with this hostile all-or-nothing strategy (that simply truncates Any task other than the 100%-completed ones) really converts many donors into full-time babysitters. This counterintuitive strategy defeats the very purpose of this project, which is to use the Excess capacity of home computers.

In other words, it is no longer "using the Excess capacity" the way it was supposed to. Instead, it hungrily demands that donors give more and more resources to the project, or face their whole contribution going to waste (because of not being totally finished on time). As an example, my CPU has been assigned a Work Unit that can be completed in about 3 to 4 days if running 24/7 at full speed. And the deadline is 4 days too! The only way to fulfill this demanding task is to maximize the CPU speed and keep my laptop charging constantly (which can wear down its battery). This not only strains my laptop's health but also makes me uncomfortable and preoccupied with an always-on and running-at-full-speed computer. You know, humans need to have some calm and relaxed hours, which is usually associated with a lack of any sympathetic activities and an emergence of parasympathetic activities. Now imagine someone wants to relax while at the same time he knows that his computer is actually doing a tremendous amount of calculations at full speed to catch a deadline, not to mention the fan noise. All of these suppress any parasympathetic activity and bring back the full-power sympathetic activity. That may not be the case for many users, but some of them like me definitely get over-excited and agitated by such unhealthy habits. That is far from optimum or user-friendly and also no longer "using the Excess capacity".
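As a rough check on the numbers in this example, here is a small sketch. The function names are made up for illustration, and it assumes the machine always runs at full speed whenever it is on.

```python
def required_duty_cycle(compute_days: float, deadline_days: float) -> float:
    """Fraction of the deadline window that must be spent computing at full speed."""
    return compute_days / deadline_days


def hours_per_day_needed(compute_days: float, deadline_days: float) -> float:
    """Average hours per day the machine must run to finish before the deadline."""
    return 24.0 * required_duty_cycle(compute_days, deadline_days)


best_case = hours_per_day_needed(3, 4)   # WU needs 3 days of full-speed work
worst_case = hours_per_day_needed(4, 4)  # WU needs 4 days of full-speed work
```

With a 4-day deadline, a WU that needs 3 days of full-speed computing requires 18 hours a day, and one that needs 4 days requires all 24. Either way there is little or no room for turning the laptop off at night.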

As an alternative solution, this very task of mine could be broken down into 10 or 20 sub-tasks distributed to 10 or 20 happy donors, all of whom could run the client in the background and totally forget about it. They could continue their lives without even thinking about the client anymore. They could simply turn off their computer or unplug it whenever they want, and yet know that at the end of the day, they are really contributing to science.

As yet another alternative solution, this hostile all-or-nothing strategy could be revoked and any sub-100% contributions could matter. In such a good scenario, I would know that even if I can handle only 40% of a Work Unit, it will matter and will be used to fight diseases (and that the next donor would start from 40%). Again, I would totally forget about the client and would go on with my life as usual. Totally user-friendly.

Please note that I am not complaining. By my tone, I am just trying to ensure that my feedback is being heard by the developers.

Re: Spread the word...

Posted: Thu Feb 16, 2023 3:05 am
by kiore
Feel free to complain, this is not prohibited and you are a donor, so you can ask questions about your donation of resources. I understand your point and will not indulge in any special pleading that F@H is not like X other distributed computing programs.
I do in fact agree that the original "do research on your screen saver" is no longer how things work for any of the distributed computing programs; screen savers now reside where dial-up modems went to die. Now people are being asked to donate their excess computing power in a more active way, by actively running available hardware for a cause, whatever that might be. Some people build dedicated hardware just to do this, others contribute hardware while they are not using it or not using it to its full potential. The screen saver idea of millions of donors passively generating tiny pieces was great but its time has gone and now many people have vast resources available and can choose to donate or not.

Re: Spread the word...

Posted: Thu Feb 16, 2023 6:30 am
by vahid.rakhshan
Thanks for your response. No, I really was not complaining. I was just being enthusiastic. But I agree my tone is becoming agitated, frustrated, and bordering on complaining!
kiore wrote: Thu Feb 16, 2023 3:05 am The screen saver idea of millions of donors passively generating tiny pieces was great but its time has gone and now many people have vast resources available and can choose to donate or not.
What you gave as an example was too extreme and also not at all what I said. I was not talking about the screensaver option or about lowering the expectations from donors to this. I repeatedly said "middle ground" and "compromise". So I was talking about some very practical suggestions to make this computation
(1) much more user-friendly, --> favoring the donor
(2) much more efficient, --> favoring FAH
and (3) available to more users. --> favoring FAH again

Is this bad?

(On a side note regarding the screensaver thing: Well, the very default setting of the latest version of FAH is still some sort of "screensaver computation", i.e., "idle processing". So I don't agree that the screensaver option is no longer used.)
kiore wrote: Thu Feb 16, 2023 3:05 am Now people are being asked to donate their excess computing power in a more active way, by actively running available hardware for a cause, whatever that might be. Some people build dedicated hardware just to do this, others contribute hardware while they are not using it or not using it to its full potential.
I too set the FAH client to run when I am working and also set it to run at full capacity. But I need to sleep sometimes, and during that time, I don't like my machine crunching data and making constant fan noises. Is this too much? :) Is this not active enough?

However, FAH is not pleased with the above; according to the FAH server, I am not allowed to turn off my computer or even throttle it down 24/7, because if I do so, my Work Unit will expire, and subsequently my whole relentless effort would simply go to waste (because FAH refuses to use sub-100% WUs and simply discards them). I don't call this strategy "excess computing in a more active way". I just call it, dare I say, exploiting, hostile, and too demanding, if not evil and malicious. The server literally tries to enslave the donor in some way! This is not by any means "excess computing in a more active way". ;)

All I am asking for is some FLEXIBILITY: The system can be very demanding to those who can or want to dedicate full-time powerful machines to it but at the same time, flexibly less demanding to those who can't or don't want to become babysitters to the FAH client.

And my suggestions are both practical and beneficial not just to the donor but also to the system. I am not convinced that they are not technically good or practical.

Of course, FAH can choose its strategy. But if it chooses to remain too demanding the way it is now (or even becomes more demanding than this), at least it should not claim anymore that it will be using the EXCESS capacity of the donors' computers; instead, it should simply warn the potential donors beforehand about the hassle (that they will be stuck in an all-or-nothing situation with very computation-heavy tasks).

Re: Spread the word...

Posted: Sun Feb 19, 2023 2:15 am
by Lazvon
I didn’t quite understand Vahid’s point before, but now I do.

By handing out a WU with an expiration that is unachievable without using nearly full resources, the limitations are significant for the casual user.

I see where they are coming from. Is that the way it works? Or does a casual user get a longer timeout?

Re: Spread the word...

Posted: Sun Feb 19, 2023 10:30 am
by BobWilliams757
Lazvon wrote: Sun Feb 19, 2023 2:15 am I didn’t quite understand Vahid’s point before, but now I do.

By handing out a WU with an expiration that is unachievable without using nearly full resources, the limitations are significant for the casual user.

I see where they are coming from. Is that the way it works? Or does a casual user get a longer timeout?
Time limits are set regardless of which machine picks up the WU. For the most part, if a person has hardware that puts them at the lower end of the selection limits, the time spent folding on that machine is a greater percentage of the time allowed to complete. CPU folding is a bit more forgiving, I think: even with PBO and such turned off, most modern machines will complete the WUs with plenty of time to spare, based on the number of cores they use.

With GPUs, the breakdown into species tries to avoid the time crunch, but with things advancing so quickly it's hard for them to keep up and keep the GPUs in the proper species category. But there is also the catch that if they reclassify the species designations, then there are people who were willing to fold 24/7 and make deadlines, who are now upset that the new category they are put in means they no longer have work units to process.

I do understand the point being made, and until I picked up a GPU I was in the same situation. The work units had grown to the point that my onboard iGPU would barely meet the timeouts on a regular basis. I could keep it running and let it complete before the deadline and still get points, but possibly just be doing duplicate work. Or I could CPU fold, but CPU units do at times get more demanding as well. At this point I don't think folding is really user-friendly for those that only want to run their systems 6-8 hours a day, unless those users have higher-end GPUs. Since the quick stuff finishes work so quickly, it's easier to only fold part of the day. People with older stuff are going to pick up work units that take much longer to finish because their hardware is slower.

Re: Spread the word...

Posted: Sun Feb 19, 2023 11:51 am
by Lazvon
Certainly sounds like the “excess capacity” wording should stop being used to describe what folks can “donate” to help. Or maybe change it to “excess capacity of high end graphic cards”.

Obviously I dedicate whole machines (well, not their CPUs) to the cause because advancing research is important to me, I enjoy building systems, and I can afford it - so why not. But I agree that describing it as “distributed” the way it once was isn’t quite how things work these days, it sounds like.

Re: Spread the word...

Posted: Tue Feb 21, 2023 3:04 am
by BobWilliams757
Lazvon wrote: Sun Feb 19, 2023 11:51 am Certainly sounds like the “excess capacity” wording should stop being used to describe what folks can “donate” to help. Or maybe change it to “excess capacity of high end graphic cards”.

Obviously I dedicate whole machines (well, not their CPUs) to the cause because advancing research is important to me, I enjoy building systems, and I can afford it - so why not. But I agree that describing it as “distributed” the way it once was isn’t quite how things work these days, it sounds like.
I think it really comes down to those with already limited system resources wanting to keep folding. In some cases it's a matter of hardware that is really obsolete, but certain versions, or similar systems used only for folding 24/7 can meet deadlines with the same hardware. In cases like that they can either let it continue to run and anger some of the users, or remove it from any folding which again angers some of the users.

If there were options for those that only want to fold part time it would be a great thing if in fact the effort in changes resulted in more work units being folded. But it would take programming effort, changes through the system, etc. I think part of the problem is the development paths are somewhat set, and with limited developer time to change things, only so much can be changed.

Just the term "excess capacity" can be taken in many ways. If you have a decent graphics card, are on the system 6-8 hours a day, and want to fold a work unit, you probably can, and usually complete it within that time frame. On the flip side, if you pick up a long work unit that doesn't fit your usual computer time, you have already exceeded that idea of "excess capacity" and have to leave the system on, possibly pause it until the next day, or in some cases lose the work unit due to the timeout not being met.

I suspect with all the higher end hardware in use for folding, they have to set priorities towards making sure those users have work to process vs spending equal time making sure users in need of hardware upgrades have work to process. It makes sense to use the quicker methods to finish the science.


I folded for about two and a half years on a system with integrated graphics. When it became obvious that F@H wouldn't have work for it much longer, I grabbed a GPU that should keep things moving along for at least a few years. But at the end of the day more users is a good thing, and if they could readily cater to a larger crowd it would never hurt.

Re: Spread the word...

Posted: Fri Feb 24, 2023 10:43 am
by vahid.rakhshan
I think many of the donors are quite good programmers themselves. Perhaps the F@H project can call for such donors to donate, this time, not only their CPUs but also their own brain and expertise and time to the project and recoding its algorithms.

If the problem with the current lack of flexibility and friendliness is the limited number of programmers, perhaps volunteer programmers can join the project and help with re-programming the whole system into something much more flexible than the current one.

Such a donation (donating one's programming skills and time) would boost the efficacy and speed of FAH computations so much more (maybe thousands of times more) than just receiving WUs and solving them.

Re: Spread the word...

Posted: Fri Feb 24, 2023 3:38 pm
by Joe_H
They are already getting volunteer programming help; however, that still requires management. There is a Discord for potential contributors to join.

The current open beta for version 8 of the client is open source. That move to open source was delayed by COVID. CPU and GPU folding cores are based on open source code, GROMACS and OpenMM respectively. Both are highly optimized already, so there is not going to be a thousands-of-times improvement in the speed of processing.

Re: Spread the word...

Posted: Fri Feb 24, 2023 8:26 pm
by vahid.rakhshan
That's awesome news that there are volunteer programmers!

I was not talking about just optimizing the local computations being run on each single computer (i.e., GROMACS and OpenMM). I was talking about new abilities such as those I suggested above (or any other ones) to increase the FLEXIBILITY of FAH and optimize the efficacy of the whole distributed computation. Let me recap them once more:

1. Unfinished WUs should not be truncated. Instead, the next donor should start from where the previous donor had left off. For example, if I can manage to finish 73% of a WU, the next donor should do the computations from that point (73%) and not from 0.

2. There should be some option to allow both the CPU and GPU to simultaneously solve the same WU --> to help mediocre computers manage WUs better. The current strategy that doesn't have this option looks like the "divide and conquer" strategy but in a bad way. :) It divides the power of the CPU and GPU and disallows them to work together effectively. I know this is much better for very strong GPUs that Can work separately because each of them has a different architecture and resources. But in moderate computers, it is better to join them into one single computing body.

3. Many WUs are unnecessarily too large. The system Can and Should break them down into much smaller WUs. A large work unit should be broken down into 100 new WUs. A very large WU should be broken down into for example 10,000 small WUs.

Comparing a single large WU with a scenario in which it has been broken down into 100 smaller WUs: The end result will not differ at all for a single machine, but it will increase perhaps thousands of times or more the efficacy and speed of the whole distributed network; how?

For a single computer: If instead of one large WU, I have 100 smaller WUs, the speed of solving those 100 smaller chunks will be almost equal to solving that large unit because the extent of computations needed will be the Same. So a single computer will not be negatively affected.

At the same time, the whole grid will benefit a lot. For the whole system, having 100 small WUs will allow many new donors (with moderate computers who currently can't join the grid) to join the computations.

Besides, this strategy of smaller WUs will make the "wasted" calculations and electricity smaller (upon expiration). Because if my WU is incomplete and thus truncated, it will be a much smaller piece of computation that is being thrown out. Again, favoring the whole system.

***********************************************************

This point #3 also can be seen as a way to do point #1 (i.e., Unfinished WUs should not be truncated. Instead, the next donor should start from where the previous donor had left off.)

***********************************************************


3.5. It is possible to keep the amount of calculation per WU roughly the same and standard for All WUs. Instead of having different "difficulty" levels of WUs, all WUs should be similar in terms of difficulty and amount of needed computation. And any simulation that exceeds that threshold of computation should be automatically broken down into a number of smaller pieces (WUs) until each of those smaller WUs becomes lighter than that threshold. This guarantees that all users can participate in All computations or projects, while at the same time, this strategy does not have any drawbacks in terms of the speed or efficacy of computations done by each computer.

Perhaps it may even boost such computations, because this is "divide and conquer" in a good way. Divide the WUs into smaller ones and conquer them!

4. Besides all the above, extending the deadlines would be good to help people finalize their WUs without any worries or waste.
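As an illustration of points 3 and 3.5, here is a minimal sketch of the arithmetic. The function names and the numbers are hypothetical. It assumes the work splits evenly, that only fully finished pieces are accepted, and that there is no per-WU download, upload, or server overhead, so it shows only the "less waste on expiration" side of the argument.

```python
import math


def split_count(total_hours: float, max_hours_per_wu: float) -> int:
    """Smallest number of equal pieces so that each piece fits under the threshold."""
    return max(1, math.ceil(total_hours / max_hours_per_wu))


def wasted_hours(total_hours: float, pieces: int, hours_computed: float) -> float:
    """Work lost when a donor stops early and only whole pieces are accepted."""
    piece = total_hours / pieces
    finished = min(pieces, math.floor(hours_computed / piece))
    return min(hours_computed, total_hours) - finished * piece


# Hypothetical case: an 84-hour simulation and a donor who stops after 60.5 hours.
pieces = split_count(84, 1)               # threshold of 1 hour per WU
waste_unsplit = wasted_hours(84, 1, 60.5)
waste_split = wasted_hours(84, pieces, 60.5)
```

In this example the unsplit WU loses all 60.5 hours of work, while the split version loses only the half hour spent on the unfinished piece.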

Re: Spread the word...

Posted: Fri Feb 24, 2023 11:36 pm
by jonault
Breaking down large work units into hundreds or thousands of smaller work units just adds a huge amount of overhead to the whole process for faster computers.

Suppose I spend 30 seconds downloading & prepping a WU, 3 hours folding it on my RTX2080Ti, and then 30 seconds packing up the results & uploading them. Out of 181 minutes I'm spending 1 minute on overhead & 180 on folding, roughly 99.5% utilization.

Break that up into 100 WUs and now my computer spends 100 minutes on overhead and 3 hours folding. It now takes 280 minutes to get the same results and it spends more than 1/3 of its time not folding.

Break it up into a thousand WUs and now it spends 1000 minutes on overhead and 180 folding. It takes 1180 minutes to do the same amount of work - over 6 times longer - and my computer spends 85% of its time on overhead.

Break it up into ten thousand WUs and now it spends 10000 minutes on overhead and 180 folding. It takes 10180 minutes to do the same amount of work - that's ONE WEEK - and my computer spends over 98% of its time on overhead. Not only that, each WU is now taking roughly 180 minutes/10000 - or 1 second - to fold.

That is not an improvement.
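The same arithmetic as a small sketch, for anyone who wants to try other numbers. It assumes, as in the example above, a fixed 1 minute of overhead per WU no matter how small the WU is; the function name and default values are only for illustration.

```python
def overhead_model(n_pieces: int,
                   fold_minutes: float = 180.0,
                   overhead_minutes_per_wu: float = 1.0) -> tuple[float, float]:
    """Return (total minutes, fraction of time spent folding) for n_pieces WUs."""
    overhead = n_pieces * overhead_minutes_per_wu
    total = fold_minutes + overhead
    return total, fold_minutes / total


for n in (1, 100, 1000, 10000):
    total, utilization = overhead_model(n)
    print(f"{n:>6} WUs: {total:>7.0f} min total, {utilization:.1%} folding")
```

If the real per-WU overhead shrinks as the WUs get smaller, the penalty would be lower than this fixed-overhead model suggests, but the trend is the same.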

And that's completely ignoring the effect this has on the server side. Increasing the server workload by making it handle 10000x the number of WUs is not going to go well at all.

You should remember that the ultimate goal of folding@home is not to give people something to do with their computers, it's to assist researchers in doing their research. Old & slow computers that can't generate results in a useful time frame just aren't worth the effort to include because they aren't going to generate better results or faster results for the research, and the changes needed to include them are just going to slow everything else down. In the case of old GPUs, they don't even have the necessary floating point precision to do the math that's now required.

The one idea that's interesting is that when a WU needs to be abandoned the partial results could be uploaded & handed off to someone else so they don't have to start over from scratch. I don't know how practical that is but it sounds promising. OTOH, computers that consistently fail to meet deadlines should probably not be encouraged to continue participating & this would run counter to that.

Re: Spread the word...

Posted: Sat Feb 25, 2023 12:09 am
by jonault
And as for using CPUs and GPUs to work on the same WU, I don't know how practical that is either. It's my understanding that CPU and GPU work units are very different from each other - that CPUs are being used to perform certain kinds of simulations that are not well suited to the highly parallel architecture of GPUs. So they might not be able to effectively cooperate on the same WU.