Allocation of WUs algorithm
Hello
A friend of mine runs Folding@home on multiple PCs, all folding on GPUs only (using CPUs would eat too much power, and his electric bill is already huge):
- AMD 1920X processor and 2 x RTX 2080 Super GPUs
- AMD 1900X processor and RTX 2080 Ti + RTX 2060S GPUs
- Intel 6850K processor and RTX 2080 Ti + RTX 2070S GPUs
- Intel 3820 processor and 2 x RTX 2060 GPUs
- Intel 4590 processor and RTX 2060S + GTX 1650 (Non Super, no power supply connector) GPUs
- Intel 4790K processor and GTX 1660 Ti + GTX 1660 Super GPUs
- Intel 7500 processor and GTX 1650 Super GPU
- Intel G4560 processor and GTX 1650 Super GPU
As you can see, that is a lot of folding power running 24/7. The issue, however, is the allocation of WUs: his most powerful cards receive WUs from small projects that barely use the RTX hardware and generate a low PPD, while his less powerful GTX cards are flooded with large WUs that take a long time to process and also yield a low PPD.
My buddy tells me that his RTX cards get WUs with 8,500-12,500 base credit, while the GTX cards get WUs with 49,500, 53,000, 56,000 and 59,500 base credit.
Since (as far as I know) a higher PPD means more help for science, would it be possible to create an algorithm that detects the video card model and assigns large WUs to powerful GPUs, and WUs that need less power to the less powerful cards?
PantherX (Site Moderator):
Re: Allocation of WUs algorithm
Welcome to the F@H Forum, Foxter.
The current system was designed several years ago, with tweaks and modifications since then to maintain the status quo. Over the last few years there has been a huge diversification of hardware, and each generation can be more powerful than the previous one, which has created a rather unusual situation. There have been requests for a better system:
https://github.com/FoldingAtHome/fah-issues/issues/1479
https://github.com/FoldingAtHome/fah-issues/issues/1504
However, that would be a massive undertaking for the development team. With all the recent attention and volunteers it might be possible, but there's no ETA or confirmation that it will happen. I am sure it would be a win-win situation: researchers could better target their Projects at specific devices, and donors would get WUs optimized for their setup.
Re: Allocation of WUs algorithm
Thank you for your reply.
I am glad that my suggestion is already under consideration; perhaps, when time allows, it will be implemented.
JimboPalmer:
Re: Allocation of WUs algorithm
Welcome to Folding@Home!
I will tell you what I think, and others will correct me with facts. <G>
Let's start with CPUs:
The client passes to the server the number of threads (F@H calls them CPUs) it will fold with. The server looks for a Work Unit that is enabled for that number of threads and assigns it for download.
The server does not know how strong your CPU is; an Intel Atom gets the same WU as an Intel Xeon with the same number of threads. The client then detects whether the CPU can do AVX_256 or is older and can only do SSE2.
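If it helps to picture it, here is a toy sketch of that matching step (every name here is invented for illustration; this is not the real assignment-server code):

```python
# Hypothetical sketch of thread-count-based WU selection.
# None of these names come from the real F@H assignment server.

def pick_cpu_wu(projects, client_threads):
    """Return the first project whose thread range allows this client."""
    for project in projects:
        if project["min_threads"] <= client_threads <= project["max_threads"]:
            return project["name"]
    return None  # nothing suitable; the client would retry later

projects = [
    {"name": "p1000", "min_threads": 1, "max_threads": 4},
    {"name": "p2000", "min_threads": 8, "max_threads": 64},
]

print(pick_cpu_wu(projects, client_threads=12))  # -> p2000; speed never considered
```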
For GPUs:
The client passes the vendor and the species of the GPU. (In your example they are all the same species, because they are all Turing GPUs.)
If you had a GTX 1080, the server could tell an RTX 2080 Super (Turing) from the GTX 1080 (Pascal), not because the GTX 1080 is slower, only because it is older.
So the server does not send WUs a card can't do, but it may send WUs the card can't complete on time. This would be more of an issue with old mobile GPUs.
Using an AMD example: Navi cards use the RDNA instruction set, whereas most AMD cards use GCN. Core_22 can run on Navi, but Core_21 cannot. So your Turing cards may be assigned Core_21 or Core_22, while the very newest AMD cards must get Core_22. The server handles that.
But nothing the client passes to the server has to do with the speed of your GPU, only its features.
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
https://en.wikipedia.org/wiki/SSE2
https://en.wikipedia.org/wiki/Turing_(microarchitecture)
https://en.wikipedia.org/wiki/Pascal_(microarchitecture)
https://en.wikipedia.org/wiki/RDNA_(microarchitecture)
https://en.wikipedia.org/wiki/Graphics_Core_Next
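And a toy version of the GPU side, again with invented names, to show that eligibility is about features rather than speed:

```python
# Hypothetical sketch: WU eligibility by GPU vendor/species, never by speed.
CORE_SUPPORT = {
    # (vendor, species) -> cores that species can run (illustrative only)
    ("nvidia", "turing"): {"core_21", "core_22"},
    ("nvidia", "pascal"): {"core_21", "core_22"},
    ("amd", "gcn"):       {"core_21", "core_22"},
    ("amd", "rdna"):      {"core_22"},  # Navi needs the newer core
}

def eligible(project_core, vendor, species):
    """True if this GPU species can run the core the project requires."""
    return project_core in CORE_SUPPORT.get((vendor, species), set())

print(eligible("core_21", "amd", "rdna"))  # False -> never assigned
print(eligible("core_22", "amd", "rdna"))  # True
```

Note that an RTX 2080 Ti and a GTX 1650 look identical to this kind of filter: both are Turing.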
PantherX (Site Moderator):
Re: Allocation of WUs algorithm
The only addition to JimboPalmer's post would be that the server also knows what OS you run. Occasionally, some Projects are limited to certain OSes for technical reasons. The same applies to GPUs: under some conditions, Nvidia GPUs get certain Projects while AMD GPUs don't. These are edge cases and rather rare.
JimboPalmer:
Re: Allocation of WUs algorithm
Let me tag team off PantherX!
PantherX wrote: The Server also knows what OS you run. Occasionally, some Projects are limited to OS due to technical reasons. Same applies to GPUs, i.e. under some conditions, Nvidia GPUs get certain Projects while AMD GPUs don't. These are edge cases and are rather rare.
For CPU folding as an example: there are times you are running a CPU with the AVX_256 instruction set, but Windows 7 before Service Pack 1 (and earlier Windows versions) did not initialize or save the AVX registers, so that CPU will do bad math on that OS. The same goes for older versions of Linux and MacOS. So the server has to refuse to send AVX work to the old clients, as it won't succeed. When Core_a4 was active, you could still send them an a4 WU, as those were all SSE2; AVX is new in Core_a7.
CPUs going back to 2000 can fold (they may not finish in time!), but on Windows they need at least Windows 7 Service Pack 1 to fold.
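A toy illustration of that gate (the version-tuple encoding is invented for this example; the real servers do this their own way):

```python
# Hypothetical sketch: refuse AVX work when the OS can't preserve AVX state.
# The (major, minor, sp) tuple encoding is invented for illustration.

def can_take_avx_wu(cpu_has_avx: bool, os_name: str, os_version: tuple) -> bool:
    """AVX WUs need both CPU support and an OS that saves the AVX registers;
    on Windows that means 7 SP1, encoded here as (6, 1, 1), or later."""
    if not cpu_has_avx:
        return False
    if os_name == "windows":
        return os_version >= (6, 1, 1)  # older Windows only gets SSE2 work
    return True  # assume a modern Linux/macOS for this sketch

print(can_take_avx_wu(True, "windows", (6, 1, 0)))  # False -> send SSE2 WU
print(can_take_avx_wu(True, "windows", (6, 1, 1)))  # True  -> AVX WU allowed
```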
Re: Allocation of WUs algorithm
From what I can see in the first of the threads PantherX provided, the best solution would be to benchmark each GPU model and, based on its score, assign WUs that fit it.
I am not talking about benchmarking every variant from every board manufacturer; just pick a card whose clocks are as close as possible to the GPU maker's reference specifications and use it as the reference.
Of course, different board makers may factory-overclock a given GPU, but overclocking should not influence the results that much.
So when the server communicates with the client, it could also ask for the GPU code of the installed video card, receive the code, and reply with the WU best suited to that card.
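To sketch what I mean (all names and numbers are invented, purely illustrative, not how the AS works today):

```python
# Hypothetical sketch: reference-benchmark classes -> WU size buckets.
GPU_CLASS = {
    # model -> performance class from a one-off reference-card benchmark
    "rtx_2080_ti": "high",
    "rtx_2060s":   "medium",
    "gtx_1650":    "low",
}

WU_BUCKET = {
    "high":   "large",   # e.g. big atom counts, high base credit
    "medium": "medium",
    "low":    "small",
}

def wu_size_for(gpu_model: str) -> str:
    """Map an unknown model to 'medium' rather than failing outright."""
    return WU_BUCKET[GPU_CLASS.get(gpu_model, "medium")]

print(wu_size_for("rtx_2080_ti"))  # large
print(wu_size_for("gtx_1650"))     # small
```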
Neil-B:
Re: Allocation of WUs algorithm
Add in the complication of folding/usage patterns and it starts to get interesting as an AI problem.
Even with the hard data about the specs of the kit, the soft data about usage patterns is probably the biggest unknown. Patterns range from the simple, such as "24/7/365 dedicated use" or "only a couple of hours each evening, Monday to Friday", through to the truly complex, such as "on idle during work days, full time at weekends except when raiding, off when the weather is too hot or the great aunt is visiting (as she stays in the spare room the kit is in), and with a totally different pattern during school or public holidays". It is actually quite a nice ML/AI challenge: look at the patterns of existing returns, identify the relevance lurking within the existing data captures, and auto-tune an allocation pattern based purely on observed patterns, with some form of temporal cycle and drop-off, plus a self-learning element that detects and responds appropriately to significant deviations from the expected profile.
From one perspective it is a massive challenge to change existing workflows to allow this; from another it is probably quite simple if approached in the right manner, and light-touch enough not to eat up massive compute resources and hammer the AS.
Not saying this should be done … but simply tuning an ML/AI model on observed usage patterns and track record, even without any hard data input, might get better throughput and allocation than trying to allocate based purely on the hard data about the kit.
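To make that concrete, a deliberately tiny sketch (invented structures, nothing like the real infrastructure): learn an hour-of-week availability profile per endpoint from past observations, and use it to judge whether a short-deadline WU is a safe assignment:

```python
# Hypothetical sketch: per-endpoint hour-of-week availability profile.
from collections import defaultdict

class AvailabilityProfile:
    """168 counters per endpoint: one per (weekday, hour) slot."""
    def __init__(self):
        self.active = defaultdict(lambda: [0] * 168)
        self.total = defaultdict(lambda: [0] * 168)

    def observe(self, endpoint, weekday, hour, was_folding):
        slot = weekday * 24 + hour
        self.total[endpoint][slot] += 1
        if was_folding:
            self.active[endpoint][slot] += 1

    def p_active(self, endpoint, weekday, hour):
        slot = weekday * 24 + hour
        total = self.total[endpoint][slot]
        return self.active[endpoint][slot] / total if total else 0.5  # unknown

profile = AvailabilityProfile()
profile.observe("host42", weekday=5, hour=10, was_folding=False)  # Sat morning off
print(profile.p_active("host42", 5, 10))  # 0.0 -> avoid short-timeout WUs here
```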
Even with the "hard data" about specs of kit, the "soft data" about usage patterns - either simple such as "24/7/365 dedicated use" or "only for a couple of hours each evening Monday to Friday" through the to the truly complex such "as on idle during work days, full time at weekends except when raiding and off when the weather is too hot or the great aunt is visiting (as she stays the spare room the kit is in) and with a totally different pattern during school or public holidays" is probably the biggest "unknown" … It is actually quite a nice ML/AI challenge to look at the patterns of existing returns, identify the relevance lurking within the existing data returns/captures and "auto-tune" an allocation pattern based purely on existing patterns with some form of temporal cycle and drop off - oh yes, and with a suitable self learning/adjusting element that detects and responds appropriately to significant deviations from the "expected profile"
From one perspective massive challenge to actually enable/change existing workflows to allow this - but from the other probably quite simple is approached in the right manner to develop and lighttouch enough not to eat up massive compute resource and hammer compute on the AS.
Not saying this should be done … but actually simply tuning an ML/AI off experienced usage patterns and track record even without any hard data input might get better throughput and allocation than simply trying to allocate based on the hard data about the kit ??
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Allocation of WUs algorithm
That could be solved a bit more easily by adding some questions to the client upon install, like "Is this a dedicated folding PC that will fold 24/7? If not, how many hours do you expect to fold daily in general?" Or you could add a new option in the client for the user to choose from, such as: Dedicated folder (24/7 folding most of the time), Standard folder (8 hours a day most of the time), Light folder (1-2 hours a day most of the time).
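A minimal sketch of that option, with invented names and thresholds (no such setting exists in the client today):

```python
# Hypothetical sketch: self-declared usage tier -> deadline feasibility check.
from enum import Enum

class FolderTier(Enum):
    DEDICATED = 24  # expected folding hours per day
    STANDARD = 8
    LIGHT = 2

def fits_deadline(tier: FolderTier, est_wu_hours: float, deadline_days: float) -> bool:
    """Can this WU plausibly finish before its deadline at the declared tier?"""
    available_hours = tier.value * deadline_days
    return est_wu_hours <= available_hours

# A 30-hour WU with a 2-day deadline suits a dedicated rig, not a light folder.
print(fits_deadline(FolderTier.DEDICATED, 30, 2))  # True
print(fits_deadline(FolderTier.LIGHT, 30, 2))      # False
```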
Neil-B:
Re: Allocation of WUs algorithm
The problem with "some questions" is that, in my experience, they tend to be answered from very different perspectives and understandings of what the questions actually mean. For the most part people just want to do what they are trying to do rather than answer a list of questions, so at best they answer as quickly as possible, with little thought or without really understanding (or caring) why the answers might matter, and at worst they use the answers as an opportunity to try and game the process. Even over a very short time period, a learning algorithm has the potential to spot and track the WU turnaround patterns actually occurring and adjust assignments accordingly.
It doesn't really matter if it is a blazingly quick GPU only running for half an hour a day or an old toaster of a card running 24/7/365; what matters is "are the WUs being returned within the deadlines?" and, in greater detail, "which endpoints are returning WUs the quickest?". An ML/AI approach also has the potential to respond to issues before people know they are issues, and to alert people to them: a WU type regularly faulting on a specific endpoint could be de-listed for assignment there and a message sent to the folder that investigation is required. It could also be trained to spot more complex patterns such as "no weekend folding" and tell the AS not to issue urgent short-timeout WUs to such an endpoint before the weekend, while at other times issuing them to that endpoint as a priority if it is actually blazingly fast when it is up.
Data about the hardware is important for faults, but again this can be collected automatically as part of the error reporting (it may be already, for all I know), so spotting issues with certain types/makes/models of GPU failing certain WUs would be possible. Without too much effort it could probably make a good stab at identifying overclocked/underclocked kit, non-vendor drivers, even hardware-failure cases. OK, I trivialise a little the coding needed to get an ML/AI to be all singing and dancing, but a first pass at the simple stuff would be far easier and, IMHO, a heck of a sight more accurate and reliable than resorting to a list of questions whose answers will likely be of dubious reliability.
Don't get me wrong: I really believe there is massive efficiency improvement to be gained from taking the assignment process to a new level. I just think the observational, automated data-gathering approach of ML/AI might work better and more accurately, and require less pain and intrusion, than a Q&A-style approach.
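As the simplest invented example of that de-listing idea (no resemblance to real F@H code claimed): count recent faults per (project, endpoint) pair and stop assigning once a threshold is crossed:

```python
# Hypothetical sketch: de-list a (project, endpoint) pair after repeated faults.
from collections import Counter

FAULT_THRESHOLD = 3  # invented threshold

faults = Counter()
delisted = set()

def record_result(project: str, endpoint: str, ok: bool):
    key = (project, endpoint)
    if ok:
        faults[key] = 0  # a clean return resets the streak
    else:
        faults[key] += 1
        if faults[key] >= FAULT_THRESHOLD:
            delisted.add(key)  # stop assigning; nudge the folder to investigate

def may_assign(project: str, endpoint: str) -> bool:
    return (project, endpoint) not in delisted

for _ in range(3):
    record_result("p14321", "host42", ok=False)
print(may_assign("p14321", "host42"))  # False -> flagged for investigation
```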
JimboPalmer:
Re: Allocation of WUs algorithm
I am going to point out that just keeping track of the features the servers already need to decide on has overwhelmed them for the last three months. If you dream up more things for the servers to do, start by dreaming up more funding for more, or more powerful, servers.
I do not see that any of your schemes will get more science done, but that is me.
Carry On.
Neil-B:
Re: Allocation of WUs algorithm
Actually, most if not all of the data work needed for an ML/AI approach is already done today, just not being utilised, so the additional processing load need not be significant. If one has the data, using it is usually a good idea. The first stage, simply prioritising assignment of WUs to kit that is likely to fold them within the timeout, isn't even really ML/AI, just a simple flag-setting exercise. The gain to science is that WUs would be returned significantly more consistently within timeouts, and higher-throughput endpoints could be prioritised over those unlikely to meet deadlines. If even a small percentage of the timeout reissues were avoided (and this approach could deliver just that), the servers would actually be less loaded, and spare compute might then be "found" to start delivering the cleverer (but still lightweight) possibilities.
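That first-stage flag could be as simple as an exponential moving average of on-time returns per endpoint; a sketch, with invented names and numbers:

```python
# Hypothetical sketch: one reliability flag per endpoint via an EMA.
ALPHA = 0.2            # invented smoothing factor
PRIORITY_CUTOFF = 0.9  # invented: endpoints above this get priority WUs

on_time_ema = {}  # endpoint -> EMA of on-time returns (1.0 = always on time)

def record_return(endpoint: str, on_time: bool):
    prev = on_time_ema.get(endpoint, 1.0)  # optimistic prior for new endpoints
    on_time_ema[endpoint] = (1 - ALPHA) * prev + ALPHA * (1.0 if on_time else 0.0)

def priority_flag(endpoint: str) -> bool:
    return on_time_ema.get(endpoint, 1.0) >= PRIORITY_CUTOFF

record_return("host42", on_time=True)
record_return("host42", on_time=False)
print(priority_flag("host42"))  # False after a miss -> no urgent WUs for now
```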
TBH, I don't actually see any approach to improving assignments happening soon, as in the big scheme of things it is generally seen as "not worth it" and so given no real consideration … but that is just me.
Re: Allocation of WUs algorithm
@JimboPalmer
I did mention "when time allows" in one of my previous posts.
Most likely, when that time is available, enough servers will be available too. I am just offering some suggestions for future implementations, since I understood from PantherX's reply to my first post that it will take quite some time to implement a new WU-allocation algorithm.
I am quite aware of the server overload; it just happens that the saying "be careful what you wish for" applies here: "We would like to have a million folders." Now you have that.
It's bad that it took an epidemic to get that number of folders; it's good to see that so many people have volunteered to help.
Re: Allocation of WUs algorithm
There are now enough servers. What is still missing is programming manpower. How is FAH 8.0 doing?
It would be nice to have at least some easy way to adapt one's installation to one's needs. That it just works right out of the box is okay for a start, but it should also offer some tweaking features for people who, after a while, would be happy to use them. For example: the possibility to accept only large or only small jobs for certain slots, or a weekly calendar for people who know they will only use their system on some days at some hours (the servers could then avoid sending them huge jobs 15 minutes before they shut down for a long weekend). Of course, if they don't want to take advantage of it, that's their choice. And I would implement some alert system, emailing people whose kit returns too many faulty WUs (see the CERN problem in another thread), for example.
There are surely many such features that would make life easier for donors as well as for the F@H team and their servers. But for all that, what is needed is programmers, and software built from the ground up to integrate this kind of tweaking.
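A minimal sketch of those two tweaks, with entirely invented settings (the real client has no such options today, as far as I know):

```python
# Hypothetical sketch: per-slot job-size preference plus a weekly calendar.
slot_config = {
    0: {"job_size": "large",  # invented per-slot preference
        # folding windows as (weekday, start_hour, end_hour); Monday = 0
        "calendar": [(d, 18, 23) for d in range(5)]},  # weekday evenings only
}

def accepts(slot: int, job_size: str, weekday: int, hour: int) -> bool:
    """Would this slot accept a job of this size at this time?"""
    cfg = slot_config[slot]
    in_window = any(d == weekday and start <= hour < end
                    for d, start, end in cfg["calendar"])
    return in_window and job_size == cfg["job_size"]

print(accepts(0, "large", weekday=2, hour=20))  # True: Wednesday evening
print(accepts(0, "large", weekday=5, hour=20))  # False: weekend shutdown
```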
PantherX (Site Moderator):
Re: Allocation of WUs algorithm
Neil-B wrote: ...Data about the hardware is important for faults - but again this can be an automated collect as part of the error reporting (it may be already for all I know) and so spotting issues with certain types/makes/models of GPU with failures of certain WUs would be possible...
Unfortunately, when a bad WU is reported, it doesn't send any details of the hardware. Hence, the science.log file is needed to see what happened to the WU.
Speaking purely from observation, the AS doesn't store any information about the client except the CPUid. That's why, every time a new WU is requested, the same "data" is sent; the AS then assigns the client the best WS, and the client connects to that WS to download the WU.
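To illustrate that stateless shape (a guess at the flow, with invented payload fields and server names; not the actual protocol):

```python
# Hypothetical sketch of a stateless assignment round-trip.
# Field names and values are invented; this is not the real F@H protocol.

def build_assignment_request():
    """The client describes itself the same way on every request; the AS
    remembers nothing about it except (per the observation above) the CPUid."""
    return {
        "cpuid": "0xDEADBEEF",  # invented value
        "os": "win10-64",
        "gpu": {"vendor": "nvidia", "species": "turing"},
        "threads": 12,
    }

def assign(request):
    """Pick a work server purely from the request contents; keep no state."""
    ws = "gpu-work-server" if request.get("gpu") else "cpu-work-server"
    return {"work_server": ws}

reply = assign(build_assignment_request())
print(reply["work_server"])  # the client then contacts that WS for the WU
```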