Re: 3.21.157.11 overloaded?

Posted: Wed Aug 12, 2020 9:36 am
by Neil-B
The WU needs to return to the WS that deployed it so the next gen can be created ... for the most part under normal loads this happens seamlessly ... there is an option for the researchers to state a CS(s) which can temporarily hold the WU until the WS can receive it - but it still has to go back to the WS.

So the process already exists and works if a CS has been set - but actually that just moves the problem, as now the WS is trying to receive both the folders' contributions and work from the CS ... Originally I believe this option was designed for WS failures or service outages - not to balance load.
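
To make the upload path above a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the flow as described in this post, not the real FaH server code: the class names, the `accepting` flag and the `upload` / `forward` functions are all made up for the example.

```python
from collections import deque


class WorkServer:
    """Hypothetical stand-in for a WS; 'accepting' models whether it can take uploads."""

    def __init__(self, name):
        self.name = name
        self.accepting = True
        self.received = []

    def receive(self, wu):
        # Returns True if the WU was accepted, False if the WS is unavailable.
        if not self.accepting:
            return False
        self.received.append(wu)
        return True


class CollectionServer:
    """Hypothetical stand-in for a CS that temporarily holds WUs for a WS."""

    def __init__(self):
        self.held = deque()

    def hold(self, wu, ws):
        self.held.append((wu, ws))

    def forward(self):
        # Try to pass every held WU back to its originating WS.
        # Anything the WS still cannot take stays on the CS.
        remaining = deque()
        while self.held:
            wu, ws = self.held.popleft()
            if not ws.receive(wu):
                remaining.append((wu, ws))
        self.held = remaining


def upload(wu, ws, cs=None):
    """Client-side upload: WS first, then the CS if one was set, else retry later."""
    if ws.receive(wu):
        return "ws"
    if cs is not None:
        cs.hold(wu, ws)
        return "cs"
    return "retry"
```

Note that `forward()` still ends in `ws.receive()`, which is the point made above: the CS only delays the upload, so the WS ends up handling the same total traffic.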

There are no doubt ways to re-architect the whole way FaH infrastructure works ... but an easier solution is to work on balancing out the loads so the servers aren't under stress - and iirc this is a huge server to get balanced correctly.

Re: 3.21.157.11 overloaded?

Posted: Wed Aug 12, 2020 10:06 am
by PantherX
Neil-B wrote:...Originally I believe this option was designed for WS failures or service outages - not to balance load...
That's correct. Given that the original design is about 20 years old, there's technical debt and there are legacy decisions that need to be addressed. Work was in the pipeline but that got sidetracked by the pandemic, and it will take time for things to get back on track. In the meantime, we can all fold COVID WUs to help out everyone :)

BTW, there has been a new V8 client in development (no ETA) since before the pandemic arrived, so there will be an opportunity to "start fresh". Keep in mind that V7 was the first proper client written from the ground up and it has aged well (about 10 years); V1 to V6 were all written by researchers. I am not sure if V8 is a fresh rewrite or not, but my guess is that it might be a new code base. Time will tell what happens :eugeek:

Re: 3.21.157.11 overloaded?

Posted: Wed Aug 12, 2020 5:04 pm
by bruce
Some of your assumptions are weak.

For downloading, the client reports your hardware description to the Assignment Server. The AS looks at the list of Work Servers that have WUs that can be assigned to your hardware and chooses one. Then the rest of the process is handled between that WS and your client. If there happens to be only one WS with compatible WUs, then that's the server that will be assigned. If there are several, other factors are considered. If there are zero, then you'll get an error saying (in essence) there are no WUs that can be assigned to your client.
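
In rough Python terms, the selection described above looks something like the sketch below. This is an illustration only, not the actual AS code: the names are invented, and using a single `load` number to stand in for the "other factors" is purely an assumption for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class WorkServerInfo:
    # Hypothetical view of a WS as the AS might see it.
    name: str
    compatible_hardware: Set[str] = field(default_factory=set)
    online: bool = True
    load: float = 0.0  # assumed stand-in for the "other factors"


class NoWorkAvailable(Exception):
    """No WS currently has a WU that can be assigned to this hardware."""


def assign_work_server(hardware: str, servers: List[WorkServerInfo]) -> WorkServerInfo:
    # Keep only servers that are online and have WUs for this hardware.
    candidates = [
        ws for ws in servers
        if ws.online and hardware in ws.compatible_hardware
    ]
    if not candidates:
        # Zero matches: the client gets a "no WUs available" style error.
        raise NoWorkAvailable(f"no WUs can be assigned to hardware {hardware!r}")
    if len(candidates) == 1:
        # Exactly one match: that server is assigned, whatever its load.
        return candidates[0]
    # Several matches: other factors decide (here, assumed: lowest load).
    return min(candidates, key=lambda ws: ws.load)
```

With only one compatible WS, that server is returned no matter how busy it is, which is why a single overloaded server can show up again and again for some hardware.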

If some WS are off-line, so be it. If one has a temporary shortage, that might change as soon as somebody uploads a completed WU and the WS can generate Gen (N+1).