3 GPUS STOP downloading - 140.163.4.200
Posted: Mon Oct 12, 2020 2:10 am
by chabgood
Last week 3 of my 7 GPUs stopped downloading. I ended up deleting them and adding them again, and then they downloaded. Then today the same 3 slots (not the same GPUs) did the same thing. Same WS: 140.163.4.200. They finish and never stop downloading.
Ubuntu 20.
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Mon Oct 12, 2020 8:09 am
by PantherX
Can you please post the log file so we can understand what happened from the client's end?
I checked and 140.163.4.200 has an uptime of about an hour. Thus, if it restarted, you might have encountered this bug:
https://github.com/FoldingAtHome/fah-issues/issues/983
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Mon Oct 12, 2020 9:10 am
by HaloJones
I had this too at 00:40 GMT, but it worked fine after a restart this morning.
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Mon Oct 12, 2020 3:06 pm
by chabgood
PantherX wrote: Can you please post the log file so we can understand what happened from the client's end?
I checked and 140.163.4.200 has an uptime of about an hour. Thus, if it restarted, you might have encountered this bug:
https://github.com/FoldingAtHome/fah-issues/issues/983
Would it be better to put the work servers behind a load balancer (LB)?
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Tue Oct 13, 2020 6:45 am
by PantherX
chabgood wrote: ...Would it be better to put the work servers on a LB?
Unfortunately, the Work Servers are owned by various labs and, in most cases, are physical machines. Having a physical load balancer might not be an option for most labs.
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Tue Oct 13, 2020 2:24 pm
by chabgood
PantherX wrote: chabgood wrote: ...Would it be better to put the work servers on a LB?
Unfortunately, the Work Servers are owned by various labs and, in most cases, are physical machines. Having a physical load balancer might not be an option for most labs.
A load balancer points to an IP or DNS entry. I am not sure how that is limiting.
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Tue Oct 13, 2020 3:00 pm
by Joe_H
Typically, a load balancer distributes connections across multiple servers with identical content or services. That does not apply to the work servers used by F@h: each WS hosts different projects.
What load balancing does exist in F@h happens at the Assignment Server (AS) level. The AS has regularly updated information on which types of projects have WUs available, their priority, and how many WUs there are. There are also limits on how many connections can be made to each WS.
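To make the idea above concrete, here is a minimal Python sketch of that kind of selection. It is not F@h's actual Assignment Server code, which is not shown in this thread. The `WorkServer` fields, the `pick_work_server` function, the server names, and the tie-breaking rule (priority first, then WUs available) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkServer:
    # Hypothetical fields that mirror what the AS is described as tracking.
    name: str
    priority: int         # higher value = more urgent project (assumed scale)
    wus_available: int    # work units currently available
    connections: int      # connections currently open
    max_connections: int  # per-WS connection limit


def pick_work_server(servers: List[WorkServer]) -> Optional[WorkServer]:
    """Return the preferred eligible server, or None if none can take a client."""
    # A server is eligible only if it has work and is under its connection limit.
    eligible = [
        s for s in servers
        if s.wus_available > 0 and s.connections < s.max_connections
    ]
    if not eligible:
        return None
    # Assumed ordering: highest priority first, then most WUs available.
    return max(eligible, key=lambda s: (s.priority, s.wus_available))


# Example data: ws-a has no work and ws-c is at its connection limit.
servers = [
    WorkServer("ws-a", priority=2, wus_available=0, connections=1, max_connections=10),
    WorkServer("ws-b", priority=1, wus_available=50, connections=3, max_connections=10),
    WorkServer("ws-c", priority=3, wus_available=5, connections=10, max_connections=10),
]

chosen = pick_work_server(servers)
print(chosen.name if chosen else "no work available")  # prints "ws-b"
```

Unlike a generic load balancer, this picks among servers that host different work, so a client cannot simply be redirected to any other WS when one is unavailable.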
Re: 3 GPUS STOP downloading - 140.163.4.200
Posted: Tue Oct 13, 2020 3:02 pm
by chabgood
Oh, OK.