
Re: can't get new wu from all assignment server

Posted: Thu Jun 24, 2021 4:45 pm
by jjmiller
Hi all, thanks for the alerts on 128.252.203.11. As some have noted, 128.252.203.11 (highland1) was under high load because we were trying to generate many more CPU WUs to push out to folders. This was happening towards the end of last week and into the weekend, and it likely caused the slow download/upload speeds over that period. At some point over the weekend, we also unexpectedly ran out of storage in the server partition that houses our log files, which led to errors in assigning points. At that time I turned off highland1's ability to assign new jobs (only allowing returns) so that we could resolve any remaining server issues. As of Tuesday we believed we had solved all errors, so we reopened jobs for assignment. To try to limit future errors on highland1, I set a fairly low assignment rate (2-4 jobs/s). I think the problem that we're currently seeing (as highlighted by mgetz):
mgetz wrote:As of posting this serverstats is showing no GPU work units at this time. They do seem to be bursting occasional failed units back out to be retried but that's all I'm seeing right now.
is that we're actually facing a GPU WU shortage due to all of the generosity in donating GPUs to the COVID Moonshot project. We're trying to bring more GPU and CPU projects online now, but are also trying to do so in a way that we don't overwhelm our work servers at the same time. We're working at the moment to set up some additional work servers on our end so that we can open these jobs up for folders.

Please continue to let us know about errors. I'm doing my best to chase down each of the issues with the servers involved with my project series (182XX) and will continue to let other folks know about other issues that arise!

Re: can't get new wu from all assignment server

Posted: Thu Jun 24, 2021 11:41 pm
by krilenko
jjmiller wrote:Hi all, thanks for the alerts on 128.252.203.11. [...] Please continue to let us know about errors [...]
Thanks a lot! :)

Re: can't get new wu from all assignment server

Posted: Fri Jun 25, 2021 2:16 am
by Luscious
Throughout today I'm seeing "Empty work server assignment" in my log from 18.218.241.186:80 and 65.254.110.245:80. At the moment I have one out of four GPUs folding; the rest are sitting idle.

Not sure if this is a related problem.

Re: can't get new wu from all assignment server

Posted: Fri Jun 25, 2021 9:48 am
by tvdsluis
Luscious wrote:Throughout today I'm seeing "Empty work server assignment" in my log from 18.218.241.186:80 and 65.254.110.245:80. At the moment I have one out of four GPUs folding; the rest are sitting idle.

Not sure if this is a related problem.
Same here, sitting idle. No GPU WUs available from 18.218.241.186:80 and 65.254.110.245:80.
At the same time, over at WCG they are also running low on GPU jobs.
CPU jobs are also getting scarce here and there, so it looks like folders are outpacing scientists everywhere.

Maybe I should do some mining in the meantime.

Re: can't get new wu from all assignment server

Posted: Fri Jun 25, 2021 11:10 am
by JimF
tvdsluis wrote:At the same time, over at WCG they are also running low on GPU jobs.
CPU jobs are also getting scarce here and there, so it looks like folders are outpacing scientists everywhere.

Maybe I should do some mining in the meantime.
That is very true. But there are still some worthwhile CPU projects:
https://quchempedia.univ-angers.fr/athome/ (best on Linux, but can be used on Windows with VirtualBox)
http://gene.disi.unitn.it/test/index.php
https://www.climateprediction.net/ (usually only Linux work)

If you install VirtualBox, then LHC has work (though I run the native Linux version where possible):
https://lhcathome.cern.ch/lhcathome/

And WCG still has both MCM and ARP available.
And the great Rosetta project still has work for the moment.

The new BlackHoles project is coming up shortly:
http://astro.phys.wvu.edu/bhathome/

There is Universe, but they seem to be at the limit of server capacity:
https://universeathome.pl/universe/

And Einstein has both CPU and GPU work (best on AMD):
https://einsteinathome.org/

Finally, the MLC project has GPU work for the moment (I think maybe Nvidia only).
https://www.mlcathome.org/mlcathome/

Re: can't get new wu from all assignment server

Posted: Fri Jun 25, 2021 3:58 pm
by mgetz
At the time of posting... Serverstats is showing 154,374 new Openmm_22 WUs to distribute... so let your GPUs run.

Re: can't get new wu from all assignment server

Posted: Fri Jun 25, 2021 5:19 pm
by bikeaddict
The last two WUs assigned were for Moonshot project 13454.

Re: can't get new wu from all assignment server

Posted: Fri Jun 25, 2021 6:16 pm
by superpan
All GPUs back up to operating temperature :D

Re: can't get new wu from all assignment server

Posted: Sat Jun 26, 2021 8:33 am
by Luscious
bikeaddict wrote:The last two WUs assigned were for Moonshot project 13454.
Copy that - I've got the same running right now :D
superpan wrote:All GPUs back up to operating temperature :D
All is well here too. Hopefully it was just a temporary glitch.

Re: can't get new wu from all assignment server

Posted: Sat Jun 26, 2021 2:18 pm
by psaam0001
This is the full list of projects that are available through BOINC, just in case you need something for your GPUs to do while the F@H researchers are preparing more projects.

https://boinc.berkeley.edu/projects.php

Paul