can't get new wu from all assignment server

jjmiller
Scientist
Posts: 139
Joined: Fri Apr 09, 2021 4:43 pm

Re: can't get new wu from all assignment server

Post by jjmiller »

Hi all, thanks for the alerts on 128.252.203.11. As some have noted, 128.252.203.11 (highland1) was under high load while we were trying to generate many more CPU WUs to push out to folders. This started towards the end of last week and continued into the weekend, and likely caused the slow download/upload speeds over that period. At some point over the weekend, we also unexpectedly ran out of storage in the server partition that houses our log files, which led to errors in assigning points. At that time I turned off highland1's ability to assign new jobs (only allowing returns) so that we could resolve any remaining server issues. As of Tuesday we believed we had solved all errors, so we reopened jobs for assignment. To try to limit future errors on highland1, I set a fairly low assignment rate (2-4 jobs/s). I think the problem that we're currently seeing (as highlighted by mgetz):
mgetz wrote:As of posting this serverstats is showing no GPU work units at this time. They do seem to be bursting occasional failed units back out to be retried but that's all I'm seeing right now.
is that we're actually facing a GPU WU shortage due to all of the generosity in donating GPUs to the COVID Moonshot project. We're trying to bring more GPU and CPU projects online now, but in a way that doesn't overwhelm our work servers at the same time. We're currently setting up some additional work servers on our end so that we can open these jobs up for folders.

Please continue to let us know about errors. I'm doing my best to chase down each of the issues with the servers involved with my project series (182XX) and will continue to let other folks know about other issues as they arise!
krilenko
Posts: 34
Joined: Sat Sep 29, 2012 2:40 am
Hardware configuration: i5-3570k / Gigabyte Z77X-D3H / G.Skill DDR3 2x4GB Sniper / Galax GTX1660 Super 6GB 192 bits / CoolerMaster Hyper 212 + / Corsair HX750W modular / Seagate 500GB Sata 3 + Seagate 160GB Sata 2 + Samsung 1TB Sata 2/ Flatron W2253V / Windows 7 Ultimate 64bits / Corsair Obsidian 650D
Location: Porto Alegre - RS - Brazil

Re: can't get new wu from all assignment server

Post by krilenko »

Thanks a lot! :)
Luscious
Posts: 49
Joined: Sat Oct 13, 2012 6:38 am

Re: can't get new wu from all assignment server

Post by Luscious »

Throughout today I've been seeing "Empty work server assignment" in my log from 18.218.241.186:80 and 65.254.110.245:80. At the moment I have one of my four GPUs folding; the rest are sitting idle.

Not sure if this is a related problem.
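
For anyone who wants to check how often each server returns this message, here is a minimal sketch that tallies the "Empty work server assignment" lines per server address. Only the message text and the two addresses come from the post above; the layout of the sample log lines and the function name `count_empty_assignments` are assumptions made for illustration, so the pattern may need adjusting for a real client log.

```python
import re
from collections import Counter

# Message text as reported in the post above. The layout of the rest of
# the log line is an assumption and may differ between client versions.
EMPTY_MSG = "Empty work server assignment"

# Matches an IPv4 address followed by a port, e.g. 18.218.241.186:80
SERVER_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}:\d+)\b")


def count_empty_assignments(lines):
    """Count 'Empty work server assignment' messages per server address."""
    counts = Counter()
    for line in lines:
        if EMPTY_MSG not in line:
            continue
        match = SERVER_RE.search(line)
        # Lines that carry the message but no recognizable address
        # are grouped under "unknown".
        counts[match.group(1) if match else "unknown"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, invented for illustration only.
    sample = [
        "WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': Empty work server assignment",
        "WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:80': Empty work server assignment",
        "WARNING:WU02:FS02:Failed to get assignment from '18.218.241.186:80': Empty work server assignment",
    ]
    for server, n in count_empty_assignments(sample).most_common():
        print(f"{server}: {n}")
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.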
tvdsluis
Posts: 42
Joined: Mon Apr 24, 2017 11:12 am

Re: can't get new wu from all assignment server

Post by tvdsluis »

Luscious wrote:Throughout today I've been seeing "Empty work server assignment" in my log from 18.218.241.186:80 and 65.254.110.245:80. At the moment I have one of my four GPUs folding; the rest are sitting idle.

Not sure if this is a related problem.
Same here, sitting idle. No GPU WUs available from 18.218.241.186:80 and 65.254.110.245:80.
At the same time, over at WCG they are also running low on GPU jobs.
CPU jobs are also getting scarce here and there, so it looks like folders are outpacing scientists everywhere.

Maybe I should do some mining in the meantime.
JimF
Posts: 651
Joined: Thu Jan 21, 2010 2:03 pm

Re: can't get new wu from all assignment server

Post by JimF »

tvdsluis wrote:At the same time, over at WCG they are also running low on GPU jobs.
CPU jobs are also getting scarce here and there, so it looks like folders are outpacing scientists everywhere.

Maybe I should do some mining in the meantime.
That is very true. But there are still some worthwhile CPU projects:
https://quchempedia.univ-angers.fr/athome/ (best on Linux, but can be used on Windows with VirtualBox)
http://gene.disi.unitn.it/test/index.php
https://www.climateprediction.net/ (usually only Linux work)

If you install VirtualBox, then LHC has work (though I run the native Linux version where possible):
https://lhcathome.cern.ch/lhcathome/

And WCG still has both MCM and ARP available.
And the great Rosetta project still has work for the moment.

The new BlackHoles project is coming up shortly:
http://astro.phys.wvu.edu/bhathome/

There is Universe, but they seem to be at the limit of server capacity:
https://universeathome.pl/universe/

And Einstein has both CPU and GPU work (best on AMD)
https://einsteinathome.org/

Finally, the MLC project has GPU work for the moment (I think maybe Nvidia only).
https://www.mlcathome.org/mlcathome/
mgetz
Posts: 57
Joined: Tue Aug 11, 2020 6:23 pm

Re: can't get new wu from all assignment server

Post by mgetz »

At the time of posting, serverstats is showing 154,374 new Openmm_22 WUs to distribute, so let your GPUs run.
bikeaddict
Posts: 210
Joined: Sun May 03, 2020 1:20 am

Re: can't get new wu from all assignment server

Post by bikeaddict »

The last two WUs assigned were for Moonshot project 13454.
superpan
Posts: 11
Joined: Mon Dec 07, 2020 7:31 am

Re: can't get new wu from all assignment server

Post by superpan »

All GPUs back up to operating temperature :D
--
Pete

Luscious
Posts: 49
Joined: Sat Oct 13, 2012 6:38 am

Re: can't get new wu from all assignment server

Post by Luscious »

bikeaddict wrote:The last two WUs assigned were for Moonshot project 13454.
Copy that - I've got the same running right now :D
superpan wrote:All GPUs back up to operating temperature :D
All is well here too. Hopefully it was just a temporary glitch.
psaam0001
Posts: 378
Joined: Mon May 18, 2020 2:02 am
Location: Ruckersville, Virginia, USA

Re: can't get new wu from all assignment server

Post by psaam0001 »

This is the full list of projects available through BOINC, in case you need something for your GPUs to do while the F@H researchers are preparing more projects.

https://boinc.berkeley.edu/projects.php

Paul