
Re: covid moonshot bad wu setup

Posted: Sun Aug 23, 2020 7:05 am
by bruce
The 2 or 3 hour number is probably based on the GPUSpecies values that apply to NV GPUs. The fast/slow boundary by Species falls in a different place for AMD than for NV. I need to convince John to accept different species for AMD than for NV when he assigns the even/odd projects.

Re: covid moonshot bad wu setup

Posted: Sun Aug 23, 2020 10:00 am
by muziqaz
Guys,
13420 has ~90k atoms and is suited for powerful GPUs.
13421 has ~4k atoms and is well suited for small GPUs (integrated, APUs, and some very low-end stuff). All the APUs are very slow, so let's not kid ourselves and let's not look for problems where there are none.

13422 has ~90k atoms, same as 13420.
13423 has ~4k atoms, same as 13421.

The TPF everyone is seeing is what you have to expect.
Averages quoted by John are for high-end GPUs, since labs do not have all the GPUs in the world ;)
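Bruce's point about species and muziqaz's even/odd split can be illustrated with a toy example. This is a hypothetical sketch only: the species values, the per-vendor cut-offs, and the `eligible_projects` helper are invented for illustration and are not Folding@home's actual GPUSpecies table or assignment code. It just shows the idea of applying a different "fast" cut-off per vendor when routing a GPU to the large even-numbered projects (~90k atoms) or the small odd-numbered ones (~4k atoms).

```python
# Hypothetical sketch: species numbers and cut-offs are invented for
# illustration and are NOT Folding@home's real assignment logic.

# Large systems (~90k atoms) vs. small systems (~4k atoms), per this thread.
LARGE_PROJECTS = (13420, 13422)
SMALL_PROJECTS = (13421, 13423)

# Separate "fast GPU" cut-offs per vendor, because the same species value
# does not imply the same speed on AMD and NVIDIA hardware.
FAST_SPECIES_MIN = {"nvidia": 7, "amd": 8}


def eligible_projects(vendor: str, species: int) -> tuple:
    """Return the projects a GPU would be offered under this toy rule."""
    if species >= FAST_SPECIES_MIN[vendor]:
        return LARGE_PROJECTS
    return SMALL_PROJECTS


# The same species value lands in different buckets depending on vendor.
print(eligible_projects("nvidia", 7))  # (13420, 13422)
print(eligible_projects("amd", 7))     # (13421, 13423)
```

With a single shared cut-off, both devices above would get the same projects; splitting the cut-off by vendor is what lets a slow AMD device stay on the small systems.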

Re: covid moonshot bad wu setup

Posted: Sun Aug 23, 2020 2:27 pm
by BobWilliams757
Agreed, even modern APUs are slow; I'm quite accustomed to 24-hour runs. :lol:

If they get assigned to my rig and they fold, I fold them. My input was only to verify that if the desire is quicker runs, then these aren't the rigs to do them.


I realize that all the partitioning mentioned by Bruce is a monumental task, as is the beta testing and further input that takes place before that happens. In the end I just want to provide input that helps all the people working hard behind the scenes meet their goals. If the goal is a 2 hour work unit turnaround, or a 2 day turnaround, I'll still fold it 24/7 until completed.

Re: covid moonshot bad wu setup

Posted: Sun Aug 23, 2020 4:08 pm
by bruce
BobWilliams757 wrote:I realize that all the partitioning mentioned by Bruce is a monumental task, as is the beta testing and further input that takes place before that happens. In the end I just want to provide input that helps all the people working hard behind the scenes meet their goals. If the goal is a 2 hour work unit turnaround, or a 2 day turnaround, I'll still fold it 24/7 until completed.
The thing I didn't mention is the partitioning of GPUs, which goes along with the partitioning of projects. We could have done better at separating the APUs from the faster devices ... and that's certainly part of the plan for the future. In the meantime, the moonshot is important, and if you complete a 24-hour run, THANK YOU.

Re: covid moonshot bad wu setup

Posted: Wed Aug 26, 2020 12:24 am
by BobWilliams757
bruce wrote:
The thing I didn't mention is the partitioning of GPUs, which goes along with the partitioning of projects. We could have done better at separating the APUs from the faster devices ... and that's certainly part of the plan for the future. In the meantime, the moonshot is important, and if you complete a 24-hour run, THANK YOU.
With a little APU, long work unit runs are the norm rather than the exception. I'll admit that some of the recent small-atom-count projects for Moonshot got my hopes up for two-hour folds... but I've been picking up more like 22-hour folds. In the end, if they meet deadlines, they get folded regardless.
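The gap between the two-hour folds Bob hoped for and the 22-hour folds he got is easier to compare with other reports when expressed as a time per frame (TPF), the figure muziqaz mentions earlier. A small sketch, assuming the usual case where a GPU work unit reports progress in 100 frames (1% each), so the whole WU takes 100 × TPF; the `tpf_seconds` helper is just for illustration.

```python
# Assumption: a work unit reports progress in 100 frames (1% each),
# so the whole WU takes 100 * TPF. Check your own client if in doubt.
FRAMES_PER_WU = 100


def tpf_seconds(hours_per_wu: float) -> float:
    """Time per frame, in seconds, implied by a whole-WU run time in hours."""
    return hours_per_wu * 3600 / FRAMES_PER_WU


for hours in (2, 22, 24):
    print(f"{hours:>2} h per WU -> TPF of {tpf_seconds(hours):.0f} s")
```

Read the other way, a TPF of roughly 13 minutes (792 s) on an APU is simply what a 22-hour fold looks like under that assumption.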

Re: covid moonshot bad wu setup

Posted: Wed Aug 26, 2020 12:36 am
by JohnChodera
> I've only picked up a couple of work units of this project number, but they perform slowly on my 2400G Ryzen (Vega 11 onboard graphics). This same setup blazed through the 13421 work units in about two hours each. Project 13422 is in the 20 hour range.

@BobWilliams757: We have tracked this down to two issues:
(1) there is a significant slowdown for some RUNs when there is a constraint group involving multiple atoms:
https://github.com/openmm/openmm/issues/2814
(2) there appear to be issues with VEGA for this workload:
https://github.com/openmm/openmm/issues/2817
https://github.com/openmm/openmm/issues/2813

We're working on trying to solve these, but we think we have a workaround for at least one of them for the next Moonshot sprint.

~ John Chodera // MSKCC

Re: covid moonshot bad wu setup

Posted: Sat Aug 29, 2020 5:44 pm
by markdotgooley
Getting lower points today (compared with the past week or two) on Moonshot WUs that take just over 3 hours on either my RTX 2060 or my RTX 2060 KO. Which is fine. I was usually getting near 3,000,000/day on mostly Moonshot WUs; now it's looking like roughly 2,500,000. I thought that the bigger figure was excessive.

Re: covid moonshot bad wu setup

Posted: Sat Aug 29, 2020 6:12 pm
by Neil-B
As I also posted in another thread - and agreeing with you re current ppd rates:

For the most part, 13422/13423 PPD were very high (on my kit, 50% to 100% above normal depending on the WU) to allow for the occasional low one. Now that the issue with the low ones has been resolved, I'd have expected PPD on the current sprint to be near normal. I am seeing PPD maybe 5% to 10% higher (if that) than I would expect for my kit on this core, so it actually seems quite reasonable to me.

Re: covid moonshot bad wu setup

Posted: Sat Aug 29, 2020 6:43 pm
by JohnChodera
@Neil-B is exactly right: we had tried to even things out to account for the slow RUNs, but now that we've fixed this, we're trying to bring the base credit more in line with other projects.

Thanks again for helping us identify and fix these issues, and for sticking with us!

~ John Chodera // MSKCC

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 12:04 am
by NineVolt
I just wanted to note that I'm getting WUs for project 13422 that bring my GTX 1070 down to ~5% of its usual PPD. I'm working through my second one right now; they take 2-3 days to get through, which I've never seen for WUs from any other project.

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 2:53 am
by bruce
It's good to hear that things are stabilizing. With a workaround in place for the projects that were especially slow because of constraints, and a permanent fix planned for some time in the future, I understand the PPDs have moved significantly toward "normal." I have no doubt that additional steps toward even more stability can be expected for the sprint projects.

Don't get upset if you run into longer projects in the future. The sprint projects represent a very narrow class of analysis. (Above, somebody objected to a WU that unexpectedly ran for 24 hours.) In the future, you may see projects that run for several days (with correspondingly longer deadlines), but probably not Moonshot projects. We shall see.

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 3:25 am
by JohnChodera
@NineVolt: Eep, so sorry for that. We exhausted the 13424-5 projects, but had some backlog of 13422-3 on low priority, which is why you're seeing them now.

We're launching Sprint 4 in about an hour. This set includes the workaround for the constraints issue, and future projects will be sure to include it as well.

~ John Chodera // MSKCC

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 4:44 am
by psaam0001
Hopefully when my GT 1030 arrives Thursday night (along w/fresh memory for the same Windows 7.0 SP1 system), I can get some idea as to whether or not I will be able to help more with the sprints.

However, when I get the GT 710 (on my Dell PowerEdge T105 w/Fedora) folding, I don't expect it to be able to help with the sprints. Only close monitoring will help me see if it can help.

Paul

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 10:09 am
by PantherX
psaam0001 wrote:...Windows 7.0 SP1 system...
Assuming that it is 64-bit, you can download the latest version of the drivers from Nvidia (https://www.nvidia.com/Download/driverR ... 2977/en-us) to fold on it. I do think that if you fold on it 24/7, it could meet the timeout deadline without issues. However, give it a try and report back your findings to us :)

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 10:41 am
by psaam0001
I know the Windows system will be easy to get working, as I had it running F@H before with the GT 710 that is now in my Dell. The Dell is the system on which I will be reinstalling Fedora v32.x from scratch and following the Nvidia guide for removing the open-source drivers that are initially installed (before installing the Nvidia drivers).

All I can do is give it a shot. I know there may be a WU that overwhelms the GT 710 or the GT 1030, but those GPUs are likely to be better than the integrated Radeon R2-R5 series GPU on my AMD-powered HP laptop.

Paul