covid moonshot bad wu setup
Moderators: Site Moderators, FAHC Science Team
Re: covid moonshot bad wu setup
The 2 or 3 hour number is probably based on the GPUSpecies values that apply to NV GPUs. Partitioning GPUs into fast/slow by Species is in a different place for AMD and for NV. I need to convince John to accept different AMD species than NV species when he assigns the even/odd projects.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 952
- Joined: Sun Dec 16, 2007 6:22 pm
- Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
- Location: London
Re: covid moonshot bad wu setup
Guys,
13420 has ~90k atoms, and is suited for powerful GPUs
13421 has ~4k atoms and is very suited for small GPUs (integrated, APUs, and some very low end stuff). All the APUs are very slow, so let's not kid ourselves and let's not look for problems where there are none.
13422 has ~90k atoms, same as 13420
13423 has ~4k atoms, same as 13421
The TPF everyone is seeing is what you have to expect.
Averages quoted by John are for high end GPUs, since labs do not have all the GPUs in the world
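To put the numbers above side by side, here is a minimal sketch. The atom counts are the approximate figures quoted in this post, not official project metadata. The 10k-atom cutoff and the function names are hypothetical, chosen only for illustration, and the run-time estimate assumes a work unit is reported in 100 frames (1% each).

```python
# Approximate atom counts as quoted in this post (not official project metadata).
PROJECT_ATOMS = {13420: 90_000, 13421: 4_000, 13422: 90_000, 13423: 4_000}


def suited_for(project: int, small_threshold: int = 10_000) -> str:
    """Classify a project by atom count.

    The 10k-atom threshold is a hypothetical cutoff for illustration only.
    """
    atoms = PROJECT_ATOMS[project]
    return "small GPU" if atoms <= small_threshold else "powerful GPU"


def estimated_wu_hours(tpf_seconds: float, frames: int = 100) -> float:
    """Estimate total run time in hours from time per frame (TPF).

    Assumes the work unit consists of `frames` equal frames (100 by default).
    """
    return tpf_seconds * frames / 3600


for project in sorted(PROJECT_ATOMS):
    print(project, PROJECT_ATOMS[project], suited_for(project))
```

Under those assumptions, a TPF of 72 seconds works out to a 2 hour run, and a TPF of 864 seconds to a 24 hour run.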
FAH Omega tester
-
- Posts: 523
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: covid moonshot bad wu setup
Agreed that even modern APUs are slow; I'm not at all unaccustomed to 24 hour runs.
If they get assigned to my rig and they fold, I fold them. My input was only to verify that if the desire is quicker runs, then these aren't the rigs to do them.
I realize that all the partitioning mentioned by Bruce is a monumental task, as is the beta testing and further input that takes place before that happens. In the end I just want to provide input that helps all the people working hard behind the scenes meet their goals. If the goal is a 2 hour work unit turnaround, or a 2 day turnaround, I'll still fold it 24/7 until completed.
Fold them if you get them!
Re: covid moonshot bad wu setup
BobWilliams757 wrote:
> I realize that all the partitioning mentioned by Bruce is a monumental task, as is the beta testing and further input that takes place before that happens. In the end I just want to provide input that helps all the people working hard behind the scenes meet their goals. If the goal is a 2 hour work unit turnaround, or a 2 day turnaround, I'll still fold it 24/7 until completed.
The thing I didn't mention is the partitioning of GPUs, which goes along with the partitioning of projects. We could have done better in separating the APUs from the faster devices ... and that's certainly part of the plan for the future. In the meantime, the moonshot is important and if you complete a 24hr run, THANK YOU.
-
- Posts: 523
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: covid moonshot bad wu setup
bruce wrote:
> BobWilliams757 wrote:
> > I realize that all the partitioning mentioned by Bruce is a monumental task, as is the beta testing and further input that takes place before that happens. In the end I just want to provide input that helps all the people working hard behind the scenes meet their goals. If the goal is a 2 hour work unit turnaround, or a 2 day turnaround, I'll still fold it 24/7 until completed.
> The thing I didn't mention is the partitioning of GPUs, which goes along with the partitioning of projects. We could have done better in separating the APUs from the faster devices ... and that's certainly part of the plan for the future. In the meantime, the moonshot is important and if you complete a 24hr run, THANK YOU.
With a little APU, long work unit runs are the norm rather than the exception. I'll admit that some of the recent small atom count projects for Moonshot got my hopes up for two hour folds.... but I've been picking up more like 22 hour folds. But in the end, if they meet deadlines, they get folded regardless.
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: covid moonshot bad wu setup
> I've only picked up a couple of work units of this project number, but they perform slowly on my 2400G Ryzen (Vega 11 onboard graphics). This same setup blazed through the 13421 work units in about two hours each. Project 13422 is in the 20 hour range.
@BobWilliams757: We have tracked this down to two issues:
(1) there is a significant slowdown for some RUNs when there is a constraint group involving multiple atoms:
https://github.com/openmm/openmm/issues/2814
(2) there appear to be issues with VEGA for this workload:
https://github.com/openmm/openmm/issues/2817
https://github.com/openmm/openmm/issues/2813
We're working on trying to solve these, but we think we have a workaround for at least one of them for the next Moonshot sprint.
~ John Chodera // MSKCC
-
- Posts: 101
- Joined: Tue Apr 21, 2020 11:46 am
Re: covid moonshot bad wu setup
Getting lower points today (compared with the past week or two) on Moonshot WUs that take just over 3 hours on either my RTX 2060 or my RTX 2060 KO. Which is fine. I was usually getting near 3,000,000/day on mostly Moonshot WUs; now it’s looking like roughly 2,500,000. I thought that the bigger figure was excessive.
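For scale, the drop described above can be worked out with a quick sketch. The two daily figures are the rough ones quoted in the post; the helper name `percent_change` is made up for this example.

```python
def percent_change(old_ppd: float, new_ppd: float) -> float:
    """Return the relative change from old_ppd to new_ppd, in percent."""
    return (new_ppd - old_ppd) / old_ppd * 100.0


# Rough daily figures quoted in the post above.
change = percent_change(3_000_000, 2_500_000)
print(f"{change:.1f}%")  # prints "-16.7%"
```

So the change is roughly a 17% reduction in points per day.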
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: covid moonshot bad wu setup
As I also posted in another thread - and agreeing with you re current ppd rates:
For the most part, 13422/13423 PPDs were very high (on my kit, between 50% and 100%, depending on the WU) to allow for the occasional low one ... now that the issue with the low ones has been resolved, I'd have expected PPDs on the current sprint to be near normal ... I am seeing PPDs maybe 5% to 10% higher (if that) than I might have expected for my kit on this core, so they actually seem quite reasonable to me.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: covid moonshot bad wu setup
@Neil-B is exactly right: we had tried to even things out to account for the slow RUNs, but now that we've fixed this, we're trying to bring the base credit more in scale with other projects.
Thanks again for helping us identify and fix these issues, and for sticking with us!
~ John Chodera // MSKCC
-
- Posts: 6
- Joined: Tue Sep 08, 2020 12:00 am
- Hardware configuration: intel i7-6700k
nvidia gtx 970
nvidia gtx 1070 ti
- Location: Team #12369 - Folding@Undernet
Re: covid moonshot bad wu setup
I just wanted to note that I'm getting WUs for project 13422 that bring my GTX1070 down to ~5% of its usual PPD. I'm working through my second one right now -- they take 2-3 days to get through, which I've never seen for WUs from any other projects.
Re: covid moonshot bad wu setup
It's good to hear that things are stabilizing. With a workaround in place for the projects that were especially slow because of constraints, and a permanent fix planned for some time in the future, I understand the PPDs have moved significantly toward "normal." I have no doubt that additional steps toward even more stability can be expected for the sprint projects.
Don't get upset if you run into longer projects in the future. The sprint projects represent a very narrow class of analysis. (Above, somebody objected to a WU that unexpectedly ran for 24 hrs.) In the future, you may see projects that run for several days (with correspondingly longer deadlines), but probably not on Moonshot projects. We shall see.
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: covid moonshot bad wu setup
@NineVolt: Eep, so sorry for that. We exhausted the 13424-5 projects, but had some backlog of 13422-3 on low priority, which is why you're seeing them now.
We're launching Sprint 4 in about an hour. This set includes the workaround for the constraints issue, and future projects will be sure to include it as well.
~ John Chodera // MSKCC
Re: covid moonshot bad wu setup
Hopefully when my GT 1030 arrives Thursday night (along w/fresh memory for the same Windows 7.0 SP1 system), I can get some idea as to whether or not I will be able to help more with the sprints.
However, when I get the GT 710 (on my Dell PowerEdge T105 w/Fedora) folding, I don't expect it to be able to help with the sprints. Only close monitoring will help me see if it can help.
Paul
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
Re: covid moonshot bad wu setup
psaam0001 wrote:
> ...Windows 7.0 SP1 system...
Assuming that it is 64-bit, you can download the latest version of the drivers from Nvidia (https://www.nvidia.com/Download/driverR ... 2977/en-us) to fold on it. I do think that if you fold on it 24/7, it could meet the timeout deadline without issues. However, give it a try and report back your findings to us.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: covid moonshot bad wu setup
I know the Windows system will be easy to get working, as I had it running F@H before with the GT 710 that is now in my Dell.... The Dell is the system on which I will be re-installing Fedora v32.x from scratch, following the Nvidia guide for removing the open-source drivers that are initially installed (before installing the Nvidia drivers).
All I can do is give it a shot.... I know there will be a WU that may overwhelm the GT 710, or the GT 1030... But those GPUs are likely to be better than the integrated Radeon R2-R5 series GPU on my AMD powered HP laptop.
Paul