Proj 13420 same variability as 13418
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 96
- Joined: Thu May 28, 2020 8:06 am
- Hardware configuration: MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
- Location: Germany
Proj 13420 same variability as 13418
It looks like project 13420 is showing the same variability as 13418, meaning some WUs fold fast and some fold noticeably slower.
@JohnChodera - would you mind considering an increase to the base credit, as you did for 13418?
For example, 13420 (3082, 36, 0), which I am folding right now, is a slow one.
MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
Re: Proj 13420 same variability as 13418
TBH, I've not seen much variability in 13420. I've done 16 of them so far and they've all been around the same ppd.
single 1070
-
- Posts: 96
- Joined: Thu May 28, 2020 8:06 am
- Hardware configuration: MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
- Location: Germany
Re: Proj 13420 same variability as 13418
13420 (3082, 36, 0)
Normal time for a 13420 is indeed 1:02 TPF
This one had 1:21, without me using the computer.
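To put the two frame times above side by side, here is a minimal Python sketch that converts the TPF (time per frame) values quoted in this post to seconds and computes the relative slowdown. The helper name `tpf_to_seconds` is made up for illustration and is not part of any FAH tool.

```python
def tpf_to_seconds(tpf: str) -> int:
    """Convert an 'M:SS' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)

normal = tpf_to_seconds("1:02")  # typical 13420 WU, as reported in this post
slow = tpf_to_seconds("1:21")    # 13420 (3082, 36, 0), as reported in this post

slowdown = slow / normal - 1     # fraction of extra time per frame
print(f"{normal}s vs {slow}s per frame: {slowdown:.1%} slower")
# prints "62s vs 81s per frame: 30.6% slower"
```

Ignoring any bonus, points per day scale inversely with TPF, so at the same base credit this WU would yield roughly 62/81, or about 77%, of the usual PPD.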
MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Proj 13420 same variability as 13418
Thanks for the heads-up. I suspect some GPUs see much more variability than others.
I've incremented the base credit for 13420-1 by 10% to help compensate for this variability.
We're still investigating how we can further minimize this in our setup or through changes to OpenMM.
Thanks for bearing with us!
~ John Chodera // MSKCC
-
- Posts: 96
- Joined: Thu May 28, 2020 8:06 am
- Hardware configuration: MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
- Location: Germany
Re: Proj 13420 same variability as 13418
Thanks for listening ;- )
MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
Re: Proj 13420 same variability as 13418
Thank you. I noticed that 13421 was projected to make just 52k PPD on a GPU which usually does between 70k and 95k PPD.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Proj 13420 same variability as 13418
hmmm ... I start to wonder ... bumping ppds up because of variability on some cards (and then only variability of some WUs) ... tbh begins to feel less and less point in even keeping track of points ... cpu projects delivering >20% less than was normal across the board ... gpu projects being bumped up by 10% on a single request ... perhaps the "cpu is irrelevant message" has some grounds ... probably close down the team tbh and just fold for anonymous
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
Re: Proj 13420 same variability as 13418
My CPU normally makes twice as many points as that GPU, so CPUs don't feel irrelevant for folding for me. In fact, if the PPD drops much lower, it would be better to shut the GPU down and let the CPU run more threads. As long as the points are roughly equivalent to the science benefit of running the work units, they're doing their job of rewarding the most effective folding configurations.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
-
- Posts: 1996
- Joined: Sun Mar 22, 2020 5:52 pm
- Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
- Location: UK
Re: Proj 13420 same variability as 13418
Sorry, but imho boosting base points to "make up for" a few variable WUs on some GPUs makes a mockery of the equivalent science benefit argument - it actually rewards lack of performance !!
I use rolling ppd averages to monitor my kit (and to spot issues on beta testing) .. dropping 10-15 percent (275k per day to 250k per day) overnight helped me identify the performance impact of certain Intel firmware patches ... since then (the last few months) a variety of projects have degraded "normal" for CPU points so that the 250k ppd is now under 200k most days ... so over the last few months obviously my server is delivering over 20% less scientific benefit - feels like the time will come sooner rather than later that it will not be considered to be delivering any scientific value, at which point I'll retire it ... maybe the new ARM/Android folders can take up the slack
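For readers who want to try the rolling-average monitoring described above, here is a minimal Python sketch. It is not the poster's actual tooling: the function names, the 7-day window, the 10% threshold, and the sample numbers are all assumptions chosen for illustration.

```python
from collections import deque

def rolling_mean(values, window):
    """Yield the mean of the last `window` values once enough have been seen."""
    recent = deque(maxlen=window)
    for v in values:
        recent.append(v)
        if len(recent) == window:
            yield sum(recent) / window

def flag_drops(daily_ppd, window=7, threshold=0.10):
    """Return (day_index, average) pairs where the rolling average falls more
    than `threshold` below the first full-window average (the baseline)."""
    averages = list(rolling_mean(daily_ppd, window))
    if not averages:
        return []
    baseline = averages[0]
    return [
        (i + window - 1, avg)
        for i, avg in enumerate(averages)
        if avg < baseline * (1 - threshold)
    ]

# Hypothetical daily totals, NOT real measurements:
# one week near 275k points per day, then one week near 245k.
daily_ppd = [275_000] * 7 + [245_000] * 7
drops = flag_drops(daily_ppd, window=7, threshold=0.10)
print(drops)  # [(13, 245000.0)]
```

With a 7-day window the alert only fires once the whole window sits at the lower level, which filters out single bad days at the cost of a slower reaction. A shorter window reacts faster but is noisier.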
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
(Green/Bold = Active)
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Proj 13420 same variability as 13418
> hmmm ... I start to wonder ... bumping ppds up because of variability on some cards (and then only variability of some WUs) ... tbh begins to feel less and less point in even keeping track of points ... cpu projects delivering >20% less than was normal across the board ... gpu projects being bumped up by 10% on a single request ... perhaps the "cpu is irrelevant message" has some grounds ... probably close down the team tbh and just fold for anonymous
We had adjusted the previous projects in the series (which are almost identical) upwards after lots of reports, and the internal testers saw less variation on a small number of test projects before we had to go live. I'm comfortable bringing the base credit back up since we had many reports of this before and no good data that things had improved _except_ for no reports of variation during testing.
We have some ideas for how to reduce variability going forward, but we've been focusing on the science until we can get the infrastructure fully automated and can turn our attention back to these issues.
~ John Chodera // MSKCC
Re: Proj 13420 same variability as 13418
I'm assuming, when a work unit is completed and the results are uploaded back to F@H, somewhere in there is a log of the actual hardware the work unit ran on. Hopefully you'll have a big enough sample to look at individual gpus and separate out the fast and slow units and figure out what makes them different.
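As a rough illustration of that kind of analysis, here is a Python sketch that groups per-WU frame times by GPU and marks a WU as slow when it runs well above that GPU's median. Whether the uploaded results actually carry this information is the assumption stated above. The record layout, the function name `split_fast_slow`, the 15% tolerance, and the sample records are all hypothetical; only the 62 s and 81 s figures echo TPF values reported earlier in this thread.

```python
from collections import defaultdict
from statistics import median

def split_fast_slow(records, tolerance=0.15):
    """Group (gpu, wu_id, tpf_seconds) records by GPU and mark a WU as slow
    when its TPF exceeds that GPU's median TPF by more than `tolerance`."""
    by_gpu = defaultdict(list)
    for gpu, wu_id, tpf in records:
        by_gpu[gpu].append((wu_id, tpf))

    result = {}
    for gpu, wus in by_gpu.items():
        typical = median(tpf for _, tpf in wus)
        limit = typical * (1 + tolerance)
        result[gpu] = {
            "median_tpf": typical,
            "fast": [wu for wu, tpf in wus if tpf <= limit],
            "slow": [wu for wu, tpf in wus if tpf > limit],
        }
    return result

# Hypothetical records for illustration only (WU ids and the GTX 1070
# timings are invented).
records = [
    ("RTX 2080 Ti", "wu-a", 62),
    ("RTX 2080 Ti", "wu-b", 62),
    ("RTX 2080 Ti", "wu-c", 81),
    ("GTX 1070", "wu-d", 150),
    ("GTX 1070", "wu-e", 152),
]
report = split_fast_slow(records)
for gpu, info in report.items():
    print(gpu, info)
```

Comparing each WU against the median of its own GPU model, rather than a global average, keeps a slow card from being mistaken for a slow work unit.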
Re: Proj 13420 same variability as 13418
These types of GPU work units - (low atom count?) - seem to benefit from being fed by a CPU with high single core clocks. Typical gaming/graphics oriented systems with a fast GPU and a stock cooled CPU might actually make more PPD by stopping CPU folding while these WUs are running on the GPU, so the CPU can clock up to max stock "boost" frequencies on the single core polling the GPU. PPD/watt would also be better.
Production oriented systems with a modest GPU and many CPU cores likely won't benefit from stopping CPU folding, especially if the CPU is well cooled and "boost" is switched off (so it's running all cores at the same frequency). In those systems, the CPU can be faster than the GPU, and in some cases adding more threads to the CPU gives more PPD than configuring the GPU for folding - at least until CUDA support hopefully reduces CPU usage while GPU folding.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Re: Proj 13420 same variability as 13418
The variability HAS been reduced. Taken as a group, projects 13420 and 13421 are less variable. The really short WUs are now being assigned to slower GPUs and the faster ones are retained in 13420. That allows the average points for each group to be consistent with the GPU performance of half of the spectrum of FAH GPUs. It does not remove all variability when you consider the overall variability of a spectrum of P134xx assignments.
In this case, the union of projects 13420 and 13421 represents a wide variety of projects, just as Project MoonShot represents a wide variety of suggested protein fragments.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 96
- Joined: Thu May 28, 2020 8:06 am
- Hardware configuration: MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
- Location: Germany
Re: Proj 13420 same variability as 13418
Got another slow one right now
13420 (6171, 23, 0)
For all the people who say there aren't any ;- )
MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
-
- Pande Group Member
- Posts: 467
- Joined: Fri Feb 22, 2013 9:59 pm
Re: Proj 13420 same variability as 13418
Thanks, Ichbin3! We're still working on this.
~ John Chodera // MSKCC