Proj 13420 same variability as 13418


Ichbin3
Posts: 96
Joined: Thu May 28, 2020 8:06 am
Hardware configuration: MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
Location: Germany

Proj 13420 same variability as 13418

Post by Ichbin3 »

It looks like project 13420 is showing the same variability as 13418, meaning some work units fold fast and some fold slower.
@JohnChodera, would you consider increasing the base credit as you did for 13418?
For example, 13420 (3082, 36, 0), which I'm folding right now, is a slow one.
MSI H81M, G3240, RTX 2080Ti_Rev-A@220W, Ubuntu 18.04
HaloJones
Posts: 906
Joined: Thu Jul 24, 2008 10:16 am

Re: Proj 13420 same variability as 13418

Post by HaloJones »

TBH, I've not seen much variability in 13420. I've done 16 of them so far and they've all been around the same PPD.
single 1070

Ichbin3

Re: Proj 13420 same variability as 13418

Post by Ichbin3 »

13420 (3082, 36, 0)
The normal time for a 13420 is indeed 1:02 TPF.
This one ran at 1:21 TPF, and I wasn't using the computer at the time.
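
To put those two frame times in perspective, here is a minimal sketch that converts the quoted TPF values to seconds and compares them. The helper name `tpf_to_seconds` is made up for illustration, and the comparison ignores any bonus points, so it describes raw frame time only, not credited PPD.

```python
def tpf_to_seconds(tpf: str) -> int:
    """Convert a 'minutes:seconds' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


normal = tpf_to_seconds("1:02")  # typical 13420 frame time quoted above
slow = tpf_to_seconds("1:21")    # frame time of 13420 (3082, 36, 0)

slowdown = slow / normal - 1         # extra time per frame, as a fraction
throughput_loss = 1 - normal / slow  # fewer frames per day, as a fraction

print(f"{slowdown:.1%} longer per frame")
print(f"{throughput_loss:.1%} fewer frames per day")
```

On these numbers the slow unit takes roughly 31% longer per frame, which works out to roughly 23% fewer frames per day.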
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: Proj 13420 same variability as 13418

Post by JohnChodera »

Thanks for the heads-up. I suspect some GPUs see much more variability than others.

I've incremented the base credit for 13420-1 by 10% to help compensate for this variability.

We're still investigating how we can further minimize this in our setup or through changes to OpenMM.

Thanks for bearing with us!

~ John Chodera // MSKCC
Ichbin3

Re: Proj 13420 same variability as 13418

Post by Ichbin3 »

Thanks for listening ;- )
gunnarre
Posts: 559
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Proj 13420 same variability as 13418

Post by gunnarre »

Thank you. I noticed that 13421 was projected to make just 52k PPD on a GPU which usually does between 70k and 95k PPD.
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Proj 13420 same variability as 13418

Post by Neil-B »

Hmmm ... I start to wonder ... bumping PPD up because of variability on some cards (and then only variability of some WUs) ... tbh it begins to feel like there is less and less point in even keeping track of points ... CPU projects delivering >20% less than was normal across the board ... GPU projects being bumped up by 10% on a single request ... perhaps the "CPU is irrelevant" message has some grounds ... I'll probably close down the team tbh and just fold as anonymous :)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
gunnarre

Re: Proj 13420 same variability as 13418

Post by gunnarre »

My CPU normally makes twice as many points as that GPU, so CPUs don't feel irrelevant for folding for me. In fact, if the PPD drops much lower, it would be better to shut the GPU down and let the CPU run more threads. As long as the points are roughly equivalent to the science benefit of running the work units, they're doing their job of rewarding the most effective folding configurations.
Neil-B

Re: Proj 13420 same variability as 13418

Post by Neil-B »

Sorry, but imho boosting base points to "make up for" a few variable WUs on some GPUs makes a mockery of the equivalent science benefit argument - it actually rewards lack of performance!!

I use rolling PPD averages to monitor my kit (and to spot issues in beta testing) ... dropping 10-15 percent (275k per day to 250k per day) overnight helped me identify the performance impact of certain Intel firmware patches ... since then (the last few months) a variety of projects have degraded "normal" CPU points, so that the 250k PPD is now under 200k most days ... so over the last few months my server is obviously delivering over 20% less scientific benefit - it feels like the time will come sooner rather than later when it will not be considered to be delivering any scientific value, at which point I'll retire it ... maybe the new ARM/Android folders can take up the slack ;)
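
For anyone who wants to try the same kind of monitoring, here is a rough sketch of a rolling PPD average with a simple drop check. It is not taken from any FAH tool; the function names, the 3-day window, the 8% threshold and the sample values (in thousands of points per day) are all assumptions made up for illustration.

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the last `window` values once enough data exists."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window


def find_drops(values, window=3, threshold=0.08):
    """Return indices into the rolling series where the average is more than
    `threshold` below the rolling average from `window` steps earlier."""
    means = list(rolling_mean(values, window))
    return [
        i for i in range(window, len(means))
        if means[i] < means[i - window] * (1 - threshold)
    ]


# Hypothetical daily PPD in thousands: a step down from 275k to 250k.
daily_ppd = [275, 275, 275, 275, 275, 250, 250, 250, 250, 250]
flagged = find_drops(daily_ppd)
print(flagged)
```

Comparing each rolling average with the one a full window earlier means a sudden step only shows up once the window has filled with post-change days, so single noisy days are less likely to trigger it.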
JohnChodera

Re: Proj 13420 same variability as 13418

Post by JohnChodera »

> Hmmm ... I start to wonder ... bumping PPD up because of variability on some cards (and then only variability of some WUs) ... tbh it begins to feel like there is less and less point in even keeping track of points ... CPU projects delivering >20% less than was normal across the board ... GPU projects being bumped up by 10% on a single request ... perhaps the "CPU is irrelevant" message has some grounds ... I'll probably close down the team tbh and just fold as anonymous :)

We had adjusted the previous projects in the series (which are almost identical) upwards after lots of reports, and the internal testers saw less variation on a small number of test projects before we had to go live. I'm comfortable bringing the base credit back up since we had many reports of this before and no good data that things had improved _except_ for the absence of reports of variation during testing.

We have some ideas for how to reduce variability going forward, but we've been focusing on the science until we can get the infrastructure fully automated and can turn our attention back to these issues.

~ John Chodera // MSKCC
aetch
Posts: 436
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: Proj 13420 same variability as 13418

Post by aetch »

I'm assuming, when a work unit is completed and the results are uploaded back to F@H, somewhere in there is a log of the actual hardware the work unit ran on. Hopefully you'll have a big enough sample to look at individual gpus and separate out the fast and slow units and figure out what makes them different.
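
A rough sketch of what that kind of analysis could look like, assuming the results can be exported as simple records. The record layout, the GPU labels, the work unit identifiers, the TPF values and the 1.2x cutoff are all hypothetical; nothing here reflects the actual F@H server logs.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (gpu_model, (project, run, clone, gen), tpf_seconds)
records = [
    ("GPU-A", (13420, 1, 0, 0), 62),
    ("GPU-A", (13420, 2, 0, 0), 61),
    ("GPU-A", (13420, 3, 0, 0), 63),
    ("GPU-A", (13420, 4, 0, 0), 81),
    ("GPU-B", (13420, 5, 0, 0), 120),
    ("GPU-B", (13420, 6, 0, 0), 118),
    ("GPU-B", (13420, 7, 0, 0), 122),
]


def split_by_gpu(records, slow_factor=1.2):
    """Group work units by GPU and flag those whose TPF is more than
    `slow_factor` times that GPU's median TPF."""
    by_gpu = defaultdict(list)
    for gpu, unit, tpf in records:
        by_gpu[gpu].append((unit, tpf))

    report = {}
    for gpu, units in by_gpu.items():
        typical = median(tpf for _, tpf in units)
        slow = [unit for unit, tpf in units if tpf > typical * slow_factor]
        report[gpu] = {"typical_tpf": typical, "slow_units": slow}
    return report


report = split_by_gpu(records)
for gpu, info in report.items():
    print(gpu, info)
```

Comparing each unit against the median for the same GPU model keeps a slow card from being confused with a slow work unit.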
Folding Rigs - None (25-Jun-2022)

gunnarre

Re: Proj 13420 same variability as 13418

Post by gunnarre »

These types of GPU work units (low atom count?) seem to benefit from being fed by a CPU with high single-core clocks. Typical gaming/graphics-oriented systems with a fast GPU and a stock-cooled CPU might actually make more PPD by stopping CPU folding while these WUs are running on the GPU, so the CPU can clock up to its maximum stock "boost" frequency on the single core polling the GPU. PPD/watt would also be better.

Production-oriented systems with a modest GPU and many CPU cores likely won't benefit from stopping CPU folding, especially if the CPU is well cooled and "boost" is switched off (so it runs all cores at the same frequency). In those systems, the CPU can be faster than the GPU, and in some cases adding more threads to the CPU gives more PPD than configuring the GPU for folding - at least until CUDA support hopefully reduces CPU usage while GPU folding.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Proj 13420 same variability as 13418

Post by bruce »

The variability HAS been reduced. Taken as a group, projects 13420 and 13421 are less variable. The really short WUs are now being assigned to slower GPUs and the faster ones are retained in 13420. That allows the average points for each group to be consistent with the GPU performance of half of the spectrum of FAH GPUs. It does not remove all variability when you consider the overall variability of a spectrum of P134xx assignments.

In this case, the union of projects 13420 and 13421 represents a wide variety of projects, just as Project MoonShot represents a wide variety of suggested protein fragments.
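
As a toy illustration of why splitting a mixed pool into two projects can reduce the spread within each one, the sketch below compares the coefficient of variation of two made-up groups with that of their union. The numbers are invented relative PPD values, not measurements from 13420 or 13421.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Invented relative PPD values for two kinds of work unit.
group_a = [0.70, 0.72, 0.68, 0.71]
group_b = [1.00, 1.02, 0.98, 1.01]
combined = group_a + group_b

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)
cv_combined = coefficient_of_variation(combined)

print(f"group A: {cv_a:.3f}, group B: {cv_b:.3f}, combined: {cv_combined:.3f}")
```

Each group on its own is much tighter than the pooled set, while the pooled spread is unchanged, which matches the point that the overall variability across P134xx assignments does not disappear.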
Ichbin3

Re: Proj 13420 same variability as 13418

Post by Ichbin3 »

Got another slow one right now
13420 (6171, 23, 0)
For all the people who say there aren't any ;- )
JohnChodera

Re: Proj 13420 same variability as 13418

Post by JohnChodera »

Thanks, Ichbin3! We're still working on this.

~ John Chodera // MSKCC