Let's cool things down a bit (GPU power limit to 50%)
Posted: Wed Mar 18, 2020 3:24 am
by Paragon
With the current influx of new donors, our compute power has outstripped work unit supply. There's a constant stream of new topics from people asking where all the work units went, and new donors come online every day saying "why is my computer just sitting idle?"
Anyone else think it might be good to throttle all the GPUs down to 50% power limit? This will slow individual work unit processing a bit, but should not impact overall throughput, because essentially there will just be less idle time on a per-machine basis. Going to 50% power limit can improve overall PPD/Watt by significantly dropping the GPU's power consumption. The only downside in the near-term is a 30% or so reduction in PPD.
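As a rough check on the PPD/Watt claim, here is the arithmetic as a small Python sketch. It assumes the card actually draws about 50% of its stock power when capped (a power limit is a ceiling, not a guaranteed draw) and uses the ~30% PPD reduction quoted above. The function name is purely illustrative.

```python
def ppd_per_watt_gain(ppd_fraction: float, power_fraction: float) -> float:
    """Ratio of PPD/Watt after throttling to PPD/Watt before.

    ppd_fraction:   throttled PPD / stock PPD (e.g. 0.70 for a ~30% loss)
    power_fraction: throttled power draw / stock power draw (assumed 0.50 here)
    """
    if not 0 < power_fraction <= 1:
        raise ValueError("power_fraction must be in (0, 1]")
    return ppd_fraction / power_fraction


gain = ppd_per_watt_gain(ppd_fraction=0.70, power_fraction=0.50)
print(f"PPD/Watt changes by a factor of {gain:.2f}")
```

Under those assumptions, keeping about 70% of the points for 50% of the power works out to roughly 1.4 times the PPD per watt.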
The upside is a good amount of power savings for the active clients, and less perceived down time (since it will take a bit longer for these machines to finish their work units before going idle again waiting for a new one).
When the server capacity catches up and the work units start flowing continuously, people who crave the full QRB PPD boost can throttle the power limit back up.
https://greenfoldingathome.com/2019/02/ ... wer-limit/
Re: Let's cool things down a bit (GPU power limit to 50%)
Posted: Wed Mar 18, 2020 3:51 am
by JimboPalmer
1) Everyone here is having issues, because if they weren't having issues, they wouldn't be here. So it looks worse than it is.
2) I am going to make up round numbers and then pretend they are real.
Since last Thursday, the number of people who have volunteered their computers to fold has grown 8 to 1, so 7 out of 8 folders are 5 days into the project or less.
Also since last Thursday, 5.5 times as many WUs are being completed. It would be wonderful if that were also 8 to 1, because that would hint we had no bottlenecks. But we do. I think we had 3 but have eliminated 1.
1) The number of WUs stockpiled to make it through the weekend was depleted by Saturday morning, but I believe that has been addressed and the WUs are out there.
2) I am still seeing HTTP errors, and I think these hint that the network infrastructure is inadequate. This is relatively cheap and quick to address, and I hope we quit seeing these errors soon. Days soon.
3) More worrisome is the "no WU matches your configuration" error. I think this error is the server being overwhelmed by requests. Servers are not cheap, and getting new, bigger, better ones is not quick.
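Taking JimboPalmer's self-described made-up round numbers at face value, the gap between 8 to 1 and 5.5 to 1 can be expressed per folder. The sketch below assumes the new folders have roughly the same hardware as the existing ones, so that an unconstrained system would have grown WU completions by the same factor as the folder count. The function name is hypothetical.

```python
def per_folder_ratio(folder_growth: float, wu_growth: float) -> float:
    """Average WUs completed per folder now, relative to before the influx.

    Assumes new folders are comparable to existing ones, so with no bottleneck
    WU completions would have grown by the same factor as the folder count.
    """
    if folder_growth <= 0:
        raise ValueError("folder_growth must be positive")
    return wu_growth / folder_growth


# Round numbers from the post: 8x the folders, 5.5x the completed WUs.
ratio = per_folder_ratio(folder_growth=8.0, wu_growth=5.5)
print(f"Average folder completes {ratio:.0%} of its pre-influx WU count")
```

With those numbers the average folder completes about 69% as many WUs as before, so roughly 31% of the potential work is being lost to the bottlenecks listed above, whichever of them is responsible.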
Now, I am on the outside looking in at a system I am not an expert in. I could be wrong; I am almost certainly wrong in detail. But the server bottleneck may persist a while, months a while. I hope not, but I think so.
Just my Silly Wild Ass Guess; your mileage may vary. I am not any part of F@H, except I have no social life and so try to help here.
Re: Let's cool things down a bit (GPU power limit to 50%)
Posted: Wed Mar 18, 2020 4:08 am
by bruce
Once someone does get a WU, it needs to be analyzed by their hardware using a FAHCore program. Each FAHCore program was designed to use all of the resources that are being donated. The GPU works on the WU until it's finished and it gets returned. A person with a modern, expensive GPU will complete the WU faster than a person with more limited hardware (of course), but the software will use the GPU to the maximum possible.
GPUs do not run at 50%. The drivers in your computer can be tweaked to limit temperature, power, etc. but FAH cannot adjust those settings; you can.
Similarly, if you fold with your CPU, and it has (say) 8 "threads" (or "CPUs"), they can all be used in parallel, or you can decide to configure the FAH software to use only 6 of them, leaving the other 2 idle or free to work on other tasks.
The FAH server can be configured to only make assignments to specific classes of GPU or to CPUs running with specific numbers of threads, but if your system doesn't match those requirements, you'll get a message saying the server doesn't have any WUs that match your configuration. (You can also get that error message for other reasons.)
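For NVIDIA cards, one common way to make the driver-side change bruce describes is the `nvidia-smi` tool's `-pl` (power limit) option, which needs administrator/root rights and typically resets after a reboot. The Python sketch below is an illustration under those assumptions, not an official FAH tool: it reads the card's default and minimum limits with `nvidia-smi --query-gpu`, computes half of the default (clamped to the card's minimum, since the driver refuses values outside its allowed range), and builds the command to run. The helper names are hypothetical, and AMD cards need different tools.

```python
import math
import subprocess


def half_power_limit_watts(default_watts: float, min_watts: float, fraction: float = 0.5) -> int:
    """Whole-watt target: `fraction` of the default limit, never below the card's minimum."""
    return math.ceil(max(default_watts * fraction, min_watts))


def query_power_limits(gpu_index: int) -> tuple[float, float]:
    """Read (default, minimum) power limits in watts for one GPU via nvidia-smi."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "-i", str(gpu_index),
            "--query-gpu=power.default_limit,power.min_limit",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    default_w, min_w = (float(field) for field in result.stdout.strip().split(","))
    return default_w, min_w


def power_limit_command(gpu_index: int, watts: int) -> list[str]:
    """Build (but do not run) the nvidia-smi command that sets the power limit."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]
```

A typical use would be `default_w, min_w = query_power_limits(0)`, then running the list returned by `power_limit_command(0, half_power_limit_watts(default_w, min_w))` from an elevated shell. Building the command separately keeps the privileged step visible. If a card reports these fields as not available, the `float` conversion fails, which is a reasonable hint that the limit can't be changed that way on that card.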
Re: Let's cool things down a bit (GPU power limit to 50%)
Posted: Wed Mar 18, 2020 10:24 am
by Paragon
bruce wrote: "The drivers in your computer can be tweaked to limit temperature, power, etc. but FAH cannot adjust those settings; you can."
That's exactly what I'm advocating. If we set the GPU power limit in the driver to 50 percent, it will improve folding efficiency while reducing the server load during this high demand time period. Throughput won't be affected until there are enough WUs being served to eliminate the idle gaps between work units.
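Paragon's throughput argument can be sketched as a simple model: a machine's daily output is capped by whichever is smaller, what it can compute or what the servers hand it. All numbers below are hypothetical. The energy estimate assumes the GPU draws its full (or capped) power only while busy and roughly nothing while idle, which overstates the savings somewhat because idle cards still draw some power.

```python
def wus_per_day(capacity: float, supply: float) -> float:
    """Completed WUs/day: limited by compute capacity or server supply, whichever is lower."""
    return min(capacity, supply)


def busy_energy_kwh(capacity: float, supply: float, power_watts: float) -> float:
    """Daily energy spent folding, assuming power is drawn only while a WU is running."""
    busy_fraction = wus_per_day(capacity, supply) / capacity
    return busy_fraction * 24 * power_watts / 1000


# Hypothetical machine: 10 WUs/day at stock power (200 W), 8 WUs/day at a 50% limit (100 W),
# while the servers can only supply 4 WUs/day.
supply = 4.0
for label, capacity, watts in [("stock", 10.0, 200.0), ("50% limit", 8.0, 100.0)]:
    done = wus_per_day(capacity, supply)
    kwh = busy_energy_kwh(capacity, supply, watts)
    print(f"{label}: {done:.0f} WUs/day, {kwh:.2f} kWh/day")
```

In this example both settings complete 4 WUs/day, but the throttled card spends about 1.20 kWh/day instead of 1.92. Once supply rises above 8 WUs/day, the throttled machine becomes the bottleneck and completes fewer WUs than the stock one, which is the point at which the original post suggests raising the power limit back up.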