Starman157 wrote:@MeeLee
I suspect that the Windows scheduling issues are more of a core-quality issue than a cache issue. I'm assuming, of course, that the Windows scheduler is completely ignorant of the programming of a particular thread and of any possible parallelism that could be achieved by stuffing another thread onto an already busy core. As far as the scheduler goes, it only sees that there is a process with something to do. OK, look around for "who" can do it. This is where I think the scheduler gets it wrong. The quality scores (as reported by CTR, "Clock Tuner for Ryzen") on my 3950x range from 193 down to 136, and I suspect Windows takes these into account when loading up cores. As such, I'm "guessing" that it would rather load a second thread onto a busy core than use one of the lower-quality cores that isn't doing anything (or at least is doing very little). All I'm doing now is forcing the issue by ensuring that 15 of my best cores are used, with the background Windows services et al. being forced onto the last remaining core. I've run across a free program, Process Lasso, that lets me prioritize the various processes, since FAH doesn't have affinity control in the client specifically.
However, I am forcing the GPU threads onto the already busy CPU threads as second threads, since I suspect the workloads are quite different and shouldn't create an FP32 contention issue. Also, the GPU's CPU needs are fairly bursty and come at odd times (mainly around checkpoints), so the overall impact of GPU interruptions should be minimal.
Yes, it's difficult measuring all this with so many moving "parts". As such, I'm only interested in maximizing efficiency with the least user input and "futzing". So running down your provided list:
1. Manual is not the way to go. PBO, set and forget. You don't get the same granular capability when going manual, and you only end up creating a LOT of heat. PBO does a much better job than I can and I've been overclocking CPUs since 1985.
2. BIOS automatics only when absolutely necessary. I've hand-tuned the memory timings along with the IF timings too.
3. I don't have control over that other than to ensure it has the latest "production" BIOS. I carefully selected the mobo for the 3950x believing it's a more-than-adequate match (I'm using an Asus ROG Crosshair VIII Hero WiFi). The power delivery stages, as determined by others, are more than adequate for stable overclocking of a 3950x.
4. Power settings. Nope. Full power all the time.
5. Cooling the CPU. Thermaltake Water 3.0 with a 360mm radiator and three 120mm fans at full speed.
6. Never pause a WU unless absolutely necessary; after all, PPD is calculated from TIME.
7. Already know about PPD differences.
8. Ah, the rest of the Windows crap. Turned off if I can do without, or relegated to the unused core if I cannot.
The case is a Lian Li O11 Dynamic XL with the sides off; it's really only meant to "hold" the components. Including the 3 fans for the rad, there are 9 fans total in the case moving air around to various areas. I learned early on that full-time folding creates a lot of heat, so thermals are an important consideration in my builds (and always have been). Powering all this is a Seasonic Prime Titanium 850W, which also happens to be the lowest recommended power level for a Radeon 6900 XT (which runs maxed at almost 2.7GHz at 60C, 80C Tjunction), presently consuming 241W (although I've seen it as low as 200W).
The 3950x runs at 70-75C (depending on the WU), 4.2GHz (thanks, PBO) at 1.3V or so.
I've taken many considerations into account for this folding build. The only thing left to maximize efficiency was affinity control, hence my request, figuring that native control within the application that needs controlling would be better than other solutions (Process Lasso). I'd still like the FAHClient to do what is needed, since I'm brute-forcing after the fact and there's a minor performance impact at process startup (before Process Lasso gets its hands on things).
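FWIW, if you want to script the pinning yourself rather than wait for Process Lasso to catch a new FahCore, a rough Python sketch using the third-party psutil package can do it. The FahCore name, the extra service name, and the core numbers below are placeholders for whatever your own setup actually runs:

```python
# pip install psutil   (run as Administrator so it can touch FAH's processes)
import psutil

FAH_CORE_NAME = "FahCore_a8.exe"     # placeholder; match the FahCore your CPU slot spawns
FOLDING_CPUS  = list(range(0, 30))   # placeholder: logical CPUs reserved for folding
SPARE_CPUS    = [30, 31]             # placeholder: leftover logical CPUs for background stuff

for proc in psutil.process_iter(["name"]):
    try:
        if proc.info["name"] == FAH_CORE_NAME:
            proc.cpu_affinity(FOLDING_CPUS)      # pin the folding process to the chosen CPUs
        elif proc.info["name"] == "SearchIndexer.exe":   # example background service to shove aside
            proc.cpu_affinity(SPARE_CPUS)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass  # process exited or is protected; skip it
```

Since a new FahCore process is spawned for each WU, you'd have to re-run (or loop) something like this whenever a WU starts, which is the same startup gap Process Lasso has.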
I'd have to disagree with you on some points.
Windows actually is aware of which cache level a program's data is buffered in. It's more aware than we think!
It even predicts which data to load into the cache before the program calls for it.
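You can see the prefetcher and cache-locality effect yourself with a quick sketch like the one below (needs NumPy; the array size is an arbitrary pick that just needs to be bigger than the 64MB L3 on these chips, and the exact timings will vary per machine):

```python
import time
import numpy as np

N = 40_000_000                       # ~160 MB of int32, larger than a 3950x's 64 MB L3
data = np.arange(N, dtype=np.int32)

seq_idx = np.arange(N)               # sequential access: easy for the hardware prefetcher
rnd_idx = np.random.permutation(N)   # random access: defeats the prefetcher, lots of cache misses

for label, idx in (("sequential", seq_idx), ("random", rnd_idx)):
    t0 = time.perf_counter()
    total = data[idx].sum(dtype=np.int64)   # gather then reduce
    print(f"{label:10s}: {time.perf_counter() - t0:.3f} s (sum={total})")
```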
1- Manual overclocking on Ryzen is a skill. As long as you have consistent data to crunch (e.g. CPU folding of one specific WU or project), manual overclocking is much better than PBO.
You can increase the CPU frequency by about 5-15% over PBO, because the core clocks are fixed rather than constantly fluctuating.
On my 3900x, for instance, PBO runs around 3.85GHz, while I can bump it to 3.92GHz.
Other projects run at 3.58GHz, and I can bump them to 3.87GHz with a manual overclock.
The problem is that when a project uses a power-hungry instruction set extension like AVX, the CPU may end up effectively undervolted and error out.
That's where PBO becomes interesting, especially when low-intensity CPU projects are mixed with high-intensity ones. A quick calculation with the clocks quoted above shows how project-dependent the payoff of a fixed overclock is (sketch below):
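```python
# Uplift of the manual overclock over PBO, using the clocks quoted above
clocks = [(3.85, 3.92), (3.58, 3.87)]   # (PBO GHz, manual GHz) for two of the projects mentioned

for pbo, manual in clocks:
    gain = (manual / pbo - 1) * 100
    print(f"PBO {pbo:.2f} GHz -> manual {manual:.2f} GHz: +{gain:.1f}%")
# roughly +1.8% and +8.1%, so how much a fixed overclock buys you depends heavily on the project
```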
6- People pause WUs when they're trying to compare performance between hardware, so they can be sure the same project/WU runs on both. For instance, to measure whether an Asus RTX 2060 is as fast as an MSI 2060, and so on...
The small pause introduced (sometimes with a PC reset) will lower PPD and jinx the score.
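The time dependence is easy to see from the commonly cited quick-return-bonus formula; the base points, k factor, and deadline below are made-up illustration numbers, not any real project's:

```python
import math

def ppd(base_points, k_factor, deadline_days, elapsed_days):
    """PPD under the commonly cited F@H quick-return bonus:
    credit = base * max(1, sqrt(k * deadline / elapsed)), scaled to a day."""
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / elapsed_days))
    return base_points * bonus / elapsed_days

# made-up project numbers, purely to show the shape of the curve
base, k, deadline = 10_000, 0.75, 3.0
for elapsed in (0.20, 0.25, 0.30):    # days taken to return the WU
    print(f"{elapsed:.2f} d -> {ppd(base, k, deadline, elapsed):,.0f} PPD")
# stretching 0.25 d to 0.30 d (a pause or a reboot) costs well over 20% PPD here
```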