Ryzen 9 3950x Benchmark Machine: What should I test for you?

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

Paragon
Posts: 137
Joined: Fri Oct 21, 2011 3:24 am
Hardware configuration: Rig1 (Dedicated SMP): AMD Phenom II X6 1100T, Gigabyte GA-880GMA-USB3 board, 8 GB Kingston 1333 DDR3 RAM, Seasonic S12 II 380 Watt PSU, Noctua CPU Cooler

Rig2 (Part-Time GPU): Intel Q6600, Gigabyte 965P-S3 Board, EVGA 460 GTX Graphics, 8 GB Kingston 800 DDR2 RAM, Seasonic Gold X-650 PSU, Arctic Cooling Freezer 7 CPU Cooler
Location: United States

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

PantherX wrote: Great write-up, and I look forward to the comparison between SMT on and off.

Back when the i7-860 was released, I tested it with the same WUs* to see whether turning HT on or off made a difference. I found a 12% to 25% reduction in TPF going from 4 threads to 8 threads (IIRC).

I have two suggestions:
1) Can you please compare the TPF from the same WU*? IMO, PPD is great, but what I look at is TPF, as it provides a better representation. PPD varies with internet speed and any server issues (fingers crossed, there won't be any outages).
2) You can capture 2 WUs*, one from a small Project (14677) and one from a large Project (14236), to see how well or poorly each project scales. Of course, nothing prevents you from capturing WUs from different atom ranges if you feel like it.
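Since TPF is the number to compare, here is a minimal Python sketch of turning two TPF readings for the same WU into a percent reduction. The function names are hypothetical, and it assumes TPF is noted as mm:ss or hh:mm:ss.

```python
def tpf_to_seconds(tpf: str) -> int:
    """Convert a TPF string such as '02:30' (mm:ss) or '01:02:30' (hh:mm:ss) to seconds."""
    seconds = 0
    for part in tpf.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def tpf_reduction_percent(tpf_before: str, tpf_after: str) -> float:
    """Percent reduction in time per frame; positive means the second run is faster."""
    before = tpf_to_seconds(tpf_before)
    after = tpf_to_seconds(tpf_after)
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical readings for the same captured WU at 4 and 8 threads.
    print(f"{tpf_reduction_percent('04:00', '03:00'):.1f}% lower TPF")  # 25.0% lower TPF
```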

*Generally speaking, you can't predict which WU your system will get. However, once you get a WU from a Project you like, you can capture it and benchmark it to your heart's content without any hindrance to F@H. I used this method with an early V7 release, and I assume it still works:
1) Once you have spotted the elusive WU you want to capture, pause the CPU slot and copy the entire %AppData%\FAHClient folder (let's call the copy Benchmark)
2) Resume the CPU slot and set it to Finish
3) After the CPU slot has finished, disconnect the LAN cable (this is a fail-safe step)
4) Exit FAHClient
5) Copy the contents of the %AppData%\FAHClient folder again (let's call this copy Live)
6) Delete the contents of %AppData%\FAHClient and copy the contents of Benchmark into it
7) Modify the config.xml file as follows (I have added comments next to each setting that I think will help you get consistent results with the highest performance):

<config>
  <!-- Folding Core -->
  <checkpoint v='30'/>                     <!-- Frequent checkpoints slow down CPU processing; the maximum (30 minutes) gives the best performance -->
  <core-priority v='low'/>                 <!-- One step above idle, so idle-priority processes in Windows don't impact folding performance -->

  <!-- Slot Control -->
  <pause-on-start v='true'/>               <!-- Lets you start the client and ready the system before you pull the trigger -->
  <power v='full'/>                        <!-- Just in case, to make sure the client doesn't do anything funny -->

  <!-- User Information -->
  <user v='Benchmarking_In_Progress'/>     <!-- Makes it easy to tell that this is a benchmark and not a real WU -->

  <!-- Work Unit Control -->
  <dump-after-deadline v='false'/>         <!-- Lets you fold this WU long past its deadline -->
  <next-unit-percentage v='100'/>          <!-- Keeps the client from disturbing CPU folding at 99% -->

  <!-- Folding Slots -->
  <slot id='0' type='CPU'>
    <cpus v='4'/>                          <!-- Start with 1 and work through _r2w_ben's list of CPU values -->
  </slot>
</config>
8) Once you have finished benchmarking, exit FAHClient, delete the contents of %AppData%\FAHClient, and then copy the contents of Live into it
9) Plug the LAN cable back in and start FAHClient, which should resume as if nothing ever happened.
Do note that if the WU was downloaded while the CPU slot was set to 16 CPUs, it will not run on any value higher than 16 in the Benchmark phase.
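The folder juggling in steps 1, 5, 6 and 8 could be scripted. Here is a minimal sketch using Python's standard shutil module, assuming the Windows %AppData%\FAHClient location from the steps above. The helper names and the snapshot paths in the usage comment are hypothetical, and it should only be run while FAHClient is exited (or with the slot paused, for step 1).

```python
import os
import shutil
from pathlib import Path


def default_data_dir() -> Path:
    """FAHClient data folder under %AppData% (Windows); falls back to the current dir."""
    return Path(os.environ.get("APPDATA", ".")) / "FAHClient"


def snapshot(data_dir: Path, dest: Path) -> None:
    """Copy the whole data folder to dest (e.g. a 'Benchmark' or 'Live' copy)."""
    if dest.exists():
        shutil.rmtree(dest)  # replace any older snapshot with the same name
    shutil.copytree(data_dir, dest)


def restore(snapshot_dir: Path, data_dir: Path) -> None:
    """Delete everything inside data_dir, then copy a snapshot's contents into it."""
    data_dir.mkdir(parents=True, exist_ok=True)
    for entry in list(data_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    shutil.copytree(snapshot_dir, data_dir, dirs_exist_ok=True)  # needs Python 3.8+


# Usage (hypothetical snapshot locations):
#   data = default_data_dir()
#   snapshot(data, Path(r"D:\FAH\Benchmark"))  # step 1
#   snapshot(data, Path(r"D:\FAH\Live"))       # step 5
#   restore(Path(r"D:\FAH\Benchmark"), data)   # step 6
#   restore(Path(r"D:\FAH\Live"), data)        # step 8
```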

I hope this helps you out and you can speed up the workflow :)
This is an excellent suggestion, and it's exactly what I'd like to do. However, I tried it a few times, and on client restart (I relaunch after editing the config file to set the new CPU count), it loads the benchmark work unit with whatever the CPU setting happened to be when that work unit was downloaded. For example, if the work unit was downloaded with 16 threads, it doesn't matter what the cpus setting is in the config file: on relaunch, it folds with 16 threads. I tried this with 1-, 2-, 3-, and 16-thread work units (downloaded with those settings). Each time I let the unit finish, closed the client, deleted the FAHClient folder contents, pasted in the files from the Benchmark folder, edited the config file to specify a different # of CPUs, and relaunched, only to find it running with the previous # of threads!

Any thoughts?
Paragon

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

I'll add that once the benchmark unit finishes (I let one finish to see what happens, even though that work unit had already been completed before I copied the guts of the folder), the client pulls down a new work unit and folds it with the intended thread count.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by PantherX »

Humm... I would have expected that to work: you're using CPU values below 16, so there shouldn't be any issues. Can you please post the log file?

Alternatively, you can capture the WU and FahCore_a7 and create a folder called FahCore_Testing. In that folder, place FahCore_a7 and a folder called "Work" containing wudata_01.dat (the captured WU), then start it from the CMD prompt with .\FahCore_a7 -suffix 01 -np X, where X can be any value. I tested that on my system; it took a bit of trial and error to get the directory layout sorted out, but after that it just runs without any issues and without the client. So this could be a more portable way of benchmarking :)
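The standalone approach could be looped over several thread counts. Here is a minimal Python sketch using subprocess; it only uses the -suffix and -np arguments quoted above and assumes the FahCore_Testing layout just described. The helper names and the wall-clock timing are my own additions: a full WU can take hours, elapsed time is a coarse proxy (the TPF in the core's own output is the better number), and I'm assuming you'd need to restore a clean copy of Work\wudata_01.dat before each run, since the core writes its progress there.

```python
from __future__ import annotations

import subprocess
import time
from pathlib import Path


def core_command(core: str, threads: int, suffix: str = "01") -> list[str]:
    """Build the command line described above: FahCore_a7 -suffix 01 -np X."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [core, "-suffix", suffix, "-np", str(threads)]


def timed_run(test_dir: Path, threads: int) -> float:
    """Run the core once inside test_dir and return elapsed wall-clock seconds."""
    test_dir = test_dir.resolve()
    core = str(test_dir / "FahCore_a7")  # absolute path, so lookup doesn't depend on cwd
    start = time.monotonic()
    subprocess.run(core_command(core, threads), cwd=test_dir, check=True)
    return time.monotonic() - start


# Usage (hypothetical folder; restore a clean Work\wudata_01.dat before each run):
#   for n in (1, 2, 4, 8, 16):
#       print(n, timed_run(Path(r"C:\FahCore_Testing"), n))
```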
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

Do we have some initial PPD values already? Interested to see how much the QRB affects these CPUs.
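To see why the QRB matters here, a sketch using the QRB formula as it is commonly published: points = base × max(1, sqrt(k × deadline / elapsed)), with times in days. The base points, k factor and deadline below are placeholders, not values for any real project.

```python
import math


def qrb_points(base: float, k: float, deadline_days: float, elapsed_days: float) -> float:
    """Credit for one WU: base points times the quick return bonus (never below 1x)."""
    bonus = math.sqrt(k * deadline_days / elapsed_days)
    return base * max(1.0, bonus)


def points_per_day(base: float, k: float, deadline_days: float, elapsed_days: float) -> float:
    """PPD if WUs like this one are completed back to back."""
    return qrb_points(base, k, deadline_days, elapsed_days) / elapsed_days


# Placeholder WU: 10,000 base points, k = 0.75, 3-day deadline (not a real project).
slow = points_per_day(10_000, 0.75, 3.0, 0.5)   # 12 hours per WU
fast = points_per_day(10_000, 0.75, 3.0, 0.25)  # 6 hours per WU, i.e. TPF halved
print(f"PPD ratio: {fast / slow:.2f}")  # PPD ratio: 2.83
```

Because credit grows with 1/sqrt(time) and PPD divides by time again, PPD scales roughly with speed^1.5 while the bonus applies, so a 2x faster CPU shows up as about 2.8x the PPD.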
Paragon

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

MeeLee wrote: Do we have some initial PPD values already? Interested to see how much the QRB affects these CPUs.
Sorry all, I've been busy with the day job and then had a vacation thrown in the mix. But yes, I have results! Still working on the TPF stuff, but here is part 1 (from before) and part 2 (new) of the article. Part 2 has some interesting plots of the work unit variation seen at each thread setting of the CPU. I did 5 tests at each # of threads setting to make sure I'm not getting too thrown off by work unit variation.
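With five runs per thread setting, the mean and spread can be summarized in a few lines. A minimal sketch using Python's statistics module; the PPD numbers are made up. As a rule of thumb (not a formal test), if the coefficient of variation is well below the gap between two settings, the difference probably isn't just WU variation.

```python
from __future__ import annotations

import statistics


def summarize(ppd_runs: list[float]) -> tuple[float, float, float]:
    """Mean, sample standard deviation and coefficient of variation (%) of repeated runs."""
    mean = statistics.mean(ppd_runs)
    stdev = statistics.stdev(ppd_runs)  # raises StatisticsError with fewer than 2 runs
    return mean, stdev, 100.0 * stdev / mean


# Made-up PPD values for five runs at one thread setting.
mean, stdev, cv = summarize([100_000, 110_000, 90_000, 105_000, 95_000])
print(f"mean {mean:,.0f} PPD, stdev {stdev:,.0f}, CV {cv:.1f}%")
```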

Part 1: https://greenfoldingathome.com/2020/05/ ... f-threads/

Part2: https://greenfoldingathome.com/2020/08/ ... variation/

All things considered, the 3950X is a beast! I am currently re-doing all the tests with SMT off to understand the effect of hyperthreading on the results. That will be part 3 of the review. Part 4 will look at the effect of Core Performance Boost (turbo boost) on efficiency and PPD.
MeeLee

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

Also make sure you record the average CPU frequency and wattage.
It'll be important for some people when deciding whether SMT on vs. off is worth the extra power consumption.
The average CPU frequency might rise with SMT off, and it would be interesting to see how performance is affected by running half as many threads, with double the L-cache per thread and a higher boost frequency...
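If frequency and wall power are logged at regular intervals during each run, averaging them is simple. A sketch assuming a hypothetical log of (clock, wall power) readings, however they're collected; the Sample type is made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """One reading from a monitoring log (hypothetical format)."""
    core_mhz: float    # average clock of the loaded cores
    wall_watts: float  # power measured at the wall


def run_averages(samples: list[Sample]) -> tuple[float, float]:
    """Average clock (MHz) and wall power (W), assuming evenly spaced samples."""
    if not samples:
        raise ValueError("no samples recorded")
    n = len(samples)
    return (sum(s.core_mhz for s in samples) / n,
            sum(s.wall_watts for s in samples) / n)
```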
MeeLee

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

I forgot to ask:
I saw that you tested with PBO disabled.
The real benefits of disabling SMT come with PBO enabled.
Simply disabling SMT with PBO off effectively just halves your CPU thread count.
However, with PBO enabled, on a motherboard that can boost the CPU to at least 3.8 GHz on all cores (150 W, or an 8-pin CPU plug), in theory it should boost the CPU to 4.1 GHz with SMT disabled.
It's the difference between running a higher core count at a lower frequency vs. a lower core count at a higher frequency.

Then there's the question of how much performance is affected by RAM.
We know FAH doesn't need very fast RAM (except if you have an IGP using shared RAM).
However, it's still interesting to know what the average PPD is with memory at stock (2133 MHz, XMP disabled) and with XMP enabled.
If you have one of the first batch of Ryzen 9 3900-series CPUs from when they first came out, your Infinity Fabric speed might max out at 1600 MHz (max RAM 3200 MHz).
If you have a newer one, released somewhere between six months and a year ago, the peak Infinity Fabric speed is around 1800 MHz (max RAM 3600 MHz). Faster RAM than that would not significantly improve performance.
Some of the newest 3900-series CPUs tested actually get an Infinity Fabric that can run past 1866 MHz (3733 MHz RAM).

The settings are endless, but if you have a way to test the CPU at stock 2133 MHz vs. 3600 MHz (make sure the Infinity Fabric is set to half the RAM value, i.e. 1800 MHz), those are the settings most people would run their CPUs at.
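The "half the RAM value" rule as a tiny Python helper. A sketch assuming DDR4 ratings (3600 for DDR4-3600) and the rough FCLK ceilings quoted above (1600, 1800, about 1866 MHz), which vary from chip to chip; the function names are made up. Note that DDR4-2133 gives 1066.5 MHz, which is why it's rounded to 1066 in the list below.

```python
def fclk_for_ram(ddr_rating: int) -> float:
    """Infinity Fabric clock (MHz) for 1:1 mode: half the DDR4 rating."""
    return ddr_rating / 2


def stays_coupled(ddr_rating: int, max_fclk_mhz: int) -> bool:
    """True if the chip's highest stable FCLK can keep this RAM speed at 1:1."""
    return fclk_for_ram(ddr_rating) <= max_fclk_mhz


for ram, limit in ((3200, 1600), (3600, 1800), (3733, 1800)):
    print(ram, fclk_for_ram(ram), stays_coupled(ram, limit))
```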

To do this, you'll need to disable 'spread spectrum' and, in some cases, set the FSB speed to 100.00 MHz manually, as some motherboards overclock it from the factory by a fraction of a MHz.
Motherboard manufacturers normally shouldn't touch this value, but leaving it on 'auto' might let the board adjust it and skew your results.


To summarize, set values to most common settings:
FSB = 100.00 MHz,
Spread spectrum = OFF

Then record power values at the settings below, with SMT ON/OFF:

- XMP off = 2133 MHz RAM, IF = 1066 MHz
- XMP on = 3600 MHz RAM, IF = 1800 MHz

and:

- PBO = off (CPU frequency = 3.50 GHz)
- PBO = on (CPU frequency depends on the motherboard, and needs to be recorded for XMP off and XMP on).

That should be 8 charts, and could become more if you want to test these values with different WU counts...
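The suggested matrix is three on/off switches, so 2 × 2 × 2 = 8 configurations. A sketch that enumerates them so none get skipped; the labels simply restate the settings above. Adding thread counts as a fourth factor multiplies the count accordingly.

```python
from __future__ import annotations

from itertools import product

SMT = ("SMT on", "SMT off")
RAM = ("XMP off: 2133 MHz RAM, IF 1066 MHz", "XMP on: 3600 MHz RAM, IF 1800 MHz")
BOOST = ("PBO off: 3.50 GHz", "PBO on: record the frequency")


def build_test_matrix() -> list[tuple[str, str, str]]:
    """Every combination of the three on/off switches: 2 x 2 x 2 = 8 runs."""
    return list(product(SMT, RAM, BOOST))


for i, config in enumerate(build_test_matrix(), start=1):
    print(i, " | ".join(config))
```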
Paragon

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

Sounds like a plan. I've already started running with the BIOS-controlled boost on. I'm going to do it all over again and make two more curves (SMT on and off with boost enabled). Right now, with SMT on, running the 1st of 32 settings (CPU threads = 1), I am seeing around 12,500 PPD vs. the roughly 8,800 PPD it got before with 1 thread. Clocks on the active core are hovering around 4.35 GHz, and CPU temp is around 72 C (as opposed to 55 C with boost off). Power is up to 106 watts (vs. 75 watts with boost off).

So that means, for 1 thread, with SMT enabled:

Boost off (3.5 GHz) efficiency = 8840 PPD / 75 watts = 118 PPD/Watt
Boost on (4.35 GHz) efficiency = 12500 PPD / 106 watts = 118 PPD/Watt

So in other words, production is up by about 40% and the efficiency is the same. I wonder if this trend will hold all the way to 32 threads? It'll probably take me about a month to find out...
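The same arithmetic in Python, using only the numbers above. The exact production gain works out to about 41%, consistent with "about 40%".

```python
def ppd_per_watt(ppd: float, watts: float) -> float:
    """Efficiency at the wall."""
    return ppd / watts


boost_off = ppd_per_watt(8840, 75)   # 1 thread, 3.5 GHz
boost_on = ppd_per_watt(12500, 106)  # 1 thread, 4.35 GHz
gain = 100 * (12500 / 8840 - 1)      # production increase in percent
print(f"{boost_off:.0f} vs {boost_on:.0f} PPD/W, production up {gain:.0f}%")
# 118 vs 118 PPD/W, production up 41%
```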

I can definitely run the memory / infinity fabric test as well. I've got it all running linked at 3600 MHz / 1800 MHz (RAM / fabric) at the moment, so it should be relatively simple to go back to baseline. I'll save that nugget for the end (I'll probably do it with whatever thread / boost / SMT setting is best).
Paragon

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

Ok, Part 3 of the review is up. This contains the plots of SMT (Hyperthreading) on vs. off for the Ryzen 9 3950x.

https://greenfoldingathome.com/2020/08/ ... threading/

The big takeaway here is that SMT's virtual cores (Hyperthreading) really help folding on this Ryzen processor. Going from 16 threads (all physical cores cranking away) to 32 threads (all cores cranking with 2 threads per core) resulted in a 30% performance improvement. That's pretty significant. Even more interesting was that the efficiency went up as well (PPD/Watt @ the wall).

Finally, I found that running the CPU client with one physical core unloaded (i.e. 15 threads for SMT Off, 30 threads for SMT On) offered noticeably better performance and efficiency than fully maxing the processor out (16 / 32). Has anyone else noticed this? I'd normally say this was just due to work unit variation, but I ran two independent tests with 5 averages per test, and the trend is very clear.
gunnarre
Posts: 559
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by gunnarre »

MeeLee wrote: However, with PBO enabled, on a motherboard that can boost the CPU to at least 3,8Ghz on all cores (150W, or 8pin CPU plug), in theory it should boost the CPU to 4,1Ghz with SMT disabled, and PBO enabled.
PBO - Precision Boost Overdrive = Increasing the power limits of the CPU socket beyond the regular AM4 spec, but still within the motherboard manufacturer's supplied settings. Might not give a measurable effect on a CPU that has plenty of power headroom (like a Ryzen 5 3600 on a budget B450 board), but might give some more multicore performance on a board where the CPU is close to the nominal power of the socket (like a Ryzen 9 3950x on an X570 board with good VRMs). PBO is not enabled by default.

Core Performance Boost = Dynamic frequency adjustment of individual cores based on load. Asus calls it "EZ Tuning: Normal mode" or "TPU off", MSI calls it "Cool'n'quiet", I think. Can give much higher single-core (and few-core) frequencies, and better efficiency, than running all cores at the same frequency. Core Performance Boost is enabled by default.

We're talking about the second thing here, right, since it's about frequency?
Online: GTX 1660 Super + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 1050 Ti 4G OC, RX580
Paragon

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

gunnarre wrote:
We're talking about the second thing here, right, since it's about frequency?
That's correct. I disabled Core Performance Boost (frequency scaling on a per-core basis) to lock the processor at 3.5 GHz for all of my SMT on vs. SMT off testing. I recently turned it back on to let the clock rate climb and see the effect (rerunning all SMT on vs. off tests). Another thing I can do after all of this testing is done is enable PBO on my X570 board, which does have the sweet VRMs and 8 + 4 pin CPU power. I've already played a few games with PBO on and some manual frequency offsets, and was able to sustain a 4.5 GHz all-core Prime95 run (power consumption was nearly twice stock, and even my Noctua dual-tower cooler was hitting 95 C). I expect the efficiency would be horrible if I tried folding like that.
MeeLee

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

It's to be expected that running more cores at a lower frequency is going to be more efficient.
My Atomic Pi has an Atom x5-Z8350, and runs 4 cores at 1.69 GHz.
Sixteen of them together cost about the same as a Ryzen 9 3900X-3950X CPU, are about as fast, and consume about the same power as well.
It's all about core efficiency.
MeeLee

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by MeeLee »

Also,
you mentioned PBO was disabled, right?
By any chance, did you record the CPU frequency (not from Task Manager, but from a program like CPU-Z) when enabling SMT?
I know that with PBO, when the CPU is at 50-75% load, it cuts the CPU frequency to save power.
I just want to know whether the core frequency stays at 3.5 GHz in those moments.
Second thing to note:
if you get a WU while your client is set to 15 cores and then raise the core count, there's a chance the client has to finish that WU on those 15 cores, and only really uses the remaining cores once a new WU is downloaded.

These are the 2 possible reasons I can think of why 16 cores could be faster than 16-28 threads for folding.
In theory it wouldn't make sense that, at a fixed 3.5 GHz, more cores would be slower than fewer, unless the client keeps using only 15/16 threads until the WU is processed, and the loss of speed comes from moving cached data around between Core Chiplet Dies.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Neil-B »

One thing to consider (and I am in no way an expert in this): I believe that, because of the way GROMACS works with higher-count slots and its use of PME, on some WUs my 24-thread slot is actually folding as a 20-thread slot with 4 PME threads. I don't quite know what this does for throughput, or how this extra layer of complexity changes the workload on the processor, but it may explain some of the odder benchmarks I have spotted over time?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Paragon

Re: Ryzen 9 3950x Benchmark Machine: What should I test for

Post by Paragon »

MeeLee wrote: Just wanting to know if the core frequency stays at 3,5Ghz in these moments.
If you are getting a WU, when your client is set to 15 cores, and you up the core count, there's a chance, that the client needs to finish the WU on those 15 cores, and only really uses the remaining cores when a new WU is downloaded.
I watched the CPU frequency in AMD's built-in monitoring tool (Ryzen Master) and confirmed it stayed at 3.5 GHz for all tests, SMT on and off. When folding is not running, the frequency does drop, but it stays at 3.5 GHz for cores loaded over 75%. So basically any core with a folding job was at 3.5 GHz, regardless of the SMT setting.

Also for all tests, I don't record results after I switch a thread setting until the next work unit shows up. Most work units don't change the # of threads being used after changing the CPU slot config. So, all results reported are "fresh" work units that were downloaded with the new CPU thread setting.
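The "wait for a fresh WU" rule amounts to a filter on the recorded results. A sketch assuming each result is noted along with the thread count the slot had when that WU was downloaded; the WuResult record and its field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WuResult:
    """One completed WU (hypothetical record, e.g. copied from the client log)."""
    project: int
    download_threads: int  # slot CPU count when the WU was downloaded
    ppd: float


def fresh_results(results: list[WuResult], setting: int) -> list[WuResult]:
    """Keep only WUs that were downloaded with the slot at `setting` threads."""
    return [r for r in results if r.download_threads == setting]
```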