
F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 3:17 pm
by John_Alan
How much is F@H able to take advantage of CPU memory bandwidth? I am folding with a 2nd generation AMD Epyc 7302p CPU and it has an 8-channel DDR4-3200 memory controller. I currently have only 4 sticks of DDR4 in the machine, thus the CPU is only using 4 memory channels to talk with the memory. With F@H, would I see any benefits in populating all 8 memory channels? I am also GPU folding on this same machine.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 3:57 pm
by JimboPalmer
I am WAY on the other end of computing capability, but I find Dual Channel to be 10% faster than Single Channel on my old laptops.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 4:59 pm
by Joe_H
You may see a minor improvement in folding speed, but it might be hard to quantify. The effect may be larger when you get WUs consisting of many atoms, since more data is being moved to and from RAM.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 7:33 pm
by John_Alan
Thank you for the replies. Has anyone run across any benchmarks or articles that explore these aspects of F@H? I'd gladly pick up another 4 sticks of DDR4 for this machine if you think the additional memory bandwidth will help performance.

On prior Xeon workstation CPUs, it was hard to justify folding on an E5-2680v3/v4 as I rarely saw more than 140,000 PPD, but the Epyc 7302p is pushing 350,000 to 400,000 PPD with similar power consumption, so CPU folding on this workstation seems worth it, since some of the science needs CPUs for the work units.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 7:57 pm
by Callaghan
I am not a techie, but the above posts prompt me to ask:
My CPU, an Intel Core i5-8400 @ 2.80GHz, has 2 memory channels. Does this mean that the CPU can only access memory in 2 of the 4 memory slots on the motherboard?
Currently only 2 memory slots are populated.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 9:48 pm
by Neil-B
I think the four slots will be paired into two channels. If the two DIMMs are placed one in each pair you will have dual channel, but if they are in the slots next to each other (normally) then you might be running single channel. Say the sockets are a, b, c and d (sometimes mobo vendors actually call them A1, A2, B1 and B2): you want one DIMM in a or b and the other in c or d, or using the other naming convention, in A1 and B1 or in A2 and B2. Mobo manuals usually make it clear which ones to use.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 10:15 pm
by JimboPalmer
Callaghan wrote: My CPU, an Intel Core i5-8400 @ 2.80GHz, has 2 memory channels. Does this mean that the CPU can only access memory in 2 of the 4 memory slots on the motherboard? Currently only 2 memory slots are populated.
All four slots will work. In single-channel mode, after an access the RAM takes about 28 memory cycles to be ready again (over 100 CPU cycles). In dual-channel mode you can access the other channel without waiting that long (11 cycles for mine), so you have 50% odds of reaching the next memory location in the shorter time. (His quad channel gives a 75% chance of not waiting; 8 channels would give him an 87% chance of not waiting.)
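
The odds above are just the arithmetic of hitting a channel other than the one that is still busy. A quick sketch of that calculation, assuming accesses land on a uniformly random channel (which is a simplification of real interleaving):

Code:
def chance_of_not_waiting(channels: int) -> float:
    # Chance that the next access lands on a channel other than the busy one,
    # assuming accesses hit a uniformly random channel (a simplification).
    return 1.0 - 1.0 / channels

for n in (1, 2, 4, 8):
    print(f"{n} channel(s): {chance_of_not_waiting(n):.1%} chance of not waiting")
# prints 0.0%, 50.0%, 75.0% and 87.5% respectively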
Only a few select kinds of software are typically memory-channel constrained; databases are one. F@H sees only a minor improvement, as it is usually constrained by floating-point math speed.

If you are interested, I use Speccy to examine RAM; there is tons of detail as you drill down (click the blue links).
https://www.ccleaner.com/speccy
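
On Linux, a rough way to see what is sitting in each slot is to read the DMI tables. A minimal sketch, assuming dmidecode is installed and the script is run as root; exact field names can vary by BIOS, and empty slots usually show up as "No Module Installed":

Code:
# Minimal sketch: list DIMM slots and their contents via dmidecode (Linux).
# Assumes dmidecode is installed and the script runs as root.
import subprocess

out = subprocess.run(
    ["dmidecode", "-t", "memory"], capture_output=True, text=True, check=True
).stdout

size = None
for line in out.splitlines():
    line = line.strip()
    if line.startswith("Size:"):
        size = line.split(":", 1)[1].strip()
    elif line.startswith("Locator:") and size is not None:
        print(f"{line.split(':', 1)[1].strip()}: {size}")
        size = None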

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 10:18 pm
by Joe_H
John_Alan wrote: Has anyone run across any benchmarks or articles that explore these aspects of F@H?
I vaguely recall that someone or several someones did tests on memory size and speeds with F@h. They did see a small amount of speed improvement with faster RAM, but I don't recall numbers.

Making it a bit harder to characterize are differences in the size of WUs from different projects. Some of the smaller ones may fit a large amount of their data in the available L2 or L3 cache connected to a processor; those will spend much less time accessing data in main RAM compared to much larger projects.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Thu Feb 25, 2021 10:23 pm
by bruce
It would be wrong to suggest that memory bandwidth has no effect on FAH's speed, but it would also be wrong to suggest that it's a critical factor. FAH's core process for CPUs uses a relatively small amount of RAM. Speed depends almost entirely on how fast the CPU can do floating-point math.

The speed of FAH for GPUs depends almost entirely on how fast the GPU can do floating-point math. A secondary limitation may pop up depending on the PCIe bus.

Bottom line: Upgrading your RAM is fine, but you probably won't notice a difference since it's a rather small percentage of a rather complex set of other limitations.
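
If you want a rough before/after number when adding DIMMs, a crude copy test through main memory gives one. A minimal sketch with NumPy; it measures bulk copy throughput, not FAH itself, and the exact figure depends on the array size chosen here and how the OS maps the pages:

Code:
import time
import numpy as np

# Crude copy-bandwidth probe: copy a buffer much larger than the CPU caches
# so the traffic has to go through main memory. Not a proper STREAM
# benchmark, and it says nothing about FAH PPD directly.
N = 512 * 1024 * 1024 // 8          # 512 MiB of float64, well past a 128 MB L3
src = np.ones(N)
dst = np.empty_like(src)

best = float("inf")
for _ in range(5):                  # keep the best of a few runs
    t0 = time.perf_counter()
    np.copyto(dst, src)
    best = min(best, time.perf_counter() - t0)

gbytes = 2 * src.nbytes / 1e9       # one read of src plus one write of dst
print(f"approx copy bandwidth: {gbytes / best:.1f} GB/s")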

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Sat Feb 27, 2021 3:23 pm
by John_Alan
Bruce and all, thank you for the info! The 7302p has 128 MB of L3 cache, so I would imagine that cache is able to hold large parts of each work unit right on the CPU.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Sat Feb 27, 2021 3:41 pm
by JimboPalmer
John_Alan wrote: I would imagine that cache is able to hold large parts of each work unit right on the CPU.
This is why databases respond well to multi-channel RAM: they overwhelm the cache that works well for most applications.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Posted: Sun Feb 28, 2021 4:14 am
by bruce
John_Alan wrote: ... so I would imagine that cache is able to hold large parts of each work unit right on the CPU.
"large parts of..." is a lot different that "all of..."

Unfortunately, every force on every atom must be calculated during every step, so cache performance is significantly degraded unless the entire protein fits.
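
As a back-of-the-envelope check on whether a whole protein can sit in that 128 MB L3, something like the sketch below works. The ~100 bytes per atom is an assumed working-set size (positions, velocities, forces and some parameters in double precision), not a figure from FAH or GROMACS documentation:

Code:
# Back-of-the-envelope: how many atoms could fit entirely in a 128 MB L3?
# BYTES_PER_ATOM is an assumption, not a measured FAH/GROMACS value.
L3_BYTES = 128 * 1024 * 1024        # Epyc 7302p L3 cache
BYTES_PER_ATOM = 100                # assumed per-atom working set

print(f"roughly {L3_BYTES // BYTES_PER_ATOM:,} atoms could fit")  # ~1.3 million
# Whether a given WU fits depends on its atom count, which varies a lot
# between projects, so the cache effect will differ from WU to WU.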