Dual X5650 giving me 20k PPD, why so slow?
Posted: Wed Oct 07, 2020 8:08 pm
by Nilem
I've just found an old HP server with dual Xeon X5650s and started folding on it. I know it's old, but since each CPU has a CPU mark of 5863, I thought the results would be okay.
But it's been disappointing. I'm currently folding a project 16810 FAHCore_a8 WU and only making 20k-ish PPD with all 24 threads on it. Meanwhile, I also have a single 6-core i5-8500B (CPU mark of 9538) that got the same WU, and it crushed it at a speed that would end up at over 100k PPD.
Is this normal? Are my dual X5650s just that slow? Or is there something I should check to unleash their power?
So far, I've tried splitting the threads into slots like 12/12, 8/8/6, etc., and couldn't really get much better results. But even when the 24 threads are in one slot, the monitor tells me most cores are pretty much in full use.
Also, sometimes it suddenly starts to fold very fast and gets up to 150k PPD, but this is always just a glitch: when this happens, progress stops being printed in the log tab and after a few minutes the progress bar just drops back to where it was before the speed-up. (Example: in a few minutes I go from 55% to 60%, but none of that progress is logged and it suddenly drops back to 55.40%.) Not sure if this has anything to do with my main question.
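For reference, here is a quick back-of-the-envelope check of the figures quoted in this post, as a minimal Python sketch. It assumes the two X5650 scores simply add up (an assumption; a real dual-socket benchmark may come out lower) and uses the rough PPD values given above.

```python
# Figures quoted in the post; the PPD values are rough observations.
X5650_MARK = 5863         # CPU mark of one X5650
I5_8500B_MARK = 9538      # CPU mark of the i5-8500B
DUAL_X5650_PPD = 20_000   # "20k-ish" PPD observed
I5_8500B_PPD = 100_000    # "over 100k" PPD observed


def ppd_per_mark(ppd: float, cpu_mark: float) -> float:
    """Return the PPD earned per CPU-mark point."""
    return ppd / cpu_mark


# Assumption: two sockets score twice the single-CPU mark.
dual_mark = 2 * X5650_MARK

xeon_eff = ppd_per_mark(DUAL_X5650_PPD, dual_mark)
i5_eff = ppd_per_mark(I5_8500B_PPD, I5_8500B_MARK)

print(f"Dual X5650: {xeon_eff:.2f} PPD per mark")
print(f"i5-8500B:   {i5_eff:.2f} PPD per mark")
print(f"i5 advantage: {i5_eff / xeon_eff:.1f}x")
```

Under that assumption the i5 earns roughly six times more PPD per benchmark point, which is the gap the question is really about.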
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Wed Oct 07, 2020 8:23 pm
by Joe_H
To start, yes, the X5650 is that old. It is a 10-year-old Westmere design and supports only the SSE family of instructions (up to SSE4.2), not AVX. Newer chips support AVX or AVX2, which gives a significant speedup over the SSE2 instructions used on your processors.
You may find depending on the WU that using just the 12 main CPU threads will give nearly the highest PPD.
The PPD and ETA figures right after a start are not reliable estimates. The client needs to see at least 1-2% progress before they will settle down to accurate numbers.
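To illustrate why an early figure can be far off, here is a minimal Python sketch of a naive extrapolation. This is not the client's actual estimator: the function, the WU values, and the 30-second startup overhead are all hypothetical, and bonus points are ignored.

```python
SECONDS_PER_DAY = 86_400


def estimate_ppd(wu_points: float, progress: float, elapsed_s: float) -> float:
    """Extrapolate PPD from the fraction of one WU completed so far."""
    if progress <= 0 or elapsed_s <= 0:
        raise ValueError("need positive progress and elapsed time")
    projected_wu_s = elapsed_s / progress  # projected time for the whole WU
    return wu_points * SECONDS_PER_DAY / projected_wu_s


# Hypothetical WU: 1000 points, true pace of 60 s per 1% of progress.
WU_POINTS = 1000
SECONDS_PER_PERCENT = 60
STARTUP_OVERHEAD_S = 30  # hypothetical one-off delay counted in elapsed time

true_ppd = estimate_ppd(WU_POINTS, 1.0, 100 * SECONDS_PER_PERCENT)
early = estimate_ppd(WU_POINTS, 0.005, 0.5 * SECONDS_PER_PERCENT + STARTUP_OVERHEAD_S)
later = estimate_ppd(WU_POINTS, 0.05, 5 * SECONDS_PER_PERCENT + STARTUP_OVERHEAD_S)

print(f"true PPD:         {true_ppd:.0f}")
print(f"estimate at 0.5%: {early:.0f}")
print(f"estimate at 5%:   {later:.0f}")
```

With these made-up numbers the same fixed timing error halves the estimate at 0.5% progress but shifts it by less than 10% at 5%, so any one-off disturbance matters less as more progress is observed.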
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Wed Oct 07, 2020 8:54 pm
by Nilem
Joe_H wrote: Newer chips support AVX or AVX2, which gives a significant speedup over the SSE2 instructions used on your processors.
Well, here is an important detail I totally missed! Thanks for the info.
Joe_H wrote: You may find depending on the WU that using just the 12 main CPU threads will give nearly the highest PPD.
You mean that running just 1 slot with 12 threads might get higher PPD than 1 slot with 24 threads? What would be the reason for not using all the HT threads?
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Wed Oct 07, 2020 9:24 pm
by _r2w_ben
Nilem wrote:You mean that running just 1 slot with 12 threads might get higher PPD than 1 slot with 24 threads? What would be the reason for not using all the HT threads?
FAHCore_a8 does a better job of keeping execution units busy than FAHCore_a7. Spawning and syncing the extra threads that use HT might be slowing things down more than they're helping.
The current build of FAHCore_a8 is de-optimized for multi-socket environments (referred to as -ntmpi 1 here). This results in extra syncing between sockets, so fewer threads might be faster.
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Wed Oct 07, 2020 9:28 pm
by Joe_H
Yes, 1 slot with 12 threads.
When using the 2 threads per core provided through HT, the processing threads are in constant contention for the single FPU available in each core. Each thread needs to be synchronized with the others, so overall processing is limited by the speed of the slowest threads. In addition, while some WUs will process on all 24 threads, depending on their size in atoms, they may not process much faster past a lower thread count.
Depending on what else is running on the system, including system processes for the OS, leaving at least 1 or 2 threads unused by F@h can result in less interruption of the threads that are being used.
Finally, one other thing I did not mention before is the NUMA related settings in your BIOS. They can affect how threads on one physical CPU communicate with those on the other. Those settings may speed up or slow down folding, or have no discernible effect.
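The point about synchronization can be pictured with a toy model in Python. This is only a sketch under stated assumptions, not how the FAHCore actually schedules work: the function name and every timing below are hypothetical.

```python
from typing import Sequence


def step_time(thread_times: Sequence[float], sync_overhead: float = 0.0) -> float:
    """One synchronized step lasts as long as the slowest thread, plus sync cost."""
    return max(thread_times) + sync_overhead


# All timings below are hypothetical, in arbitrary units.
# 12 threads, one per physical core, each with the FPU to itself.
twelve = step_time([1.0] * 12, sync_overhead=0.05)

# 24 threads: each has half the work, but HT siblings share an FPU,
# so the per-thread time drops far less than 2x. One thread is also
# delayed by other activity on the system.
twenty_four = step_time([0.9] * 23 + [1.3], sync_overhead=0.10)

print(f"12 threads: {twelve:.2f} per step")
print(f"24 threads: {twenty_four:.2f} per step")
```

In this made-up case a single delayed thread makes the 24-thread step slower than the 12-thread one, which is why testing a lower thread count, or leaving a thread or two free, can be worth it.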
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Thu Oct 08, 2020 6:18 am
by Neil-B
With Xeons (at least the more recent ones) the contention impacts the scenario less than on other multi-threaded CPUs (or at least that is my experience) .. the throughput from my 14-core Xeons increases significantly (admittedly not quite double, but fairly close) when running 28 threads ... but it is worth a few tests using an offline core/WU to see the real difference.
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Thu Oct 08, 2020 1:28 pm
by sptn.
Nilem wrote:[...]I also have a single 6-core i5-8500B (CPU mark of 9538) that got the same WU, and it crushed it at a speed that would end up at over 100k PPD.[...]
I can't help you with your problem, but I would like to know how you squeeze 100k PPD out of an i5.
I have an i7 (I guess an i7-8550U or something) and am getting at most 36k (more likely 30k) PPD.
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Thu Oct 08, 2020 1:43 pm
by JimboPalmer
sptn. wrote:I have an i7 (I guess an i7-8550U or something) and am getting at most 36k (more likely 30k) PPD.
Have you entered a passkey for your Folding?
https://foldingathome.org/support/faq/points/passkey/
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Fri Oct 09, 2020 3:29 am
by Nilem
sptn. wrote:I can't help you with your problem, but I would like to know how you squeeze 100k PPD out of an i5.
I have an i7 (I guess an i7-8550U or something) and am getting at most 36k (more likely 30k) PPD.
The "i7" doesn't mean much. An i7-8550U is a notebook CPU and is roughly half as powerful as an i5-8500B, but it consumes only 15 W compared to 65 W. The "i7" that would compare better is the i7-8700B, which is a 6-core part with HT (the i5-8500B has no HT) and delivers 30% more.
https://www.cpubenchmark.net/compare/In ... 3064vs3388
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Fri Oct 09, 2020 3:34 am
by Nilem
So I got my best PPD with two slots split 12/10. I could then achieve around 25k PPD.
But I see that the lack of AVX makes this machine very inefficient under FAH. I've tried Rosetta on it, and the results were more proportional to the raw power of that dual X5650, so I will probably keep it there for now and run FAH on my more recent machines.
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Fri Oct 09, 2020 11:48 am
by sptn.
Yes I did.
Nilem wrote:sptn. wrote:I can't help you with your problem, but I would like to know how you squeeze 100k PPD out of an i5.
I have an i7 (I guess an i7-8550U or something) and am getting at most 36k (more likely 30k) PPD.
The "i7" doesn't mean much. An i7-8550U is a notebook CPU and is roughly half as powerful as an i5-8500B, but it consumes only 15 W compared to 65 W. The "i7" that would compare better is the i7-8700B, which is a 6-core part with HT (the i5-8500B has no HT) and delivers 30% more.
https://www.cpubenchmark.net/compare/In ... 3064vs3388
Ah, this makes sense.
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Sun Oct 18, 2020 6:07 am
by Nathan_P
If you haven't already, try Linux as the OS. That 20k is low for a pair of X5650s.
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Wed Oct 21, 2020 11:08 pm
by Nilem
Nathan_P wrote:If you haven't already, try Linux as the OS. That 20k is low for a pair of X5650s.
I didn't try; I actually thought Windows would be a bit faster. Do you have an estimate of how many more points I should get under Linux?
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Fri Oct 23, 2020 7:58 pm
by PantherX
Nilem wrote:...I didn't try; I actually thought Windows would be a bit faster. Do you have an estimate of how many more points I should get under Linux?
Currently, folding on Linux does generate more points, though each project would see a different increase. Do note that while this is the current situation, it might change in the future. Here's a Google Sheet that has some information to help you make an informed decision:
Same set of hardware for each test:
https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Re: Dual X5650 giving me 20k PPD, why so slow?
Posted: Sat Oct 24, 2020 9:02 am
by MeeLee
The idea of running one slot per CPU is probably interesting.
I don't know how you'd do that in the slots setting.
But if you run 2 CPUs and force a WU onto them that uses both, the connection between the two CPUs is most definitely going to slow each unit down.
Disabling HT is also a good way to save power.
If you have PCIe slots available, it may be better to plug in a few GPUs, as a single GT 1030 will get a higher score than both of those CPUs.
Or, if you have some money to spare, get a Ryzen 3900X. It is not only more than twice as fast at 3.8-4 GHz, it also earns extra bonus PPD (due to its many threads and faster-finishing WUs) and uses less power (around 150 W stock, up to 200 W with PBO, measured at the wall for a system doing CPU folding only).
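The bonus effect mentioned above can be sketched in Python using the quick return bonus formula as I understand it from the Folding@home points FAQ: final points = base points x max(1, sqrt(k x deadline / elapsed)). The base points, k, and deadline below are hypothetical illustration values, not those of any real project, and the bonus only applies when folding with a passkey.

```python
import math


def final_points(base_points: float, k: float,
                 deadline_days: float, elapsed_days: float) -> float:
    """Points for one WU, including the quick return bonus."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def ppd(base_points: float, k: float,
        deadline_days: float, elapsed_days: float) -> float:
    """Points per day when WUs are folded back to back."""
    return final_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


# Hypothetical project parameters, for illustration only.
BASE_POINTS = 1000
K_FACTOR = 0.75
DEADLINE_DAYS = 4.0

slow = ppd(BASE_POINTS, K_FACTOR, DEADLINE_DAYS, elapsed_days=1.0)
fast = ppd(BASE_POINTS, K_FACTOR, DEADLINE_DAYS, elapsed_days=0.5)

print(f"1.0 day per WU: {slow:.0f} PPD")
print(f"0.5 day per WU: {fast:.0f} PPD")
print(f"speed-up in PPD: {fast / slow:.2f}x")
```

While the bonus is active, halving the time per WU raises PPD by a factor of about 2.83 rather than 2, which is why a faster CPU gains more points than its raw speed alone suggests.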