NUMA node affinity
Posted: Tue Mar 10, 2020 3:58 pm
by CommanderLake
How do I assign CPU slots to specific NUMA nodes to avoid memory accesses traversing the QPI bus?
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 4:10 pm
by HaloJones
I don't believe you can do this within FAH. You could do it within your OS, assuming the FAH slots are configured to use no more than the number of CPUs within one of your nodes. Exactly how would depend on your OS.
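As a rough illustration of the OS-level approach on Linux, here is a minimal Python sketch that reads a node's CPU list from sysfs and restricts a process to it. The function names (parse_cpulist, node_cpus, pin_to_node) are made up for this example, and the "0-13,28-41" layout in the docstring is only a hypothetical CPU numbering. Nothing here is part of FAH.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-13,28-41" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int) -> set[int]:
    """Read the logical CPUs that belong to one NUMA node from sysfs (Linux only)."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def pin_to_node(pid: int, node: int) -> None:
    """Restrict the given process/thread id to the CPUs of one NUMA node (Linux only)."""
    os.sched_setaffinity(pid, node_cpus(node))
```

Calling pin_to_node(pid, 0) would confine that id to node 0's CPUs. Note that os.sched_setaffinity only affects the thread whose id is passed, plus anything it starts afterwards, so threads that already exist would each need their own call. It also sets CPU affinity only and does not set a memory policy.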
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 4:55 pm
by CommanderLake
Forgot to mention I'm using Linux Mint with kernel 5.3. On Windows I could do it, but there's a weird thread limit even with multiple slots. I have no idea how to do it on Linux, unless I could replace the command that launches the cores for each slot with a numactl command.
Some work units don't like 56 threads and crash, so it's hard to permanently get full utilization.
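For reference, a numactl invocation that keeps both the CPUs and the memory of a command on one node could be built as in the sketch below. The flags --cpunodebind and --membind are real numactl options. The helper name numactl_command and the wrapped command are placeholders, and whether FAHClient lets you substitute the command it uses to start a core is not established in this thread.

```python
import shlex


def numactl_command(node: int, command: list[str]) -> list[str]:
    """Prefix a command so it runs with CPUs and memory bound to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]


# Placeholder command; substitute whatever actually needs to stay on node 0.
argv = numactl_command(0, ["echo", "hello from node 0"])
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run(argv, check=True) on a machine that has numactl installed. Child processes and threads started by the wrapped command inherit the same binding.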
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 5:05 pm
by bruce
You can probably get better answers directly from gromacs.org. FAH has adapted their analysis package and uses it for folding using the CPUs on home computers. As far as 56 threads are concerned, FAH is customized for @home computers, and it's probably fair to assume that the typical home computer doesn't have 56 threads and NUMA. The work-around is to configure multiple CPU slots using a "reasonable" number of threads (for home computers).
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 5:15 pm
by CommanderLake
I've been messing around with multiple slots and threads per slot, and 8 slots with 8 threads each seems to be fully utilizing all logical cores without crashing (so far).
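A side note on the arithmetic: 8 slots of 8 threads ask for 64 threads, which is more than 56 if the machine has 56 logical CPUs, as the 56-thread figure earlier suggests. The sketch below shows one way to divide a single node's CPUs into equal slots so that no slot spans two nodes. The function name and the assumption of 28 logical CPUs per node numbered 0 to 27 are illustrative only, not something FAH provides.

```python
def split_node_into_slots(cpus: list[int], slots: int) -> list[list[int]]:
    """Split one node's CPU ids into `slots` contiguous groups of near-equal size."""
    base, extra = divmod(len(cpus), slots)
    groups: list[list[int]] = []
    start = 0
    for i in range(slots):
        # The first `extra` groups take one additional CPU each.
        size = base + (1 if i < extra else 0)
        groups.append(cpus[start:start + size])
        start += size
    return groups


# Assumed layout: node 0 owns logical CPUs 0..27.
node0 = list(range(28))
for group in split_node_into_slots(node0, 4):
    print(len(group), group)
```

With 4 slots per node this gives 7 CPUs per slot, so two nodes would carry 8 slots totalling 56 threads. Whether a given work unit runs well with that thread count is a separate question.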
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 6:51 pm
by CommanderLake
Still much lower PPD than one slot when it's able to use 56 threads.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 7:07 pm
by foldy
I remember 32 threads being the maximum used by FAH. So shouldn't 2 slots with 32 threads each be enough? Are NUMA node memory accesses traversing the QPI bus really a bottleneck for FAH?
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 8:16 pm
by CommanderLake
On Windows it's limited to 32 threads. I think it has something to do with MPI or something being only 32-bit, but there doesn't seem to be such a limit on Linux.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 8:30 pm
by bruce
For Windows, 32 + 24 = 56 (one slot of 32 threads plus one of 24), although projects with smaller numbers of atoms in the protein will place some additional restrictions on GROMACS.
(Unfortunately my Windows machines aren't big enough to test this all out myself.)
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 8:38 pm
by CommanderLake
I tried multiple slots on Windows and it just didn't want to cooperate.
The GPU core uses thousands of threads. The CPU version really needs to be updated, especially with AMD's 32- and 64-core Threadrippers.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 8:38 pm
by Joe_H
F@h is not using MPI.
The 32-thread limit is from Windows. You need a specific version of the Windows license to exceed it, and the executable needs to be compiled with the right flags and other options set.
The same code base is used by F@h for the versions running on Windows, Linux and OS X. Some of the I/O and other OS-specific modules are different. Under Linux, the CPU Core_A7 has been tested with over 100 threads on larger systems being worked on. But projects with that many atoms in the simulation usually go to GPU folding now.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 8:43 pm
by CommanderLake
The 32-thread limit is not from Windows. I write parallel code and have never run into such a limit.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 8:48 pm
by bruce
You're probably not running on the home version of Windows, which does have the 32-thread limit. M$ wants to sell a higher-priced license. It may also depend on Win7 vs Win10 -- I don't remember.
I'm also not sure which compiler flags were used when FAHCore_a7 was compiled, or whether that would cause problems for somebody with an older/cheaper license.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 9:03 pm
by CommanderLake
I have the 32 thread limit on Server 2016.
Re: NUMA node affinity
Posted: Tue Mar 10, 2020 10:12 pm
by CommanderLake
Started up Server 2016 again. One slot with 28 threads (14 cores per node + HT) runs fine.
Any more than 32 threads is an invalid option.
With one slot running 28 threads, a second slot will not run, even with 8, 4 or even 1 thread. It keeps downloading and then returning work units and marking the project as faulty.