NUMA node affinity
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
NUMA node affinity
How do I assign CPU slots to specific NUMA nodes to avoid memory accesses traversing the QPI bus?
Re: NUMA node affinity
I don't believe you can do this within FAH. You could do it within your OS, assuming the FAH slots are configured to use no more than the number of CPUs in one of your nodes. Exactly how would depend on your OS.
single 1070
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
Forgot to mention I'm using Linux Mint with kernel 5.3. On Windows I could do it, but there's a weird thread limit even with multiple slots. I have no idea how to do it on Linux unless I could replace the command that launches the cores for each slot with a numactl command.
Some work units don't like 56 threads and crash, so it's hard to permanently get full utilization.
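As a rough illustration of doing this at the OS level on Linux, the sketch below reads a NUMA node's CPU list from sysfs and restricts every thread of an already running process to those CPUs. It is similar in spirit to `numactl --cpunodebind`, but it is only a sketch under assumptions: the helper names (`parse_cpulist`, `node_cpus`, `pin_process_to_node`) are made up for this example, it has not been tested against the FAH client or its cores, and the client may start new core processes that would need pinning again. CPU affinity also does not move memory that was already allocated; it only influences where threads run and, under the kernel's default local allocation policy, where new allocations tend to land.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-13,28-41' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Read the logical CPUs belonging to one NUMA node from sysfs (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_process_to_node(pid, node):
    """Restrict every existing thread of process `pid` to the CPUs of `node`.

    sched_setaffinity acts on a single thread id, so each entry under
    /proc/<pid>/task is pinned separately. Threads created later inherit
    the affinity of the thread that creates them.
    """
    cpus = node_cpus(node)
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            # The thread exited between listing and pinning; ignore it.
            pass
```

For example, `pin_process_to_node(12345, 0)` would confine a hypothetical core process with PID 12345 to node 0. The slot would still need a thread count no larger than the number of logical CPUs in that node.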
Re: NUMA node affinity
You can probably get better answers directly from gromacs.org. FAH has adapted their analysis package and uses it for folding on the CPUs of home computers. As far as 56 threads is concerned, FAH is customized for @home computers, and it's probably fair to assume that the typical home computer doesn't have 56 threads and NUMA. The workaround is to configure multiple CPU slots, each using a "reasonable" number of threads (for home computers).
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
I've been messing around with multiple slots and threads per slot, and 8 slots with 8 threads each seems to fully utilize all logical cores without crashing (so far).
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
Still much lower PPD than one slot when it's able to use 56 threads.
-
- Posts: 2040
- Joined: Sat Dec 01, 2012 3:43 pm
- Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441
Re: NUMA node affinity
I remember 32 threads being the maximum used by FAH. So shouldn't 2 slots with 32 threads each be enough? Are NUMA node memory accesses traversing the QPI bus really a bottleneck for FAH?
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
On Windows it's limited to 32 threads. I think it has something to do with MPI, or with something being only 32-bit, but there doesn't seem to be such a limit on Linux.
Re: NUMA node affinity
On Windows, two slots of 32 and 24 threads add up to 56 (32 + 24 = 56), although projects with smaller numbers of atoms in the protein will place some additional restrictions on GROMACS.
(Unfortunately my Windows machines aren't big enough to test this all out myself.)
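The slot arithmetic can be sketched as a small helper that divides a machine's logical threads into slots no larger than a given cap. This is only an illustration: `split_threads` is a name made up for this example, the default cap of 32 is taken from the limit discussed in this thread, and it does not account for the extra restrictions that small projects place on GROMACS thread counts.

```python
def split_threads(total_threads, max_per_slot=32):
    """Split total_threads into slot sizes, each no larger than max_per_slot."""
    if total_threads < 0 or max_per_slot <= 0:
        raise ValueError("total_threads must be >= 0 and max_per_slot > 0")
    slots = []
    remaining = total_threads
    while remaining > 0:
        size = min(remaining, max_per_slot)
        slots.append(size)
        remaining -= size
    return slots


print(split_threads(56))      # [32, 24]
print(split_threads(56, 28))  # [28, 28]
```

Using the per-node thread count as the cap (28 on a machine with 14 cores plus HT per node) gives one slot per NUMA node, which is the layout that node pinning would need.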
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
I tried multiple slots on Windows and it just didn't want to cooperate.
The GPU core uses thousands of threads. The CPU version really needs to be updated, especially with AMD's 32- and 64-core Threadripper.
Last edited by CommanderLake on Tue Mar 10, 2020 8:40 pm, edited 1 time in total.
-
- Site Admin
- Posts: 7951
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: NUMA node affinity
F@h is not using MPI.
The 32 thread limit is from Windows. You need a specific version of the Windows license to exceed that, and the executable needs to be compiled with the right flags and other options set.
The same code base is used by F@h for the versions running on Windows, Linux and OS X. Some of the I/O and other OS specific modules are different. Under Linux the CPU Core_A7 has been tested with over 100 threads on larger systems being worked on. But projects with that many atoms in the simulation usually go to GPU folding now.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
The 32 thread limit is not from Windows. I write parallel code and have never run into such a limit.
Re: NUMA node affinity
You're probably not running on the home version of Windows, which does have the 32 thread limit. M$ wants to sell a higher priced license. It may also depend on Win7 vs Win10; I don't remember.
I'm also not sure which compiler flags were used when FAHCore_a7 was compiled and if that would cause problems to somebody with an older/cheaper license.
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
I have the 32 thread limit on Server 2016.
-
- Posts: 15
- Joined: Tue Mar 03, 2020 11:11 am
Re: NUMA node affinity
Started up Server 2016 again. One slot with 28 threads (14 cores per node + HT) runs fine.
Any more than 32 threads is an invalid option.
With one slot running 28 threads, a second slot will not run, even with 8, 4 or even 1 thread. It keeps downloading and then returning work units and marking the project as faulty.