kyleedwardsny wrote:Please bear with me, I am brand new to F@H and don't know much about it
You're doing great! I've asked other people for md.log and they generally don't provide it. Yours confirms a theory I've had.
Code:
Will use 20 particle-particle and 4 PME only ranks
GROMACS, the molecular dynamics code, is splitting your 24 threads into 20 PP (particle-particle) and 4 PME threads/ranks. At lower thread counts it uses only PP ranks, which makes it easier to predict what the domain decomposition will be.
Code:
Optimizing the DD grid for 20 cells with a minimum initial size of 1.423 nm
GROMACS now attempts to split the simulation volume into cells, one per PP rank. There is a limit to how small a cell can be for the decomposition to still work efficiently.
Code:
The maximum allowed number of cells is: X 4 Y 4 Z 4
Based on the size limit, each axis can be split into at most 4 sections, so I think the maximum would be 4x4x4 = 64 PP threads/ranks.
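As a rough sanity check (this is just my simplification of the rule, and the box edge length below is an assumed, illustrative value, not something taken from your log), the per-axis limit is roughly the box edge length divided by the minimum cell size:
Code:
import math

min_cell = 1.423   # nm, the "minimum initial size" reported in md.log
box_edge = 6.0     # nm, assumed box edge length for illustration only

# Rough per-axis limit: how many cells of at least min_cell fit along one edge
print(math.floor(box_edge / min_cell))   # 4, matching "X 4 Y 4 Z 4"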
On your machine, it tries to fit 20 PP ranks into these constraints. 20 factors into 2x2x5. From the source code, it looks like GROMACS combines repeated factors because the grid has to be a product of 2 or 3 numbers, so 2x2x5 becomes 4x5x1. Either way, one axis ends up with a factor of 5, which is greater than the limit of 4, so the cells along that axis would be too small. This is what makes the domain decomposition fail with the message about 20 ranks.
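Here is a small sketch of that check as I understand it (my own reconstruction, not the actual GROMACS code): try to write the PP rank count as nx x ny x nz with every factor at or below the per-axis limit of 4.
Code:
from itertools import product

def dd_grids(pp_ranks, max_per_axis=4):
    """All nx*ny*nz decompositions of pp_ranks that respect the per-axis limit."""
    return [(nx, ny, nz)
            for nx, ny, nz in product(range(1, max_per_axis + 1), repeat=3)
            if nx * ny * nz == pp_ranks]

print(dd_grids(20))   # [] -> no valid grid, so decomposing 20 PP ranks fails
print(dd_grids(18))   # contains (2, 3, 3), so 18 PP ranks would work
print(dd_grids(16))   # contains (4, 4, 1), so 16 PP ranks would work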
GROMACS does not retry with fewer ranks because it was asked to use all 24 threads. It would be good if FAHClient could do this automatically.
For example, the following combinations might work, depending on the PP/PME ratio GROMACS wants. The GROMACS manual mentions 20-33% PME, but your log shows 4/24 = 17%. These breakdowns account for the 4x4x4 limit above; a small script that regenerates this list follows below.
Code:
23 threads = 18 PP (2x3x3) + 5 PME (22%)
22 threads = 18 PP (2x3x3) + 4 PME (18%)
21 threads = 18 PP (2x3x3) + 3 PME (14% might be too low)
21 threads = 16 PP (4x4x1) + 5 PME (24%)
20 threads = 18 PP (2x3x3) + 2 PME (10% might be too low)
20 threads = 16 PP (4x4x1) + 4 PME (20%, PME count equals one of the factors, which might be ideal)
19 threads = 16 PP (4x4x1) + 3 PME (16% might be too low)
18 threads = 18 PP (2x3x3) + 0 PME (0% PME is optional)
18 threads = 16 PP (4x4x1) + 2 PME (11% might be too low)
17 threads = 16 PP (4x4x1) + 1 PME (6% too low)
17 threads = 12 PP (4x3x1) + 5 PME (29% might be too high)
16 threads = 16 PP (4x4x1) + 0 PME (0% PME is optional)
16 threads = 12 PP (4x3x1) + 4 PME (25%)
15 threads = 12 PP (4x3x1) + 3 PME (20%)
14 threads = 12 PP (4x3x1) + 2 PME (14% might be too low)
13 threads = 12 PP (4x3x1) + 1 PME (8% too low)
13 threads = 9 PP (3x3x1) + 4 PME (31% might be too high)
12 threads = 12 PP (4x3x1) + 0 PME (0% PME is optional)
12 threads = 9 PP (3x3x1) + 3 PME (25% PME is equal to one of the factors, which might be ideal)
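For anyone who wants to double-check that list, here is a rough script that regenerates it under my assumptions (a per-axis limit of 4, at most 5 PME ranks like the table above, and a guessed upper bound of about 33% PME from the manual; the real GROMACS heuristics are certainly more involved):
Code:
from itertools import product

def has_valid_grid(pp, max_per_axis=4):
    """True if pp ranks can be split as nx*ny*nz with every factor <= max_per_axis."""
    return any(nx * ny * nz == pp
               for nx, ny, nz in product(range(1, max_per_axis + 1), repeat=3))

for total in range(23, 11, -1):          # thread counts from the table, high to low
    for pme in range(0, 6):              # the table only considers 0-5 PME ranks
        pp = total - pme
        frac = pme / total
        if pp > 0 and has_valid_grid(pp) and frac <= 0.33:
            print(f"{total} threads = {pp} PP + {pme} PME ({frac:.0%})")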
I'm not sure what the minimum thread count is before GROMACS starts to omit PME ranks.
PantherX, are the researchers able to restrict projects to specific CPU counts, or can they only set a maximum? The usual response to these reports seems to be to set a maximum and exclude thread counts that are multiples of 5. If someone could confirm my breakdowns, it might be possible to restrict thread counts more precisely and provide more work for high-thread-count machines.