Testing domain decomposition for high CPU counts

_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Testing domain decomposition for high CPU counts

Post by _r2w_ben »

Posts about domain decomposition errors are fairly common on machines with 20+ cores. It's sometimes mentioned that researchers don't have those configurations, or the time to test every combination of work unit and thread count. I wondered whether it was possible to simplify this process.

Core A7 is based on GROMACS 5.0.4. I started with that source code and found where it builds the domain decomposition. I added a loop to try every thread count between 2 and 128 to see how it would break down a work unit. In each place where GROMACS would throw a fatal error and quit, I made it return instead and considered that count bad. What follows are the thread counts that should work for the 3 work units I tested. (I didn't think to write down the project number of one of them.)
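
The scan described above can be sketched in Python. This is not the patched GROMACS code, which is C; `try_decompose` is a hypothetical stand-in for the modified routine, and `scan_thread_counts` and `format_table` are names made up for this illustration.

```python
def scan_thread_counts(try_decompose, counts=range(2, 129)):
    """Map each thread count to 1 (decomposition found) or 0 (rejected).

    try_decompose is a hypothetical stand-in for the patched GROMACS
    routine: it should raise ValueError wherever GROMACS would have
    stopped with a fatal error.
    """
    results = {}
    for n in counts:
        try:
            try_decompose(n)
        except ValueError:
            results[n] = 0
        else:
            results[n] = 1
    return results


def format_table(results, project):
    """Render results in the '# Threads' layout used in this thread."""
    lines = [f"# Threads {project}"]
    lines += [f"{n:>3}{ok:>8}" for n, ok in sorted(results.items())]
    return "\n".join(lines)
```

Turning the fatal error into a caught exception is what lets one run cover every count instead of stopping at the first bad one.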

All three work units have a different maximum allowed number of cells, based on the volume of the atoms. The estimated PME load also differs. When using more than 18 threads, some of the energy calculations are moved to separate PME threads. The algorithm first finds the minimum number of threads needed to carry the PME load and then splits the remaining threads into the decomposition grid. These two variables make it nearly impossible to guess whether a particular work unit will run on x threads.
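
Each row in the lists that follow has the form `threads = XxYxZ  grid + PME`. A minimal sketch of a parser that also checks the arithmetic (grid product equals the non-PME thread count, and the two parts add up to the total) might look like this. The name `parse_row` and the regular expression are my own assumptions about the row layout, not part of any tool.

```python
import re

# Matches rows such as " 20 = 2x3x3  18 +  2 PME" or "  9 = 3x3x1".
ROW = re.compile(
    r"^\s*(\d+)\s*=\s*(\d+)x(\d+)x(\d+)"      # total threads and the grid
    r"(?:\s+(\d+)\s*\+\s*(\d+)\s*PME)?\s*$"   # optional "grid + PME" split
)


def parse_row(line):
    """Return (threads, grid, pme_threads); raise ValueError if inconsistent."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    threads, x, y, z = (int(g) for g in m.group(1, 2, 3, 4))
    if m.group(5) is None:
        grid_threads, pme = threads, 0   # no separate PME threads listed
    else:
        grid_threads, pme = int(m.group(5)), int(m.group(6))
    if x * y * z != grid_threads or grid_threads + pme != threads:
        raise ValueError(f"inconsistent row: {line!r}")
    return threads, (x, y, z), pme
```

For example, `parse_row(" 20 = 2x3x3  18 +  2 PME")` returns `(20, (2, 3, 3), 2)`, so 2 of 20 threads (0.10) go to PME, which lines up with the PME load of 0.10 quoted for p16501.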

p16501 - max 11x11x10 - PME load 0.10

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 6x1x1
  7 = 7x1x1
  8 = 8x1x1
  9 = 3x3x1
 10 = 5x2x1
 11 = 11x1x1
 12 = 4x3x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 6x3x1
 20 = 2x3x3  18 +  2 PME
 21 = 3x3x2  18 +  3 PME
 24 = 4x5x1  20 +  4 PME
 25 = 5x2x2  20 +  5 PME
 27 = 3x4x2  24 +  3 PME
 28 = 4x3x2  24 +  4 PME
 30 = 3x3x3  27 +  3 PME
 32 = 8x3x1  24 +  8 PME
 35 = 5x3x2  30 +  5 PME
 36 = 4x4x2  32 +  4 PME
 40 = 4x3x3  36 +  4 PME
 42 = 6x3x2  36 +  6 PME
 44 = 4x3x3  36 +  8 PME
 45 = 5x4x2  40 +  5 PME
 48 = 8x5x1  40 +  8 PME
 50 = 5x3x3  45 +  5 PME
 52 = 5x4x2  40 + 12 PME
 54 = 6x4x2  48 +  6 PME
 55 = 7x7x1  49 +  6 PME
 56 = 7x7x1  49 +  7 PME
 60 = 6x3x3  54 +  6 PME
 63 = 7x4x2  56 +  7 PME
 64 = 8x7x1  56 +  8 PME
 65 = 5x5x2  50 + 15 PME
 66 = 4x7x2  56 + 10 PME
 70 = 7x3x3  63 +  7 PME
 72 = 8x4x2  64 +  8 PME
 75 = 5x3x4  60 + 15 PME
 77 = 7x3x3  63 + 14 PME
 78 = 5x7x2  70 +  8 PME
 80 = 8x3x3  72 +  8 PME
 81 = 9x4x2  72 +  9 PME
 84 = 6x4x3  72 + 12 PME
 85 = 5x5x3  75 + 10 PME
 88 = 8x3x3  72 + 16 PME
 90 = 9x3x3  81 +  9 PME
 91 = 7x5x2  70 + 21 PME
 95 = 5x4x4  80 + 15 PME
 96 = 4x3x7  84 + 12 PME
 98 = 7x4x3  84 + 14 PME
 99 = 3x7x4  84 + 15 PME
100 = 10x3x3 90 + 10 PME
102 = 6x5x3  90 + 12 PME
104 = 6x5x3  90 + 14 PME
105 = 5x6x3  90 + 15 PME
108 = 6x4x4  96 + 12 PME
110 = 7x7x2  98 + 12 PME
112 = 7x2x7  98 + 14 PME
114 = 4x5x5 100 + 14 PME
115 = 5x5x4 100 + 15 PME
117 = 5x7x3 105 + 12 PME
119 = 7x5x3 105 + 14 PME
120 = 6x6x3 108 + 12 PME
125 = 5x5x4 100 + 25 PME
126 = 7x4x4 112 + 14 PME
128 = 4x4x7 112 + 16 PME

p??? - max 6x6x5 - PME load 0.20

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 6x1x1
  8 = 4x2x1
  9 = 3x3x1
 10 = 5x2x1
 12 = 6x2x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 6x3x1
 20 = 4x2x2  16 +  4 PME
 21 = 2x4x2  16 +  5 PME
 24 = 6x3x1  18 +  6 PME
 25 = 5x2x2  20 +  5 PME
 27 = 3x3x2  18 +  9 PME
 28 = 5x2x2  20 +  8 PME
 30 = 6x2x2  24 +  6 PME
 32 = 3x4x2  24 +  8 PME
 35 = 5x5x1  25 + 10 PME
 36 = 3x3x3  27 +  9 PME
 40 = 4x4x2  32 +  8 PME
 42 = 4x4x2  32 + 10 PME
 44 = 4x4x2  32 + 12 PME
 45 = 3x3x4  36 +  9 PME
 48 = 6x2x3  36 + 12 PME
 50 = 4x5x2  40 + 10 PME
 52 = 5x4x2  40 + 12 PME
 54 = 6x3x2  36 + 18 PME
 55 = 5x4x2  40 + 15 PME
 56 = 5x4x2  40 + 16 PME
 60 = 6x4x2  48 + 12 PME
 64 = 3x4x4  48 + 16 PME
 65 = 5x5x2  50 + 15 PME
 66 = 5x5x2  50 + 16 PME
 72 = 6x3x3  54 + 18 PME
 75 = 5x3x4  60 + 15 PME
 80 = 4x4x4  64 + 16 PME
 81 = 3x6x3  54 + 27 PME
 85 = 5x6x2  60 + 25 PME
 90 = 6x3x4  72 + 18 PME
 95 = 5x5x3  75 + 20 PME
 96 = 6x4x3  72 + 24 PME
 99 = 5x5x3  75 + 24 PME
100 = 5x4x4  80 + 20 PME
110 = 5x4x4  80 + 30 PME
114 = 5x6x3  90 + 24 PME
115 = 5x6x3  90 + 25 PME
117 = 3x6x5  90 + 27 PME
120 = 6x4x4  96 + 24 PME
125 = 5x5x4 100 + 25 PME
128 = 4x6x4  96 + 32 PME

p14378 - max 5x5x5 - PME load 0.37

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 3x2x1
  8 = 4x2x1
  9 = 3x3x1
 10 = 5x2x1
 12 = 4x3x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 3x3x2
 20 = 4x3x1  12 +  8 PME
 21 = 4x3x1  12 +  9 PME
 24 = 5x3x1  15 +  9 PME
 25 = 5x3x1  15 + 10 PME
 27 = 3x5x1  15 + 12 PME
 28 = 3x3x2  18 + 10 PME
 30 = 3x2x3  18 + 12 PME
 32 = 4x5x1  20 + 12 PME
 35 = 5x4x1  20 + 15 PME
 36 = 4x5x1  20 + 16 PME
 40 = 5x5x1  25 + 15 PME
 42 = 3x3x3  27 + 15 PME
 44 = 4x2x3  24 + 20 PME
 45 = 3x3x3  27 + 18 PME
 48 = 3x5x2  30 + 18 PME
 50 = 4x4x2  32 + 18 PME
 52 = 4x4x2  32 + 20 PME
 54 = 3x5x2  30 + 24 PME
 55 = 5x2x3  30 + 25 PME
 56 = 4x3x3  36 + 20 PME
 60 = 4x3x3  36 + 24 PME
 63 = 3x4x3  36 + 27 PME
 64 = 4x5x2  40 + 24 PME
 65 = 5x4x2  40 + 25 PME
 66 = 3x4x3  36 + 30 PME
 70 = 5x3x3  45 + 25 PME
 72 = 3x5x3  45 + 27 PME
 75 = 4x4x3  48 + 27 PME
 78 = 4x4x3  48 + 30 PME
 80 = 5x2x5  50 + 30 PME
 81 = 4x4x3  48 + 33 PME
 85 = 5x2x5  50 + 35 PME
 95 = 5x4x3  60 + 35 PME
 96 = 4x5x3  60 + 36 PME
100 = 4x4x4  64 + 36 PME
104 = 4x4x4  64 + 40 PME
108 = 4x4x4  64 + 44 PME
117 = 5x5x3  75 + 42 PME
120 = 5x3x5  75 + 45 PME
125 = 5x4x4  80 + 45 PME
128 = 4x5x4  80 + 48 PME

All data for 1-128 threads. 1 indicates success while 0 is failure.
Note: There do appear to be additional restrictions baked into the assignment server and core A7 that limit possibilities that could work but are not attempted.

# Threads p16501 p???   p14378
  1       1      1      1
  2       1      1      1
  3       1      1      1
  4       1      1      1
  5       1      1      1
  6       1      1      1
  7       1      0      0
  8       1      1      1
  9       1      1      1
 10       1      1      1
 11       1      0      0
 12       1      1      1
 13       0      0      0
 14       0      0      0
 15       1      1      1
 16       1      1      1
 17       0      0      0
 18       1      1      1
 19       0      0      0
 20       1      1      1
 21       1      1      1
 22       0      0      0
 23       0      0      0
 24       1      1      1
 25       1      1      1
 26       0      0      0
 27       1      1      1
 28       1      1      1
 29       0      0      0
 30       1      1      1
 31       0      0      0
 32       1      1      1
 33       0      0      0
 34       0      0      0
 35       1      1      1
 36       1      1      1
 37       0      0      0
 38       0      0      0
 39       0      0      0
 40       1      1      1
 41       0      0      0
 42       1      1      1
 43       0      0      0
 44       1      1      1
 45       1      1      1
 46       0      0      0
 47       0      0      0
 48       1      1      1
 49       0      0      0
 50       1      1      1
 51       0      0      0
 52       1      1      1
 53       0      0      0
 54       1      1      1
 55       1      1      1
 56       1      1      1
 57       0      0      0
 58       0      0      0
 59       0      0      0
 60       1      1      1
 61       0      0      0
 62       0      0      0
 63       1      0      1
 64       1      1      1
 65       1      1      1
 66       1      1      1
 67       0      0      0
 68       0      0      0
 69       0      0      0
 70       1      0      1
 71       0      0      0
 72       1      1      1
 73       0      0      0
 74       0      0      0
 75       1      1      1
 76       0      0      0
 77       1      0      0
 78       1      0      1
 79       0      0      0
 80       1      1      1
 81       1      1      1
 82       0      0      0
 83       0      0      0
 84       1      0      0
 85       1      1      1
 86       0      0      0
 87       0      0      0
 88       1      0      0
 89       0      0      0
 90       1      1      0
 91       1      0      0
 92       0      0      0
 93       0      0      0
 94       0      0      0
 95       1      1      1
 96       1      1      1
 97       0      0      0
 98       1      0      0
 99       1      1      0
100       1      1      1
101       0      0      0
102       1      0      0
103       0      0      0
104       1      0      1
105       1      0      0
106       0      0      0
107       0      0      0
108       1      0      1
109       0      0      0
110       1      1      0
111       0      0      0
112       1      0      0
113       0      0      0
114       1      1      0
115       1      1      0
116       0      0      0
117       1      1      1
118       0      0      0
119       1      0      0
120       1      1      1
121       0      0      0
122       0      0      0
123       0      0      0
124       0      0      0
125       1      1      1
126       1      0      0
127       0      0      0
128       1      1      1

Next steps?
I'd like to get a thread count test into the hands of researchers. If there is a particular format that would work well for setting limits on the assignment server, I can work on that.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Testing domain decomposition for high CPU counts

Post by bruce »

That's interesting information.

A number of years ago, a GROMACS expert (who is no longer part of FAH) recommended a specific setup setting which blocked the creation of a separate allocation for PME. Somehow the PME calculations were (apparently) performed in-line rather than in parallel with the calculations of forces in real space, but I never understood it. Maybe we should re-enable that setting but somebody who understands it should make that choice.
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

Running without PME would make it easier to guess if a thread count will work. It's limited by whether n threads can be expressed as a product of 3 integers, with each integer <= the maximum allowed number of cells.
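
As a rough illustration of that rule, the sketch below searches for a grid `x*y*z == n` with each factor within its per-dimension maximum. The function name `find_grid` is made up here. Treat it as a necessary condition only: in the results table later in this post, 22 is marked as failing for p16501 even though 11x2x1 fits inside 11x11x10, so GROMACS appears to apply further checks that this sketch does not model.

```python
def find_grid(n, max_cells):
    """Return one (x, y, z) with x*y*z == n and every factor within its
    per-dimension maximum, or None if no such grid exists."""
    max_x, max_y, max_z = max_cells
    for x in range(1, min(n, max_x) + 1):
        if n % x:
            continue
        rest = n // x
        for y in range(1, min(rest, max_y) + 1):
            if rest % y == 0 and rest // y <= max_z:
                return (x, y, rest // y)
    return None


# Thread counts up to 128 that pass this check for a 5x5x5 maximum.
fits_5x5x5 = [n for n in range(1, 129) if find_grid(n, (5, 5, 5))]
```

For the 5x5x5 case this gives the same counts as the p14378 list below (plus the trivial 1), and `find_grid(52, (11, 11, 10))` returns `None` because 52 needs a factor of 13.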

I reran my tests requesting 0 PME ranks and there are fewer valid combinations. :(

p16501 - max 11x11x10

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 6x1x1
  7 = 7x1x1
  8 = 8x1x1
  9 = 3x3x1
 10 = 5x2x1
 11 = 11x1x1
 12 = 4x3x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 6x3x1
 20 = 5x4x1
 21 = 7x3x1
 24 = 6x4x1
 25 = 5x5x1
 27 = 9x3x1
 28 = 7x4x1
 30 = 6x5x1
 32 = 8x4x1
 35 = 7x5x1
 36 = 6x6x1
 40 = 8x5x1
 42 = 7x6x1
 44 = 11x4x1
 45 = 9x5x1
 48 = 8x6x1
 49 = 7x7x1
 50 = 10x5x1
 54 = 9x6x1
 55 = 11x5x1
 56 = 8x7x1
 60 = 10x6x1
 63 = 9x7x1
 64 = 8x8x1
 66 = 11x6x1
 70 = 10x7x1
 72 = 6x6x2
 75 = 5x5x3
 77 = 11x7x1
 80 = 10x8x1
 81 = 9x9x1
 84 = 7x6x2
 88 = 11x8x1
 90 = 10x9x1
 96 = 8x6x2
 98 = 7x7x2
 99 = 11x9x1
100 = 10x10x1
105 = 7x5x3
108 = 9x6x2
110 = 11x10x1
112 = 7x4x4
120 = 5x6x4
121 = 11x11x1
125 = 5x5x5
126 = 9x7x2
128 = 8x4x4

p??? - max 6x6x5

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 6x1x1
  8 = 4x2x1
  9 = 3x3x1
 10 = 5x2x1
 12 = 6x2x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 6x3x1
 20 = 5x4x1
 24 = 6x4x1
 25 = 5x5x1
 27 = 3x3x3
 30 = 6x5x1
 32 = 4x4x2
 36 = 6x6x1
 40 = 5x4x2
 45 = 5x3x3
 48 = 6x4x2
 50 = 5x5x2
 54 = 6x3x3
 60 = 5x4x3
 64 = 4x4x4
 72 = 6x4x3
 75 = 5x5x3
 80 = 5x4x4
 90 = 6x5x3
 96 = 6x4x4
100 = 5x5x4
108 = 6x6x3
120 = 5x6x4
125 = 5x5x5

p14378 - max 5x5x5

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 3x2x1
  8 = 4x2x1
  9 = 3x3x1
 10 = 5x2x1
 12 = 4x3x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 3x3x2
 20 = 5x4x1
 24 = 4x3x2
 25 = 5x5x1
 27 = 3x3x3
 30 = 5x3x2
 32 = 4x4x2
 36 = 4x3x3
 40 = 4x5x2
 45 = 5x3x3
 48 = 4x4x3
 50 = 5x5x2
 60 = 5x4x3
 64 = 4x4x4
 75 = 5x5x3
 80 = 4x5x4
100 = 5x5x4
125 = 5x5x5

All data for 1-128 threads. 1 indicates success while 0 is failure.

# Threads p16501 p???   p14378
  1       1      1      1
  2       1      1      1
  3       1      1      1
  4       1      1      1
  5       1      1      1
  6       1      1      1
  7       1      0      0
  8       1      1      1
  9       1      1      1
 10       1      1      1
 11       1      0      0
 12       1      1      1
 13       0      0      0
 14       0      0      0
 15       1      1      1
 16       1      1      1
 17       0      0      0
 18       1      1      1
 19       0      0      0
 20       1      1      1
 21       1      0      0
 22       0      0      0
 23       0      0      0
 24       1      1      1
 25       1      1      1
 26       0      0      0
 27       1      1      1
 28       1      0      0
 29       0      0      0
 30       1      1      1
 31       0      0      0
 32       1      1      1
 33       0      0      0
 34       0      0      0
 35       1      0      0
 36       1      1      1
 37       0      0      0
 38       0      0      0
 39       0      0      0
 40       1      1      1
 41       0      0      0
 42       1      0      0
 43       0      0      0
 44       1      0      0
 45       1      1      1
 46       0      0      0
 47       0      0      0
 48       1      1      1
 49       1      0      0
 50       1      1      1
 51       0      0      0
 52       0      0      0
 53       0      0      0
 54       1      1      0
 55       1      0      0
 56       1      0      0
 57       0      0      0
 58       0      0      0
 59       0      0      0
 60       1      1      1
 61       0      0      0
 62       0      0      0
 63       1      0      0
 64       1      1      1
 65       0      0      0
 66       1      0      0
 67       0      0      0
 68       0      0      0
 69       0      0      0
 70       1      0      0
 71       0      0      0
 72       1      1      0
 73       0      0      0
 74       0      0      0
 75       1      1      1
 76       0      0      0
 77       1      0      0
 78       0      0      0
 79       0      0      0
 80       1      1      1
 81       1      0      0
 82       0      0      0
 83       0      0      0
 84       1      0      0
 85       0      0      0
 86       0      0      0
 87       0      0      0
 88       1      0      0
 89       0      0      0
 90       1      1      0
 91       0      0      0
 92       0      0      0
 93       0      0      0
 94       0      0      0
 95       0      0      0
 96       1      1      0
 97       0      0      0
 98       1      0      0
 99       1      0      0
100       1      1      1
101       0      0      0
102       0      0      0
103       0      0      0
104       0      0      0
105       1      0      0
106       0      0      0
107       0      0      0
108       1      1      0
109       0      0      0
110       1      0      0
111       0      0      0
112       1      0      0
113       0      0      0
114       0      0      0
115       0      0      0
116       0      0      0
117       0      0      0
118       0      0      0
119       0      0      0
120       1      1      0
121       1      0      0
122       0      0      0
123       0      0      0
124       0      0      0
125       1      1      1
126       1      0      0
127       0      0      0
128       1      0      0
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

I caught a smaller project. This one would fail for any thread count greater than 64 with the exception of 80.

p16423 - max 4x4x4 - PME load 0.19

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  6 = 3x2x1
  8 = 4x2x1
  9 = 3x3x1
 12 = 4x3x1
 16 = 4x4x1
 18 = 2x3x3
 20 = 4x4x1  16 +  4 PME
 21 = 4x4x1  16 +  5 PME
 24 = 3x2x3  18 +  6 PME
 27 = 2x3x3  18 +  9 PME
 30 = 3x4x2  24 +  6 PME
 32 = 3x4x2  24 +  8 PME
 36 = 3x3x3  27 +  9 PME
 40 = 4x4x2  32 +  8 PME
 42 = 4x4x2  32 + 10 PME
 44 = 3x4x3  36 +  8 PME
 45 = 4x3x3  36 +  9 PME
 48 = 4x3x3  36 + 12 PME
 54 = 3x3x4  36 + 18 PME
 60 = 4x3x4  48 + 12 PME
 64 = 4x4x3  48 + 16 PME
 80 = 4x4x4  64 + 16 PME

All data for 1-128 threads. 1 indicates success while 0 is failure.

# Threads p16501 p???   p14378 p16423
  1       1      1      1      1
  2       1      1      1      1
  3       1      1      1      1
  4       1      1      1      1
  5       1      1      1      0
  6       1      1      1      1
  7       1      0      0      0
  8       1      1      1      1
  9       1      1      1      1
 10       1      1      1      0
 11       1      0      0      0
 12       1      1      1      1
 13       0      0      0      0
 14       0      0      0      0
 15       1      1      1      0
 16       1      1      1      1
 17       0      0      0      0
 18       1      1      1      1
 19       0      0      0      0
 20       1      1      1      1
 21       1      1      1      1
 22       0      0      0      0
 23       0      0      0      0
 24       1      1      1      1
 25       1      1      1      0
 26       0      0      0      0
 27       1      1      1      1
 28       1      1      1      0
 29       0      0      0      0
 30       1      1      1      1
 31       0      0      0      0
 32       1      1      1      1
 33       0      0      0      0
 34       0      0      0      0
 35       1      1      1      0
 36       1      1      1      1
 37       0      0      0      0
 38       0      0      0      0
 39       0      0      0      0
 40       1      1      1      1
 41       0      0      0      0
 42       1      1      1      1
 43       0      0      0      0
 44       1      1      1      1
 45       1      1      1      1
 46       0      0      0      0
 47       0      0      0      0
 48       1      1      1      1
 49       0      0      0      0
 50       1      1      1      0
 51       0      0      0      0
 52       1      1      1      0
 53       0      0      0      0
 54       1      1      1      1
 55       1      1      1      0
 56       1      1      1      0
 57       0      0      0      0
 58       0      0      0      0
 59       0      0      0      0
 60       1      1      1      1
 61       0      0      0      0
 62       0      0      0      0
 63       1      0      1      0
 64       1      1      1      1
 65       1      1      1      0
 66       1      1      1      0
 67       0      0      0      0
 68       0      0      0      0
 69       0      0      0      0
 70       1      0      1      0
 71       0      0      0      0
 72       1      1      1      0
 73       0      0      0      0
 74       0      0      0      0
 75       1      1      1      0
 76       0      0      0      0
 77       1      0      0      0
 78       1      0      1      0
 79       0      0      0      0
 80       1      1      1      1
 81       1      1      1      0
 82       0      0      0      0
 83       0      0      0      0
 84       1      0      0      0
 85       1      1      1      0
 86       0      0      0      0
 87       0      0      0      0
 88       1      0      0      0
 89       0      0      0      0
 90       1      1      0      0
 91       1      0      0      0
 92       0      0      0      0
 93       0      0      0      0
 94       0      0      0      0
 95       1      1      1      0
 96       1      1      1      0
 97       0      0      0      0
 98       1      0      0      0
 99       1      1      0      0
100       1      1      1      0
101       0      0      0      0
102       1      0      0      0
103       0      0      0      0
104       1      0      1      0
105       1      0      0      0
106       0      0      0      0
107       0      0      0      0
108       1      0      1      0
109       0      0      0      0
110       1      1      0      0
111       0      0      0      0
112       1      0      0      0
113       0      0      0      0
114       1      1      0      0
115       1      1      0      0
116       0      0      0      0
117       1      1      1      0
118       0      0      0      0
119       1      0      0      0
120       1      1      1      0
121       0      0      0      0
122       0      0      0      0
123       0      0      0      0
124       0      0      0      0
125       1      1      1      0
126       1      0      0      0
127       0      0      0      0
128       1      1      1      0
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

More data for another 6x6x5 project, but with a slightly different PME load. The split between grid and PME threads is the same until 44 threads, where the 0.01 difference starts to have an effect.

p13832 - max 6x6x5 - PME load 0.19

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 6x1x1
  8 = 4x2x1
  9 = 3x3x1
 10 = 5x2x1
 12 = 6x2x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 6x3x1
 20 = 4x2x2  16 +  4 PME
 21 = 4x2x2  16 +  5 PME
 24 = 6x3x1  18 +  6 PME
 25 = 5x2x2  20 +  5 PME
 27 = 3x3x2  18 +  9 PME
 28 = 5x2x2  20 +  8 PME
 30 = 6x4x1  24 +  6 PME
 32 = 3x4x2  24 +  8 PME
 35 = 5x5x1  25 + 10 PME
 36 = 3x3x3  27 +  9 PME
 40 = 4x4x2  32 +  8 PME
 42 = 4x4x2  32 + 10 PME
 44 = 3x4x3  36 +  8 PME
 45 = 3x3x4  36 +  9 PME
 48 = 6x2x3  36 + 12 PME
 50 = 4x5x2  40 + 10 PME
 52 = 5x4x2  40 + 12 PME
 54 = 6x3x2  36 + 18 PME
 55 = 3x5x3  45 + 10 PME
 56 = 5x4x2  40 + 16 PME
 60 = 6x4x2  48 + 12 PME
 64 = 3x4x4  48 + 16 PME
 65 = 5x5x2  50 + 15 PME
 66 = 6x3x3  54 + 12 PME
 72 = 6x3x3  54 + 18 PME
 75 = 5x3x4  60 + 15 PME
 78 = 4x4x4  64 + 14 PME
 80 = 4x4x4  64 + 16 PME
 81 = 3x6x3  54 + 27 PME
 85 = 5x6x2  60 + 25 PME
 88 = 6x4x3  72 + 16 PME
 90 = 6x3x4  72 + 18 PME
 95 = 5x5x3  75 + 20 PME
 96 = 6x4x3  72 + 24 PME
 98 = 4x5x4  80 + 18 PME
100 = 5x4x4  80 + 20 PME
110 = 5x6x3  90 + 20 PME
114 = 5x6x3  90 + 24 PME
115 = 5x6x3  90 + 25 PME
117 = 6x4x4  96 + 21 PME
120 = 6x4x4  96 + 24 PME
125 = 5x5x4 100 + 25 PME
128 = 4x6x4  96 + 32 PME
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Testing domain decomposition for high CPU counts

Post by Neil-B »

Without PME even given the fewer core options that work, is it more predictable? Might that have been why the suggestion was made?
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

Neil-B wrote:Without PME even given the fewer core options that work, is it more predictable? Might that have been why the suggestion was made?
It definitely is more predictable.

You could take the maximum allowed number of cells from md.log and generate every combination of cell counts up to those maximums. Then multiply each one out, sort, and remove duplicates.
Here's an example for 4x4x4:
4x4x4 = 64
4x4x3 = 48
4x4x2 = 32
4x4x1 = 16
4x3x3 = 36
4x3x2 = 24
4x3x1 = 12
4x2x2 = 16
4x2x1 = 8
4x1x1 = 4
3x3x3 = 27
3x3x2 = 18
3x3x1 = 9
3x2x2 = 12
3x2x1 = 6
3x1x1 = 3
2x2x2 = 8
2x2x1 = 4
2x1x1 = 2
1x1x1 = 1
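
The same enumeration can be done in a few lines of Python. This is only a sketch of the manual procedure above; `candidate_counts` and its `limit` parameter are names chosen for this example.

```python
from itertools import product


def candidate_counts(max_cells, limit=128):
    """Sorted, de-duplicated products x*y*z with each factor running
    from 1 up to that dimension's maximum allowed number of cells."""
    max_x, max_y, max_z = max_cells
    counts = {
        x * y * z
        for x, y, z in product(range(1, max_x + 1),
                               range(1, max_y + 1),
                               range(1, max_z + 1))
    }
    return sorted(c for c in counts if c <= limit)


print(candidate_counts((4, 4, 4)))
# [1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 27, 32, 36, 48, 64]
```

Using a set takes care of the duplicates (16 shows up as both 4x4x1 and 4x2x2), and the three separate ranges handle boxes like 4x4x3 where the dimensions differ. The result is best read as the list of counts that could work without PME, not a guarantee that each one will.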
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

More data for an even smaller project.

p14574 - max 4x4x3 - PME load 0.18

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  6 = 3x2x1
  8 = 4x2x1
  9 = 3x3x1
 12 = 4x3x1
 16 = 4x4x1
 18 = 3x3x2
 20 = 4x4x1  16 +  4 PME
 21 = 4x4x1  16 +  5 PME
 24 = 3x2x3  18 +  6 PME
 27 = 3x3x2  18 +  9 PME
 30 = 4x3x2  24 +  6 PME
 32 = 3x4x2  24 +  8 PME
 36 = 3x3x3  27 +  9 PME
 40 = 4x4x2  32 +  8 PME
 42 = 4x4x2  32 + 10 PME
 44 = 3x4x3  36 +  8 PME
 45 = 3x4x3  36 +  9 PME
 48 = 4x3x3  36 + 12 PME
 54 = 3x4x3  36 + 18 PME
 60 = 4x4x3  48 + 12 PME
 64 = 4x4x3  48 + 16 PME
uyaem
Posts: 219
Joined: Sat Mar 21, 2020 7:35 pm
Location: Esslingen, Germany

Re: Testing domain decomposition for high CPU counts

Post by uyaem »

I've been running a single 21core slot for 8 days and have yet to encounter any failure.

I've seen logs posted previously which detailed the PME load, but I cannot find any mention of PME in my logs.
The question would be: Have I been on certain projects only, or do I need a certain log level to see the domain decomposition?
CPU: Ryzen 9 3900X (1x21 CPUs) ~ GPU: nVidia GeForce GTX 1660 Super (Asus)
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

uyaem wrote:I've been running a single 21core slot for 8 days and have yet to encounter any failure.
21 seems to be a good number. All 6 projects analyzed so far have supported 21 threads.
uyaem wrote:I've seen logs posted previously which detailed the PME load, but I cannot find any mention of PME in my logs.
The question would be: Have I been on certain projects only, or do I need a certain log level to see the domain decomposition?
Domain decomposition will only show up in the main log file if it fails with a fatal error. That's what you generally see people posting.

If you go into the work folder to the lowest level, there is a file called md.log. Search "domain decomposition" in there and you'll see something like this.

Initializing Domain Decomposition on 21 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.401 nm, LJ-14, atoms 886 896
  multi-body bonded interactions: 0.401 nm, Proper Dih., atoms 896 886
Minimum cell size due to bonded interactions: 0.441 nm
Maximum distance for 7 constraints, at 120 deg. angles, all-trans: 1.098 nm
Estimated maximum distance required for P-LINCS: 1.098 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.18
Will use 16 particle-particle and 5 PME only ranks
This is a guess, check the performance at the end of the log file
Using 5 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 16 cells with a minimum initial size of 1.372 nm
The maximum allowed number of cells is: X 4 Y 4 Z 3
Domain decomposition grid 4 x 4 x 1, separate PME ranks 5
PME domain decomposition: 5 x 1 x 1

"Guess for relative PME load" will only be included when there are more than 18 threads assigned to a slot.
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

An unexpected curve ball: two different runs of the same project have different maximum allowed numbers of cells. Both values are still high, so this doesn't have any effect until way beyond 128 threads.

p14628 Run 554

Initializing Domain Decomposition on 20 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.435 nm, LJ-14, atoms 128 133
  multi-body bonded interactions: 0.435 nm, Proper Dih., atoms 128 133
Minimum cell size due to bonded interactions: 0.479 nm
Maximum distance for 13 constraints, at 120 deg. angles, all-trans: 0.219 nm
Estimated maximum distance required for P-LINCS: 0.219 nm
Guess for relative PME load: 0.36
Will use 12 particle-particle and 8 PME only ranks
This is a guess, check the performance at the end of the log file
Using 8 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 12 cells with a minimum initial size of 0.598 nm
The maximum allowed number of cells is: X 13 Y 13 Z 13
Domain decomposition grid 4 x 3 x 1, separate PME ranks 8
PME domain decomposition: 4 x 2 x 1

p14628 Run 814

Initializing Domain Decomposition on 20 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.431 nm, LJ-14, atoms 1297 1302
  multi-body bonded interactions: 0.431 nm, Proper Dih., atoms 1297 1302
Minimum cell size due to bonded interactions: 0.474 nm
Maximum distance for 13 constraints, at 120 deg. angles, all-trans: 0.219 nm
Estimated maximum distance required for P-LINCS: 0.219 nm
Guess for relative PME load: 0.36
Will use 12 particle-particle and 8 PME only ranks
This is a guess, check the performance at the end of the log file
Using 8 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 12 cells with a minimum initial size of 0.592 nm
The maximum allowed number of cells is: X 14 Y 14 Z 14
Domain decomposition grid 4 x 3 x 1, separate PME ranks 8
PME domain decomposition: 4 x 2 x 1
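
The fields that differ between the two runs above can be pulled out of md.log with a few regular expressions. This is a sketch that assumes the log wording is exactly as in these excerpts; `summarize_md_log` and the dictionary keys are names invented for the example.

```python
import re

# Patterns copied from the md.log wording quoted in this thread.
PATTERNS = {
    "ranks": re.compile(r"Initializing Domain Decomposition on (\d+) ranks"),
    "pme_load": re.compile(r"Guess for relative PME load: ([0-9.]+)"),
    "max_cells": re.compile(
        r"The maximum allowed number of cells is: X (\d+) Y (\d+) Z (\d+)"),
    "grid": re.compile(
        r"Domain decomposition grid (\d+) x (\d+) x (\d+), "
        r"separate PME ranks (\d+)"),
}


def summarize_md_log(text):
    """Extract domain decomposition facts from md.log text.

    A key is omitted when its line is absent, e.g. when no PME guess
    was logged.
    """
    out = {}
    m = PATTERNS["ranks"].search(text)
    if m:
        out["ranks"] = int(m.group(1))
    m = PATTERNS["pme_load"].search(text)
    if m:
        out["pme_load"] = float(m.group(1))
    m = PATTERNS["max_cells"].search(text)
    if m:
        out["max_cells"] = tuple(int(g) for g in m.groups())
    m = PATTERNS["grid"].search(text)
    if m:
        x, y, z, pme = (int(g) for g in m.groups())
        out["grid"] = (x, y, z)
        out["pme_ranks"] = pme
    return out
```

Run on the two excerpts, the only value that should differ is `max_cells`: `(13, 13, 13)` for Run 554 and `(14, 14, 14)` for Run 814.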

p14628 Run 814 - max 14x14x14 - PME load 0.36
p14628 Run 554 - max 13x13x13 - PME load 0.36

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  5 = 5x1x1
  6 = 6x1x1
  7 = 7x1x1
  8 = 8x1x1
  9 = 9x1x1
 10 = 10x1x1
 11 = 11x1x1
 12 = 12x1x1
 15 = 5x3x1
 16 = 4x4x1
 18 = 6x3x1
 20 = 4x3x1  12 +  8 PME
 21 = 6x2x1  12 +  9 PME
 24 = 3x5x1  15 +  9 PME
 25 = 5x3x1  15 + 10 PME
 27 = 5x3x1  15 + 12 PME
 28 = 3x3x2  18 + 10 PME
 30 = 9x2x1  18 + 12 PME
 32 = 5x2x2  20 + 12 PME
 35 = 5x4x1  20 + 15 PME
 36 = 4x5x1  20 + 16 PME
 40 = 5x5x1  25 + 15 PME
 42 = 3x3x3  27 + 15 PME
 44 = 4x6x1  24 + 20 PME
 45 = 3x3x3  27 + 18 PME
 48 = 6x5x1  30 + 18 PME
 50 = 4x4x2  32 + 18 PME
 52 = 4x4x2  32 + 20 PME
 54 = 6x5x1  30 + 24 PME
 55 = 5x6x1  30 + 25 PME
 56 = 4x3x3  36 + 20 PME
 60 = 4x3x3  36 + 24 PME
 63 = 3x6x2  36 + 27 PME
 64 = 4x5x2  40 + 24 PME
 65 = 5x4x2  40 + 25 PME
 66 = 6x6x1  36 + 30 PME
 70 = 5x3x3  45 + 25 PME
 72 = 3x5x3  45 + 27 PME
 75 = 3x8x2  48 + 27 PME
 77 = 7x7x1  49 + 28 PME
 78 = 6x4x2  48 + 30 PME
 80 = 5x2x5  50 + 30 PME
 81 = 3x8x2  48 + 33 PME
 84 = 6x3x3  54 + 30 PME
 85 = 5x2x5  50 + 35 PME
 88 = 4x7x2  56 + 32 PME
 90 = 6x3x3  54 + 36 PME
 91 = 7x4x2  56 + 35 PME
 95 = 5x4x3  60 + 35 PME
 96 = 3x7x3  63 + 33 PME
 98 = 7x3x3  63 + 35 PME
 99 = 3x7x3  63 + 36 PME
100 = 4x4x4  64 + 36 PME
102 = 3x7x3  63 + 39 PME
104 = 4x4x4  64 + 40 PME
105 = 7x3x3  63 + 42 PME
108 = 4x4x4  64 + 44 PME
110 = 5x7x2  70 + 40 PME
112 = 4x6x3  72 + 40 PME
114 = 5x5x3  75 + 39 PME
115 = 5x3x5  75 + 40 PME
117 = 5x5x3  75 + 42 PME
119 = 7x2x5  70 + 49 PME
120 = 5x3x5  75 + 45 PME
125 = 5x4x4  80 + 45 PME
126 = 9x3x3  81 + 45 PME
128 = 4x7x3  84 + 44 PME
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

Data for p14576, which has failed on 24, 30 and 48 threads.
The slightly lower PME load, compared to 0.18 for p14574, eliminates 24, 30, 36, 48, 54 and 60 as viable options for this project.

p14576 - max 4x4x3 - PME load 0.17

  2 = 2x1x1
  3 = 3x1x1
  4 = 4x1x1
  6 = 3x2x1
  8 = 4x2x1
  9 = 3x3x1
 12 = 4x3x1
 16 = 4x4x1
 18 = 3x3x2
 20 = 4x4x1  16 +  4 PME
 21 = 4x4x1  16 +  5 PME
 27 = 3x3x2  18 +  9 PME
 32 = 4x2x3  24 +  8 PME
 40 = 4x4x2  32 +  8 PME
 42 = 4x4x2  32 + 10 PME
 44 = 4x3x3  36 +  8 PME
 45 = 3x4x3  36 +  9 PME
 64 = 4x4x3  48 + 16 PME
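
The counts lost relative to p14574 can be double-checked with a set difference. The two sets below are copied by hand from the p14574 and p14576 lists in this thread; the variable names are just for this example.

```python
# Working thread counts copied from the p14574 and p14576 lists.
p14574 = {2, 3, 4, 6, 8, 9, 12, 16, 18, 20, 21, 24, 27, 30, 32, 36,
          40, 42, 44, 45, 48, 54, 60, 64}
p14576 = {2, 3, 4, 6, 8, 9, 12, 16, 18, 20, 21, 27, 32,
          40, 42, 44, 45, 64}

lost = sorted(p14574 - p14576)    # counts that work only on p14574
gained = sorted(p14576 - p14574)  # counts that work only on p14576
print(lost)    # [24, 30, 36, 48, 54, 60]
print(gained)  # []
```
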

All data for 1-128 threads. 1 indicates success while blank is failure.

# Threads p14336   p14365   p16501   p???    p13832  p14378  p16423  p14574  p14576
      Box 18x18x18 13x13x13 11x11x10 6x6x5   6x6x5   5x5x5   4x4x4   4x4x3   4x4x3
      PME 0.22     0.36     0.1      0.2     0.19    0.37    0.19    0.18    0.17
  1       1        1        1        1       1       1       1       1       1
  2       1        1        1        1       1       1       1       1       1
  3       1        1        1        1       1       1       1       1       1
  4       1        1        1        1       1       1       1       1       1
  5       1        1        1        1       1       1                        
  6       1        1        1        1       1       1       1       1       1
  7       1        1        1                                                 
  8       1        1        1        1       1       1       1       1       1
  9       1        1        1        1       1       1       1       1       1
 10       1        1        1        1       1       1                        
 11       1        1        1                                                 
 12       1        1        1        1       1       1       1       1       1
 13                                                                           
 14                                                                           
 15       1        1        1        1       1       1                        
 16       1        1        1        1       1       1       1       1       1
 17                                                                           
 18       1        1        1        1       1       1       1       1       1
 19                                                                           
 20       1        1        1        1       1       1       1       1       1
 21       1        1        1        1       1       1       1       1       1
 22                                                                           
 23                                                                           
 24       1        1        1        1       1       1       1       1        
 25       1        1        1        1       1       1                        
 26                                                                           
 27       1        1        1        1       1       1       1       1       1
 28       1        1        1        1       1       1                        
 29                                                                           
 30       1        1        1        1       1       1       1       1        
 31                                                                           
 32       1        1        1        1       1       1       1       1       1
 33                                                                           
 34                                                                           
 35       1        1        1        1       1       1                        
 36       1        1        1        1       1       1       1       1        
 37                                                                           
 38                                                                           
 39                                                                           
 40       1        1        1        1       1       1       1       1       1
 41                                                                           
 42       1        1        1        1       1       1       1       1       1
 43                                                                           
 44       1        1        1        1       1       1       1       1       1
 45       1        1        1        1       1       1       1       1       1
 46                                                                           
 47                                                                           
 48       1        1        1        1       1       1       1       1        
 49                                                                           
 50       1        1        1        1       1       1                        
 51                                                                           
 52       1        1        1        1       1       1                        
 53                                                                           
 54       1        1        1        1       1       1       1       1        
 55       1        1        1        1       1       1                        
 56       1        1        1        1       1       1                        
 57                                                                           
 58                                                                           
 59                                                                           
 60       1        1        1        1       1       1       1       1        
 61                                                                           
 62                                                                           
 63       1        1        1                        1                        
 64       1        1        1        1       1       1       1       1       1
 65       1        1        1        1       1       1                        
 66       1        1        1        1       1       1                        
 67                                                                           
 68                                                                           
 69                                                                           
 70       1        1        1                        1                        
 71                                                                           
 72       1        1        1        1       1       1                        
 73                                                                           
 74                                                                           
 75       1        1        1        1       1       1                        
 76                                                                           
 77       1        1        1                                                 
 78       1        1        1                1       1                        
 79                                                                           
 80       1        1        1        1       1       1       1                
 81       1        1        1        1       1       1                        
 82                                                                           
 83                                                                           
 84       1        1        1                                                 
 85       1        1        1        1       1       1                        
 86                                                                           
 87                                                                           
 88       1        1        1                1                                
 89                                                                           
 90       1        1        1        1       1                                
 91       1        1        1                                                 
 92                                                                           
 93                                                                           
 94                                                                           
 95       1        1        1        1       1       1                        
 96       1        1        1        1       1       1                        
 97                                                                           
 98       1        1        1                1                                
 99       1        1        1        1                                        
100       1        1        1        1       1       1                        
101                                                                           
102       1        1        1                                                 
103                                                                           
104       1        1        1                        1                        
105       1        1        1                                                 
106                                                                           
107                                                                           
108       1        1        1                        1                        
109                                                                           
110       1        1        1        1       1                                
111                                                                           
112       1        1        1                                                 
113                                                                           
114       1        1        1        1       1                                
115       1        1        1        1       1                                
116                                                                           
117       1        1        1        1       1       1                        
118                                                                           
119       1        1        1                                                 
120       1        1        1        1       1       1                        
121                                                                           
122                                                                           
123                                                                           
124                                                                           
125       1        1        1        1       1       1                        
126       1        1        1                                                 
127                                                                           
128       1        1        1        1       1       1                        
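
One way to use the table: given a core count, pick the largest thread count at or below it that is marked as working. The sketch below is illustrative only; the function name is made up, and the list is simply the p14576 column transcribed from the table. In this table every count that works for p14576 also works for the other eight projects, so that column doubles as a conservative "works for everything tested here" list. It says nothing about projects that are not in the table.

```python
# Thread counts marked 1 in the p14576 column of the table above.
P14576_OK = [1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 20, 21,
             27, 32, 40, 42, 44, 45, 64]


def largest_working_count(cores, working):
    """Return the largest thread count in `working` that does not exceed `cores`."""
    candidates = [n for n in working if n <= cores]
    if not candidates:
        raise ValueError("no working thread count at or below %d" % cores)
    return max(candidates)


if __name__ == "__main__":
    for cores in (7, 24, 48, 128):
        print(cores, "cores ->", largest_working_count(cores, P14576_OK), "threads")
```

For example, a 24-core machine would drop to 21 threads and a 48-core machine to 45 under this rule.
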
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Testing domain decomposition for high CPU counts

Post by MeeLee »

I wonder why with 7 or 11 cores, they don't just do 4 cores + 3 core WUs together?
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Testing domain decomposition for high CPU counts

Post by _r2w_ben »

MeeLee wrote: I wonder why with 7 or 11 cores, they don't just do 4 cores + 3 core WUs together?
Setting up two slots is a manual option for maximum utilization. Faster returns are rewarded more, so it's a trade-off between using all cores and earning a higher QRB from more cores per work unit.
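
To see which two-slot splits are even possible for a given core count, something like the following would do. It is a sketch under assumptions: the SAFE set is the list of thread counts that worked for all nine projects in the table posted earlier, two_slot_splits is a made-up helper name, and it makes no attempt to model the QRB side of the trade-off.

```python
# Thread counts that worked for every project in the table posted earlier
# (assumption: other projects may behave differently).
SAFE = {1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 20, 21,
        27, 32, 40, 42, 44, 45, 64}


def two_slot_splits(cores, safe=SAFE):
    """List (big, small) pairs with big + small == cores and both counts in `safe`."""
    splits = []
    for small in range(1, cores // 2 + 1):
        big = cores - small
        if small in safe and big in safe:
            splits.append((big, small))
    return splits


if __name__ == "__main__":
    for cores in (7, 11):
        print(cores, "cores ->", two_slot_splits(cores))
```

For 7 cores this gives 6+1 and 4+3; for 11 cores it gives 9+2 and 8+3. Which of those is actually best depends on the bonus and on what the second slot would be doing.
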
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Testing domain decomposition for high CPU counts

Post by Neil-B »

Currently, for 7 CPUs a single 6-thread slot is probably better for the science than a 4-thread slot plus a 3-thread slot. For 11 CPUs, an 8-thread slot plus a 3-thread slot would most likely be best for the science, but you are free to make your own choices :)