Sometimes only using 21 cores out of 24.

Posted: Thu Jul 09, 2020 12:58 pm
by Hopfgeist
Hi there.

I have an old-ish but still relatively powerful server with two 6-core CPUs and 2 threads per core, so 24 threads in total.

Sometimes the corewrapper gets started with -np 23 instead of 24, although the number of CPUs is configured as "-1": "use as many as possible". Performance is also set to "full", or "100%", so it should use all of the cores. I explicitly disabled GPU folding; the machine has no GPU.

It tells me:

Code:

WARNING:WU00:FS00:AS lowered CPUs from 24 to 23

but does not give any indication why. The system is mostly idle, with light domestic file server duties measured in a few GB/day.

Now 23 threads is rejected because it is prime, and 22 is rejected because it has a large prime factor, so the core proper is started with "only" 21 threads. That is still fine, since it is not a large reduction in performance, but it seems that it should be running on 24 cores; sometimes it does, and at other times it does not.
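
The step-down from 23 to 21 described above can be reproduced with a small sketch. The cut-off used here (reject any count whose largest prime factor is greater than 7) is an assumption chosen only because it matches the three numbers in this post; the real rule in the client or core may differ, and the function names are made up for illustration.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (returns 1 for n <= 1)."""
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:
        # Whatever is left after trial division is itself prime.
        largest = n
    return largest


def usable_thread_count(requested, max_factor=7):
    """Step down from `requested` until the largest prime factor is small.

    max_factor=7 is an assumed threshold, not a documented FAH value.
    """
    count = requested
    while count > 1 and largest_prime_factor(count) > max_factor:
        count -= 1
    return count


if __name__ == "__main__":
    for requested in (24, 23):
        print(requested, "->", usable_thread_count(requested))
```

Under that assumption, a request for 24 stays at 24, while a request for 23 skips 23 (prime) and 22 (factor 11) and lands on 21.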

I do all configuration with command-line options, and don't use any graphical control frontend for various reasons.

Maybe this has been asked before, but I cannot search for it: "24" and "23" are rejected as search terms, because they are too short.


Cheers,
HG.

Re: Sometimes only using 21 cores out of 24.

Posted: Thu Jul 09, 2020 3:28 pm
by Joe_H
Does the system have a GPU? One CPU thread would be reserved for that.

But a more likely cause is that the slider setting is sometimes detected as Medium instead of Full, which reduces the thread count used by one. Another possibility is that at the time of the WU request none were available for a CPU thread count of 24, so the server reduced the count to the next available number and let the client do the final adjustments.

CPU thread assignments get a little involved in the Gromacs code used in the folding core for counts above 16-18. Depending on the dimensions of the bounding box and the estimates the code makes of what the PME calculations will need, some threads will be assigned to PME. So, for example, 24 might get split into 20 threads for the main calculations and 4 for PME. 20 is sometimes problematic because, depending on the dimensions, a multiple of 5 is not usable. So the only way they can avoid an assignment being split this way is to exclude 24 as a usable thread count.

The ones running at a thread count of 21 are probably split with 18 for the main and 3 threads for PME.
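
The two splits mentioned above can be written down as simple arithmetic. This sketch only restates the numbers from this post and flags a main-calculation count that is a multiple of 5; it does not reproduce how Gromacs actually chooses the split, and `check_split` is a hypothetical helper name.

```python
# (total threads, main-calculation threads, PME threads) from the examples above.
EXAMPLE_SPLITS = [(24, 20, 4), (21, 18, 3)]


def check_split(total, main, pme):
    """Verify the split adds up and report whether `main` is a multiple of 5."""
    if main + pme != total:
        raise ValueError("main + pme must equal total")
    return {
        "total": total,
        "main": main,
        "pme": pme,
        "main_multiple_of_5": main % 5 == 0,
    }


if __name__ == "__main__":
    for split in EXAMPLE_SPLITS:
        print(check_split(*split))
```

The 24-thread example is flagged because its 20 main threads are a multiple of 5, while the 21-thread example with 18 main threads is not.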

There has been some discussion under "domain decomposition"; look for posts by _r2w_ben. He has examined the Gromacs code involved and developed some testing methods to determine usable thread counts for some projects and their bounding boxes.

Re: Sometimes only using 21 cores out of 24.

Posted: Thu Jul 09, 2020 3:48 pm
by JimboPalmer
Welcome to Folding@Home!

From a pragmatic standpoint, if the Client wants to give you 21-thread Work Units, I would hard code that into my configuration, then add a second CPU slot with 3 threads.

That makes the Client happy, and uses all your CPUs.

Re: Sometimes only using 21 cores out of 24.

Posted: Thu Jul 09, 2020 4:39 pm
by Neil-B
An AS reduction is a relatively rare occurrence - permanently adjusting the count to 21 (which isn't actually a particularly "good number" as multiple of 5) is probably not warranted tbh ... it currently feels bad as there is a Project that has slipped through beta without 24 being spotted as an issue - this would normally be spotted and the project not released to 24-thread slots - and a single 24 is much better for the science than two-slot combinations.

Re: Sometimes only using 21 cores out of 24.

Posted: Thu Jul 09, 2020 11:10 pm
by bruce
I don't recommend two slots of 21 + 3 but I do recommend two slots. I'd probably settle on 12 + 12.

Set the power slider to the maximum setting and then forget it. Do all your configuration settings yourself with FAHControl.

Re: Sometimes only using 21 cores out of 24.

Posted: Fri Jul 10, 2020 6:34 am
by Hopfgeist
Thanks for the suggestions.

Neil-B,

now that you mention it, it seemed to start becoming much more frequent after I set my client-type to "beta". I will set it back to "advanced" after the current WU and see what happens.

I have now found that it seems to be a problem on the client side: in the cases I mentioned, the client announces itself to the server as having only 23 cores:

Code:

Requesting new work unit for slot 00: RUNNING cpu:23 from XX.XX.XX.XX

But the configuration says explicitly "cpus=-1".
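
To see how often this happens, the log can be scanned for the two messages quoted so far. This is a minimal sketch, assuming the log lines contain exactly the text shown in this thread (possibly with a timestamp in front); `find_reduced_requests` is a made-up name and not part of FAHClient.

```python
import re

# Patterns copied from the two log lines quoted in this thread.
AS_LOWERED = re.compile(r"AS lowered CPUs from (\d+) to (\d+)")
WU_REQUEST = re.compile(r"Requesting new work unit for slot (\d+): \w+ cpu:(\d+)")


def find_reduced_requests(lines, expected_cpus):
    """Return (line, cpus) pairs where fewer than expected_cpus were reported."""
    hits = []
    for line in lines:
        # search() rather than match(), so a leading timestamp does not matter.
        for pattern in (WU_REQUEST, AS_LOWERED):
            m = pattern.search(line)
            if m and int(m.group(2)) < expected_cpus:
                hits.append((line, int(m.group(2))))
                break
    return hits


if __name__ == "__main__":
    sample = [
        "WARNING:WU00:FS00:AS lowered CPUs from 24 to 23",
        "Requesting new work unit for slot 00: RUNNING cpu:23 from XX.XX.XX.XX",
    ]
    for line, cpus in find_reduced_requests(sample, expected_cpus=24):
        print(cpus, "|", line)
```

In practice the lines would come from the client's log file rather than the inline sample; the file's name and location depend on the installation.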

Here's the command-line I use to start the client:

Code:

FAHClient --user=XXXXXX --team=XXXXXX \
        --passkey=XXXXXX \
        --gpu=false --smp=true \
        --cpus=-1 \
        --client-type advanced

Maybe I will try setting it to 24 explicitly after the current WU finishes.

The core itself correctly identifies all 24 CPUs:

Code:

06:15:52:WU00:FS00:0xa7:************************************ System ************************************
06:15:52:WU00:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
06:15:52:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
06:15:52:WU00:FS00:0xa7:       CPUs: 24

Cheers,
HG

Re: Sometimes only using 21 cores out of 24.

Posted: Fri Jul 10, 2020 6:56 am
by Joe_H
From the command line I would recommend changing the '-1' to '24'.

The -1 in the client settings with a standard install leaves the number of CPU threads to the software and its defaults. One of those defaults is a Medium setting which uses one less thread than the total available.
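
A minimal sketch of passing an explicit count instead of -1, for anyone who starts the client from a script. It only builds the argument list and does not launch anything. The flags are the ones from the command line quoted earlier in this thread (user, team and passkey left out); `build_fahclient_args` is a hypothetical helper, and using Python's `os.cpu_count()` to find the logical CPU count is an assumption about how one might detect it.

```python
import os


def build_fahclient_args(cpus=None):
    """Build a FAHClient argument list with an explicit --cpus value.

    If cpus is None, fall back to the logical CPU count reported by the OS
    (os.cpu_count() can return None, in which case 1 is used).
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    return [
        "FAHClient",
        "--gpu=false",
        "--smp=true",
        f"--cpus={cpus}",
    ]


if __name__ == "__main__":
    # On the 24-thread server from this thread, the explicit value would be 24.
    print(" ".join(build_fahclient_args(24)))
```

The list could be handed to `subprocess.run` once the remaining options are added.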

Re: Sometimes only using 21 cores out of 24.

Posted: Fri Jul 10, 2020 7:05 am
by Hopfgeist
Joe_H wrote: From the command line I would recommend changing the '-1' to '24'.

The -1 in the client settings with a standard install leaves the number of CPU threads to the software and its defaults. One of those defaults is a Medium setting which uses one less thread than the total available.

Ah, that might explain it. However, that contradicts the online help of the command-line client, which says:

Code:

  cpus <integer=-1>
    How many CPUs a slot should use. <= 0 will use all the CPUs detected in the
    system.

... which also agrees with the behaviour most of the time, which is to announce 24 cores to the server.

But indeed, setting it to 24 explicitly seems to work. I'll keep watching it.

Thanks,
HG.