Recompile for better timings

Post by **juvann** » Wed Apr 22, 2020 10:06 am

I have the latest version of Folding@home running on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz under Debian 10 Linux.
I have been reading the log to better understand what is running on my PC, and I found this message in the science.log of a work unit:

Code: Select all

The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=ON CMake option.

What does this mean?
Could the FahCore_a7 process run faster if it were recompiled with the correct flags, or have I misunderstood?
I think this project should release the source code, or maybe I had better stop contributing.

I have another question.
Sometimes I set the power to medium, so a work unit starts with 6 cores, but sometimes I switch to light power while the process is running. When I do, I find this message in science.log:

Code: Select all

  #ranks mismatch,
    current program: 3
    checkpoint file: 6

  #PME-ranks mismatch,
    current program: -1
    checkpoint file: 0

Gromacs patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.

What does this mean?
Is the result of the WU still good, or am I wasting time processing this WU?

Post by **Joe_H** » Wed Apr 22, 2020 4:04 pm

The first one is a message from the Gromacs package that is of more use to an individual researcher running a simulation on a dedicated computer. In a setup like that, the hardware is known and the generated code can be optimized for it.

The CPU folding core is based on Gromacs, and the settings used in creating it are more generic so that it will work successfully on a wide range of hardware and also in VMs. So in some cases a small bit of performance is lost, but with the benefit of working for almost everyone.
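
On Linux, one way to see whether a CPU advertises the instruction at all is to look for the `rdtscp` flag in /proc/cpuinfo. The following is a minimal sketch, assuming a Linux host; the helper name `cpu_has_flag` is made up for illustration, and the result says nothing about how the folding core itself was built.

```python
from pathlib import Path


def cpu_has_flag(flag, cpuinfo_text):
    """Return True if `flag` appears in any 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only the 'flags' lines list CPU features; other lines are ignored.
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("rdtscp advertised:", cpu_has_flag("rdtscp", path.read_text()))
    else:
        print("No /proc/cpuinfo here; this check only works on Linux.")
```

Inside a VM the flag may be hidden or behave differently from bare metal, which fits the reasoning for a generic build.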

Similarly, the second message reflects the change in number of CPU threads being used. A researcher running on a dedicated machine would not be doing this, and the message is merely informational in most cases.

Post by **PantherX** » Thu Apr 23, 2020 7:05 am

Welcome to the F@H Forum juvann,

Please note that FahCore_a7 and FahCore_22 are built using GROMACS (for CPU) and OpenMM with OpenCL (for GPU), respectively. The building blocks are open source, so you can contribute to them if you want to:
GROMACS: https://gitlab.com/gromacs/gromacs
OpenMM: https://github.com/openmm/openmm
OpenCL: https://github.com/KhronosGroup/Khronos ... sources.md

Post by **_r2w_ben** » Thu Apr 23, 2020 11:22 am

juvann wrote:
I have the latest version of Folding@home running on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz under Debian 10 Linux.
I have been reading the log to better understand what is running on my PC, and I found this message in the science.log of a work unit:
Code: Select all
The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=ON CMake option.
What does this mean?
Could the FahCore_a7 process run faster if it were recompiled with the correct flags, or have I misunderstood?

GROMACS tracks the time spent in different parts of the simulation and uses that to re-balance work between CPU cores. RDTSCP provides more accurate timing, but it crashed within VMs and was turned off in the build, hence the informational message in the log.

Code: Select all

	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                           111.971616         111.972     0.0
 Pair Search distance check            3123.030504       28107.275     0.4
 NxN Ewald Elec. + LJ [F]             94353.287776     7359556.447    93.5
 NxN Ewald Elec. + LJ [V&F]            1022.964144      131962.375     1.7
 NxN Ewald Elec. [F]                    682.964112       41660.811     0.5
 NxN Ewald Elec. [V&F]                    7.368528         618.956     0.0
 1,4 nonbonded interactions             158.656296       14279.067     0.2
 Calc Weights                           443.867016       15979.213     0.2
 Spread Q Bspline                      9469.163008       18938.326     0.2
 Gather F Bspline                      9469.163008       56814.978     0.7
 3D-FFT                               17915.127336      143321.019     1.8
 Solve PME                                6.111232         391.119     0.0
 Reset In Box                             5.850794          17.552     0.0
 CG-CoM                                   5.949960          17.850     0.0
 Bonds                                   23.354276        1377.902     0.0
 Propers                                188.050188       43063.493     0.5
 Impropers                                1.784432         371.162     0.0
 Virial                                   7.550296         135.905     0.0
 Stop-CM                                  1.586656          15.867     0.0
 Calc-Ekin                               59.400434        1603.812     0.0
 Lincs                                   36.570381        2194.223     0.0
 Lincs-Mat                              245.487456         981.950     0.0
 Constraint-V                           165.844035        1326.752     0.0
 Constraint-Vir                           6.584986         158.040     0.0
 Settle                                  30.901091        9981.052     0.1
 (null)                                   0.410300           0.000     0.0
-----------------------------------------------------------------------------
 Total                                                 7872987.115   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 52794.0
 av. #atoms communicated per step for LINCS:  2 x 3840.0

 Average load imbalance: 2.5 %
 Part of the total run time spent waiting due to load imbalance: 1.8 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    1         59       1.865         23.078   0.5
 DD comm. load          4    1          1       0.000          0.005   0.0
 Neighbor search        4    1         60       6.817         84.359   1.7
 Comm. coord.           4    1       1432       4.819         59.637   1.2
 Force                  4    1       1492     342.687       4240.869  83.0
 Wait + Comm. F         4    1       1492       0.814         10.076   0.2
 PME mesh               4    1       1492      43.671        540.449  10.6
 NB X/F buffer ops.     4    1       4356       0.669          8.279   0.2
 Write traj.            4    1          2       0.445          5.509   0.1
 Update                 4    1       1492       2.277         28.176   0.6
 Constraints            4    1       1492       7.386         91.400   1.8
 Comm. energies         4    1        300       0.872         10.790   0.2
 Rest                                           0.678          8.393   0.2
-----------------------------------------------------------------------------
 Total                                        413.000       5111.020 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    1       2984      22.178        274.463   5.4
 PME spread/gather      4    1       2984      14.505        179.505   3.5
 PME 3D-FFT             4    1       2984       3.783         46.810   0.9
 PME 3D-FFT Comm.       4    1       2984       2.313         28.626   0.6
 PME solve Elec         4    1       1492       0.884         10.946   0.2
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:     1652.000      413.000      400.0
                 (ns/day)    (hour/ns)
Performance:        0.624       38.446
Finished mdrun on rank 0 Wed Apr 22 07:20:09 2020
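
The summary lines at the end of the log can be cross-checked with a little arithmetic. This is a small sketch that uses only numbers copied from the log above; the variable names are mine.

```python
# Values copied from the log above.
core_time_s = 1652.000
wall_time_s = 413.000
ns_per_day = 0.624
force_wall_s = 342.687

# Core time over wall time gives the (%) column: 4 ranks kept busy.
cpu_percent = 100.0 * core_time_s / wall_time_s

# hour/ns is the reciprocal of ns/day, scaled by 24 hours per day.
hours_per_ns = 24.0 / ns_per_day

# Share of the wall time spent in the Force row.
force_share = 100.0 * force_wall_s / wall_time_s

print(f"{cpu_percent:.1f} %")
print(f"{hours_per_ns:.2f} hour/ns")
print(f"{force_share:.1f} % in Force")
```

The hour/ns value comes out at about 38.46 rather than the 38.446 in the log, presumably because the ns/day figure printed in the log is already rounded.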

Post by **arisu** » Sat Mar 29, 2025 5:57 am

I know this is an old topic, but in case anyone comes across this wondering the same thing, RDTSCP was disabled in the a8 core, but the a9 core now has RDTSCP enabled.

Post by **arisu** » Sat Aug 30, 2025 11:51 pm

After looking at this again, I think it doesn't matter. RDTSCP timings are used for DLB (Dynamic Load Balancing), which only matters on distributed systems that use multiple MPI ranks. FAH does not currently use multiple MPI ranks, so it probably gets no benefit from better timings.

The timings are used for domain decomposition, which is useful when multiple systems with heterogeneous processor speeds are working on a single simulation. RDTSCP is used for cycle-accurate timings to precisely determine how long each MPI rank took, so that the slower ranks can be given fewer atoms to work on. Accurate timings make it possible for load balancing to determine exactly how many atoms each rank needs so that all ranks finish at the same time. Cores a8 and a9 do not benefit from that.
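
The idea of giving slower ranks fewer atoms can be illustrated with a toy calculation. This is only a sketch of proportional rebalancing under my own assumption that a rank's time scales linearly with its atom count; it is not how GROMACS implements DLB, and `rebalance` is a hypothetical helper.

```python
def rebalance(atoms, seconds):
    """Redistribute atoms so every rank would take the same time per step.

    atoms[i]   -- atoms currently assigned to rank i
    seconds[i] -- measured time rank i needed for its last step
    Assumes each rank's time scales linearly with its atom count.
    """
    # Measured throughput of each rank, in atoms per second.
    speed = [a / s for a, s in zip(atoms, seconds)]
    total_atoms = sum(atoms)
    total_speed = sum(speed)
    # Give each rank a share proportional to its speed.
    new_atoms = [round(total_atoms * v / total_speed) for v in speed]
    # Put any rounding remainder on the fastest rank.
    new_atoms[speed.index(max(speed))] += total_atoms - sum(new_atoms)
    return new_atoms


# Two equal halves, but rank 1 took twice as long as rank 0.
print(rebalance([3000, 3000], [1.0, 2.0]))  # prints [4000, 2000]
```

Any error in the measured `seconds` carries straight into the new split, which is why a more accurate timer matters for this kind of balancing.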
