Recompile for better timings

Moderators: Site Moderators, FAHC Science Team

juvann
Posts: 1
Joined: Wed Apr 22, 2020 9:46 am

Recompile for better timings

Post by juvann »

I have the latest version of Folding@home running on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz under Debian 10 Linux.
I was reading the log to better understand what is running on my PC, and I found this message in the science.log of a work unit:

Code:

The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=OFF CMake option.
What does this mean?
Could the FahCore_a7 process be faster if it were recompiled with the right flags, or have I misunderstood?
I think this project should release its source code, or maybe I had better stop contributing.

I have another question.
Sometimes I set medium power, so a work unit starts with 6 cores, but sometimes I switch to light power while the process is running, and then I find this message in science.log:

Code:

  #ranks mismatch,
    current program: 3
    checkpoint file: 6

  #PME-ranks mismatch,
    current program: -1
    checkpoint file: 0

Gromacs patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.
What does this mean?
Is the result of the WU still good, or have I wasted the time spent processing it?
Joe_H
Site Admin
Posts: 7936
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Recompile for better timings

Post by Joe_H »

The first one is a message from the Gromacs package that is of more use to an individual researcher running a simulation on a dedicated computer. In a setup like that, the hardware is known and the generated code can be optimized for it.

The CPU folding core is based on Gromacs, and the settings used in creating it are more generic so that it will work successfully on a wide range of hardware and also in VMs. So in some cases a small bit of performance is lost, but with the benefit of working for almost everyone.

Similarly, the second message reflects the change in number of CPU threads being used. A researcher running on a dedicated machine would not be doing this, and the message is merely informational in most cases.
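The behaviour behind that message can be sketched in a few lines of Python (purely illustrative; none of these function names exist in GROMACS): a checkpoint records the rank count it was written with, and a restart with a different count warns, redistributes the work, and carries on.

```python
# Illustrative sketch only -- not GROMACS code. It mimics how a restart
# with a different rank count can redistribute work and keep going.

def save_checkpoint(atoms, nranks):
    """Record the data plus the rank count it was written with."""
    return {"atoms": list(atoms), "nranks": nranks}

def restart(checkpoint, nranks):
    """Resume from a checkpoint, warning if the rank count changed."""
    if checkpoint["nranks"] != nranks:
        print(f"#ranks mismatch: current program: {nranks}, "
              f"checkpoint file: {checkpoint['nranks']}")
    # Redistribute atoms round-robin over the new rank count;
    # nothing is lost, the split is just different from before.
    return [checkpoint["atoms"][r::nranks] for r in range(nranks)]

ckpt = save_checkpoint(range(12), nranks=6)   # WU started on 6 cores
domains = restart(ckpt, nranks=3)             # power dropped to 3 cores
print([len(d) for d in domains])              # every atom still simulated
```

The restart is still a valid continuation, which is why the real message says the run is exact, merely not guaranteed to be binary identical.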

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Recompile for better timings

Post by PantherX »

Welcome to the F@H Forum juvann,

Please note that FahCore_a7 (for CPU) is built using GROMACS, and FahCore_22 (for GPU) is built using OpenMM with OpenCL. The building blocks are open source, so you can contribute to them if you want to:
GROMACS: https://gitlab.com/gromacs/gromacs
OpenMM: https://github.com/openmm/openmm
OpenCL: https://github.com/KhronosGroup/Khronos ... sources.md
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Recompile for better timings

Post by _r2w_ben »

juvann wrote: I have the latest version of Folding@home running on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz under Debian 10 Linux.
I was reading the log to better understand what is running on my PC, and I found this message in the science.log of a work unit:

Code:

The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=OFF CMake option.
What does this mean?
Could the FahCore_a7 process be faster if it were recompiled with the right flags, or have I misunderstood?
GROMACS tracks the time spent in different parts of the simulation and uses that to re-balance work between CPU cores. RDTSCP provides more accurate timing, but it crashed inside VMs and was turned off, hence the informational message in the log.
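That re-balancing idea can be sketched in Python (an illustration of the general technique, not GROMACS's actual algorithm; all names are invented): time each rank's work, then hand out new shares inversely proportional to the measured times.

```python
import time

# Illustrative sketch only -- not GROMACS code. Two fake "ranks" get
# equal work; the measured timings drive a rebalance so the slow rank
# receives a smaller share next step.

def fake_rank_work(n_items, cost):
    """Stand-in for one rank's force computation; returns wall time."""
    t0 = time.perf_counter()
    for _ in range(n_items * cost):
        pass
    return time.perf_counter() - t0

def rebalance(shares, timings):
    """Give each rank work inversely proportional to its measured time."""
    total = sum(shares)
    speeds = [s / t for s, t in zip(shares, timings)]  # items per second
    return [round(total * v / sum(speeds)) for v in speeds]

shares = [500, 500]                          # equal split to start
timings = [fake_rank_work(shares[0], 200),   # fast rank
           fake_rank_work(shares[1], 600)]   # slow rank (3x the cost)
print(rebalance(shares, timings))            # slow rank's share shrinks
```

Accurate timings matter here because they are the only input to the rebalance; a coarse timer would misjudge the split, which is what the RDTSCP message is warning about.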

Code:

	M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                           111.971616         111.972     0.0
 Pair Search distance check            3123.030504       28107.275     0.4
 NxN Ewald Elec. + LJ [F]             94353.287776     7359556.447    93.5
 NxN Ewald Elec. + LJ [V&F]            1022.964144      131962.375     1.7
 NxN Ewald Elec. [F]                    682.964112       41660.811     0.5
 NxN Ewald Elec. [V&F]                    7.368528         618.956     0.0
 1,4 nonbonded interactions             158.656296       14279.067     0.2
 Calc Weights                           443.867016       15979.213     0.2
 Spread Q Bspline                      9469.163008       18938.326     0.2
 Gather F Bspline                      9469.163008       56814.978     0.7
 3D-FFT                               17915.127336      143321.019     1.8
 Solve PME                                6.111232         391.119     0.0
 Reset In Box                             5.850794          17.552     0.0
 CG-CoM                                   5.949960          17.850     0.0
 Bonds                                   23.354276        1377.902     0.0
 Propers                                188.050188       43063.493     0.5
 Impropers                                1.784432         371.162     0.0
 Virial                                   7.550296         135.905     0.0
 Stop-CM                                  1.586656          15.867     0.0
 Calc-Ekin                               59.400434        1603.812     0.0
 Lincs                                   36.570381        2194.223     0.0
 Lincs-Mat                              245.487456         981.950     0.0
 Constraint-V                           165.844035        1326.752     0.0
 Constraint-Vir                           6.584986         158.040     0.0
 Settle                                  30.901091        9981.052     0.1
 (null)                                   0.410300           0.000     0.0
-----------------------------------------------------------------------------
 Total                                                 7872987.115   100.0
-----------------------------------------------------------------------------


    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

 av. #atoms communicated per step for force:  2 x 52794.0
 av. #atoms communicated per step for LINCS:  2 x 3840.0

 Average load imbalance: 2.5 %
 Part of the total run time spent waiting due to load imbalance: 1.8 %


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 4 MPI ranks

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.         4    1         59       1.865         23.078   0.5
 DD comm. load          4    1          1       0.000          0.005   0.0
 Neighbor search        4    1         60       6.817         84.359   1.7
 Comm. coord.           4    1       1432       4.819         59.637   1.2
 Force                  4    1       1492     342.687       4240.869  83.0
 Wait + Comm. F         4    1       1492       0.814         10.076   0.2
 PME mesh               4    1       1492      43.671        540.449  10.6
 NB X/F buffer ops.     4    1       4356       0.669          8.279   0.2
 Write traj.            4    1          2       0.445          5.509   0.1
 Update                 4    1       1492       2.277         28.176   0.6
 Constraints            4    1       1492       7.386         91.400   1.8
 Comm. energies         4    1        300       0.872         10.790   0.2
 Rest                                           0.678          8.393   0.2
-----------------------------------------------------------------------------
 Total                                        413.000       5111.020 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F        4    1       2984      22.178        274.463   5.4
 PME spread/gather      4    1       2984      14.505        179.505   3.5
 PME 3D-FFT             4    1       2984       3.783         46.810   0.9
 PME 3D-FFT Comm.       4    1       2984       2.313         28.626   0.6
 PME solve Elec         4    1       1492       0.884         10.946   0.2
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:     1652.000      413.000      400.0
                 (ns/day)    (hour/ns)
Performance:        0.624       38.446
Finished mdrun on rank 0 Wed Apr 22 07:20:09 2020