Recompile for better timings
Posted: Wed Apr 22, 2020 10:06 am
by juvann
I have the latest version of Folding@home running on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz with Debian 10 Linux.
I was reading the log to better understand what is running on my PC, and I found this message in the science.log of a work unit:
Code:
The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=OFF CMake option.
What does this mean?
Could the FahCore_a7 process be faster if it were recompiled with the correct flags, or have I misunderstood?
I think this project should release the source code, or maybe I had better stop contributing.
I have another question.
Sometimes I set medium power, so some work units start with 6 cores, but sometimes while the work unit is running I switch to light power. When that happens I find this message in science.log:
Code:
#ranks mismatch,
current program: 3
checkpoint file: 6
#PME-ranks mismatch,
current program: -1
checkpoint file: 0
Gromacs patchlevel, binary or parallel settings differ from previous run.
Continuation is exact, but not guaranteed to be binary identical.
What does this mean?
Is the result of the WU still good, or am I wasting time processing this WU?
Re: Recompile for better timings
Posted: Wed Apr 22, 2020 4:04 pm
by Joe_H
The first one is a message from the Gromacs package that is of more use to an individual researcher running a simulation on a dedicated computer. In a setup like that, the hardware is known and the generated code can be optimized for it.
The CPU folding core is based on Gromacs, and the settings used in creating it are more generic so that it will work successfully on a wide range of hardware and also in VMs. So in some cases a small bit of performance is lost, but with the benefit of working for almost everyone.
Similarly, the second message reflects the change in number of CPU threads being used. A researcher running on a dedicated machine would not be doing this, and the message is merely informational in most cases.
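For what it's worth, you can reproduce the same notice with a standalone GROMACS install by restarting a run from its checkpoint with a different number of thread-MPI ranks. The file names and .tpr input here are only placeholders for an ordinary local run, not anything the F@H client actually invokes:
Code:
# first run with 6 thread-MPI ranks; mdrun writes run1.cpt checkpoints as it goes
gmx mdrun -s topol.tpr -deffnm run1 -ntmpi 6

# stop it, then continue from the checkpoint with only 3 ranks;
# the log prints the "#ranks mismatch" notice but the continuation is still valid
gmx mdrun -s topol.tpr -deffnm run1 -ntmpi 3 -cpi run1.cpt
As the message itself says, the continuation is exact, just not guaranteed to be binary identical, so the WU is not wasted.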
Re: Recompile for better timings
Posted: Thu Apr 23, 2020 7:05 am
by PantherX
Welcome to the F@H Forum juvann,
Please note that FahCore_a7 (CPU) is built using GROMACS, while FahCore_22 (GPU) is built using OpenMM with OpenCL. The building blocks are open source, so you can contribute to them if you want (a rough clone-and-build sketch follows the links below):
GROMACS:
https://gitlab.com/gromacs/gromacs
OpenMM:
https://github.com/openmm/openmm
OpenCL:
https://github.com/KhronosGroup/Khronos ... sources.md
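If you do want to look at or rebuild the CPU core's main building block yourself, a minimal GROMACS checkout and build goes roughly like this. The paths and options are only an example of an ordinary local build, not how the project builds the actual FahCore binaries:
Code:
# fetch the GROMACS sources from the repository linked above
git clone https://gitlab.com/gromacs/gromacs.git
cd gromacs && mkdir build && cd build

# plain local configure, build, and install into your home directory
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=$HOME/gromacs
make -j4 && make install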
Re: Recompile for better timings
Posted: Thu Apr 23, 2020 11:22 am
by _r2w_ben
juvann wrote:I have the latest version of Folding@home running on an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz with Debian 10 Linux.
I was reading the log to better understand what is running on my PC, and I found this message in the science.log of a work unit:
Code:
The current CPU can measure timings more accurately than the code in
GROMACS was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding GROMACS with the GMX_USE_RDTSCP=OFF CMake option.
What does this mean?
Could the FahCore_a7 process be faster if it were recompiled with the correct flags, or have I misunderstood?
GROMACS tracks the time it spends in different parts of the simulation and uses that to re-balance work between CPU cores. RDTSCP provides more accurate timing, but it caused crashes inside VMs and was turned off, hence the informational message in the log.
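The switch the message names is just a CMake option, so if you were building your own GROMACS for use inside a VM you would disable it at configure time, something like the sketch below (an ordinary local build, not how the FahCore_a7 binary is produced):
Code:
# configure GROMACS without the RDTSCP timer, as the log message suggests
cmake .. -DGMX_USE_RDTSCP=OFF
make -j4
The load-balancing those timings feed into shows up in the accounting below, e.g. the "Average load imbalance" line: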
Code:
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F] 111.971616 111.972 0.0
Pair Search distance check 3123.030504 28107.275 0.4
NxN Ewald Elec. + LJ [F] 94353.287776 7359556.447 93.5
NxN Ewald Elec. + LJ [V&F] 1022.964144 131962.375 1.7
NxN Ewald Elec. [F] 682.964112 41660.811 0.5
NxN Ewald Elec. [V&F] 7.368528 618.956 0.0
1,4 nonbonded interactions 158.656296 14279.067 0.2
Calc Weights 443.867016 15979.213 0.2
Spread Q Bspline 9469.163008 18938.326 0.2
Gather F Bspline 9469.163008 56814.978 0.7
3D-FFT 17915.127336 143321.019 1.8
Solve PME 6.111232 391.119 0.0
Reset In Box 5.850794 17.552 0.0
CG-CoM 5.949960 17.850 0.0
Bonds 23.354276 1377.902 0.0
Propers 188.050188 43063.493 0.5
Impropers 1.784432 371.162 0.0
Virial 7.550296 135.905 0.0
Stop-CM 1.586656 15.867 0.0
Calc-Ekin 59.400434 1603.812 0.0
Lincs 36.570381 2194.223 0.0
Lincs-Mat 245.487456 981.950 0.0
Constraint-V 165.844035 1326.752 0.0
Constraint-Vir 6.584986 158.040 0.0
Settle 30.901091 9981.052 0.1
(null) 0.410300 0.000 0.0
-----------------------------------------------------------------------------
Total 7872987.115 100.0
-----------------------------------------------------------------------------
D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 52794.0
av. #atoms communicated per step for LINCS: 2 x 3840.0
Average load imbalance: 2.5 %
Part of the total run time spent waiting due to load imbalance: 1.8 %
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 4 MPI ranks
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Domain decomp. 4 1 59 1.865 23.078 0.5
DD comm. load 4 1 1 0.000 0.005 0.0
Neighbor search 4 1 60 6.817 84.359 1.7
Comm. coord. 4 1 1432 4.819 59.637 1.2
Force 4 1 1492 342.687 4240.869 83.0
Wait + Comm. F 4 1 1492 0.814 10.076 0.2
PME mesh 4 1 1492 43.671 540.449 10.6
NB X/F buffer ops. 4 1 4356 0.669 8.279 0.2
Write traj. 4 1 2 0.445 5.509 0.1
Update 4 1 1492 2.277 28.176 0.6
Constraints 4 1 1492 7.386 91.400 1.8
Comm. energies 4 1 300 0.872 10.790 0.2
Rest 0.678 8.393 0.2
-----------------------------------------------------------------------------
Total 413.000 5111.020 100.0
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F 4 1 2984 22.178 274.463 5.4
PME spread/gather 4 1 2984 14.505 179.505 3.5
PME 3D-FFT 4 1 2984 3.783 46.810 0.9
PME 3D-FFT Comm. 4 1 2984 2.313 28.626 0.6
PME solve Elec 4 1 1492 0.884 10.946 0.2
-----------------------------------------------------------------------------
Core t (s) Wall t (s) (%)
Time: 1652.000 413.000 400.0
(ns/day) (hour/ns)
Performance: 0.624 38.446
Finished mdrun on rank 0 Wed Apr 22 07:20:09 2020