A8 efficiency - possible unexpected improvement

Moderators: Site Moderators, FAHC Science Team

BobWilliams757
Posts: 519
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

A8 efficiency - possible unexpected improvement

Post by BobWilliams757 »

Just putting out feelers to see if this is unique to my setup or if it is happening on other systems.

In the past I didn't often run CPU and GPU folding at the same time. With the integrated graphics on my Ryzen 2400G, overall PPD doesn't go up much; CPU folding just reduces GPU throughput, and both end up with lower PPD than when running either one alone.

But with the new A8 core, the impact on overall throughput seems much smaller. My overall PPD rises much more than before, even after allowing for the A8 core apparently delivering higher PPD than the A7 core.

With memory shared between CPU and GPU in this system, this seems to be linked to the new core and how it uses memory. I'm not sure if it was an intended improvement, but it sure is welcome either way.



Are others with conventional (non APU) systems seeing a similar trend?
Fold them if you get them!
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: A8 efficiency - possible unexpected improvement

Post by bruce »

FAHCore_a8 uses a much newer version of GROMACS than FAHCore_a7, and it contains a number of enhancements, so it is expected to run faster.

The methods of allocating and sharing memory depend on who designed the hardware, so seemingly simple statements about small performance changes may not apply to someone with different hardware.
AMDEPYC
Posts: 6
Joined: Sat Oct 31, 2020 11:17 am

Re: A8 efficiency - possible unexpected improvement

Post by AMDEPYC »

I am seeing significantly worse performance with core 0xa8. A 2nd-gen EPYC machine with 128 physical cores that gives 4M-5M PPD on core 0xa7, with TPF in the 10-20 second range, is instead delivering 100K PPD with TPF in the 4-5 minute range. That machine has two 64-core folding slots. Since FAH has no CPU affinity capability, each slot ends up spread across both sockets (each socket is one NUMA node), so if the new 0xa8 core shares significantly more data between threads, that's bad news for two-socket machines.
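Since the client itself offers no affinity setting, one workaround people sometimes try on Linux is to pin a running FahCore to a single NUMA node from outside. Below is a minimal sketch, assuming a Linux host that exposes `/sys/devices/system/node/node<N>/cpulist` and `/proc/<pid>/task`; the function names are hypothetical, not part of FAH. Note that it only changes CPU affinity: memory the core has already allocated stays where it is unless the kernel's automatic NUMA balancing later migrates it.

```python
import os

def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux sysfs cpulist string such as '0-3,8,10-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def node_cpus(node: int) -> list[int]:
    """Read the CPU ids that belong to one NUMA node (Linux sysfs)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())

def pin_process_to_node(pid: int, node: int) -> None:
    """Set the CPU affinity of every existing thread of `pid` to one NUMA node.

    sched_setaffinity on a PID changes only that one thread, so we walk
    /proc/<pid>/task; threads created afterwards inherit their creator's mask.
    """
    cpus = node_cpus(node)
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            # The thread exited between listdir() and the affinity call.
            continue
```

For example, `pin_process_to_node(238070, 0)` would target the "Core PID" shown in a log like the one further down this thread. Launching a process under `numactl --cpunodebind=0 --membind=0` binds both CPUs and memory from the start, but since the client spawns the cores itself, wrapping the whole client that way would put every slot on the same node.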
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: A8 efficiency - possible unexpected improvement

Post by Neil-B »

Check that the AS/WS isn't assigning lower-thread-count WUs to your slots. With the current low availability of CPU WUs, my slots (a 32 and a 24) occasionally get 10-thread WUs. This hurts PPD and only shows up if you look at the logs; the web and advanced controls just make it look as if the slot is running very slowly. It's not an issue with the core, more a lack of CPU WUs for larger slots (at least one project currently has a maximum 10-thread assignment rule).
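To spot these reductions without reading the whole log by eye, you can scan for the client's warning line. This is a small sketch that assumes the warning format seen later in this thread (`AS lowered CPUs from 64 to 10`) and a client log file named `log.txt`; both are taken from one client version and may differ elsewhere.

```python
import re

# Matches client warnings such as:
# 17:22:09:WARNING:WU01:FS00:AS lowered CPUs from 64 to 10
LOWERED = re.compile(
    r"^(\d\d:\d\d:\d\d):WARNING:(WU\d+):(FS\d+):AS lowered CPUs from (\d+) to (\d+)"
)

def find_lowered(lines):
    """Yield (time, wu, slot, requested, assigned) for each lowered-CPU warning."""
    for line in lines:
        m = LOWERED.match(line)
        if m:
            t, wu, slot, requested, assigned = m.groups()
            yield t, wu, slot, int(requested), int(assigned)
```

Usage: `with open("log.txt") as f: print(list(find_lowered(f)))`.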
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: A8 efficiency - possible unexpected improvement

Post by PantherX »

Welcome to the F@H Forum AMDEPYC,

Can you please post the log file? Make sure you include the first 100 lines, which tell us what the system configuration and client settings are. If you require guidance, please view this topic: viewtopic.php?f=24&t=26036

In all our testing, FahCore_a8 has always provided more performance than FahCore_a7. However, we haven't tested it with 128 physical CPUs so the log file would provide some additional information to help us troubleshoot :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
AMDEPYC
Posts: 6
Joined: Sat Oct 31, 2020 11:17 am

Re: A8 efficiency - possible unexpected improvement

Post by AMDEPYC »

My normal slot config is two CPU slots, each with 64 cores. Currently one of those is paused and I'm running a mix of 4/8/12/16-CPU slots to characterize this a bit more. Below is a grab from core 0xa8 on the 12-CPU slot. In case it isn't obvious from the log, SMT is disabled, so this is a dual-processor machine with 128 physical cores in total.

15:43:02:WU05:FS05:Connecting to assign1.foldingathome.org:80
15:43:02:WU05:FS05:Assigned to work server 178.174.196.138
15:43:02:WU05:FS05:Requesting new work unit for slot 05: READY cpu:12 from 178.174.196.138
15:43:02:WU05:FS05:Connecting to 178.174.196.138:8080
15:43:02:WU05:FS05:Downloading 2.05MiB
15:43:04:WU05:FS05:Download complete
15:43:04:WU05:FS05:Received Unit: id:05 state:DOWNLOAD error:NO_ERROR project:16812 run:2 clone:960 gen:39 core:0xa8 unit:0x0000002cb2aec48a5f74f1468924fbf1
15:43:04:WU05:FS05:Starting
15:43:04:WU05:FS05:Running FahCore: /usr/bin/FAHCoreWrapper /home/amd/FAH/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 05 -suffix 01 -version 706 -lifeline 2490 -checkpoint 5 -np 12
15:43:04:WU05:FS05:Started FahCore on PID 238066
15:43:04:WU05:FS05:Core PID:238070
15:43:04:WU05:FS05:FahCore 0xa8 started
15:43:04:WU05:FS05:0xa8:*********************** Log Started 2020-10-31T15:43:04Z ***********************
15:43:04:WU05:FS05:0xa8:************************** Gromacs Folding@home Core ***************************
15:43:04:WU05:FS05:0xa8:       Core: Gromacs
15:43:04:WU05:FS05:0xa8:       Type: 0xa8
15:43:04:WU05:FS05:0xa8:    Version: 0.0.9
15:43:04:WU05:FS05:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:43:04:WU05:FS05:0xa8:  Copyright: 2020 foldingathome.org
15:43:04:WU05:FS05:0xa8:   Homepage: https://foldingathome.org/
15:43:04:WU05:FS05:0xa8:       Date: Oct 28 2020
15:43:04:WU05:FS05:0xa8:       Time: 22:15:07
15:43:04:WU05:FS05:0xa8:   Compiler: GNU 8.3.0
15:43:04:WU05:FS05:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
15:43:04:WU05:FS05:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
15:43:04:WU05:FS05:0xa8:   Platform: linux2 4.15.0-108-generic
15:43:04:WU05:FS05:0xa8:       Bits: 64
15:43:04:WU05:FS05:0xa8:       Mode: Release
15:43:04:WU05:FS05:0xa8:       SIMD: avx2_256
15:43:04:WU05:FS05:0xa8:     OpenMP: ON
15:43:04:WU05:FS05:0xa8:       CUDA: OFF
15:43:04:WU05:FS05:0xa8:       Args: -dir 05 -suffix 01 -version 706 -lifeline 238066 -checkpoint 5 -np
15:43:04:WU05:FS05:0xa8:             12
15:43:04:WU05:FS05:0xa8:************************************ libFAH ************************************
15:43:04:WU05:FS05:0xa8:       Date: Oct 28 2020
15:43:04:WU05:FS05:0xa8:       Time: 22:12:00
15:43:04:WU05:FS05:0xa8:   Compiler: GNU 8.3.0
15:43:04:WU05:FS05:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
15:43:04:WU05:FS05:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
15:43:04:WU05:FS05:0xa8:   Platform: linux2 4.15.0-108-generic
15:43:04:WU05:FS05:0xa8:       Bits: 64
15:43:04:WU05:FS05:0xa8:       Mode: Release
15:43:04:WU05:FS05:0xa8:************************************ CBang *************************************
15:43:04:WU05:FS05:0xa8:       Date: Oct 28 2020
15:43:04:WU05:FS05:0xa8:       Time: 22:11:46
15:43:04:WU05:FS05:0xa8:   Compiler: GNU 8.3.0
15:43:04:WU05:FS05:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
15:43:04:WU05:FS05:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
15:43:04:WU05:FS05:0xa8:   Platform: linux2 4.15.0-108-generic
15:43:04:WU05:FS05:0xa8:       Bits: 64
15:43:04:WU05:FS05:0xa8:       Mode: Release
15:43:04:WU05:FS05:0xa8:************************************ System ************************************
15:43:04:WU05:FS05:0xa8:        CPU: AMD EPYC 7H12 64-Core Processor
15:43:04:WU05:FS05:0xa8:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
15:43:04:WU05:FS05:0xa8:       CPUs: 128
15:43:04:WU05:FS05:0xa8:     Memory: 503.74GiB
15:43:04:WU05:FS05:0xa8:Free Memory: 498.97GiB
15:43:04:WU05:FS05:0xa8:    Threads: POSIX_THREADS
15:43:04:WU05:FS05:0xa8: OS Version: 5.8
15:43:04:WU05:FS05:0xa8:Has Battery: false
15:43:04:WU05:FS05:0xa8: On Battery: false
15:43:04:WU05:FS05:0xa8: UTC Offset: -4
15:43:04:WU05:FS05:0xa8:        PID: 238070
15:43:04:WU05:FS05:0xa8:        CWD: /home/amd/FAH/work
15:43:04:WU05:FS05:0xa8:********************************************************************************
15:43:04:WU05:FS05:0xa8:Project: 16812 (Run 2, Clone 960, Gen 39)
15:43:04:WU05:FS05:0xa8:Unit: 0x0000002cb2aec48a5f74f1468924fbf1
15:43:04:WU05:FS05:0xa8:Reading tar file core.xml
15:43:04:WU05:FS05:0xa8:Reading tar file frame39.tpr
15:43:04:WU05:FS05:0xa8:Digital signatures verified
15:43:04:WU05:FS05:0xa8:Calling: mdrun -c frame39.gro -s frame39.tpr -x frame39.xtc -cpt 5 -nt 12 -ntmpi 1
15:43:04:WU05:FS05:0xa8:Steps: first=19500000 total=20000000
15:43:06:WU05:FS05:0xa8:Completed 1 out of 500000 steps (0%)
15:44:01:WU05:FS05:0xa8:Completed 5000 out of 500000 steps (1%)
15:44:56:WU05:FS05:0xa8:Completed 10000 out of 500000 steps (2%)
AMDEPYC
Posts: 6
Joined: Sat Oct 31, 2020 11:17 am

Re: A8 efficiency - possible unexpected improvement

Post by AMDEPYC »

A really quick fix (for me) would be to enable the GROMACS mdrun options -pin, -pinoffset, -pinstride and -ntomp (and allow overriding -ntmpi); these could be passed via the slot's extra-core-args. On EPYC, pinning is a must (to take advantage of the huge L3), and if there is shared memory, shared data, sync barriers (OpenMP has a lot), etc., then it's also handy to co-locate threads by L3 cache (if small) or by NUMA node (if larger). Most hybrid HPC applications at a scale of 64+ cores tend to perform best with 4 OpenMP threads per MPI rank (matching the number of physical cores per L3 cache) and each rank mapped to one L3.
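The layout described above (one rank per L3 domain, 4 OpenMP threads per rank, each slot pinned to its own range of cores) can be expressed as a set of mdrun options. This is a hypothetical sketch: `-ntmpi`, `-ntomp`, `-pin`, `-pinoffset` and `-pinstride` are real mdrun options, but the released core passes `-ntmpi 1` itself (see the `Calling: mdrun` lines in the logs here), and whether extra-core-args would ever forward these flags is not confirmed. The helper name and defaults are assumptions.

```python
def mdrun_pinning_args(cores: int, cores_per_l3: int = 4,
                       pin_offset: int = 0, pin_stride: int = 1) -> list[str]:
    """Build mdrun thread/pinning options for one slot (hypothetical helper).

    One thread-MPI rank per L3 cache domain, cores_per_l3 OpenMP threads
    per rank, threads pinned starting at logical core `pin_offset`.
    """
    if cores % cores_per_l3:
        raise ValueError("cores must be a multiple of cores_per_l3")
    ranks = cores // cores_per_l3
    return [
        "-ntmpi", str(ranks),
        "-ntomp", str(cores_per_l3),
        "-pin", "on",
        "-pinoffset", str(pin_offset),
        "-pinstride", str(pin_stride),
    ]
```

On the 128-core machine with SMT off, slot 0 would use `mdrun_pinning_args(64)` and slot 1 `mdrun_pinning_args(64, pin_offset=64)`, so the two slots land on disjoint cores (and, assuming the OS numbers each socket's cores contiguously, on separate NUMA nodes).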

For comparison, the 64 core slot just landed an 0xa7 WU:

17:06:28:WU02:FS00:Connecting to assign1.foldingathome.org:80
17:06:28:WU02:FS00:Assigned to work server 128.252.203.9
17:06:28:WU02:FS00:Requesting new work unit for slot 00: READY cpu:64 from 128.252.203.9
17:06:28:WU02:FS00:Connecting to 128.252.203.9:8080
17:06:29:WU02:FS00:Downloading 8.14MiB
17:06:30:WU02:FS00:Download complete
17:06:30:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:13821 run:225 clone:5 gen:18 core:0xa7 unit:0x0000001480fccb095e73d2f04caa75eb
17:06:30:WU02:FS00:Starting
17:06:30:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/amd/FAH/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 02 -suffix 01 -version 706 -lifeline 2490 -checkpoint 5 -np 64
17:06:30:WU02:FS00:Started FahCore on PID 243411
17:06:30:WU02:FS00:Core PID:243415
17:06:30:WU02:FS00:FahCore 0xa7 started
17:06:30:WU02:FS00:0xa7:*********************** Log Started 2020-11-01T17:06:30Z ***********************
17:06:30:WU02:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
17:06:30:WU02:FS00:0xa7:       Type: 0xa7
17:06:30:WU02:FS00:0xa7:       Core: Gromacs
17:06:30:WU02:FS00:0xa7:       Args: -dir 02 -suffix 01 -version 706 -lifeline 243411 -checkpoint 5 -np
17:06:30:WU02:FS00:0xa7:             64
17:06:30:WU02:FS00:0xa7:************************************ CBang *************************************
17:06:30:WU02:FS00:0xa7:       Date: Nov 27 2019
17:06:30:WU02:FS00:0xa7:       Time: 11:26:54
17:06:30:WU02:FS00:0xa7:   Revision: d25803215b59272441049dfa05a0a9bf7a6e3c48
17:06:30:WU02:FS00:0xa7:     Branch: master
17:06:30:WU02:FS00:0xa7:   Compiler: GNU 8.3.0
17:06:30:WU02:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:06:30:WU02:FS00:0xa7:             -fno-pie -fPIC
17:06:30:WU02:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
17:06:30:WU02:FS00:0xa7:       Bits: 64
17:06:30:WU02:FS00:0xa7:       Mode: Release
17:06:30:WU02:FS00:0xa7:************************************ System ************************************
17:06:30:WU02:FS00:0xa7:        CPU: AMD EPYC 7H12 64-Core Processor
17:06:30:WU02:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
17:06:30:WU02:FS00:0xa7:       CPUs: 128
17:06:30:WU02:FS00:0xa7:     Memory: 503.74GiB
17:06:30:WU02:FS00:0xa7:Free Memory: 499.83GiB
17:06:30:WU02:FS00:0xa7:    Threads: POSIX_THREADS
17:06:30:WU02:FS00:0xa7: OS Version: 5.8
17:06:30:WU02:FS00:0xa7:Has Battery: false
17:06:30:WU02:FS00:0xa7: On Battery: false
17:06:30:WU02:FS00:0xa7: UTC Offset: -5
17:06:30:WU02:FS00:0xa7:        PID: 243415
17:06:30:WU02:FS00:0xa7:        CWD: /home/amd/FAH/work
17:06:30:WU02:FS00:0xa7:******************************** Build - libFAH ********************************
17:06:30:WU02:FS00:0xa7:    Version: 0.0.19
17:06:30:WU02:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:06:30:WU02:FS00:0xa7:  Copyright: 2019 foldingathome.org
17:06:30:WU02:FS00:0xa7:   Homepage: https://foldingathome.org/
17:06:30:WU02:FS00:0xa7:       Date: Nov 26 2019
17:06:30:WU02:FS00:0xa7:       Time: 00:41:42
17:06:30:WU02:FS00:0xa7:   Revision: d5b5c747532224f986b7cd02c968ed9a20c16d6e
17:06:30:WU02:FS00:0xa7:     Branch: master
17:06:30:WU02:FS00:0xa7:   Compiler: GNU 8.3.0
17:06:30:WU02:FS00:0xa7:    Options: -std=c++11 -ffunction-sections -fdata-sections -O3 -funroll-loops
17:06:30:WU02:FS00:0xa7:             -fno-pie
17:06:30:WU02:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
17:06:30:WU02:FS00:0xa7:       Bits: 64
17:06:30:WU02:FS00:0xa7:       Mode: Release
17:06:30:WU02:FS00:0xa7:************************************ Build *************************************
17:06:30:WU02:FS00:0xa7:       SIMD: avx_256
17:06:30:WU02:FS00:0xa7:********************************************************************************
17:06:30:WU02:FS00:0xa7:Project: 13821 (Run 225, Clone 5, Gen 18)
17:06:30:WU02:FS00:0xa7:Unit: 0x0000001480fccb095e73d2f04caa75eb
17:06:30:WU02:FS00:0xa7:Reading tar file core.xml
17:06:30:WU02:FS00:0xa7:Reading tar file frame18.tpr
17:06:30:WU02:FS00:0xa7:Digital signatures verified
17:06:30:WU02:FS00:0xa7:Calling: mdrun -s frame18.tpr -o frame18.trr -x frame18.xtc -cpt 5 -nt 64
17:06:30:WU02:FS00:0xa7:Steps: first=2250000 total=125000
17:06:33:WU02:FS00:0xa7:Completed 1 out of 125000 steps (0%)
17:06:45:WU02:FS00:0xa7:Completed 1250 out of 125000 steps (1%)
17:06:54:WU02:FS00:0xa7:Completed 2500 out of 125000 steps (2%)
17:07:03:WU02:FS00:0xa7:Completed 3750 out of 125000 steps (3%)
Joe_H
Site Admin
Posts: 7936
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: A8 efficiency - possible unexpected improvement

Post by Joe_H »

The developers ran into some issues when the A8 core used a different '-ntmpi' setting from the one in the current release. For now they have released it with '-ntmpi 1', as that works on all systems, though perhaps not efficiently on large server systems. Testing of a different ntmpi setting is ongoing; once they have worked through the issues, a version including that change should be released.

One issue you may run into with the A7 core is domain decomposition problems. Not all thread counts work, so that can complicate assignments. One of the goals for the A8 core is to handle domain decomposition a bit more gracefully.
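One way to see why some thread counts are awkward: domain decomposition splits the simulation box into an nx × ny × nz grid of cells, so the rank count has to factor into three dimensions. The sketch below only enumerates those factorizations. It is an illustration, not GROMACS's actual grid-selection code, which also weighs separate PME ranks, box shape and the minimum cell size set by the cutoff.

```python
def dd_grids(n: int) -> list[tuple[int, int, int]]:
    """All (nx, ny, nz) with nx * ny * nz == n and nx >= ny >= nz."""
    grids = []
    for nx in range(n, 0, -1):
        if n % nx:
            continue
        rest = n // nx
        for ny in range(min(nx, rest), 0, -1):
            if rest % ny == 0:
                nz = rest // ny
                if ny >= nz:
                    grids.append((nx, ny, nz))
    return grids
```

`dd_grids(12)` offers four shapes, including the fairly compact 3 × 2 × 2, while `dd_grids(13)` offers only 13 × 1 × 1: thin slabs that, for a small system, may fall below the minimum cell size and make the decomposition fail.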

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
AMDEPYC
Posts: 6
Joined: Sat Oct 31, 2020 11:17 am

Re: A8 efficiency - possible unexpected improvement

Post by AMDEPYC »

Finally got a core 0xa8 WU on the 64-CPU folding slot. TPF varies significantly over the run. This is most likely the result of memory and CPUs being scattered around due to no affinity, followed by the scheduler or automatic NUMA balancing trying to improve things. Only 10 threads were launched, though, even though the slot has 64 CPUs.

17:22:09:WU01:FS00:Connecting to assign1.foldingathome.org:80
17:22:09:WU01:FS00:Assigned to work server 129.32.209.204
17:22:09:WU01:FS00:Requesting new work unit for slot 00: READY cpu:64 from 129.32.209.204
17:22:09:WU01:FS00:Connecting to 129.32.209.204:8080
17:22:09:WU01:FS00:Downloading 54.50KiB
17:22:09:WU01:FS00:Download complete
17:22:09:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:16926 run:0 clone:30 gen:15 core:0xa8 unit:0x0000000f8120d1cc5f7dfda0755570c3
17:22:09:WU01:FS00:Starting
17:22:09:WARNING:WU01:FS00:AS lowered CPUs from 64 to 10
17:22:09:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/amd/FAH/cores/cores.foldingathome.org/lin/64bit-avx2-256/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 01 -suffix 01 -version 706 -lifeline 2490 -checkpoint 5 -np 10
17:22:09:WU01:FS00:Started FahCore on PID 243493
17:22:09:WU01:FS00:Core PID:243497
17:22:09:WU01:FS00:FahCore 0xa8 started
17:22:10:WU01:FS00:0xa8:*********************** Log Started 2020-11-01T17:22:09Z ***********************
17:22:10:WU01:FS00:0xa8:************************** Gromacs Folding@home Core ***************************
17:22:10:WU01:FS00:0xa8:       Core: Gromacs
17:22:10:WU01:FS00:0xa8:       Type: 0xa8
17:22:10:WU01:FS00:0xa8:    Version: 0.0.9
17:22:10:WU01:FS00:0xa8:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:22:10:WU01:FS00:0xa8:  Copyright: 2020 foldingathome.org
17:22:10:WU01:FS00:0xa8:   Homepage: https://foldingathome.org/
17:22:10:WU01:FS00:0xa8:       Date: Oct 28 2020
17:22:10:WU01:FS00:0xa8:       Time: 22:15:07
17:22:10:WU01:FS00:0xa8:   Compiler: GNU 8.3.0
17:22:10:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
17:22:10:WU01:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
17:22:10:WU01:FS00:0xa8:   Platform: linux2 4.15.0-108-generic
17:22:10:WU01:FS00:0xa8:       Bits: 64
17:22:10:WU01:FS00:0xa8:       Mode: Release
17:22:10:WU01:FS00:0xa8:       SIMD: avx2_256
17:22:10:WU01:FS00:0xa8:     OpenMP: ON
17:22:10:WU01:FS00:0xa8:       CUDA: OFF
17:22:10:WU01:FS00:0xa8:       Args: -dir 01 -suffix 01 -version 706 -lifeline 243493 -checkpoint 5 -np
17:22:10:WU01:FS00:0xa8:             10
17:22:10:WU01:FS00:0xa8:************************************ libFAH ************************************
17:22:10:WU01:FS00:0xa8:       Date: Oct 28 2020
17:22:10:WU01:FS00:0xa8:       Time: 22:12:00
17:22:10:WU01:FS00:0xa8:   Compiler: GNU 8.3.0
17:22:10:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
17:22:10:WU01:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie
17:22:10:WU01:FS00:0xa8:   Platform: linux2 4.15.0-108-generic
17:22:10:WU01:FS00:0xa8:       Bits: 64
17:22:10:WU01:FS00:0xa8:       Mode: Release
17:22:10:WU01:FS00:0xa8:************************************ CBang *************************************
17:22:10:WU01:FS00:0xa8:       Date: Oct 28 2020
17:22:10:WU01:FS00:0xa8:       Time: 22:11:46
17:22:10:WU01:FS00:0xa8:   Compiler: GNU 8.3.0
17:22:10:WU01:FS00:0xa8:    Options: -faligned-new -std=c++14 -fsigned-char -ffunction-sections
17:22:10:WU01:FS00:0xa8:             -fdata-sections -O3 -funroll-loops -fno-pie -fPIC
17:22:10:WU01:FS00:0xa8:   Platform: linux2 4.15.0-108-generic
17:22:10:WU01:FS00:0xa8:       Bits: 64
17:22:10:WU01:FS00:0xa8:       Mode: Release
17:22:10:WU01:FS00:0xa8:************************************ System ************************************
17:22:10:WU01:FS00:0xa8:        CPU: AMD EPYC 7H12 64-Core Processor
17:22:10:WU01:FS00:0xa8:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
17:22:10:WU01:FS00:0xa8:       CPUs: 128
17:22:10:WU01:FS00:0xa8:     Memory: 503.74GiB
17:22:10:WU01:FS00:0xa8:Free Memory: 499.83GiB
17:22:10:WU01:FS00:0xa8:    Threads: POSIX_THREADS
17:22:10:WU01:FS00:0xa8: OS Version: 5.8
17:22:10:WU01:FS00:0xa8:Has Battery: false
17:22:10:WU01:FS00:0xa8: On Battery: false
17:22:10:WU01:FS00:0xa8: UTC Offset: -5
17:22:10:WU01:FS00:0xa8:        PID: 243497
17:22:10:WU01:FS00:0xa8:        CWD: /home/amd/FAH/work
17:22:10:WU01:FS00:0xa8:********************************************************************************
17:22:10:WU01:FS00:0xa8:Project: 16926 (Run 0, Clone 30, Gen 15)
17:22:10:WU01:FS00:0xa8:Unit: 0x0000000f8120d1cc5f7dfda0755570c3
17:22:10:WU01:FS00:0xa8:Reading tar file core.xml
17:22:10:WU01:FS00:0xa8:Reading tar file frame15.tpr
17:22:10:WU01:FS00:0xa8:Digital signatures verified
17:22:10:WU01:FS00:0xa8:Calling: mdrun -c frame15.gro -s frame15.tpr -x frame15.xtc -cpt 5 -nt 10 -ntmpi 1
17:22:10:WU01:FS00:0xa8:Steps: first=750000000 total=800000000
17:22:10:WU01:FS00:0xa8:Completed 1 out of 50000000 steps (0%)
17:25:45:WU01:FS00:0xa8:Completed 500000 out of 50000000 steps (1%)
17:30:20:WU01:FS00:0xa8:Completed 1000000 out of 50000000 steps (2%)
17:35:16:WU01:FS00:0xa8:Completed 1500000 out of 50000000 steps (3%)
17:40:12:WU01:FS00:0xa8:Completed 2000000 out of 50000000 steps (4%)
17:43:56:WU01:FS00:0xa8:Completed 2500000 out of 50000000 steps (5%)
17:46:20:WU01:FS00:0xa8:Completed 3000000 out of 50000000 steps (6%)
17:48:31:WU01:FS00:0xa8:Completed 3500000 out of 50000000 steps (7%)
17:50:42:WU01:FS00:0xa8:Completed 4000000 out of 50000000 steps (8%)
17:52:53:WU01:FS00:0xa8:Completed 4500000 out of 50000000 steps (9%)
17:55:07:WU01:FS00:0xa8:Completed 5000000 out of 50000000 steps (10%)
17:57:20:WU01:FS00:0xa8:Completed 5500000 out of 50000000 steps (11%)
17:59:33:WU01:FS00:0xa8:Completed 6000000 out of 50000000 steps (12%)
18:01:47:WU01:FS00:0xa8:Completed 6500000 out of 50000000 steps (13%)
18:04:00:WU01:FS00:0xa8:Completed 7000000 out of 50000000 steps (14%)
18:06:13:WU01:FS00:0xa8:Completed 7500000 out of 50000000 steps (15%)
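The TPF drift in the log above is easier to see as a list of per-frame intervals, where one frame is 1% of the WU. A small sketch, assuming the `HH:MM:SS:...Completed X out of Y steps (N%)` line format shown here; the first interval is measured from step 1, so it is very slightly shorter than a full frame.

```python
import re
from datetime import datetime, timedelta

# Progress lines look like:
# 17:25:45:WU01:FS00:0xa8:Completed 500000 out of 50000000 steps (1%)
PROGRESS = re.compile(r"^(\d\d:\d\d:\d\d):.*Completed \d+ out of \d+ steps \(\d+%\)")

def frame_times(lines):
    """Seconds between consecutive 'Completed ... steps' lines."""
    stamps = [datetime.strptime(m.group(1), "%H:%M:%S")
              for m in map(PROGRESS.match, lines) if m]
    deltas = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = cur - prev
        if delta < timedelta(0):
            # Log timestamps carry no date, so a negative gap means midnight passed.
            delta += timedelta(days=1)
        deltas.append(int(delta.total_seconds()))
    return deltas
```

Applied to the first nine progress lines of this log, it gives 215, 275, 296, 296, 224, 144, 131 and 131 seconds: the frame time climbs to almost 5 minutes and then settles at about 2 minutes 11 seconds.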
Joe_H
Site Admin
Posts: 7936
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: A8 efficiency - possible unexpected improvement

Post by Joe_H »

17:22:09:WARNING:WU01:FS00:AS lowered CPUs from 64 to 10
The AS did not have WUs that could use 64 threads and assigned you an available WU capped at 10 threads.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: A8 efficiency - possible unexpected improvement

Post by Neil-B »

Yup ... as mentioned earlier ... it will improve as more CPU projects get released, but for now expect some of these CPU-count reductions to happen.
AMDEPYC
Posts: 6
Joined: Sat Oct 31, 2020 11:17 am

Re: A8 efficiency - possible unexpected improvement

Post by AMDEPYC »

What do you guys need - more 10-12 CPU slots or fewer bigger slots? Or mix of both?
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Location: Land Of The Long White Cloud

Re: A8 efficiency - possible unexpected improvement

Post by PantherX »

It's a tough question to answer but you can make an informed choice:

Ideally
A few large CPU slots (you can turn on HT too in this case), maybe a combination of 32/64/128 etc.
Currently there's a shortage of CPU projects, and projects that scale to 128+ CPUs are very limited.

Set-&-forget
Multiple CPU slots (using physical cores without HT), maybe a combination of 8/12/16 etc.
Apart from the CPU WU shortage, CPU projects are more likely to target the "low" CPU range than the high CPU range.
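For the set-and-forget option, writing out many slot definitions by hand is error-prone. This sketch generates them and refuses to hand out more CPUs than the machine has. The element and attribute names are assumed to follow the FAH v7 `config.xml` layout (`<slot id='0' type='CPU'><cpus v='16'/></slot>`); please compare with your own config file before using anything like this. The 16/12/8 mix is just one example for 128 physical cores.

```python
import xml.etree.ElementTree as ET

def slot_config(cpu_counts, total_cores):
    """Build a <config> element with one CPU slot per entry in cpu_counts.

    Names mirror the assumed v7 layout: <slot id=.. type='CPU'><cpus v=../></slot>.
    """
    if sum(cpu_counts) > total_cores:
        raise ValueError("slots request more CPUs than the machine has")
    config = ET.Element("config")
    for slot_id, cpus in enumerate(cpu_counts):
        slot = ET.SubElement(config, "slot", id=str(slot_id), type="CPU")
        ET.SubElement(slot, "cpus", v=str(cpus))
    return config

# Example: a mix of small slots for 128 physical cores (SMT off).
root = slot_config([16, 16, 16, 16, 16, 16, 12, 12, 8], total_cores=128)
print(ET.tostring(root, encoding="unicode"))
```

Keeping each slot at or below the sizes that current projects actually request should, in principle, reduce how often the AS has to lower a slot's CPU count.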

Related question, is this system a dedicated folding system or does it run other applications too? Is it on 24/7? By having a bit more information about how this system of yours is going to be used, we can potentially find an optimum configuration that simply works the best for you :)
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: A8 efficiency - possible unexpected improvement

Post by Neil-B »

It depends how you look at it, tbh. This is hopefully a short-term issue. You could create more lower-count slots, but the pauses in assignment even for these indicate there is more resource than work, so you still might not fully use your kit. What is really needed is more CPU projects, but these take time to generate. Strangely, this is a good issue to have in some ways, as it means the donated resources are able to fold projects as quickly as the researchers can get them out :)
Last edited by Neil-B on Mon Nov 02, 2020 12:32 pm, edited 1 time in total.
AMDEPYC
Posts: 6
Joined: Sat Oct 31, 2020 11:17 am

Re: A8 efficiency - possible unexpected improvement

Post by AMDEPYC »

Replying to @PantherX: I have two systems dedicated to folding:
Ryzen 9 3950X + RX560 GPU - one CPU slot, 12 cores (SMT off), one GPU slot
2P EPYC 7601 - two CPU slots of 32 cores each (SMT off)

The other two systems are used throughout the day:
Ryzen 9 3950X + RX5500XT - one CPU slot, 12 cores (SMT off), one GPU slot
2P EPYC 7H12 - two CPU slots of 64 cores each (SMT off)

All are configured with client-type beta. I usually have enough runway to finish in-progress jobs on the two systems above when I need them, so I leave them folding 24x7 as well. Plus, as it's getting into winter, the ~2kW total power draw from all of the above means running the heater less, though it wasn't too friendly on the electric bill this summer.