Project: 17201 (Run 0, Clone 2568, Gen 185) reduced threads

Moderators: Site Moderators, FAHC Science Team

HendricksSA
Posts: 339
Joined: Fri Jun 26, 2009 4:34 am


Post by HendricksSA »

Has the server code been changed to reduce thread counts in order to avoid domain decomposition errors? I noticed the fans on my 48-thread machine were idle and found it was using only 9 threads to process Project: 17201 (Run 0, Clone 2568, Gen 185). Looking through the log, I found the following. It is the first time I've noticed it.

17:29:44:WU00:FS00:Connecting to assign1.foldingathome.org:80
17:29:45:WARNING:WU00:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
17:29:45:WU00:FS00:Connecting to assign2.foldingathome.org:80
17:29:45:WU00:FS00:Assigned to work server 128.252.203.10
17:29:45:WU00:FS00:Requesting new work unit for slot 00: READY cpu:48 from 128.252.203.10
17:29:45:WU00:FS00:Connecting to 128.252.203.10:8080
17:29:46:WU00:FS00:Downloading 2.22MiB
17:29:47:WU00:FS00:Download complete
17:29:47:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:17201 run:0 clone:2568 gen:185 core:0xa7 unit:0x000000d480fccb0a5f32fed6c8584138
17:29:47:WU00:FS00:Starting
17:29:47:WARNING:WU00:FS00:AS lowered CPUs from 48 to 9
17:29:47:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 1876 -checkpoint 15 -np 9
17:29:47:WU00:FS00:Started FahCore on PID 10403
17:29:47:WU00:FS00:Core PID:10407
17:29:47:WU00:FS00:FahCore 0xa7 started
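
For anyone who wants to check whether this has happened before, here is a minimal sketch that scans log text for the "AS lowered CPUs from X to Y" warning shown above. The function name `find_cpu_reductions` is made up for illustration, and the pattern assumes the message wording stays exactly as it appears in this log.

```python
import re

# Matches the warning text as it appears in the log excerpt above.
# Assumption: the client always words the message this way.
LOWERED = re.compile(r"AS lowered CPUs from (\d+) to (\d+)")


def find_cpu_reductions(log_text):
    """Return a list of (requested, assigned) thread counts found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = LOWERED.search(line)
        if match:
            results.append((int(match.group(1)), int(match.group(2))))
    return results


if __name__ == "__main__":
    sample = (
        "17:29:47:WU00:FS00:Starting\n"
        "17:29:47:WARNING:WU00:FS00:AS lowered CPUs from 48 to 9\n"
    )
    for requested, assigned in find_cpu_reductions(sample):
        print(f"requested {requested} threads, assigned {assigned}")
```

Because the pattern uses `search` rather than anchoring to the start of the line, it should still match if the line carries a timestamp or colour-code prefix.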
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.


Post by bruce »

Yes. GROMACS has some severe limitations on the number of threads that can be used on a project. The version used in FAHCore_a7 uses some hack-like corrections to make it work. The upcoming version that will be in FAHCore_a8 will change all that, but there will still be some similar issues. Domain Decomposition was designed back when CPUs had 1, 2, 4, 8, 12, or 16 cores. Nobody could conceive of trying to use as many threads as you have.

The OpenMM code used on GPUs is entirely different ... and if it reduces the parallelism to avoid specific problems, it doesn't tell you about it.

FAH is considering some improvements for CPUs.

I recommend using several CPU slots while avoiding any thread counts with large prime factors.
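
As a rough illustration of the prime-factor advice, here is a minimal Python sketch that lists the thread counts whose prime factors all stay at or below a cutoff. The function names and the default cutoff of 5 are assumptions for the example only; nothing in this thread states what counts as "large".

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (n must be at least 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    largest = 1
    factor = 2
    # Trial division: strip out each factor completely before moving on.
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    # Whatever remains above 1 is itself a prime factor.
    if n > 1:
        largest = n
    return largest


def friendly_thread_counts(max_threads, max_prime=5):
    """List thread counts from 2..max_threads with no prime factor above max_prime.

    max_prime=5 is an assumed cutoff, not an official FAH value.
    """
    return [n for n in range(2, max_threads + 1)
            if largest_prime_factor(n) <= max_prime]


if __name__ == "__main__":
    print(friendly_thread_counts(48))
```

For a 48-thread machine this makes it easy to see which slot sizes to prefer when splitting the CPU into several slots: 46, for example, is excluded because 46 = 2 x 23.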
Joe_H
Site Admin
Posts: 8226
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA


Post by Joe_H »

The code has been there for some time. Provided the server settings are all configured correctly, if no WU is available for the requested CPU thread count, a WU that uses fewer threads will be assigned.

What is unusual here is that the AS went so far down; usually there are WUs available for somewhat higher thread counts. It would be more normal to see a reduction to 32 or 24, for example.

The A8 folding core is less tied to domain decomposition numbers, and there are projects waiting to be created once new servers are ready. Some smaller projects may not use a large number of threads as efficiently, but they will still process. For right now, though, there is a bit of a shortage of CPU WUs, especially for higher thread counts.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud


Post by PantherX »

There were discussions about what to do when a system with X CPUs requests work and none is available. Instead of leaving the CPU idle, the idea was to assign a WU for Y CPUs, where Y < X, so that you can still contribute. As Joe_H mentioned, this is due to a shortage of CPU WUs under some conditions, which will hopefully be resolved soon.
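
To make the X and Y idea concrete, here is a purely hypothetical sketch of a fallback rule: pick the largest available thread count that does not exceed the request. This is not the assignment server's actual code, which is not shown anywhere in this thread; `pick_fallback_threads` and the `available` list are invented for illustration.

```python
def pick_fallback_threads(requested, available):
    """Return the largest thread count in `available` that is <= requested.

    Hypothetical rule for illustration only; returns None when nothing fits.
    """
    candidates = [y for y in available if y <= requested]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # If only 9-thread and 64-thread WUs existed, a 48-thread request
    # would fall back to 9 under this rule.
    print(pick_fallback_threads(48, [9, 64]))
```

Under a rule like this, a drop from 48 to 9 would simply mean nothing between 10 and 48 threads was available at that moment.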