
Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Thu May 14, 2020 8:03 am
by PeterGarlic
ajm wrote:A few thoughts. With a 4-CPU system, the overhead will probably always result in near 100% utilization. I don't have experience with so few threads; my smallest processor is a 4C/8T running on 6 threads (at 100%). In fact, each time I use about 2/3 to 3/4 of the available threads, the overall utilization is close to or at the maximum. I see more nuances only on a 32C/64T, but there too, above 48 threads, the overall utilization is stuck at 95-100%, and adding more CPUs to FAH will essentially raise the temperature. If this is a problem, it would be better to define 12-, 24- or 32-CPU slots in systems with respectively 16, 28 and 36 available threads. It would probably also yield more points (= more and better science) because the results would be delivered faster to FAH's servers. But above all, it would be more flexible to use because of that speed: if the hardware is needed for something else, you'll always find several running WUs that can be finished within an hour or just minutes, then the VM can be saved and shut down. No worries. With 4-CPU systems, not to mention running only 1 or 2 CPUs, WUs will last for days. Of course, you can pause them and restore them later too, if you need the hardware, but you would always be close to the maximum admissible duration of all WUs, and it would be a nightmare (or a solid piece of programming work) keeping track of hundreds of such VMs with their durations and deadlines. With 36T systems, you would have far fewer of them and you could play with their availability much more easily, without much concern about deadlines, all the more because stopping only one or two of those would already provide a fair amount of computing power. Two 36T systems stopped give you 72T to work with. With 4T systems, you would have to manage 18 VMs to get there. Even if you automate the whole thing (which will take time), you'll probably end up with endless issues to spot and resolve.
A 4 vCPU VM was used just as a case study before starting the real deployment. We have 3 x 4 vCPU instances, and in a week they processed more than 250 WUs, which is not bad considering how many times we reconfigured, stopped, restarted, etc. these VMs.

From a practical point of view, each vCPU is mapped onto a hypervisor based on 2 x Xeon 14 core/28 thread CPUs.

But there are many questions that need an answer, considering that the clusters will run with a mixed load (customers/FAHClient):

Which configuration gives the best processing performance on the FAH VMs?
- under analysis
Can the best FAH configuration create resource contention?
- waiting on the previous task
How do we avoid running out of resources if the customer load increases?
- Resource monitoring, FAH power control, script automation for pause/resume and dynamic VM allocation/removal.

The reason why we are testing small VMs is based on cluster performance: a VM with a reduced number of vCPUs (4, for example) is easier for the hypervisor scheduler to place than a 32 vCPU VM, and it also has less overhead, double process requests, core spin/exits, etc. (there is a lot of information around about KVM resource contention).

From this point of view, in the long term hundreds of small VMs will probably process more WUs than a few big VMs (right?), and the cluster resources will be used in a better way (right!).

We started testing with a 4 vCPU configuration because it is easy to control and analyze; it is not certain that this will be the target configuration.

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Thu May 14, 2020 8:10 am
by NRT_AntiKytherA
F@H is set at a low scheduling priority by default so any other running task takes precedence, your customers should never run out of resources when their loads on your servers increase.

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Thu May 14, 2020 8:22 am
by PeterGarlic
Neil-B wrote:One thing to consider depending upon how broad your thoughts of "dynamic" are ... especially if you have any thoughts as to dynamically resizing (from a vCPU perspective) VMs during the course of folding a WU ... and that is that whilst it is possible to reduce the number of threads/cores/vCPUs working on a WU during folding it is not possible to increase this above the level it was set at when downloading.

A WU downloaded to work on a 6 thread slot can be reduced to 4 but not increased to 12 ... The VM could be, but FAH would only use 6 until the next WU is downloaded.
Thanks Neil-B,
we considered this point.

The basic idea is to run the VMs at full load (leaving 1 vCPU out of the slot to avoid cluster alarms, if I can find a way to use power control; otherwise we have to use power=medium).
When the customer load grows, send a command to lower the power, and raise it again when the cluster load decreases.
If that is still not enough, start pausing clients and shut down the VMs, which will be restarted at a later time.
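
The escalation described above could be sketched roughly as follows. This is only an illustration in Python: the function name `choose_action`, the action labels, and the load thresholds (70% and 90%) are made up for the example and are not values from the cluster or from the FAH client.

```python
def choose_action(cluster_load_pct, lower_at=70.0, pause_at=90.0):
    """Map the cluster load (percent) to one step of the escalation.

    The thresholds are hypothetical placeholders, not measured values.
    """
    if not 0.0 <= cluster_load_pct <= 100.0:
        raise ValueError("load must be between 0 and 100")
    if cluster_load_pct >= pause_at:
        # Last resort: pause the clients, then shut the VMs down.
        return "pause-and-shutdown"
    if cluster_load_pct >= lower_at:
        # Customer load is growing: lower the FAH power setting.
        return "power-down"
    # Cluster is quiet: run the FAH VMs at full load.
    return "power-full"


for load in (35.0, 75.0, 95.0):
    print(load, "->", choose_action(load))
```

The real thresholds would have to come from the resource monitoring, and each label would map to whatever mechanism ends up working (power setting, pause, VM shutdown).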

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Thu May 14, 2020 8:24 am
by PeterGarlic
NRT_AntiKytherA wrote:F@H is set at a low scheduling priority by default so any other running task takes precedence, your customers should never run out of resources when their loads on your servers increase.
Could you explain this concept in a bit more depth?

Thanks in advance.

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Thu May 14, 2020 8:25 am
by ajm
PeterGarlic wrote:The reason why we are testing small VMs is based on cluster performance: a VM with a reduced number of vCPUs (4, for example) is easier for the hypervisor scheduler to place than a 32 vCPU VM, and it also has less overhead, double process requests, core spin/exits, etc. (there is a lot of information around about KVM resource contention).

From this point of view, in the long term hundreds of small VMs will probably process more WUs than a few big VMs (right?), and the cluster resources will be used in a better way (right!).
For the cluster resources, it's quite possible. For the raw number of WUs processed, I don't think so, but I may be wrong and I'll probably do some tests about that next week. But for the science, no: from all I have understood about the processes involved, it won't be better. The science is better off with FAST crunching, because the servers need to get the WUs back in order to generate the following ones. So larger and hence faster VMs will deliver more scientific benefits.

And what I wrote above re duration and deadlines still holds: smaller VMs will be easier to handle as such, but "their" WUs won't: they will need to run for much longer and if you pause them for too long, they won't deliver anything.

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Thu May 14, 2020 8:35 am
by PantherX
The FahCore_a7 which does the folding runs at either idle or low priority. Thus, virtually all applications will take priority over it. That means that even if the CPU usage is 100% while folding, when you fire up applications, the system will respond 98% of the time as if nothing else was running on it. You can perform tests in your lab with the specific applications that you're hosting.

I am not sure what Hypervisor you're using but some of them can adjust the VM resources so for F@H, they can be set as a lower priority while the app VMs can be set to a higher priority so if resources have to be taken, the F@H VMs will get less while the App VMs will get whatever they want.

Regarding the number of CPUs, a good balance would be 12 CPUs per cluster since F@H is FPU intensive, so the physical cores are the real performers. SMT/HT can add some improvement in TPF (reducing the overall time it takes to fold the WU) but it won't be close to what a physical core can do.
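
To check the priority on a Linux guest, a small sketch like the one below lists the nice value of any running FahCore processes (a higher nice value means a lower priority; 19 is the lowest priority on Linux). It assumes a `/proc` filesystem and that the process name starts with `FahCore`; the helper names `find_pids` and `niceness` are only for illustration.

```python
import glob
import os


def find_pids(name_prefix):
    """Return PIDs whose command name (/proc/<pid>/comm) starts with name_prefix."""
    pids = []
    for comm_path in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(comm_path) as handle:
                name = handle.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith(name_prefix):
            pids.append(int(comm_path.split("/")[2]))
    return sorted(pids)


def niceness(pid):
    """Nice value of a process: -20 (highest priority) to 19 (lowest)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


for pid in find_pids("FahCore"):
    try:
        print(pid, niceness(pid))
    except ProcessLookupError:
        pass  # the core finished between the scan and the lookup
```

On a machine with no folding core running, the loop simply prints nothing.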

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 8:37 am
by PeterGarlic
Hi all,
yesterday I started a new VM and spent some time completing the analysis of the issues found:

I collected some information on the "power-control" issue and some more about other issues found in OS/package control.
I also have many log files (available on demand), but I'm going to post 2 reports about what I found.

Some points may require further information, but I'm available for feedback if someone is interested.

Take a look

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 8:42 am
by PeterGarlic
[*### Installation/removal and other issues

Process sequence:

-) Create new VM (4CPU) for better control of parameters on test

or

-) save /etc/fahclient
-) uninstall client with:

# uninstall packages but not dependencies
$ rpm -e --nodeps fahviewer fahcontrol fahclient

### WARNING: components not removed on uninstall

# search remaining
$ find / -iname "*fah*"

# When the VM is ready:

-) disabled internet connectivity

1) client fresh install

[root@gm-srv-fah-004 7.6.13]# dnf install fahclient-7.6.13-1.x86_64.rpm fahcontrol-7.6.13-1.noarch.rpm fahviewer-7.6.13-1.x86_64.rpm
Last metadata expiration check: 1:23:39 ago on Mon 18 May 2020 12:51:31 PM CEST.
Dependencies resolved.
==============================================================================================================================================================
Package Architecture Version Repository Size
==============================================================================================================================================================
Installing:
fahclient x86_64 7.6.13-1 @commandline 3.5 M
fahcontrol noarch 7.6.13-1 @commandline 209 k
fahviewer x86_64 7.6.13-1 @commandline 4.4 M

Transaction Summary
==============================================================================================================================================================
Install 3 Packages

Total size: 8.1 M
Installed size: 24 M
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : fahviewer-7.6.13-1.x86_64 1/3
Installing : fahcontrol-7.6.13-1.noarch 2/3
Installing : fahclient-7.6.13-1.x86_64 3/3
Running scriptlet: fahclient-7.6.13-1.x86_64 3/3
Starting fahclient ... FAIL

Verifying : fahclient-7.6.13-1.x86_64 1/3
Verifying : fahcontrol-7.6.13-1.noarch 2/3
Verifying : fahviewer-7.6.13-1.x86_64 3/3

Installed:
fahclient-7.6.13-1.x86_64 fahcontrol-7.6.13-1.noarch fahviewer-7.6.13-1.x86_64

Complete!


### ERROR-01: Starting fahclient ... FAIL
#
# process control returns an error even when the start succeeds.
# The problem is also present in normal operation after the complete configuration

[root@gm-srv-fah-004 7.6.13]# /etc/init.d/FAHClient status
fahclient is running with PID 1287

[root@gm-srv-fah-004 ~]# service --status-all
fahclient is running with PID 1287

### ERROR-02: systemctl unit created but not working
#
# http://pedroivanlopez.com/install-fahcl ... vice-unit/
# https://apuntesderootblog.wordpress.com ... in-fedora/
# https://gist.github.com/lopezpdvn/81c8bb867c51292045c6
#

$ systemctl -a | grep -i fah

● FAHClient.service loaded failed failed LSB: Folding@home Client

### ERROR-03: after installation FAHClient starts with default config & error
#
# After command-line installation the program starts running with a default configuration
#
# Note: this can also be a security issue: a fake package could be installed and start doing anything.
#


[fedora@gm-srv-fah-004 ~]$ ps -eaf | grep -i fah
root 1287 1 0 14:15 ? 00:00:01 /usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+ 1289 1287 0 14:15 ? 00:00:03 /usr/bin/FAHClient --child /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon


[root@gm-srv-fah-004 init.d]# cd /etc/fahclient
[root@gm-srv-fah-004 fahclient]# cat config.xml

<config>
<!-- Folding Slot Configuration -->
<gpu v='false'/>

<!-- Slot Control -->
<power v='light'/>

<!-- User Information -->
<user v='anonymous'/>

<!-- Folding Slots -->
<slot id='0' type='CPU'/>

### ERROR-04 - after 1st installation the process starts attempting downloads
#
# with the default config, it attempts downloads/connections at start
#

Reference log:
20200518-log-01.txt

12:15:25:Downloading GPUs.txt from assign1.foldingathome.org:80
12:15:25:Connecting to assign1.foldingathome.org:80
12:15:28:WARNING:While updating GPUs.txt from assign1.foldingathome.org:80: Failed to connect to assign1.foldingathome.org:80: No route to host
12:15:28:Downloading GPUs.txt from assign2.foldingathome.org:80
12:15:28:Connecting to assign2.foldingathome.org:80
...

### ISSUE-01: client must be configured by hand on systems without GUI
#
# with the default config FAHControl can access only localhost, which is a problem if you don't have a GUI:
# you can configure the client only manually
#

### ISSUE-02: missing text configuration
#
# Fedora/CentOS: no text/bash-based config wizard.
# Ubuntu: wizard runs on install, but if you make a typo it is not easy to go back:
# no summary, no confirmation before saving the config.
#

### ERROR-03 - execution of /usr/bin/FAHClient
#
# calling the binary /usr/bin/FAHClient directly starts it even if it is already running:
# no command-line arguments required.
# it returns an error for socket in use and doesn't exit

## Effect with default config:
#
# returns an error and exits

[root@gm-srv-fah-004 fahclient]# /usr/bin/FAHClient
13:58:12:Trying to access database...
13:58:42:ERROR:Exception: Error executing: 'PRAGMA synchronous=NORMAL': database is locked

## with configured client:
#
# displays client info
# displays configuration
# displays: socket already in use error
# starts looping, attempting connections to various servers on port 80

Reference log:
20200518-log-02.txt

WARNING:WU00:FS00:Failed to get ID from 'assign1.foldingathome.org:80': Failed to connect to assign1.foldingathome.org:80: No route to host
12:16:29:WU00:FS00:Connecting to assign2.foldingathome.org:80
12:16:32:WARNING:WU00:FS00:Failed to get ID from 'assign2.foldingathome.org:80': Failed to connect to assign2.foldingathome.org:80: No route to host
12:16:32:WU00:FS00:Connecting to assign3.foldingathome.org:80
12:16:35:WARNING:WU00:FS00:Failed to get ID from 'assign3.foldingathome.org:80': Failed to connect to assign3.foldingathome.org:80: No route to host
12:16:35:WU00:FS00:Connecting to assign4.foldingathome.org:80
12:16:38:WARNING:WU00:FS00:Failed to get ID from 'assign4.foldingathome.org:80': Failed to connect to assign4.foldingathome.org:80: No route to host

### ISSUE-04 - unreferenced error
#
# /usr/bin/FAHClient returns an error if executed as an unprivileged user

[fedora@gm-srv-fah-004 fahclient]$ /usr/bin/FAHClient
13:41:25:ERROR:Exception: Failed to create directory 'logs': boost::filesystem::create_directory: Permission denied: "logs"


### ISSUE-05 - undocumented features
#
# /usr/bin/FAHClient --help returns 489 lines of text with many unreferenced options

### ISSUE-06 - no man pages

[root@gm-srv-fah-004 fahclient]# man FAHClient
No manual entry for FAHClient
[root@gm-srv-fah-004 fahclient]# man FAHControl
No manual entry for FAHControl
[root@gm-srv-fah-004 fahclient]# man FAHViewer
No manual entry for FAHViewer
[root@gm-srv-fah-004 fahclient]# man FAHCoreWrapper
No manual entry for FAHCoreWrapper


### INFO-07 - manual execution of corewrapper
#
# unknown command action /usr/bin/FAHCoreWrapper
#

[root@gm-srv-fah-002 fahclient]# /usr/bin/FAHCoreWrapper
14:19:35:ERROR:Exception: Missing arguments
[root@gm-srv-fah-002 fahclient]# /usr/bin/FAHCoreWrapper --help
Started core on PID 34947


### INFO-00 - process function(s) not documented
#
# Missing explanation of the processes

[root@gm-srv-fah-002 fahclient]# ps -eaf | grep -i fah
root 26742 1 0 May13 ? 00:01:32 /usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+ 26745 26742 0 May13 ? 00:08:17 /usr/bin/FAHClient --child /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+ 34017 26745 0 10:22 ? 00:00:02 /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 26745 -checkpoint 15 -np 3
fahclie+ 34021 34017 99 10:22 ? 11:53:51 /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 34017 -checkpoint 15 -np 3


### ISSUE-00 - non standard logging
#
# log file unreferenced on standard location
# custom log rotation
#
# /var/lib/fahclient/log.txt
# /var/lib/fahclient/logs
#

### ISSUE-007 error on start/restart
#
[root@gm-srv-fah-004 /etc/fahclient]$ /etc/init.d/FAHClient start
Starting fahclient ... FAIL
[root@gm-srv-fah-004 /etc/fahclient]$ /etc/init.d/FAHClient start
Starting fahclient ... FAILED
fahclient seems to be already running with PID 3346


[root@gm-srv-fah-004 /etc/fahclient]$ vi config.xml
[root@gm-srv-fah-004 /etc/fahclient]$ /etc/init.d/FAHClient restart
Stopping fahclient ... OK
Starting fahclient ... FAIL
[root@gm-srv-fah-004 /etc/fahclient]$ /etc/init.d/FAHClient status
fahclient is running with PID 1277

]

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 8:43 am
by PeterGarlic
[*### Power control check

### NOTE-01 - pre-power control test
#
# first start: light mode
#
# slide CPU availability on FAHcontrol:
#
# 1 cpu - light
# 3 cpu - medium
# 4 cpu - full
#
# FAHControl cursor movement removes <power> tag from /etc/fahclient/control.xml for some time
#
#

### ERROR-10 - missing medium power tag in control.xml
#
# <power v='medium'/> never appears when using FAHControl
#

### ERROR-11 - issues with FAHControl power control & config.xml
#
#

$ cd /etc/fahclient
$ while true ; date ; do grep power config.xml; sleep 5; done


# start with light
#
# set to medium: after ~ 60 sec the value disappears
# set to high: after ~ 60 sec the value re-appears
#

[root@gm-srv-fah-004 /etc/fahclient]$ while true ; date ; do grep power config.xml; sleep 5; done
Mon 18 May 2020 06:05:46 PM CEST
<power v='light'/>
Mon 18 May 2020 06:05:51 PM CEST
<power v='light'/>
Mon 18 May 2020 06:05:56 PM CEST
<power v='light'/>
Mon 18 May 2020 06:06:01 PM CEST
<power v='light'/>
Mon 18 May 2020 06:06:06 PM CEST
<power v='light'/>
.....
Mon 18 May 2020 06:06:41 PM CEST
Mon 18 May 2020 06:06:46 PM CEST
Mon 18 May 2020 06:06:51 PM CEST
Mon 18 May 2020 06:06:56 PM CEST
Mon 18 May 2020 06:07:01 PM CEST
.....
Mon 18 May 2020 06:09:27 PM CEST
Mon 18 May 2020 06:09:32 PM CEST
Mon 18 May 2020 06:09:37 PM CEST
Mon 18 May 2020 06:09:42 PM CEST
<power v='full'/>
Mon 18 May 2020 06:09:47 PM CEST
<power v='full'/>
Mon 18 May 2020 06:09:52 PM CEST
<power v='full'/>
Mon 18 May 2020 06:09:57 PM CEST
<power v='full'/>



### ISSUE-10 - set <core-priority> from FAHControl has no effect
#
# note: the value was not present in the config before (but is described as the default)
# and it is not removed while waiting for jobs nor while running
#

<!-- Folding Core -->
<core-priority v='low'/>

$ cd /etc/fahclient
$ while true ; date ; do grep core config.xml; sleep 5; done

# start folding max power - low priority
#
# power control slider has no effect
#
# core-priority doesn't change
#
# CPU -1 : no effect
#
###

# ERROR-12: cpu subset (<cpus v='4'/>) disables power control

# client paused

<!-- Folding Slots -->
<slot id='0' type='CPU'>
<paused v='true'/>
</slot>

# fixed config.xml

<!-- Folding Slots -->
<slot id='0' type='CPU'>
<cpus v='4'/>
<paused v='true'/>
</slot>


### ISSUE-11: pause from command line not implemented
#

### ERROR-13: CPU -1 has no effect on FAHControl
#
# replace by hand

<!-- Folding Slots -->
<slot id='0' type='CPU'>
<cpus v='4'/>
<paused v='true'/>
</slot>

with

<!-- Folding Slots -->
<slot id='-1' type='CPU'/>

# unpause

# then a slot id is assigned
# FAHControl continues to display -1

### ERROR-14: all values on slot id are the same

<!-- Folding Slots -->
<slot id='18446744073709551615' type='CPU'/>

### ISSUE-12 - When ID is assigned all CPUs come back under FAHControl power control


### ISSUE-13 - <core-priority v='low'/> has no effect

# this has no effect
<!-- Folding Core -->
<core-priority v='low'/>
]
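
The grep loop used for ERROR-11 can also be written as a small poller that parses the file, which makes it explicit when the `<power>` element is missing instead of just printing nothing. A minimal sketch in Python, assuming the `<power v='...'/>` layout shown in the config.xml listing above; `power_setting` and `watch` are illustrative names, not part of the client.

```python
import time
import xml.etree.ElementTree as ET


def power_setting(config_xml):
    """Return the v attribute of <power>, or None if the element is absent."""
    root = ET.fromstring(config_xml)
    node = root.find("power")
    return None if node is None else node.get("v")


def watch(path="/etc/fahclient/config.xml", interval=5.0, rounds=12):
    """Print a timestamp and the current power value every `interval` seconds."""
    for _ in range(rounds):
        try:
            with open(path) as handle:
                value = power_setting(handle.read())
        except (OSError, ET.ParseError) as err:
            # For example, the file is being rewritten at this moment.
            value = "unreadable: %s" % err
        shown = value if value is not None else "<power> missing"
        print(time.strftime("%H:%M:%S"), shown)
        time.sleep(interval)
```

Calling `watch()` from a root shell while moving the slider in FAHControl should reproduce the same sequence as the transcript, with the gap reported as "<power> missing".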

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 8:58 am
by PeterGarlic
And this is the summary:

Installation/removal and other issues
### WARNING: components not removed on uninstall
### ERROR-01: Starting fahclient ... FAIL
### ERROR-02: systemctl unit created but not working
### ERROR-03: after installation FAHClient starts with default config & error
### ERROR-04 - after 1st installation the process starts attempting downloads
### ISSUE-01: client must be configured by hand on systems without GUI
### ISSUE-02: missing text configuration
### ERROR-03 - execution of /usr/bin/FAHClient
### ISSUE-04 - unreferenced error
### ISSUE-05 - undocumented features
### ISSUE-06 - no man pages
### INFO-07 - manual execution of corewrapper
### INFO-00 - process function(s) not documented
### ISSUE-00 - non standard logging
### ISSUE-007 error on start/restart

Power control check
### NOTE-01 - pre-power control test
### ERROR-10 - missing medium power tag in control.xml
### ERROR-11 - issues with FAHControl power control & config.xml
### ISSUE-10 - set <core-priority> from FAHControl has no effect
### ISSUE-11: pause from command line not implemented
### ERROR-13: CPU -1 has no effect on FAHControl
### ERROR-14: all values on slot id are the same
### ISSUE-12 - When ID is assigned all CPUs come back under FAHControl power control
### ISSUE-13 - <core-priority v='low'/> has no effect

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 9:15 am
by PantherX
PeterGarlic wrote:...### ERROR-04 - after 1st installation the process starts attempting downloads
#
# with the default config, it attempts downloads/connections at start
#

Reference log:
20200518-log-01.txt

12:15:25:Downloading GPUs.txt from assign1.foldingathome.org:80
12:15:25:Connecting to assign1.foldingathome.org:80
12:15:28:WARNING:While updating GPUs.txt from assign1.foldingathome.org:80: Failed to connect to assign1.foldingathome.org:80: No route to host
12:15:28:Downloading GPUs.txt from assign2.foldingathome.org:80
12:15:28:Connecting to assign2.foldingathome.org:80...
That's the design of the client. When it starts up, it attempts to download the GPUs.txt file to ensure that it can detect any GPUs on the system and then attempt to configure them. Do you see that as a problem? Alternatively, you can download the GPUs.txt file in the Gold copy of the VM and that way, it will not attempt to download the file.
PeterGarlic wrote:...### ISSUE-01: client must be configured by hand on systems without GUI
#
# with the default config FAHControl can access only localhost, which is a problem if you don't have a GUI:
# you can configure the client only manually...
If all your VM copies are going to be identical, create a single config.xml file and that should work across all the replicas of the VM once the replicas are online.
PeterGarlic wrote:...### ISSUE-05 - undocumented features
#
# /usr/bin/FAHClient --help returns 489 lines of text with many unreferenced options

### ISSUE-06 - no man pages

[root@gm-srv-fah-004 fahclient]# man FAHClient
No manual entry for FAHClient
[root@gm-srv-fah-004 fahclient]# man FAHControl
No manual entry for FAHControl
[root@gm-srv-fah-004 fahclient]# man FAHViewer
No manual entry for FAHViewer
[root@gm-srv-fah-004 fahclient]# man FAHCoreWrapper
No manual entry for FAHCoreWrapper...
I believe that it is FAHClient --help and the same might apply to others but not sure about FAHCoreWrapper as that should only be called by FAHClient and not individually as it manages FahCore_XX which does the folding.
PeterGarlic wrote:...### INFO-07 - manual execution of corewrapper
#
# unknown command action /usr/bin/FAHCoreWrapper
#

[root@gm-srv-fah-002 fahclient]# /usr/bin/FAHCoreWrapper
14:19:35:ERROR:Exception: Missing arguments
[root@gm-srv-fah-002 fahclient]# /usr/bin/FAHCoreWrapper --help
Started core on PID 34947...
That's expected, as FAHClient passes all the required arguments to successfully start it. It wasn't designed to be run manually.
PeterGarlic wrote:...### ISSUE-10 - set <core-priority> from FAHControl has no effect
#
# note: the value was not present in the config before (but is described as the default)
# and it is not removed while waiting for jobs nor while running
#

<!-- Folding Core -->
<core-priority v='low'/>

$ cd /etc/fahclient
$ while true ; date ; do grep core config.xml; sleep 5; done

# start folding max power - low priority
#
# power control slider has no effect
#
# core-priority doesn't change
#
# CPU -1 : no effect...
That depends on the OS, as FahCore_XX is designed to run at idle or low.
PeterGarlic wrote:...# ERROR-12: cpu subset (<cpus v='4'/>) disables power control

# client paused

<!-- Folding Slots -->
<slot id='0' type='CPU'>
<paused v='true'/>
</slot>

# fixed config.xml

<!-- Folding Slots -->
<slot id='0' type='CPU'>
<cpus v='4'/>
<paused v='true'/>
</slot>...
Manually setting the CPU value will disable the power slider as the power slider is meant for Donors who aren't going to change the CPU values. It works as long as the CPU value is -1 which means that the client controls it. Changing it means that the Donor has chosen to manually override it.
PeterGarlic wrote:...### ISSUE-11: pause from command line not implemented...
If you send it the flag via the telnet session, it does work.
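
For scripting, the same thing can be done without an interactive telnet session. The sketch below is an example under stated assumptions, not official usage: it assumes the client's command interface listens on 127.0.0.1 port 36330 (the v7 default as far as I know) with no password set, and that it accepts `pause`, `unpause` and `finish` with an optional slot number. Check `help` in your own telnet session before relying on any of it. The function names are just for illustration.

```python
import socket

ALLOWED = ("pause", "unpause", "finish")  # assumed command names


def build_command(action, slot=None):
    """Build one command line, e.g. 'pause' or 'pause 0'."""
    if action not in ALLOWED:
        raise ValueError("unsupported action: %r" % (action,))
    if slot is None:
        return action
    if isinstance(slot, bool) or not isinstance(slot, int) or slot < 0:
        raise ValueError("slot must be a non-negative integer")
    return "%s %d" % (action, slot)


def send_command(command, host="127.0.0.1", port=36330, timeout=5.0):
    """Open a TCP connection, skip the greeting, and send one command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            sock.recv(4096)  # discard the welcome banner and prompt, if any
        except socket.timeout:
            pass
        sock.sendall(command.encode("ascii") + b"\n")
```

Under those assumptions, `send_command(build_command("pause"))` would pause all slots on the local client, and `send_command(build_command("unpause", 0))` would resume slot 0.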
PeterGarlic wrote:...### ERROR-14: all values on slot id are the same

<!-- Folding Slots -->
<slot id='18446744073709551615' type='CPU'/>...
That's weird, as I expected it to start from 0 onwards.
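
For what it's worth, that number is not random: 18446744073709551615 is 2^64 - 1, which is what -1 looks like when it is stored in or read back as an unsigned 64-bit integer. So it looks as if the hand-written `id='-1'` was wrapped around rather than rejected, though that is only a guess about the client's internals. The arithmetic itself can be checked quickly in Python:

```python
import struct

SLOT_ID = 18446744073709551615  # value seen in config.xml

# -1 reduced modulo 2**64 gives the same number.
as_unsigned = -1 % 2**64
print(as_unsigned == SLOT_ID)

# Same check via the raw bytes: pack -1 as signed 64-bit, unpack as unsigned.
reinterpreted = struct.unpack("<Q", struct.pack("<q", -1))[0]
print(reinterpreted == SLOT_ID)
```

Both comparisons print True.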
PeterGarlic wrote:...### ISSUE-13 - <core-priority v='low'/> has no effect

# this has no effect
<!-- Folding Core -->
<core-priority v='low'/>
See my answer above about the idle/low priority.

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 9:33 am
by Neil-B
I see PantherX has responded already on many of your points, but please forgive me for just responding to ("harping on about") one of your comments:
PeterGarlic wrote:From this point of view, in the long term hundreds of small VMs will probably process more WUs than a few big VMs (right?), and the cluster resources will be used in a better way (right!).
I will repeat/conflate the myriad forum messages I have seen relating to this point: (and I have no association with FAH Consortium and am just a regular folder who may have got it all wrong)

From a "getting the science done" perspective, it isn't just about how many WUs can be completed over a set period of time but also about how fast each WU is completed (which is possibly more important, given that FAH has a Quick Return Bonus (QRB) system implemented to reward fast returns, indicating a preference for speed - or they may just be trying to reward 24/7 folding - I believe it is a bit of both) … Processing more WUs slowly is not necessarily the best approach (up to a point) … All things being equal (which they are not, I get it), from a FAH perspective a single 32 core slot is more useful than two 16 core slots, and so on.

Even if 8 WUs take longer for a single 32core slot to complete (done serially) than it would take eight 4core slots to complete in parallel it is likely that say five/six of the WUs will be completed by the 32core slot before a single one is returned by the 4core slots - that is five/six WUs that can have their next generation calculated and released … I'll accept that it isn't 100% clear where that break even between more slow slots and fewer faster slots is - and slower slots still have value (specifically not saying they don't) … and you will have far more of an understanding about the drop off in throughput and performance under your configurations - obviously eight 4core WUs would be much better than say 2 32core WUs in the same period :shock:

So it comes down to how you define cluster resources being used in a better way … If you are counting highest number of WUs produced over a given timespan then yes resources are probably being used in a better way with small core counts … If you are counting highest value contribution to FAH then it may be that small core counts are not necessarily using the cluster resources in a better way :?:

4core slots are really relatively small counts - if it were possible within the target configuration to push that up to 6, 8 or even 12, that would significantly speed up the return of the WUs and help the projects progress faster … Having said that, lots of small core slots are far better than none at all (if that were the only other option) - and if in order to deconflict FAH with other usage this is the way you need to go then there is no issue :)

So please do ignore the above if it in any way conflicts with the likelihood of you folding !!! :)
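
To put rough numbers on the five/six figure above, here is a toy model in Python. Everything in it is an assumption for illustration only: a WU that takes 24 hours on a 4core slot, and a 32core slot that achieves 75% of the ideal 8x speedup. Real scaling will differ per project and per cluster.

```python
def hours_per_wu(base_hours, core_ratio, efficiency):
    """Hours per WU when cores are scaled by core_ratio at the given efficiency."""
    return base_hours / (core_ratio * efficiency)


def finish_times(n_wus, wu_hours, slots):
    """Finish time of each WU when `slots` identical slots fold WUs back to back."""
    return sorted(wu_hours * (i // slots + 1) for i in range(n_wus))


# Eight 4core slots, one WU each, all running in parallel.
SMALL = finish_times(8, 24.0, slots=8)
# One 32core slot folding the same eight WUs one after another.
BIG = finish_times(8, hours_per_wu(24.0, 8, 0.75), slots=1)

first_small_return = SMALL[0]
done_before = sum(1 for t in BIG if t < first_small_return)
print("32core finish times:", BIG)
print("4core finish times: ", SMALL)
print("WUs returned by the 32core slot before the first 4core return:", done_before)
```

With these made-up numbers the 32core slot needs 32 hours for all eight WUs against 24 hours for the small slots, yet five WUs are already back (and a sixth arrives at the same moment) before the first small slot returns anything.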

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 7:30 pm
by PantherX
Neil-B wrote:...FAH has a Quick Return Bonus (QRB) system implemented to reward fast returns indicating a preference for speed - or they may just be trying to reward 24/7 folding - I believe it is a bit of both)...
Almost there... F@H rewards/encourages fastest WU return, the F@H enthusiasts encourage folding 24/7. If we have 2 identical systems but different usage, this is what would happen:
System A folds for only 4 hours but finishes 1 WU and gets 1,000 points.
System B folds 24/7 and finishes 6 WUs and gets 6,000 points.

Both systems earn the exact same points per WU as they took the same time to complete. However, the PPD will vary significantly as the number of WUs folded between the systems will be different. Thus, F@H doesn't encourage you to fold 24/7, it encourages you to return the WU as quickly as possible based on your resources that you want to donate... QRB does that while the F@H Community suggests leaving the system running 24/7 if possible :)
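
The same example as a few lines of Python, in case it helps. The assumptions are mine: each WU takes 4 hours on this hardware and is worth 1,000 points including any bonus, and System A folds 4 hours per day. Partial WUs carried over to the next day are ignored to keep it simple.

```python
def points_per_day(hours_folding_per_day, hours_per_wu, points_per_wu):
    """Points per day from WUs completed within the daily folding time."""
    completed = int(hours_folding_per_day // hours_per_wu)
    return completed * points_per_wu


system_a = points_per_day(4, 4, 1000)
system_b = points_per_day(24, 4, 1000)
print("System A PPD:", system_a)
print("System B PPD:", system_b)
```

The points per WU are identical in both calls; only the number of WUs finished per day changes the PPD.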

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Tue May 19, 2020 7:41 pm
by Neil-B
Actually for slower systems that wouldn't complete a WU within timeout unless left on 24/7 (and this does happen in GPU land) the QRB does in effect encourage 24/7, hence why I mentioned it ... Even with a 4hr WU there is an encouragement from the QRB to fold for at least 4hrs after the WU is downloaded - some people's natural pattern is just a couple of hours in the evening !!

Re: fahclient 7.6.x ignore <power value="*"/>

Posted: Wed May 20, 2020 1:42 pm
by PeterGarlic
Neil-B wrote:Actually for slower systems that wouldn't complete a WU within timeout unless left on 24/7 (and this does happen in GPU land) the QRB does in effect encourage 24/7, hence why I mentioned it ... Even with a 4hr WU there is an encouragement from the QRB to fold for at least 4hrs after the WU is downloaded - some people's natural pattern is just a couple of hours in the evening !!
A 4 vCPU VM is not the target; we still have to find the best configuration for our VMs, and other people in my company are working on different branches (monitoring, automation, etc.) until they meet the minimum requirements.

The 4 core VM test was created just to test the client and check for possible issues (for example, we found the problem with power control and that FAHControl uses 1, 3, 4 CPUs on light, medium, full power).

At the moment a few clients are running in test mode while waiting for the other teams, but we will grow :wink: