Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Tue May 12, 2020 3:59 pm
by Neil-B
1:
The way a Project progresses is (in my very simple terms) that a Project starts off with a batch of WUs defined by "Run" and "Clone" which are basically variants … The WUs are generated and sent out by the Work Server (WS) with a "Gen" counter of "0" … You will see this referred to as the PRCG which allows every WU in a Project to be tracked … For various reasons unique PRCGs are only issued to one folder at a time (with a few exceptions) and the folder folds that and returns it … The results of that PRCG WU allow a new WU to be generated for folding which will be the Gen 1 of the PRC triple … when that comes back the next WU in the sequence will be generated from the results and be Gen 2, so the project relies on WUs being returned as quickly as possible to allow each generation to be prepared and sent out.
Each Project has two dates - a Timeout and a Deadline - If a WU is not returned to the WS before the Timeout it is issued again (in case the original has "got lost") and another folder gets it even though it might still be being worked on by the original folder. Whoever returns it first allows the PRC to continue with the next Gen being prepared from the results of the previous … If a WU is not returned by the Deadline it is treated as "no longer existing" - The server wouldn't accept it back after this time and actually the client will dump any WU that exceeds its Deadline.
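As a rough illustration of the PRCG chain described above, here is a minimal Python sketch. The class name, the helper names and the project number are made up for illustration; it only mirrors the bookkeeping and says nothing about how the Work Server is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PRCG:
    """Identifier of one work unit: Project, Run, Clone, Gen."""
    project: int
    run: int
    clone: int
    gen: int = 0

    def next_gen(self) -> "PRCG":
        # The next WU of a Run/Clone pair is built from the returned results
        # of the current one, so only the Gen counter advances.
        return PRCG(self.project, self.run, self.clone, self.gen + 1)


def initial_batch(project: int, runs: int, clones: int) -> list[PRCG]:
    # A project starts with one Gen 0 WU per (Run, Clone) pair.
    return [PRCG(project, r, c) for r in range(runs) for c in range(clones)]


# Hypothetical project number, purely for illustration.
batch = initial_batch(project=12345, runs=2, clones=3)
first = batch[0]
second = first.next_gen()
print(first, "->", second)
```

Because each Gen depends on the previous one, a slow return of `first` delays every later Gen of that Run/Clone pair.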
The Quick Return Bonus points system is an "incentive" for folders to complete WUs asap and get them back before Timeout as they get more points that way (personally I don't like that "game" - but it works) … What it is trying to do is incentivise quicker returns as these allow the next Gens of WUs to be created.
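The Timeout and Deadline rules above can be sketched like this. It is only a simplified model of the description in this post: the function name, the use of days as the unit and the returned labels are assumptions for illustration, and the bonus points calculation is deliberately left out.

```python
def classify_return(elapsed_days: float, timeout_days: float, deadline_days: float) -> str:
    """Describe what happens to a WU returned after `elapsed_days`."""
    if timeout_days > deadline_days:
        raise ValueError("Timeout is expected to come before the Deadline")
    if elapsed_days <= timeout_days:
        # Returned in time: the next Gen can be prepared from this result.
        return "returned before Timeout"
    if elapsed_days <= deadline_days:
        # The WU has been reissued to another folder in the meantime;
        # whoever returns it first lets the next Gen be prepared.
        return "returned after Timeout, WU already reissued"
    # Past the Deadline the server no longer accepts the result
    # and the client dumps the WU.
    return "expired, WU dumped"


print(classify_return(1.0, timeout_days=2.0, deadline_days=5.0))
print(classify_return(3.0, timeout_days=2.0, deadline_days=5.0))
print(classify_return(6.0, timeout_days=2.0, deadline_days=5.0))
```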
2:
I believe so … if you look at
https://apps.foldingathome.org/wu#proje ... ne=2&gen=0 you will see the third field is the CPUID … This relates to a processing slot and can help track stats and setups amongst other things … Someone might use the same username, passkey and team information on say 20 machines each having 1 CPU slot and say 2 GPU slots … This would mean that in a stable environment (where slots aren't being deleted/reset/clients reinstalled) there would be some 60 CPUIDs being tracked for that user/Passkey/team triple … this CPUID is 16 digits so may only be part of the 20 digits you are looking at - or may be in no way related, but I mention it just in case
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Wed May 13, 2020 7:32 am
by PantherX
PeterGarlic wrote:...Single slot assignment
The first way to configure is to use each core/CPU as a slot – not the best approach because each configured slot will be used at 100% and each one will be slower than the sum of available processing power. Maybe it can be useful for some special configurations (for example, fewer slots can be assigned than the available CPUs) but the power control is unusable and I'm not sure about the activity of the OS scheduler...
As explained above, this would be the worst setup for F@H and is discouraged by the Quick Return Bonus points system. In F@H, we value the quickest return of the WU. That is scientifically more valuable.
PeterGarlic wrote:...Grouping CPU in one slot:
With this configuration 4 CPU will be available and the power control (thanks PantherX) will use:
light: 1 core
medium: 3 cores
full: 4 cores...
This is the best method but it would be:
Light: 2 CPUs
Medium: 3 CPUs
Full: 4 CPUs
The above assumes that you do not have a GPU Slot. If you have GPU Slots, then subtract the number of GPU Slots from the values above.
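A small sketch of the mapping above, assuming Light means half of the CPUs (which matches 2 of 4 here). The function name, the rounding down for odd CPU counts and the clamping at zero are my assumptions, not documented client behaviour.

```python
def cpus_for_power(power: str, total_cpus: int, gpu_slots: int = 0) -> int:
    """Estimate how many CPUs a single CPU slot uses for a power level."""
    if power == "light":
        cpus = total_cpus // 2   # assumption: half, rounded down
    elif power == "medium":
        cpus = total_cpus - 1    # one CPU less than all
    elif power == "full":
        cpus = total_cpus        # all CPUs
    else:
        raise ValueError(f"unknown power level: {power!r}")
    # Subtract one CPU per GPU slot, as described above
    # (assumption: never report fewer than zero).
    return max(cpus - gpu_slots, 0)


for level in ("light", "medium", "full"):
    print(level, cpus_for_power(level, total_cpus=4))
```

With one GPU slot on the same 4-CPU machine, `cpus_for_power("full", 4, gpu_slots=1)` gives 3.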
PeterGarlic wrote:...Grouping CPU in more slots
With this configuration 2 groups of 2 CPU will be available and the power control will use:
light: 1 core
medium: 1 core
full: 2 cores...
This again is bad for science and not recommended for this setup. However, setting up multiple CPU Slots can be recommended when you have systems with more than 32 CPUs. In that case, you can run a 32-CPU Slot and a 24-CPU Slot on a system with 56 CPUs.
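To illustrate the 56-CPU example, here is a hedged Python sketch that splits a CPU count into slots of at most 32 and prints slot entries in the same style as the config snippets posted in this thread. The function names and the simple "fill 32 first" strategy are assumptions for illustration; other splits may suit a given system better.

```python
def split_cpus(total_cpus: int, max_per_slot: int = 32) -> list[int]:
    """Split a CPU count into slot sizes of at most `max_per_slot`."""
    sizes = []
    remaining = total_cpus
    while remaining > 0:
        size = min(remaining, max_per_slot)
        sizes.append(size)
        remaining -= size
    return sizes


def render_slots(sizes: list[int]) -> str:
    """Render slot entries; slot ids start at 0 and increment by 1."""
    lines = []
    for slot_id, cpus in enumerate(sizes):
        lines.append(f"<slot id='{slot_id}' type='CPU'>")
        lines.append(f"  <cpus v='{cpus}'/>")
        lines.append("</slot>")
    return "\n".join(lines)


sizes = split_cpus(56)
print(sizes)
print(render_slots(sizes))
```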
PeterGarlic wrote:...
Setting -1 in FAHControl
If you set -1 as CPU slot the system goes in automatic mode and your slot configuration will be something like:
Code:
<!-- Folding Slots -->
<slot id='12345678901234567890' type='CPU'/>
where the slot is a “pseudo-random” 20-digit number assigned to the user (?)/team (?) but is always the same for all the instances running with the same account. Once again you will have full control of power with the same assignment seen in "Grouping CPU in one slot".
Note: I'm not sure about that but our lab VMs are working in this way (does anybody have an explanation for that?)...
The "-1" setting would yield a very similar result to "Grouping CPU in one slot"; however, the Slot ID is incorrect. A Slot ID always starts at 0 and increments by 1. As explained above, the client, upon first starting up, connects to the Assignment Server to be assigned an assignment ID which should be unique to each client. If you are cloning VMs, make sure that the Gold copy which contains the client was never connected to the internet when you installed and configured the client.
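If it helps, the "starts at 0 and increments by 1" rule can be checked against a config with a few lines of Python. This is only a sketch: the function name is made up, and the `<config>` root element in the samples is an assumption about the surrounding file.

```python
import xml.etree.ElementTree as ET


def slot_ids_are_sequential(config_xml: str) -> bool:
    """True if the slot ids in document order are 0, 1, 2, ..."""
    root = ET.fromstring(config_xml)
    ids = [slot.get("id") for slot in root.iter("slot")]
    return ids == [str(i) for i in range(len(ids))]


# Sample fragments; the <config> wrapper is assumed for illustration.
good = "<config><slot id='0' type='CPU'/><slot id='1' type='GPU'/></config>"
bad = "<config><slot id='12345678901234567890' type='CPU'/></config>"

print(slot_ids_are_sequential(good))
print(slot_ids_are_sequential(bad))
```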
PeterGarlic wrote:...All right or I have to study more?
It depends: what are you trying to achieve? We may have information to help you speed up or verify what we have said.
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Wed May 13, 2020 10:39 am
by PeterGarlic
Thanks @PantherX and @Neil-b,
your explanations are really interesting and help me to understand a lot of technical details.
But I think I have to study more: Yesterday I changed the configuration (sequence: client paused, client stop, config update, client start, client un-paused) on one of the running fahclient VMs (4 vCPU test) from:
Code:
<!-- Folding Slots -->
<slot id='12345678901234567890' type='CPU'/>
(Note: this is not my real id code, just a number that I used instead of my real id)
to
Code:
<!-- Folding Slots -->
<slot id='0' type='CPU'>
<cpus v='4'/>
</slot>
Then I set power to "full" and waited for the end of the WU in progress.
When the new WU arrived it started using 4 CPUs at 100%, but I lost control of power from FAHControl: now whether the value is set to light, medium or full, the changes are not applied and all the CPUs are always at 100%.
Does this mean that the power control is available only when I use -1 as CPU (and I receive a slot id), or do I still have something to understand?
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Wed May 13, 2020 10:59 am
by PeterGarlic
Neil-B wrote:1:
2:
I believe so … if you look at
https://apps.foldingathome.org/wu#proje ... ne=2&gen=0 you will see the third field is the CPUID … This relates to a processing slot and can help track stats and setups amongst other things … Someone might use the same username, passkey and team information on say 20 machines each having 1 CPU slot and say 2 GPU slots … This would mean that in a stable environment (where slots aren't being deleted/reset/clients reinstalled) there would be some 60 CPUIDs being tracked for that user/Passkey/team triple … this CPUID is 16 digits so may only be part of the 20 digits you are looking at - or may be in no way related, but I mention it just in case
Hi Neil-b
some additional questions on "slot id":
- Is it possible to clone a VM instance and then request a new id for that VM?
- In our company we are using a proxy and we exit through a firewall with NAT. Can this influence the id the client receives?
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Wed May 13, 2020 11:26 am
by Neil-B
If you install the software into a VM without an internet connection it will not connect and so no id is created ... if that VM is cloned and the individual VMs stood up from it are allowed to connect to the internet themselves this resolves the id issue (I think - someone more technical than me will need to confirm).
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Wed May 13, 2020 3:33 pm
by ajm
A few thoughts. With a 4-CPU system, the overhead will probably always result in near 100% utilization. I don't have experience with so few threads; my smallest processor is a 4C/8T running on 6 threads (at 100%). In fact, whenever I use about 2/3 to 3/4 of the available threads, the overall utilization is close to or at the maximum. I see more nuance only on a 32C/64T, but there too, above 48 threads, the overall utilization is stuck at 95-100%, and adding more CPUs to FAH will essentially just raise the temperature.
If this is a problem, it would be better to define slots of 12, 24 or 32 CPUs in systems with respectively 16, 28 and 36 available threads. It would probably also yield more points (= more and better science) because the results would be delivered faster to FAH's servers. But above all, it would be more flexible to use because of that speed - if the hardware is needed for something else, you'll always find several running WUs that can be finished within an hour or just minutes, then the VM can be saved and shut down. No worries.
With 4-CPU systems, not to mention running only 1 or 2 CPUs, WUs will last for days. Of course, you can pause them and restore them later too, if you need the hardware, but you would always be close to the maximum admissible duration of all WUs and it would be a nightmare (or a solid piece of programming work) keeping track of hundreds of such VMs with their durations and deadlines. With 36T systems, you would have far fewer of them and you could play with their availability much more easily, without much concern about deadlines, all the more because stopping only one or two of those would already provide a fair amount of computing power. Two 36T systems stopped give you 72T to work with. With 4T systems, you would have to manage 18 VMs to get there. Even if you automate the whole thing (which will take time), you'll probably end up with endless issues to spot and resolve.
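The arithmetic behind that last comparison, as a tiny Python sketch (the function name is only illustrative):

```python
def vms_to_stop(threads_needed: int, threads_per_vm: int) -> int:
    """How many VMs must be stopped to free at least `threads_needed` threads."""
    # Integer ceiling division: a partly needed VM still has to be stopped.
    return (threads_needed + threads_per_vm - 1) // threads_per_vm


print(vms_to_stop(72, 36))  # large VMs
print(vms_to_stop(72, 4))   # small VMs
```

The same 72 threads cost 2 stopped VMs in one layout and 18 in the other, which is the management overhead being pointed out.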
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 3:54 am
by PantherX
PeterGarlic wrote:...When the new WU arrived it started using 4 CPUs at 100%, but I lost control of power from FAHControl: now whether the value is set to light, medium or full, the changes are not applied and all the CPUs are always at 100%.
Does this mean that the power control is available only when I use -1 as CPU (and I receive a slot id), or do I still have something to understand?
I am not sure... you may need to check the log file to see if you have any of these as changing the power slider would trigger a new config.xml file being written:
<power v='light'/>
<power v='medium'/>
<power v='full'/>
I would be keen to see the log file of that test. It could be that -1 is required to automatically change the CPU values... I just haven't tested it.
Also, there's an unofficial VMware Appliance for you to test it out:
https://flings.vmware.com/vmware-applia ... lding-home
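For the config check suggested above, something like the following Python sketch could pull the relevant values out of a config.xml after moving the slider. The function name and the `<config>` wrapper in the sample are assumptions; it only reads what is in the file and does not tell you how the client interprets it.

```python
import xml.etree.ElementTree as ET


def summarize_config(config_xml: str) -> dict:
    """Collect the power level and any explicit per-slot cpus values."""
    root = ET.fromstring(config_xml)
    power = root.find(".//power")
    cpus = [c.get("v") for c in root.iter("cpus")]
    return {
        "power": power.get("v") if power is not None else None,
        "cpus": cpus,
    }


# Sample content; the <config> wrapper is assumed for illustration.
sample = (
    "<config>"
    "<power v='light'/>"
    "<slot id='0' type='CPU'><cpus v='4'/></slot>"
    "</config>"
)
print(summarize_config(sample))
```

Comparing the output before and after changing the slider would show whether the power value is being rewritten while an explicit cpus value stays fixed.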
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 3:57 am
by PantherX
Neil-B wrote:If you install the software into a VM without an internet connection it will not connect and so no id is created ... if that VM is cloned and the individual VMs stood up from it are allowed to connect to the internet themselves this resolves the id issue (I think - someone more technical than me will need to confirm).
That's correct. You can install the client without any network connectivity and when you start the client with network connectivity, it will ask for an assignment/User ID from the Assignment Server.
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 4:31 am
by bruce
bruce wrote:FAH doesn't specify which CPU it will use, only how much ... it's up to the scheduler in the OS to pick the available CPU. From the perspective of the program, 100% of 1 CPU is the same as 50% of two CPUs, or 33% of three CPUs.
I think you misunderstand the concept of power=light.
FAH has absolutely no ability to decide which CPU will be used. If we are talking about a physical CPU that processes 4 threads, and you have configured your system (or the WU) to use one CPU, FAH will ask the operating system for 25% of its resources. The task scheduler in the OS decides which thread to use.
Moreover, (depending on your OS) the task scheduler will probably do its best to avoid moving a task from one thread to another simply because it's more efficient NOT to move it. (Doing so disrupts the function of the CPU's cache.) People who want the fastest processing do manually set "affinity" to maximize the efficiency of the cache, but a good job scheduler will often do that for you. Localized heating shouldn't be your concern.
By the way, this whole concept means nothing if you're running in a VM environment. Each virtual CPU is mapped onto some real CPU or some combination of CPU resources. Virtual CPUs don't experience localized heating and the real CPU Cache isn't directly mapped onto a virtual cache.
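The percentages above are just the requested CPU count divided by the pool it is spread over. A minimal Python sketch (the function name is illustrative only):

```python
from fractions import Fraction


def share_of_pool(cpus_requested: int, cpus_in_pool: int) -> Fraction:
    """Fraction of the pool's total capacity that the request represents."""
    return Fraction(cpus_requested, cpus_in_pool)


# One CPU's worth of work on a 4-thread processor: a quarter of its resources.
print(float(share_of_pool(1, 4)) * 100)
# The same one CPU's worth of work seen as a load spread over 2 or 3 CPUs.
print(float(share_of_pool(1, 2)) * 100)
print(round(float(share_of_pool(1, 3)) * 100))
```

Which physical thread ends up doing the work is left to the OS scheduler; the sketch only covers the "how much" part.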
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 6:58 am
by PeterGarlic
PantherX wrote:Neil-B wrote:If you install the software into a VM without an internet connection it will not connect and so no id is created ... if that VM is cloned and the individual VMs stood up from it are allowed to connect to the internet themselves this resolves the id issue (I think - someone more technical than me will need to confirm).
That's correct. You can install the client without any network connectivity and when you start the client with network connectivity, it will ask for an assignment/User ID from the Assignment Server.
This is a good suggestion: I'm going to send it directly to the automation team.
What about the other 2 open points on id assignment?
- Is it possible to clone a VM instance and then request a new id for that VM?
(maybe removing and reinstalling fahclient or with other operations...)
- In our company we are using a proxy and we exit through a firewall with NAT.
Can this influence the id the client receives?
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 7:02 am
by PeterGarlic
PantherX wrote:PeterGarlic wrote:...When the new WU arrived it started using 4 CPUs at 100%, but I lost control of power from FAHControl: now whether the value is set to light, medium or full, the changes are not applied and all the CPUs are always at 100%.
Does this mean that the power control is available only when I use -1 as CPU (and I receive a slot id), or do I still have something to understand?
I am not sure... you may need to check the log file to see if you have any of these as changing the power slider would trigger a new config.xml file being written:
<power v='light'/>
<power v='medium'/>
<power v='full'/>
I would be keen to see the log file of that test. It could be that -1 is required to automatically change the CPU values... I just haven't tested it.
Also, there's an unofficial VMware Appliance for you to test it out:
https://flings.vmware.com/vmware-applia ... lding-home
I'm going to reproduce, log and document each step. I will reply on that later.
I saw the VMware appliance, but we don't use VMware and the deployment steps seem to be specific to that platform.
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 7:09 am
by PantherX
PeterGarlic wrote:...Is it possible to clone a VM instance and then request a new id for that VM?
(maybe removing and reinstalling fahclient or with other operations...)...
If you mean cloning an already running VM, then yes, it does require a new ID. You can definitely uninstall (including the option to delete data) and then perform a fresh installation.
PeterGarlic wrote:...In our company we are using a proxy and we exit through a firewall with NAT.
Can this influence the id the client receives?
AFAIK, the client does support proxy configuration so if you do it like that, it should work. If it doesn't, it is either a bug or a candidate for an enhancement. Please keep us posted on this.
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 7:10 am
by PantherX
PeterGarlic wrote:... saw the VMware appliance, but we don't use VMware and the deployment steps seem to be specific to that platform
Humm... if you're using Docker, have a look at the official package:
https://github.com/FoldingAtHome/containers
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 7:12 am
by PeterGarlic
bruce wrote:bruce wrote:FAH doesn't specify which CPU it will use, only how much ... it's up to the scheduler in the OS to pick the available CPU. From the perspective of the program, 100% of 1 CPU is the same as 50% of two CPUs, or 33% of three CPUs.
I think you misunderstand the concept of power=light.
FAH has absolutely no ability to decide which CPU will be used. If we are talking about a physical CPU that processes 4 threads, and you have configured your system (or the WU) to use one CPU, FAH will ask the operating system for 25% of its resources. The task scheduler in the OS decides which thread to use.
Moreover, (depending on your OS) the task scheduler will probably do its best to avoid moving a task from one thread to another simply because it's more efficient NOT to move it. (Doing so disrupts the function of the CPU's cache.) People who want the fastest processing do manually set "affinity" to maximize the efficiency of the cache, but a good job scheduler will often do that for you. Localized heating shouldn't be your concern.
By the way, this whole concept means nothing if you're running in a VM environment. Each virtual CPU is mapped onto some real CPU or some combination of CPU resources. Virtual CPUs don't experience localized heating and the real CPU Cache isn't directly mapped onto a virtual cache.
Thanks for the insight, but I never intended to set CPU affinity (I only discussed it) and the concept of power is absolutely clear.
As PantherX wrote at the beginning of this post:
"This is what the power values mean (assuming no supported GPU is present)
Light -> Uses 50% of your CPUs
Medium -> Uses 1 CPU less than all your CPUs
Full -> Uses all your CPUs"
And this is perfect: I just have to find a way to make it work as expected, because our idea is to have dynamic resource allocation and the power control can be essential for one part of this automation process. (I will explain the concept better in the next reply)
Re: fahclient 7.6.x ignore <power value="*"/>
Posted: Thu May 14, 2020 7:28 am
by Neil-B
One thing to consider depending upon how broad your thoughts of "dynamic" are ... especially if you have any thoughts as to dynamically resizing (from a vCPU perspective) VMs during the course of folding a WU ... and that is that whilst it is possible to reduce the number of threads/cores/vCPUs working on a WU during folding it is not possible to increase this above the level it was set at when downloading.
A WU downloaded to work on a 6-thread slot can be reduced to 4 but not increased to 12 ... The VM could be, but FAH would only use 6 until the next WU is downloaded.
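Put as a rule of thumb in Python (a sketch of the behaviour described above, with a made-up function name, not of anything in the client itself):

```python
def effective_threads(threads_at_download: int, threads_requested: int) -> int:
    """Threads a WU can actually use after the slot has been resized."""
    # Reducing works; anything above the download-time count is capped
    # until the next WU is downloaded.
    return min(threads_at_download, threads_requested)


print(effective_threads(6, 4))
print(effective_threads(6, 12))
```

So resizing a VM upwards mid-WU only pays off from the next download onwards.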