Algorithm for distributing Work Units, statistical model
Posted: Sat May 02, 2020 9:23 am
Hi,
I haven't found much published information on how work units are distributed, but I have a few ideas and possible improvements.
I would imagine that work units are dished out across all projects that have available work.
Could we decide the priority ourselves, and whenever there is nothing left in our preferences, pick the next best work unit from any other category? Maybe this could be an optional checkbox feature.
Wasting the computational resources at your disposal obviously slows progress, especially when there are work units in other major projects that one is not currently supporting; one would still like to determine the specific ordering of supported projects.
Imagine having COVID-19 at the top of the list.
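A minimal sketch of that preference-plus-fallback selection, in Python; the data structures and function names here are assumptions for illustration, not how the assignment servers actually work:

```python
# Hypothetical sketch: honour the donor's project priorities, then optionally
# fall back to any other category rather than leave the machine idle.
from collections import deque

def pick_work_unit(preferences, available_work, allow_fallback=True):
    """preferences: project categories in the donor's priority order.
    available_work: category -> deque of pending work units (assumed layout)."""
    # First pass: respect the explicit priority ordering.
    for category in preferences:
        queue = available_work.get(category)
        if queue:
            return queue.popleft()
    # Second pass: the optional "checkbox" behaviour - take the next best WU anywhere.
    if allow_fallback:
        for queue in available_work.values():
            if queue:
                return queue.popleft()
    return None  # nothing matched; the client idles

# Example: COVID-19 first, then cancer, then whatever else has work.
available_work = {"covid-19": deque(), "cancer": deque(), "other": deque(["WU-9"])}
print(pick_work_unit(["covid-19", "cancer"], available_work))  # -> "WU-9" via fallback
```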
Could someone also explain how project progress is shown on the project stats page?
Then I guess the other thing (I don't know the details of how work units are assigned) is that machines with six-sigma reliability of returning completed work, at the highest computation rates, should be assigned the high-priority workloads first, followed by the slower ones. Ideally you want to minimize the finishing time of all the workloads: you don't want one client holding a workload that takes an additional 8 days to complete. If faster clients can statistically work off a set of work units in the expected time, say 8 work units in 4 days, then there is no point giving a work unit to a client that would take 8 days; that just wastes computation and slows down the rate at which the project is completed.
So to be clear, the WU rate metric is work units per unit time (in seconds), not clock speed or anything else.
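As a rough sketch of that cutoff, using the WU/second rate just defined; the 4-day batch expectation and the slack factor are made-up numbers:

```python
# Hypothetical check: don't hand a WU to a client whose expected completion
# time would drag the batch well past what faster clients achieve statistically.
def should_assign(client_wu_rate, batch_expected_seconds, slack=1.5):
    """client_wu_rate: this client's historical rate in WU per second.
    batch_expected_seconds: time in which the faster clients are expected to
    finish the remaining batch (e.g. 8 WUs in 4 days).
    slack: tolerated overshoot factor (assumed)."""
    eta_seconds = 1.0 / client_wu_rate  # time for this client to return one WU
    return eta_seconds <= slack * batch_expected_seconds

# A client that needs 8 days per WU vs. a batch expected to wrap up in 4 days:
print(should_assign(client_wu_rate=1 / (8 * 86400),
                    batch_expected_seconds=4 * 86400))  # -> False, keep it for faster clients
```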
I guess you would have an optimization algorithm running over the permutations of the set of all clients and configurations, which can then be simplified into common aggregation buckets keyed on WU statistics: reliability (sigma), work rate, work-rate reliability, and work-rate min/max/avg/std. All available work units are then assigned across the different work-rate buckets, or at least given a slowest-acceptable work-rate bucket, since anything slower would just waste resources and create additional waiting time, as explained above. Every time new projects are added, this algorithm has to be re-run to update the distribution of project units across the work-rate buckets used for assignment when WUs are requested. Basically, match resources to work based on how reliable and efficient they are, so that the project completes in the least amount of time.
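A sketch of what those aggregation buckets might look like; the bucket names, thresholds, and per-project minimums below are assumptions, just to make the idea concrete:

```python
# Hypothetical client buckets keyed on reliability and average WU rate, plus a
# minimum acceptable bucket per project so slow machines never gate completion.
from dataclasses import dataclass

DAY = 86400  # seconds

@dataclass
class ClientStats:
    wu_rate_avg: float   # WU per second, historical average
    wu_rate_std: float
    reliability: float   # fraction of assigned WUs returned in time

def bucket_for(s: ClientStats) -> str:
    """Collapse detailed client statistics into a coarse bucket label."""
    if s.reliability < 0.95:
        return "unreliable"
    if s.wu_rate_avg >= 1 / (1 * DAY):   # finishes a WU within a day
        return "fast"
    if s.wu_rate_avg >= 1 / (4 * DAY):   # within four days
        return "medium"
    return "slow"

BUCKET_ORDER = ["fast", "medium", "slow", "unreliable"]
MIN_BUCKET = {"covid-19-high-priority": "fast", "background-project": "slow"}

def eligible(project: str, s: ClientStats) -> bool:
    """Re-computed whenever projects are added or client stats shift buckets."""
    return BUCKET_ORDER.index(bucket_for(s)) <= BUCKET_ORDER.index(MIN_BUCKET[project])

print(eligible("covid-19-high-priority", ClientStats(1 / (2 * DAY), 0.1 / DAY, 0.99)))  # -> False
```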
Obviously, you would only move clients between the aggregation buckets at a slower rate, or incrementally, promoting and demoting them as required.
To take this further, each client should report the typical maximum, average, and minimum power consumption of its chip, averaged across the cores in use, and that is also taken into account for the aggregation buckets. You can then optimize along two dimensions, maximum horsepower and the measured power consumption of the network, or strike a balance between the two at the point where the curve flattens out; the less CO2 emitted, the more you are saving the world.
To improve this further, ideally the client on the machine should be able to run the computations while PWM-ing idle cycles (or similar) to reduce power consumption, and measure the system at different loads at a granularity of 1%. Those measurements would then be submitted and maintained by the client, which also re-benchmarks/re-samples randomly from time to time.
With this information about your client machines and their WU rate per unit of power consumption, you can then basically find the sweet spot that balances the power and performance of the network.
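A small sketch of how that sweet spot could be picked from the 1%-granularity load measurements described above; the elasticity threshold and sample numbers are made up:

```python
# Hypothetical "knee of the curve" search: keep raising the load while a
# relative increase in power still buys a comparable relative increase in WU rate.
def sweet_spot(samples, min_elasticity=0.5):
    """samples: (load_percent, wu_rate, avg_watts) tuples sorted by load,
    as reported by the client's self-benchmark. Returns the chosen sample."""
    best = samples[0]
    for prev, cur in zip(samples, samples[1:]):
        d_rate = (cur[1] - prev[1]) / prev[1]    # relative WU-rate gain
        d_watts = (cur[2] - prev[2]) / prev[2]   # relative power increase
        if d_watts <= 0 or d_rate / d_watts < min_elasticity:
            break                                # curve has flattened out
        best = cur
    return best

# Made-up measurements: returns diminish past ~70% load on this machine.
samples = [(50, 1.00e-5, 60), (70, 1.30e-5, 75), (90, 1.35e-5, 110), (100, 1.36e-5, 130)]
print(sweet_spot(samples))  # -> (70, 1.3e-05, 75)
```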
In the long run, this naturally results in old, slow, power-inefficient clients no longer getting work, as they can no longer meet the baseline WU-rate/power-consumption ratio, which means they would just be doing environmental damage.
If you would like to take this further, you could record the geographic location of each client and statistics, as detailed as you can get them, about its power source. As that information (obtained from the national power operators) changes, for example with wind availability, you can re-run that section of the WU distribution across the aggregated client buckets based on how clean their power is.
You could also associate each client with the electricity price in its country, which would give you some idea of which clients are cheapest to run, balancing between cost and CO2.
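A sketch of folding those two signals into a single client score; the weighting, scaling, and field names are assumptions, with the carbon-intensity figure coming from the grid-operator data mentioned above:

```python
# Hypothetical score blending work-per-kWh with grid carbon intensity and price.
def client_score(wu_rate, avg_watts, g_co2_per_kwh, price_per_kwh, co2_weight=0.7):
    """Higher is better. co2_weight sets the CO2-vs-cost balance (assumed);
    the price term is scaled so both penalties land in a comparable range."""
    work_per_kwh = wu_rate / avg_watts * 3.6e6   # WU produced per kWh of electricity
    penalty = co2_weight * g_co2_per_kwh + (1 - co2_weight) * price_per_kwh * 1000
    return work_per_kwh / penalty

# Identical hardware on a clean-but-pricey grid vs. a cheap-but-dirty one:
print(client_score(1e-5, 100, g_co2_per_kwh=50,  price_per_kwh=0.30))   # higher
print(client_score(1e-5, 100, g_co2_per_kwh=600, price_per_kwh=0.10))   # lower
```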
You could then integrate this with OpenADR and PLMA utility demand-response programs at a high level. That would partially be the start of everything being run remotely, though under a different set of constraints; later on, everything will take into account the topology of the grid network, periodic/deadline constraints, time availability, and usage patterns, and it will all be run in the cloud. These would be possible first attempts, given the pace at which things are going to have to move for EVs and the rest of the distributed feedback network, on both the utility and distributed-power sides.
The other option is for the client to indicate that it is off-grid and on solar, in which case you probably don't care, as long as it runs off solar permanently.
Eventually, this turns into a nice little AI problem.