Is there a Multiple Regression model for estimated time per step?
If not, can we make one?
I was wondering which variables I need to collect data on.
So far, I have:
CPU FSB, CPU Speed, CPU type, Memory Speed, Single/Dual Channel, Project No., Operating System.
Please let me know what I should add.
I did not include the Run-clone-gen values because as long as they have the same Project No., their time difference is very small in my experience.
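As a rough illustration of the variable list above, here is a minimal Python sketch of how one observation could be recorded and turned into numeric predictors. The class name, field names, units, and sample values are all hypothetical. Categorical variables (CPU type, Project No., Operating System) are dummy-coded with the first sorted level as the baseline, which is one common convention and not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One measured work unit; every field name here is an assumption."""
    cpu_fsb_mhz: float
    cpu_speed_mhz: float
    cpu_type: str
    memory_speed_mhz: float
    dual_channel: bool
    project_no: int
    operating_system: str
    seconds_per_step: float  # the response variable


def dummy_columns(values):
    """Dummy-code a categorical variable, dropping the first sorted level."""
    levels = sorted(set(values))[1:]
    return levels, [[1.0 if v == lvl else 0.0 for lvl in levels] for v in values]


def encode(observations):
    """Return (header, rows) of numeric predictors for a regression fit."""
    cpu_lv, cpu = dummy_columns([o.cpu_type for o in observations])
    prj_lv, prj = dummy_columns([o.project_no for o in observations])
    os_lv, osc = dummy_columns([o.operating_system for o in observations])
    header = (["cpu_fsb_mhz", "cpu_speed_mhz", "memory_speed_mhz", "dual_channel"]
              + [f"cpu_type={l}" for l in cpu_lv]
              + [f"project_no={l}" for l in prj_lv]
              + [f"operating_system={l}" for l in os_lv])
    rows = []
    for i, o in enumerate(observations):
        numeric = [o.cpu_fsb_mhz, o.cpu_speed_mhz, o.memory_speed_mhz,
                   float(o.dual_channel)]
        rows.append(numeric + cpu[i] + prj[i] + osc[i])
    return header, rows


# Made-up sample data, for illustration only.
sample = [
    Observation(1066, 2400, "Q6600", 800, True, 1001, "Windows XP", 95.0),
    Observation(1066, 2400, "E6600", 667, False, 1002, "Linux", 140.0),
    Observation(1333, 2660, "Q6600", 800, True, 1002, "Windows XP", 88.0),
]
header, rows = encode(sample)
```

Treating Project No. as a category rather than a number matters here, because the number is only a label and its magnitude carries no meaning.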
A regression model for estimated Time per Step?
-
- Posts: 35
- Joined: Wed Dec 05, 2007 8:53 pm
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: A regression model for estimated Time per Step?
Modeling breaks down unless you can account for how many hours per day the processor is folding, or unless you normalize the data to assume the full 24 hours. You also have to track whether the processor is folding at 100% CPU usage or at slightly less, say 85%, so that my computer doesn't lag when I try to use it.
Also, are you only tracking CPU clients, or SMP clients as well? If you are tracking SMP clients, then the number of processors has to be tracked too. And Cache Size. And...
You also have to track more than just the project number. Gromacs processes at a different speed than DGromacs because of SSE2.
If you still want to dive in, then dive in head first, and do it. Don't just talk about it. Start collecting data. If you aren't tracking enough predictive variables, the data will tell you that.
Or just go visit fahinfo.org. The site already has charts and data for most of what you want.
One last question: what do you accomplish by being able to predict the time per step? Wouldn't predicting PPD be a better goal, as that is what most people want to know? Isn't that how most people compare performance? Not time per step.
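To make "the data will tell you that" concrete, here is a minimal pure-Python sketch of an ordinary least squares fit with an R² check. Everything in it is an assumption for illustration: the predictors (clock speed in GHz, core count, dual-channel flag), the response (hours per work unit), and the numbers themselves, which are synthetic and generated from a known formula so the fit can be checked. With real measurements, a low R² would suggest that important predictors are still missing.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y ~ intercept + predictors via the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Intercept plus the weighted sum of the predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(coefs, rows, y):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic predictors: (cpu_ghz, cores, dual_channel). Not real measurements.
rows = [(2.4, 2, 0), (2.4, 4, 1), (3.0, 2, 1),
        (3.0, 4, 0), (2.66, 4, 1), (2.66, 2, 0)]
# Synthetic response built from a known linear rule, so the fit is checkable.
y = [40.0 - 8.0 * ghz - 3.0 * cores - 2.0 * dual for ghz, cores, dual in rows]

coefs = fit_ols(rows, y)
r2 = r_squared(coefs, rows, y)
print("coefficients:", [round(c, 3) for c in coefs])
print("R^2:", round(r2, 6))
```

Because the synthetic response is exactly linear, the fit recovers the generating coefficients (approximately 40, -8, -3, -2) and R² is essentially 1. Real folding data will be noisier, and a library such as NumPy or statsmodels would be a more practical choice than hand-rolled elimination.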
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: A regression model for estimated Time per Step?
A couple of very important factors are the size and organization of cache. I'm not sure how you're going to account for that.
The size issue is probably obvious; the organization issue is not so obvious. Suppose you have a dual Core2Duo machine (or a Q6600). An SMP project that starts four copies of FahCore_a1 will need tight communications between all four copies of the FahCore, but invariably this will be unequal. Since two CPUs share one cache and the other two CPUs share a different cache, there will be a significant difference in overhead (and time) depending on which copy of the FahCore is nearest to which other copy. Affinity Changer addresses a portion of this issue, but not all.
AMD organizes their cache quite differently than Intel, and this will make a significant difference, too.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: A regression model for estimated Time per Step?
I guess this will be used mainly for 24/7 folders, since I assume that is where the interest will be.
Accounting for L2 cache and so on is a lot of work, so wouldn't it be a better idea to just track the processor type, since most 24/7 folders use the Q6600, Q9450 and the like?
SMP clients will be tracked for now; if this works out, I can track the GPU2 client after that.
Of course the final goal is PPD; getting the time per step will let me achieve that.
One question: do the same project numbers give the same amount of points?
For now, I am just going to start tracking a couple of machines.
Re: A regression model for estimated Time per Step?
chungenhung wrote: ...
Of course the final goal is PPD, getting the time per step will let me achieve that.
One question, do the same project numbers give the same amount of points?
Time per how many steps? Per how many steps in total? Not all work units have 100 steps, and not all work units count steps 1 at a time.
Yes, in general, the same project is worth the same points. However, in the past, project numbers have been re-used and given different point totals. And while VERY rare, Stanford does adjust the points if the project benchmark is shown to be significantly too low.
Also, why drill down to such a fine grain as time per step? If the final goal is to predict PPD, then that is what you should be trying to track. Just use the total time for the WU, and the total points. If you do calculations to estimate the time per frame, and then do more math to estimate the PPD, you can lose data precision.
And don't forget that the time per step is not always constant. In some projects, the time per step slowly increases with each step. In some it slowly decreases, and in most it stays the same. And a per-step figure doesn't allow for folding events, which can spike the time per step in the middle of a WU.
I guess it depends on how accurate you want to get, how much work you are willing to put into it, and whether you judge the value of the modeled information to be worth that effort.
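The arithmetic behind "just use the total time for the WU, and the total points" can be sketched in a few lines of Python. The function name and parameters are hypothetical. `wu_seconds` is assumed to be the time actually spent folding the work unit, and `hours_per_day` is an assumed simple linear scaling for machines that do not fold around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(wu_points, wu_seconds, hours_per_day=24.0):
    """Estimate PPD from one completed work unit.

    wu_points     -- points credited for the work unit
    wu_seconds    -- time spent folding it, in seconds (assumed to exclude idle time)
    hours_per_day -- hours per day the machine folds (24 for a 24/7 folder)
    """
    if wu_seconds <= 0:
        raise ValueError("wu_seconds must be positive")
    if not 0 < hours_per_day <= 24:
        raise ValueError("hours_per_day must be in (0, 24]")
    full_time_ppd = wu_points * SECONDS_PER_DAY / wu_seconds
    # Scale down linearly for part-time folding (a simplifying assumption).
    return full_time_ppd * hours_per_day / 24.0


# Made-up example: a 1000-point work unit that took 12 hours of folding.
ppd_full_time = points_per_day(1000, 12 * 3600)
ppd_half_time = points_per_day(1000, 12 * 3600, hours_per_day=12)
```

Working from whole-WU totals this way avoids compounding rounding errors from per-step estimates, and it also absorbs any drift in step time over the course of the work unit.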
Re: A regression model for estimated Time per Step?
Got your point.
So I will be tracking total workunit time, and total points per workunit.
Re: A regression model for estimated Time per Step?
One question:
Does the work unit finish at 100%?
So I should track from 0% to 100%?
Re: A regression model for estimated Time per Step?
chungenhung wrote: One question:
Does the work unit finish at 100%?
So I should track from 0% to 100%?
Yes, a very high percentage of work units finish at 100%, but not all.
Re: A regression model for estimated Time per Step?
Although it's not a formal regression model, I'd think that fahinfo.org is doing the same thing you're planning to do in a somewhat different way. If you digest all of the site's data, wouldn't you arrive at the same model? Then if the model has unexplained variations, you would be able to identify some important variable that's not being accounted for. I'm sure that's what uncle_fungus intended when he set up his site.
Why don't you discuss it with him?
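One way to look for a variable that is not being accounted for is to group the model's residuals (actual minus predicted) by a candidate variable and see whether the group means differ. The Python sketch below assumes you already have predictions from some model. The function name, the cache-layout labels, and all numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_group(actual, predicted, labels):
    """Average residual (actual - predicted) for each value of a candidate variable."""
    groups = defaultdict(list)
    for a, p, label in zip(actual, predicted, labels):
        groups[label].append(a - p)
    return {label: mean(residuals) for label, residuals in groups.items()}


# Hypothetical hours per work unit, and predictions from a model that ignores cache layout.
actual = [10.0, 12.0, 10.5, 12.5]
predicted = [11.0, 11.0, 11.5, 11.5]
cache_layout = ["shared L2", "split L2", "shared L2", "split L2"]

bias = mean_residual_by_group(actual, predicted, cache_layout)
print(bias)
```

If one group is consistently over-predicted and another under-predicted, as in this made-up data, the grouping variable is a reasonable candidate to add to the model.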