
A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 5:40 am
by chungenhung
Is there a Multiple Regression model for estimated time per step?
If not, can we make one?
I was wondering which variables I need to collect data on.
So far, I have:
CPU FSB, CPU Speed, CPU type, Memory Speed, Single/Dual Channel, Project No., Operating System.

Please let me know what I should add.
I did not include the Run/Clone/Gen values because, in my experience, work units with the same project number differ very little in time.
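Here is a rough sketch of what fitting such a model could look like once the data is collected. The Python below fits an ordinary least-squares multiple regression with no external libraries. It assumes each row of `X` holds the numeric predictors for one observation (for example CPU speed, FSB, and memory speed) and `y` holds the measured time per step. `fit_ols` and `predict` are hypothetical helper names, not part of any Folding@home tool, and the numbers in the example are made up. Categorical items such as CPU type, operating system, or project number would first need to be turned into 0/1 indicator columns.

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: list of rows, each a list of numeric predictors (hypothetical layout).
    y: list of observed values, e.g. time per step in seconds.
    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    # Prepend a column of ones so the model has an intercept term.
    A = [[1.0] + [float(v) for v in row] for row in X]
    n_cols = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(n_cols)] for i in range(n_cols)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(n_cols)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for col in range(n_cols):
        pivot = max(range(col, n_cols), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few rows")
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(n_cols):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
                v[r] -= factor * v[col]
    # M is now diagonal, so each coefficient is a single division.
    return [v[i] / M[i][i] for i in range(n_cols)]


def predict(coefs, row):
    """Predicted value for one row of predictors."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Made-up data generated from y = 2 + 3*x1 - 0.5*x2.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7]]
y = [4, 7.5, 8.5, 12.5, 13.5]
coefs = fit_ols(X, y)
print(coefs)
```

A spreadsheet or statistics package does the same fit in one command; the hand-rolled version just shows what the model is: one coefficient per predictor plus an intercept.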

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 6:37 am
by 7im
Modeling breaks down unless you can account for how many hours per day the processor is folding, or normalize the data to assume the full 24 hours. You also have to track whether the processor is folding at 100% CPU usage or at slightly less, say 85%, which is what I use so that my computer doesn't lag when I try to use it.
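A minimal sketch of that normalization follows. It assumes, as a simplification and not something the client guarantees, that progress scales linearly with the hours folded per day and with the CPU usage cap. `normalize_wu_hours` is a hypothetical name.

```python
def normalize_wu_hours(wall_clock_hours, hours_per_day, cpu_fraction):
    """Estimate how long a WU would have taken folding 24/7 at 100% CPU.

    Assumption: progress scales linearly with folding hours per day and
    with the CPU usage cap (a simplification).
    wall_clock_hours: calendar time from start to finish of the WU.
    hours_per_day: hours per day the machine actually folded, in (0, 24].
    cpu_fraction: CPU usage cap as a fraction, e.g. 0.85 for 85%.
    """
    if not 0 < hours_per_day <= 24:
        raise ValueError("hours_per_day must be in (0, 24]")
    if not 0 < cpu_fraction <= 1:
        raise ValueError("cpu_fraction must be in (0, 1]")
    return wall_clock_hours * (hours_per_day / 24.0) * cpu_fraction


# 48 calendar hours, folding 12 hours a day at full CPU.
print(normalize_wu_hours(48, 12, 1.0))  # 24.0
```

Under the same assumption, the same WU folded at an 85% cap counts as 48 × 0.5 × 0.85 = 20.4 hours of full-speed 24/7 folding.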

Also, are you only tracking CPU clients, or SMP clients too? If you include SMP clients, then the number of processors has to be tracked as well. And cache size. And...

You also have to track more than just the project number. Gromacs work units process at a different speed than DGromacs work units because of SSE2.

If you still want to dive in, then dive in head first, and do it. Don't just talk about it. Start collecting data. If you aren't tracking enough predictive variables, the data will tell you that.

Or just go visit fahinfo.org. The site already has charts and data for much of what you want.

One last question: what do you accomplish by being able to predict the time per step? Wouldn't predicting PPD be a better goal, since that is what most people want to know? Isn't that how most people compare performance, rather than by time per step?

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 1:01 pm
by bruce
A couple of very important factors are the size and organization of cache. I'm not sure how you're going to account for that.

The size issue is probably obvious; the organization issue is not. Suppose you have a dual Core2Duo machine (or a Q6600). An SMP project that starts four copies of FahCore_a1 needs tight communication between all four copies of the FahCore, but the paths between them are invariably unequal. Since two CPUs share one cache and the other two share a different cache, there will be a significant difference in overhead (and time) depending on which FahCore is nearest to which other copy. Affinity Changer addresses a portion of this issue, but not all of it.

AMD organizes their cache quite differently than Intel, and this will make a significant difference, too.

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 3:12 pm
by chungenhung
I guess this will be used mainly for 24/7 folders, since I assume that is where the interest is.
Accounting for L2 cache, etc. is a lot of work, so wouldn't it be better to just track the processor type? Most 24/7 folders use a Q6600, Q9450, or the like.
SMP clients will be tracked now; if this works out, I can track the GPU2 client after that.
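If processor type is tracked instead of cache details, it enters a regression as a categorical variable. One common way to handle that, sketched below under the assumption that CPU models are recorded as plain strings, is indicator (one-hot) encoding, with one model dropped as the baseline so the columns are not collinear with the intercept. `one_hot` is a hypothetical helper.

```python
def one_hot(values, categories=None):
    """Encode a categorical predictor (e.g. CPU model) as 0/1 indicator columns.

    categories[0] is the baseline and gets no column of its own, so a row
    for the baseline model is all zeros. Returns (column_names, rows).
    """
    if categories is None:
        categories = sorted(set(values))
    columns = list(categories[1:])
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


cpus = ["Q6600", "Q9450", "Q6600", "E6750"]
names, rows = one_hot(cpus)
print(names)  # ['Q6600', 'Q9450']
print(rows)   # [[1, 0], [0, 1], [1, 0], [0, 0]]
```

Each fitted coefficient on these columns then reads as "time per step relative to the baseline CPU, all else equal."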

Of course the final goal is PPD; getting the time per step will let me achieve that.
One question: do work units with the same project number give the same number of points?
One question, do the same project numbers give the same amount of points?

For now, I am going to just start tracking a couple of machines.

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 4:35 pm
by 7im
chungenhung wrote:...

Of course the final goal is PPD; getting the time per step will let me achieve that.
One question: do work units with the same project number give the same number of points?

Time per how many steps? Out of how many steps total? Not all work units have 100 steps, and not all work units count steps one at a time.

Yes, in general, a given project is worth the same points. However, in the past, project numbers have been reused and given different point totals. And, while VERY rare, Stanford does adjust the points if the project benchmark is shown to be significantly too low.

Also, why drill down to such a fine grain as time per step? If the final goal is to predict PPD, then that is what you should be trying to track. Just use the total time for the WU and the total points. If you do calculations to estimate the time per frame, and then do more math to estimate the PPD, you can lose data precision.
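Computing PPD directly from the total WU time and total points is a single division. The sketch below assumes start and finish times are recorded as Python `datetime` values on a machine folding 24/7 (or already normalized); `ppd_from_wu` and `ppd_over_wus` are hypothetical names, and the example numbers are made up. When combining several WUs, it divides total points by total days rather than averaging per-WU PPD values, so each WU counts in proportion to the time it took rather than equally.

```python
from datetime import datetime


def ppd_from_wu(points, started, finished):
    """Points per day for one WU from its total points and total run time."""
    elapsed_days = (finished - started).total_seconds() / 86400.0
    if elapsed_days <= 0:
        raise ValueError("finished must be later than started")
    return points / elapsed_days


def ppd_over_wus(records):
    """Combined PPD over several WUs: total points divided by total days.

    records: iterable of (points, started, finished) tuples.
    """
    total_points = 0.0
    total_days = 0.0
    for points, started, finished in records:
        total_points += points
        total_days += (finished - started).total_seconds() / 86400.0
    if total_days <= 0:
        raise ValueError("no elapsed time recorded")
    return total_points / total_days


# Made-up example: a 500-point WU that ran for 10 hours.
start = datetime(2008, 4, 23, 6, 0)
end = datetime(2008, 4, 23, 16, 0)
print(ppd_from_wu(500, start, end))  # about 1200 PPD
```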

And don't forget that the time per step is not always constant. In some projects, the time per step slowly increases with each step; in others it slowly decreases, and most stay the same. It also doesn't account for folding events, which can spike the time per step in the middle of a WU.
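Because steps are not always numbered 1 to 100 and the time per step can drift, one way to see this is to compute seconds per step for each interval between logged checkpoints and then fit a trend line. The sketch assumes you have already pulled `(step, elapsed_seconds)` pairs out of the client log by hand or with your own parser; the function names and the sample numbers are hypothetical. A folding event would show up as one interval much larger than its neighbors.

```python
def seconds_per_step(checkpoints):
    """Average seconds per step for each interval between logged checkpoints.

    checkpoints: list of (step_number, elapsed_seconds) pairs in log order.
    Steps need not advance by 1 or end at 100; each interval is divided
    by the number of steps it covers.
    """
    rates = []
    for (s0, t0), (s1, t1) in zip(checkpoints, checkpoints[1:]):
        if s1 <= s0:
            raise ValueError("step numbers must increase")
        rates.append((t1 - t0) / (s1 - s0))
    return rates


def trend(values):
    """Least-squares slope of values against their position.

    Positive: time per step is increasing; negative: decreasing.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


checkpoints = [(0, 0), (2, 600), (4, 1200), (5, 1560)]
rates = seconds_per_step(checkpoints)
print(rates)         # [300.0, 300.0, 360.0]
print(trend(rates))  # 30.0
```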

I guess it depends on how accurate you want to get, how much work you are willing to put into it, and whether you judge the modeled information to be worth that effort.

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 4:56 pm
by chungenhung
Got your point.
So I will be tracking total work unit time and total points per work unit.

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 5:01 pm
by chungenhung
One question:
Does a work unit finish at 100%?
So should I track from 0% to 100%?

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 5:19 pm
by 7im
chungenhung wrote: One question:
Does a work unit finish at 100%?
So should I track from 0% to 100%?

Yes, a very high percentage of work units finish at 100%, but not all.

Re: A regression model for estimated Time per Step?

Posted: Wed Apr 23, 2008 5:43 pm
by bruce
Although it's not a formal regression model, I'd think that fahinfo.org is doing the same thing you're planning to do, in a somewhat different way. If you digest all of the site's data, wouldn't you arrive at the same model? Then, if the model has unexplained variations, you would be able to identify some important variable that's not being accounted for. I'm sure that's what uncle_fungus intended when he set up his site.

Why don't you discuss it with him?
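The idea of using unexplained variation to find a missing variable can be checked directly once a model exists: group the residuals (observed minus predicted) by a candidate variable the model leaves out, such as CPU vendor or cache layout. A group whose average residual sits well away from zero hints that the variable matters. A minimal sketch with a hypothetical helper name and made-up numbers:

```python
from collections import defaultdict


def mean_residual_by_group(observed, predicted, groups):
    """Average residual (observed - predicted) for each value of a candidate variable.

    A group mean far from zero suggests the variable explains variation
    the model does not yet account for.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for obs, pred, g in zip(observed, predicted, groups):
        sums[g] += obs - pred
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


observed = [10, 12, 20, 22]
predicted = [11, 11, 15, 15]
vendor = ["Intel", "Intel", "AMD", "AMD"]
print(mean_residual_by_group(observed, predicted, vendor))  # {'Intel': 0.0, 'AMD': 6.0}
```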