Re: measuring/monitoring F@H
Posted: Mon Mar 30, 2009 5:33 pm
*edit*
Source: Huang, J. (2003). Understanding Gigabit Ethernet Performance on Sun Fire Systems. Sun Microsystems: Santa Clara, CA. Retrieved from http://www.sun.com/blueprints/0203/817-1657.pdf on March 30, 2009.
It states that in their tests on the Sun Fire 6800, the round-trip latency is 140 ms (or 70 ms one-way). So, based on that, I think the way for me to find out the latency between my systems is to ping them in both directions.
I'm currently averaging about 1 ms (or slightly less, at about 0.7 ms). It seems that when I start the ping command (both in Windows and in Linux), there's a bit of a lag (up to about 8 ms) before it settles down to the steady round-trip times. I'm doing a bit more data collecting now while the F@H standalone SMP clients are running in the background.
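In case anyone wants to repeat the measurement, here's a rough sketch of what I'm doing: ping the other box a bunch of times, throw away the first few samples (the warm-up lag I mentioned), and average the rest. It assumes the Linux ping output format, and the address 192.168.1.50 is just a placeholder for whichever node you're testing against.

import re
import subprocess

HOST = "192.168.1.50"   # placeholder -- substitute the other node's address
SAMPLES = 20            # number of pings to average
WARMUP = 3              # discard the first few, which tend to run long

times = []
for _ in range(SAMPLES):
    out = subprocess.run(["ping", "-c", "1", HOST],
                         capture_output=True, text=True).stdout
    m = re.search(r"time=([\d.]+) ms", out)   # Linux ping prints "time=0.7 ms"
    if m:
        times.append(float(m.group(1)))

if not times:
    raise SystemExit("no replies from " + HOST)

steady = times[WARMUP:]
print("all samples  : %.3f ms avg" % (sum(times) / len(times)))
print("after warmup : %.3f ms avg" % (sum(steady) / len(steady)))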
One of the two systems I'm going to be trying distributed parallel processing on is currently running Windows XP Pro 32-bit (Q9550, 2.83 GHz, quad-core, 8 GB RAM).
The other system (dual AMD Opteron 2220, 2.8 GHz, 8 GB RAM) is currently running RHEL4 WS.
I'm doing absolutely NOTHING to tweak/tune/refine any of the TCP parameters, although if you read the PDF cited above, you'd see that they go into changing the payload size, the TCP window size, reducing overhead, and a whole slew of other parameters. I'm not really going to bother with all that. Mostly because I'm lazy, but also because, while I'm pretty sure those settings affect the effective transfer rates, I don't think that on a real distributed parallel deployment people are going to spend that much time constantly tweaking them. I suppose if they had a real-time solver or something that could automate the whole process, then sure, let the system pick the parameters that maximize throughput; but I'm much too lazy (and don't really have the time) to do that extensive a study.
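Just to illustrate the kind of knobs the Sun paper is turning (and that I'm skipping), here's a minimal sketch of setting per-socket send/receive buffer sizes, which is what bounds the effective TCP window. The 1 MB figure is an arbitrary example, not a recommendation; I'm not doing any of this on my boxes.

import socket

BUF_SIZE = 1 << 20  # 1 MB buffers -- arbitrary example value

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# The OS may round or clamp these, so read back what it actually gave us.
print("SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("SO_RCVBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
s.close()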
If I were to go for a distributed parallel processing deployment, I'd want to know just enough to get the cluster up and running and to know what triggers the load balancing (in ClusterKNOPPIX, for example, it used to be that you needed roughly 8 processes before it would start load balancing; I don't know the exact details, but that's what I noticed in earlier testing I did for work). Then set up the F@H SMP clients and let 'er rip (and watch the GbE switch light up like a Christmas tree).
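For what it's worth, here's a toy sketch of the "spawn enough work to trip the load balancer" idea: start N CPU-bound workers and let them run for a while. The N=8 figure is only my vague recollection of ClusterKNOPPIX's behaviour, not anything documented.

import time
from multiprocessing import Process

N_WORKERS = 8      # half-remembered threshold, purely illustrative
RUN_SECONDS = 60   # arbitrary burn time

def burn(seconds):
    """Keep one CPU busy for roughly `seconds` seconds."""
    end = time.time() + seconds
    x = 0
    while time.time() < end:
        x = (x + 1) % 1000003
    return x

if __name__ == "__main__":
    procs = [Process(target=burn, args=(RUN_SECONDS,)) for _ in range(N_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()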
So my current presumption is that even with a constant 12 MB/s data flow, GbE is essentially designed to reach and sustain gigabit speeds, which would tend to mean it has to have low latency to manage that when nothing is being done to tweak the payload size, TCP window size, or anything else like that.
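A quick back-of-envelope check on that 12 MB/s figure (the flow rate is my assumption, not a measurement from the clustering tests):

flow_MB_s = 12.0              # assumed steady data flow between nodes
gbe_Mbit_s = 1000.0           # nominal gigabit line rate
flow_Mbit_s = flow_MB_s * 8   # bytes -> bits

print("flow: %.0f Mbit/s, about %.0f%% of the GbE line rate"
      % (flow_Mbit_s, 100 * flow_Mbit_s / gbe_Mbit_s))
# -> roughly 96 Mbit/s, i.e. ~10% of the link, so raw bandwidth shouldn't
#    be the bottleneck; latency is the bigger open question.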
Of course, I won't know until I actually do the clustering tests, BUT that's at least the current working presumption, and it suggests that it is actually quite possible to run F@H in a distributed parallel setting. I'm not entirely sure how the number of nodes would affect its overhead, but this COULD be good news for people who have a number of single- and dual-processor (core) systems lying around: they could string them together with cheap GbE and run the SMP client, getting a much higher PPD than the uniprocessor clients would be able to.
However unrealistic, IF this actually works and people start doing it, it would be nice to see a good portion of the uniprocessor units converted to SMP units, so we'd be able to throw more processors at them and run them more efficiently. *shrug* I dunno, just a thought.