Linux SMP v6 compared to Windows SMP client
Do they perform at about the same rate?
Logic would tell me that the Linux client would perform better in console-only compared to having Windows running.
Re: Linux SMP v6 compared to Windows SMP client
A) When you have the viewer open in Windows, folding slows down. The new viewer uses a lot more CPU time than the old one, but the slowdown has always been there. When the viewer is closed, the Windows GUI client and the Windows Console client are essentially the same.
B) Each of the versions of MPI for Windows-32 is inferior to MPI for Linux-64. That creates a difference that is not easily overcome. Some of that comes out as stability issues and some as performance issues.
C) The overhead for Windows sitting there but not doing anything isn't really zero, but it's a lot smaller than a lot of the Linux people believe -- unless Windows is actually doing something, which is what I covered in part A. Of course I'm assuming that you turn off some of the unnecessary functions like virus scans and file indexing, etc.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+ - Location: Belgium, near the International Sea-Port of Antwerp
Re: Linux SMP v6 compared to Windows SMP client
.
Linux SMP performs markedly better than Windows SMP.
You could say that Linux has the direct-drive system (or the DOHC motor) whereas Windows hasn't ...
That's why I went with Linux, which I didn't really know before; it has better stability too (see reports)
.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Linux SMP v6 compared to Windows SMP client
It's not the OS that affects the speed. The MPICH packages are different between the two OSs, and so behave differently. In my experience, the Windows client isn't any less stable, just susceptible to more problems than the Linux client.
The performance difference is notable: 10-15%, sometimes more, sometimes none at all. Running Linux in a VM is worth it to some people, not to others. Run what you want to run.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Linux SMP v6 compared to Windows SMP client
Quote: Logic would tell me that the Linux client would perform better in console-only compared to having Windows running.
It depends. I don't imagine there is a big difference. I run multiple uni-processor clients (v5.04) on Linux because I saw weird things happening with the Linux SMP client. For example, when another process is running and taking as little as 5% CPU, one of the F@H SMP threads gets migrated away to a different core, so I end up with 50% idle time that F@H isn't using. I didn't observe that behaviour on the Windows SMP client, but it is entirely possible that this is merely unobservable on Windows, not that it isn't actually happening.
I use SMP client on Windows just because I can live with 1 minimized console window in the tray, but 4 would start to annoy me. On Linux it's all forked into the background in rc.local, so I use 4x single-thread clients because it seems to yield most CPU utilization, so presumably it's actually doing more work. I haven't done any PPD benchmarking, though.
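For anyone who wants to experiment with the migration issue described above: on Linux a process can be restricted to a single core so the scheduler cannot move it elsewhere. The sketch below is only an illustration and was not tested with F@H in this thread. `pin_to_core` is a made-up helper name, and it relies on Python's `os.sched_setaffinity`, which is only available on Linux.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict process `pid` (0 means the calling process) to one CPU core.

    Returns the affinity set reported by the kernel afterwards.
    `pin_to_core` is a hypothetical helper, not part of any F@H tooling.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is only available on Linux")
    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)
```

Calling it once per uniprocessor client, with that client's PID and a different core number each time, would keep the four clients on separate cores. Whether that actually improves PPD is untested here.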
-
- Posts: 357
- Joined: Mon Dec 03, 2007 4:36 pm
- Hardware configuration: Q9450 OC @ 3.2GHz (Win7 Home Premium) - SMP2
E7500 OC @ 3.66GHz (Windows Home Server) - SMP2
i5-3750k @ 3.8GHz (Win7 Pro) - SMP2 - Location: University of Birmingham, UK
Re: Linux SMP v6 compared to Windows SMP client
@shatteredsilicon
Firstly, welcome to the forums!
If the only reason for you running SMP is because four minimised console windows would annoy you, you could always use something like TrayIt (as I believe it's called) to move the minimised windows into the System tray... or install 4 v5.04 console clients as a service (meaning that you don't get a window) then use something like FahMon to monitor them (which again can run from the system tray).
Folding whatever I'm sent since March 2006 Beta testing since October 2006. www.FAH-Addict.net Administrator since August 2009.
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Linux SMP v6 compared to Windows SMP client
Quote: Firstly, welcome to the forums!
Thank you!
Quote: If the only reason for you running SMP is because four minimised console windows would annoy you,
There is a point in there somewhere, that SMP client is actually quite pointless. The overheads involved, as per what I saw mentioned on the FAQ page, mean that it doesn't scale linearly with multiple cores. This is quite opposite to running multiple single-thread clients. So:
1) Why is anyone bothering running an SMP client? Perhaps the TrayIt suggestion ought to be in the FAQ.
2) Why was the SMP client even written? I cannot see what advantages it could possibly provide compared to running multiple separate clients.
I'm assuming here that I'm missing an important advantage of the SMP client, but I just can't see what.
Re: Linux SMP v6 compared to Windows SMP client
shatteredsilicon wrote: There is a point in there somewhere, that SMP client is actually quite pointless. The overheads involved, as per what I saw mentioned on the FAQ page, mean that it doesn't scale linearly with multiple cores. This is quite opposite to running multiple single-thread clients. So:
1) Why is anyone bothering running an SMP client? Perhaps the TrayIt suggestion ought to be in the FAQ.
2) Why was the SMP client even written? I cannot see what advantages it could possibly provide compared to running multiple separate clients.
I'm assuming here that I'm missing an important advantage of the SMP client, but I just can't see what.
You should read up on Moore's Law.
Completing one WU in 25-30% of the time is MUCH more valuable to science than taking 100% of the time to complete four similar WUs. All of the recent development work has been aimed at creating high-performance clients. The SMP and PS3 and GPU clients do the same science much faster and that's a lot more important than you are assuming.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Linux SMP v6 compared to Windows SMP client
I'm not sure I follow that logic. In this particular case, why is it more important you get the next one set of results in 4 hours than 4 sets of results in 12 hours? What is it that this faster turnaround gains you that isn't outweighed by getting 25-30% more done in the long term? The point is that with better scalability, more science gets done overall. We're not talking about long time intervals here, either.
Re: Linux SMP v6 compared to Windows SMP client
You might also read this thread which is going on concurrently: viewtopic.php?p=37556#p37556
FAH assignments consist of both WUs that can be done concurrently and WUs that must be done serially. It's the total of all the serial steps that turns out to have the most important scientific value, not the number of parallel tasks that can be started.
A single trajectory that takes 10 years to compute is a lot more valuable than 10 trajectories that each take one year to compute, especially if the event of interest doesn't happen in the first year's worth of work. Cutting a cure that takes 10 years to compute down to 2.8 years is really important, and if it can be cut to 1.4 years by using 8-core machines, then it's even better.
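To put rough numbers on the latency-versus-throughput argument, here is a back-of-the-envelope sketch. The 12-hour WU time, the 100-generation trajectory and the 0.28 fraction are illustrative assumptions loosely based on figures quoted in this thread, not measurements, and `compare` is just a name made up for the example.

```python
def compare(wu_hours, generations, cores, smp_fraction):
    """Compare `cores` uniprocessor clients with one SMP client on the same box.

    wu_hours     -- hours a uniprocessor client needs for one WU (assumed)
    generations  -- WUs in one trajectory that must be done one after another
    cores        -- number of uniprocessor clients running side by side
    smp_fraction -- SMP time per WU as a fraction of uniprocessor time (assumed)
    """
    # Serial steps cannot overlap, so extra uniprocessor clients do not
    # shorten a single trajectory; they only work on other trajectories.
    uni_trajectory_hours = generations * wu_hours
    smp_trajectory_hours = generations * wu_hours * smp_fraction
    # Raw throughput: WUs finished per day by the whole machine.
    uni_wus_per_day = cores * 24 / wu_hours
    smp_wus_per_day = 24 / (wu_hours * smp_fraction)
    return {
        "uni_trajectory_hours": uni_trajectory_hours,
        "smp_trajectory_hours": smp_trajectory_hours,
        "uni_wus_per_day": uni_wus_per_day,
        "smp_wus_per_day": smp_wus_per_day,
    }


if __name__ == "__main__":
    for name, value in compare(12, 100, 4, 0.28).items():
        print(f"{name}: {value:.2f}")
```

With these assumed inputs the four uniprocessor clients finish slightly more WUs per day, but any single trajectory reaches its end in a fraction of the wall-clock time on the SMP client, which is the point being made above.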
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2948
- Joined: Sun Dec 02, 2007 4:36 am
- Hardware configuration: Machine #1:
Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).
Machine #2:
Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.
Machine 3:
Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32
I am currently folding just on the 5x GTX 460's for aprox. 70K PPD - Location: Salem. OR USA
Re: Linux SMP v6 compared to Windows SMP client
Look at the deadlines for the uniprocessor client: they are on the order of months, to accommodate the lowest common denominator. The deadlines on the SMP client are measured in days because it can assume a certain minimum speed that can't be assumed for uniprocessor machines. With each project, the majority of WUs need to be returned before the next generation can be released. So you can go through many generations with the SMP client before the uniprocessor client can get through one. Therefore the scientific value of the higher-performance clients is far greater than that of the lower-performance clients.
Now the value of the GPU clients lies in the fact that they can process far more FLOPS than even the SMP clients. That gives them the ability to calculate a far bigger time-slice, which again gives more scientific value, even though their deadlines are not very different from those of the SMP clients.
This is my interpretation of the reasons given. Please correct me if I'm wrong.
-
- Posts: 270
- Joined: Sun Dec 02, 2007 2:26 pm
- Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+ - Location: Belgium, near the International Sea-Port of Antwerp
Re: Linux SMP v6 compared to Windows SMP client
.
shatteredsilicon wrote: I'm not sure I follow that logic. In this particular case, why is it more important you get the next one set of results in 4 hours than 4 sets of results in 12 hours? What is it that this faster turnaround gains you that isn't outweighed by getting 25-30% more done in the long term? The point is that with better scalability, more science gets done overall. We're not talking about long time intervals here, either.
The WU's all are a tiny piece of 1 timeline (per project); the results of a finished WU give Stanford new parameters to inject into a new WU, and so on, and so on ...
That's the simple (maybe too simple) explanation.
.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
Re: Linux SMP v6 compared to Windows SMP client
noorman wrote: The WU's all are a tiny piece of 1 timeline (per project); the results of a finished WU give Stanford new parameters to inject in a new WU and so on, and so on ...
That's the simple (maybe too simple) explanation.
Essentially correct . . . but somewhat too simple. Each WU is a tiny piece of a single timeline, but there are a number of timelines within a single project. Until a WU is returned, the next WU for that same timeline cannot be created.
("Timeline" = "Trajectory"
Each PRC is a separate trajectory. Each Gen is another piece of the same trajectory.)
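The dependency between Gens can be sketched as a toy model. This is not the real assignment server; `Trajectory`, `WorkServer` and the project numbers are hypothetical names invented for illustration. It only shows the one rule stated above: the next Gen of a Project/Run/Clone is not handed out until the previous one has come back.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One Project/Run/Clone; each Gen is the next piece of it."""
    project: int
    run: int
    clone: int
    next_gen: int = 0          # Gen that will be issued next
    outstanding: bool = False  # True while a WU is out with a donor


class WorkServer:
    """Toy model of the serial dependency only, nothing else."""

    def __init__(self, trajectories):
        self.trajectories = list(trajectories)

    def assign(self):
        """Return (project, run, clone, gen) for a free trajectory, or None."""
        for t in self.trajectories:
            if not t.outstanding:
                t.outstanding = True
                return (t.project, t.run, t.clone, t.next_gen)
        return None  # every trajectory is waiting on a returned WU

    def complete(self, project, run, clone, gen):
        """Record a returned WU, which unlocks the next Gen of that trajectory."""
        for t in self.trajectories:
            if (t.project, t.run, t.clone) == (project, run, clone):
                if not t.outstanding or t.next_gen != gen:
                    raise ValueError("result does not match the outstanding WU")
                t.outstanding = False
                t.next_gen += 1
                return
        raise KeyError("unknown trajectory")
```

In this model a slow client holding Gen 0 blocks Gen 1 of that trajectory no matter how many other machines are idle, which is why turnaround time per WU matters so much.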
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 87
- Joined: Tue Jul 08, 2008 2:27 pm
- Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers
Re: Linux SMP v6 compared to Windows SMP client
Thanks for the clarification. That makes sense WRT usefulness.
As far as parallel performance scalability is concerned, the standard optimization paradigm says: "Vectorize inner loops, parallelize outer loops." Presumably, that is the paradigm followed in the SMP F@H client. Just out of interest - what compiler are F@H cores built with? ICC's optimizer can do vectorizing automatically for reasonably written code, as well as auto-parallelizing. Assembly can do the same, but I'm just wondering if leveraging a better compiler has been explored for F@H. I have personally seen speed improvements of up to 7x (700%) from using ICC to compile my own number crunching libraries (pure C++) compared to GCC. Just a thought. It might provide scope for completely avoiding MPI and some of the overheads. I'd try it myself, but F@H is closed source...
-
- Posts: 357
- Joined: Mon Dec 03, 2007 4:36 pm
- Hardware configuration: Q9450 OC @ 3.2GHz (Win7 Home Premium) - SMP2
E7500 OC @ 3.66GHz (Windows Home Server) - SMP2
i5-3750k @ 3.8GHz (Win7 Pro) - SMP2 - Location: University of Birmingham, UK
Re: Linux SMP v6 compared to Windows SMP client
You say reasonably written code... all the FAH cores are hand-coded to get the most out of the hardware they are using (well... maybe except the a1 SMP core lol)... so I guess that might negate some of the advantages of a new compiler. And besides, the Pande Group always wants more speed so I would guess they regularly look at their compilers to see if a new one can make the code run more efficiently and therefore faster
EDIT: I would also guess that the answer is no, new compilers cannot make the cores faster, for single core clients anyway... check the build dates, most are from 2006 on the older cores
Folding whatever I'm sent since March 2006 Beta testing since October 2006. www.FAH-Addict.net Administrator since August 2009.