Linux SMP v6 compared to Windows SMP client

Balistyx
Posts: 3
Joined: Thu Apr 17, 2008 1:47 am

Linux SMP v6 compared to Windows SMP client

Post by Balistyx »

Do they perform at about the same rate?

Logic would tell me that the Linux client would perform better in console-only compared to having Windows running.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

A) When you have the viewer open in Windows, folding slows down. That has always been true, although the new viewer uses a lot more CPU time than the old one. When the viewer is closed, the Windows GUI client and the Windows Console client are essentially the same.

B) Each of the versions of MPI for Windows-32 is inferior to MPI for Linux-64. That creates a difference that is not easily overcome. Some of that comes out as stability issues and some as performance issues.

C) The overhead for Windows sitting there but not doing anything isn't really zero, but it's a lot smaller than a lot of the Linux people believe -- unless Windows is actually doing something, which is what I covered in part A. Of course I'm assuming that you turn off some of the unnecessary functions like virus scans and file indexing, etc.
noorman
Posts: 270
Joined: Sun Dec 02, 2007 2:26 pm
Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
Location: Belgium, near the International Sea-Port of Antwerp

Re: Linux SMP v6 compared to Windows SMP client

Post by noorman »

Performance is markedly higher with the Linux SMP client than with the Windows SMP client.

You could say that Linux has the direct-drive system (or the DOHC motor) whereas Windows hasn't ...

That's why I went with Linux, which I didn't really know before; it has better stability too (see reports).


- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Linux SMP v6 compared to Windows SMP client

Post by 7im »

It's not the OS that affects the speed. The MPICH packages are different between the two OSes, and so behave differently. In my experience, the Windows client isn't any less stable, just susceptible to more problems than the Linux client.

The performance difference is notable: 10-15%, sometimes more, sometimes none at all. Running Linux in a VM is worth it to some people, not to others. Run what you want to run.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

Balistyx wrote:
Logic would tell me that the Linux client would perform better in console-only compared to having Windows running.

It depends. I don't imagine there is a big difference. I run multiple uni-processor clients (v5.04) on Linux because I saw weird things happening with the Linux SMP client. For example, when another process is running and taking as little as 5% CPU, one of the F@H SMP threads gets migrated away to a different core, so I end up with 50% idle time that F@H isn't using. I didn't observe that behaviour on the Windows SMP client, but it is entirely possible that this is merely unobservable on Windows, not that it isn't actually happening.

I use the SMP client on Windows just because I can live with 1 minimized console window in the tray, but 4 would start to annoy me. On Linux it's all forked into the background in rc.local, so I use 4x single-thread clients because that seems to yield the most CPU utilization, so presumably it's actually doing more work. I haven't done any PPD benchmarking, though.
John Naylor
Posts: 357
Joined: Mon Dec 03, 2007 4:36 pm
Hardware configuration: Q9450 OC @ 3.2GHz (Win7 Home Premium) - SMP2
E7500 OC @ 3.66GHz (Windows Home Server) - SMP2
i5-3750k @ 3.8GHz (Win7 Pro) - SMP2
Location: University of Birmingham, UK

Re: Linux SMP v6 compared to Windows SMP client

Post by John Naylor »

@shatteredsilicon

Firstly, welcome to the forums!

If the only reason for you running SMP is because four minimised console windows would annoy you, you could always use something like TrayIt (as I believe it's called) to move the minimised windows into the System tray... or install 4 v5.04 console clients as a service (meaning that you don't get a window) then use something like FahMon to monitor them (which again can run from the system tray).
Folding whatever I'm sent since March 2006 :) Beta testing since October 2006. www.FAH-Addict.net Administrator since August 2009.
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

John Naylor wrote:Firstly, welcome to the forums!
Thank you! :)
John Naylor wrote:If the only reason for you running SMP is because four minimised console windows would annoy you,
There is a point in there somewhere, that SMP client is actually quite pointless. The overheads involved, as per what I saw mentioned on the FAQ page, mean that it doesn't scale linearly with multiple cores. This is quite opposite to running multiple single-thread clients. So:

1) Why is anyone bothering running an SMP client? Perhaps the TrayIt suggestion ought to be in the FAQ.

2) Why was the SMP client even written? I cannot see what advantages it could possibly provide compared to running multiple separate clients.

I'm assuming here that I'm missing an important advantage of the SMP client, but I just can't see what.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

shatteredsilicon wrote:
Firstly, welcome to the forums!
There is a point in there somewhere, that SMP client is actually quite pointless. The overheads involved, as per what I saw mentioned on the FAQ page, mean that it doesn't scale linearly with multiple cores. This is quite opposite to running multiple single-thread clients. So:

1) Why is anyone bothering running an SMP client? Perhaps the TrayIt suggestion ought to be in the FAQ.

2) Why was the SMP client even written? I cannot see what advantages it could possibly provide compared to running multiple separate clients.

I'm assuming here that I'm missing an important advantage of the SMP client, but I just can't see what.
You should read up on Moore's Law.

Completing one WU in 25-30% of the time is MUCH more valuable to science than taking 100% of the time to complete four similar WUs. All of the recent development work has been aimed at creating high-performance clients. The SMP and PS3 and GPU clients do the same science much faster and that's a lot more important than you are assuming.
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

I'm not sure I follow that logic. In this particular case, why is it more important to get one set of results in 4 hours than 4 sets of results in 12 hours? What does this faster turnaround gain you that isn't outweighed by getting 25-30% more done in the long term? The point is that with better scalability, more science gets done overall. We're not talking about long time intervals here, either.
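
For concreteness, the trade-off being asked about can be written out as arithmetic. This is only a sketch using the illustrative figures from the question (one result per 4 hours for the SMP client, four results per 12 hours for four single-thread clients). The function name is made up for the example, and the figures are not measurements.

```python
def throughput(results: int, hours: float) -> float:
    """Work units completed per hour."""
    return results / hours


# Illustrative figures from the question above, not benchmarks.
smp_rate = throughput(1, 4)    # one SMP client using all four cores
uni_rate = throughput(4, 12)   # four single-thread clients side by side

# How much more total work the separate clients return per hour.
throughput_gain = uni_rate / smp_rate - 1

# How much longer each individual result takes to come back.
turnaround_penalty = 12 / 4

print(f"Throughput gain from separate clients: {throughput_gain:.0%}")
print(f"Turnaround penalty per result: {turnaround_penalty:.0f}x")
```

With these figures the separate clients return about a third more WUs per hour, but every individual result arrives three times later. Whether that delay matters is the open question.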
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

You might also read this thread which is going on concurrently: viewtopic.php?p=37556#p37556

FAH assignments consist of both WUs that can be done concurrently and WUs that must be done serially. It's the total of all the serial steps that turns out to have the most important scientific value, not the number of parallel tasks that can be started.

A single trajectory that takes 10 years to compute is a lot more valuable than 10 trajectories that take one year each to compute -- especially if the event of interest doesn't happen in the first year's worth of work. Reducing a trajectory that takes 10 years to compute to 2.8 years is really important, and if it can be reduced to 1.4 years by using 8-core machines, then it's even better.
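
As a rough check on those figures, here is a small sketch that works out the speedup and per-core efficiency they imply. It assumes the 2.8-year figure refers to a 4-core machine (only the 8-core case is named explicitly), and the helper names are invented for illustration.

```python
def implied_speedup(serial_years: float, parallel_years: float) -> float:
    """Speedup of the parallel run relative to the serial run."""
    return serial_years / parallel_years


def per_core_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


SERIAL_YEARS = 10.0
# cores -> years to finish the same trajectory.
# The 4-core entry is an assumption; the 8-core entry is stated above.
cases = {4: 2.8, 8: 1.4}

for cores, years in cases.items():
    speedup = implied_speedup(SERIAL_YEARS, years)
    efficiency = per_core_efficiency(speedup, cores)
    print(f"{cores} cores: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

Under that assumption both cases work out to roughly 89% efficiency per core. Some raw throughput is lost compared with independent clients, but the single long trajectory is finished several times sooner.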
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: Linux SMP v6 compared to Windows SMP client

Post by P5-133XL »

Look at the deadlines for the uniprocessor client -- they are on the order of months, to accommodate the lowest common denominator. The deadlines on the SMP client are measured in days because it can assume a certain minimum speed that the uniprocessor client can't. With each project, the majority of WUs need to be returned before the next generation can be released. So you can go through many generations with the SMP client before the uniprocessor client can get through one. Therefore the scientific value of the higher-performance clients is far greater than that of the lower-performance clients.
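
To make that concrete, here is a back-of-the-envelope sketch. The deadline values are placeholders chosen only to stand in for "months" versus "days"; they are not actual project deadlines. The model also pessimistically assumes that every generation takes its full deadline to come back.

```python
def worst_case_generations(span_days: int, deadline_days: int) -> int:
    """Generations that fit in span_days if each one uses its whole deadline."""
    return span_days // deadline_days


# Placeholder deadlines (assumptions, not real project settings).
UNI_DEADLINE_DAYS = 60   # "on the order of months"
SMP_DEADLINE_DAYS = 4    # "measured in days"
YEAR_DAYS = 365

uni_generations = worst_case_generations(YEAR_DAYS, UNI_DEADLINE_DAYS)
smp_generations = worst_case_generations(YEAR_DAYS, SMP_DEADLINE_DAYS)

print(f"Uniprocessor: {uni_generations} generations per year (worst case)")
print(f"SMP:          {smp_generations} generations per year (worst case)")
```

The exact numbers don't matter; the point is that the shorter deadline lets a project step through far more generations in the same calendar time.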

Now the value of the GPU clients is in the fact that they can process far more flops than even the SMP clients. What that gives is the ability to calculate a far bigger time-slice, which again gives more scientific value, even though the deadlines are not a large-scale difference from the SMP clients.

This is my interpretation of the reasons given. Please correct me if I'm wrong.
noorman
Posts: 270
Joined: Sun Dec 02, 2007 2:26 pm
Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHZ w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHZ w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
Location: Belgium, near the International Sea-Port of Antwerp

Re: Linux SMP v6 compared to Windows SMP client

Post by noorman »

shatteredsilicon wrote:I'm not sure I follow that logic. In this particular case, why is it more important you get the next one set of results in 4 hours than 4 sets of results in 12 hours? What is it that this faster turnaround gains you that isn't outweighed by getting 25-30% more done in the long term? The point is that with better scalability, more science gets done overall. We're not talking about long time intervals here, either.


The WUs are all tiny pieces of one timeline (per project); the results of a finished WU give Stanford new parameters to inject into a new WU, and so on, and so on ...

That's the simple (maybe too simple) explanation.

- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Linux SMP v6 compared to Windows SMP client

Post by bruce »

noorman wrote:The WU's all are a tiny piece of 1 timeline (per project); the results of a finished WU give Stanford new parameters to inject in a new WU and so on, and so on ...

That's the simple (maybe too simple) explanation.
Essentially correct . . . but somewhat too simple. Each WU is a tiny piece of a single timeline, but there are a number of timelines within a single project. Until a WU is returned, the next WU for that same timeline cannot be created.

("Timeline" = "Trajectory"
Each PRC is a separate trajectory. Each Gen is another piece of the same trajectory.)
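
The dependency described here can be sketched as a toy model. This is not the actual assignment-server logic, which isn't described in this thread. The class and method names are invented, the project number is made up, and the sketch assumes PRC means a (project, run, clone) triple.

```python
class TrajectoryQueue:
    """Toy model: one entry per (project, run, clone); gens are strictly serial."""

    def __init__(self):
        self.next_gen = {}        # (project, run, clone) -> next gen to hand out
        self.outstanding = set()  # trajectories with a WU currently assigned

    def add_trajectory(self, prc):
        self.next_gen[prc] = 0

    def assign(self):
        """Return (prc, gen) for a trajectory with no WU outstanding, else None."""
        for prc, gen in self.next_gen.items():
            if prc not in self.outstanding:
                self.outstanding.add(prc)
                return prc, gen
        return None

    def complete(self, prc):
        """A WU came back: only now can the next gen of that trajectory exist."""
        self.outstanding.remove(prc)
        self.next_gen[prc] += 1


queue = TrajectoryQueue()
queue.add_trajectory((1234, 0, 0))   # hypothetical project 1234, run 0, clone 0
queue.add_trajectory((1234, 0, 1))   # same project and run, clone 1

first = queue.assign()      # gen 0 of clone 0
second = queue.assign()     # gen 0 of clone 1
blocked = queue.assign()    # nothing left until a WU is returned
queue.complete((1234, 0, 0))
unblocked = queue.assign()  # gen 1 of clone 0 can now be handed out
print(first, second, blocked, unblocked)
```

Extra clients can work on other trajectories in parallel, but no number of clients can start gen 1 of a trajectory before gen 0 has been returned.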
shatteredsilicon
Posts: 87
Joined: Tue Jul 08, 2008 2:27 pm
Hardware configuration: 1x Q6600 @ 3.2GHz, 4GB DDR3-1333
1x Phenom X4 9950 @ 2.6GHz, 4GB DDR2-1066
3x GeForce 9800GX2
1x GeForce 8800GT
CentOS 5 x86-64, WINE 1.x with CUDA wrappers

Re: Linux SMP v6 compared to Windows SMP client

Post by shatteredsilicon »

Thanks for the clarification. That makes sense WRT usefulness.

As far as parallel performance scalability is concerned, the standard optimization paradigm says: "Vectorize inner loops, parallelize outer loops." Presumably, that is the paradigm followed in the SMP F@H client. Just out of interest - what compiler are F@H cores built with? ICC's optimizer can do vectorizing automatically for reasonably written code, as well as auto-parallelizing. Assembly can do the same, but I'm just wondering if leveraging a better compiler has been explored for F@H. I have personally seen speed improvements of up to 7x (700%) from using ICC to compile my own number crunching libraries (pure C++) compared to GCC. Just a thought. It might provide scope for completely avoiding MPI and some of the overheads. I'd try it myself, but F@H is closed source...
John Naylor
Posts: 357
Joined: Mon Dec 03, 2007 4:36 pm
Hardware configuration: Q9450 OC @ 3.2GHz (Win7 Home Premium) - SMP2
E7500 OC @ 3.66GHz (Windows Home Server) - SMP2
i5-3750k @ 3.8GHz (Win7 Pro) - SMP2
Location: University of Birmingham, UK

Re: Linux SMP v6 compared to Windows SMP client

Post by John Naylor »

You say reasonably written code... all the FAH cores are hand-coded to get the most out of the hardware they are using (well... maybe except the a1 SMP core lol)... so I guess that might negate some of the advantages of a new compiler. And besides, the Pande Group always wants more speed so I would guess they regularly look at their compilers to see if a new one can make the code run more efficiently and therefore faster :P

EDIT: I would also guess that the answer is no, new compilers cannot make the cores faster, for single core clients anyway... check the build dates, most are from 2006 on the older cores :P
Folding whatever I'm sent since March 2006 :) Beta testing since October 2006. www.FAH-Addict.net Administrator since August 2009.