SMP Affinity Changer

This forum contains information about 3rd party applications which may be of use to those who run the FAH client and one place where you might be able to get help when using one of those apps.

Moderator: Site Moderators

MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Post by MoneyGuyBK »

I have been following this thread since its inception.

Question for the Gurus:
I am running 2 Linux SMPs in Ubuntu inside VMPlayer. It is all installed on a Vista machine.

I use the task manager in Windows Vista to assign affinities.
My CPU is a QX6800
So, I manually assign 2 cores each to each one of my 2 instances of VMPlayer.

Is this a good way of doing this, will it improve my machine's progress?
Or, based on my setup, is there a better way to improve performance?

I am hoping this Q is not too far off the subject at hand.

Thanx in advance.

Peace
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
theMASS
Posts: 65
Joined: Sun Dec 02, 2007 7:54 am
Hardware configuration: 8 - Q6600s (mixed steppings) average clock speed 3.3GHz
1 - E6420 @ 3.2GHz (This box is bulletproof - Stable for over a year) Avg 2250/PPD
1 - E6600 @ Stock (The only Intel chip in the last 2 years that "only" ran stock)
P35 and G33 based Gigabyte motherboards
Dedicated boxes run Notfred's Linux
Production boxes run Vista and XP Pro with Linux SMP running in a VM
Location: Los Angeles

Post by theMASS »

dnamechanic wrote: About running VMware, I have found that WinXP running two VMware instances (Ubuntu) provides a gain in processing speed relative to two WinXP instances (even with your A-Ch). And, in my experience, A-Ch has worked very well. Two instances of SMP under virtual Linux are roughly equivalent, or slightly better, compared to two instances of WinXP even with A-Ch. If the Linux instances have affinity set in WinXP for maximum performance, then the two Linux instances beat the purely WinXP processing.
In the testing I did I found 2 XP SMP clients with A-Ch resulted in a higher PPD than 2 VMware Lin SMP clients. I was getting

~4300 PPD with XP vs.
~4100 PPD with VMware.

This is more likely a result of the WUs received. 2653's on XP and a mix of 2605, 2608, and 2609 on Linux.
rilian
Posts: 53
Joined: Wed Dec 05, 2007 12:58 am

Post by rilian »

So.... is Linux+VMware really faster ?? :shock:
dnamechanic
Posts: 16
Joined: Tue Dec 04, 2007 10:42 pm
Location: Dallas, Texas

Post by dnamechanic »

theMASS wrote: In the testing I did I found 2 XP SMP clients with A-Ch resulted in a higher PPD than 2 VMware Lin SMP clients. I was getting

~4300 PPD with XP vs.
~4100 PPD with VMware.

This is more likely a result of the WUs received. 2653's on XP and a mix of 2605, 2608, and 2609 on Linux.
The fold rates listed above are not an 'Apples to Apples' comparison. I think the last line sums it up. It is well known that fold rates can be quite different for different p-numbered Work Units.

Actually, the fold rates given above look representative, given that WinXP was folding p2653's and VMware Linux was folding a mix of p2605, p2608, and p2609.
rilian wrote: So.... is Linux+VMware really faster ??
I have not tried to ascertain what the mix of received p-numbered work units actually is. I recall reading that some members in this forum thought they received more of a mix of work units when folding with Linux, whereas the computers using WinXP received more p2653 type work units. If this is true, then at the present time, folding with the VMware Linux setup, even if it is faster, may not yield more PPD.
dnamechanic
Posts: 16
Joined: Tue Dec 04, 2007 10:42 pm
Location: Dallas, Texas

Post by dnamechanic »

MoneyGuyBK wrote: I am running 2 Linux SMPs in Ubuntu inside VMPlayer. It is all installed on a Vista machine.
My experience is with WinXP and VMware Server. VMware Server seems to be slightly better for the SMP folding application; see Wiki: Comparison of Virtual Machines.

http://en.wikipedia.org/wiki/Comparison ... l_machines

Using VMware Player vs. VMware Server or other variations of VMware could possibly account for some of the performance differences that people are seeing in virtual Linux folding. Also, VMware Server allows use of VMtools, which helps with timing synchronization on AMD systems and such.

MoneyGuyBK wrote:I use the task manager in Windows Vista to assign affinities...
...So, I manually assign 2 cores each to each one of my 2 instances of VMPlayer.

Is this a good way of doing this, will it improve my machine's progress?
Or, based on my setup, is there a better way to improve performance?
To obtain maximum performance you must somehow determine the best combination of affinity assignments. Pairs of the cores 0,1,2, & 3 can be assigned in different ways. Examples of assignment:

- 0 paired with 1 and 2 paired with 3
- 0 paired with 2 and 1 paired with 3
- 0 paired with 3 and 1 paired with 2

Performance varies with the combination chosen. I experimentally determine which combination provides the best performance. Just select a combination, run it a while and record PPD, then choose another and repeat until you are satisfied which combination is best. Once determined, this pairing is fine until Windows is rebooted. After a reboot, the best combination may be different than before.
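For anyone who wants to list the combinations rather than work them out by hand, here is a minimal sketch. It only enumerates the pairings and prints a bitmask for each pair, using the usual convention that bit n stands for core n; it does not set affinity on anything. The function names are made up for illustration.

```python
def core_pairings(cores):
    """Return every way to split an even number of cores into unordered pairs."""
    cores = sorted(cores)
    if not cores:
        return [[]]
    first, rest = cores[0], cores[1:]
    result = []
    for partner in rest:
        # Pair the lowest remaining core with each possible partner,
        # then recurse on whatever cores are left over.
        remaining = [c for c in rest if c != partner]
        for tail in core_pairings(remaining):
            result.append([(first, partner)] + tail)
    return result


def affinity_mask(pair):
    """Build a bitmask where bit n set means core n is allowed."""
    mask = 0
    for core in pair:
        mask |= 1 << core
    return mask


for pairing in core_pairings([0, 1, 2, 3]):
    print(pairing, [hex(affinity_mask(p)) for p in pairing])
```

On a quad core this gives the same three combinations listed above. Each pair in a combination would go to one VMware instance, and you would record PPD for each combination before picking one.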
toTOW
Site Moderator
Posts: 6334
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Post by toTOW »

And don't forget that there will be some CPU time spent on virtualization operations ... it seems normal to see a little slowdown in virtualized clients ;)

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
rilian
Posts: 53
Joined: Wed Dec 05, 2007 12:58 am

Post by rilian »

theMASS, thank you for this testing.

Maybe it was better without A-Ch, when winSMP gave 10% less PPD than VMware + LinuxSMP ?

Can someone do additional testing ?

If winSMP + A-Ch gives 4300 PPD, then how many PPD are possible on the same machine with VMware ?
theMASS
Posts: 65
Joined: Sun Dec 02, 2007 7:54 am
Hardware configuration: 8 - Q6600s (mixed steppings) average clock speed 3.3GHz
1 - E6420 @ 3.2GHz (This box is bulletproof - Stable for over a year) Avg 2250/PPD
1 - E6600 @ Stock (The only Intel chip in the last 2 years that "only" ran stock)
P35 and G33 based Gigabyte motherboards
Dedicated boxes run Notfred's Linux
Production boxes run Vista and XP Pro with Linux SMP running in a VM
Location: Los Angeles

Post by theMASS »

The numbers I gave were based on the highest sustained numbers observed, not averages... and rounded off. The two configurations were very close in terms of PPD.

Currently Windows is giving out almost exclusively 2653's and Linux (when 2 cores are detected) mostly 2605's. My numbers are representative of Windows with 2x 2653 and Linux with 2x 2605, although they are about the same with 1x 2605 and 1x 2608.

They were run on the same machine. Q6600 @ 3.3GHz.

VMware is amazingly efficient, and yes, Server does perform slightly better than Player.
MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Post by MoneyGuyBK »

On my setup (VMPlayer running 2 instances of Linux_SMP) inside a Vista 32-Bit machine... with this setup:
Dell XPS720H2C, QX6800 running at Bin+2 (3.47Ghz) and 4Gig RAM Dominator OC'ed to 1066Mhz

I get mostly if not all P_2605s (1760_Pointers?)
Running 2 instances, they finish in 18-20 Hours depending on whether I am running other apps.

So, assuming I do nothing but folding, and finishing two 2605s in 18 hours, I get about 4693 Points at the top end (4224 at low end) per day.
Not bad I would say.
On days I do video & photo editing the time goes up to 21.5 Hours (3929 Points at this level)
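The arithmetic behind those figures can be sketched as below. It assumes 1760 points per 2605 WU (the figure I put a question mark on above) and two instances finishing at the same time; the function name is just for illustration.

```python
def points_per_day(points_per_wu, hours_per_wu, instances=1):
    """Points per day when each instance finishes one WU every hours_per_wu hours."""
    return instances * points_per_wu * 24.0 / hours_per_wu


# 1760 points per WU is an assumed value for p2605.
for hours in (18, 20, 21.5):
    print(hours, "hours per WU:", round(points_per_day(1760, hours, instances=2)), "PPD")
```

That reproduces the 4693, 4224 and 3929 numbers for 18, 20 and 21.5 hours.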

I have run the machine at Bin+3 (3.73Ghz) before, resulting in even better PPD due to finishing WUs at up to One Hour Faster times.
I don't run it at this level to keep Temps lower!

Although I have had issues at times, like finishing a WU and not getting any points (Resolved)
and recently, finishing a WU, but requiring manual input to upload results, see here:
http://foldingforum.org/viewtopic.php?t=328

I have not used the server edition of VMPlayer, so I could not comment on that.

Peace
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

MoneyGuyBK wrote:I use the task manager in Windows Vista to assign affinities.
My CPU is a QX6800
So, I manually assign 2 cores each to each one of my 2 instances of VMPlayer.

Is this a good way of doing this, will it improve my machine's progress?
Or, based on my setup, is there a better way to improve performance?
I can't imagine that reassigning affinity every 10 minutes with the software tool is going to have any advantage over doing it once by hand.

It will matter which core is paired with which. The difference won't be large (maybe 10%, which isn't small, either), but there will be a difference because two cores share one cache, two cores share the other cache, and if data needs to move from one cache to the other, it has to move at the next slower data rate.
dnamechanic wrote:- 0 paired with 1 and 2 paired with 3
- 0 paired with 2 and 1 paired with 3
- 0 paired with 3 and 1 paired with 2
It's not clear to me that a software tool has any way of knowing which way to pair the cores. For that reason, if you figure it out, you should be able to do better than the SMP Affinity Changer unless it's incredibly lucky, in which case the results will be equal.
MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Post by MoneyGuyBK »

It will matter which core is paired with which. The difference won't be large (maybe 10%, which isn't small, either), but there will be a difference because two cores share one cache, two cores share the other cache, and if data needs to move from one cache to the other, it has to move at the next slower data rate.
Bruce, I can now attest to what you said being true.

I notice about 11% faster crunch times when pairing Cores 0 & 1 and Cores 2 & 3, compared with pairing 0 & 2 and 1 & 3.
I even braved pairing Cores 0 & 3 and Cores 1 & 2, resulting in times about 11% slower.

* The best way is to pair the cores that share the same cache together.
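As a rough sketch of that rule: the snippet below checks whether a pairing keeps each instance inside one cache. The cache layout (cores 0 and 1 on one cache, cores 2 and 3 on the other) is an assumption taken from my timings above, not something read from the hardware, and the names are made up for illustration.

```python
# Assumed layout inferred from the timings above: cores 0,1 share one cache, cores 2,3 the other.
ASSUMED_CACHE_GROUPS = [{0, 1}, {2, 3}]


def shares_cache(pair, cache_groups=ASSUMED_CACHE_GROUPS):
    """True if both cores of the pair sit in the same cache group."""
    return any(set(pair) <= group for group in cache_groups)


def keeps_caches_local(pairing, cache_groups=ASSUMED_CACHE_GROUPS):
    """True if every pair in the pairing stays within a single cache group."""
    return all(shares_cache(pair, cache_groups) for pair in pairing)


for pairing in ([(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 3), (1, 2)]):
    print(pairing, "stays within one cache per instance:", keeps_caches_local(pairing))
```

If the core numbering on your machine differs, change the assumed groups to match whatever your own timing runs show.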

Peace
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Post by bruce »

MoneyGuyBK wrote: * The best way is to pair the cores that share the same cache together.
Peace
My test-bed isn't a Core2Quad, it's a dual-Xeon @ 3.0GHz. The two chips are physically separate and hyperthreaded, so it acts sort of like a Core2Quad although it's MUCH slower. A pair of virtual CPUs share a single chip, and when either one of them needs to talk to the cache in the other chip it has to go through the motherboard. That's where I came up with the 10% guess, even though I'm surprised that it's even close to what you're reporting.
MoneyGuyBK
Posts: 179
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Post by MoneyGuyBK »

....I'm surprised that it's even close to what you're reporting.
every .... can get lucky one time :wink:

Peace
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
rilian
Posts: 53
Joined: Wed Dec 05, 2007 12:58 am

Re: SMP Affinity Changer

Post by rilian »

New version 1.0.4 released with 64-bit CPU support. It starts automatically after install (so no reboot/service start is needed).

The next version, with single-threaded client support, is about to be released very soon.


DOWNLOAD
poopinstack
Posts: 1
Joined: Thu Dec 20, 2007 5:37 am

Re: SMP Affinity Changer

Post by poopinstack »

Check my specs http://foldingforum.org/viewtopic.php?f ... a&start=15

That's with affinity 1.0.2

I tried to install 1.0.4, but it said my processor wasn't supported. I tried the regular version, and the 64 bit version - no go with either. It did work with 1.0.3, however.
Rig 1
q6600 @ 3.6 GHz on Water
8800GT

Rig 2
Athlon 64 X2 5600+
8800GT