Posted: Sun Dec 16, 2007 3:48 am
by MoneyGuyBK
I have been following this thread since its inception.

Question for the Gurus:
I am running 2 Linux SMPs in Ubuntu inside VMPlayer. It is all installed on a Vista machine.

I use the task manager in Windows Vista to assign affinities.
My CPU is a QX6800
So, I manually assign 2 cores each to each one of my 2 instances of VMPlayer.

Is this a good way of doing this, will it improve my machine's progress?
Or, based on my setup, is there a better way to improve performance?

I am hoping this question is not too far off the subject at hand.

Thanx in advance.

Peace

Posted: Sun Dec 16, 2007 4:27 am
by theMASS
dnamechanic wrote: About running VMware, I have found that WinXP running two VMware instances (Ubuntu) provides a gain in processing speed relative to two WinXP instances (even with your A-Ch). And, in my experience A-Ch has worked very well. Two instances of SMP under virtual Linux are roughly equivalent, or slightly better, compared to two instances of WinXP even with A-Ch. If the Linux instances have affinity set in WinXP for maximum performance, then the two Linux instances beat the purely WinXP processing.
In the testing I did I found 2 XP SMP clients with A-Ch resulted in a higher PPD than 2 VMware Lin SMP clients. I was getting

~4300 PPD with XP vs.
~4100 PPD with VMware.

This is more likely a result of the WUs received. 2653's on XP and a mix of 2605, 2608, and 2609 on Linux.

Posted: Sun Dec 16, 2007 11:56 am
by rilian
So.... is Linux+VMware really faster ?? :shock:

Posted: Sun Dec 16, 2007 1:35 pm
by dnamechanic
theMASS wrote: In the testing I did I found 2 XP SMP clients with A-Ch resulted in a higher PPD than 2 VMware Lin SMP clients. I was getting

~4300 PPD with XP vs.
~4100 PPD with VMware.

This is more likely a result of the WUs received. 2653's on XP and a mix of 2605, 2608, and 2609 on Linux.
The fold rates listed above are not an 'Apples to Apples' comparison. I think the last line sums it up. It is well known that fold rates can be quite different for different p-numbered Work Units.

Actually, the fold rates given above look representative, given that WinXP was folding p2653's and VMware Linux was folding a mix of p2605, p2608, and p2609.
rilian wrote: So.... is Linux+VMware really faster ??
I have not tried to ascertain what the mix of received p-numbered work units actually is. I recall reading that some members in this forum thought they received more of a mix of work units when folding with Linux, whereas the computers using WinXP received more p2653 type work units. If this is true, then at the present time, folding with the VMware Linux setup, even if it is faster, may not yield more PPD.

Posted: Sun Dec 16, 2007 2:11 pm
by dnamechanic
MoneyGuyBK wrote: I am running 2 Linux SMPs in Ubuntu inside VMPlayer. It is all installed on a Vista machine.
My experience is with WinXP and VMware Server. VMware Server seems to be slightly better for the SMP folding application; see Wiki: Comparison of Virtual Machines.

http://en.wikipedia.org/wiki/Comparison ... l_machines

Using VMware Player vs. VMware Server or other variations of VMware could possibly account for some of the performance differences that people are seeing in virtual Linux folding. Also, VMware Server allows use of VMtools, which helps with timing synchronization on AMD systems and such.

MoneyGuyBK wrote:I use the task manager in Windows Vista to assign affinities...
...So, I manually assign 2 cores each to each one of my 2 instances of VMPlayer.

Is this a good way of doing this, will it improve my machine's progress?
Or, based on my setup, is there a better way to improve performance?
To obtain maximum performance you must somehow determine the best combination of affinity assignments. Pairs of the cores 0,1,2, & 3 can be assigned in different ways. Examples of assignment:

- 0 paired with 1 and 2 paired with 3
- 0 paired with 2 and 1 paired with 3
- 0 paired with 3 and 1 paired with 2

Performance varies with the combination chosen. I determine experimentally which combination provides the best performance. Just select a combination, run it a while and record the PPD, then choose another and repeat until you are satisfied which combination is best. Once determined, this pairing is fine until Windows is rebooted. After a reboot, the best combination may be different than before.
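
For anyone who wants to script the bookkeeping, here is a minimal sketch that lists the three pairings above and the matching affinity bitmask for each pair. The function names `core_pairings` and `affinity_mask` are made up for illustration, and the bitmask (bit N set means core N is allowed) is just one common way to write an affinity down; Task Manager itself uses checkboxes.

```python
def core_pairings(cores=(0, 1, 2, 3)):
    """Return every way to split four cores into two unordered pairs."""
    cores = tuple(cores)
    first = cores[0]
    pairings = []
    # Fixing the first core and choosing its partner avoids counting
    # the same split twice (e.g. (0,1)+(2,3) and (2,3)+(0,1)).
    for partner in cores[1:]:
        pair_a = (first, partner)
        pair_b = tuple(c for c in cores if c not in pair_a)
        pairings.append((pair_a, pair_b))
    return pairings


def affinity_mask(pair):
    """Bitmask with one bit set per core, e.g. cores 0 and 1 -> 0b0011."""
    mask = 0
    for core in pair:
        mask |= 1 << core
    return mask


for pair_a, pair_b in core_pairings():
    print(pair_a, format(affinity_mask(pair_a), "04b"),
          pair_b, format(affinity_mask(pair_b), "04b"))
```

Each printed line is one candidate to run for a while and record PPD against, as described above.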

Posted: Sun Dec 16, 2007 2:53 pm
by toTOW
And don't forget that there will be some CPU time spent on virtualization operations ... it seems normal to see a little slowdown in virtualized clients ;)

Posted: Sun Dec 16, 2007 3:14 pm
by rilian
theMASS, thank you for this testing.

Maybe it was better without A-Ch, when winSMP gives 10% less PPD than VMware + LinuxSMP?

Can someone do additional testing ?

If winSMP + A-Ch gives 4300 PPD, then how many PPD is possible on the same machine with VMware?

Posted: Sun Dec 16, 2007 6:43 pm
by theMASS
The numbers I gave were based on the highest sustained numbers observed not averages... and rounded off. The two configurations were very close in terms of PPD.

Currently Windows is giving out almost exclusively 2653's, and Linux (when 2 cores are detected) mostly 2605's. My numbers are representative of Windows with 2x 2653 and Linux with 2x 2605, although they are about the same with 1x 2605 and 1x 2608.

They were run on the same machine. Q6600 @ 3.3GHz.

VMware is amazingly efficient, and yes, Server does perform slightly better than Player.

Posted: Sun Dec 16, 2007 7:23 pm
by MoneyGuyBK
My setup is VMPlayer running 2 instances of Linux_SMP inside a Vista 32-bit machine, on this hardware:
Dell XPS720H2C, QX6800 running at Bin+2 (3.47GHz) and 4 GB Dominator RAM OC'ed to 1066MHz

I get mostly if not all P_2605s (1760_Pointers?)
Running 2 instances, they finish in 18-20 hours, depending on whether I am running other apps.

So, assuming I do nothing but folding, and finishing two 2605s in 18 hours, I get about 4693 Points at the top end (4224 at low end) per day.
Not bad I would say.
On days I do video & photo editing the time goes up to 21.5 Hours (3929 Points at this level)
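
The arithmetic behind those figures, as a rough sketch: it assumes each P2605 is worth 1760 points (the value guessed at above) and that both instances finish one WU in the stated number of hours. The function name `points_per_day` is only for illustration.

```python
def points_per_day(points_per_wu, instances, hours_per_wu):
    """Points per day when each instance finishes one WU every hours_per_wu hours."""
    return points_per_wu * instances * 24.0 / hours_per_wu


POINTS_2605 = 1760  # assumed value per P2605 work unit, taken from the post

for hours in (18, 20, 21.5):
    print(hours, "hours ->", round(points_per_day(POINTS_2605, 2, hours)), "PPD")
```

With 18, 20 and 21.5 hours per WU this works out to roughly 4693, 4224 and 3929 PPD, matching the numbers quoted.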

I have run the machine at Bin+3 (3.73GHz) before, resulting in even better PPD because WUs finished up to one hour faster.
I don't run it at this level to keep Temps lower!

Although I have had issues at times, like finishing a WU and not getting any points (Resolved)
and recently, finishing a WU, but requiring manual input to upload results, see here:
http://foldingforum.org/viewtopic.php?t=328

I have not used the server edition of VMPlayer, so I could not comment on that.

Peace

Posted: Mon Dec 17, 2007 11:46 pm
by bruce
MoneyGuyBK wrote:I use the task manager in Windows Vista to assign affinities.
My CPU is a QX6800
So, I manually assign 2 cores each to each one of my 2 instances of VMPlayer.

Is this a good way of doing this, will it improve my machine's progress?
Or, based on my setup, is there a better way to improve performance?
I can't imagine that reassigning affinity every 10 minutes with the software tool is going to have any advantage over doing it once by hand.

It will matter which core is paired with which. The difference won't be large (maybe 10%, which isn't small, either), but there will be a difference because two cores share one cache, two cores share the other cache, and if data needs to move from one cache to the other, it has to move at the next slower data rate.
dnamechanic wrote:- 0 paired with 1 and 2 paired with 3
- 0 paired with 2 and 1 paired with 3
- 0 paired with 3 and 1 paired with 2
It's not clear to me that a software tool has any way of knowing which way to pair the cores. For that reason, if you figure it out, you should be able to do better than the SMP Affinity Changer unless it's incredibly lucky, in which case the results will be equal.

Posted: Tue Dec 18, 2007 12:11 am
by MoneyGuyBK
It will matter which core is paired with which. The difference won't be large (maybe 10%, which isn't small, either), but there will be a difference because two cores share one cache, two cores share the other cache, and if data needs to move from one cache to the other, it has to move at the next slower data rate.
Bruce, I can now attest to what you said being true.

I notice about 11% faster crunch times with pairing as follows:
Cores 0 & 1 and Cores 2 & 3, vs. slower if I do 0 & 2 with 1 & 3.
I even braved doing it as Cores 0 & 3 and Cores 1 & 2, which was also 11% slower.

* The best way is to pair the cores that share the same cache together.
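
As a small sketch of that rule: given a guess at which cores share a cache, pick the pairing that keeps each pair on one cache. The mapping below (cores 0-1 on one cache, cores 2-3 on the other) is an assumption that fits the timings above; it may not hold on other machines, and the names `SHARED_CACHE` and `pairs_sharing_cache` are hypothetical.

```python
# ASSUMPTION: cores 0 and 1 share one L2 cache, cores 2 and 3 share the other.
# This matches the timings reported above but is not guaranteed elsewhere.
SHARED_CACHE = {0: "A", 1: "A", 2: "B", 3: "B"}

PAIRINGS = [
    ((0, 1), (2, 3)),
    ((0, 2), (1, 3)),
    ((0, 3), (1, 2)),
]


def pairs_sharing_cache(pairing, cache_of=SHARED_CACHE):
    """Count how many pairs in a pairing sit entirely on one cache."""
    return sum(1 for a, b in pairing if cache_of[a] == cache_of[b])


# The preferred pairing is the one with the most same-cache pairs.
best = max(PAIRINGS, key=pairs_sharing_cache)
print(best)
```

If the cache layout is unknown, timing each pairing by hand as described earlier in the thread is still the safer route.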

Peace

Posted: Tue Dec 18, 2007 1:33 am
by bruce
MoneyGuyBK wrote: * The best way is to pair the cores that share the same cache together.
Peace
My test-bed isn't a Core2Quad, it's a dual-Xeon @3.0GHz. The two chips are physically separate and hyperthreaded, so it acts sort of like a Core2Quad although it's MUCH slower. A pair of virtual CPUs share a single chip, and when either one of them needs to talk to the cache in the other chip it has to go through the motherboard. That's where I came up with the 10% guess, even though I'm surprised that it's even close to what you're reporting.

Posted: Tue Dec 18, 2007 5:58 am
by MoneyGuyBK
....I'm surprised that it's even close to what you're reporting.
every .... can get lucky one time :wink:

Peace

Re: SMP Affinity Changer

Posted: Wed Dec 19, 2007 10:53 pm
by rilian
New version 1.0.4 released with 64-bit CPU support. It starts automatically after install, so no reboot or service start is needed.

Next version is about to be released very soon with single-threaded client support


DOWNLOAD

Posted: Thu Dec 20, 2007 6:12 am
by poopinstack
Check my specs http://foldingforum.org/viewtopic.php?f ... a&start=15

That's with affinity 1.0.2

I tried to install 1.0.4, but it said my processor wasn't supported. I tried the regular version, and the 64 bit version - no go with either. It did work with 1.0.3, however.