Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 10:16 am
by HaloJones
A single-core fast processor will be faster than a multi-processor. 1x4 is better than 2x2 unless the "process" is genuinely multi-threaded without intra-thread dependencies. With a truly multi-threaded independent process, you would still need dedicated cache per core and this is missing from all but the very latest processors. I run a lot of Sun Niagara servers which can run 32 threads across 8 cpus but this is only efficient because I'm running massively multi-threaded (80 threads each, two per system) weblogic instances. There's little to no intra-thread dependency.
Multi-threaded Folding has been demonstrated to be more efficient when using the SMP Process Affinity service to bind threads to cores. This works not only on quads but also duals.
I really don't think you two (MtM and shatteredsilicon) are actually discussing the same things.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 10:26 am
by MtM
HaloJones wrote:I could run a single Windows SMP client on my quad and get around 2200ppd. (Let's pretend that the ppd reflects the benefit to Stanford.)
I actually run two Linux VMs each limited to 2-cpus. Each Linux VM runs two Linux SMP clients. This gets 4400ppd.
Now some people get all hot under the collar and complain that I'm delaying the return of the results but if they're making the deadline that can't be a problem for Stanford - after all, they set the deadline, not me. Secondly, it must also be of benefit to Stanford because more work gets done.
And finally, doesn't it prove that running a single SMP client on a quad is a massive waste of potential resource?
Let's pretend for this one post, OK, but only because you ask, not because it's true.
I too moved from a single Windows SMP client to dual VMs (and trust me, you're not running 4 SMP clients on a quad core, that would be 16 threads; I think you're running 2, just like me), and the return times per client are almost the same as with the single Windows client. The a2 core for Linux is just much more efficient.
If you compare Windows SMP with Linux SMP, you could already say Windows donors are wasting resources, because the Windows scheduler is so much worse at the task at hand than the Linux variant. So you are right with your last line, imo, were it not for the importance of return times. Now I think it's safe to say return times are more important, because they need their WU back before giving out the new one, and this means the quicker you do one WU the better. But if you can do two WUs with just a slight increase in time, then you're getting to a point where I cannot answer. In fact I asked this of one of the scientists involved recently, but have yet to get a reply directly to this scenario.
@HaloJones, you're 100% right about 1x4 being better in your example, but shatteredsilicon isn't using that example; we're talking strictly about folding here, afaik.
Afaik the intra-thread dependency is because of the complexity of the work units. It's not just atoms in a work unit, it's (kinetic?) forces as well. I think SMP runs multiple threads with intra-thread dependency to allow these work units to run within a reasonable time frame, each thread having its singular purpose in the total work unit. Which is also why 4x1 > 1x4.
CPU speed isn't everything. Being able to execute 4 times the instructions in the same time frame does not necessarily mean you can get through 4x the instructions when they depend on one another. This is what he is failing to see.
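The 1x4 versus 2x2 question argued over above turns largely on how much of the work can actually proceed in parallel. A rough way to put numbers on that is Amdahl's law. The sketch below is only an illustration: the function names are made up for this example, it assumes both CPUs do the same work per clock, and it ignores cache, scheduler and MPI overheads, all of which matter for the real SMP client.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(clock_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Work rate relative to one 1 GHz core, assuming equal work per clock."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Compare one 4 GHz core with two 2 GHz cores for several workloads.
    for p in (0.5, 0.9, 1.0):
        single = relative_throughput(4.0, 1, p)
        dual = relative_throughput(2.0, 2, p)
        print(f"parallel fraction {p:.1f}: 1x4GHz = {single:.2f}, 2x2GHz = {dual:.2f}")
```

Under these simplifying assumptions the dual-core only draws level with the single fast core when the work is fully parallel; any serial or dependent portion puts it behind.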
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 10:32 am
by HaloJones
If they're dependent on each other then surely there must be a sequential aspect that prevents them being efficient on a 4x1 instead of a 1x4.
Oh, and "trust me, you're not running 4 smp clients on a quad core"? Do you have any idea how patronising that sounds? I'm a Technical Architect at a massive UK retailer with 25 years in IT. I know precisely what I'm doing, thank you very much! I was running two smp units on each vm so yes, that was 16 fahcore_a? processes on a quad. Why would you think that isn't possible; it's very possible. Must I provide pictorial proof?
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 10:40 am
by MtM
HaloJones wrote:If they're dependent on each other then surely there must be a sequential aspect that prevents them being efficient on a 4x1 instead of a 1x4.
Oh, and "trust me, you're not running 4 smp clients on a quad core"? Do you have any idea how patronising that sounds? I'm a Technical Architect at a massive UK retailer with 25 years in IT. I know precisely what I'm doing, thank you very much! I was running two smp units on each vm so yes, that was 16 fahcore_a? processes on a quad. Why would you think that isn't possible; it's very possible. Must I provide pictorial proof?
Sorry for sounding patronizing, that was not the intent. But, imho, that's very inefficient. I get over 5K ppd with dual VMs on a Q6600, and each VM runs one SMP instance. I wouldn't even know how to run 2 SMP clients per VM. I trust you that it is possible, but I just run notfred's for its ease of use and simplicity. I still think it's a bit off for you to push an SMP client onto a single core; that's not what you're supposed to do and it's not efficient.
Btw your first line is exactly what I'm saying I think is the case here.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 12:18 pm
by HaloJones
Hmm, if you're running notFred's distro you don't have a gui but try this instead. You presumably know the IPs of your VMs, so use something like putty to ssh onto the notFred VM and once logged in, try running "top". What you'll see is that the VM doesn't show 100% usage on the cpu. The four cores running on two actual cpus should fully utilise the two but it doesn't. Even the a2 cores don't. That delta is why it's more efficient to run two clients on each VM. With eight processes running in each VM, you'll see full utilisations of each cpu within the VM.
So, I've just fired up a VM. With a single instance of an a1 unit, the VM shows an average idle time of around 15%. That's four processes on two cpus yet it doesn't fully load them. That holds true with two VMs on a quad. You'll be wasting around 15% of the capability of the quad. You should really run three of NotFred's distros and see what you gain. Even with the disadvantage of increased context switching you'll still benefit in terms of total units crunched. Each one will take longer and only you can say if they still fit inside the deadlines. With my quad at 3.2GHz and running 24/7 the deadlines are not an issue.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 12:24 pm
by WangFeiHong
@Mtm, shattersilicon,
relax it's only a question on CPU infrastructure and software (SMP).....besides 138 isn't that high
i wish they could have separate stats for SMP and classic... then we can compare FLOPS (tho SMP would probably own it coz everyone's using duo, tris, quads and all at high clocks :O)
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 1:14 pm
by MtM
HaloJones wrote:Hmm, if you're running notFred's distro you don't have a gui but try this instead. You presumably know the IPs of your VMs, so use something like putty to ssh onto the notFred VM and once logged in, try running "top". What you'll see is that the VM doesn't show 100% usage on the cpu. The four cores running on two actual cpus should fully utilise the two but it doesn't. Even the a2 cores don't. That delta is why it's more efficient to run two clients on each VM. With eight processes running in each VM, you'll see full utilisations of each cpu within the VM.
So, I've just fired up a VM. With a single instance of an a1 unit, the VM shows an average idle time of around 15%. That's four processes on two cpus yet it doesn't fully load them. That holds true with two VMs on a quad. You'll be wasting around 15% of the capability of the quad. You should really run three of NotFred's distros and see what you gain. Even with the disadvantage of increased context switching you'll still benefit in terms of total units crunched. Each one will take longer and only you can say if they still fit inside the deadlines. With my quad at 3.2GHz and running 24/7 the deadlines are not an issue.
I know they don't utilise the full processing potential, I said that in a previous post, but you're missing my point, I'm afraid. 100% core usage is not the same as 100% efficiency with regard to the project; the project is happier with lower CPU utilization in combination with faster returns for each individual work unit, afaik. Of course throughput of many WUs is important, but judging from the quote I posted, the 'new approach' is very much serial in nature and benefits more from speed than from sheer volume of work units. I might be interpreting it wrong, which is why I asked the person to reply to an extended 'fictional case' that a team mate and I thought of, which would give a more definitive answer as to which way is more beneficial to the project. I think my point of view is correct, but my team mate took certain parts of the quote as the whole story. That is something I do not agree with, but also something I cannot disagree with based on that particular part of the quote, since it leaves room for either argument.
WangFeiHong wrote:@Mtm, shattersilicon,
relax it's only a question on CPU infrastructure and software (SMP).....besides 138 isn't that high
i wish they could have separate stats for SMP and classic... then we can compare FLOPS (tho SMP would probably own it coz everyone's using duo, tris, quads and all at high clocks :O)
That would still not give an accurate picture of the science involved, as there is more to it than FLOPS alone. Complexity is key, longer simulation times in a single WU are key, and both depend not only on FLOPS but also on architecture. Shatteredsilicon, with his technical engineering background, should know that, which is why he does get to me when he takes a stance opposite to what I believe he should know.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 1:28 pm
by Ren02
shatteredsilicon wrote:Specifically, on the setup that I'm using, I have 6x GPU clients and a quad-core CPU. Each GPU client consumes around 20% of a single core (it's actually a bit less, but 20% is 1/5, a nice round number for the sake of the explanation). Here's an ASCII-art example of what happens when all run at the same time, with a single SMP client.
First line is the CPU core number. G is GPU client's CPU usage, I is idle CPU time, S is SMP client's CPU time.
Code:
0 1 2 3
G G G G
G I G I
S S S S
S S S S
S S S S
Essentially, it means that there's 40% of one core not being used, because the MPI scheduler is not quite up to the task of balancing the workload distributed to the cores. What happens then is that the OS process scheduler notices that the SMP cores want to use more CPU, so it tries to reschedule things to optimize the CPU utilization, and starts throwing processes around from core to core trying to do a better job. This starts introducing CPU migration latencies (typically 100-150ns) all over the place, and the performance drops through the floor. If you had a single core (or bound a process set, such as a single instance of an SMP client, to a single core) this imbalance and process migration wouldn't happen, thus yielding a massive saving in wasted CPU time, which will increase the WU throughput. As I said, I have observed a difference of 2x under real-world conditions.
Even theoretically, 1x4GHz will come out at worst equal and on average significantly ahead of a 2x2GHz solution. Trust me - I'm a computer scientist.
Hmmm... But why run 4 SMP clients? You could have 1 SMP client use 60% of 2 cores and second one 80% of 2 cores:
Code:
00 01 02 03
G1 G3 G4 G6
G2 S2 G5 S2
S1 S2 S1 S2
S1 S2 S1 S2
S1 S2 S1 S2
Or does 4 clients give significantly higher PPD than 2?
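The two ASCII diagrams above can be tallied mechanically to check the percentages being quoted. This is just a small sketch for reading the grids: it assumes, as in the diagrams, that every cell stands for 20% of one core, and `core_share` is a name invented for this example.

```python
from collections import Counter

SLICE_PERCENT = 20  # each row is one fifth of a core, as in the diagrams


def core_share(grid: list[str]) -> Counter:
    """Total share of one core (in percent) used by each label in the grid."""
    usage = Counter()
    for row in grid:
        for label in row.split():
            usage[label] += SLICE_PERCENT
    return usage


# shatteredsilicon's layout: 6 GPU clients plus one SMP client.
single_smp = [
    "G G G G",
    "G I G I",
    "S S S S",
    "S S S S",
    "S S S S",
]

# Ren02's proposed layout: two SMP clients, each locked to two cores.
two_smp = [
    "G1 G3 G4 G6",
    "G2 S2 G5 S2",
    "S1 S2 S1 S2",
    "S1 S2 S1 S2",
    "S1 S2 S1 S2",
]

if __name__ == "__main__":
    print("single SMP idle:", core_share(single_smp)["I"], "% of one core")
    print("two SMP idle:   ", core_share(two_smp)["I"], "% of one core")
    print("S1:", core_share(two_smp)["S1"], "%  S2:", core_share(two_smp)["S2"], "%")
```

In the first grid the idle cells add up to 40% of one core; in the second there are none, with S1 at 120% (60% of two cores) and S2 at 160% (80% of two cores). Whether the real clients would actually settle into that second layout is exactly the question being asked.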
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 1:38 pm
by WangFeiHong
1. He is using 1 SMP client. Read the table carefully: the "S" represents usage of the core.
2. There is no throttling in SMP clients (viewtopic.php?f=46&t=6888&start=0&st=0&sk=t&sd=a&hilit=throttle) so you can't specify % for each core/client
3. I think it was mentioned here that cores need to play catch-up with each other in SMP, which is especially obvious at the end of a WU, when you can see some of the 4 threads still running while the others have finished and are idling.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 1:52 pm
by Ren02
WangFeiHong wrote:1. He is using 1 SMP client. Read the table carefully: the "S" represents usage of the core.
Yes I know, but he also said that he runs 4 SMP clients to compensate for this loss. I'm really interested if it yields higher PPD than 2 clients or not. Especially with A2 core.
2. There is no throttling in SMP clients (viewtopic.php?f=46&t=6888&start=0&st=0&sk=t&sd=a&hilit=throttle) so you can't specify % for each core/client
You can specify core affinity and use process priority to achieve this. If GPU clients have higher priority then SMP client will use whatever is left over on the cores that it's been locked to.
3. I think it was mentioned here that cores need to play catch-up with each other in SMP, which is especially obvious at the end of a WU, when you can see some of the 4 threads still running while the others have finished and are idling.
That's just packing the results + clean-up of temporary files.
The catch-up game is necessary in the middle of a WU.
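The affinity-plus-priority idea in the reply above can be tried out on Linux with a few lines of Python. This is only a sketch of the mechanism: `pin_to_cores` is a hypothetical helper, the `os.sched_*` calls are not available on every platform, and it pins the script's own process rather than a real folding client.

```python
import os


def pin_to_cores(pid: int, cores: set[int]) -> set[int]:
    """Restrict a process to the given cores and return the resulting mask.

    Only cores the process is already allowed to use are requested.
    """
    allowed = os.sched_getaffinity(pid)
    target = cores & allowed
    if not target:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_setaffinity"):
        available = os.sched_getaffinity(0)  # pid 0 means the calling process
        print("pinned to", pin_to_cores(0, {min(available)}))
        # Lower this process's priority so higher-priority work on the same
        # core runs first and this process takes what is left over.
        print("niceness is now", os.nice(5))
    else:
        print("CPU affinity calls are not available on this platform")
```

The design point is the one made in the post: affinity decides where a process may run, while priority decides who gets the core first when two processes share it.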
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 6:59 pm
by Sahkuhnder
HaloJones wrote:Now some people get all hot under the collar and complain that I'm delaying the return of the results but if they're making the deadline that can't be a problem for Stanford - after all, they set the deadline, not me. Secondly, it must also be of benefit to Stanford because more work gets done.
"Some people" are the scientific researchers at The Pande Group. Remember them? The really smart ones with all the letters after their names? The ones we are all trying to help?
By running two SMP clients you are delaying the return of the results. Period.
The deadlines have little to do with the scientific value, which is best served by results returned absolutely as soon as possible. The deadlines allow extra time for the return of the results for practical considerations like power, network and server outages.
The Pande Group has made their position clear. Sorry, but they know better than you do. There is no possible way to rationalize your choice to not follow their policy. I don't understand when people choose to ignore the policy of the actual researchers and then try to justify their actions with lame excuses of how they know more than those very researchers about the optimal way the clients are best run.
Assisting the scientific researchers is the goal. Your point score is just for fun.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 7:56 pm
by 7im
When running a race, the goal is to finish first. NOT to finish just before they close the race course.
Finishing a work unit just before the deadline is not sufficient justification for folding in that manner. It is not as scientifically helpful. It may score you more PPD, and that is currently PERMITTED under the current folding rules, but don't delude yourself into thinking that you are helping the project more by doing it that way. It's simply producing more PPD, and helping the project less than optimally. But since all donations are welcomed, we really shouldn't pass judgement on how you choose to donate, so long as you don't claim the higher ground, which you don't have when running multiple clients on a quad.
Pande Group gives recommendations for a reason. Folding is a relay race: returning each WU as quickly as possible is what helps the project most. Follow the recommendations or don't, but higher PPD is NOT exactly equal to helping the project more. Fold on.
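The throughput-versus-turnaround trade-off being argued here can be put into a toy model. It is only a sketch: the `Setup` class and the hour figures are hypothetical, picked for easy arithmetic rather than taken from any real client, and it assumes, following the relay-race analogy, that each WU in a chain cannot be issued until the previous one has been returned. Upload, download and assignment delays are ignored.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    name: str
    clients: int         # work units in flight at once
    hours_per_wu: float  # wall-clock time for each client to finish one WU

    def wus_per_day(self) -> float:
        """Total work units returned per day across all clients."""
        return self.clients * 24.0 / self.hours_per_wu

    def days_for_chain(self, generations: int) -> float:
        """Days to finish a chain in which each WU needs the previous result."""
        return generations * self.hours_per_wu / 24.0


if __name__ == "__main__":
    # Hypothetical figures: two clients sharing the quad each run slower.
    one_client = Setup("one SMP client", clients=1, hours_per_wu=12.0)
    two_clients = Setup("two SMP clients", clients=2, hours_per_wu=15.0)
    for setup in (one_client, two_clients):
        print(f"{setup.name}: {setup.wus_per_day():.1f} WU/day, "
              f"{setup.days_for_chain(100):.1f} days for a 100-WU chain")
```

With these made-up numbers the two-client setup returns more WUs per day, yet any single chain of dependent WUs takes longer to complete. Which of the two effects matters more to the science is the point in dispute in this thread.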
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Sun Nov 16, 2008 9:35 pm
by shatteredsilicon
MtM wrote:shatteredsilicon wrote:Specifically, on the setup that I'm using, I have 6x GPU clients and a quad-core CPU. Each GPU client consumes around 20% of a single core (it's actually a bit less, but 20% is 1/5, a nice round number for the sake of the explanation). Here's an ASCII-art example of what happens when all run at the same time, with a single SMP client.
First line is the CPU core number. G is GPU client's CPU usage, I is idle CPU time, S is SMP client's CPU time.
Code:
0 1 2 3
G G G G
G I G I
S S S S
S S S S
S S S S
Essentially, it means that there's 40% of one core not being used, because the MPI scheduler is not quite up to the task of balancing the workload distributed to the cores. What happens then is that the OS process scheduler notices that the SMP cores want to use more CPU, so it tries to reschedule things to optimize the CPU utilization, and starts throwing processes around from core to core trying to do a better job. This starts introducing CPU migration latencies (typically 100-150ns) all over the place, and the performance drops through the floor. If you had a single core (or bound a process set, such as a single instance of an SMP client, to a single core) this imbalance and process migration wouldn't happen, thus yielding a massive saving in wasted CPU time, which will increase the WU throughput. As I said, I have observed a difference of 2x under real-world conditions.
Even theoretically, 1x4GHz will come out at worst equal and on average significantly ahead of a 2x2GHz solution. Trust me - I'm a computer scientist.
You are a scientist? Why then have you got problems with rebutting the above quote, which you try to do with examples and arguments which aren't specific to the question/situation at hand? Trust me, I am not a scientist but I am eligible for Mensa and I can see you're just not hitting the mark with your 'observations'. (Sorry, people who read this, but he said he's a 'scientist' lol)
OK - explain how it isn't specific to the question/situation at hand. The subject line says: "Dual-core 2Ghz vs Single-core 4Ghz - Which faster?". You really should learn to read, since you don't seem to have parsed the subject line, let alone the contents of the post. I have answered this question with theory background and practical examples, both specifically relating to the folding SMP client and the general concept of breaking up a process into parallelizable tasks. Which part of that did you miss?
MtM wrote:shatteredsilicon wrote:I promise you, I'm not picking on you specifically. I apologize if it looks that way. Perhaps I shouldn't have inserted a snippet from your post, and stuck purely to answering the original question asked. I'll try to not repeat that mistake.
It's a mistake if you think you can talk all this nonsense and think I will not respond to correct your pile of BS.
You're not answering any original question at all. You're only repeating this discussion in a thread I posted in before you because you're hoping I will get a bit red hot or something; that's your motive, nothing else.
Not really. I've given up on educating you.
MtM wrote:Single core is not the same, first of all which current cpu has a single core variant? Oww... ok then.
Since you asked - among the current line of x86 CPUs:
Core2 Solo and Athlon 64. Still in production, still available.
Oh, and many thanks Halo - you seem to have successfully gotten through the points I've been (unsuccessfully, I was almost about to conclude futilely) trying to make.
Fold on.
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Mon Nov 17, 2008 9:11 am
by MtM
shatteredsilicon wrote:MtM wrote:shatteredsilicon wrote:Specifically, on the setup that I'm using, I have 6x GPU clients and a quad-core CPU. Each GPU client consumes around 20% of a single core (it's actually a bit less, but 20% is 1/5, a nice round number for the sake of the explanation). Here's an ASCII-art example of what happens when all run at the same time, with a single SMP client.
First line is the CPU core number. G is GPU client's CPU usage, I is idle CPU time, S is SMP client's CPU time.
Code:
0 1 2 3
G G G G
G I G I
S S S S
S S S S
S S S S
Essentially, it means that there's 40% of one core not being used, because the MPI scheduler is not quite up to the task of balancing the workload distributed to the cores. What happens then is that the OS process scheduler notices that the SMP cores want to use more CPU, so it tries to reschedule things to optimize the CPU utilization, and starts throwing processes around from core to core trying to do a better job. This starts introducing CPU migration latencies (typically 100-150ns) all over the place, and the performance drops through the floor. If you had a single core (or bound a process set, such as a single instance of an SMP client, to a single core) this imbalance and process migration wouldn't happen, thus yielding a massive saving in wasted CPU time, which will increase the WU throughput. As I said, I have observed a difference of 2x under real-world conditions.
Even theoretically, 1x4GHz will come out at worst equal and on average significantly ahead of a 2x2GHz solution. Trust me - I'm a computer scientist.
You are a scientist? Why then have you got problems with rebutting the above quote, which you try to do with examples and arguments which aren't specific to the question/situation at hand? Trust me, I am not a scientist but I am eligible for Mensa and I can see you're just not hitting the mark with your 'observations'. (Sorry, people who read this, but he said he's a 'scientist' lol)
OK - explain how it isn't specific to the question/situation at hand. The subject line says: "Dual-core 2Ghz vs Single-core 4Ghz - Which faster?". You really should learn to read, since you don't seem to have parsed the subject line, let alone the contents of the post. I have answered this question with theory background and practical examples, both specifically relating to the folding SMP client and the general concept of breaking up a process into parallelizable tasks. Which part of that did you miss?
MtM wrote:shatteredsilicon wrote:I promise you, I'm not picking on you specifically. I apologize if it looks that way. Perhaps I shouldn't have inserted a snippet from your post, and stuck purely to answering the original question asked. I'll try to not repeat that mistake.
It's a mistake if you think you can talk all this nonsense and think I will not respond to correct your pile of BS.
You're not answering any original question at all. You're only repeating this discussion in a thread I posted in before you because you're hoping I will get a bit red hot or something; that's your motive, nothing else.
Not really. I've given up on educating you.
MtM wrote:Single core is not the same, first of all which current cpu has a single core variant? Oww... ok then.
Since you asked - among the current line of x86 CPUs:
Core2 Solo and Athlon 64. Still in production, still available.
Oh, and many thanks Halo - you seem to have successfully gotten through the points I've been (unsuccessfully, I was almost about to conclude futilely) trying to make.
Fold on.
Please
I've got nothing more to say than that. I explained why you're way off the mark here, and you still want to come across as trying to educate me...
Welcome to the ignore bin, I know it's lonely in there since you're the only one in it. Be proud
Re: Dual-core 2Ghz vs Single-core 4Ghz - Which faster?
Posted: Mon Nov 17, 2008 10:17 am
by osgorth
7im wrote:When running a race, the goal is to finish first. NOT to finish just before they close the race course.
...
Folding is a relay race: returning each WU as quickly as possible is what helps the project most. Follow the recommendations or don't, but higher PPD is NOT exactly equal to helping the project more. Fold on.
So, in essence what you're saying is that I should throw out my 16 GPUs and 6 SMPs and replace them with the fastest available single card and the fastest quad core CPU, just so I can return my two WUs as quickly as possible?
I highly doubt you mean that, but that's how it comes across.
Obviously, the more WUs returned, the better. Given that a quad-core CPU can process 2 SMP WUs in almost the same time as 1 WU, I see absolutely no reason whatsoever why this should be frowned upon.