my recommended configuration for >= 8-cores
Posted: Tue Jan 27, 2009 5:37 am
by alpha754293
If you have a system with at least 8 cores, my recommended configuration would be (in order of preference) Ubuntu 8.04 Server (text-only console), CentOS 5.2, or SLES 10 SP2.
Since you can't tell whether you're going to get an a1 WU or an a2 WU, the best thing to do is to run two clients, both with the "-smp 4" flag.
That way, if you get an a1 WU, you'd be running it at its max, and if you get two a2 WUs, you can run them at the same time.
I'm not entirely sure how well this would work for 4 cores (you'd probably only want to run one client anyway), or if you have a 4-core HTT-capable CPU. (Although I've previously mimicked HTT and found that you only get about 95% or so of your possible PPD.)
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 4:36 pm
by twistedspark
How about Win OS's on a home PC? I have an i7, so 8 cores, but no server OS. Just Vista x64 and x86 XP SP3.
Also, just the [-smp 4] flag? No other flags necessary?
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 4:42 pm
by alpha754293
twistedspark wrote:How about Win OS's on a home PC? I have an i7, so 8 cores, but no server OS. Just Vista x64 and x86 XP SP3.
Also, just the [-smp 4] flag? No other flags necessary?
I don't have an official statement with regard to HTT, only because I don't have a system to test HTT with.
(I wish I did, but the closest thing I've got is a Q9550, and there might be some substantial architectural differences between that and the Core i7.)
However, in my simulated results (I have a server with 8 native cores, and it is NOT HTT-capable (AMD)), I found that I took about a 4% performance penalty when running with simulated HTT.
That may NOT be indicative of real-world HTT results, though.
For Windows, because the client is restricted to only 4 cores, my only suggestion would be to try it with and without HTT and use whichever works best for you.
You can TRY to run two SMP clients, since the Linux client (desktop or server distributions) is the only F@H client capable of using all 8 cores (possibly more, but unverified).
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 4:52 pm
by twistedspark
O.K. Yeah, I was planning on testing this out running two smp cores with the -smp 4 flag on each. The i7 lets you set core affinity to the folding exe's, so I'd run one instance on all four physical cores, and the other on all four logical cores only. That way any performance differential would not matter.
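Since the plan is to split the two clients by affinity, here is what that looks like numerically. On Windows, `start /affinity` takes the processor affinity as a hexadecimal bitmask in which bit n selects logical CPU n. This Python sketch only does that arithmetic; `affinity_mask` is a hypothetical helper, and the even/odd split assumes the physical cores are the even-numbered ones, which is itself an unverified guess.

```python
def affinity_mask(cpus):
    """Return the hex bitmask (bit n = logical CPU n) selecting the given CPUs.

    This is the number format Windows' `start /affinity` expects.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU index: {cpu}")
        mask |= 1 << cpu  # set the bit for this logical CPU
    return format(mask, "X")

# Hypothetical split: even-numbered CPUs for one client, odd-numbered for the other.
print(affinity_mask([0, 2, 4, 6]))  # 55
print(affinity_mask([1, 3, 5, 7]))  # AA
```

Under that assumption, one client would be launched with `start /affinity 55 ...` and the other with `start /affinity AA ...`, with the rest of each command line (the client executable and its flags) unchanged.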
I just wanted to know if I needed any other flags. I have yet to see a list of all the flags and what they each mean. EG: I've seen -smp -verbosity 9
What the heck does verbosity 9 do?
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 4:56 pm
by alpha754293
twistedspark wrote:O.K. Yeah, I was planning on testing this out running two smp cores with the -smp 4 flag on each. The i7 lets you set core affinity to the folding exe's, so I'd run one instance on all four physical cores, and the other on all four logical cores only. That way any performance differential would not matter.
I just wanted to know if I needed any other flags. I have yet to see a list of all the flags and what they each mean. EG: I've seen -smp -verbosity 9
What the heck does verbosity 9 do?
Don't remember.
Can you tell which cores are physical and which cores are logical on the i7? I would think that they'd just show up as 8 cores. If you have a way to tell (explicitly), then that's great.
Remember that HTT doesn't replicate the physical FPUs that do all the work, so it would be interesting to see what happens.
I really don't know. I didn't think that Windows XP supported more than 4 cores (logical or physical). *shrug*
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 4:58 pm
by twistedspark
Never mind that last part. I just found the wiki flag explanations:
-verbosity x
Sets the detail level of the output written to screen and to the fahlog.txt. Options are from 1 to 9 (max). The default is 3. Level 9 is helpful for diagnosing problems and helpful when reporting them to the F@h development team. Supported in client versions v3.x, 4.x, 5.x, 6.x.
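To make the flag combinations discussed here concrete, the sketch below assembles an argument list for one client using only the two flags from this thread, `-smp` and `-verbosity`. `client_args` is a hypothetical helper, not part of the F@H client; the 1 to 9 range and the default of 3 come from the wiki text quoted above.

```python
def client_args(smp=None, verbosity=3):
    """Return the command-line flags for one client instance.

    smp: None (or False/0) for no -smp flag, True for a bare "-smp",
         or an int such as 4 for "-smp 4".
    verbosity: 1 to 9; the client default is 3, so 3 adds no flag.
    """
    if not 1 <= verbosity <= 9:
        raise ValueError("verbosity must be between 1 and 9")
    args = []
    if smp is True:
        args.append("-smp")           # bare flag, as in "-smp -verbosity 9"
    elif smp:
        args += ["-smp", str(smp)]    # e.g. "-smp 4"
    if verbosity != 3:
        args += ["-verbosity", str(verbosity)]
    return args

# Two clients for an 8-core box, the first with maximum log detail.
print(client_args(smp=4, verbosity=9))  # ['-smp', '4', '-verbosity', '9']
print(client_args(smp=4))               # ['-smp', '4']
```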
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 5:00 pm
by twistedspark
alpha754293 wrote:Can you tell which cores are physical and which cores are logical on the i7? I would think that they'd just show up as 8 cores. If you have a way to tell (explicitly), then that's great.
I can't remember how I figured it out (haven't slept in a while), but I determined that cores 0, 2, 4, and 6 are physical and 1, 3, 5, and 7 are logical.
I'll post it when I remember how I determined that.
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 5:12 pm
by twistedspark
Oh yeah. I'm running Core Temp, which monitors and displays each core's temperature in real time. I also have Task Manager showing each core's usage. When running apps that are NOT locked to any specific core, I can watch the temps rise and fall as the usage increases and decreases. When a core's usage maxes out and no other core is being used, I can see its temp rise, as well as that of its logical or physical partner. The temps of the pair are not always identical, but they're within 1 or 2 degrees C.
So I can see that Task Manager shows a physical core and its HTT core side by side on the display. Ergo, my logic says that in all probability every other displayed core is the same type, physical or HTT.
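On Linux, the pairing does not have to be inferred from temperatures: the kernel lists, for each logical CPU, the hardware threads that share its physical core in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The Python sketch below groups logical CPUs that way; `parse_cpu_list` and `sibling_groups` are hypothetical helpers, and Windows numbers its logical processors independently, so the result says nothing directly about Task Manager's ordering.

```python
import glob
import os

def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-1,8-9' into a sorted tuple."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))

def sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPUs by the physical core they share (Linux sysfs only).

    Each group holds the hardware threads of one core; without HTT every
    group has a single member. Returns [] if the sysfs files are absent.
    """
    groups = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(parse_cpu_list(f.read()))
    return sorted(groups)

if __name__ == "__main__":
    for group in sibling_groups():
        print("threads sharing one core:", group)
```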
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 5:14 pm
by alpha754293
But that's a guess. I don't know if you'd have any real way of testing that.
BTW...you can set the CPU affinity in any multiprocessor/multi-core system. It's not just limited to the i7.
The only thing for you to do is try and see what you get. I don't think that it would be better, but you never know. I may be surprised. *shrug*
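As an illustration that affinity is not i7-specific, on Linux a process can be pinned with Python's standard library via `os.sched_setaffinity` (the `taskset -c 0,2,4,6 <command>` utility does the same from the shell). `pin_to_cpus` is a hypothetical wrapper; this is a Linux-only sketch, not how the F@H client itself handles affinity.

```python
import os

def pin_to_cpus(pid, cpus):
    """Restrict process `pid` (0 means the calling process) to the given logical CPUs.

    Linux-only: relies on os.sched_setaffinity / os.sched_getaffinity.
    Returns the CPU set the kernel reports after the change.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("sched_setaffinity is not available on this platform")
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)
```

For example, `pin_to_cpus(client_pid, {0, 2, 4, 6})` would confine one client (with `client_pid` being its process ID, obtained however you like) to those four logical CPUs.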
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 5:21 pm
by alpha754293
Apparently I was wrong. My Q9550 DOESN'T have HTT (I did not know that). Oops. My bad. So yeah...I don't have a way of testing HTT at all, then.
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 5:23 pm
by twistedspark
alpha754293 wrote:BTW...you can set the CPU affinity in any multiprocessor/multi-core system. It's not just limited to the i7.
I have no idea about any other system. Before the i7 my last cpu was an Athlon 2500.
I realize I'm making assumptions, but not ones without merit. Plus I'm not risking anything if it turns out false. A wasted WU or two? Whatever. More's the fun in experimenting...I also plan on seeing if I can fold with an Nvidia GPU AND an ATI GPU at the same time, since SLI and CrossFire need to be disabled to fold anyway.
If it works, I'll try folding with two 9800x2's and one HD 4870x2....and two psu's.
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 6:19 pm
by alpha754293
I've been using multi-processor systems since...2004ish, I think. Maybe slightly earlier than that. I don't really remember.
Well...I don't really know how it's going to work with the whole hyperthreading thing either, because HTT doesn't replicate the floating-point units (the parts of the CPU that do all the actual computation work). So if you only had 4 threads running and forced them to stay on the logical processors, the FPUs wouldn't otherwise be occupied, and you should see almost no difference (maybe +/- a few %) between running 4 threads on the physical cores vs. 4 threads on the logical ones.
If you run 8 threads (via 2 SMP clients), it's still going to show 100% utilization, but I would expect about a 50% hit in computational efficiency (i.e., in the same amount of time, each client only gets half the work done).
Whereas on a native 8-core system like my server, each core has its own FPU, so if you run 2 SMP clients (like the Windows one), you'd be able to do double the work.
That would be my expectation.
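The expectation above can be written down as a small model. It assumes, as the post does, that only physical cores (each with its own FPU) add floating-point throughput and HTT threads add none; the helper name and the normalization (one "-smp 4" client alone on 4 dedicated cores = 1.0) are illustrative choices, and the model ignores the few-percent effects mentioned earlier.

```python
SMP_THREADS = 4  # each client is started with "-smp 4"

def relative_throughput(fp_cores, clients):
    """Return (per-client speed, total speed) under the 'FPUs do the work' assumption.

    fp_cores: physical cores, each with its own FPU (HTT threads not counted).
    Speeds are relative to one "-smp 4" client alone on 4 dedicated cores (= 1.0).
    """
    total_threads = clients * SMP_THREADS
    # A thread never runs faster than one full core; beyond that, threads share the FPUs evenly.
    per_client = min(1.0, fp_cores / total_threads)
    return per_client, per_client * clients

print(relative_throughput(4, 2))  # Core i7 with HTT, 2 clients: (0.5, 1.0)
print(relative_throughput(8, 2))  # native 8-core server, 2 clients: (1.0, 2.0)
```

In this model the i7 shows 100% utilization either way but does no more total work with two clients than with one, while the native 8-core box doubles its output.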
I've actually done a bunch of benchmarking in Linux, so if you're interested in testing it out for me (us), I can tell you exactly how I did it so that you'd be able to run it and then we can compare notes afterwards.
I didn't really do much benchmarking in Windows because there wasn't much to benchmark on the server.
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 7:42 pm
by 7im
Here is one of several posts that Vijay Pande has made about Intel's HT over the years...
http://foldingforum.org/viewtopic.php?p=71929#p71929
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 8:02 pm
by alpha754293
I agree with Dr. Pande that new testing is required for the Core i7's implementation of HTT. (AFAIK, Intel doesn't use HT in the HyperTransport sense; its equivalent interconnect is QPI.)
Anybody who'd be willing to "donate" their system temporarily to run essentially the same benchmarks that I ran -- PM me and we'll talk, so that we'd be able to run it in the same methodology, with the same WU (which I've got saved up), and then we can compare notes at the end.
From what I've read in the technical publications on HTT, reason would have it that you'd still face a penalty, but I don't know. It might be better in the sense that it can parallelize at the instruction level and serve as a fix for a pipeline that is faulty by design. (HTT gives the CPU more options for out-of-order execution (OOE) reordering, on the presumption that the OOE unit and PreEx will straighten it out and fix the instruction pipeline.)
On another note though, he's yet to chime in on native 8-core (or greater) systems.
As I mentioned to Bruce, when I run the Linux client, even with the "-smp 8" flag, I'm not always guaranteed a WU that uses the a2 core, which supports up to at least 8 FahCore processes.
As a result, it is very possible to get a WU that can only use the a1 core, which spawns only 4 processes, meaning only HALF of my native 8-core system is put to use. (This just happened to me as I moved from the benchmarking phase into the production WU phase: I got a Project 5102 WU.)
So while running two "-smp 4" clients would slow down the WUs that use the a2 core, it ensures that the system is at 100% utilization all the time. Additionally, the extra "delay" is still within the deadline limits, so I do not see it as a time issue as long as those deadlines are met.
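The utilization argument can be checked with a toy model. It assumes, per this thread, that an a1 WU spawns at most 4 FahCore processes and an a2 WU up to 8 (the 8 is treated as a hard cap here, which may be conservative); `busy_cores` and `MAX_PROCS` are hypothetical names.

```python
# Maximum FahCore processes per WU core type, as described in this thread:
# a1 spawns 4; a2 supports "up to at least" 8, treated here as a cap of 8.
MAX_PROCS = {"a1": 4, "a2": 8}

def busy_cores(cores, smp_per_client, wu_types):
    """Cores kept busy when each client runs one WU.

    cores: physical cores in the machine.
    smp_per_client: the -smp value given to every client.
    wu_types: core type ("a1" or "a2") of the WU each client holds, one entry per client.
    """
    busy = sum(min(smp_per_client, MAX_PROCS[t]) for t in wu_types)
    return min(busy, cores)

print(busy_cores(8, 8, ["a1"]))        # one -smp 8 client on an a1 WU: 4 of 8
print(busy_cores(8, 4, ["a1", "a2"]))  # two -smp 4 clients, any mix: 8 of 8
```

In this model, one "-smp 8" client that lands an a1 WU keeps only 4 of 8 cores busy, while two "-smp 4" clients keep all 8 busy whatever mix of a1 and a2 WUs they get; the same holds for four "-smp 4" clients on a 16-core machine.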
If they want to shorten the deadline for all SMP 8 WUs, they can, provided there's a dedicated server (or method) to ensure that all native 8-core systems pick up ONLY WUs that utilize the a2 core; otherwise, you're wasting half of the computational power available.
And in my recent experience, I managed to crunch through 3 p2669 WUs, WHILE doing a p5102 WU.
This will become increasingly important as I am planning to move to at least 16 native cores (4x4 config) within the next year. If I can be guaranteed to work ONLY on 8-core (or 16-core) WUs, then I'll definitely set up my client/system to do so. Otherwise, it's going to be four 4-core WUs regardless of whether they're a1s or a2s.
Re: my recommended configuration for >= 8-cores
Posted: Thu Jan 29, 2009 8:27 pm
by 7im
I think you missed the main idea of his post. Their recommendation is to run one FahCore per physical CPU core. Intel's HT cores are not physical cores; they are virtual.
So if you need to run 2 clients with -smp 4 to use up 8 cores, then that's the way to do it consistently.