my recommended configuration for >= 8-cores

alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Post by alpha754293 »

If you have a system with at least 8 cores, my recommended configuration would be one of the following (in order of preference): Ubuntu 8.04 server (text-only console), CentOS 5.2, or SLES 10 SP2.

Since you can't tell if you're going to get an a1 WU or an a2 WU, the best thing to do would be to have two clients running, both with the "-smp 4" flag.

That way, if you get an a1 WU, you'd be running it at its max. And if you get two a2 WUs, you can run them at the same time.
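
As a rough illustration of the two-client setup, here is a minimal Python sketch that builds one command line per client. Only the "-smp 4" flag comes from the post; the binary name `./fah6` and the two directory names are placeholders I made up for the example, and it assumes each client lives in its own directory.

```python
import shlex

# Hypothetical names: the binary path and the per-client directories are
# placeholders for illustration; only the "-smp 4" flag comes from the post.
CLIENT_BINARY = "./fah6"
CLIENT_DIRS = ["fah-client-1", "fah-client-2"]


def build_commands(binary=CLIENT_BINARY, dirs=CLIENT_DIRS, smp_threads=4):
    """Return one (working_directory, argv) pair per client instance."""
    return [(d, [binary, "-smp", str(smp_threads)]) for d in dirs]


if __name__ == "__main__":
    for workdir, argv in build_commands():
        # Print rather than launch, so the sketch is safe to run anywhere.
        print(f"(cd {shlex.quote(workdir)} && {shlex.join(argv)})")
```

The script only prints the commands; actually starting the clients is left to whatever you normally use (a terminal per client, screen, an init script, and so on).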

I'm not entirely sure how well this would work for 4 cores (you'd probably only want to run one client anyway), or for a 4-core HTT-capable CPU. (Although I've previously mimicked HTT and found that you only get about 95% or so of your possible PPD value.)
twistedspark
Posts: 7
Joined: Thu Jan 08, 2009 4:02 am

Re: my recommended configuration for >= 8-cores

Post by twistedspark »

How about Win OS's on a home PC? I have an i7, so 8 cores, but no server OS. Just Vista x64 and x86 XP SP3.
Also, just the [-smp 4] flag? No other flags necessary?
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: my recommended configuration for >= 8-cores

Post by alpha754293 »

twistedspark wrote:How about Win OS's on a home PC? I have an i7, so 8 cores, but no server OS. Just Vista x64 and x86 XP SP3.
Also, just the [-smp 4] flag? No other flags necessary?
I don't have an official statement with regards to HTT only because I do not have a system to test HTT with.

(I wish I did, but the closest thing I've got is a Q9550, and there might be some substantial differences in architecture between that and the Core i7).

However, in my simulated results (I have a server with 8 native cores, and it is NOT HTT capable (AMD)), I found that you take about a 4% performance penalty when running with simulated HTT.

That may NOT be indicative of real-world HTT results, though.

For Windows, because the client is restricted to only 4-cores, my only suggestion for you would be to try it with and without HTT and find the one that will work the best for you.

You can TRY to run two SMP clients, since the Linux client (desktop or server distributions) is the only F@H client capable of using all 8 cores (possibly more, but unverified).
twistedspark
Posts: 7
Joined: Thu Jan 08, 2009 4:02 am

Re: my recommended configuration for >= 8-cores

Post by twistedspark »

O.K. Yeah, I was planning on testing this out running two smp cores with the -smp 4 flag on each. The i7 lets you set core affinity to the folding exe's, so I'd run one instance on all four physical cores, and the other on all four logical cores only. That way any performance differential would not matter.
I just wanted to know if I needed any other flags. I have yet to see a list of all the flags and what they each mean. EG: I've seen -smp -verbosity 9
What the heck does verbosity 9 do? :?
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: my recommended configuration for >= 8-cores

Post by alpha754293 »

twistedspark wrote:O.K. Yeah, I was planning on testing this out running two smp cores with the -smp 4 flag on each. The i7 lets you set core affinity to the folding exe's, so I'd run one instance on all four physical cores, and the other on all four logical cores only. That way any performance differential would not matter.
I just wanted to know if I needed any other flags. I have yet to see a list of all the flags and what they each mean. EG: I've seen -smp -verbosity 9
What the heck does verbosity 9 do? :?
Don't remember.

Can you tell which cores are physical and which cores are logical on the i7? I would think that they'd just show up as 8 cores. If you have a way to tell (explicitly), then that's great.

Remember that HTT doesn't replicate the physical FPUs that do all the work, so it would be interesting to see what happens.

I really don't know. I didn't think that Windows XP supported more than 4 cores (logical or physical). *shrug*
twistedspark
Posts: 7
Joined: Thu Jan 08, 2009 4:02 am

Re: my recommended configuration for >= 8-cores

Post by twistedspark »

Never mind that last part. I just found the wiki flag explanations.
-verbosity x

Sets the detail level of the output written to screen and to the fahlog.txt. Options are from 1 to 9 (max). The default is 3. Level 9 is helpful for diagnosing problems and helpful when reporting them to the F@h development team. Supported in client versions v3.x, 4.x, 5.x, 6.x.
Last edited by twistedspark on Thu Jan 29, 2009 5:26 pm, edited 1 time in total.
twistedspark
Posts: 7
Joined: Thu Jan 08, 2009 4:02 am

Re: my recommended configuration for >= 8-cores

Post by twistedspark »

alpha754293 wrote:Can you tell which cores are physical and which cores are logical on the i7? I would think that they'd just show up as 8 cores. If you have a way to tell (explicitly), then that's great.
I can't remember how I figured it out (haven't slept in a while), but I determined that cores 0, 2, 4, and 6 are physical and 1, 3, 5, and 7 are logical.

I'll post it when I remember how I determined that.
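
For anyone who wants to pin one client to each set of cores, here is a minimal Python sketch that turns a list of core numbers into an affinity bitmask. Some tools (for example `taskset` on Linux) accept affinity as a hexadecimal mask. The core numbering below is the guess from this post, not something verified for any real i7.

```python
def affinity_mask(cores):
    """Return the bitmask with one bit set per listed core index."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


# Core numbering as guessed in the post; not verified for any real i7.
PHYSICAL = [0, 2, 4, 6]
LOGICAL = [1, 3, 5, 7]

if __name__ == "__main__":
    print(hex(affinity_mask(PHYSICAL)))  # 0x55
    print(hex(affinity_mask(LOGICAL)))   # 0xaa
```

The two masks do not overlap, so the two clients would never be scheduled on the same displayed core.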
twistedspark
Posts: 7
Joined: Thu Jan 08, 2009 4:02 am

Re: my recommended configuration for >= 8-cores

Post by twistedspark »

Oh yeah. I'm running Core Temp, which monitors and displays each core's temperature in real time. I also have Task Manager showing each core's usage. When running apps that are NOT locked to any specific core I can watch the temps rise and fall as the usage increases and decreases. When a core's usage maxes out, and no other core is being used, I can see its temp rise, as well as that of its logical or physical associate. The temps of the pair are not always identical, but within 1 or 2 degrees C.

So I can see that Task Manager shows a physical core and its HTT core side by side on the display. Ergo, my logic says in all probability every other displayed core is the same type, physical or HTT.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: my recommended configuration for >= 8-cores

Post by alpha754293 »

But that's a guess. I don't know if you'd have any real way of testing that, though.
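
On Linux, at least, there is a way to check which displayed CPUs share a core: each CPU has a `thread_siblings_list` file under sysfs. Below is a minimal Python sketch that groups CPUs by that file. It is Linux-only, so it does not help on Vista or XP, and the helper names are my own. Note that the kernel reports both members of a pair as equal siblings rather than labelling one "physical" and one "logical".

```python
import glob


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0,4" or "0-1" into a sorted tuple."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def sibling_groups(sibling_lists):
    """Collapse per-CPU sibling lists into unique groups, one per shared core."""
    return sorted({parse_cpu_list(s) for s in sibling_lists})


def read_sibling_lists():
    """Read the sibling list of every CPU from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


if __name__ == "__main__":
    for group in sibling_groups(read_sibling_lists()):
        print(group)
```

Each printed tuple is one set of CPU numbers backed by the same core; on a machine without HTT every tuple should contain a single CPU.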

BTW...you can set the CPU affinity in any multiprocessor/multi-core system. It's not just limited to the i7.

The only thing for you to do is try and see what you get. I don't think that it would be better, but you never know. I may be surprised. *shrug*
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: my recommended configuration for >= 8-cores

Post by alpha754293 »

Apparently I was wrong. My Q9550 DOESN'T have HTT (I did not know that). Oops. My bad. So yea...I don't have a way to test HTT at all then.
twistedspark
Posts: 7
Joined: Thu Jan 08, 2009 4:02 am

Re: my recommended configuration for >= 8-cores

Post by twistedspark »

alpha754293 wrote:BTW...you can set the CPU affinity in any multiprocessor/multi-core system. It's not just limited to the i7.
I have no idea about any other system. Before the i7 my last cpu was an Athlon 2500. :lol:

I realize I'm making assumptions, but not ones without any merit. Plus I'm not risking anything if it turns out false. Wasted WU or two? Whatever. More's the fun in experimenting...I also plan on seeing if I can fold with an Nvidia gpu AND an ATI gpu at the same time since SLI and Crossfire need to be disabled to fold anyway. :shock:
If it works, I'll try folding with two 9800x2's and one HD 4870x2....and two psu's.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: my recommended configuration for >= 8-cores

Post by alpha754293 »

I've been using multi-processor systems since like...2004ish I think. Maybe slightly earlier than that. I don't really remember.

Well...I don't really know how it's going to work with the whole hyperthreading thing either. HTT doesn't replicate the floating point units (the parts of the CPU that do all the actual computation work). So reason would have it that if you only had 4 threads running and forced them to stay on the logical processors, you should see almost no difference (maybe +/- a few %) between running 4 threads on the physical cores and 4 threads on the logical ones, because the FPUs won't otherwise be occupied.

If you run 8 threads (via 2 SMP clients), then it's still going to show up as 100% utilization, but I would expect that you should take about a 50% or so hit in computational efficiency (i.e. in the same amount of time, each client only gets half the work done).

Whereas on a native 8-core system like my server, each core has its own FPUs; so if you run 2 SMP clients (like the Windows one), you'd be able to do double the work.

That would be my expectation.
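
To put numbers on that expectation, here is a minimal Python sketch of the arithmetic. It assumes the work is entirely FPU-bound and that threads share the available FPU sets evenly; that is a deliberate simplification of the reasoning above, not a measurement, and the function names are made up for the example.

```python
def per_thread_rate(threads, fpu_sets):
    """Fraction of one full core's rate each thread gets under the
    simplifying assumption that FPU-bound threads share FPUs evenly."""
    return min(1.0, fpu_sets / threads)


def total_rate(threads, fpu_sets):
    """Combined rate of all threads, in units of one full core."""
    return threads * per_thread_rate(threads, fpu_sets)


if __name__ == "__main__":
    # 4-core CPU with HTT: 8 threads compete for 4 sets of FPUs.
    print(per_thread_rate(8, 4), total_rate(8, 4))
    # Native 8-core system: 8 threads, 8 sets of FPUs.
    print(per_thread_rate(8, 8), total_rate(8, 8))
```

Under this model the HTT system gains nothing from the second client, while the native 8-core system does twice the total work; actual benchmarks may well differ.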

I've actually done a bunch of benchmarking in Linux, so if you're interested in testing it out for me (us), I can tell you exactly how I did it so that you'd be able to run it and then we can compare notes afterwards.

I didn't really do much benchmarking in Windows because there wasn't much to benchmark on the server.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: my recommended configuration for >= 8-cores

Post by 7im »

Here is one of several posts that Vijay Pande has made about Intel's HT over the years...

http://foldingforum.org/viewtopic.php?p=71929#p71929
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
alpha754293
Posts: 383
Joined: Sun Jan 18, 2009 1:13 am

Re: my recommended configuration for >= 8-cores

Post by alpha754293 »

I agree with Dr. Pande that new testing is required for the Core i7's implementation of HTT. (AFAIK, Intel doesn't have HT. They use QPI.)

Anybody who'd be willing to "donate" their system temporarily to run essentially the same benchmarks that I ran -- PM me and we'll talk, so that we'd be able to run it with the same methodology, with the same WU (which I've got saved up), and then we can compare notes at the end.

From what I've read in the technical publications on HTT, reason would have it that you'd still face a penalty, but I don't know. It might be better in the sense that it can parallelize at the instruction level, and as a fix for a pipeline that is faulty by design. (Such that HTT gives the CPU more options for OOE reordering; with the presumption that the OOE unit to PreEx will straighten it out and fix the instruction pipeline.)

On another note though, he's yet to chime in on native 8-core (or greater) systems.

As I mentioned to Bruce, when I am running the Linux client, even with a "-smp 8" flag, I'm not always guaranteed a WU that uses the a2 core, which supports at least up to 8 FahCore processes.

As a result, it is very possible (I just ran into this coming out of the benchmarking phase and into the production WU phase) to get a WU like the Project: 5102 one I received, which can only use the a1 core, which can only spawn 4 processes. That also meant that only HALF of my native 8-core system was being put to use.

So while running two "-smp 4" clients would slow down those WUs that use the a2 core, it ensures that the system will be at 100% utilization all the time. Additionally, the extra "delay" is still within the deadline limits; therefore, I do not see it as a time issue so long as those deadlines are met.
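
The utilization argument can be sketched in a few lines of Python. The process limits are the ones described above (a1 spawns at most 4 processes; a2 supports at least up to 8, so 8 is used here as an assumed cap), and the function names are made up for the example.

```python
# Process limits per FahCore as described in the post: a1 spawns at most 4,
# a2 supports at least up to 8 (8 is used here as an assumed cap).
MAX_PROCESSES = {"a1": 4, "a2": 8}


def busy_cores(client_smp, wu_cores):
    """Cores kept busy: each client uses min(its -smp value, its core's limit)."""
    return sum(min(smp, MAX_PROCESSES[core])
               for smp, core in zip(client_smp, wu_cores))


def utilization(total_cores, client_smp, wu_cores):
    """Fraction of the machine in use for a given mix of clients and WUs."""
    return min(total_cores, busy_cores(client_smp, wu_cores)) / total_cores


if __name__ == "__main__":
    # One "-smp 8" client that happens to be handed an a1 WU.
    print(utilization(8, [8], ["a1"]))
    # Two "-smp 4" clients, whichever cores they are handed.
    print(utilization(8, [4, 4], ["a1", "a2"]))
```

With a single "-smp 8" client an a1 WU leaves half the machine idle, whereas two "-smp 4" clients keep all 8 cores busy for any mix of a1 and a2 WUs.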

If they want to shorten the deadline for all SMP 8 WUs, they can, with the provision that there's a dedicated server (or method) to ensure that all native 8-core systems pick up ONLY WUs that utilize the a2 core; otherwise, you're wasting half of the computational power available.

And in my recent experience, I managed to crunch through 3 p2669 WUs, WHILE doing a p5102 WU.

This will become increasingly important as I am planning to move to at least 16 cores native (4x4 config) within the next year. If I can be guaranteed to be working on ONLY 8-core (or 16-core) WUs, then I'll definitely set up my client/system to do so. Otherwise, it's going to be four 4-core WUs regardless of whether they're a1 or a2s.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: my recommended configuration for >= 8-cores

Post by 7im »

I think you missed the main idea of his post. Their recommendation is to run one fahcore per physical cpu core. Intel's HT cores are not physical cores; they are virtual.

So if you need to run 2 clients with -smp 4 to use up 8 cores, then that's the way to do it consistently.