Does the A3 core scale past 8 cores/threads
As the title says:
Does the A3 core scale past 8 cores/threads? The recent bigadv shortage has led to
a fair number of 990X users running -smp. Something that has come up is that
going from 8 to 12 cores/threads seems to yield barely any increase in PPD
(about +5%).
I understand A3 is an ancient core that was developed well before CPUs with 6 or more cores
were manufactured, but can anyone confirm that this (non-)scaling is expected?
It's been a bit frustrating for our members with these high-end machines not to be able to
churn and burn through the A3 -smp units because they don't scale up.
cheers
Re: Does the A3 core scale past 8 cores/threads
The question has more than one answer, depending on what you mean by "scale".
Any SMP software scales with the number of processors within certain constraints. If you have twice as many processors, the actual processing speed can never exceed twice the speed and it will generally be less than that due to a number of factors. There's always some delay required to synchronize the various threads but in most FAH cases this is quite small. An obviously exaggerated case would be a protein with 12 atoms and 12 CPUs. The overhead of distributing one atom to each processor and collecting the results to complete the step would take A LOT more time than processing a single atom.
The points system does not scale linearly, so if you're evaluating your observations based on points rather than elapsed time, the answer will be: No, points do not scale.
Processors may not scale linearly either. The i5 and the x6 (etc.) have independent cores and come very close to scaling linearly, but there are still shared resources (cache/RAM/disk/etc.) for which they may very well compete. The processors in Bulldozer and the i7 (etc.) do not have fully outfitted independent cores, and pairs of ALU "cores" compete with each other for access to a shared FPU. This latter phenomenon is particularly important to FAH.
I suggest you answer the question yourself. Assuming you have N independent cores running SMP in an otherwise idle machine, measure the average frame time. Restart FAH with N/2 smp threads (on the same WU). Is the new frame time twice as long as the original measurement? I think you'll find the answer is "pretty close", but we'd appreciate hearing your results, whether or not my guess is correct. Be sure to describe the hardware and the WU that you used for your test.
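The N versus N/2 comparison above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample frame times are made up for the example, and nothing here comes from the FAH client itself.

```python
def tpf_to_seconds(tpf: str) -> int:
    """Convert a frame time written as 'mm:ss' or 'hh:mm:ss' to seconds."""
    seconds = 0
    for part in tpf.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def halving_ratio(tpf_full: str, tpf_half: str) -> float:
    """Frame time with N/2 threads divided by frame time with N threads.

    2.0 would be perfect scaling; 1.0 would mean the extra threads
    bought nothing.
    """
    return tpf_to_seconds(tpf_half) / tpf_to_seconds(tpf_full)


# Hypothetical measurements: same WU, otherwise idle machine.
ratio = halving_ratio(tpf_full="2:00", tpf_half="3:48")
print(f"slowdown with half the threads: {ratio:.2f}x (ideal 2.00x)")
print(f"scaling efficiency: {ratio / 2:.0%}")
```

Comparing elapsed frame times this way keeps the points system out of the measurement entirely.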
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Does the A3 core scale past 8 cores/threads
Thanks for the quick reply, bruce. The query is for one of our members, not me.
He's had a rough time lately and is taking this -bigadv shortage a bit personally I think.
It's been very frustrating for him, combined with the P7504 problem.
At the moment he's running a 990x and on smp-12 is getting the same tpf as a 2600k @ 4.5
when running a p7504. I'm not sure what speed/client he's running. I'll direct him to this thread so he can fill you in on his
specific hardware.
cheers.
-
- Posts: 1122
- Joined: Wed Mar 04, 2009 7:36 am
- Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M
Re: Does the A3 core scale past 8 cores/threads
The PPD on a -smp vs. a bigadv WU has nothing to do with scaling. It has to do with the QRB and frame times, and the QRB does not scale linearly between the 2 types of projects. Remember the kfactor on a bigadv is between 26 and 38 depending on the WU, and on an smp WU it is between 0.59 and 3.35 or somewhere in that neighbourhood. So if your 990X returns frames 3 min faster with a kfactor of 38, its PPD is going to be quite a bit more than the slower rig's; but if you are folding an smp WU the kfactor is only going to be 3.35 at best, so if you are 3 min faster on an smp WU the PPD difference is going to be quite a bit less.
I hope I made sense here; without typing out the formulas it is not that easy to explain, and I am not good with formulas and equations.
Just curious, but what is the 990X clocked at? I get frame times of 1:38 on a 7504 on a 980X.
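For anyone who wants to see the effect in numbers, here is a rough sketch. It assumes the quick-return bonus takes the commonly quoted form final = base x max(1, sqrt(k x deadline / elapsed)); that form is an assumption here, and the base points, deadline, and frame times below are placeholders rather than the values of any real project. Only the two k-factors come from the post above.

```python
import math


def ppd(tpf_seconds: float, base_points: float, k_factor: float,
        deadline_days: float, frames: int = 100) -> float:
    """Estimate points per day under the assumed QRB formula."""
    elapsed_days = tpf_seconds * frames / 86400.0
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / elapsed_days))
    return base_points * bonus / elapsed_days


# Placeholder project: same base points, deadline and frame times for
# both rows, so that only the k-factor differs (illustrative only).
for label, k in (("smp-like, k=3.35", 3.35), ("bigadv-like, k=38", 38.0)):
    slow = ppd(tpf_seconds=180, base_points=1000, k_factor=k, deadline_days=6)
    fast = ppd(tpf_seconds=150, base_points=1000, k_factor=k, deadline_days=6)
    print(f"{label}: {slow:,.0f} -> {fast:,.0f} PPD (+{fast - slow:,.0f})")
```

Under this assumed formula the relative gain from the faster frame time is the same for both k values (while the bonus is active, PPD varies as TPF to the power -1.5); what grows with k is the absolute number of points gained.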
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Re: Does the A3 core scale past 8 cores/threads
Thanks Grandpa.
I think he is more concerned about the tpf not scaling with the increase in threads/cores.
Could someone expect a 990X running -smp 12 on the A3 core to be faster than
a 2600K running -smp 8 on the A3 core, given the same work unit and clock speed?
At the moment the data suggests that they come in at about the same TPF.
I understand the k-factor bonus point thing; it's just that, all things
being equal, you would expect a 990X to churn through an SMP unit faster than a 2600K.
Additionally, as I am reasonably new at this: my understanding is that the A4 core is newer and
uses threads/cores above 8 more effectively. Is that right?
Re: Does the A3 core scale past 8 cores/threads
It's possible, but we're not a benchmark site. He can tell us what he's seeing and nobody will argue with it. Then ask Intel to confirm your findings.
Apparently you're comparing a 990x with 6 cores @3.46 to a 2600k with 4 cores @4.5 and they're running the same WU. Both are running two threads per core. Right?
From the theoretical perspective of pure GFLOPS, the 990X is maybe 15% faster, but theoretical GFLOPS do not equate to actual GFLOPS until you also consider what Intel calls "uncore" speeds, including factors like cache size, cache organization, FSB speeds, etc. Moreover, the non-scalability of HyperThreading is well known. It also depends greatly on the specific software you're running (and in the case of FAH, the particular project you're running).
Does the extra overhead of breaking down a problem into 12 threads compared to 8 threads make up a measurable part of that 15% difference? Who can say without actually trying it?
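The rough 15% figure can be reproduced by multiplying physical cores by clock speed. This is a back-of-the-envelope sketch that assumes every core does the same amount of work per clock on both chips, which real hardware does not guarantee; `core_ghz` is just a name chosen for the example.

```python
def core_ghz(cores: int, clock_ghz: float) -> float:
    """Aggregate core-GHz: physical cores times clock speed."""
    return cores * clock_ghz


i7_990x = core_ghz(cores=6, clock_ghz=3.46)
i7_2600k = core_ghz(cores=4, clock_ghz=4.5)
advantage = i7_990x / i7_2600k - 1

print(f"990X:  {i7_990x:.2f} core-GHz")
print(f"2600K: {i7_2600k:.2f} core-GHz")
print(f"theoretical advantage of the 990X: {advantage:.1%}")
```

HyperThreading is deliberately left out: counting 12 and 8 threads instead of 6 and 4 cores gives the same ratio, and says nothing about how well the extra threads are actually used.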
Re: Does the A3 core scale past 8 cores/threads
I should be able to answer that question in a couple of weeks; I am putting together an i7 2700K rig to do some testing with. That is interesting. Do you know what his settings are on both rigs: CPU speed, memory speed, and memory timings? Memory speed and timings play a pretty big role when it comes to folding on the 970, 980 and 990s.
-
- Posts: 1165
- Joined: Wed Apr 01, 2009 9:22 pm
- Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)
Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS
Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only) - Location: Jersey, Channel islands
Re: Does the A3 core scale past 8 cores/threads
I can run some tests on a lower-end hex core as well, but it may well take a week or so as I am awaiting parts. I can run tests on Win 7 x64 and Linux.
Just out of curiosity, which OS is he running?
Re: Does the A3 core scale past 8 cores/threads
Grandpa_01 wrote: Just curious but what is the 990X clocked at I get frame times of 1:38 on a 7504 on a 980X
Really? I wonder what I do wrong since I get around 2:27 on a i970 @ 4.1 GHz on the same WU.
WIN7....maybe.
Re: Does the A3 core scale past 8 cores/threads
Given the information in the thread that prompted this post, my conclusion was that the A3 core doesn't scale well on Windows XP on Project 7504 with 2 GPUs running. I don't think that can be extrapolated to simply say that the A3 core doesn't scale. His frame times went from 3:03 at 8 threads to 2:57 at 10 threads, and though it improved at 11, that was not a viable solution since some SMP work fails at -smp 11. Frame time at 12 threads was 3:22, but that's understandable given he has 2 GPU clients running.
I did a quick test on Windows 7 and saw excellent scaling from 8 to 10 and 12 threads. However, I'm not running XP, I didn't have p7504, and I don't have any GPUs running simultaneously.
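Putting the XP frame times quoted above side by side makes the point clearer. This is a quick sketch: the frame times are the ones reported in this thread, converted to seconds, and the "ideal" column is plain linear scaling from the 8-thread result, which is an upper bound rather than something a 6-core HyperThreaded chip would be expected to reach.

```python
# Frame times reported for p7504 on Windows XP with 2 GPU clients running:
# 3:03 at 8 threads, 2:57 at 10 threads, 3:22 at 12 threads.
reported = {8: 183, 10: 177, 12: 202}  # threads -> seconds per frame

baseline_threads = 8
baseline_tpf = reported[baseline_threads]

for threads, tpf in sorted(reported.items()):
    speedup = baseline_tpf / tpf        # measured, relative to 8 threads
    ideal = threads / baseline_threads  # perfect linear scaling
    print(f"{threads:>2} threads: TPF {tpf}s, "
          f"speedup {speedup:.2f}x (ideal {ideal:.2f}x)")
```

A speedup below 1.00x at 12 threads means the machine got slower, which fits the explanation that the GPU clients were taking CPU time away from the SMP threads.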
Re: Does the A3 core scale past 8 cores/threads
Grandpa_01 wrote: Just curious but what is the 990X clocked at I get frame times of 1:38 on a 7504 on a 980X
Mstenholm wrote: Really? I wonder what I do wrong since I get around 2:27 on a i970 @ 4.1 GHz on the same WU. WIN7....maybe.
That sounds about right, I think my 970 at 4.3Ghz is around 2:00 per frame.
-
- Posts: 135
- Joined: Sun Dec 02, 2007 12:45 pm
- Hardware configuration: 4p/4 MC ES @ 3.0GHz/32GB
4p/4x6128 @ 2.47GHz/32GB
2p/2 IL ES @ 2.7GHz/16GB
1p/8150/8GB
1p/1090T/4GB - Location: neither here nor there
Re: Does the A3 core scale past 8 cores/threads
Grandpa_01 wrote: Just curious but what is the 990X clocked at I get frame times of 1:38 on a 7504 on a 980X
Mstenholm wrote: Really? I wonder what I do wrong since I get around 2:27 on a i970 @ 4.1 GHz on the same WU. WIN7....maybe.
Grandpa_01 wrote: That sounds about right, I think my 970 at 4.3Ghz is around 2:00 per frame.
My x6 @ 3.9 running a 7504 has a TPF of around 2:54 in LINUX.
iustus quia...
Re: Does the A3 core scale past 8 cores/threads
Numbers from a 2600K running Win7/VMware/Ubuntu 10.10 @ 4.8 GHz.
Comparing my 2600K frame times to Grandpa's frame times of 1:38 on a 7504 on a 980X, it would appear the a3 WUs scale properly above 8 threads in Linux.
Code:
Project ID: 7504
Core: GRO-A3
Credit: 644
Frames: 100
Name: HTPC VM
Path: \\HTPC-UBUNTU\fah\
Number of Frames Observed: 72
Cur. Time / Frame : 00:02:23 - 34,710 PPD
R3F. Time / Frame : 00:02:23 - 34,710 PPD
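As a cross-check on that log, the base credit and frame time alone account for only a fraction of the reported PPD; the rest is presumably the quick-return bonus. A small sketch, assuming the monitoring tool computes PPD as credit x WUs per day x bonus multiplier (the variable names are chosen for the example):

```python
# Values copied from the log above.
credit = 644                 # base credit for the WU
frames = 100
tpf_seconds = 2 * 60 + 23    # 00:02:23
reported_ppd = 34_710

wu_seconds = frames * tpf_seconds
wus_per_day = 86_400 / wu_seconds
base_ppd = credit * wus_per_day              # PPD with no bonus at all
implied_multiplier = reported_ppd / base_ppd

print(f"time per WU: {wu_seconds / 3600:.2f} h")
print(f"base PPD (no bonus): {base_ppd:,.0f}")
print(f"implied bonus multiplier: {implied_multiplier:.2f}x")
```

Because the bonus depends on how quickly the WU is returned, comparing PPD between machines exaggerates the difference in raw speed; comparing TPF does not.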
Last edited by ChasR on Sun Nov 13, 2011 4:05 pm, edited 1 time in total.
-
- Posts: 1024
- Joined: Sun Dec 02, 2007 12:43 pm
Re: Does the A3 core scale past 8 cores/threads
I think Punchy made the proper point. A3 scales properly on a clean machine but the time that GPUs steal from SMP means you're not measuring a system with all of the cores dedicated to SMP.
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
Re: Does the A3 core scale past 8 cores/threads
2 things. The concerned user should post here directly, so there is no loss in translation. Second, I'm with cody, turn off the GPUs if you want to get some REAL 12 core folding frame times. Otherwise there is no way to tell what is affecting frame times. Could be just as likely the GPUs as the A3 fahcore. Can't assume anything when the water is that muddy.
P.S. The a3 core is older but not ancient. bigadv was an offshoot of the a3 fahcore.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.