I'm using the SMP and GPU (Fermi) clients and know that GPU WUs have many more base points than CPU WUs.
My question is: generally, how much more complex are GPU WUs than SMP WUs? 3 times? 10 times? 20 times? Does the number of base points (ignoring the bonus points for SMP WUs) indicate the complexity of a WU?
GPU WUs are how much more complex than SMP WUs?
Re: GPU WUs are how much more complex than SMP WUs?
There is no way to make this comparison from the base points yet: GPU and CPU currently use different benchmark hardware. But when the new GPU QRB system is released, you will be able to do it.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: GPU WUs are how much more complex than SMP WUs?
Complexity is a function of what is being studied... the number of atoms, the forces involved, the specific type of equations used to study those subjects... not the machine used to do the calculations.
Points should be something like watts, a measure of the rate of work, or more exactly, watt-hours (that rate of work accumulated over time).
A very complex computation can be multiplied by few steps, or a simple calculation can be multiplied by many steps to yield a roughly equivalent amount of work, right?
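To make that concrete, here is a toy sketch; the per-step costs and step counts are made-up numbers, not real project figures. It just shows how a heavy calculation run for few steps and a light calculation run for many steps can add up to the same total work:

```python
# Hypothetical WUs: total work = cost per step x number of steps.
complex_wu = {"flops_per_step": 8e9, "steps": 125_000}  # heavy steps, few of them
simple_wu = {"flops_per_step": 2e9, "steps": 500_000}   # light steps, many of them

for name, wu in (("complex", complex_wu), ("simple", simple_wu)):
    total = wu["flops_per_step"] * wu["steps"]
    print(f"{name}: {total:.2e} total FLOPs")
# Both work out to 1.00e+15 FLOPs -- roughly equivalent total work despite
# very different per-step "complexity".
```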
So, just pointing out that your premise of complexity, while a part, is not the only consideration in determining the base points.
When dealing with bus size, latency issues, different native calculating abilities, variations in clock speed (Hz), memory frequency, and so on, trying to determine a roughly equal amount of calculation throughput across widely different calculation projects and studies was a pretty daunting task, and I suspect it will continue to be.
One way to simplify this rather difficult and complex "work yield evaluation" problem is to simply test rather than predict.
This is of course what is done everywhere else... you build an engine and predict the horsepower, but until you put it on the dyno you don't really know.
And of course, the entire graph shows you the variation at different RPM.
The entire concept of FLOPS is the same: supercomputers vary widely in the speed of their calculations, their ability to store and access data, and the range of calculations they are geared to be best at... yet an agreed standard test yields a certain amount of work accomplished, for comparison's sake.
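As a rough illustration of that "agreed standard test" idea, here is a minimal sketch that times a fixed dense matrix multiply and reports a GFLOP/s figure you could compare across machines. The matrix size and the conventional 2n^3 flop count are just textbook assumptions, nothing FAH- or LINPACK-specific:

```python
import time
import numpy as np

def measure_gflops(n=2048):
    """Time one n x n matrix multiply and report a comparable GFLOP/s figure."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.perf_counter()
    a @ b                        # the fixed, agreed-upon unit of work
    elapsed = time.perf_counter() - start
    flops = 2 * n ** 3           # conventional flop count for dense matmul
    return flops / elapsed / 1e9

print(f"This machine: ~{measure_gflops():.1f} GFLOP/s on the standard test")
```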
Anyway, my point:
the new WU test for QRB on GPU moves the base points from theoretical estimates to actual testing, by running the same calculations on both SMP and GPU for the first time.
Now the comparison will be direct, and not subject to the artificial variables and equating compensations that the lack of direct testing used to require.
So, in the very near future, your question will no longer be relevant. The "complexity" will be the same. The only relevant variables remaining will be number of steps, size, and ability to accomplish that work over time.
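For reference, here is a sketch of how a quick-return bonus behaves, assuming the commonly described form credit = base x sqrt(k x deadline / elapsed). The k value, base points, and deadline below are made up for illustration, and the exact GPU QRB details had not been published at the time of this thread:

```python
import math

def qrb_credit(base_points, k, deadline_days, elapsed_days):
    """Quick-return-bonus style credit: faster returns earn a larger multiplier."""
    bonus = math.sqrt(k * deadline_days / elapsed_days)
    return base_points * max(1.0, bonus)

# Hypothetical WU: 500 base points, k = 2, 6-day deadline.
for elapsed in (0.5, 1.0, 3.0):
    print(f"returned in {elapsed} days -> {qrb_credit(500, 2, 6, elapsed):.0f} points")
```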
Transparency and Accountability, the necessary foundation of any great endeavor!
Re: GPU WUs are how much more complex than SMP WUs?
mdk777 wrote: Anyway, my point: the new WU test for QRB on GPU moves the base points from theoretical estimates to actual testing, by running the same calculations on both SMP and GPU for the first time.
That is a very nice explanation. But you bring up a point that I have been wondering about, though the answer may not be known yet: will the "new" work units run the same calculations on the CPU as on the GPU? In other words, would you do the same science on both, but just give the user the choice of which to use? Or alternatively, will there continue to be different science projects run on the GPU as compared to the CPU?
One reason I ask is that on the World Community Grid/HCC project, they have adapted the CPU work units to run on the GPU, so they are performing the same calculations, just a lot faster on the GPU. Given the wide variety of work on Folding, that probably won't be practical for a long time, but is that the direction they are heading?
Re: GPU WUs are how much more complex than SMP WUs?
Figuring out how to efficiently use a GPU for computing is significantly more difficult than using the CPU. Usually it requires a complete restructuring of the algorithms based on the GPU architecture, or the use of intermediate libraries. The PG uses the latter approach, including their own library, OpenMM. So yes, they are more complex. But if you can use them efficiently, for certain calculations GPUs are a lot faster.
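To give a feel for the "intermediate library" route, below is a minimal sketch using OpenMM's public Python API against a hypothetical input.pdb. It is not how the FAH cores are wired up internally, but it shows how the platform choice is abstracted away from the science code:

```python
from openmm import app, unit, LangevinIntegrator, Platform

# Load a (hypothetical) structure and a standard force field.
pdb = app.PDBFile("input.pdb")
forcefield = app.ForceField("amber14-all.xml", "amber14/tip3p.xml")
system = forcefield.createSystem(pdb.topology,
                                 nonbondedMethod=app.PME,
                                 nonbondedCutoff=1.0 * unit.nanometer)
integrator = LangevinIntegrator(300 * unit.kelvin,
                                1.0 / unit.picosecond,
                                0.002 * unit.picoseconds)

# The same science setup runs on "CUDA", "OpenCL", or "CPU"; the library
# hides the GPU-specific restructuring of the algorithms.
platform = Platform.getPlatformByName("OpenCL")
simulation = app.Simulation(pdb.topology, system, integrator, platform)
simulation.context.setPositions(pdb.positions)
simulation.step(1000)
```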
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
Re: GPU WUs are how much more complex than SMP WUs?
JimF wrote: ...but is that the direction they are heading?
Well, I don't speak for the project, but it appears so.
That is ultimately the beauty of heterogeneous compute and OpenCL. The code really isn't written for one specific piece of hardware, but for any hardware that is compliant.
The simplification for PG is extreme in the long run. It is not just switching to GPU over CPU.
The efficiency of CPU vs. GPU vs. a blend of CPU and GPU can be optimized in real time. This has already been demonstrated at heterogeneous compute conferences.
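As a small sketch of what "any hardware that is compliant" looks like in practice, this PyOpenCL snippet just enumerates whatever OpenCL devices a machine exposes, CPUs and GPUs alike, without being written for any one vendor. Purely illustrative, nothing FAH-specific:

```python
import pyopencl as cl

# List every OpenCL-compliant device on this machine, regardless of vendor.
for platform in cl.get_platforms():
    for device in platform.get_devices():
        kind = cl.device_type.to_string(device.type)
        mib = device.global_mem_size // 2**20
        print(f"{platform.name}: {device.name} ({kind}), {mib} MiB global memory")
```

The same kernel source could then be built for whichever device (or blend of devices) a scheduler picks, which is what makes real-time CPU/GPU balancing possible in principle.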
I agree, it will not happen overnight, but I think it is the direction they are headed.
EDIT: I agree with Jesse_V that this was the past or current approach. I am just not sure that, in the long run, they won't give up some efficiency for simplicity and ubiquitous hardware conformity. Time will tell.
Transparency and Accountability, the necessary foundation of any great endeavor!
Re: GPU WUs are how much more complex than SMP WUs?
JimF wrote: ... the answer may not be known yet: will the "new" work units run the same calculations on the CPU as the GPU? In other words, would you do the same science on both, but just give the user the choice of which to use? Or alternatively, will there continue to be different science projects run on the GPU as compared to the CPU?
You're right ... the PG has not answered that question yet.
They did say that they expect to be able to run both implicit and explicit projects on either SMP or GPU, which gives them the ability to benchmark a project on either one. They did NOT say whether they planned to run some/all projects on the Donor's choice of platforms so all we can do at this point is guess. The possibility of hybrid projects (mentioned above) introduces a third possibility of running a single project on all available resources. To me, that seems really complicated, but I really don't know if it would be worth the software development costs.
I'm pretty confident that some projects will be limited to SMP by the size of GPU VRAM just like some SMP projects are currently limited to 64-bit OSs.
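A rough sketch of the kind of check that implies, with entirely made-up numbers: compare a project's memory footprint against the card's VRAM and fall back to SMP when it doesn't fit:

```python
def assign_platform(project_mem_bytes, gpu_vram_bytes):
    """Hypothetical assignment rule: fall back to SMP when a WU won't fit in VRAM."""
    return "GPU" if project_mem_bytes <= gpu_vram_bytes else "SMP"

# A large (hypothetical) explicit-solvent system vs. a 1 GiB and a 6 GiB card:
print(assign_platform(3 * 2**30, 1 * 2**30))  # -> SMP
print(assign_platform(3 * 2**30, 6 * 2**30))  # -> GPU
```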
Posting FAH's log:
How to provide enough info to get helpful support.
Re: GPU WUs are how much more complex than SMP WUs?
bruce wrote: The possibility of hybrid projects (mentioned above) introduces a third possibility of running a single project on all available resources. To me, that seems really complicated, but I really don't know if it would be worth the software development costs.
It may also complicate the choice of a graphics card. If Gromacs 4.6 uses OpenCL, there is the possibility that AMD will out-perform Nvidia. But as you say, that might not be used on all projects, and Nvidia with CUDA might do better on the others. It seems to me that this is the time to sit tight on the hardware and wait to see which way the software is going.