Can you explain why FAH depends upon the GPU when it uses OpenCL or CUDA?
Thanks
why GPU dependency?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 212
- Joined: Tue Aug 07, 2012 11:59 am
- Hardware configuration: openSUSE Tumbleweed, x86_64,Asrock B760M-HDV/M.2 D4, Intel Core i3-12100, 16 GB, Intel UHD Graphics 730, NVIDIA GeForce GT 1030, Edup-Love EP-9651GS Wi-Fi Bluetooth, multicard reader USB 3.0 startech.com 35fcreadbu3, Epson XP 7100, Headset Bluetooth 3.0 Philips SHQ7300
Re: why GPU dependency?
Because it's the GPUs carrying out the calculations.
These are only my opinions and understanding.
I'm sure this raises more questions than it answers.
What benefit do GPUs give us? - They are capable of doing a massive number of calculations at the same time. They vastly outstrip CPUs in this regard. This is also why GPU work units are scored considerably higher than CPU work units. In my Ryzen system the GPU scores roughly 9-10 times higher than the CPU (the Ryzen itself is running a 20-thread slot).
Why not exclusively use GPUs? - GPUs can do a vast number of calculations, but they're largely simpler calculations with lower precision. CPUs are capable of doing complex calculations and, I think, a higher level of accuracy. Which one the researcher uses will depend upon what they are trying to achieve.
Why use GPUs at all? - They are capable of doing a significantly higher number of calculations, and their output is "good enough" for the researchers who choose to use it.
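To put a number on that precision point, here is a minimal sketch (plain C++, nothing to do with FAH's actual cores) of how rounding error piles up far faster in 32-bit floats than in 64-bit doubles:
[code]
// Minimal sketch (not FAH code): summing many small numbers in single vs
// double precision to show how error accumulates faster in 32-bit floats.
#include <cstdio>

int main() {
    const int n = 10000000;          // ten million additions of 0.1
    float  sum32 = 0.0f;
    double sum64 = 0.0;
    for (int i = 0; i < n; ++i) {
        sum32 += 0.1f;               // FP32: rounding error accumulates quickly
        sum64 += 0.1;                // FP64: error stays far smaller
    }
    // The exact answer is 1,000,000; the FP32 result drifts visibly off.
    std::printf("FP32 sum: %.3f\n", sum32);
    std::printf("FP64 sum: %.3f\n", sum64);
    return 0;
}
[/code]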
A few other notes:-
CUDA is NVidia exclusive and will only run on NVidia cards, assuming the GPU has it and the drivers are new enough.
FAH only started leveraging the increased performance of CUDA about a year ago. This is why the NVidia cards score roughly 25-30% higher than the equivalent AMD cards.
OpenCL will run just about anywhere, including CPUs, but runs quickest on GPUs.
OpenCL is the fallback for NVidia cards which have failed to initialize CUDA.
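As a rough illustration of that CUDA-first, OpenCL-fallback idea (this is not FahCore code, just the public CUDA runtime and OpenCL APIs; it needs the CUDA toolkit and an OpenCL SDK to build):
[code]
#include <cuda_runtime.h>
#include <CL/cl.h>
#include <cstdio>

int main() {
    // 1) Prefer CUDA: only NVidia devices with a new-enough driver answer here.
    int cudaDevices = 0;
    if (cudaGetDeviceCount(&cudaDevices) == cudaSuccess && cudaDevices > 0) {
        std::printf("Using CUDA on %d NVidia device(s)\n", cudaDevices);
        return 0;
    }

    // 2) Fall back to OpenCL, which AMD, Intel and NVidia all expose.
    cl_uint numPlatforms = 0;
    if (clGetPlatformIDs(0, nullptr, &numPlatforms) == CL_SUCCESS && numPlatforms > 0) {
        std::printf("CUDA unavailable, falling back to OpenCL (%u platform(s))\n",
                    numPlatforms);
        return 0;
    }

    std::printf("No usable GPU compute API found\n");
    return 1;
}
[/code]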
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: why GPU dependency?
Welcome to Folding@Home!
Again, just my opinion, not necessarily the complete facts.
If Folding@Home wrote to the hardware level, it would need intimate knowledge, which the vendors do not publish, of every single graphics card in the world. Instead, F@H writes to publicly defined abstraction layers. OpenCL 1.2 contains the minimum set of defined instructions needed to execute the folding work.
Nvidia's OpenCL is written on top of a proprietary abstraction layer called CUDA, so if F@H can write to CUDA directly, they avoid abstracting an abstraction. This gains them execution speed at the cost of twice as much programming.
F@H could write a separate program for every single GPU, but I doubt they would even be done programming the cards released a decade ago, let alone anything we still use.
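As an illustration of what writing to the abstraction layer buys you, one generic OpenCL query loop like the sketch below reports every vendor's GPUs through the same few calls, with no vendor-internal, per-card knowledge needed (again, this is not FAH's code, just the public API):
[code]
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(8, platforms, &numPlatforms);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_device_id devices[16];
        cl_uint numDevices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 16,
                           devices, &numDevices) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            char version[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, nullptr);
            clGetDeviceInfo(devices[d], CL_DEVICE_VERSION, sizeof(version), version, nullptr);
            // The same binary reports AMD, NVidia and Intel GPUs alike.
            std::printf("%s (%s)\n", name, version);
        }
    }
    return 0;
}
[/code]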
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 212
- Joined: Tue Aug 07, 2012 11:59 am
- Hardware configuration: openSUSE Tumbleweed, x86_64,Asrock B760M-HDV/M.2 D4, Intel Core i3-12100, 16 GB, Intel UHD Graphics 730, NVIDIA GeForce GT 1030, Edup-Love EP-9651GS Wi-Fi Bluetooth, multicard reader USB 3.0 startech.com 35fcreadbu3, Epson XP 7100, Headset Bluetooth 3.0 Philips SHQ7300
Re: why GPU dependency?
My question was not accurate, so here is another one.
Why are some GPUs compliant (white-listed) and others not, if an abstraction layer such as OpenCL or CUDA is used?
Re: why GPU dependency?
Again, my opinion and (mis)understanding.
I think there are a number of factors affecting this:-
GPU compatibility
Driver support
GPU capability
GPU performance
Compatibility - the first thing to realise is that FAH want to leverage as wide a range of hardware as possible but need to draw the line somewhere as to what is useful. This does mean leveraging the most common features across the available hardware.
Just like a computer game, FAH has minimum requirements.
They don't require the latest drivers but can't draw the line so far back that it excludes the latest cards. There was some discussion about this problem while support for CUDA was being added.
I believe it needs OpenCL 1.2, and for CUDA it's something like 11.2 (I think that translated to the GeForce 47x series of drivers).
viewtopic.php?t=37391
viewtopic.php?t=37545
Driver support - this will automatically fail a number of GPUs simply because the manufacturer has stopped updating drivers for the card and FAH requirements have increased to a point where the gap can no longer be bridged.
GPU capability - even where driver support is still active, or recent enough, the GPU may not meet FAH requirements. Typically FAH looks for FP32 (32-bit floating point) and FP64 (64-bit floating point) support on a GPU, and typically FP32 performance is significantly higher than FP64. FAH uses a mix of the two; I think they would use FP64 exclusively if they could, but right now it's too big a performance hit. A card with missing, or extremely low-performing, FP64 would be blacklisted (a rough sketch of that kind of capability check is at the end of this post).
GPU performance - to be honest, I think this is more likely to result in a GPU being deranked to a lower species than actual blacklisting.
I think the best indicator to some of this is the recent/current testing with the Intel GPUs. Some GPUs simply don't have the capability, some are proving unreliable and others are throttled to the point of being useless. I think the most telling thing about this is that it's still in testing and has not been released to full FAH yet.
It's worth bearing in mind the GPU.txt file is more than just a white/black list. It's a very rough breakdown of GPU capability and performance to allow a slightly more focussed matching of projects to GPUs. This is to, hopefully, prevent large powerful GPUs being underused by work units with small proteins and small slower GPUs being overwhelmed by large work units (a large work unit could be a large protein and/or a long simulation time).
At the end of the day FAH have limited resources and they need to make the most of them. This may mean investigating possible avenues and deciding to not follow them if the ongoing support is not worth the return.
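For the capability point above, here is a rough sketch (not the actual whitelist logic, just the public OpenCL API) of how a client could ask each GPU whether it reports FP64 support and which OpenCL version its driver provides:
[code]
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform;
    cl_uint n = 0;
    if (clGetPlatformIDs(1, &platform, &n) != CL_SUCCESS || n == 0) return 1;

    cl_device_id devices[16];
    cl_uint numDevices = 0;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 16,
                       devices, &numDevices) != CL_SUCCESS)
        return 1;

    for (cl_uint d = 0; d < numDevices; ++d) {
        char version[128] = {0};
        cl_device_fp_config fp64 = 0;
        clGetDeviceInfo(devices[d], CL_DEVICE_VERSION, sizeof(version), version, nullptr);
        clGetDeviceInfo(devices[d], CL_DEVICE_DOUBLE_FP_CONFIG, sizeof(fp64), &fp64, nullptr);

        // A device reporting no FP64 support, or an OpenCL version below the
        // minimum, would fail this kind of check.
        std::printf("Device %u: %s, FP64 %s\n", d, version,
                    fp64 != 0 ? "supported" : "NOT supported");
    }
    return 0;
}
[/code]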
Re: why GPU dependency?
My understanding is that certain instructions can only be done on the CPU.
There's CPU folding (high accuracy, slow speed), and GPU folding (high performance, lower accuracy).
And by lower accuracy, I mean 16- to 32-bit calculations.
A modern GPU has thousands of 16-32 bit shaders that can calculate smaller math problems very quickly, because there are so many shaders all working on their individual math problems.
It's easy to see how a thousand slower (1.4 GHz to 2.4 GHz) shaders can, overall, be much faster than only 4, 8, 12 or 16 CPU cores running at almost double the speed (3.5 GHz to 5.5 GHz): very roughly, assuming one operation per clock, 1,000 shaders at 1.5 GHz can issue about 1,500 billion simple operations per second, against about 80 billion for 16 cores at 5 GHz.
For GPU folding, the CPU and GPU work in unison, usually requiring about 3 to 4 GHz of CPU time per modern GPU.
The CPU does the job of compressing and decompressing data, handling logs, background downloads and uploads, running the folding client software and ETA prediction. It also handles what could be considered the "containers" of calculated data, sending their content to and from the GPU's VRAM over the PCIe bus, and it handles save states (and perhaps some more background tasks).
I believe, but am not sure, that each "container" (WU) contains lots of 32-bit data, but some have 64-bit precision (CPU precision) data to be calculated.
This is either done on the GPU, or on the CPU.
At least, on BOINC (a very similar program) they use the GPU cores for that (not to be confused with CUDA cores, which are actually shaders).
But not all projects or programs work like this.
The CPU can also be responsible for the advanced math problems (like 64-bit calculations), with the GPU doing the standard 32-bit math.
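A very rough sketch of that division of labour (illustrative CUDA code, not FAH/OpenMM internals): the CPU prepares the data, copies it into VRAM over the PCIe bus, launches a kernel across thousands of threads, and copies the results back for logging, checkpoints and upload:
[code]
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;          // thousands of threads, one tiny FP32 job each
}

int main() {
    const int n = 1 << 20;
    float* host = new float[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;   // CPU: prepare the data

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);   // over PCIe

    scale<<<(n + 255) / 256, 256>>>(dev, n);      // GPU does the bulk arithmetic
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);   // results back

    std::printf("first element: %f\n", host[0]);  // CPU: logs, checkpoints, upload...
    cudaFree(dev);
    delete[] host;
    return 0;
}
[/code]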