FP16 vs INT8 vs INT4?
I see some websites quote different performance figures for cards at FP16, INT8, and INT4 precision.
Can FAH use INT4 or INT8 on their WUs?
If so, would it increase folding performance?
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA
Re: FP16 vs INT8 vs INT4?
If F@H could use FP16, INT8, or INT4, it would indeed speed up the simulation.
Sadly, even FP32 is 'too small', and sometimes FP64 is used. Always using FP64 would be ideal, but it is just too slow. (Some cards run FP64 at 1/32 the speed of FP32.)
As the simulation programs (mostly OpenMM for GPUs) get updated with Volta and Turing in mind, I would expect the developers to make use of the lower precisions in scenarios where the errors do not accumulate. I have my doubts there are any such subroutines in OpenMM.
As an example, here is a Wikipedia page on Nvidia GPUs. I started with Pascal, but you can scroll up for older micro-architectures. Wikipedia calls FP32 Single Precision, FP64 Double Precision, and FP16 Half Precision.
https://en.wikipedia.org/wiki/List_of_N ... _10_series
You will notice that Pascal supports Half Precision, but very slowly, so it would not be useful to modify code for Pascal. Volta is very fast at both Double Precision and Half Precision; it would make a great F@H micro-architecture (because Double Precision, FP64, is very fast) but is VERY expensive. Turing does Half Precision very rapidly, but not Double Precision. (Even the slowest Volta is 10 times as fast as the fastest Turing at Double Precision.)
FP16 is going to be most useful when you never plug the results of one equation into the inputs of the next equation. Modeling proteins does a great deal of plugging the results of one time frame into the inputs of the next time frame.
Again, I have no reason to suspect F@H can use Half Precision; I suspect it would cause rounding errors that would overwhelm the simulation.
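To make that concrete, here is a minimal NumPy sketch (illustrative only, not F@H or OpenMM code): march a simple decay forward in time in each precision, feeding each step's output into the next step the way an integrator reuses the previous time frame, and compare against FP64.

Code: Select all

import numpy as np

# Illustrative sketch, not F@H code: march an exponential decay forward
# in time, feeding each step's output into the next step, exactly the
# way an MD integrator reuses the previous time frame.
def decay(dtype, steps=1000, h=1e-3):
    x = dtype(1.0)
    h = dtype(h)
    for _ in range(steps):
        x = x - h * x      # rounding error from this step feeds the next
    return float(x)

reference = decay(np.float64)  # use FP64 as the yardstick
for dt in (np.float16, np.float32, np.float64):
    value = decay(dt)
    print(f"{dt.__name__:>8}: {value:.6f}  error vs FP64: {abs(value - reference):.2e}")

Run it and the FP16 result drifts visibly away from the FP64 reference after only a thousand steps, while FP32 stays close; a production trajectory runs millions of steps.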
Last edited by JimboPalmer on Tue Mar 26, 2019 7:41 am, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: FP16 vs INT8 vs INT4?
I believe the Nvidia Quadro line had a Volta chipset, but it was reported to be quite slow compared to the much cheaper RTX 2060.
Am I wrong about this?
The Tesla V100's performance was about on par with two RTX 2060s.
Last edited by Theodore on Tue Mar 26, 2019 4:39 pm, edited 1 time in total.
Site Admin
Posts: 7990
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6; Mac Hack i7-7700K 48 GB smp4
Location: W. MA
Re: FP16 vs INT8 vs INT4?
Reported where? The Quadro GV100, based on the Volta chipset, is rated at ~12,000 GFLOPS for single precision FP operations, even higher when running at boost clocks. An RTX 2060 gets a rating that is half that, ~6,000 GFLOPS. So you appear to have gotten the information backwards.
The comparison is even more lopsided for double precision FP operations: the 2060 is rated at less than 200 GFLOPS, while the Quadro GV100 is at about 6,000 GFLOPS.
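For concreteness, the same comparison as a few lines of Python, using the rough ratings above (approximate figures, not measurements):

Code: Select all

# Rough ratings quoted above, in GFLOPS (approximate, base clocks).
gv100 = {"fp32": 12_000.0, "fp64": 6_000.0}
rtx2060 = {"fp32": 6_000.0, "fp64": 200.0}

print(gv100["fp32"] / rtx2060["fp32"])    # ~2x: GV100 lead at single precision
print(gv100["fp64"] / rtx2060["fp64"])    # ~30x: GV100 lead at double precision
print(gv100["fp64"] / gv100["fp32"])      # 0.5: GV100 runs FP64 at half its FP32 rate
print(rtx2060["fp64"] / rtx2060["fp32"])  # ~0.03: the 2060 runs FP64 at ~1/30 its FP32 rate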
Re: FP16 vs INT8 vs INT4?
Perhaps, but from what I read online, people running FAH on those cards have reported very low PPDs.
Site Admin
Posts: 7990
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6; Mac Hack i7-7700K 48 GB smp4
Location: W. MA
Re: FP16 vs INT8 vs INT4?
You still have not said where. Details on which projects they were seeing this on, and which drivers they were using, are important.
In particular, that may be an issue with F@h, but not with the GPGPU work these cards were designed for. No mainstream GeForce cards are based on Volta. Most of the Volta-based cards support NVLink for high-speed interconnection between multiple cards and the CPUs.
Re: FP16 vs INT8 vs INT4?
I can see that the Nvidia Quadro cards based on Pascal don't have high PPD.
But they're entirely different from those based on Volta.
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA
Re: FP16 vs INT8 vs INT4?
From the above link at Wikipedia:

Card               FP32 (Single Precision)   FP64 (Double Precision)
Quadro GV100       14,800 GFLOPS               7,400 GFLOPS
GeForce RTX 2060    6,451.20 GFLOPS              201.60 GFLOPS

F@H uses Single Precision when it can, and Double Precision only when necessary.
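OpenMM exposes exactly that trade-off as a 'mixed' precision mode: most of the arithmetic runs in FP32, with FP64 reserved for quantities that must stay accurate, such as long-running accumulations. Here is a minimal NumPy illustration of why the split works (a sketch of the idea, not OpenMM code):

Code: Select all

import numpy as np

# FP32 has a 24-bit significand: once an accumulator reaches 2**24,
# adding 1.0 in FP32 no longer changes it at all.
acc32 = np.float32(2**24)   # running total late in a long simulation
acc64 = np.float64(2**24)
for _ in range(1000):       # 1000 more unit-sized contributions
    acc32 += np.float32(1.0)
    acc64 += 1.0
print(int(acc32))  # 16777216 -- every FP32 contribution rounded away
print(int(acc64))  # 16778216 -- the FP64 accumulator kept them all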
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: FP16 vs INT8 vs INT4?
Well, if you browse around a bit, you'll find PPD ratings in the range of a GTX 1070 Ti.
Most certainly not equivalent to their price disparity.
Re: FP16 vs INT8 vs INT4?
Theodore wrote: ... you'll find PPD ratings in the range of a GTX 1070 Ti. Most certainly not equivalent to their price disparity.

Quadro is intended for the "professional" marketplace, whereas FAH is designed to work on home computers. The professional marketplace emphasizes Double Precision performance. As Jimbo says, its use is minimized in FAH by design.
The RTX series is aimed at the AI market and the gaming market, both of which can use features which FAH doesn't need.
I make it a point to stick to GPUs intended for the home market (GeForce, etc.), which helps me avoid buying features that I don't need.
FAH's design goals will not be changed to use other types of arithmetic, because that would limit its use to more expensive hardware with almost no change in performance for scientific work, and it will not use more FP64 unless it's required for better science. It uses a maximum amount of Single Precision math (FP32) and whatever integers are needed to support the code.
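Back to the original INT8/INT4 question: those throughput numbers are for quantized neural-network inference, where values are mapped onto 256 (or 16) discrete levels. A hypothetical back-of-the-envelope in Python (the 10 nm box size is an assumption, purely for illustration) shows how coarse that would be for atomic coordinates:

Code: Select all

import numpy as np

# Hypothetical: quantize atom positions in a 10 nm simulation box to INT8.
box_nm = 10.0
coords = np.random.default_rng(1).uniform(0.0, box_nm, 1000)

step = box_nm / 255.0                    # 256 levels -> ~0.039 nm per step
quantized = np.round(coords / step).astype(np.uint8)
roundtrip = quantized * step

# Worst-case position error is half a step: ~0.02 nm, i.e. ~0.2 angstrom,
# roughly a fifth of a C-H bond length. Forces computed from coordinates
# that coarse would be meaningless.
print(f"quantization step:    {step:.4f} nm")
print(f"max round-trip error: {np.abs(roundtrip - coords).max():.4f} nm")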
Posting FAH's log:
How to provide enough info to get helpful support.
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA
Re: FP16 vs INT8 vs INT4?
Your GPU must support double precision (FP64), but its use is so minimized that the speed of double precision does not slow F@h.
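A rough Amdahl-style model shows why (the FP64 fractions below are made-up numbers for illustration; the 32x penalty is the "32 times as slow" figure from earlier in the thread):

Code: Select all

# Hypothetical estimate: relative runtime vs. an all-FP32 run when a
# fraction f of the operations must be FP64 at 1/32 the FP32 rate.
FP64_PENALTY = 32

for f in (0.10, 0.01, 0.001):
    slowdown = (1 - f) + f * FP64_PENALTY
    print(f"FP64 fraction {f:>5.1%}: {slowdown:.2f}x the all-FP32 runtime")

At a 0.1% FP64 fraction the whole run is only ~3% slower than pure FP32, which is why keeping FP64 to the bare minimum costs so little.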
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends