FP16 vs INT8 vs INT4?
I see some websites quote different performance figures for cards at FP16, INT8, and INT4 precision.
Can FAH use INT4 or INT8 on their WUs?
If so, would it increase folding performance?
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA
Re: FP16 vs INT8 vs INT4?
If F@H could use FP16, INT8, or INT4, it would indeed speed up the simulation.
Sadly, even FP32 is 'too small', and sometimes FP64 is used. Always using FP64 would be ideal, but it is just too slow. (Some cards run FP64 at 1/32 the speed of FP32.)
As the simulation programs (mostly OpenMM for GPUs) get updated with Volta and Turing in mind, I would expect the developers to make use of the lower precisions in scenarios where the errors do not accumulate. I have my doubts there are any such subroutines in OpenMM.
As an example, here is a Wikipedia page on Nvidia GPUs. I started with Pascal, but you can scroll up for older micro-architectures. Wikipedia calls FP32 Single Precision, FP64 Double Precision, and FP16 Half Precision.
https://en.wikipedia.org/wiki/List_of_N ... _10_series
You will notice that Pascal supports Half Precision, but very slowly, so it would not be useful to modify code for Pascal. Volta is very fast at both Double Precision and Half Precision; it would make a great F@H micro-architecture (because Double Precision, FP64, is very fast) but is VERY expensive. Turing does Half Precision very rapidly, but not Double Precision. (Even the slowest Volta is 10 times as fast as the fastest Turing at Double Precision.)
FP16 is going to be most useful when you never plug the results of one equation into the inputs of the next equation. Modeling proteins does a great deal of plugging the results of one time frame into the inputs of the next time frame.
Again, I have no reason to suspect F@H can use Half Precision; I suspect it would cause rounding errors that would overwhelm the simulation.
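To make that concrete, here is a minimal NumPy sketch (illustrative only, not F@H or OpenMM code): march a simple decay forward in time in each precision, feeding each step's output into the next step the way an integrator reuses the previous time frame, and compare against FP64.

Code: Select all

import numpy as np

# Illustrative sketch, not F@H code: march an exponential decay forward
# in time, feeding each step's output into the next step, exactly the
# way an MD integrator reuses the previous time frame.
def decay(dtype, steps=1000, h=1e-3):
    x = dtype(1.0)
    h = dtype(h)
    for _ in range(steps):
        x = x - h * x      # rounding error from this step feeds the next
    return float(x)

reference = decay(np.float64)  # use FP64 as the yardstick
for dt in (np.float16, np.float32, np.float64):
    value = decay(dt)
    print(f"{dt.__name__:>8}: {value:.6f}  error vs FP64: {abs(value - reference):.2e}")

Run it and the FP16 result drifts visibly away from the FP64 reference after only a thousand steps, while FP32 stays close; a production trajectory runs millions of steps.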
Last edited by JimboPalmer on Tue Mar 26, 2019 7:41 am, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: FP16 vs INT8 vs INT4?
I believe the Nvidia Quadro line had a Volta chipset, but it was reported to be quite slow compared to the much cheaper RTX 2060.
Am I wrong about this?
The Tesla V100's performance was about on par with two RTX 2060s.
Last edited by Theodore on Tue Mar 26, 2019 4:39 pm, edited 1 time in total.
Site Admin
Posts: 7990
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6; Mac Hack i7-7700K 48 GB smp4
Location: W. MA
Re: FP16 vs INT8 vs INT4?
Reported where? The Quadro GV100, based on the Volta chipset, is rated at ~12,000 GFLOPS for single precision FP operations, even higher when running at boost clocks. An RTX 2060 gets a rating that is half that, ~6,000 GFLOPS. So you appear to have gotten the information backwards.
The comparison is even more lopsided for double precision FP operations: the 2060 is rated at less than 200 GFLOPS, while the Quadro GV100 is at about 6,000 GFLOPS.
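For concreteness, the same comparison as a few lines of Python, using the rough ratings above (approximate figures, not measurements):

Code: Select all

# Rough ratings quoted above, in GFLOPS (approximate, base clocks).
gv100 = {"fp32": 12_000.0, "fp64": 6_000.0}
rtx2060 = {"fp32": 6_000.0, "fp64": 200.0}

print(gv100["fp32"] / rtx2060["fp32"])    # ~2x: GV100 lead at single precision
print(gv100["fp64"] / rtx2060["fp64"])    # ~30x: GV100 lead at double precision
print(gv100["fp64"] / gv100["fp32"])      # 0.5: GV100 runs FP64 at half its FP32 rate
print(rtx2060["fp64"] / rtx2060["fp32"])  # ~0.03: the 2060 runs FP64 at ~1/30 its FP32 rate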
Re: FP16 vs INT8 vs INT4?
Perhaps, but from what I read online, people running FAH on those cards have reported very low PPDs.
Site Admin
Posts: 7990
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6; Mac Hack i7-7700K 48 GB smp4
Location: W. MA
Re: FP16 vs INT8 vs INT4?
You still have not said where. Details on which projects they were seeing this on, and which drivers they were using, are important.
In particular, that may be an issue with F@h, but not with the GPGPU work these cards were designed for. No mainstream GeForce cards are based on Volta. Most of the Volta-based cards support NVLink for high-speed interconnection between multiple cards and the CPUs.
Re: FP16 vs INT8 vs INT4?
I can see that the Nvidia Quadro cards based on Pascal don't have high PPD.
But they're entirely different from those based on Volta.
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA
Re: FP16 vs INT8 vs INT4?
From the above link at Wikipedia:

Card               FP32 (Single Precision)   FP64 (Double Precision)
Quadro GV100       14,800 GFLOPS               7,400 GFLOPS
GeForce RTX 2060    6,451.20 GFLOPS              201.60 GFLOPS

F@H uses Single Precision when it can, and Double Precision only when necessary.
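OpenMM exposes exactly that trade-off as a 'mixed' precision mode: most of the arithmetic runs in FP32, with FP64 reserved for quantities that must stay accurate, such as long-running accumulations. Here is a minimal NumPy illustration of why the split works (a sketch of the idea, not OpenMM code):

Code: Select all

import numpy as np

# FP32 has a 24-bit significand: once an accumulator reaches 2**24,
# adding 1.0 in FP32 no longer changes it at all.
acc32 = np.float32(2**24)   # running total late in a long simulation
acc64 = np.float64(2**24)
for _ in range(1000):       # 1000 more unit-sized contributions
    acc32 += np.float32(1.0)
    acc64 += 1.0
print(int(acc32))  # 16777216 -- every FP32 contribution rounded away
print(int(acc64))  # 16778216 -- the FP64 accumulator kept them all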
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: FP16 vs INT8 vs INT4?
Well, if you browse around a bit, you'll find PPD ratings in the range of a GTX 1070 Ti.
Most certainly not equivalent to their price disparity.
Re: FP16 vs INT8 vs INT4?
Theodore wrote: ... you'll find PPD ratings in the range of a GTX 1070 Ti. Most certainly not equivalent to their price disparity.

Quadro is intended for the "professional" marketplace, whereas FAH is designed to work on home computers. The professional marketplace emphasizes Double Precision performance. As Jimbo says, its use is minimized in FAH by design.
The RTX series is aimed at the AI market and the gaming market, both of which can use features which FAH doesn't need.
I make it a point to stick to GPUs intended for the home market (GeForce, etc.), which helps me avoid buying features that I don't need.
FAH's design goals will not be changed to use other types of arithmetic, because that would limit its use to more expensive hardware with almost no change in performance for scientific work, and it will not use more FP64 unless it's required for better science. It uses a maximum amount of Single Precision math (FP32) and whatever integers are needed to support the code.
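Back to the original INT8/INT4 question: those throughput numbers are for quantized neural-network inference, where values are mapped onto 256 (or 16) discrete levels. A hypothetical back-of-the-envelope in Python (the 10 nm box size is an assumption, purely for illustration) shows how coarse that would be for atomic coordinates:

Code: Select all

import numpy as np

# Hypothetical: quantize atom positions in a 10 nm simulation box to INT8.
box_nm = 10.0
coords = np.random.default_rng(1).uniform(0.0, box_nm, 1000)

step = box_nm / 255.0                    # 256 levels -> ~0.039 nm per step
quantized = np.round(coords / step).astype(np.uint8)
roundtrip = quantized * step

# Worst-case position error is half a step: ~0.02 nm, i.e. ~0.2 angstrom,
# roughly a fifth of a C-H bond length. Forces computed from coordinates
# that coarse would be meaningless.
print(f"quantization step:    {step:.4f} nm")
print(f"max round-trip error: {np.abs(roundtrip - coords).max():.4f} nm")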
Posting FAH's log:
How to provide enough info to get helpful support.
JimboPalmer
Posts: 2522
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA
Re: FP16 vs INT8 vs INT4?
Your GPU must support double precision (FP64), but its use is so minimized that the speed of double precision does not slow F@h.
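A rough Amdahl-style model shows why (the FP64 fractions below are made-up numbers for illustration; the 32x penalty is the "32 times as slow" figure from earlier in the thread):

Code: Select all

# Hypothetical estimate: relative runtime vs. an all-FP32 run when a
# fraction f of the operations must be FP64 at 1/32 the FP32 rate.
FP64_PENALTY = 32

for f in (0.10, 0.01, 0.001):
    slowdown = (1 - f) + f * FP64_PENALTY
    print(f"FP64 fraction {f:>5.1%}: {slowdown:.2f}x the all-FP32 runtime")

At a 0.1% FP64 fraction the whole run is only ~3% slower than pure FP32, which is why keeping FP64 to the bare minimum costs so little.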
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends