How can I use F@H for stress testing?

Post by **bruce** » Tue Dec 31, 2019 7:21 pm

HayesK wrote:Do not know if CPU wu started on intel will run on AMD.

Yes. All CPUs must now support the AMD64 instruction set (which includes Intel CPUs). The CPU is then tested for SSE and for AVX, so the WU should work with whatever it finds.
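A rough sketch of the fallback described here, in Python. This is not FAH's actual detection code: `pick_simd_level` and the flag names (written in the style of Linux `/proc/cpuinfo` flags) are assumptions for illustration only.

```python
# Hypothetical illustration: choose the best SIMD code path a CPU supports.
# Flag names follow the style of Linux /proc/cpuinfo; this is not FAH code.

def pick_simd_level(cpu_flags):
    """Return 'avx' if available, else 'sse2', else raise."""
    flags = {f.lower() for f in cpu_flags}
    if "avx" in flags:
        return "avx"
    if "sse2" in flags:  # SSE2 is part of the AMD64 baseline
        return "sse2"
    raise RuntimeError("no supported SIMD instruction set found")


print(pick_simd_level(["sse", "sse2", "avx"]))  # avx
print(pick_simd_level(["sse", "sse2"]))         # sse2
```

The point of the sketch is only that the choice is made from whatever the current CPU reports, which is why a CPU WU is not tied to one vendor.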

Post by **HayesK** » Tue Dec 31, 2019 9:29 pm

@Bruce. Do you think a cpu wu started on AVX could be finished on SSE2 or vice versa?

Post by **MeeLee** » Tue Dec 31, 2019 10:32 pm

foldy wrote:
HayesK wrote:the work unit will fail if no compatible hardware is found. Pretty much needs to be identical.
Identical means a CPU with same core count for FAH CPU slot and a GPU from same vendor nvidia vs. AMD?

It goes even further.
A 2070 WU does not get continued on a 2060.
I'm not sure whether the reverse is true.
However, a WU will continue on the same GPU model (e.g. RTX 2070), whether the card is a different variant within a brand (e.g. EVGA 2-fan, EVGA 3-fan, or EVGA watercooled) or comes from a different brand (Zotac 2070, MSI, Gigabyte, ...).

Post by **bruce** » Tue Dec 31, 2019 10:46 pm

HayesK wrote:@Bruce. Do you think a cpu wu started on AVX could be finished on SSE2 or vice versa?

Somebody should test that, but my first guess is going to be that the answer is yes.

Post by **bruce** » Tue Dec 31, 2019 11:11 pm

MeeLee wrote:A 2070 Wu does not get continued on a 2060.
I'm not sure about if the reverse is true

In my understanding, the reverse is always going to be true but I'm not sure you're right about the 2060/2070 statement. FAH categorizes WUs by a value called GPUSpecies which is based on the internal hardware, not on some decision made by some advertising executive.

From what I know about FAH, it's all going to depend on how the GPUs are classified by species. A WU started on species M is not going to want to work on a GPU of species N for M <> N. I could be wrong, but I don't think so ... and if anybody finds an example that proves me wrong, tell me.

The RTX 2060s are species 7, using a TU106 or TU106M, as are the TU106-based RTX 2070s. There are two RTX 2070 Supers ... one is a TU106 and the other is a TU104, and only the latter is species 8. The model number on the box doesn't always correspond to the internal processor chip that's doing the work.

Go figure.
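To make the species rule of thumb concrete, here is a small Python sketch. It is not FAHClient's real logic: the table only contains the chip-to-species values quoted in this post, and `same_species` is a made-up helper.

```python
# Hypothetical sketch of the rule of thumb above; not FAHClient's real logic.
# Species values are only the ones quoted in this post.
CHIP_SPECIES = {
    "TU106": 7,   # RTX 2060, RTX 2070, one RTX 2070 Super (per the post)
    "TU106M": 7,  # RTX 2060 (per the post)
    "TU104": 8,   # the other RTX 2070 Super (per the post)
}

def same_species(chip_a, chip_b):
    """True if both chips map to the same species in the table above."""
    return CHIP_SPECIES[chip_a] == CHIP_SPECIES[chip_b]

print(same_species("TU106", "TU106M"))  # True
print(same_species("TU104", "TU106"))   # False
```

Under this reading, what matters for moving a WU is the chip inside the card, not the model name on the box.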

Oh, BTW, the species is listed in GPUs.txt, but it's also shown in your system configuration. This machine has two GPUs:

Code:

  GPUs: 2
  GPU 0: Bus:3 Slot:0 Func:0 NVIDIA:3 GK107 [GeForce GT 740]
  GPU 1: Bus:4 Slot:0 Func:0 NVIDIA:5 GM206 [GeForce GTX 960]

They happen to be NVIDIA:3 and NVIDIA:5. The newer, more powerful GPUs with newer on-chip features happen to be species 6, 7, or 8. I suppose OpenCL has to insert extra code for my species 3 GPU to perform the same function that your species 8 can do with a simpler set of instructions. (Your hardware can do tensor math, mine can't without a lot of extra instructions ... but FAH doesn't really care, since the FAHCore doesn't use tensor math.)
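If you want to pull the species out of those lines programmatically, something like the following Python sketch would do it. The regular expression is inferred from the two lines shown in this post only, so the exact layout is an assumption, and `parse_gpu_line` is a made-up helper rather than anything shipped with FAH.

```python
import re

# Layout inferred from the two log lines in this post; treat it as an assumption.
GPU_LINE = re.compile(
    r"GPU\s+(?P<index>\d+):\s+Bus:\d+\s+Slot:\d+\s+Func:\d+\s+"
    r"(?P<vendor>\w+):(?P<species>\d+)\s+(?P<chip>\S+)\s+\[(?P<name>[^\]]+)\]"
)

def parse_gpu_line(line):
    """Return a dict describing one GPU line, or None if it doesn't match."""
    m = GPU_LINE.search(line)
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "vendor": m["vendor"],
        "species": int(m["species"]),
        "chip": m["chip"],
        "name": m["name"],
    }

log = """\
GPUs: 2
GPU 0: Bus:3 Slot:0 Func:0 NVIDIA:3 GK107 [GeForce GT 740]
GPU 1: Bus:4 Slot:0 Func:0 NVIDIA:5 GM206 [GeForce GTX 960]
"""
gpus = [g for g in map(parse_gpu_line, log.splitlines()) if g]
for g in gpus:
    print(g["index"], g["vendor"], g["species"], g["chip"])
```

The number after the vendor name is the species, so this machine reports species 3 and species 5.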

In another example, AMD GPUs are either species 4 or species 5. The only difference is whether or not they support double precision math, so there should be a lot of interchangeability among AMD GPUs (all being 5's).

Post by **toTOW** » Wed Jan 01, 2020 12:42 am

HayesK wrote:@Bruce. Do you think a cpu wu started on AVX could be finished on SSE2 or vice versa?

Yes, no, maybe ... there's no easy answer ...

In theory the answer is yes (unless you try to run the AVX core on a CPU that doesn't support it).

But what happens is that the client ties the cores used (SSE2 or AVX) to the WU. And I don't know how the client hardware detection routine would react to such changes.

Post by **toTOW** » Wed Jan 01, 2020 12:45 am

The same would apply to GPU slots: in theory we should be able to resume a WU started with GPU X on GPU Y, since OpenCL binaries are compiled upon execution for the GPU and driver installed.

The only thing that bothers me is how the client would react to this.

Post by **MeeLee** » Wed Jan 01, 2020 10:45 am

toTOW wrote:The same would apply to GPU slots : in theory we should be able to resume a WU started with GPU X on GPU Y since OpenCL binaries are compiled upon execution for the GPU and driver installed.

The only thing that bother me is how the client would react to this.

Going from a larger GPU (with more cores) to a smaller GPU would either overload the GPU or make it run less than optimally: constantly swapping out data, high power consumption, low performance.

Going from a slower to a faster GPU will cause only part of the GPU to be utilized: low power consumption, a cooler-running GPU, higher boost clocks.
So the PPD would be higher than if the WU had been done on the slower card, but still lower than what the faster GPU would get with a larger WU (90-100% GPU load).

I think FAH is pretty good at assigning WUs to GPUs. There's about a 2-5% threshold, in case some cores go bad, or one has a lower-end GPU that had a reduced core count from the factory (or the reverse: people with AA or A++-binned chips will have more cores available, and see less GPU utilization than those who bought GPUs with more budget-binned chips).

Post by **bruce** » Fri Jan 03, 2020 3:16 am

You're assuming that the characteristics of the WU need to precisely match the GPU. That's not true.

Suppose you look at a WU with 200 000 atoms running on a GPU with 2000 shaders or 4000 shaders. If you look at the portion of the code that calculates the forces on each atom, clearly that portion would be repeated 100 times or 50 times, respectively, potentially summing forces associated with as many as 199 999 other atoms. That's a lot of parallelization, and since neither the number of atoms nor the number of shaders is going to be a convenient number, it's always going to be a matter of doing as much of that calculation as "will fit", as many times as necessary to do all the necessary calculations.

When the FAHCore starts up, it detects the number of active shaders to fill and repeats that filling process as many times as necessary.
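The arithmetic behind the 100-versus-50 figure can be written out in a few lines of Python. This is a toy model of "fill the shaders as many times as necessary", not FAHCore code; the function names and the 2304-shader example are arbitrary choices for illustration.

```python
# Toy model of "fill the shaders as many times as necessary"; not FAHCore code.

def passes_needed(atoms, shaders):
    """Number of times the shaders must be filled to cover every atom."""
    return (atoms + shaders - 1) // shaders  # integer ceiling division

def last_pass_fill(atoms, shaders):
    """Fraction of shaders doing useful work in the final pass."""
    remainder = atoms % shaders
    return 1.0 if remainder == 0 else remainder / shaders

print(passes_needed(200_000, 2000))  # 100
print(passes_needed(200_000, 4000))  # 50
print(passes_needed(200_000, 2304))  # 87 (an inconvenient shader count)
print(round(last_pass_fill(200_000, 2304), 3))
```

With an inconvenient shader count the last pass is only partly full, which is one reason the WU never has to match the GPU exactly.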

That's not the only part of the code, and other parts will also be processed however many times it takes before moving on to a different segment of the calculation.

Anyway, a process which is performed only 50 times won't keep the GPU busy the whole time -- or at least not quite as busy as a process that's repeated 100 times (before doing something else) -- so twice as many shaders doesn't change the average busy percentage linearly. Similarly, a protein with only 100 000 atoms won't simply spend half as long being busy ON THE STEP, and it won't spend the same percentage of its time doing that particular calculation.

Post by **toTOW** » Fri Jan 03, 2020 8:40 pm

With OpenCL, everything is handled by the driver, and the executable is compiled at startup to match your GPU/driver pair. You may not have noticed it before, but every time you update the drivers, a new compilation occurs the next time you start your OpenCL program.
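One way to picture why a driver update forces a fresh compile: if compiled kernels are cached under a key that includes both the device and the driver version, changing either one misses the cache. The Python sketch below is a toy model only, not the caching scheme of any real OpenCL driver; `KernelCache`, `fake_compile`, and the version strings are all made up.

```python
# Toy model only: real OpenCL drivers manage their own caches internally.

class KernelCache:
    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.compile_count = 0

    def get(self, source, device, driver_version):
        """Return a cached binary, compiling only on a cache miss."""
        key = (source, device, driver_version)
        if key not in self._store:
            self._store[key] = self._compile(source, device, driver_version)
            self.compile_count += 1
        return self._store[key]

def fake_compile(source, device, driver_version):
    # Stand-in for the driver's compiler; just returns a descriptive string.
    return f"binary<{device}|{driver_version}|{len(source)} chars of source>"

cache = KernelCache(fake_compile)
src = "__kernel void forces() {}"
cache.get(src, "GPU X", "driver 1.0")
cache.get(src, "GPU X", "driver 1.0")   # cache hit, no recompile
cache.get(src, "GPU X", "driver 1.1")   # driver updated -> recompile
cache.get(src, "GPU Y", "driver 1.1")   # different GPU -> recompile
print(cache.compile_count)  # 3
```

In this model nothing about the source changes between GPU X and GPU Y; only the compiled binary does.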

In the past (and I'm afraid that's still the case), CUDA required the code to be compiled in advance for each GPU (well, for each compute capability, to be more precise), and the compiled code had to be packaged in the executable. That explains why, when a new GPU architecture was released, we had to recompile the executable to support it.

Post by **MeeLee** » Sat Jan 04, 2020 1:56 pm

toTOW wrote:The same would apply to GPU slots : in theory we should be able to resume a WU started with GPU X on GPU Y since OpenCL binaries are compiled upon execution for the GPU and driver installed.

The only thing that bother me is how the client would react to this.

It doesn't. It'll just dump the WU.
About the 2070 going down to a 2060 maybe. But a 2070 Super to a 2060 definitely won't work.

Post by **Joe_H** » Sat Jan 04, 2020 3:54 pm

MeeLee wrote:
toTOW wrote:The same would apply to GPU slots : in theory we should be able to resume a WU started with GPU X on GPU Y since OpenCL binaries are compiled upon execution for the GPU and driver installed.

The only thing that bother me is how the client would react to this.
It doesn't. It'll just dump the WU.
About the 2070 going down to a 2060 maybe. But a 2070 Super to a 2060 definitely won't work.

Do you know this from actual observation, or are you just guessing?

Post by **bruce** » Sun Jan 05, 2020 4:58 am

Joe_H wrote:Do you know this from actual observation, or are you just guessing?

I've had WUs dumped for no apparent reason except maybe it's managed too closely by FAHClient.

I don't think it's being managed too closely, though. If your GPU advertises itself as supporting double precision and that happens to be a requirement for a given project, you wouldn't want to resume processing on a GPU that doesn't support double precision. It's not going to work. It doesn't matter if DP runs at half the speed of single precision or your hardware only runs it at 1/24 the speed of single precision. As long as it can perform the required arithmetic operation, it will still run.
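Put as a check, the distinction is between capability and speed. The Python sketch below is hypothetical: the `Gpu` class, its fields, and `can_resume` are made up for illustration and are not part of FAHClient.

```python
# Hypothetical capability check; names are made up and not part of FAHClient.
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    supports_dp: bool
    dp_speed_ratio: float  # DP throughput relative to single precision

def can_resume(project_needs_dp, gpu):
    """Capability matters; how fast DP runs does not."""
    return gpu.supports_dp or not project_needs_dp

fast_dp = Gpu("half-rate DP card", True, 1 / 2)
slow_dp = Gpu("1/24-rate DP card", True, 1 / 24)
no_dp = Gpu("single-precision-only card", False, 0.0)

for gpu in (fast_dp, slow_dp, no_dp):
    print(gpu.name, can_resume(True, gpu))
```

Note that `dp_speed_ratio` is never consulted: a slow DP card still qualifies, while a card without DP does not.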

The fact is, some of the new hardware also supports half precision and we could repeat the same words I just said with respect to that new feature -- except for the fact that neither FAHCore21 nor FAHCore22 make any use of that new feature. I also know of no plans to require half precision support.

Post by **MeeLee** » Sun Jan 05, 2020 11:06 am

Joe_H wrote:
MeeLee wrote:
toTOW wrote:The same would apply to GPU slots : in theory we should be able to resume a WU started with GPU X on GPU Y since OpenCL binaries are compiled upon execution for the GPU and driver installed.

The only thing that bother me is how the client would react to this.
It doesn't. It'll just dump the WU.
About the 2070 going down to a 2060 maybe. But a 2070 Super to a 2060 definitely won't work.
Do you know this from actual observation, or are you just guessing?

Observation. Swapping out higher-end GPUs for lower-end ones results in WUs being dumped.

Post by **toTOW** » Sun Jan 05, 2020 11:31 am

As I already said, OpenCL allows it, but I don't know how the client would react ... and the client can be pretty sensitive about how it identifies the GPUs (platform ID/OpenCL device ID) ...
