Maxwell GPUs

Posted: Tue Oct 06, 2015 9:17 am
by billford
I've got two nominally identical systems using NVidia GTX 980's. I've always seen slight differences in the behaviour of the two cards, but with the advent of Core_21 that difference is becoming significant.

I believe (from occasional comments here) that there are two types of Maxwell cards. The cards were bought several months apart, so it's quite possible I've got one of each.

But I don't know how to tell... could someone give me a clue what to look for please?

(If it involves software in some way, both are running under Linux)

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 9:24 am
by ChristianVirtual
What differences in behavior?

Are the systems really the same: CPU, memory, storage ...?
Some of the newer cores write checkpoints more frequently, which might affect runtime characteristics.

And of course differences between PRCGs might hit you, too.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 9:58 am
by billford
ChristianVirtual wrote: "What differences in behavior?

Are the systems really the same: CPU, memory, storage ...?
Some of the newer cores write checkpoints more frequently, which might affect runtime characteristics.

And of course differences between PRCGs might hit you, too."
There are slight differences in the motherboards (Gigabyte GA-H81M-DS2V @ 3.2GHz, GA-H81M-S2PV @ 3.0GHz), otherwise identical.

Cards are both from MSI- these ones. The newer card is on the 3.0GHz mobo.

OS (Mint 17.2), drivers (346.35) etc- identical.

The differences on Core_17 and 18 were slight but noticeable- mainly in the way the boost clock adjusted itself, to the extent that the average PPD (over several thousand WUs) was about 10k (in ~420k) higher on the (newer) card in the slower mobo.
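For scale, that gap works out to a little over 2% of the average. A quick sketch of the arithmetic, using only the rounded figures quoted above (the variable names are just for illustration):

```python
# Rounded figures from the post: ~10k PPD difference on a ~420k PPD average.
ppd_average = 420_000
ppd_gap = 10_000

# Express the gap as a percentage of the average.
relative_gap_percent = ppd_gap / ppd_average * 100
print(f"Relative PPD gap: {relative_gap_percent:.1f}%")
```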

On Core_21, the newer card seems quite happy (so far) with 125MHz of further overclocking whereas the older one struggles with no extra overclock. Particularly on the later P69xx series- the earlier ones that show up in psummary as "UNKNOWN_ENUM" were more or less OK. Though I'll admit I haven't had many of the new ones.

If I can find that the cards are of different generations then I'll just put it down to one of those things and play with the clocks accordingly... and that's all I'm asking- how do I find out?

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 12:55 pm
by toTOW
The new core 21 tends to push the hardware harder than we're used to. If the quality of your two GPU chips differs a lot, one might be throttling more than the other (on Windows, GPU-Z reports this as the PerfCap reason).

Also, core 21 projects are often bigger (number of atoms) and the sanity checks take more time. Those sanity checks are performed on the CPU, so different CPUs might also exacerbate the differences.

P.S.: I have the same card as yours ;)

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 2:41 pm
by billford
toTOW wrote: "The new core 21 tends to push the hardware harder than we're used to."
Yes, and Linux pushes it harder than Windows...
"If the quality of your two GPU chips differs a lot..."
It's looking as if that may be the case, regardless of whether there are two versions of Maxwells or not :( .

I've put it back to full overclock and taken the adv flag off while I have a think. Can only be short-term of course, until Core_21s get released to full FAH.
"P.S.: I have the same card as yours ;)"
Clearly a man of excellent judgement :wink:

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 3:17 pm
by Joe_H
billford wrote: "I believe (from occasional comments here) that there are two types of Maxwell cards. The cards were bought several months apart, so it's quite possible I've got one of each.

But I don't know how to tell... could someone give me a clue what to look for please?"
The difference in Maxwell generations is not applicable here. The distinction refers to the chips used in different models of nVidia GPUs. First-generation Maxwell chips were the GM107 and GM108, used in the 745, 750, 750 Ti and some 8nnM GPUs. The second-generation Maxwells are the GM20n chips used on the GTX 900 series cards. So your two cards would have the same generation of Maxwell GPU chip on them.
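As a rough illustration of that naming rule, here is a minimal Python sketch that classifies a chip codename by its first digit. The function name is made up for this example. On Linux the codename usually appears in `lspci` output; the sample line below is illustrative, not captured from the machines in this thread.

```python
import re


def maxwell_generation(text: str) -> str:
    """Classify a chip codename such as 'GM107' or 'GM204' by Maxwell generation."""
    # GM1xx = first generation, GM2xx = second generation (per the post above).
    match = re.search(r"\bGM(\d)\d{2}", text.upper())
    if match is None:
        return "not a Maxwell codename"
    series = match.group(1)
    if series == "1":
        return "first-generation Maxwell"
    if series == "2":
        return "second-generation Maxwell"
    return "unknown Maxwell series"


# Illustrative line in the style of `lspci` output for a GTX 980 (an assumption).
sample = "01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980]"
print(maxwell_generation(sample))
```

Two GTX 980s should both land in the second-generation branch, which is why the generation split cannot explain a difference between them.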

The difference you are seeing might be related to the slight variances seen during production of any chip. The cards you have are already factory overclocked, so one might be closer to its limit for stable computation, as opposed to video output, which is the criterion MSI uses when binning GPU chips.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 3:32 pm
by billford
Joe_H wrote: "The difference in Maxwell generations is not applicable here. The distinction refers to the chips used in different models of nVidia GPUs. First-generation Maxwell chips were the GM107 and GM108, used in the 745, 750, 750 Ti and some 8nnM GPUs. The second-generation Maxwells are the GM20n chips used on the GTX 900 series cards. So your two cards would have the same generation of Maxwell GPU chip on them."
That explains where I got the (wrong!) thought from, thank you :)
"The difference you are seeing might be related to the slight variances seen during production of any chip. The cards you have are already factory overclocked, so one might be closer to its limit for stable computation, as opposed to video output, which is the criterion MSI uses when binning GPU chips."
Seems likely :(

When I've got a bit more time to baby-sit it I'll clock it right back to NVidia base and see what happens. It'll be a shame, it'll cut the card's PPD by about 20% even on Core_17 and 18, which are working fine. Such is life.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 3:45 pm
by toTOW
If you can run GPU-Z on Windows (I don't know if a Linux tool can give such detailed information), it's pretty easy to compare chip quality:

- a lower-quality chip will run at a higher voltage and a lower boost clock
- as a consequence, a lower-quality chip will run closer to its maximum TDP.

GPU-Z will also give you the PerfCap reason, which is the code of what is limiting your GPU.
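The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the `CardReading` type, the function name and the numbers are all invented for the example and are not measurements from either card in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CardReading:
    label: str
    voltage_v: float      # core voltage under load, in volts
    boost_clock_mhz: int  # sustained boost clock under load, in MHz


def likely_weaker_chip(a: CardReading, b: CardReading) -> Optional[str]:
    """Return the label of the card that looks weaker, or None if the signs are mixed."""
    # "Weaker" here means: needs at least as much voltage and boosts no higher.
    a_worse = a.voltage_v >= b.voltage_v and a.boost_clock_mhz <= b.boost_clock_mhz
    b_worse = b.voltage_v >= a.voltage_v and b.boost_clock_mhz <= a.boost_clock_mhz
    if a_worse and not b_worse:
        return a.label
    if b_worse and not a_worse:
        return b.label
    return None


# Made-up readings purely for illustration.
older = CardReading("older card", voltage_v=1.212, boost_clock_mhz=1366)
newer = CardReading("newer card", voltage_v=1.187, boost_clock_mhz=1404)
print(likely_weaker_chip(older, newer))
```

Readings should be taken under the same load and at similar temperatures, otherwise the comparison says more about the conditions than about the chips.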

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 4:00 pm
by billford
toTOW wrote: "If you can run GPU-Z on Windows (I don't know if a Linux tool can give such detailed information)"
Linux tools for GPUs are a bit thin on the ground; if anyone knows of one I'd be interested. Installing Windows is not an acceptable option :wink:
"- a lower-quality chip will run at a higher voltage and a lower boost clock"
Thanks, that might be a clue. I can't monitor the voltage, but I mentioned up-topic that the older card tends to run at a lower boost clock than the new one. Looks like a marginal chip on that card.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 4:32 pm
by 7im
My impression was that Core_21 ran better with drivers 350.xx and higher.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 5:32 pm
by billford
7im wrote: "My impression was that Core_21 ran better with drivers 350.xx and higher."
You may well be right, but although they may alleviate the problem a little I doubt they'll cure it. And I put updating Linux GPU drivers firmly under the heading "If it ain't broke, don't fix it" :wink:

I've ended up re-installing the OS too often for my liking.

I hadn't expected the cards to be completely identical even with the same spec, but couldn't understand the drastic difference... it was Joe's observation that MSI sort them by video rather than computational performance that made that clearer to me.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 7:05 pm
by 7im
Correct, drivers will not change differences between cards. However, if newer drivers are better coded for computing, then it might bring both cards back within a similar performance envelope, and then run at about the same speed again.

Also make sure both GPUs are running the same version of Core_21. That's more likely an issue than a driver version.

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 8:18 pm
by bollix47
Another point re later drivers is that nvidia-smi gives a bit more info than previous versions. The following is from 355.11:

Code:

 nvidia-smi
Tue Oct  6 16:12:46 2015
+------------------------------------------------------+
| NVIDIA-SMI 355.11     Driver Version: 355.11         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
| 70%   55C    P2   142W / 185W |    239MiB /  4090MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1062    G   /usr/bin/X                                      95MiB |
|    0      1750    G   compiz                                          31MiB |
|    0     19528    C   ...AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18    94MiB |
+-----------------------------------------------------------------------------+
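For comparing two cards over time, the same tool can also print selected values in CSV form through its `--query-gpu` option. Below is a minimal Python sketch of reading that output. The field list is an assumption about what a given driver and card support, and some values may be reported as not supported rather than as numbers. The sample line reuses the power and temperature figures from the output above; the clock value in it is made up.

```python
import csv
import io
import subprocess

# Query fields; availability depends on the driver version and the card.
FIELDS = ["name", "clocks.sm", "power.draw", "power.limit", "temperature.gpu"]


def parse_query_output(text):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi (needs the NVIDIA driver installed) and parse the result."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return parse_query_output(subprocess.check_output(cmd, text=True))


# Illustrative sample; the 1278 MHz clock is invented for the example.
sample = "GeForce GTX 980, 1278, 142.00, 185.00, 55\n"
for gpu in parse_query_output(sample):
    headroom = float(gpu["power.limit"]) - float(gpu["power.draw"])
    print(f'{gpu["name"]}: {gpu["clocks.sm"]} MHz, {headroom:.0f} W below the power cap')
```

Calling `query_gpus()` on each machine at intervals and logging the rows would give a like-for-like record of boost clock and power draw for the two cards.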

Re: Maxwell GPUs

Posted: Tue Oct 06, 2015 9:01 pm
by billford
@ 7im - yes, both cards are using 0.0.11 which is the latest afaik.

@ bollix47 - I use the "NVIDIA X Server Settings" app to get the info in the top box, it also gives the PCIe bandwidth utilisation. But not the power, which might be useful.

Maybe I'll pluck up courage to update the drivers to see if they've added anything else useful. But not tonight, too close to bedtime in my time zone :wink:

Re: Maxwell GPUs

Posted: Wed Oct 07, 2015 2:21 am
by bruce
billford wrote: "I've got two nominally identical systems using NVidia GTX 980's. I've always seen slight differences in the behaviour of the two cards, but with the advent of Core_21 that difference is becoming significant."
As long as they're identical, you can switch them and resume work. That would eliminate the slight chance that one is positioned in a hotter part of its case than the other.