Maxwell GPUs
Posted: Tue Oct 06, 2015 9:17 am
by billford
I've got two nominally identical systems using NVidia GTX 980s. I've always seen slight differences in the behaviour of the two cards, but with the advent of Core_21 that difference is becoming significant.
I believe (from occasional comments here) that there are two types of Maxwell cards; the cards were bought several months apart, so it's quite possible I've got one of each.
But I don't know how to tell... could someone give me a clue what to look for, please?
(If it involves software in some way, both are running under Linux)
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 9:24 am
by ChristianVirtual
What differences in behavior?
Are the systems really the same (CPU, memory, storage ...)?
Some of the newer cores write checkpoints more frequently, which might affect runtime characteristics.
And of course differences between PRCGs (Project/Run/Clone/Gen) might hit you, too.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 9:58 am
by billford
ChristianVirtual wrote:What differences in behavior?
Are the systems really the same (CPU, memory, storage ...)?
Some of the newer cores write checkpoints more frequently, which might affect runtime characteristics.
And of course differences between PRCGs (Project/Run/Clone/Gen) might hit you, too.
There are slight differences in the motherboards (Gigabyte GA-H81M-DS2V @ 3.2GHz, GA-H81M-S2PV @ 3.0GHz), otherwise identical.
Both cards are from MSI: these ones. The newer card is on the 3.0GHz mobo.
OS (Mint 17.2), drivers (346.35), etc. are identical.
The differences on Core_17 and 18 were slight but noticeable, mainly in the way the boost clock adjusted itself, to the extent that the average PPD (over several thousand WUs) was about 10k (out of ~420k) higher on the newer card, the one in the slower mobo.
On Core_21, the newer card seems quite happy (so far) with a further 125MHz of overclocking, whereas the older one struggles even with no extra overclock. This is particularly so on the later P69xx series; the earlier ones that show up in psummary as "UNKNOWN_ENUM" were more or less OK, though I'll admit I haven't had many of the new ones.
If I can find that the cards are of different generations, then I'll just put it down to one of those things and play with the clocks accordingly... and that's all I'm asking: how do I find out?
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 12:55 pm
by toTOW
The new core 21 tends to push the hardware harder than we're used to. If the quality of your GPUs is very different, one might be throttling more than the other (on Windows, GPU-Z reports what is limiting the card as the PerfCap reason).
Also, core 21 projects are often bigger (in number of atoms), so the sanity checks take more time. Those sanity checks are performed on the CPU, so different CPUs might also exacerbate the differences.
P.S.: I have the same card as yours.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 2:41 pm
by billford
toTOW wrote:The new core 21 tends to push the hardware harder than we're used to.
Yes, and Linux pushes it harder than Windows...
If the quality of your GPUs is very different...
It's looking as if that may be the case, regardless of whether there are two versions of Maxwells or not.
I've put it back to full overclock and taken the adv flag off while I have a think. That can only be short-term, of course, until Core_21s get released to full FAH.
P.S.: I have the same card as yours.
Clearly a man of excellent judgement
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 3:17 pm
by Joe_H
billford wrote:
I believe (from occasional comments here) that there are two types of Maxwell cards; the cards were bought several months apart, so it's quite possible I've got one of each.
But I don't know how to tell... could someone give me a clue what to look for please?
The difference in Maxwell generations is not applicable here. What the difference refers to is the chips used in different models of nVidia GPUs. First-generation Maxwell chips were the GM107 and GM108 used in the 745, 750, 750 Ti and some 8nnM GPUs. The second-generation Maxwells are the GM20n chips used on the GTX 900 series cards. So your two cards would have the same generation of Maxwell GPU chip on them.
The difference you are seeing might be related to the slight variances seen during production of any chip. The cards you have are already factory overclocked, so one might be closer to its limit for stable computation, as opposed to video output, which is the criterion MSI uses when binning GPU chips.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 3:32 pm
by billford
Joe_H wrote:
The difference in Maxwell generations is not applicable here. What the difference refers to is the chips used in different models of nVidia GPUs. First-generation Maxwell chips were the GM107 and GM108 used in the 745, 750, 750 Ti and some 8nnM GPUs. The second-generation Maxwells are the GM20n chips used on the GTX 900 series cards. So your two cards would have the same generation of Maxwell GPU chip on them.
That explains where I got the (wrong!) thought from, thank you.
The difference you are seeing might be related to the slight variances seen during production of any chip. The cards you have are already factory overclocked, so one might be closer to its limit for stable computation, as opposed to video output, which is the criterion MSI uses when binning GPU chips.
Seems likely.
When I've got a bit more time to baby-sit it, I'll clock it right back to NVidia base and see what happens. It'll be a shame, as it'll cut the card's PPD by about 20% even on Core_17 and 18, which are working fine. Such is life.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 3:45 pm
by toTOW
If you can run GPU-Z on Windows (I don't know if a Linux tool can give such detailed information), it's pretty easy to compare chip quality:
- a chip with lower quality will run at higher voltage and lower boost clock;
- as a consequence, chips with lower quality will run closer to their maximum TDP.
GPU-Z will also give you the PerfCap reason, which is the code of what is limiting your GPU.
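There is no GPU-Z on Linux, but the NVML library behind nvidia-smi exposes a clocks-throttle-reasons bitmask that covers much the same ground as the PerfCap reason. With a driver new enough to support it (whether 346.xx or 355.xx reports it for a GeForce card is an assumption to check), `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader` prints a hex value such as `0x0000000000000004`. The sketch below turns that value into names; the bit table is assumed to match the `nvmlClocksThrottleReason*` constants in NVML's `nvml.h`, so check it against the header for your driver.

```python
"""Decode an NVML clocks-throttle-reasons bitmask into readable names.

Assumption: the bit values below follow the nvmlClocksThrottleReason*
constants in NVML's nvml.h; older drivers may not set every bit.
"""

THROTTLE_REASONS = {
    0x0001: "GPU idle",
    0x0002: "Applications clocks setting",
    0x0004: "SW power cap",
    0x0008: "HW slowdown",
    0x0010: "Sync boost",
    0x0020: "SW thermal slowdown",
    0x0040: "HW thermal slowdown",
    0x0080: "HW power brake slowdown",
    0x0100: "Display clock setting",
}


def decode_throttle_reasons(value: str) -> list[str]:
    """Return the names of all set bits in a hex string like '0x0000000000000004'."""
    mask = int(value.strip(), 16)  # int() accepts the '0x' prefix with base 16
    if mask == 0:
        return ["None (no throttle reason active)"]
    names = [name for bit, name in THROTTLE_REASONS.items() if mask & bit]
    # Report any bits this table does not know about rather than hiding them.
    unknown = mask & ~sum(THROTTLE_REASONS)
    if unknown:
        names.append(f"Unknown bits 0x{unknown:x}")
    return names


if __name__ == "__main__":
    # Example value of the kind nvidia-smi might print for a power-limited card.
    print(decode_throttle_reasons("0x0000000000000004"))
```

A card that keeps reporting "SW power cap" while folding would be the Linux counterpart of the power-limited chip toTOW describes.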
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 4:00 pm
by billford
toTOW wrote:If you can run GPU-Z on Windows (I don't know if a Linux tool can give such detailed information)
Linux tools for GPUs are a bit thin on the ground; if anyone knows of one, I'd be interested. Installing Windows is not an acceptable option.
- a chip with lower quality will run at higher voltage and lower boost clock
Thanks, that might be a clue: I can't monitor the voltage, but I mentioned up-topic that the older card tends to run at a lower boost than the new one. Looks like a marginal chip on that card.
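Even without voltage readings, logging the SM clock and power draw on each machine for a while and comparing the averages would show how far apart the two cards' boost behaviour really is. Assuming nvidia-smi supports the `clocks.sm`, `power.draw` and `power.limit` query fields on these cards (some GeForce drivers report fields as `[Not Supported]`), a command such as `nvidia-smi --query-gpu=clocks.sm,power.draw,power.limit --format=csv,noheader,nounits -l 5 >> gpu.log` appends one line per sample, like `1367, 142.35, 185.00`. This is a minimal sketch for summarizing such a log; the file name `gpu.log` and the `summarize` helper are hypothetical.

```python
"""Summarize an nvidia-smi CSV log of (SM clock MHz, power draw W, power limit W).

Assumed log format: one sample per line, produced with
  --query-gpu=clocks.sm,power.draw,power.limit --format=csv,noheader,nounits
Lines with non-numeric values (e.g. "[Not Supported]") are skipped.
"""
from statistics import mean


def summarize(lines):
    clocks, power_fraction = [], []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # blank or unexpected line
        try:
            sm_mhz, draw_w, limit_w = (float(f) for f in fields)
        except ValueError:
            continue  # unsupported or malformed sample
        clocks.append(sm_mhz)
        if limit_w > 0:
            power_fraction.append(draw_w / limit_w)
    if not clocks:
        raise ValueError("no usable samples")
    return {
        "samples": len(clocks),
        "mean_sm_mhz": mean(clocks),
        "min_sm_mhz": min(clocks),
        "mean_power_fraction": mean(power_fraction) if power_fraction else None,
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as log:
        print(summarize(log))
```

Run on both machines, a lower mean SM clock together with a higher mean power fraction would point to the weaker chip, in line with the lower-boost, closer-to-TDP pattern described above.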
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 4:32 pm
by 7im
My impression was that Core_21 ran better with drivers 350.xx and higher.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 5:32 pm
by billford
7im wrote:My impression was that Core_21 ran better with drivers 350.xx and higher.
You may well be right, but although they may alleviate the problem a little, I doubt they'll cure it. And I put updating Linux GPU drivers firmly under the heading "If it ain't broke, don't fix it".
I've ended up re-installing the OS too often for my liking.
I hadn't expected the cards to be completely identical even with the same spec, but couldn't understand the drastic difference... it was Joe's observation that MSI sorts them by video rather than computational performance that made it clearer to me.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 7:05 pm
by 7im
Correct, drivers will not change differences between cards. However, if newer drivers are better coded for computing, then it might bring both cards back within a similar performance envelope, and then run at about the same speed again.
Also make sure both GPUs are running the same version of Core_21. That's more likely an issue than a driver version.
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 8:18 pm
by bollix47
Another point re later drivers is that nvidia-smi gives a bit more info than previous versions. The following is from 355.11:
```
nvidia-smi
Tue Oct 6 16:12:46 2015
+------------------------------------------------------+
| NVIDIA-SMI 355.11     Driver Version: 355.11         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
| 70%   55C    P2   142W / 185W |    239MiB /  4090MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1062    G   /usr/bin/X                                      95MiB |
|    0      1750    G   compiz                                          31MiB |
|    0     19528    C   ...AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18    94MiB |
+-----------------------------------------------------------------------------+
```
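The Pwr:Usage/Cap column gives the power draw against the card's power cap: 142W / 185W means this card was drawing about 77% of its cap (142 / 185 is about 0.768) while folding. A small sketch that pulls that field out of the default table, assuming the `NNNW / NNNW` layout printed by the 355.11 driver above (other driver versions may lay the table out differently; `power_usage` is a hypothetical helper):

```python
"""Extract power draw and cap from the default nvidia-smi table output.

Assumption: each per-GPU row contains a field like "142W / 185W", as in the
355.11 output shown above; other driver versions may format it differently.
"""
import re

# Matches "142W / 185W" (optionally with decimals) but not "239MiB / 4090MiB".
POWER_RE = re.compile(r"(\d+(?:\.\d+)?)W\s*/\s*(\d+(?:\.\d+)?)W")


def power_usage(smi_text: str) -> list[tuple[float, float, float]]:
    """Return (draw_w, cap_w, percent_of_cap) for each GPU row found."""
    results = []
    for match in POWER_RE.finditer(smi_text):
        draw, cap = float(match.group(1)), float(match.group(2))
        results.append((draw, cap, 100.0 * draw / cap))
    return results


if __name__ == "__main__":
    row = "| 70%   55C    P2   142W / 185W |    239MiB /  4090MiB |     97%      Default |"
    for draw, cap, pct in power_usage(row):
        print(f"{draw:.0f} W of {cap:.0f} W cap ({pct:.1f}%)")
```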
Re: Maxwell GPUs
Posted: Tue Oct 06, 2015 9:01 pm
by billford
@7im: yes, both cards are using 0.0.11, which is the latest AFAIK.
@bollix47: I use the "NVIDIA X Server Settings" app to get the info in the top box; it also gives the PCIe bandwidth utilisation, but not the power, which might be useful.
Maybe I'll pluck up courage to update the drivers to see if they've added anything else useful. But not tonight; it's too close to bedtime in my time zone.
Re: Maxwell GPUs
Posted: Wed Oct 07, 2015 2:21 am
by bruce
billford wrote:I've got two nominally identical systems using NVidia GTX 980's. I've always seen slight differences in the behaviour of the two cards, but with the advent of Core_21 that difference is becoming significant.
As long as they're identical, you can swap the cards between the two systems and resume work. That would eliminate the slight chance that one is positioned in a hotter part of its case than the other.