Project 13402
Posted: Mon May 04, 2020 4:55 am
by Nuitari
According to the announcement, WUs are expected to take 3 to 4h and should only be assigned to fast GPUs.
viewtopic.php?f=24&t=35056
I see that a WU got assigned to an RX 560, and I'm not sure this counts as "fast". It should take about 15h to do 1 WU on that project.
Re: Project 13402
Posted: Mon May 04, 2020 5:46 am
by JohnChodera
That's odd---I'm not finding the RX 560 in GPUs.txt:
$ grep 560 GPUs.txt
0x1002:0x68b9:::Juniper [Radeon HD 5600/5700]
0x1002:0x68c1:::Redwood [Radeon HD 5600 Series]
0x1002:0x731f:1:6:Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
0x1002:0x958c:::RV630GL [FireGL v5600]
0x1002:0x9904:1:5:Trinity [Radeon HD 7560D] S 389
0x10de:0x019d:::G80 [Quadro FX 5600]
0x10de:0x0311:::NV31 [GeForce FX 5600 Ultra]
0x10de:0x0312:::NV31 [GeForce FX 5600]
0x10de:0x0314:::NV31 [GeForce FX 5600XT]
0x10de:0x031a:::NV31M [GeForce FX Go5600]
0x10de:0x039e:::G73GL [Quadro FX 560]
0x10de:0x1082:2:2:GF114 [GeForce GTX 560 Ti]
0x10de:0x1084:2:2:GF114 [GeForce GTX 560]
0x10de:0x1087:2:2:GF110 [GeForce GTX 560 Ti]
0x10de:0x1200:2:2:GF114 [GeForce GTX 560 Ti]
0x10de:0x1201:2:2:GF114 [GeForce GTX 560]
0x10de:0x1202:2:2:GF114 [GeForce GTX 560 Ti OEM]
0x10de:0x1208:2:2:GF114 [GeForce GTX 560 SE]
0x10de:0x1251:2:2:GF116 [GeForce GTX 560M]
What GPU device ID is this, and which GPUSpecies does it get assigned to?
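A lookup like the one asked about here can be sketched in Python. This is only an illustration: it assumes each GPUs.txt line has the form `vendor:device:type:species:description`, which matches the entries quoted in this thread, and the names `GpuEntry`, `parse_line` and `find_device` are hypothetical, not part of any FAH tool.

```python
from typing import Iterable, NamedTuple, Optional


class GpuEntry(NamedTuple):
    vendor_id: int
    device_id: int
    gpu_type: Optional[int]  # assumed meaning of the 3rd field; empty in some entries
    species: Optional[int]   # 4th field; empty in some entries
    description: str


def parse_line(line: str) -> GpuEntry:
    # Split on the first four colons only, so the description is kept whole.
    vendor, device, gpu_type, species, description = line.strip().split(":", 4)
    return GpuEntry(
        int(vendor, 16),
        int(device, 16),
        int(gpu_type) if gpu_type else None,
        int(species) if species else None,
        description,
    )


def find_device(lines: Iterable[str], vendor_id: int, device_id: int) -> Optional[GpuEntry]:
    # Return the first entry matching the PCI vendor/device pair, or None.
    for line in lines:
        if not line.strip():
            continue
        entry = parse_line(line)
        if (entry.vendor_id, entry.device_id) == (vendor_id, device_id):
            return entry
    return None


# Sample lines copied from the grep output above.
sample = [
    "0x1002:0x68b9:::Juniper [Radeon HD 5600/5700]",
    "0x1002:0x731f:1:6:Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]",
    "0x10de:0x1084:2:2:GF114 [GeForce GTX 560]",
]
entry = find_device(sample, 0x1002, 0x731F)
print(entry.species, entry.description)
```

Searching by PCI vendor and device ID avoids the problem seen with `grep 560`, where the marketing name on the box may not appear in the file at all.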
Re: Project 13402
Posted: Mon May 04, 2020 5:47 am
by JohnChodera
I've restricted the AMD GPU Species to >=6 until we can sort this out.
~ John Chodera // MSKCC
Re: Project 13402
Posted: Mon May 04, 2020 6:10 am
by Joe_H
JohnChodera wrote: That's odd---I'm not finding the RX 560 in GPUs.txt:
The RX 560 most likely shows up as an RX 460 in GPUs.txt; one version is basically the same GPU chip at a slightly higher clock rate. Or, since there were 3 different variants of the RX 560, it might match one of the other entries for cards based on the Baffin chip from AMD.
Re: Project 13402
Posted: Mon May 04, 2020 6:14 am
by Nuitari
The PCI ID is 1002:67ef.
In GPUs.txt it is identified as:
0x1002:0x67ef:1:5:Baffin XT [Radeon RX 460]
The lspci output shows:
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev e5)
The RX 570 is also considered species 5 and can do it at a TPF of 4m 3s, so about 6.75h per WU. I'm not sure if you want to exclude those or not.
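The conversion from TPF to total WU time can be sketched as below. It assumes 100 frames per WU, which is what the 4m 3s to 6.75h figure above implies; the function name `wu_hours` is made up for this example.

```python
FRAMES_PER_WU = 100  # assumption: implied by 4m 3s per frame giving about 6.75h


def wu_hours(tpf_minutes: int, tpf_seconds: int, frames: int = FRAMES_PER_WU) -> float:
    """Estimated wall-clock hours for one WU at a steady time per frame (TPF)."""
    return (tpf_minutes * 60 + tpf_seconds) * frames / 3600


print(f"{wu_hours(4, 3):.2f} h")  # prints "6.75 h"
```

The estimate only holds if the TPF stays constant for the whole WU, so treat it as a rough figure.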
Re: Project 13402
Posted: Mon May 04, 2020 6:20 am
by JohnChodera
There's no way to selectively exclude a subset of Species 5 cards, so I'm forced to exclude all Species 5 until we can further refine the GPU Species for AMD.
We have some plans to do data analytics on some standard benchmark systems this week to improve the state of things!
~ John Chodera // MSKCC
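Why a species-level restriction is so coarse can be shown with a small sketch. The `Card` class and `eligible` function are hypothetical and only model the idea of a minimum-species cutoff; the species numbers and WU times are the ones reported in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Card:
    name: str
    species: int
    wu_hours: Optional[float]  # estimate reported in this thread, if any


def eligible(cards: List[Card], min_species: int) -> List[Card]:
    # The cutoff only sees the species number, never the actual speed.
    return [c for c in cards if c.species >= min_species]


cards = [
    Card("RX 560 (listed as Baffin XT [Radeon RX 460])", 5, 15.0),
    Card("RX 570", 5, 6.75),
    Card("Navi 10 [Radeon RX 5600 XT / 5700 XT]", 6, None),
]

for card in eligible(cards, min_species=6):
    print(card.name)
```

With a cutoff of 6, the RX 570 is dropped along with the RX 560 even though it is more than twice as fast, because both share species 5.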
Re: Project 13402
Posted: Mon May 04, 2020 8:08 am
by VegaZhree3
I also got assigned this WU. My GPU is a GTX 1660 Super, but this WU utilizes the hardware differently. Usually WUs make the GPU draw about 110-120W. This one draws 90W at most, and there is no power limit or anything like that set. It also shows 70% "Copy" load in Task Manager, where it is normally around 30%. The GPU load reported by GPU-Z is similar to other WUs. The PPD is the lowest I've seen, around 450K.
Re: Project 13402
Posted: Mon May 04, 2020 2:04 pm
by JohnChodera
Thanks for the report, @VegaZhree3! This is a new type of workload for us---one that allows us to directly make predictions for the chemists about which molecules to make---and we're still iteratively refining performance.
~ John Chodera // MSKCC
Re: Project 13402
Posted: Mon May 04, 2020 2:21 pm
by Nuitari
@JohnChodera the same seems to apply to project 13403 (75,10,2)
I'm seeing about 91W usage on the NVIDIA GTX 1660 Super, 85% GPU utilisation, and PCIe usage at 19%.
The core thread is using 50% of a core at all times.
Increasing the priority of the CPU FAHCore_22 thread does increase GPU usage to 99% and power draw to about 100W (out of 125W). PCIe usage jumps to 23%.
Re: Project 13402
Posted: Mon May 04, 2020 2:28 pm
by HaloJones
13403 (11, 94, 2) on a dedicated 1070 with Windows 10. TPF 2:35 for an estimated 732225 ppd.
GPU Usage is showing at around 93%, Bus at 38%
PPD is a little low for this card but not excessively so.
Re: Project 13402
Posted: Mon May 04, 2020 2:46 pm
by Nuitari
Tweaking the CPU priority for the RX560 reduced the TPF enough to save about 2h on a whole WU.
Re: Project 13402
Posted: Mon May 04, 2020 3:01 pm
by muziqaz
The problem with AMD is they have no clue how to systematically codename their GPUs, and they just assigned the same "string", if you like, to a wide bunch of their GPUs.
You have
AMDSpecies6 which is 5600, 5700, 5700xt
AMDSpecies5 which is Radeon7, Vega56/64 and all the frikkin GPUs below that.
Researchers can only exclude GPUs per Species. It's similar for nVidia as well, where one species includes everything from that series of GPU, including ultra fast high end and ultra slow low end.
If you exclude everything from AMD except AMDSpecies6 you end up with 4-6 cards available from the AMD side, which are 5600, 5600xt, 5700, 5700xt, and probably the much slower 5500, 5500xt. In the process you exclude the super fast Radeon 7, which is doing extremely well on these two projects, and you also exclude Vega56 and 64, which again are doing wonderfully on this project. The suggestion to leave Species5 in was given before these projects went live. I am still for it, because again, we are losing a crap ton of fast cards in Species5 if we leave only Species6 for these projects.
My suggestion would be to increase the deadline for these projects a bit to accommodate slightly slower cards. When does the server re-issue a WU? On the Preferred deadline or the Final deadline? If it's on the Preferred deadline, then shorten it, but then again, no one wants to fold a WU for a day and see it being dumped by the server just because someone with a much faster GPU finished it before you.
There is no favorable outcome out of this situation until we come up with a new identification process in a new fahclient.
On the other hand, these projects are getting chewed up by masses of nVidia GPU folders anyway, I would guess.
Re: Project 13402
Posted: Mon May 04, 2020 3:04 pm
by muziqaz
By the way, these two projects love cards with a high shader count and a wide memory controller, which means higher end hardware. All the mid range and low end cards, be it nVidia or AMD, will see lower than normal PPD, while high end and ultra high end GPUs will shine on these.
We see this kind of PPD discrepancy on CPU projects a lot, and I believe it's time to get used to similar behavior on the GPU side as well. Some projects work well on certain hardware, others on other hardware.
Re: Project 13402
Posted: Mon May 04, 2020 3:26 pm
by jrweiss
Currently running 13402 (65, 64, 0) on my 1050ti. With 61% complete, it's showing 6:28 per frame, and est 172602 PPD. Some other current projects get over 200K PPD, but this is within range...
Re: Project 13402
Posted: Mon May 04, 2020 5:02 pm
by Nuitari
TBH I really don't care about the PPD. It's nice, but it's far from the whole point of doing this.
The only reason I posted about it is that the announcement seems to indicate they want a quick (3 to 4h) turnaround, and there are cards which are clearly not going to meet that expectation. If they are OK with a slower turnaround (like 15h) then I'm fine with it, even if I'd get less PPD than otherwise. The original WU completed in 14h 11m on the slowest RX560 I have.
From what I read, the server will reissue a WU once the preferred deadline has been reached.