Extremely low performance on project 13409 with Radeon VII

Crawdaddy79 · Post by **Crawdaddy79** » Thu Jun 04, 2020 7:08 pm

ThWuensche wrote:As for the PPD, I don't care much. What I care about is the volume of calculations required to support scientific results. I just bought another PC and two more Radeon VIIs to support the scientific results (not the PPD). As I understand it, supporting project Moonshot will need a lot of computational power, and it's sad if a big part of it stays unused (for the electricity bill it's good, but ...)

What I conclude from the observations from Crawdaddy and BobWilliams supports the second of the reasons mentioned above, latency in interaction with the CPU. For a low end GPU, the GPU is limiting and not the interaction with the CPU, for a high end GPU, the interaction with the CPU becomes limiting. Thus latency would be a big show-stopper for high-end GPUs, but mostly go by unnoticed on smaller GPUs. The latency issue seems also confirmed from Crawdaddy's test with stopped CPU folding. The low memory footprint I observed is good for GPUs with limited memory and memory bandwidth, but leaves the benefit of the 16G fast RAM on the Radeon VII unused. So bigger work packages and less interaction might help to make good use of high-end GPUs (if that is possible from the nature of the WUs). Don't know about behaviour of the bigger NVidia GPUs in this situation, any ideas?

This latency exists with all work units on my system. Generally pausing the CPU fold will decrease the GPU TPF by about 10% on any given WU.

Here are the results of an 11762 WU:

CPU Unpaused, GPU nets 1.25M PPD
CPU Paused, GPU nets 1.15M PPD

It costs more GPU points than the CPU earns when using both simultaneously - this will vary by architecture.

_r2w_ben · Post by **_r2w_ben** » Thu Jun 04, 2020 10:21 pm

ThWuensche wrote:First possible reason might be that the problem does not scale well to highly parallel architectures, parts of code with parallelization are intermixed with and dependent on parts which can not be parallelized, that way limiting the overall performance by those parts which can not be executed in parallel. This probably would be hard to circumvent unless the algorithm may be changed. Only help in that case would be to run WUs in parallel on a GPU.

With 3840 shaders, Radeon VII is very wide. p13409 only has 4,082 atoms, reducing available parallelization. Theoretically, if the same calculation were done for each atom and each atom received its own shader, it would take two passes and only 242 of 3840 (6% of the GPU) would be active on the second pass. Combined, the two passes would represent 53.2% utilization. As the number of atoms increases, the effect of less than full utilization on the last pass is reduced. For example, you'll probably see above average performance on p14201, which has 453,348 atoms. The same calculation per atom would take 119 passes with 228 of 3840 (6% of the GPU) utilized on the last pass. All passes combined would represent 99.2% utilization.

BobWilliams757's onboard video benefits from the lack of utilization on the higher end GPUs used when p13409 was benchmarked. The low number of shaders results in close to full utilization regardless of the number of atoms.

This is an oversimplification since Molecular Dynamics involves a lot of calculations per atom and between pairs of atoms each timestep but should give an idea of what's happening.
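The pass arithmetic above can be checked with a few lines of Python. This is only a sketch of the same simplified model (one atom per shader per pass, identical work per atom); the function name `shader_utilization` is made up for illustration and has nothing to do with how core 22 or OpenMM actually schedules work.

```python
def shader_utilization(atoms, shaders):
    """Simplified model: each pass handles up to `shaders` atoms, one per shader.

    Returns (passes, shaders_active_on_last_pass, overall_utilization).
    """
    passes = -(-atoms // shaders)               # integer ceiling division
    last_pass = atoms - (passes - 1) * shaders  # atoms left over for the final pass
    utilization = atoms / (passes * shaders)    # busy shader slots / available slots
    return passes, last_pass, utilization


# Atom counts and shader count as quoted in this thread.
for project, atoms in (("p13409", 4082), ("p14201", 453348)):
    passes, last_pass, utilization = shader_utilization(atoms, 3840)
    print(f"{project}: {passes} passes, {last_pass} of 3840 shaders busy on the "
          f"last pass, {utilization:.1%} overall")
```

Plugging in a small shader count (a few hundred, as on an APU) shows why such a GPU stays close to full utilization in this model regardless of atom count.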

mad_martn · Post by **mad_martn** » Fri Jun 05, 2020 9:27 am

I want to add that I see extremely low performance with 13409 on a Vega 56 (Debian testing with OpenCL bits from amdgpu-pro 20.10), as well as very low performance with 13409 on Polaris RX 570 (Debian, like the other rig, and another one with Windows 10).

ajm · Post by **ajm** » Fri Jun 05, 2020 10:11 am

Same here with a 5700XT on Win 10 using 7.5.1

project:13409 run:444 clone:37 gen:1
project:13409 run:331 clone:40 gen:0
project:13409 run:163 clone:62 gen:0
project:13409 run:461 clone:57 gen:0
project:13409 run:144 clone:62 gen:1
project:13409 run:310 clone:69 gen:0
project:13409 run:265 clone:71 gen:0
project:13409 run:176 clone:71 gen:1
project:13409 run:299 clone:82 gen:0

Most of those WUs have been crunched by others before me with faulty results.
See https://apps.foldingathome.org/wu#proje ... e=40&gen=0 for example.

EDIT: Strangely, none of my other 3 GPUs working at the same time has received any 13409 WUs.

BobWilliams757 · Post by **BobWilliams757** » Fri Jun 05, 2020 10:47 am

_r2w_ben wrote:
ThWuensche wrote:First possible reason might be that the problem does not scale well to highly parallel architectures, parts of code with parallelization are intermixed with and dependent on parts which can not be parallelized, that way limiting the overall performance by those parts which can not be executed in parallel. This probably would be hard to circumvent unless the algorithm may be changed. Only help in that case would be to run WUs in parallel on a GPU.
With 3840 shaders, Radeon VII is very wide. p13409 only has 4,082 atoms, reducing available parallelization. Theoretically, if the same calculation were done for each atom and each atom received its own shader, it would take two passes and only 242 of 3840 (6% of the GPU) would be active on the second pass. Combined, the two passes would represent 53.2% utilization. As the number of atoms increases, the effect of less than full utilization on the last pass is reduced. For example, you'll probably see above average performance on p14201, which has 453,348 atoms. The same calculation per atom would take 119 passes with 228 of 3840 (6% of the GPU) utilized on the last pass. All passes combined would represent 99.2% utilization.

BobWilliams757's onboard video benefits from the lack of utilization on the higher end GPUs used when p13409 was benchmarked. The low number of shaders results in close to full utilization regardless of the number of atoms.

This is an oversimplification since Molecular Dynamics involves a lot of calculations per atom and between pairs of atoms each timestep but should give an idea of what's happening.

I think this is fairly solid thinking. But there are occasions where even my modest APU isn't maxed out. I just had another instance of 13409 running, and GPU utilization was peaking at around 85%, with the memory use also low at 25%. IIRC project 13408 was also on the low side for use. On most mid to larger atom count WU's, GPU is sustained at 95-100%, and memory utilization can get up near 60%.

I can't for the life of me figure out why my APU will work fine on WU's that run slow and/or cause errors on some of the higher end AMD cards. I've yet to have an error on a WU. Maybe it's just going so slow that nothing can really push it over the edge, whereas the much faster cards are operating closer to the edge of the envelope when folding.

Crawdaddy79 · Post by **Crawdaddy79** » Fri Jun 05, 2020 11:06 am

I take back what I said about points. I got seven of these in the last 12 hours. Not cool.

foldy · Post by **foldy** » Fri Jun 05, 2020 11:14 am

The problem with this project is that it has an atom count of only 4082. Either it is misconfigured by the project owner, or it should only be sent to very slow GPUs.

Normally a small project has an atom count of 25k, the average is 65k, and big projects have >150k atoms.

A Radeon VII or Vega could run 10 of these 13409 projects in parallel, but it seems that is not possible anymore.

It seems 13409 is still in beta? So if any of you use the FAH beta flag, you should remove it to get bigger work units.

But 13407 seems to be non-beta and has the same issue, with an atom count of only 4082.

PantherX · Post by **PantherX** » Fri Jun 05, 2020 11:27 am

Crawdaddy79 wrote:I take back what I said about points. I got seven of these in the last 12 hours. Not cool.

May I suggest a different perspective that might change your POV... While these 1340X projects yield a low PPD, they are in fact highly experimental and were specifically created to help the Moonshot project (https://covid.postera.ai/covid). Researchers are working to understand this and make it a more efficient process, which would mean that the next set of these projects could be more efficient. Helping to create a potential drug for COVID-19 would be a historic milestone for humanity, given that it would be released to the public for free without any restrictions. It might seem frustrating in the short term but IMO, it's totally worth it in the long term.

_r2w_ben · Post by **_r2w_ben** » Fri Jun 05, 2020 11:29 am

ajm wrote:EDIT: Strangely, none of my other 3 GPUs working in the same time has received any 13409 WUs.

Radeon VII is GCN so it's excluded from projects that would throw a shortSortList error. This will be fixed in a new build of core 22. GCN restrictions for projects will be removed and then a greater variety of work will be available.

Adding or removing a client-type of advanced in the extra slot options will also influence assignments. Advanced work units are newer projects that haven't been as thoroughly tested.

Crawdaddy79 · Post by **Crawdaddy79** » Fri Jun 05, 2020 5:11 pm

Since 6 AM I've gotten four more of these. The server seems to like my system; I've only had one faulty return and I think it was a legit bad work unit.

PantherX wrote:
Crawdaddy79 wrote:I take back what I said about points. I got seven of these in the last 12 hours. Not cool.
May I suggest a different perspective that might change your POV... While these 1340X projects yield a low PPD, they are in fact highly experimental and were specifically created to help the Moonshot project (https://covid.postera.ai/covid). Researchers are working to understand this and make it a more efficient process, which would mean that the next set of these projects could be more efficient. Helping to create a potential drug for COVID-19 would be a historic milestone for humanity, given that it would be released to the public for free without any restrictions. It might seem frustrating in the short term but IMO, it's totally worth it in the long term.

Science...

Just kidding. I did look at the project page last night and I agree it is worthy. Had to turn the temp on my A/C up though because my PC is no longer keeping my basement warm.

Hopefully this will drag some of the teams creeping up on us back some as well.

zoboomafu · Post by **zoboomafu** » Fri Jun 05, 2020 7:58 pm

Crawdaddy79 wrote:Since 6 AM I've gotten four more of these. The server seems to like my system; I've only had one faulty return and I think it was a legit bad work unit.

I'm glad to see I'm not alone. I've gotten quite a few of them in the last 24 hours. I ended up setting up the GPU twice, and at least 1 of the 2 instances is almost always running one. I did get a faulty one, which 5 people ended up returning as faulty. I have seen a few times that both of my slots have a 13409 project running, and while frustrating, I have to remind myself of why I do this. So I'm glad to help them "fix" the core or the project and to support their efforts.

BobWilliams757 · Post by **BobWilliams757** » Sat Jun 06, 2020 11:14 am

Crawdaddy79 wrote:Since 6 AM I've gotten four more of these. The server seems to like my system; I've only had one faulty return and I think it was a legit bad work unit.

PantherX wrote:
Crawdaddy79 wrote:I take back what I said about points. I got seven of these in the last 12 hours. Not cool.
May I suggest a different perspective that might change your POV... While these 1340X projects yield a low PPD, they are in fact highly experimental and were specifically created to help the Moonshot project (https://covid.postera.ai/covid). Researchers are working to understand this and make it a more efficient process, which would mean that the next set of these projects could be more efficient. Helping to create a potential drug for COVID-19 would be a historic milestone for humanity, given that it would be released to the public for free without any restrictions. It might seem frustrating in the short term but IMO, it's totally worth it in the long term.
Science...

Just kidding. I did look at the project page last night and I agree it is worthy. Had to turn the temp on my A/C up though because my PC is no longer keeping my basement warm.

Hopefully this will drag some of the teams creeping up on us back some as well.

If it makes you feel any better, my 13409 projects have been interrupted by 16435 projects a couple of times now. And though I didn't have the error issues on 16435, it is a long slow grind on my computer to return a low PPD, even for this thing. It takes the better part of a full day and only returns 79K PPD, while 13409 will return 165K+ at the same system settings. And even this onboard chip won't use full GPU utilization, so if this was in your basement you'd be really cold.

I've also seen both this project and 16435 tend to be issued back to back a number of times. With other projects that seems rarer, but if the servers are issuing that many there must be a lot of them that need to be done.

In the end, if it keeps folding, I'll keep it running. TBH the points for me are just a gauge of how my hardware deals with certain WU's. I wasn't even planning on getting a passkey until I realized that the majority of comparisons on PPD were based on the passkey use with QRB points.

digiTTal · Post by **digiTTal** » Sun Jun 07, 2020 1:26 pm

Same situation on an RX 470 with project 13409: low GPU usage (70%), high CPU usage (>80% of one core), one bad WU, and PPD below 50% of average.

Project  Run  Clone  Gen  Core       FrameTime  Result         PPD     Credit
13409    94   28     0    OPENMM_22  00:01:01   FINISHED_UNIT  163216  11523
13409    294  29     1    OPENMM_22  00:00:59   FINISHED_UNIT  171585  11717
13409    380  54     0    OPENMM_22  00:01:01   FINISHED_UNIT  163216  11523
13409    433  57     0    OPENMM_22  00:01:15   FINISHED_UNIT  119719  10392
13409    91   62     0    OPENMM_22  00:01:01   FINISHED_UNIT  163216  11523
13409    651  62     1    OPENMM_22  00:01:00   FINISHED_UNIT  167313  11619
13409    641  63     1    OPENMM_22  00:01:00   FINISHED_UNIT  167313  11619
13409    43   98     0    OPENMM_22  00:01:01   FINISHED_UNIT  163216  11523
13409    278  56     1    OPENMM_22  00:00:58   FINISHED_UNIT  176041  11818
13409    257  129    1    OPENMM_22  00:00:58   FINISHED_UNIT  176041  11818
13409    343  143    1    OPENMM_22  00:01:04   FINISHED_UNIT  151875  11250
13409    528  149    1    OPENMM_22  00:01:00   FINISHED_UNIT  167313  11619
13409    382  91     1    OPENMM_22  00:00:00   BAD_WORK_UNIT  0       2500
13409    496  175    0    OPENMM_22  00:00:58   FINISHED_UNIT  176041  11818
13409    492  174    1    OPENMM_22  00:00:59   FINISHED_UNIT  171585  11717
13409    26   181    1    OPENMM_22  00:01:01   FINISHED_UNIT  163216  11523
13409    578  183    1    OPENMM_22  00:01:00   FINISHED_UNIT  167313  11619
13409    477  190    0    OPENMM_22  00:01:01   FINISHED_UNIT  163216  11523
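
For anyone who wants to sanity-check the PPD column: the values appear to follow from the frame time and the credit if you assume 100 frames per WU and that the listed credit is the final credit for the unit. Both are assumptions on my part, and `estimate_ppd` is just an illustrative helper, not anything from the client. A rough sketch in Python:

```python
SECONDS_PER_DAY = 86400


def estimate_ppd(credit, frame_time, frames_per_wu=100):
    """Estimate points per day from a WU's credit and its time per frame.

    frame_time is "HH:MM:SS" as in the table; frames_per_wu=100 is an assumption.
    Returns 0.0 when no frame time was recorded (e.g. a bad work unit).
    """
    hours, minutes, seconds = (int(part) for part in frame_time.split(":"))
    tpf_seconds = hours * 3600 + minutes * 60 + seconds
    if tpf_seconds == 0:
        return 0.0
    wu_seconds = tpf_seconds * frames_per_wu    # estimated wall time for the whole WU
    return credit * SECONDS_PER_DAY / wu_seconds


# A few rows from the table: (credit, frame time, PPD as logged).
for credit, frame_time, logged in ((11250, "00:01:04", 151875),
                                   (11523, "00:01:01", 163216),
                                   (10392, "00:01:15", 119719)):
    print(frame_time, round(estimate_ppd(credit, frame_time)), "logged:", logged)
```

The estimates land within a few points of the logged values, so the small differences are probably just rounding in the displayed credit.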
