					
				PPD differences in x8 vs x16 4.0 lanes on RTX4070 and higher?
				Posted: Sun Dec 17, 2023 11:39 pm
				by CaptainHalon
				Anyone noticed a performance difference in this case?
			 
			
					
				Re: PPD differences in x8 vs x16 4.0 lanes on RTX4070 and higher?
				Posted: Sun Dec 17, 2023 11:53 pm
				by bikeaddict
				One of my RTX 4070 Ti cards was in an HP Z440 for a while, which is PCIe Gen 3.0 x16 (equivalent to Gen 4.0 x8, about 15.754 GB/s), with no noticeable reduction in performance compared to Gen 4.0 x16.
Every time I've checked PCIe Bandwidth Utilization in the NVIDIA Settings app, it has been a single-digit percentage.
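For reference, the equivalence quoted above can be checked with a quick calculation. This is a minimal sketch, assuming the published per-lane rates of 8 GT/s for PCIe 3.0 and 16 GT/s for PCIe 4.0, both with 128b/130b encoding. The helper name is made up for illustration, and the result is the theoretical one-direction link rate, not measured throughput.

```python
# Theoretical one-direction PCIe link bandwidth.
# Assumed inputs: per-lane transfer rates of 8 GT/s (Gen 3) and 16 GT/s
# (Gen 4), with 128b/130b line encoding for both generations.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s (hypothetical helper)."""
    gbit_per_lane = TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY
    return gbit_per_lane * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen, lanes in [(3, 16), (4, 8), (4, 16)]:
        print(f"Gen {gen}.0 x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):.3f} GB/s")
```

Gen 3.0 x16 and Gen 4.0 x8 come out identical because doubling the per-lane rate exactly offsets halving the lane count.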
			 
			
					
				Re: PPD differences in x8 vs x16 4.0 lanes on RTX4070 and higher?
				Posted: Mon Dec 18, 2023 1:25 am
				by CaptainHalon
				Hmmm...well, I feel like something odd is going on.
Got a 4070 by itself on a 13th gen Intel platform (13100), and then another 4070 along with a 4060 on an 11th gen Intel platform (11400). The 4070 by itself (13th gen platform) consistently runs about 750k-1M PPD higher whenever it's on project 18725. I've observed it enough to see the consistency from WU to WU on that particular project, so it's not random.
Both platforms are running 16GB DDR4-3200 (verified in CPU-Z). Neither CPU is under excessive load. Both boxes are running SATA SSDs. The only differences I can tell are 13th vs 11th gen, and x16 lanes vs x8 lanes. GPU-Z shows both are running PCIe 4.0.
Both cards have the same mem clock, same GPU clock, and both are running temps in the 50s. You're right that bus bandwidth always shows negligible. The card with the lower PPD does show a slightly lower GPU Load percentage, maybe mid 80s vs low 90s on the higher PPD card. But I have no explanation as to why, given both cards are running at the same clocks, and the observations are while they're on the same project. Also same Nvidia driver version.
The only other experiment I could think to run is to swap the two 4070s and see if the lower PPD follows the card or stays with the box. But TBH, I just can't be arsed to do that. My curiosity only gives me so much motivation.
			 
			
					
				Re: PPD differences in x8 vs x16 4.0 lanes on RTX4070 and higher?
				Posted: Fri Dec 29, 2023 6:05 pm
				by CaptainHalon
				Update: CPU choice is definitely affecting folding performance. I needed the 13100 out of the first folding box for another project, so I replaced it with a Pentium Gold G7400. The PPD dropped on the 4070 by about 3-4%, which is roughly 300-400k PPD. It's still doing better than the 4070 that's in the dual GPU setup with the 11400, though. Keep in mind folding is all these boxes are doing, their CPUs are never pegged, and Windows Update is paused.
I suspect the difference is due to IPC and CPU cache. The i3-13100 has the same L3 as the 11400, but the 11400 is handling two GPUs. Then the G7400 has half the L3 of either, but only has to handle one GPU. So to sum up my specific scenarios, keeping the 13100 as the bar, the G7400 shows about 3-4% less PPD, and the 11400 (with two GPUs) shows about 8-10% less PPD.
Since no one has chimed in on this thread, I'm guessing not much is really known about this. But just keep in mind that with folding, unlike coin mining, CPU choice does seem to matter somewhat.
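The figures in this post and the earlier one can be cross-checked with a little arithmetic. The sketch below is only a sanity check: it infers the 4070's baseline PPD from the statement that 3-4% is roughly 300-400k, which nobody in the thread states directly, so treat the resulting baseline as an assumption. The function names are made up for illustration.

```python
# Sanity check of the PPD figures quoted in the thread.
# Assumption: the baseline is inferred from "3-4% is roughly 300-400k ppd";
# it is not a measured value.

def implied_baseline(ppd_drop: float, fraction: float) -> float:
    """Baseline PPD implied by an absolute drop and its fractional size."""
    return ppd_drop / fraction


def ppd_loss(baseline: float, fraction: float) -> float:
    """Absolute PPD lost for a given fractional slowdown."""
    return baseline * fraction


if __name__ == "__main__":
    low = implied_baseline(300_000, 0.03)
    high = implied_baseline(400_000, 0.04)
    print(f"Implied 13100 baseline: {low:,.0f} to {high:,.0f} PPD")

    baseline = 10_000_000  # assumed, taken from the inference above
    print(f"8% loss: {ppd_loss(baseline, 0.08):,.0f} PPD")
    print(f"10% loss: {ppd_loss(baseline, 0.10):,.0f} PPD")
```

On that inferred baseline, an 8-10% shortfall works out to roughly 800k-1M PPD, which lines up with the 750k-1M gap reported earlier for project 18725.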
			 
			
					
				Re: PPD differences in x8 vs x16 4.0 lanes on RTX4070 and higher?
				Posted: Sat Dec 30, 2023 12:23 am
				by BobWilliams757
				I would imagine a deep dive would show that CPU clocks, architecture, memory speeds, drive speeds, etc. all contribute to overall folding performance, even with slower GPUs. The newer, really fast GPUs are simply fast enough to make the strengths and weaknesses more obvious, since the numbers are now big enough to notice. Anything that slightly slows any process involved would add up to some loss, and with faster GPUs, or two faster GPUs, it's going to become more and more apparent.
If there were a version of F@H Bench with all the newer cores and such, someone could probably test away and figure out at what points performance is impacted, for just about any variable. But since even on a machine that just folds there can be variation from work unit to work unit, it takes more digging to see the trends. Since you can't change variables "on the fly" like you can with, say, GPU settings, it would still take a lot more time and patience to find out which settings have an impact.
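Short of a benchmark tool, one way to dig out the trend is to log PPD per work unit and only compare machines on the same project. The following is a rough sketch under that assumption. The record layout, machine names, and function names are hypothetical, and the numbers are placeholders for illustration, not measurements from this thread.

```python
# Sketch: compare per-work-unit PPD between machines on the same project.
# All records below are placeholder values for illustration only; they are
# not measurements from this thread.
from collections import defaultdict
from statistics import mean


def mean_ppd_by_machine(records, project):
    """Average PPD per machine, using only work units from one project."""
    samples = defaultdict(list)
    for machine, proj, ppd in records:
        if proj == project:
            samples[machine].append(ppd)
    return {machine: mean(values) for machine, values in samples.items()}


def percent_below(reference, other):
    """How far `other` falls below `reference`, in percent."""
    return (reference - other) / reference * 100


if __name__ == "__main__":
    # (machine, project, PPD) tuples, hand-entered placeholders
    records = [
        ("box_a", 18725, 10_000_000),
        ("box_a", 18725, 10_200_000),
        ("box_b", 18725, 9_100_000),
        ("box_b", 18725, 9_300_000),
        ("box_b", 99999, 7_000_000),  # different project, ignored
    ]
    means = mean_ppd_by_machine(records, 18725)
    gap = percent_below(means["box_a"], means["box_b"])
    print(f"box_b averages {gap:.1f}% below box_a on project 18725")
```

Filtering by project first matters because PPD varies a lot between projects, so mixing them would bury a difference of a few percent.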