NVidia folding - *real* FahCore_1?.exe CPU utilization?

Napoleon · Post by **Napoleon** » Sun Sep 16, 2012 12:04 am

It looks like an established fact that folding with NVidia GPUs requires next to nothing from the CPU, and utilities like Task Manager may report zero CPU utilization percentage. Indeed, as I'm writing this, Task Manager reports 0% for both FahCore_11.exe & FahCore_15.exe (projects 5768 and 8018). However, the reality does not seem quite so blissful. Total CPU time spent in the GPU FahCores keeps increasing gradually. Furthermore, when I take a peek with Process Explorer, I steadily get something like this when measuring over 10s intervals:

Notice anything wrong? Like, FahCore_11.exe uses about 3 300 million CPU cycles and FahCore_15.exe uses about 2 000 million CPU cycles during a 10s interval, but the CPU utilization percentage is practically zero? My CPU is running at 1.71GHz, which means 1.71GHz * 1000 * 10s == 17 100 million CPU cycles in 10s, per logical CPU. I've got FahCore_11.exe affinity set to logical CPU 1 and FahCore_15.exe affinity set to logical CPU 3 in order to keep things simple. So FahCore_11.exe actually uses something like 3300 / 17100 * 100% == 19% of one logical CPU and FahCore_15.exe uses about 2000 / 17100 * 100% == 12%, all the time.
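The arithmetic above can be written as a minimal Python sketch. The function name `cycle_utilization` and its parameters are made up for illustration; the cycle counts, clock speed, and interval are the ones quoted in the post, and the counts themselves would still have to be read off Process Explorer by hand.

```python
def cycle_utilization(cycles_used_millions, clock_ghz, interval_s):
    """Return the fraction of one logical CPU consumed, given a cycle count.

    cycles_used_millions: CPU cycles the process used in the interval (millions)
    clock_ghz: CPU clock in GHz (assumed constant over the interval)
    interval_s: length of the measurement interval in seconds
    """
    # Cycles one logical CPU can deliver in the interval, in millions.
    available_millions = clock_ghz * 1000 * interval_s
    return cycles_used_millions / available_millions


# Figures from the post: 1.71GHz clock, 10s interval.
for name, used in (("FahCore_11.exe", 3300), ("FahCore_15.exe", 2000)):
    share = cycle_utilization(used, clock_ghz=1.71, interval_s=10)
    print(f"{name}: {share:.1%} of one logical CPU")
```

This treats the clock as fixed at 1.71GHz; with dynamic frequency scaling the denominator would change from one interval to the next.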

If I increase FahCore_1?.exe priorities to High and run something else on logical CPUs 1 & 3, I actually see a noticeable increase in plain old wall clock time to complete some relatively long running CPU task once I start my GPU folding. Unsurprisingly, the increase is very neatly explained with the actual (logical) CPU cycles / 10s it takes to fold with the GPUs. Yet even Process Explorer reports practically 0% CPU utilization percentage!
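As a rough model of that slowdown: if a higher-priority FahCore takes a fixed fraction of a logical CPU, a CPU-bound task pinned to the same logical CPU gets only what is left. The sketch below assumes exactly that and nothing more (no cache or HyperThreading effects); `expected_slowdown` is a hypothetical helper, not something from FAH or Process Explorer.

```python
def expected_slowdown(stolen_fraction):
    """Wall-clock stretch factor for a CPU-bound task sharing one logical CPU.

    Assumption: the higher-priority process always gets its cycles first,
    and the other task runs on whatever fraction remains.
    """
    if not 0 <= stolen_fraction < 1:
        raise ValueError("stolen_fraction must be in [0, 1)")
    return 1 / (1 - stolen_fraction)


# Cycle counts per 10s from the earlier measurement, out of 17 100 million.
for name, used in (("FahCore_11.exe", 3300), ("FahCore_15.exe", 2000)):
    factor = expected_slowdown(used / 17100)
    print(f"{name}: task on the same logical CPU takes about {factor:.2f}x as long")
```

Under this model a 19% share stretches the other task's run time by roughly a quarter, which is the kind of increase a stopwatch picks up easily even while the utilization column reads 0%.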

I'm running 306.23 WHQL drivers, FahCore_11.exe v1.31 and FahCore_15.exe v2.25, but I've noticed the issue with older drivers and older FahCore_15.exe versions as well. OK, it isn't exactly news that some people doing SMP+NVidia GPU with a HyperThreaded Intel CPU have gotten better PPD by allocating dedicated logical CPU(s) to GPU folding and tweaking the FahCore affinities carefully. In any case, I'm wondering whether mine is some sort of weird special case, as far as the erroneous CPU utilization percentage reporting goes?

Not that I'm having actual folding problems; the GPUs are producing just fine. However, I'm interested in seeing if I am actually able to do a little bit of myth busting regarding "practically zero percent CPU utilization" when folding with NVidia GPUs...

7im · Post by **7im** » Sun Sep 16, 2012 12:43 am

Nope. Nothing weird or special.

For example, GPU1 did the exact opposite. Showed 100% CPU utilization, but didn't really use all those cycles, just polling that much.

P5-133XL · Post by **P5-133XL** » Sun Sep 16, 2012 12:44 am

Nvidia GPUs don't use 0%; that really is a misconception. There is always overhead, and if you are using Task Manager to measure, then there is OS overhead associated with the video subsystem (driver overhead), general OS overhead, and task switching, none of which is ever reported by that particular tool, which is really designed to hide the OS. My guess is that what you are observing is some of that overhead that is not being reported directly, but that is just a guess. The actual amount of CPU usage will vary by processor, but on my 3.2GHz hyper-threaded P4 running just GPU folding with a GTX 460, Task Manager measures about 4% while Task Manager itself is at 11%. If I uniprocessor fold on the same machine while GPU folding, I'll see the GPU core go much higher (45+% of the hyper-threaded core), while the uniprocessor folding will still starve the GPU, dropping the total PPD significantly. I've seen my Nvidia GTX 460 GPU folding cores on a Q6600 go up to 8% each while SMP folding. It also matters what the core is doing: is it folding, setting itself up to fold, or shutting down and getting ready to send? Each of those processes uses a different amount of the CPU.

Really the big difference is the comparison between Nvidia and ATI (which requires a full core all by itself) not that Nvidia is always at 0%.

JimboPalmer · Post by **JimboPalmer** » Sun Sep 16, 2012 3:46 am

I suspect DMA is using clock cycles even though the CPU is not using them. That would match your results without any CPU time being lost.

http://en.wikipedia.org/wiki/Direct_memory_access

It specifically mentions that getting data on and off a graphics card will use DMA, and F@H certainly moves data on and off the Nvidia card.

codysluder · Post by **codysluder** » Sun Sep 16, 2012 4:09 am

So if I understand correctly, NVIDIA uses a DMA access method, which moves the data without posting enough CPU interrupts for Task Manager to pick them up, while AMD uses a CPU-based polling method to move data, which generates a high interrupt rate. Both will take some resources away from SMP, but AMD will have a large effect while NVIDIA will have a small one. In the early days of CD-ROMs, folks made a big deal about whether they used DMA or not.

Somewhere I read that to run AMD, leaving one hyperthreaded (virtual) core per GPU is sufficient, but on a non-hyperthreaded CPU you still need one full core per GPU. I don't suppose the resources stolen by NVIDIA follow the same rules, but whether they do or not isn't likely to be important. My computer has whatever CPU it has and I'm not going to switch just because I want to run GPU+SMP.

Rel25917 · Post by **Rel25917** » Sun Sep 16, 2012 11:36 am

I just did a quick test on my Core 2 Duo system here. Frame times on a project 8066 were running about 2:53 to 2:55 while sitting here browsing the web with no GPU client running. I read this thread and fired up my GPU client (GTX 560 Ti), and frame times are still only 2:55 after 20 minutes. Whatever the GPU client is using, it is not having any noticeable effect on my SMP client.

computer details if interested
Core 2 duo e8400 3ghz overclocked to 4ghz
4k ppd avg high of 5.5k on some projects.
gtx 560 ti slightly underclocked for heat reasons(one of the two fans on it died)
video driver version 280.26
WinXP sp3

artoar_11 · Post by **artoar_11** » Sun Sep 16, 2012 1:14 pm

In early summer I underclocked the GTX 460 by 50MHz. The next day I noticed that the SMP client's TPF had gone down. When I ran the numbers through the Bonus Calculator, I saw that the total PPD (GPU+CPU) was the same. I wondered: why would I need to OC the GPU?

Later I noticed that some GPU projects use more CPU cycles than others. Examples are GPU projects p8005-8010.

Post by **bruce** » Sun Sep 16, 2012 7:10 pm

artoar_11 wrote: Later I noticed that some GPU projects use more CPU cycles than others. Examples are GPU projects p8005-8010.

The amount of CPU probably depends on the size of the protein and the size of your GPU. It seems likely that with more data to move, more time is spent moving it. In the days when GPU2 was new, most GPU proteins were 300-900 atoms. As time has passed proteins increased to 1000 - 1500. Now we're seeing more in the 2000+ range. One thing for sure, the trend toward bigger proteins is going to continue, whether that actually means more measurable CPU time or not.

Napoleon · Post by **Napoleon** » Sun Sep 16, 2012 7:59 pm

Thanks for all the replies, most informative. For my part, I'm glad I finally figured out (with some proof) exactly why there is some previously unexplained CPU process performance variation in my case. Windows Task Manager gave me misleading information, plain and simple. Not a FAH problem, and of course I'll keep my GPUs folding anyway. Just one of those "nice to know" things, I suppose. At least it isn't some mystery malware stealing my precious CPU cycles.

Post by **bruce** » Sun Sep 16, 2012 8:13 pm

You'll find some more information on Microsoft.com. There were some WhitePapers written many years ago about what can and cannot be accurately measured by TaskMan (etc.) and the trade-offs of how much overhead it takes to track things that are "estimated." It's also closely tied to how much overhead the dispatcher is allowed to use when sorting priorities, etc.

Though not actually related, there's some interesting overlap with http://en.wikipedia.org/wiki/Uncertainty_principle

Napoleon · Post by **Napoleon** » Thu Sep 20, 2012 12:41 pm

Indeed, the uncertainty principle is universal, I just didn't expect the Task Manager readouts to be off the mark quite as much as they were in my case. Since I've been receiving relatively CPU heavy projects for my GPUs lately (10505 & 8010) and once again gotten into the habit of running two OGR crunchers (pure integer work) along with two classic slots and the two GPU slots, I considered it worthwhile to post some updated info about my case, especially after cutting back on my CPU OC.

First off, I get ~20Mnodes/s performance running just OGR. When I start the CPU slots, OGR drops to ~17Mnodes/s. I'm really surprised by how well HyperThreading manages to parallelize these two different workloads: (mostly) floating point work vs pure integer work. With careful arrangement of workload affinities, it's almost like getting the performance of an equivalent true quad. Not quite 100%, but I consider about 17 / 20 * 100% == 85% rather amazing.

When I throw GPU folding into the mix, OGR performance drops to about 5Mnodes/s. Just so you know, I've carefully arranged priorities and affinities so that GPU folding "steals" its CPU cycles from OGR only, while FAH CPU folding remains largely unaffected by the addition of GPU folding. Task Manager may paint a much prettier picture, but in my case the real-life impact of GPU folding on the OGR performance is huge: a drop from 17Mnodes/s to 5Mnodes/s!

EDIT: Looks like the "real" CPU utilization may vary A LOT between various NVidia GPU projects. Currently running P10502 & P7623 (as opposed to 10505 & 8010) and the OGR is crunching at 14Mnodes/s (as opposed to 5Mnodes/s).
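To put the OGR figures side by side, here is a small Python sketch that computes how much throughput survives each step. `retained` is a hypothetical helper; the Mnodes/s values are the approximate ones reported in this post, so the percentages are only as good as those readings.

```python
def retained(before_mnodes, after_mnodes):
    """Fraction of OGR throughput that remains after adding a workload."""
    return after_mnodes / before_mnodes


# Approximate Mnodes/s readings quoted in the post.
steps = (
    ("OGR alone -> OGR + CPU slots", 20, 17),
    ("+ GPU folding (P10505 & P8010)", 17, 5),
    ("+ GPU folding (P10502 & P7623)", 17, 14),
)

for label, before, after in steps:
    kept = retained(before, after)
    print(f"{label}: keeps {kept:.0%}, loses {1 - kept:.0%}")
```

Seen this way, the CPU-heavy GPU projects cost OGR roughly 70% of its throughput, while the lighter pair costs under 20%, all while Task Manager shows the GPU cores at about 0%.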

Of course, cramming the above-mentioned 6 active threads onto a 2C/4T CPU causes scheduling conflicts, cache contention, younameit. Admittedly, I'm running a somewhat strange mix of semi-dedicated 24/7 folding at the moment. Until now, I've thought of my Atom330 platform as a miniature version of its stronger siblings, so I couldn't help wondering whether my observations can be extrapolated to a setup which has a stronger CPU but stronger GPUs as well. I suppose the conclusion is that more powerful setups also deal with this kind of resource race much more gracefully, whatever the reason may be.

Post by **bruce** » Thu Sep 20, 2012 6:25 pm

We can speculate about "younameit" but without a better monitoring tool than Taskmgr, it would be just that ... speculation. (The fact that Taskmgr [sometimes] gives optimistic results that happen to favor HyperThreading isn't going to give them a good reason to improve it.) Still, you're certainly getting a lot out of your 330.

P5-133XL · Post by **P5-133XL** » Thu Sep 20, 2012 8:37 pm

If you want better accuracy try using performance monitor (built into Windows too). You can isolate and measure individual characteristics and thereby determine what factors are important and what are not.

PinHead · Post by **PinHead** » Thu Sep 20, 2012 10:37 pm

I didn't go through all the research that Napoleon did (thanks for the info btw), but when I put a GTX570 into a slower Q6700 PC, I noticed that it was being bottlenecked. CPU frame times were dropping. The GPU was new, so I had no PPD reference.

When I changed the SMP slot to 99%, both the SMP and GPU PPD increased. I think 99.5 or 99.8 would have done the same thing, but I don't think I have that option.

7im · Post by **7im** » Thu Sep 20, 2012 11:22 pm

Try this same old v6 client trick. Change both back to 100% usage.

Keep SMP slot at default priority of idle. Change GPU slot priority to "low"... slot option = "core-priority" with setting = "low"

Restart clients. What happens to PPD?
