NVidia folding - *real* FahCore_1?.exe CPU utilization?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 887
- Joined: Wed May 26, 2010 2:31 pm
- Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard - Location: Finland
NVidia folding - *real* FahCore_1?.exe CPU utilization?
It looks like an established fact that folding with NVidia GPUs requires next to nothing from the CPU, and utilities like Task Manager may report zero CPU utilization percentage. Indeed, as I'm writing this, Task Manager reports 0% for both FahCore_11.exe & FahCore_15.exe (projects 5768 and 8018). However, the reality does not seem quite so blissful. Total CPU time spent in the GPU FahCores keeps increasing gradually. Furthermore, when I take a peek with Process Explorer, I steadily get something like this when measuring over 10s intervals:
Notice anything wrong? Like, FahCore_11.exe uses about 3 300 million CPU cycles and FahCore_15.exe uses about 2 000 million CPU cycles during a 10s interval, but the CPU utilization percentage is practically zero? My CPU is running at 1.71GHz, which means 1.71GHz * 1000 * 10s == 17 100 million CPU cycles in 10s, per logical CPU. I've got FahCore_11.exe affinity set to logical CPU 1 and FahCore_15.exe affinity set to logical CPU 3 in order to keep things simple. So FahCore_11.exe actually uses something like 3300 / 17100 * 100% == 19% of one logical CPU and FahCore_15.exe uses about 2000 / 17100 * 100% == 12%, all the time.
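The arithmetic above can be written out as a short Python sketch. The function name `cpu_share` and its parameters are made up for illustration, and it assumes the cycle counter advances at the nominal 1.71GHz clock rate, which may not hold exactly on every CPU.

```python
def cpu_share(cycles_used_millions, clock_ghz, interval_s):
    """Fraction of one logical CPU consumed, given a cycle-count delta.

    Hypothetical helper: assumes the counter ticks at the nominal clock rate.
    """
    # GHz * 1000 gives millions of cycles per second for one logical CPU.
    available_millions = clock_ghz * 1000 * interval_s
    return cycles_used_millions / available_millions


# Cycle deltas (in millions) read off Process Explorer over a 10s interval.
for name, cycles in [("FahCore_11.exe", 3300), ("FahCore_15.exe", 2000)]:
    share = cpu_share(cycles, clock_ghz=1.71, interval_s=10)
    print(f"{name}: {share:.1%} of one logical CPU")
```

With the figures quoted in the post this gives roughly 19% and 12%, matching the hand calculation.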
If I increase FahCore_1?.exe priorities to High and run something else on logical CPUs 1 & 3, I actually see a noticeable increase in plain old wall clock time to complete some relatively long running CPU task once I start my GPU folding. Unsurprisingly, the increase is very neatly explained with the actual (logical) CPU cycles / 10s it takes to fold with the GPUs. Yet even Process Explorer reports practically 0% CPU utilization percentage!
I'm running 306.23 WHQL drivers, FahCore_11.exe v1.31 and FahCore_15.exe v2.25, but I've actually noticed the issue with older drivers and older FahCore_15.exe versions as well. OK, it isn't exactly news that some people doing SMP+NVidia GPU with a HyperThreaded Intel CPU have actually gotten better PPD by allocating dedicated logical CPU(s) to GPU folding and tweaking the FahCore affinities carefully. In any case, I'm wondering if mine is some sort of weird special case, as far as the erroneous CPU utilization percentage reporting goes?
Not that I'm having actual folding problems; the GPUs are producing just fine. However, I'm interested in seeing if I am actually able to do a little bit of myth busting regarding "practically zero percent CPU utilization" when folding with NVidia GPUs...
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
Nope. Nothing weird or special.
For example, GPU1 did the exact opposite. Showed 100% CPU utilization, but didn't really use all those cycles, just polling that much.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 2948
- Joined: Sun Dec 02, 2007 4:36 am
- Hardware configuration: Machine #1:
Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).
Machine #2:
Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.
Machine 3:
Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32
I am currently folding just on the 5x GTX 460's for aprox. 70K PPD - Location: Salem. OR USA
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
Nvidia GPUs don't use 0% CPU; that really is a misconception. There is always overhead. If you are using Task Manager to measure, there is OS overhead associated with the video subsystem (driver overhead), general OS overhead, and task switching, none of which is ever reported by that particular tool, which is really designed to hide the OS. My guess is that what you are observing is some of that overhead that is not being reported directly, but that is just a guess. The actual amount of CPU usage will vary by processor, but on my 3.2GHz hyper-threaded P4 running just GPU folding with a GTX 460, Task Manager measures about 4% while Task Manager itself is at 11%. If I uniprocessor fold on the same machine while GPU folding, I'll see the GPU core go much higher (45+% of the hyper-threaded core), while the uniprocessor folding will still starve the GPU, dropping the total PPD significantly. I've seen my Nvidia GTX 460 GPU folding cores on a Q6600 go up to 8% each while SMP folding. It also matters what the core is doing: is it folding, setting itself up to fold, or shutting down and getting ready to send? Each of those phases uses a different amount of CPU.
Really, the big difference is in the comparison between Nvidia and ATI (which requires a full core all by itself), not that Nvidia is always at 0%.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
I suspect DMA is using clock cycles even though the CPU is not using them. That would match your results without any CPU time being lost.
http://en.wikipedia.org/wiki/Direct_memory_access
It specifically mentions that getting data on and off a graphics card will use DMA, and F@H will certainly move data on and off the Nvidia card.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 1024
- Joined: Sun Dec 02, 2007 12:43 pm
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
So if I understand correctly, NVIDIA uses a DMA access method which moves the data without actually posting enough CPU interrupts for Task Manager to pick them up, while AMD uses a CPU-based polling method to move the data, which generates a high interrupt rate. Both will take some resources away from SMP, but AMD will cause a large effect while NVIDIA will cause a small effect. In the early days of CD-ROMs folks made a big deal about whether they used DMA or not.
Somewhere I read that to run AMD, leaving one hyperthreaded (virtual) core per GPU is sufficient, but in a non-hyperthreaded CPU you still need one full core per GPU. I don't suppose the resources stolen by NVIDIA follow the same rules, but whether they do or not isn't likely to be important. My computer has whatever CPU it has and I'm not going to switch just because I want to run GPU+SMP.
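As a rough analogy for the polling-versus-DMA contrast described above, here is a minimal Python sketch comparing the CPU time burned by a busy-poll loop with that of a blocking wait. It is not how either vendor's driver works; the function names and the 0.2s duration are assumptions chosen for illustration.

```python
import threading
import time


def cpu_seconds_while(wait_fn):
    """CPU time this process consumes while wait_fn() runs."""
    start = time.process_time()
    wait_fn()
    return time.process_time() - start


def busy_poll(duration_s=0.2):
    # Spin on a clock check until the deadline, the way a polling loop
    # keeps asking "is the device done yet?".
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        pass


def blocking_wait(duration_s=0.2):
    # Sleep until signalled, roughly like waiting for a completion
    # notification after a transfer the CPU did not have to babysit.
    done = threading.Event()
    timer = threading.Timer(duration_s, done.set)
    timer.start()
    done.wait()
    timer.join()


polled = cpu_seconds_while(busy_poll)
blocked = cpu_seconds_while(blocking_wait)
print(f"busy poll: {polled:.3f}s CPU, blocking wait: {blocked:.3f}s CPU")
```

Both waits take about the same wall clock time, but only the polling loop should show up as meaningful CPU time.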
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
I just did a quick test on my core 2 duo system here. Frame times on a project 8066 were running about 2:53 to 2:55 while sitting here browsing the web with no gpu client running. Read this thread and fired up my gpu client (gtx560 ti) and frame times are still only 2:55 after 20 minutes. Whatever the gpu client is using, it is not having any noticeable effect on my smp client.
computer details if interested
Core 2 duo e8400 3ghz overclocked to 4ghz
4k ppd avg high of 5.5k on some projects.
gtx 560 ti slightly underclocked for heat reasons(one of the two fans on it died)
video driver version 280.26
WinXP sp3
-
- Posts: 652
- Joined: Sun Nov 22, 2009 8:42 pm
- Hardware configuration: AMD R7 3700X @ 4.0 GHz; ASUS ROG STRIX X470-F GAMING; DDR4 2x8GB @ 3.0 GHz; GByte RTX 3060 Ti @ 1890 MHz; Fortron-550W 80+ bronze; Win10 Pro/64
- Location: Bulgaria/Team #224497/artoar11_ALL_....
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
In early summer I underclocked the GTX 460 by 50MHz. The next day I noticed that the SMP client's TPF had gone down. When I calculated with the Bonus Calculator, I saw that the total PPD (GPU+CPU) was the same. I wondered: why do I need to OC the GPU?
Later I noticed that some GPU projects use more CPU cycles than others. Examples are GPU projects p8005-8010.
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
artoar_11 wrote: Later I noticed that some GPU projects use more CPU cycles, than others. Examples are GPU projects - p8005-8010.
The amount of CPU probably depends on the size of the protein and the size of your GPU. It seems likely that with more data to move, more time is spent moving it. In the days when GPU2 was new, most GPU proteins were 300-900 atoms. As time has passed proteins increased to 1000 - 1500. Now we're seeing more in the 2000+ range. One thing for sure, the trend toward bigger proteins is going to continue, whether that actually means more measurable CPU time or not.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 887
- Joined: Wed May 26, 2010 2:31 pm
- Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard - Location: Finland
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
Thanks for all the replies, most informative. For my part, I'm glad I finally figured out (with some proof) exactly why there is some previously unexplained CPU process performance variation in my case. Windows Task Manager gave me misleading information, plain and simple. Not a FAH problem, and of course I'll keep my GPUs folding anyway. Just one of those "nice to know" things, I suppose. At least it isn't some mystery malware stealing my precious CPU cycles.
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
You'll find some more information on Microsoft.com. There were some WhitePapers written many years ago about what can and cannot be accurately measured by TaskMan (etc.) and the trade-offs of how much overhead it takes to track things that are "estimated." It's also closely tied to how much overhead the dispatcher is allowed to use when sorting priorities, etc.
Though not actually related, there's some interesting overlap with http://en.wikipedia.org/wiki/Uncertainty_principle
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 887
- Joined: Wed May 26, 2010 2:31 pm
- Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard - Location: Finland
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
Indeed, the uncertainty principle is universal, I just didn't expect the Task Manager readouts to be off the mark quite as much as they were in my case. Since I've been receiving relatively CPU heavy projects for my GPUs lately (10505 & 8010) and once again gotten into the habit of running two OGR crunchers (pure integer work) along with two classic slots and the two GPU slots, I considered it worthwhile to post some updated info about my case, especially after cutting back on my CPU OC.
First off, I get ~20Mnodes/s performance running just OGR. When I start the CPU slots, OGR drops to ~17Mnodes/s. I'm really surprised by how well HyperThreading manages to parallelize these two different workloads - (mostly) floating point work vs pure integer work. With careful arrangement of workload affinities, it's almost like getting the performance of an equivalent true quad. Not quite 100%, but I consider about 17 / 20 * 100% == 85% rather amazing.
When I throw GPU folding into the mix, OGR performance drops to ~5Mnodes/s. Just so you know, I've carefully arranged priorities and affinities so that GPU folding "steals" its CPU cycles from OGR only, while FAH CPU folding remains largely unaffected by the addition of GPU folding. Task Manager may paint a much prettier picture, but in my case the real life impact of GPU folding on the OGR performance is huge: a drop from 17Mnodes/s to 5Mnodes/s!
EDIT: Looks like the "real" CPU utilization may vary A LOT between various NVidia GPU projects. Currently running P10502 & P7623 (as opposed to 10505 & 8010) and the OGR is crunching at 14Mnodes/s (as opposed to 5Mnodes/s).
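To put the OGR figures side by side, here is a small Python sketch that expresses each measured rate as a fraction of a baseline. The variable names and the helper `retained` are illustrative only; the rates are the approximate Mnodes/s values quoted above.

```python
def retained(rate, baseline):
    """Fraction of the baseline throughput that is still achieved."""
    return rate / baseline


# Approximate OGR rates in Mnodes/s, as quoted in this post.
ogr_alone = 20.0           # OGR only
ogr_with_cpu_slots = 17.0  # OGR + classic CPU slots
ogr_gpu_heavy = 5.0        # plus GPU projects 10505 & 8010
ogr_gpu_light = 14.0       # plus GPU projects 10502 & 7623

print(f"CPU slots added:    {retained(ogr_with_cpu_slots, ogr_alone):.0%} of OGR-only rate")
print(f"heavy GPU projects: {retained(ogr_gpu_heavy, ogr_with_cpu_slots):.0%} of pre-GPU rate")
print(f"light GPU projects: {retained(ogr_gpu_light, ogr_with_cpu_slots):.0%} of pre-GPU rate")
```

On these numbers the CPU-heavy GPU projects leave OGR with roughly 29% of its pre-GPU rate, while the lighter ones leave roughly 82%.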
Of course, cramming the above-mentioned 6 active threads onto a 2C/4T CPU causes scheduling conflicts, cache contention, younameit. Admittedly, I'm running a somewhat strange mix of semi-dedicated 24/7 folding at the moment. Until now, I've thought of my Atom330 platform as a miniature version of its stronger siblings, so I couldn't help wondering if my observations can be extrapolated to a setup which has a stronger CPU but stronger GPUs as well. I suppose the conclusion is that more powerful setups also deal with this kind of resource race much more gracefully, whatever the reason may be.
Last edited by Napoleon on Wed Oct 24, 2012 1:51 pm, edited 2 times in total.
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
We can speculate about "younameit" but without a better monitoring tool than Taskmgr, it would be just that ... speculation. (The fact that Taskmgr [sometimes] gives optimistic results that happen to favor HyperThreading isn't going to give them a good reason to improve it.) Still, you're certainly getting a lot out of your 330.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 2948
- Joined: Sun Dec 02, 2007 4:36 am
- Hardware configuration: Machine #1:
Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).
Machine #2:
Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.
Machine 3:
Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32
I am currently folding just on the 5x GTX 460's for aprox. 70K PPD - Location: Salem. OR USA
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
If you want better accuracy, try using Performance Monitor (also built into Windows). You can isolate and measure individual characteristics and thereby determine which factors are important and which are not.
-
- Posts: 285
- Joined: Tue Jan 24, 2012 3:43 am
- Hardware configuration: Quad Q9550 2.83 contains the GPU 57xx - running SMP and GPU
Quad Q6700 2.66 running just SMP
2P 32core Interlagos SMP on linux
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
I didn't go thru all the research that Napoleon did ( thanks for the info btw ), but when I put a GTX570 into a slower Q6700 PC; I noticed that it was getting a bottleneck. CPU frame times were dropping. The GPU was new, so I had no PPD reference.
When I changed the SMP slot to 99%; both the SMP and GPU PPD increased. I think 99.5 or 99.8 would have done the same thing, but I don't think I have that option.
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?
Try this same old v6 client trick. Change both back to 100% usage.
Keep SMP slot at default priority of idle. Change GPU slot priority to "low"... slot option = "core-priority" with setting = "low"
Restart clients. What happens to PPD?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.