7611 very slow
Posted: Sun Jun 19, 2011 8:55 pm
by BonaSwirl
I downloaded a project 7611 work unit yesterday and it is only utilising about 77-80% of my CPU. Normal SMP units use about 90%, because one core is dedicated to GPU folding. I know there have been some issues with these units, but the one I had was folding incredibly slowly. I thought it might just have been my SMP settings, so I waited to see if everything would revert to normal on receiving another work unit. However, I received another project 7611 work unit and it is doing the same thing. As an indication of the performance drop: before, I was getting about 16-18k PPD from my CPU, whereas the 7611 project took 13 hours to complete a 788-point work unit. With the bonus, this equated to about 4100 points, which is not even 7600 PPD, less than 50% of what I usually receive from other 'normal' SMP projects running on the a3 core.
Any ideas as to why these are running so slowly?
Thanks
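The PPD arithmetic in the post above can be reproduced with a quick Python sketch. It uses only the figures quoted (about 4100 points with bonus over 13 hours, against a usual 16-18k PPD); the `ppd` helper is just an illustrative name:

```python
def ppd(points: float, hours: float) -> float:
    """Points per day for a WU that earned `points` over `hours` of folding."""
    return points / hours * 24.0


p7611 = ppd(4100, 13)        # ~4100 points (with bonus) over 13 hours
share_low = p7611 / 18000    # compared with the top of the usual 16-18k PPD
share_high = p7611 / 16000   # compared with the bottom of that range
print(f"P7611: {p7611:.0f} PPD, {share_low:.0%}-{share_high:.0%} of the usual rate")
# prints: P7611: 7569 PPD, 42%-47% of the usual rate
```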
Re: 7611 very slow
Posted: Mon Jun 20, 2011 6:38 am
by tjlane
Hi BonaSwirl,
There have been some issues with P7611 in the past, but I've done my best to resolve them; most had to do with client compatibility. Since that doesn't seem to be your problem, can you post your hardware/run settings? Specifically, how many cores are you running on?
Thanks!
TJ
Re: 7611 very slow
Posted: Mon Jun 20, 2011 7:06 am
by BonaSwirl
Hi,
Yes, I had heard there had been some issues. I don't know how best to post this, but I'll do my best.
Hardware:
i7 920 @ 3.6GHz
6GB 1600MHz RAM
GTX275
Software:
v7 client
I use SMP 7, although I worry this may be related to the issue, as the v7 client doesn't let you choose SMP 7 from the drop-down, so I entered it by hand. SMP 7 yielded much faster TPFs on a3 core units, but I worry that the v7 client may not fully support it when it is set manually. In system resources, 7 cores show as loaded, but they're not as consistently loaded as when folding one of the older units. Instead of 7 cores showing a steady line at near 100%, only 3 or so cores now show a steady line and the other 4 show utilisation varying between 100% and about 80%.
Hopefully this is all the information you need!
Thanks
Re: 7611 very slow
Posted: Mon Jun 20, 2011 7:09 am
by BonaSwirl
As I posted this, Resource Monitor seems to have settled down and is now showing 6 cores at full load, with just the 6th core varying between 100% and about 90% utilisation, and the 7th core at low usage because it's also running a GPU slot.
Thanks
Re: 7611 very slow
Posted: Mon Jun 20, 2011 7:37 am
by GreyWhiskers
What's the trade-off for you between what I think you've now described (essentially -smp 6, leaving one whole core, i.e. two threads, to support the GPU and other stuff) and running -smp 8? The amount of CPU that Core 11 uses (I guess that's the core that runs on non-Fermi Nvidia cards) will impact the SMP folding, but the question for you to examine is whether that impact is more or less than giving up a whole core (2 threads), as you do with essentially SMP 6.
Re: 7611 very slow
Posted: Mon Jun 20, 2011 3:00 pm
by BonaSwirl
You see, at the moment I have it set to SMP 7. It's what I used with the v6 clients, and it ran quicker than both SMP 6 and SMP 8 by quite a margin. In v7 there is no option to select SMP 7, but if you type your own value into the SMP field, it is possible to run a unit as SMP 7. I'm assuming the a4 core doesn't scale well on 7 threads, though, as the settings normally only offer even numbers. I shall wait until I get another unit that isn't project 76xx and see if it reverts to how it was. Although, I've just picked up another 7611 unit. I'll set it to SMP 8 for this one and we'll see how we go!
Re: 7611 very slow
Posted: Mon Jun 20, 2011 5:13 pm
by tjlane
BonaSwirl,
Because of the way that our simulations get parallelized, -smp 7 is a particularly bad number to choose (large prime numbers = bad). Especially for smaller SMP WUs like the 76** series, I think this could really kill performance. Even numbers are best; 6 or 8 should work better. If you time WUs at -smp 6, 7, and 8, let me know how things go; this could help inform us as to the most efficient way to distribute WUs!
Thanks for reporting this stuff!
TJ
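Why a prime thread count hurts can be sketched with a deliberately simplified model of domain decomposition: split a cubic simulation box into an nx x ny x nz grid of equal domains, one per thread, and compare the shapes available. This is an illustration only, not the actual GROMACS/FahCore logic (which also weighs the box shape, the cut-off length and separate PME work); the function names are hypothetical:

```python
def grids(n: int) -> list[tuple[int, int, int]]:
    """All ways to split a box into an nx*ny*nz grid of n domains (nx >= ny >= nz)."""
    out = []
    for nx in range(1, n + 1):
        for ny in range(1, nx + 1):
            if n % (nx * ny) == 0:
                nz = n // (nx * ny)
                if nz <= ny:
                    out.append((nx, ny, nz))
    return out


def best_grid(n: int) -> tuple[int, int, int]:
    """Most cube-like grid: minimise the largest split so the thinnest domain stays thick."""
    return min(grids(n), key=max)


def halo_area_per_rank(grid: tuple[int, int, int]) -> float:
    """Boundary area each domain exchanges with neighbours, for a unit cubic box.

    A domain is (1/nx) x (1/ny) x (1/nz). Along an axis split k > 1 ways it has
    two faces to exchange; an axis with k == 1 needs no exchange.
    """
    nx, ny, nz = grid
    faces = (1 / (ny * nz), 1 / (nx * nz), 1 / (nx * ny))  # face areas normal to x, y, z
    return sum(2 * area for area, k in zip(faces, grid) if k > 1)


for n in (6, 7, 8):
    g = best_grid(n)
    print(f"-smp {n}: grid {g}, thinnest domain = 1/{max(g)} of the box, "
          f"halo area per rank = {halo_area_per_rank(g):.2f}")
```

Under these assumptions 7 threads can only form a 7 x 1 x 1 stack of thin slabs, and each slab exchanges more boundary area (2.00 box faces) than a domain in the 6-thread (1.67) or 8-thread (1.50) grids. That is consistent with TJ's remark that small WUs suffer most: a small box cut into thin slabs leaves little work per domain relative to its boundary.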
Re: 7611 very slow
Posted: Mon Jun 20, 2011 6:01 pm
by BonaSwirl
Okay, I will let you know, although this could obviously be a long process: I only fold for around 16 hours a day, and with each 7611 work unit taking about 13 hours it'll be a few days. I will, however, keep you informed. I've currently got a project 7610 unit, though, and it is running a lot faster: 5 min 40 s per frame against 7-8 min per frame for the 7611 units. This is still using -smp 7. I'll update as soon as I can!
Just a quick question to clarify: I can't change the SMP number partway through folding, can I? The settings say this will result in the WU being dumped. I wouldn't usually ignore a warning like that; it's just that I believe I have changed the SMP number before, then inadvertently restarted my computer (hence engaging the new SMP flag), and the WU continued as usual.
Re: 7611 very slow
Posted: Mon Jun 20, 2011 6:24 pm
by bruce
Look in the log soon after the WU is (re-)started for a message like "Mapping NT from x to y". Some FahCores recognize the problem and correct it automatically, but I'm not sure which cores have this feature or which numbers they decide to correct.
If you're running an ATI GPU and SMP on the same machine, you'll probably not want to use smp:8. I'd choose between GPU + smp:6 or no GPU and just smp:8. I don't have enough information to recommend a specific choice between the two.
Ticket #292 describes the issue of changing the number of CPUs in the middle of a run. This has been discussed in many venues but the ticket is still open. You'll want to use Finish so that the current WU is completed (unless it is so slow that it will miss the deadline anyway) before changing the number of CPUs. Then you can Fold and a new WU will be downloaded that fits the new smp:N setting.
One question: If you're currently running smp:7 on an i7, FahCore_a4 should be using 87% of the CPU. If you're also running a GPU, there will be another FahCore running which will be using between 0 and 12% of the CPU plus some other tasks that might be getting non-zero amounts of processing. Is that what you see on TaskMon?
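For reference, the percentages above follow from dividing busy threads by the 8 logical CPUs an i7 920 exposes. A minimal sketch, assuming each SMP thread keeps one logical CPU fully busy (an idealisation) and that the GPU core's share is given as a fraction of one logical CPU; `expected_load` is a hypothetical helper name:

```python
LOGICAL_CPUS = 8  # i7 920: 4 physical cores x 2 HyperThreads


def expected_load(smp_threads: int, gpu_core_share: float = 0.0) -> float:
    """Fraction of total CPU that Task Manager would show, assuming each SMP
    thread fully occupies one logical CPU and the GPU core uses
    `gpu_core_share` (0.0-1.0) of one more logical CPU."""
    return (smp_threads + gpu_core_share) / LOGICAL_CPUS


print(f"smp:7 alone      -> {expected_load(7):.1%}")              # 87.5%
print(f"smp:7 + GPU core -> up to {expected_load(7, 1.0):.1%}")   # 100.0%
print(f"one logical CPU  -> {1 / LOGICAL_CPUS:.1%}")              # 12.5%
```

A steady reading of 77-80% with smp:7 would therefore mean either that the FahCore threads were not all kept busy or that the monitoring tool was reading low.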
Re: 7611 very slow
Posted: Mon Jun 20, 2011 6:36 pm
by BonaSwirl
I have an Nvidia GPU, and I know they aren't too demanding in terms of CPU usage when folding. SMP 7 seemed to be perfect on the old client, but I suppose I'll have to decide between 6 and 8. I'll test over the next couple of days and find the optimum setting.
The one question I have to ask, though, is: if these units are taking so long on my CPU, why are they not worth more points or given a higher K factor? This is quite a large drop in PPD. (Obviously it's the work within the WUs that matters, but as the points system is the best way to monitor my system's performance, the points drop has to be questioned.)
Thanks
Re: 7611 very slow
Posted: Mon Jun 20, 2011 9:52 pm
by bruce
Perhaps that information can be deduced if you answer the question I asked.
bruce wrote: One question: If you're currently running smp:7 on an i7, FahCore_a4 should be using 87% of the CPU. If you're also running a GPU, there will be another FahCore running which will be using between 0 and 12% of the CPU plus some other tasks that might be getting non-zero amounts of processing. Is that what you see on TaskMon?
The points are established based on the time the project takes on Stanford's benchmark machine. Your question boils down to (A) What's the difference between the benchmark machine and your machine or (B) Did Stanford make a mistake in benchmarking. Either answer is possible, but (B) is rather rare.
Re: 7611 very slow
Posted: Mon Jun 20, 2011 10:02 pm
by tjlane
Bruce is correct that all of our points values are determined based on benchmarking on a single machine. Performance varies based on hardware & simulation protocol, so we do expect some differences between projects/users/computers etc. This is regrettable, but coming up with a "fairer" system is pretty difficult. What we can try to do is ensure that clients receive work that is best for their hardware - this is good from a points & science standpoint, and is why I'm curious about your results.
Re: 7611 very slow
Posted: Mon Jun 20, 2011 10:32 pm
by BonaSwirl
I'm really sorry, Bruce. I seem to have completely missed the last paragraph of your post. I don't mean to be ignorant; it just seems to come naturally.
Regarding your question, though: it depends completely on which program I use to measure the CPU usage. I use Real Temp to record temps, and it has a handy load percentage, which is my primary port of call. I'm currently folding a project 7610 WU and Real Temp shows 85-88%. However, Windows Resource Monitor seems to read 1-2% higher; it's currently showing 87-88%, which falls in line with the a4 core utilising 7 cores. FahCore_11 is using practically nothing, so Resource Monitor is showing an overall CPU load of 87-88% with the odd spike.
This all seems fine, but I know that when I had the project 7611 WU things weren't the same. I will wait until this WU is finished and see what I get next. I seem to be becoming a dab hand at receiving these 76xx units, so if I receive another 7611 WU I'll fill you in on what's happening!
I had read over how the points are calculated but it just seemed strange as to why they were so much lower - sounds like a great excuse for an upgrade or the start of a folding farm
Re: 7611 very slow
Posted: Tue Jun 21, 2011 8:14 pm
by BonaSwirl
So I've had the chance to get a few rough figures for different SMP numbers. These are all TPFs for project 7611 WUs:
SMP 6 - 7mins 20secs
SMP 7 - 7mins 40secs
SMP 8 - 8mins 30secs
These were all recorded with a GPU folding slot running at the same time, which obviously increases the TPF quite drastically. I also realised where my original issue of only being at 80% load was coming from: an old version of Real Temp was providing incorrect readings, and with it updated and now showing correct load percentages, everything was as expected.
One problem I did notice, though, is that these projects had the same TPF running on SMP 7 with a GPU folding simultaneously as they did on SMP 8 as the only slot folding. I don't know what this could be but I would expect something to fold much faster if it can use the whole 8 threads of the processor to itself.
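To turn these TPFs into time per WU and relative speed, a short sketch like the one below works. It assumes 100 frames per WU (each frame being 1% of progress, as the F@H clients report it); `hours_per_wu` is a hypothetical helper:

```python
FRAMES_PER_WU = 100  # assumption: one frame = 1% of the WU

# Project 7611 TPFs from the post, all measured with a GPU slot folding too
tpf_seconds = {6: 7 * 60 + 20, 7: 7 * 60 + 40, 8: 8 * 60 + 30}


def hours_per_wu(tpf_s: float, frames: int = FRAMES_PER_WU) -> float:
    """Wall-clock hours to finish a WU at a constant time per frame."""
    return tpf_s * frames / 3600


for smp, tpf in sorted(tpf_seconds.items()):
    speedup = tpf_seconds[8] / tpf  # >1 means faster than SMP 8
    print(f"SMP {smp}: {hours_per_wu(tpf):.1f} h per WU, {speedup:.2f}x SMP 8's speed")
```

At SMP 7 this gives about 12.8 hours per WU, consistent with the roughly 13 hours reported at the start of the thread, and SMP 6 finishes each frame about 20 seconds (roughly 4%) sooner than SMP 7.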
Re: 7611 very slow
Posted: Tue Jun 21, 2011 9:12 pm
by bruce
BonaSwirl wrote: I don't know what this could be but I would expect something to fold much faster if it can use the whole 8 threads of the processor to itself.
There are two issues here.
First, you don't have 8 whole threads; you have 8 half threads. The threads you're talking about are on HyperThreaded (virtual) CPUs rather than real CPUs. When you use 7 threads, at any instant three of your four real CPUs will be splitting their resources between two threads each. The other real CPU will be processing only one thread, so that thread will necessarily run faster than the other six. If you happen to start a GPU client in the free thread, that real CPU will have to deal with two threads again, but the GPU code is very different, so that SMP thread will still run faster than the other 6. How Windows decides to allocate those 7 SMP threads is hard to predict, so things will tend to balance out in strange ways, but the facts don't change. The difference between -smp 7 and -smp 8 on a hyperthreaded CPU like the i7 isn't going to be as logical as you think it should be.
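The first point can be illustrated with a toy model. It is a sketch under loudly hypothetical assumptions: threads are pinned as evenly as possible across the 4 physical cores (real Windows scheduling moves them around), a thread sharing a core with another gets 60% of that core's throughput (`HT_SHARE` is a made-up figure, not a measured i7 value), and the SMP threads advance in lockstep, so every step waits for the slowest thread:

```python
PHYSICAL_CORES = 4  # i7 920
HT_SHARE = 0.6      # hypothetical: speed of each of two threads sharing one core


def thread_speeds(n_threads: int, cores: int = PHYSICAL_CORES,
                  ht_share: float = HT_SHARE) -> list[float]:
    """Speed of each thread (1.0 = a whole physical core), with threads spread
    as evenly as possible and at most two per core (HyperThreading)."""
    if not 0 < n_threads <= 2 * cores:
        raise ValueError("model only covers 1..2*cores threads")
    speeds: list[float] = []
    for core in range(cores):
        on_core = n_threads // cores + (1 if core < n_threads % cores else 0)
        speed = 1.0 if on_core == 1 else ht_share
        speeds.extend([speed] * on_core)
    return speeds


def lockstep_throughput(n_threads: int) -> float:
    """Useful work per unit time when each step waits for the slowest thread."""
    speeds = thread_speeds(n_threads)
    return len(speeds) * min(speeds)


for n in (6, 7, 8):
    s = thread_speeds(n)
    used = lockstep_throughput(n)
    print(f"-smp {n}: slowest thread {min(s):.1f}, fastest {max(s):.1f}, "
          f"useful {used:.1f} cores' worth, idle {sum(s) - used:.1f}")
```

In this model the lone thread at -smp 7 finishes each step early and then sits idle, so its extra speed never becomes extra progress. Real runs are messier (the scheduler migrates threads, and a GPU core may share that free thread), so treat the numbers as a way to reason about the effect, not as predictions.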
Second, it depends on how Gromacs assigns work to 7 tasks compared to how it assigns work to non-prime values such as 6 or 8. (See above.)