Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 5:29 pm
by DrSpalding
I think I have another P7611 anomaly. It is currently running at a TPF of about 65 minutes on an i7-920 running at a slight overclock. The P7611 TPF has averaged about 10 minutes on a nearly identical machine running a stock speed i7-920, and I would expect that the average for the machine in question is about 8 minutes. The averages are now skewed by the slow WU. Other 2-way and 4-way machines I have running are all < 17 minutes TPF.
I am planning on updating the client and control software to v7.3.6 today, so I may just throw this one out anyway, but if someone could check and mark it as possibly bad, that would probably be a good idea.
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 5:41 pm
by bruce
How many steps does it have? IIRC, the bad WUs that were "very slow" had more than the expected number of steps.
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 6:03 pm
by DrSpalding
According to the log:
16:32:26:WU00:FS00:0xa4:Project: 7611 (Run 4, Clone 19, Gen 332)
16:32:26:WU00:FS00:0xa4:
16:32:26:WU00:FS00:0xa4:Entering M.D.
16:32:32:WU00:FS00:0xa4:Using Gromacs checkpoints
16:32:32:WU00:FS00:0xa4:Mapping NT from 8 to 8
16:32:32:WU00:FS00:0xa4:Resuming from checkpoint
16:32:32:WU00:FS00:0xa4:Verified 00/wudata_01.log
16:32:33:WU00:FS00:0xa4:Verified 00/wudata_01.trr
16:32:33:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
16:32:33:WU00:FS00:0xa4:Verified 00/wudata_01.edr
16:32:34:WU00:FS00:0xa4:Completed 523770 out of 2000000 steps (26%)
16:45:21:Server connection id=4 on 0.0.0.0:36330 from 192.168.1.20
17:26:34:WU00:FS00:0xa4:Completed 540000 out of 2000000 steps (27%)
So, it is 2,000,000 steps.
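For anyone who wants to check a TPF straight from the log, here is a minimal Python sketch that uses the two "Completed" lines above. It assumes a frame is 1% of the WU (20,000 steps here) and that both timestamps fall within about a day of each other; the function names are my own, not part of any F@H tool.

```python
import re
from datetime import datetime, timedelta

# The two progress lines from the log excerpt above.
LOG = """\
16:32:34:WU00:FS00:0xa4:Completed 523770 out of 2000000 steps (26%)
17:26:34:WU00:FS00:0xa4:Completed 540000 out of 2000000 steps (27%)
"""

# Timestamp, steps completed, total steps.
PATTERN = re.compile(r"^(\d\d:\d\d:\d\d):.*Completed (\d+) out of (\d+) steps")


def parse_progress(log_text):
    """Return (timestamp, steps_done, total_steps) for each progress line."""
    points = []
    for line in log_text.splitlines():
        m = PATTERN.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S")
            points.append((stamp, int(m.group(2)), int(m.group(3))))
    return points


def estimate_tpf_minutes(points):
    """Estimate time per frame from the first and last progress points."""
    (t0, s0, total), (t1, s1, _) = points[0], points[-1]
    elapsed = t1 - t0
    if elapsed < timedelta(0):
        # Log timestamps carry no date, so assume a single midnight rollover.
        elapsed += timedelta(days=1)
    steps_per_frame = total / 100  # assumption: one frame = 1% of the WU
    seconds_per_step = elapsed.total_seconds() / (s1 - s0)
    return seconds_per_step * steps_per_frame / 60


tpf = estimate_tpf_minutes(parse_progress(LOG))
print(f"Estimated TPF: {tpf:.1f} minutes")  # prints "Estimated TPF: 66.5 minutes"
```

That works out to 16,230 steps in 54 minutes, which is in line with the roughly 65 minute TPF reported in the first post.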
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 6:26 pm
by Joe_H
bruce wrote: How many steps does it have? IIRC, the bad WUs that were "very slow" had more than the expected number of steps.
Bruce may be recalling a problem in a project other than 7611; most of the problem 7611 WUs had the correct number of steps. I just checked an old log on my system, and 2000000 steps is normal for 7611. As I recall, Dr. Kasson posted that the problem WUs appeared to have calculation errors that were not quite enough to trigger an EUE with the current version of the A4 core, but that the WUs ran much slower after encountering the error.
One person has successfully turned in this WU; it took them nearly 3 days. However, without information on their system I cannot tell whether that is "normal". If you have ruled out other processes taking CPU time away from F@H, then you can safely drop this WU. I cannot tell at this time whether the WU is bad.
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 6:31 pm
by P5-133XL
Whenever you look at an abnormal TPF for an SMP WU, one of the things to check is the task manager (process tab), to see whether some other activity is using a significant amount of even one CPU core. The folding cores are designed to suspend themselves, so if any other process needs the CPU, folding is suspended in favor of that process. The goal is not to interfere with your use of the machine.
Unfortunately, SMP runs several threads, one for each CPU core, and they are highly synchronized: if one thread is suspended, the others just sit in a loop waiting for it to come back. The net result can be a dramatic slowdown in folding from what looks like insignificant external CPU usage.
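To put rough numbers on that effect, here is a toy model in Python. It is only a sketch under stated assumptions: every thread does equal work per step, all threads wait for the slowest one before moving on, and each thread's speed is proportional to the share of a core it actually gets. The function names are hypothetical, and this is not how the FahCore itself is implemented.

```python
def relative_throughput(cpu_shares):
    """Folding speed relative to an idle machine.

    cpu_shares holds, per thread, the fraction of a core that thread gets
    (1.0 = a whole core). Because every step waits for the slowest thread,
    the overall speed is set by the smallest share.
    """
    if not cpu_shares or min(cpu_shares) <= 0:
        raise ValueError("each thread needs a positive CPU share")
    return min(cpu_shares)


def external_load_fraction(cpu_shares):
    """Fraction of the whole machine taken by non-folding processes."""
    return (len(cpu_shares) - sum(cpu_shares)) / len(cpu_shares)


# Eight threads, with one of them losing half of its core to another process.
shares = [1.0] * 7 + [0.5]
print(f"External load on the machine: {external_load_fraction(shares):.1%}")
print(f"Folding speed vs. idle:       {relative_throughput(shares):.0%}")
```

In this model, an outside process that takes about 6% of the whole machine cuts folding speed roughly in half, which is why a small-looking CPU hog can matter so much.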
So please check the task manager process tab for any processes other than FahCore_??.exe that are using a significant amount of CPU time. By significant I mean a single hyper-threaded CPU core's worth. Occasionally even FAHControl will suddenly start using a full core, and that will kill SMP productivity just like any other process that uses a full CPU core.
If there is such a process, see what you can do to stop it, such as restarting FAHControl if that is the problem. If you can't stop or control the process, the next best thing is to allocate fewer cores to folding, because it is much better to under-subscribe the number of cores than to oversubscribe, especially with hyper-threaded cores. This can be done from Advanced Control: choose Configure, go to the slot tab, and double-click on the CPU slot. From there you can explicitly specify the number of CPU cores/threads (-1 means configure automatically).
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 6:58 pm
by DrSpalding
I am aware of everything that runs on the machine and well-versed in detecting process time hogs, but thanks for your input nonetheless. I'm sure it will aid someone.
There are no time hogs running besides FAHCore_A4 and it is running at its usual 99% CPU level.
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 7:24 pm
by DrSpalding
I have just terminated the slow 7611 WU and removed its work/slot directory. I archived the whole directory first, though, in case someone needs to see the raw data at 28% completion. The machine picked up a P6347 WU and is happily crunching it with a TPF of 79s.
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sat Sep 07, 2013 10:58 pm
by bruce
The 2,000,000 steps is, in fact, correct. This Project has had a variety of reports of being slow, but at one point, the points were adjusted to take that into account:
viewtopic.php?p=221046.
I'm not aware of a dependable way to know what TPF to expect, other than asking those who are currently processing a P7611 assignment.
Re: Project: 7611 (4, 19, 332) very slow
Posted: Sun Sep 08, 2013 1:20 am
by Joe_H
There is a fairly long topic on the subject of Project 7611 WUs with long TPFs farther down in this sub-forum: viewtopic.php?f=19&t=22591. I misremembered who posted from the PG; it was tjlane, not Dr. Kasson. It has been a few months since the last report, so possibly they have identified most of the problem WUs. The affected WUs were less than a tenth of a percent of the total being processed.