Project 7611 (0, 23, 259) - High TPF
Moderators: Site Moderators, FAHC Science Team
Project: 7611 (Run 3, Clone 26, Gen 107) Very slow
I run this on an I7 2600K which has run as expected up until now.
This WU has a TPF of about 39 min. It is at 11% now and ETA is 2.35 days. PPD for these days are 680.
At first I thought that some of the cores were occupied by other tasks, but I think I have ruled that out now.
I don't have trouble running long lasting WUs, I don't have trouble running low scoreing WUs either, but I don't like low scoreing WUs that take several days to complete.
Is it really necessary to have such a big variance in scoring?
This WU has a TPF of about 39 min. It is at 11% now and ETA is 2.35 days. PPD for these days are 680.
At first I thought that some of the cores were occupied by other tasks, but I think I have ruled that out now.
I don't have trouble running long lasting WUs, I don't have trouble running low scoreing WUs either, but I don't like low scoreing WUs that take several days to complete.
Is it really necessary to have such a big variance in scoring?
-
- Posts: 53
- Joined: Wed Aug 26, 2009 12:10 am
Re: 7611 (3, 26, 107) Very slow
7611 DOES run slower and the variation in PPD has been reported on several occasions. The last ones I documented got about 75% of the PPD I expected. Other users have reported even slower (or lower PPD) results. PPD varies greatly depending upon the WU and systems it is being run on but once you get comfortable with a system, the slower jobs stand out.
You can find all of the other 7611 comments using the search feature in the top right corner of the page.
You can find all of the other 7611 comments using the search feature in the top right corner of the page.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: 7611 (3, 26, 107) Very slow
Yes, I have seen 7611 WU's be very variable, but 36 TPF is high. It is pointing to something else taking resources even though you think that is ruled out. Best I can suggest is pausing the unit and restarting your system to make sure nothing else is running. I have also seen these WU's being very sensitive to the slightest amount of thread synchronization issues, so it might be better running at SMP 6 instead of 8.
Edit: Thought came to me later, any chance HT has been disabled on your CPU? OS and system was not mentioned, HT can be disabled with some and would also result in very long frame times when 8 threads try to go on a 4-thread CPU.
Edit: Thought came to me later, any chance HT has been disabled on your CPU? OS and system was not mentioned, HT can be disabled with some and would also result in very long frame times when 8 threads try to go on a 4-thread CPU.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: 7611 (3, 26, 107) Very slow
Thanks Joe_H,
this is Win7 and the v7 client. I do run 2 nvidia 570 clients together with the smp. As I said it has run ok priorly. I checked the log and the wu just prior did run as expected. I stopped the gpu clients with not much difference to the smp client. Reducing from smp 8 to smp 6 only increased the TPF. However, when I rebooted, I lost the smp wu and a new one was started. I don't know why, I closed it properly with the quit command. the new wu runs more as expected again.
It could have been another process not showing in task manager taking cpu power. I'll keep an eye on the system to see if it happens again.
this is Win7 and the v7 client. I do run 2 nvidia 570 clients together with the smp. As I said it has run ok priorly. I checked the log and the wu just prior did run as expected. I stopped the gpu clients with not much difference to the smp client. Reducing from smp 8 to smp 6 only increased the TPF. However, when I rebooted, I lost the smp wu and a new one was started. I don't know why, I closed it properly with the quit command. the new wu runs more as expected again.
It could have been another process not showing in task manager taking cpu power. I'll keep an eye on the system to see if it happens again.
-
- Posts: 144
- Joined: Sun Jun 21, 2009 10:42 pm
- Contact:
Re: 7611 (3, 26, 107) Very slow
Seen a couple of these now that have really long frame times for no apparent reason. I dunno, it's normal to get one thats abnormal on occasion with this project?
Re: 7611 (3, 26, 107) Very slow
Start with the list of projects http://fah-web.stanford.edu/psummary.html
Divide the list into categories: Projects available to the Uniprocessor, Projects available to SMP, Projects available to GPUs.Projects available to bigadv.
Within each category, the BASELINE points should be approximately proportional to the time the WU runs. (That's true on the benchmark machine but it may not be true when there are significant hardware differences.)
Bonus points are NOT included in that statement. The fundamental problem that I see is that everybody assumes that they'll always get a similar bonus, which is simply not true.
The other problem that I see is that many uniprocessor projects can also be assigned to SMP clients. That probably makes a difference too, but I've never researched it.
Divide the list into categories: Projects available to the Uniprocessor, Projects available to SMP, Projects available to GPUs.Projects available to bigadv.
Within each category, the BASELINE points should be approximately proportional to the time the WU runs. (That's true on the benchmark machine but it may not be true when there are significant hardware differences.)
Bonus points are NOT included in that statement. The fundamental problem that I see is that everybody assumes that they'll always get a similar bonus, which is simply not true.
The other problem that I see is that many uniprocessor projects can also be assigned to SMP clients. That probably makes a difference too, but I've never researched it.
Posting FAH's log:
How to provide enough info to get helpful support.
How to provide enough info to get helpful support.
-
- Posts: 144
- Joined: Sun Jun 21, 2009 10:42 pm
- Contact:
Re: 7611 (3, 26, 107) Very slow
True, the bonus is different for every unit, but his time per frame is way out of whack, even for p7611. He's got frame times i normally see on a c2d chip for those units. He should be blowing that chip out of the water, setting it on fire, and reducing it to a set of unrelated molecules. There are some real clunkers in there for some reason that just take foooooooooooooreverrrrrrrr to run. I'd be asking questions too, its certainly far enough out of whack to be asking if something is wrong.
-
- Site Admin
- Posts: 7937
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2 - Location: W. MA
Re: 7611 (3, 26, 107) Very slow
Exactly, something may have been wrong, though unless the unit gets re-assigned to the same machine it may be impossible to find out what. But as an example, on my 2.8 GHz i7-860 the normal TPF for a 7611 varies between just under 8 minutes to around 9 minutes. At the worst I have seen a few 7611's show TPF's of 12-15 minutes, and as best as I can tell nothing else was running as the CPU for the folding core was reported at around 790+%. At 39 minutes on a faster i7, that specific WU is well outside a reasonable expectation based on others in the same project.
Re: 7611 (3, 26, 107) Very slow
Please post the first few pages from the top of your log (or from the System Info tab).
Also, a screenshot of Task manager with and without SMP running.
Are you trying to run both FAH and BOINC?
Also, a screenshot of Task manager with and without SMP running.
Are you trying to run both FAH and BOINC?
Posting FAH's log:
How to provide enough info to get helpful support.
How to provide enough info to get helpful support.
Re: 7611 (3, 26, 107) Very slow
bruce wrote:Please post the first few pages from the top of your log (or from the System Info tab).
Code: Select all
09:04:13:WU01:FS02:0xa4:Completed 9800000 out of 10000000 steps (98%)
09:06:19:WU01:FS02:0xa4:Completed 9900000 out of 10000000 steps (99%)
09:06:20:WU03:FS02:Connecting to assign3.stanford.edu:8080
09:06:21:WU03:FS02:News: Welcome to Folding@Home
09:06:21:WU03:FS02:Assigned to work server 171.64.65.104
09:06:21:WU03:FS02:Requesting new work unit for slot 02: RUNNING smp:8 from 171.64.65.104
09:06:21:WU03:FS02:Connecting to 171.64.65.104:8080
09:06:22:WU03:FS02:Downloading 29.66KiB
09:06:22:WU03:FS02:Download complete
09:06:22:WU03:FS02:Received Unit: id:03 state:DOWNLOAD error:OK project:7611 run:3 clone:26 gen:107 core:0xa4 unit:0x000000a0664f2dd04df0f5b6477c19dd
09:08:26:WU01:FS02:0xa4:Completed 10000000 out of 10000000 steps (100%)
09:08:26:WU01:FS02:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
09:08:36:WU01:FS02:0xa4:
09:08:36:WU01:FS02:0xa4:Finished Work Unit:
09:08:36:WU01:FS02:0xa4:- Reading up to 2131908 from "01/wudata_01.trr": Read 2131908
09:08:36:WU01:FS02:0xa4:trr file hash check passed.
09:08:36:WU01:FS02:0xa4:- Reading up to 220724 from "01/wudata_01.xtc": Read 220724
09:08:36:WU01:FS02:0xa4:xtc file hash check passed.
09:08:36:WU01:FS02:0xa4:edr file hash check passed.
09:08:36:WU01:FS02:0xa4:logfile size: 79221
09:08:36:WU01:FS02:0xa4:Leaving Run
09:08:39:WU01:FS02:0xa4:- Writing 2456325 bytes of core data to disk...
09:08:39:WU01:FS02:0xa4:Done: 2455813 -> 1716806 (compressed to 69.9 percent)
09:08:39:WU01:FS02:0xa4: ... Done.
09:08:39:WU01:FS02:0xa4:- Shutting down core
09:08:39:WU01:FS02:0xa4:
09:08:39:WU01:FS02:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
09:08:39:WU01:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
09:08:39:WU01:FS02:Sending unit results: id:01 state:SEND error:OK project:7006 run:1 clone:123 gen:54 core:0xa4 unit:0x000000930001329c4dfb8f06f915c638
09:08:39:WU01:FS02:Uploading 1.64MiB to 129.74.85.15
09:08:39:WU03:FS02:Starting
09:08:39:WU01:FS02:Connecting to 129.74.85.15:8080
09:08:39:WU03:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 03 -suffix 01 -version 701 -checkpoint 15 -np 8
09:08:39:WU03:FS02:Started FahCore on PID 5720
09:08:39:WU03:FS02:Core PID:6964
09:08:39:WU03:FS02:FahCore 0xa4 started
09:08:40:WU03:FS02:0xa4:
09:08:40:WU03:FS02:0xa4:*------------------------------*
09:08:40:WU03:FS02:0xa4:Folding@Home Gromacs GB Core
09:08:40:WU03:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
09:08:40:WU03:FS02:0xa4:
09:08:40:WU03:FS02:0xa4:Preparing to commence simulation
09:08:40:WU03:FS02:0xa4:- Looking at optimizations...
09:08:40:WU03:FS02:0xa4:- Created dyn
09:08:40:WU03:FS02:0xa4:- Files status OK
09:08:40:WU03:FS02:0xa4:- Expanded 29855 -> 644556 (decompressed 2158.9 percent)
09:08:40:WU03:FS02:0xa4:Called DecompressByteArray: compressed_data_size=29855 data_size=644556, decompressed_data_size=644556 diff=0
09:08:40:WU03:FS02:0xa4:- Digital signature verified
09:08:40:WU03:FS02:0xa4:
09:08:40:WU03:FS02:0xa4:Project: 7611 (Run 3, Clone 26, Gen 107)
09:08:40:WU03:FS02:0xa4:
09:08:40:WU03:FS02:0xa4:Assembly optimizations on if available.
09:08:40:WU03:FS02:0xa4:Entering M.D.
09:08:45:WU01:FS02:Upload 22.90%
09:08:46:WU03:FS02:0xa4:Mapping NT from 8 to 8
09:08:46:WU03:FS02:0xa4:Completed 0 out of 2000000 steps (0%)
09:08:51:WU01:FS02:Upload 57.24%
09:08:57:WU01:FS02:Upload 95.40%
09:08:59:WU01:FS02:Upload complete
09:08:59:WU01:FS02:Server responded WORK_ACK (400)
09:08:59:WU01:FS02:Final credit estimate, 4518.00 points
09:08:59:WU01:FS02:Cleaning up
09:47:44:WU03:FS02:0xa4:Completed 20000 out of 2000000 steps (1%)
10:27:00:WU03:FS02:0xa4:Completed 40000 out of 2000000 steps (2%)
The task errored out with "mdrun returned 255" when I restarted the machine.bruce wrote:Also, a screenshot of Task manager with and without SMP running.
No, this machine do only folding.bruce wrote:Are you trying to run both FAH and BOINC?
As I posted earlier, I can't rule out that the machine had a problem since I no longer have the task.
-
- Posts: 6
- Joined: Tue Feb 26, 2008 2:06 pm
- Hardware configuration: -1-
EVGA GeForce GTX 970 SC
-2-
EVGA GeForce GTX 980 SC
-3-
EVGA GeForce GTX 980 SC - Location: Shalimar, Florida
Re: 7611 (3, 26, 107) Very slow
I'm having the same 39 min TPF issue with WU 7610 (Run 501, Clone 0, Gen 54). Received unit around 2100 UTC and it's completed less than 5 frames since.
-
- Posts: 336
- Joined: Fri Jun 26, 2009 4:34 am
Re: Project: 7611 (Run 3, Clone 26, Gen 107) Very slow
Add me to the list of people with slow 7611 SMP work units. I had a couple last week and didn't report them. I've got another. Here is a log extract from my computer on the 14th processing a 7611. Normal TPF for this machine is about 18 minutes.
Now, here is a filtered log from the 16th with one of the 36 1/2 minute TFP 7611 work units on the same machine. It was restarted with no change in TFP. Task manager shows a4 core with a consistent 99% CPU load. PPD for this quad core machine has been pushed down to about 700. This is the first time I've seen such a huge range in TFP within one project. They do seem to share the same Run/Clone series.
Code: Select all
19:09:33:WU01:FS01:Starting
19:09:33:WU01:FS01:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe" -dir 01 -suffix 01 -version 701 -lifeline 3256 -checkpoint 15 -np 4
19:09:33:WU01:FS01:Started FahCore on PID 1728
19:09:33:Started thread 14 on PID 3256
19:09:33:WU01:FS01:Core PID:3016
19:09:33:WU01:FS01:FahCore 0xa4 started
19:09:34:WU01:FS01:0xa4:
19:09:34:WU01:FS01:0xa4:*------------------------------*
19:09:34:WU01:FS01:0xa4:Folding@Home Gromacs GB Core
19:09:34:WU01:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
19:09:34:WU01:FS01:0xa4:
19:09:34:WU01:FS01:0xa4:Preparing to commence simulation
19:09:34:WU01:FS01:0xa4:- Looking at optimizations...
19:09:34:WU01:FS01:0xa4:- Created dyn
19:09:34:WU01:FS01:0xa4:- Files status OK
19:09:34:WU01:FS01:0xa4:- Expanded 339028 -> 644556 (decompressed 190.1 percent)
19:09:34:WU01:FS01:0xa4:Called DecompressByteArray: compressed_data_size=339028 data_size=644556, decompressed_data_size=644556 diff=0
19:09:34:WU01:FS01:0xa4:- Digital signature verified
19:09:34:WU01:FS01:0xa4:
19:09:34:WU01:FS01:0xa4:Project: 7611 (Run 0, Clone 9, Gen 138)
19:09:34:WU01:FS01:0xa4:
19:09:34:WU01:FS01:0xa4:Assembly optimizations on if available.
19:09:34:WU01:FS01:0xa4:Entering M.D.
19:09:36:WU00:FS01:Upload complete
19:09:36:WU00:FS01:Server responded WORK_ACK (400)
19:09:36:WU00:FS01:Final credit estimate, 3462.00 points
19:09:36:WU00:FS01:Cleaning up
19:09:40:WU01:FS01:0xa4:Mapping NT from 4 to 4
19:09:40:WU01:FS01:0xa4:Completed 0 out of 2000000 steps (0%)
******************************** Date: 14/03/12 ********************************
19:27:43:WU01:FS01:0xa4:Completed 20000 out of 2000000 steps (1%)
19:45:55:WU01:FS01:0xa4:Completed 40000 out of 2000000 steps (2%)
20:03:54:WU01:FS01:0xa4:Completed 60000 out of 2000000 steps (3%)
20:22:05:WU01:FS01:0xa4:Completed 80000 out of 2000000 steps (4%)
20:40:04:WU01:FS01:0xa4:Completed 100000 out of 2000000 steps (5%)
20:57:30:WU01:FS01:0xa4:Completed 120000 out of 2000000 steps (6%)
21:14:35:WU01:FS01:0xa4:Completed 140000 out of 2000000 steps (7%)
21:31:37:WU01:FS01:0xa4:Completed 160000 out of 2000000 steps (8%)
Code: Select all
*********************** Log Started 2012-03-16T21:09:54Z ***********************
21:10:14:FS01:Unpaused
21:10:14:WU02:FS01:Starting
21:10:14:WU02:FS01:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" "C:/Documents and Settings/All Users/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe" -dir 02 -suffix 01 -version 701 -lifeline 3060 -checkpoint 15 -np 4
21:10:15:WU02:FS01:Started FahCore on PID 3384
21:10:15:WU02:FS01:Core PID:3404
21:10:15:WU02:FS01:FahCore 0xa4 started
21:10:15:WU02:FS01:0xa4:
21:10:15:WU02:FS01:0xa4:*------------------------------*
21:10:15:WU02:FS01:0xa4:Folding@Home Gromacs GB Core
21:10:15:WU02:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
21:10:15:WU02:FS01:0xa4:
21:10:15:WU02:FS01:0xa4:Preparing to commence simulation
21:10:15:WU02:FS01:0xa4:- Looking at optimizations...
21:10:15:WU02:FS01:0xa4:- Files status OK
21:10:15:WU02:FS01:0xa4:- Expanded 29855 -> 644556 (decompressed 2158.9 percent)
21:10:15:WU02:FS01:0xa4:Called DecompressByteArray: compressed_data_size=29855 data_size=644556, decompressed_data_size=644556 diff=0
21:10:15:WU02:FS01:0xa4:- Digital signature verified
21:10:15:WU02:FS01:0xa4:
21:10:15:WU02:FS01:0xa4:Project: 7611 (Run 3, Clone 26, Gen 110)
21:10:15:WU02:FS01:0xa4:
21:10:15:WU02:FS01:0xa4:Assembly optimizations on if available.
21:10:15:WU02:FS01:0xa4:Entering M.D.
21:10:21:WU02:FS01:0xa4:Using Gromacs checkpoints
21:10:21:WU02:FS01:0xa4:Mapping NT from 4 to 4
21:10:22:WU02:FS01:0xa4:Resuming from checkpoint
21:10:22:WU02:FS01:0xa4:Verified 02/wudata_01.log
21:10:22:WU02:FS01:0xa4:Verified 02/wudata_01.trr
21:10:22:WU02:FS01:0xa4:Verified 02/wudata_01.xtc
21:10:22:WU02:FS01:0xa4:Verified 02/wudata_01.edr
21:10:22:WU02:FS01:0xa4:Completed 394730 out of 2000000 steps (19%)
21:19:58:WU02:FS01:0xa4:Completed 400000 out of 2000000 steps (20%)
21:56:26:WU02:FS01:0xa4:Completed 420000 out of 2000000 steps (21%)
22:32:53:WU02:FS01:0xa4:Completed 440000 out of 2000000 steps (22%)
23:09:20:WU02:FS01:0xa4:Completed 460000 out of 2000000 steps (23%)
23:45:47:WU02:FS01:0xa4:Completed 480000 out of 2000000 steps (24%)
00:22:18:WU02:FS01:0xa4:Completed 500000 out of 2000000 steps (25%)
00:58:48:WU02:FS01:0xa4:Completed 520000 out of 2000000 steps (26%)
01:35:15:WU02:FS01:0xa4:Completed 540000 out of 2000000 steps (27%)
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
- Contact:
Re: Project: 7611 (Run 3, Clone 26, Gen 107) Very slow
I have informed the Project owner about this so let's see what happens.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: Project: 7611 (Run 3, Clone 26, Gen 107) Very slow
Hi All,
First, thanks for reporting issues with a specific WU. Based on this information, I was able to track down a problem with a very small subset (< 1%) of p7611 WUs, including (3, 26, 107). I believe I have been able to address most of the immediate issues by simply ending these bad WUs. Please continue to report any issues, especially wild deviations from normal WU behavior.
Thanks!
TJ
First, thanks for reporting issues with a specific WU. Based on this information, I was able to track down a problem with a very small subset (< 1%) of p7611 WUs, including (3, 26, 107). I believe I have been able to address most of the immediate issues by simply ending these bad WUs. Please continue to report any issues, especially wild deviations from normal WU behavior.
Thanks!
TJ
Re: Project: 7611 (Run 3, Clone 26, Gen 107) Very slow
Can you modify the FahCore so that these <1% of WUs issue an error? Tracking down a problem which doesn't produce an error message is a lot harder to do that one which the software can identify as defective. I'm concerned that what happened with this project will happen again on the next project, even if you have been successful in stopping all of WUs that were bad for this project.
Posting FAH's log:
How to provide enough info to get helpful support.
How to provide enough info to get helpful support.