
Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Thu Aug 19, 2010 4:21 pm
by KneeDeep
(This is my first trouble report: apologies if I've failed miserably!)

System: Win 7 x64, AMD 1055t@3.36GHz, 4GB RAM@800MHz (R5750 & R4350 folding as well); SMP running as a service, GPUs as consoles.

SLOOOooowwwwwww..... far slower than anything I've noticed in my previous 1600 WUs. Nothing is truly hanging, but 1-2 hrs/% isn't right for a 921-credit WU. This has dropped my 6-core 1055t from 1200 PPD to 80 PPD.

63%-85% of CPU cycles are going to the A3 task, judging from the Task Mgr display (most of the rest are going to the two GPU folders). Rebooting, and stop-starting the Service, had no effect on the run times.

The deadline is in 1d14hr, but the remaining 83% of the task will take 9d at the current rate.

What action should I take at this point?



 -- Log since last reboot --
--- Opening Log file [August 18 18:39:50 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.30

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding@home\SMP
Service: C:\Folding@home\SMP\Folding@home-Win32-x86.exe
Arguments: -svcstart -d C:\Folding@home\SMP -smp -verbosity 9 

Launched as a service.
Entered C:\Folding@home\SMP to do work.

[18:39:50] - Ask before connecting: No
[18:39:50] - User name: MudHole (Team 0)
[18:39:50] - User ID: 2979EE7E723C1EF7
[18:39:50] - Machine ID: 13
[18:39:50] 
[18:39:50] Loaded queue successfully.
[18:39:50] 
[18:39:50] + Processing work unit
[18:39:50] Core required: FahCore_a3.exe
[18:39:50] Core found.
[18:39:50] - Autosending finished units... [August 18 18:39:50 UTC]
[18:39:50] Trying to send all finished work units
[18:39:50] + No unsent completed units remaining.
[18:39:50] - Autosend completed
[18:39:50] Working on queue slot 08 [August 18 18:39:50 UTC]
[18:39:50] + Working ...
[18:39:50] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 08 -np 6 -checkpoint 15 -service -verbose -lifeline 4620 -version 630

[18:39:51] 
[18:39:51] *------------------------------*
[18:39:51] Folding@Home Gromacs SMP Core
[18:39:51] Version 2.22 (Mar 12, 2010)
[18:39:51] 
[18:39:51] Preparing to commence simulation
[18:39:51] - Looking at optimizations...
[18:39:51] - Files status OK
[18:39:51] - Expanded 764006 -> 1404481 (decompressed 183.8 percent)
[18:39:51] Called DecompressByteArray: compressed_data_size=764006 data_size=1404481, decompressed_data_size=1404481 diff=0
[18:39:51] - Digital signature verified
[18:39:51] 
[18:39:51] Project: 6701 (Run 24, Clone 15, Gen 34)
[18:39:51] 
[18:39:51] Assembly optimizations on if available.
[18:39:51] Entering M.D.
[18:39:57] Using Gromacs checkpoints
[18:40:57] Resuming from checkpoint
[18:40:57] Verified work/wudata_08.log
[18:40:57] Verified work/wudata_08.trr
[18:40:57] Verified work/wudata_08.xtc
[18:40:57] Verified work/wudata_08.edr
[18:40:58] Completed 157700 out of 2000000 steps  (7%)
[18:57:39] Completed 160000 out of 2000000 steps  (8%)
[20:30:59] Completed 180000 out of 2000000 steps  (9%)
[22:41:56] Completed 200000 out of 2000000 steps  (10%)
[00:39:50] - Autosending finished units... [August 19 00:39:50 UTC]
[00:39:50] Trying to send all finished work units
[00:39:50] + No unsent completed units remaining.
[00:39:50] - Autosend completed
[00:49:12] Completed 220000 out of 2000000 steps  (11%)
[01:39:49] Completed 240000 out of 2000000 steps  (12%)
[03:42:04] Completed 260000 out of 2000000 steps  (13%)
[06:39:50] - Autosending finished units... [August 19 06:39:50 UTC]
[06:39:50] Trying to send all finished work units
[06:39:50] + No unsent completed units remaining.
[06:39:50] - Autosend completed
[07:10:06] Completed 280000 out of 2000000 steps  (14%)
[10:03:25] Completed 300000 out of 2000000 steps  (15%)
[12:39:50] - Autosending finished units... [August 19 12:39:50 UTC]
[12:39:50] Trying to send all finished work units
[12:39:50] + No unsent completed units remaining.
[12:39:50] - Autosend completed
[13:49:21] Completed 320000 out of 2000000 steps  (16%)
[15:29:06] Completed 340000 out of 2000000 steps  (17%)
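
For anyone who wants to check the rate against the log, here is a minimal Python sketch that reads the "Completed N out of M steps" lines and extrapolates the time remaining. The function names and the midnight-rollover handling are assumptions of this sketch, not part of the client. It also assumes consecutive progress lines are less than 24 hours apart, since the log timestamps carry no date.

```python
import re
from datetime import timedelta

# Matches client log lines such as:
# [18:40:58] Completed 157700 out of 2000000 steps  (7%)
PROGRESS_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)

def parse_progress(log_text):
    """Return a list of (elapsed_seconds, steps_done, total_steps)."""
    points = []
    day_offset = 0
    previous = None
    for match in PROGRESS_RE.finditer(log_text):
        hh, mm, ss, done, total = (int(g) for g in match.groups())
        seconds = hh * 3600 + mm * 60 + ss
        # A timestamp earlier than the previous one means the clock
        # rolled past midnight (assumption: gaps are under 24 hours).
        if previous is not None and seconds < previous:
            day_offset += 86400
        previous = seconds
        points.append((seconds + day_offset, done, total))
    return points

def hours_per_percent(points):
    """Average wall-clock hours per 1% between first and last point."""
    (t0, d0, total), (t1, d1, _) = points[0], points[-1]
    percent_done = 100.0 * (d1 - d0) / total
    return (t1 - t0) / 3600.0 / percent_done

def estimate_remaining(points):
    """Extrapolate the time needed for the steps still outstanding."""
    (t0, d0, total), (t1, d1, _) = points[0], points[-1]
    steps_per_second = (d1 - d0) / (t1 - t0)
    return timedelta(seconds=(total - d1) / steps_per_second)

# First and last progress lines from the log above.
SAMPLE_LOG = """\
[18:40:58] Completed 157700 out of 2000000 steps  (7%)
[15:29:06] Completed 340000 out of 2000000 steps  (17%)
"""

if __name__ == "__main__":
    pts = parse_progress(SAMPLE_LOG)
    print(f"{hours_per_percent(pts):.2f} h per percent")
    days = estimate_remaining(pts).total_seconds() / 86400
    print(f"{days:.1f} days remaining")
```

On those two lines it works out to roughly 2.3 hours per percent and a little under 8 days to go, which is in the same range as the 9-day guess and well past the 1d14hr deadline.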

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Thu Aug 19, 2010 8:38 pm
by Arnette
Try disabling your SMP service and see what % cpu your GPU's are using while folding.

I've found the ATI cards can use a ton of CPU while folding - though not enough to yield those low results...

Are you sure all of the cores are being utilized for folding? It might be worth trying -smp 5 in your client instead of -smp
This will leave you 1 free core for your GPU clients and/or windows to use.

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Thu Aug 19, 2010 9:12 pm
by John_Weatherman
"4GB RAM@800MHz" - I assume that's wrong?

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Thu Aug 19, 2010 10:29 pm
by KneeDeep
Arnette wrote: Try disabling your SMP service and see what % cpu your GPU's are using while folding.

I've found the ATI cards can use a ton of CPU while folding - though not enough to yield those low results...

Are you sure all of the cores are being utilized for folding? It might be worth trying -smp 5 in your client instead of -smp
This will leave you 1 free core for your GPU clients and/or windows to use.
* - Disabling SMP showed each GPU folder consuming about 17% -- as before the shutdown.
* - Note: I set the GPUs to nocpulock so they wouldn't each contend with a single locked CPU-folder, but would distribute their CPU competition over all CPU-folders.
*** Edit: I just noticed that the R5750 has nocpulock set while the R4350 doesn't! Oh well...
* - The CPU monitor has always shown complete CPU saturation of all 6 cores whenever SMP's been running
* - A (second) system reboot and a switch to using "-smp 5" is running: I'll post the results in a couple of hours -- or less if it makes a significant improvement. But... 921 points suggests a normal WU time of ~16 hours; that's 10 min/%, and in 30 minutes since the restart with "-smp 5" there hasn't been a % increase, so I continue to think there's something hosed with this WU. And I expect it ain't gonna finish it by the deadline. And I remain open to suggestions.
John_Weatherman wrote: "4GB RAM@800MHz" - I assume that's wrong?
? - Why assume anything? That's the DRAM Frequency CPU-Z shows, although I suppose the memory is doubling that to 1600 -- I'm tired of these variations on the ways of stating memory speeds: DDR3-1600 vs PC3-12800 vs DRAM freq = 800. It's DDR3 on a 240MHz bus with an FSB:DRAM ratio of 3:10. Have I sinned? Probably!
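
The three ways of stating the speed are the same number seen at different points, and the arithmetic can be sketched in a few lines of Python. The function name and dictionary keys are made up for illustration, and the factor of 8 assumes a standard 64-bit (8-byte) memory module.

```python
def ddr3_names(bus_mhz, fsb_parts, dram_parts):
    """Derive the common DDR3 labels from a base clock and FSB:DRAM ratio."""
    # FSB:DRAM of 3:10 on a 240 MHz bus gives the DRAM clock CPU-Z reports.
    dram_clock_mhz = bus_mhz * dram_parts / fsb_parts
    # DDR transfers data on both clock edges, so the transfer rate doubles.
    transfers_mt_s = 2 * dram_clock_mhz
    # Assumption: 64-bit module, i.e. 8 bytes per transfer.
    peak_mb_s = transfers_mt_s * 8
    return {
        "dram_clock_mhz": round(dram_clock_mhz),
        "ddr3_name": f"DDR3-{round(transfers_mt_s)}",
        "module_name": f"PC3-{round(peak_mb_s)}",
    }

if __name__ == "__main__":
    print(ddr3_names(240, 3, 10))
```

With the values quoted above (240MHz bus, 3:10 ratio) this gives a DRAM clock of 800, DDR3-1600 and PC3-12800, so "RAM@800MHz" and "DDR3-1600" describe the same memory.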

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Fri Aug 20, 2010 5:05 am
by John_Weatherman
I would say just write DDR3-1600 and that's enough (unless you've bought something cheap on eBay that was made by a Chinese prisoner with a hammer and screwdriver in a labor camp - "Re-education through work!").

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Fri Aug 20, 2010 9:00 pm
by KneeDeep
... I ran a dozen tests, and got fed up editing this post to explain each in detail.

In summary: if I shut down the GPUs, the SMP clicked off a % of the WU every 11 minutes.

If I ran the GPUs with "nocpulock=0" [the default setting] the SMP slowed 30-40% -- which would be expected as the GPUs each seemed to take over a CPU.

It appears that SMP threads are NOT CPU locked: running "-smp 4" with NO GPUs should have shown two nearly idle CPUs and four fully active CPUs if threads were CPU locked. Instead, all six CPUs were running around 2/3 utilized.

It appears that GPU threads are NOT CPU locked whether "nocpulock" is zero or one: without SMP running, there should have only been two CPUs showing heavy activity; instead, each CPU was showing partial activity.

There is some SMP-cramping impact of setting a GPU to "nocpulock=1", but... I can't guess at the mechanism. The SMP slowdown was reduced by using "-smp 5" and "-smp 4".
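
As a sanity check on the rates quoted in this thread, the conversion between minutes per percent, hours per WU and points per day is simple enough to sketch in Python. The function names are made up for illustration, and the sketch treats 921 as the base credit with no bonus included.

```python
def wu_hours(minutes_per_percent):
    """Hours to finish a whole WU at a steady rate per 1%."""
    return minutes_per_percent * 100 / 60

def minutes_per_percent(total_hours):
    """Inverse of wu_hours: minutes per 1% for a WU of the given length."""
    return total_hours * 60 / 100

def base_ppd(base_credit, minutes_per_pct):
    """Points per day from base credit only (bonus points not counted)."""
    return base_credit * 24 / wu_hours(minutes_per_pct)

if __name__ == "__main__":
    print(f"{wu_hours(11):.1f} h per WU at 11 min/%")
    print(f"{minutes_per_percent(16):.1f} min/% for a 16 h WU")
    print(f"{base_ppd(921, 11):.0f} PPD at 11 min/%")
```

At 11 minutes per percent the WU takes a bit over 18 hours, and 921 points over that time comes to about 1200 PPD, which matches the normal figure given in the first post.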

In the GPU forum, I've started a thread asking what nocpulock is supposed to do.
http://foldingforum.org/viewtopic.php?f ... 43#p156038

o&o

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Fri Aug 20, 2010 9:43 pm
by sortofageek
I know this doesn't help you, but since you reported this WU in this forum, I'm just confirming it was not a bad WU. It was successfully completed and returned by another folder.

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Fri Aug 20, 2010 10:41 pm
by KneeDeep
sortofageek wrote: I know this doesn't help you, but since you reported this WU in this forum, I'm just confirming it was not a bad WU. It was successfully completed and returned by another folder.
... -I- am probably that other folder. Once I found it was the GPU setting that was retarding the SMP work, I set the SMP back on track.

Or, do they distribute a WU to multiple folders?! [I beat the deadline.]

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Sun Aug 22, 2010 6:08 am
by bruce
"I beat the deadline" might mean the Preferred Deadline or the Final Deadline. When a WU passes the Preferred Deadline, it is reissued. When certain types of error results are uploaded, a WU is reissued. Occasionally there may be other reasons to reissue a WU, but I think they're rather rare. (Stanford doesn't want to waste processing resources any more than you do.)

Yes, you did return the WU for full credit plus bonus.

Re: Project: 6701 (Run 24, Clone 15, Gen 34) ???

Posted: Sun Aug 22, 2010 7:24 am
by sortofageek
Sorry, KneeDeep. I didn't look at the log, so didn't realize you are actually:

[18:39:50] - User name: MudHole (Team 0)

So, I thought this was a different folder:

Hi MudHole (team 0),
Your WU (P6701 R24 C15 G34) was added to the stats database on 2010-08-20 08:10:54 for 2848.65 points of credit.