Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 1:27 am
by kerryd
I am thinking something is not right here. A TPF of 2 hr 37 min with 17.68% done gives 8.99 days left on the ETA. That is running 10 cores at 4400 MHz.
I have stopped V7, closed it, and rebooted the computer. My CPU temp is 50 C, so it's not overheating. Both GPUs are running WU 9401 and seem in the ballpark of what I have seen posted for their PPD, even if they're over a minute different in TPF. All 12 CPU cores show 100%.
So how do I delete that WU in V7? With tracker it was easy.

Code:

*********************** Log Started 2014-02-10T06:54:07Z ***********************
06:54:08:WU04:FS02:Starting
06:54:08:WU04:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/kerry/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 04 -suffix 01 -version 702 -lifeline 4604 -checkpoint 15 -np 10
06:54:08:WU04:FS02:Started FahCore on PID 4712
06:54:08:WU04:FS02:Core PID:4724
06:54:08:WU04:FS02:FahCore 0xa4 started
06:54:08:WU04:FS02:0xa4:
06:54:08:WU04:FS02:0xa4:*------------------------------*
06:54:08:WU04:FS02:0xa4:Folding@Home Gromacs GB Core
06:54:08:WU04:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
06:54:08:WU04:FS02:0xa4:
06:54:08:WU04:FS02:0xa4:Preparing to commence simulation
06:54:08:WU04:FS02:0xa4:- Looking at optimizations...
06:54:08:WU04:FS02:0xa4:- Files status OK
06:54:08:WU04:FS02:0xa4:- Expanded 29730 -> 307092 (decompressed 1032.9 percent)
06:54:08:WU04:FS02:0xa4:Called DecompressByteArray: compressed_data_size=29730 data_size=307092, decompressed_data_size=307092 diff=0
06:54:08:WU04:FS02:0xa4:- Digital signature verified
06:54:08:WU04:FS02:0xa4:
06:54:08:WU04:FS02:0xa4:Project: 7044 (Run 0, Clone 158, Gen 14)
06:54:08:WU04:FS02:0xa4:
06:54:08:WU04:FS02:0xa4:Assembly optimizations on if available.
06:54:08:WU04:FS02:0xa4:Entering M.D.
06:54:14:WU04:FS02:0xa4:Using Gromacs checkpoints
06:54:15:WU04:FS02:0xa4:Mapping NT from 10 to 10 
06:54:16:WU04:FS02:0xa4:Resuming from checkpoint
06:54:16:WU04:FS02:0xa4:Verified 04/wudata_01.log
06:54:16:WU04:FS02:0xa4:Verified 04/wudata_01.trr
06:54:16:WU04:FS02:0xa4:Verified 04/wudata_01.xtc
06:54:16:WU04:FS02:0xa4:Verified 04/wudata_01.edr
06:54:16:WU04:FS02:0xa4:Completed 2653450 out of 25000000 steps  (10%)
07:55:03:WU04:FS02:0xa4:Completed 2750000 out of 25000000 steps  (11%)
10:31:15:WU04:FS02:0xa4:Completed 3000000 out of 25000000 steps  (12%)
******************************** Date: 10/02/14 ********************************
13:07:59:WU04:FS02:0xa4:Completed 3250000 out of 25000000 steps  (13%)
15:45:15:WU04:FS02:0xa4:Completed 3500000 out of 25000000 steps  (14%)
18:22:20:WU04:FS02:0xa4:Completed 3750000 out of 25000000 steps  (15%)
******************************** Date: 10/02/14 ********************************
20:59:31:WU04:FS02:0xa4:Completed 4000000 out of 25000000 steps  (16%)
23:36:39:WU04:FS02:0xa4:Completed 4250000 out of 25000000 steps  (17%)
******************************** Date: 11/02/14 ********************************
Mod Edit: Changed List Tags To Code Tags - PantherX
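
The TPF quoted in the post can be checked directly against the progress lines in the log. What follows is a minimal Python sketch and not part of the FAH client: the function name `frame_seconds`, the regular expression, and the `SAMPLE` list are illustrative assumptions, and the sample lines are copied from the log above. It counts only intervals that span a full 1% frame, so the partial frame right after the checkpoint resume is skipped.

```python
import re

# Matches progress lines such as:
# 10:31:15:WU04:FS02:0xa4:Completed 3000000 out of 25000000 steps  (12%)
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)

def frame_seconds(log_lines):
    """Return the duration in seconds of every full 1% frame found in the log."""
    durations = []
    prev = None  # (seconds since midnight, completed steps) of the last progress line
    for line in log_lines:
        m = PROGRESS.match(line.strip())
        if m is None:
            continue  # date separators and other messages are ignored
        hh, mm, ss, steps, total = (int(g) for g in m.groups())
        now = hh * 3600 + mm * 60 + ss
        if prev is not None:
            prev_time, prev_steps = prev
            # Keep only intervals covering exactly one frame (1% of all steps).
            if steps - prev_steps == total // 100:
                # The modulo tolerates a midnight rollover, assuming a frame
                # takes less than 24 hours.
                durations.append((now - prev_time) % 86400)
        prev = (now, steps)
    return durations

SAMPLE = [
    "06:54:16:WU04:FS02:0xa4:Completed 2653450 out of 25000000 steps  (10%)",
    "07:55:03:WU04:FS02:0xa4:Completed 2750000 out of 25000000 steps  (11%)",
    "10:31:15:WU04:FS02:0xa4:Completed 3000000 out of 25000000 steps  (12%)",
    "******************************** Date: 10/02/14 ********************************",
    "13:07:59:WU04:FS02:0xa4:Completed 3250000 out of 25000000 steps  (13%)",
]

for secs in frame_seconds(SAMPLE):
    print(f"{secs // 3600} hr {secs % 3600 // 60} min {secs % 60} s")
```

On these sample lines the two full frames take 9372 s and 9404 s, roughly 2 hr 36 min each, which agrees with the TPF reported in the post.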

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 1:51 am
by EXT64
I'm not certain, as I have not seen that WU recently, however 25 million steps seems pretty high. You might have gotten one where they gave you 10x the amount of work that you are supposed to get.

Easiest way to delete is to delete the CPU slot in FAHControl and then add it back after it cleans up.

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 1:53 am
by bruce
Windows, right?

How much idle time is shown in Task Manager? What percentages are allocated to each of the top tasks?

On an idle system, you can expect FahCore_a4 to be taking 10/12 = 83% and if you have two GPUs, then FahCore_1* would be taking 17% WITH NOTHING ELSE TAKING ENOUGH TO MATTER. On the other hand, if you're running something else measurable, the 83% will be a smaller number and FahCore_a4 will be running A LOT slower. If that describes your system, then reduce the number of CPUs allocated to the CPU slot until there is a bit of idle time. The CPU cores (including FahCore_a4) run really a lot slower if you've over-committed your CPUs.
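
The percentages above follow from dividing the threads given to each task by the number of logical CPUs. A small Python sketch of that arithmetic, assuming 12 logical CPUs with 10 assigned to the CPU slot and 2 left for the GPU slots; `expected_share` is a hypothetical helper name, not something from the FAH client:

```python
def expected_share(threads, logical_cpus):
    """Percentage of total CPU time a task gets if each of its threads
    keeps one logical CPU fully busy and nothing else competes."""
    return 100.0 * threads / logical_cpus

LOGICAL_CPUS = 12  # assumption: 6 cores with hyperthreading

cpu_slot = expected_share(10, LOGICAL_CPUS)   # FahCore_a4 with 10 threads
gpu_slots = expected_share(2, LOGICAL_CPUS)   # two GPU cores, one thread each

print(f"CPU slot:  {cpu_slot:.0f}%")
print(f"GPU slots: {gpu_slots:.0f}%")
```

If Task Manager shows FahCore_a4 well below the first figure while the system reports no idle time, something else is competing for the same logical CPUs.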

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 2:02 am
by kerryd
Bruce, the CPU is a 3930K: 10 cores fold SMP and 2 cores are set aside for the GPUs. There is and has been nothing running but V7 with one SMP WU and two 9401s on the GPUs. The CPU is running 100% on all 12 cores at 4400 MHz.
That log was started after a reboot last night. At this point I want to delete that WU. If I cannot get rid of it, I will delete V7 and the two GPU WUs along with the SMP!!!

FahCore_a4.exe is using 83% of the CPU, the first Core 17 is at 08%, and the second Core 17 is running at 08%.
Everything else is at 0%.


I did not see his/her post. I was starting to think it was Core 17 using most of the CPU; both 9401s started about the time the WU did.

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 2:08 am
by bruce
I understand you're upset, but you didn't answer my questions.

The i7-39xx is a 6-core chip with hyperthreading. For every core that you use between 6 and 12, the performance increase is really quite small, and if you happen to have enough other work for your system to do that FahCore_a4 has less than 100% access to the number of cores you've configured, performance degrades very rapidly. Leaving a little idle time may actually improve your throughput.

EXT64 answered your question: Delete the slot.

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 2:24 am
by kerryd
Thanks EXT64, a TPF of 1 min 5 sec is more like it. So I am thinking that was a bad WU.

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 3:19 am
by Joe_H
kerryd wrote: Thanks EXT64, a TPF of 1 min 5 sec is more like it. So I am thinking that was a bad WU.
A TPF of 1 min on a WU from which project? It makes a difference. Project 7044 is a large project with a preferred deadline of 33 days and a final timeout of 72. At a TPF of 1 hour it would have easily completed within deadlines. In any case it was not a bad WU, it has been completed successfully by another folder.

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 6:28 am
by kerryd
Joe_H wrote:
kerryd wrote: Thanks EXT64, a TPF of 1 min 5 sec is more like it. So I am thinking that was a bad WU.
A TPF of 1 min on a WU from which project? It makes a difference. Project 7044 is a large project with a preferred deadline of 33 days and a final timeout of 72. At a TPF of 1 hour it would have easily completed within deadlines. In any case it was not a bad WU, it has been completed successfully by another folder.
So you're saying that there was nothing wrong with that WU or the core? You're full of it. If it was OK then it should have given bigadv PPD, not 400 PPD.
Right now I am running an 8566 WU with a TPF of 6 min 13 sec. I have never seen a TPF higher than 10 min, NEVER. The only thing I did was dump that stinking WU: no reboot, nothing. Who knows, the core could have gotten corrupted when the video drivers crashed while running two 9401 WUs; the only other thing running besides V7 was Firefox. And yes, it has been completed successfully by another folder, but there were also some that WERE NOT.
Here's something on that WU and running 10 cores:


Re: EUE on New Clones for 7048 and 7044

Post by EXT64 » Sun Apr 14, 2013 1:42 pm
Interesting - such a quick failure (never really started) tells me either the download was bad (received garbage) or these WU really don't like SMP 10, though I had never heard 10 being a problem.


Re: EUE on New Clones for 7048 and 7044

Post by Joe_H » Mon Apr 15, 2013 12:29 am
I have read a couple of anecdotal reports of some WUs failing with a multiple of 5, but the last was well over a year ago. But there was never any good follow-up, so it could have been due to a different reason. Hard to say more without some way of looking inside the calculations in progress. So if 5 is ever a problem, it must be only very rarely and possibly only on specific WUs of a project.

Re: Fahcore a4 wu 7044

Posted: Tue Feb 11, 2014 7:00 am
by bruce
kerryd wrote: And yes, it has been completed successfully by another folder, but there were also some that WERE NOT.
Here's something on that WU and running 10 cores:
The WU has been successfully completed by THREE people. They took
Days taken to complete WU: 2.18
Days taken to complete WU: 6.57
Days taken to complete WU: 24.98

All are short of the 33.30 day preferred deadline and also short of the 72.10 day final deadline.

Considering the fact that some people run CPUs with a single core and others have as many as 12 or 24 or 32 or even 48, there's bound to be a wide variation in completion times.

We have no way of knowing how many hours per day any of those people fold; we only know when it was assigned and when it was returned.

There's a reason why the deadline is set to 33.3 days. It's entirely possible that the slowest of machines might have a TPF of several hours, and as long as the WU can be completed in 25 to 33 days, good science is being done by a slow machine. You, however, are using 10 cores, so you might complete it in 2.5 to 3.3 days, so a TPF of 1.5 hours is not unrealistic. I wouldn't expect it to be 2.5 hours, though.
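
The relationship between TPF, completion time, and the deadlines discussed in this thread can be worked out with a few lines of Python. This is a rough sketch under one stated assumption: a WU has 100 frames, one per 1% of progress, as the log earlier in the thread suggests. The helper names `days_to_complete` and `tpf_minutes_for` are made up for illustration.

```python
FRAMES_PER_WU = 100        # assumption: one frame per 1% of progress
PREFERRED_DEADLINE = 33.3  # days, as quoted in the thread for project 7044
FINAL_DEADLINE = 72.1      # days, as quoted in the thread for project 7044

def days_to_complete(tpf_minutes, percent_done=0.0):
    """Days of continuous folding needed for the remaining frames."""
    remaining_frames = FRAMES_PER_WU * (1.0 - percent_done / 100.0)
    return remaining_frames * tpf_minutes / (60 * 24)

def tpf_minutes_for(days):
    """TPF in minutes that finishes a whole WU in the given number of days."""
    return days * 24 * 60 / FRAMES_PER_WU

# Remaining time for the reported case: TPF 2 hr 37 min at 17.68% done.
print(f"ETA: {days_to_complete(157, 17.68):.2f} days")

# Whole-WU time at several TPF values, compared with the preferred deadline.
for tpf in (60, 90, 157):
    total = days_to_complete(tpf)
    print(f"TPF {tpf} min: {total:.2f} days, "
          f"within preferred deadline: {total <= PREFERRED_DEADLINE}")

# TPF implied by finishing in 2.5 and 3.3 days.
for days in (2.5, 3.3):
    print(f"{days} days: TPF {tpf_minutes_for(days):.1f} min")
```

Under that assumption, the ETA for the reported case comes out close to the 8.99 days shown by the client, and every TPF mentioned in the thread finishes well inside the 33.3 day preferred deadline when the machine folds around the clock.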