Project 7809, TPF increases until restart
Posted: Sat Mar 23, 2013 8:49 am
by Xavier Zepherious
project 7809(5.284.143)
TPF 48.37s
est time to complete 3.1 days
I noticed this was wrong.
The rig has been rock steady; it hasn't had a single work failure.
FAH 7.2.9, Win7, 3930K @ 4.1 GHz
Solution: Pause all work, then select Fold again.
That seems to fix the long TPF issue. This has occurred before on these work units, and I'm not the only one noticing it.
This seems to work:
TPF is now 11m45s.
Somehow the project gets hung up, and just pausing and resuming fixes it.
No other project does this.
Mod Edit: Edited Thread Subject, Added PRCG To Post - PantherX
Project: 7809 (10, 443, 5) 4 hour TPF
Posted: Sun Mar 24, 2013 1:43 pm
by iancook221188
Project: 7809 (10, 443, 5)
On a 2600K at 4.5 GHz, it's using 99% CPU.
17 days to complete.
I tried restarting the client. Can I delete the WU? It ran for 12 hours before I noticed the problem. The old log also has all the info about the other clients in it. Do you want me to post the large log here? It would be quite long, since it goes back 12 hours with two other GPU clients working.
Mod Edit: Post Merged, PRCG Added - PantherX
Re: project 7809(5.284.143) longTPF
Posted: Sun Mar 24, 2013 1:49 pm
by Qinsp
I've seen a P8101 do this too. Restart drops the TPF a bunch.
Unless you are talking about the STATUS tpf and ETA estimates. They don't work on big jobs as far as I can tell. If I'm rock-steady at 21:00 tpf ±0:05 sec for 30 frames, the estimates will vary between 15min and 45min. I have to hand calc the ETA if I need to service the machine.
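For reference, the hand calculation is just the steady TPF multiplied by the frames remaining. A minimal sketch, assuming the WU is split into 100 frames (the frame count, the `eta` helper name, and the example numbers are illustrative assumptions, not values from any client log):

```python
from datetime import timedelta

def eta(tpf: timedelta, frames_done: int, total_frames: int = 100) -> timedelta:
    """Time remaining if every remaining frame takes exactly `tpf`.

    total_frames=100 is an assumption; check the WU's actual frame count.
    """
    remaining = total_frames - frames_done
    if remaining < 0:
        raise ValueError("frames_done exceeds total_frames")
    return tpf * remaining

# Illustrative: a steady 21:00 TPF with 30 of an assumed 100 frames done.
print(eta(timedelta(minutes=21), frames_done=30))
```

Unlike the client's estimate, this only holds while the TPF really is steady.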
Re: Project 7809, TPF increases until restart
Posted: Sun Mar 24, 2013 2:53 pm
by PantherX
Before I report it to the PG Member, it would be nice to have the log files (filtered) along with the PRCGs which might help in troubleshooting this issue. Please ensure that this is indeed caused only by the WU, i.e. other applications aren't using the CPU which is causing the increase in the TPF.
Re: Project 7809, TPF increases until restart
Posted: Sun Mar 24, 2013 7:00 pm
by Xavier Zepherious
I would check which GPU WU (project) is running alongside that WU.
I'm hearing reports on our forums of similar issues with other SMP WUs.
Did an 8083 WU flawlessly after it.
A 7808 WU (2, 362, 157) is also doing it now (it basically did very little all night while I slept, until I restarted it).
Never had a problem until the 8070 WUs started running; I'm also running a 7660 (3-way SLI 670s).
Pause and resume restarts it.
The log file doesn't say much; I'm running verbosity 5.
TPF ranges from 30 min to 1 hour.
Once paused and resumed, TPF is around 10m.
Unusual behavior.
What I'm assuming is that the GPU WU is interfering with the SMP somehow.
The other person was running a 7647 SMP WU and an 8070 GPU WU.
Update after reviewing the log files from when the TPF change occurs:
It seems the issue coincides with a new GPU WU starting.
TPF is at 10 min until it sends and receives/starts a new 8070.
Once that happens, TPF goes to 40 min.
Re: Project 7809, TPF increases until restart
Posted: Mon Mar 25, 2013 11:53 pm
by Xavier Zepherious
And confirmed once again with the next upload.
It doesn't happen right away, but when the new GPU WU starts, the SMP starts lagging, and within 3 frames it's at 40m TPF.
Re: Project 7809, TPF increases until restart
Posted: Tue Mar 26, 2013 1:12 am
by PantherX
Could it be possible that some WUs from this Project series are extra sensitive to any interruptions? When you get the TPF increase, if you just pause the GPU Slot (thus there will not be any more interruptions), will the TPF drop back to normal? If it doesn't, then it might be an issue.
BTW, if the issue occurs with only the GPU processing 8070 WUs then it would seem that it is using enough CPU cycles to cause a slowdown in the SMP processing.
Re: Project 7809, TPF increases until restart
Posted: Tue Mar 26, 2013 6:50 am
by Xavier Zepherious
The next two WUs were a 7625 and a 7660; no slowdown occurred, and I had an 8m TPF on the 7808.
The slowdown with 8070s isn't minor either: up to 1 hr or more TPF.
It's more like a bunch of threads get stuck on the SMP.
Pausing and resuming all of them fixes the SMP and brings TPF back to around 10m.
So it's not like it's stealing CPU cycles (maybe some, since as noted above there's 2m extra, a 25% increase in the time to complete a frame).
And it doesn't occur every time either (although there is a TPF increase with each new 8070/8071).
I'm going to watch these units and update as often as I can to figure out whether the GPU WUs are causing it or it's an SMP issue.
Next time I'm going to try just pausing the GPUs and wait to see what that does,
then I'll restart the GPUs and leave the SMP alone. If that doesn't fix it,
then I'll try the SMP alone.
You want debugging? As a programmer myself, this is what I do.
Re: Project 7809, TPF increases until restart
Posted: Tue Mar 26, 2013 4:36 pm
by bruce
Do note that the debugging will take time. To establish a reasonable TPF estimate, the client must collect enough recent data to establish a pattern that may be different than the previous pattern it has established, so you'll have to wait for several frames to be completed after any change you make.
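To illustrate why a single slow frame lingers in the estimate, here is a hypothetical estimator that averages the last N frame times. The `TpfEstimator` class, the window size, and the plain mean are assumptions for illustration only; this is not FAHClient's actual algorithm.

```python
from collections import deque

class TpfEstimator:
    """Hypothetical TPF estimate: mean of the last `window` frame times (seconds)."""

    def __init__(self, window: int = 3):
        # deque with maxlen drops the oldest frame automatically
        self.frames = deque(maxlen=window)

    def add_frame(self, seconds: float) -> None:
        self.frames.append(seconds)

    def estimate(self) -> float:
        if not self.frames:
            raise ValueError("no frames recorded yet")
        return sum(self.frames) / len(self.frames)

est = TpfEstimator(window=3)
# One 40-minute frame among 10-minute frames (illustrative numbers).
for t in [600, 600, 600, 2400, 600, 600, 600]:
    est.add_frame(t)
    print(f"{t / 60:.0f} min frame -> estimate {est.estimate() / 60:.1f} min")
```

With a window of 3, the estimate sits at 20 minutes for three consecutive readings (the slow frame plus the next two) before dropping back to 10, which is why you need to wait several frames after any change.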
Re: Project 7809, TPF increases until restart
Posted: Tue Mar 26, 2013 7:48 pm
by PantherX
Thanks for taking the time to debug and document your findings! This would be really helpful when troubleshooting.
Re: Project 7809, TPF increases until restart
Posted: Tue Mar 26, 2013 10:26 pm
by Xavier Zepherious
Happened again with an 8071 WU.
Exactly when it starts the new WU, the 7809 SMP unit went up to 1 hr TPF within 2 frames (and may stay like that for hours).
It sounds like certain GPU units hog CPU resources when they start and won't release them.
Pausing the GPU solves it.
The TPF on the SMP improves immediately, down to 5m.
Restarting the GPUs would probably fix the issue, rather than resetting them all.
So this is primarily a GPU WU issue.
Now, if only there were a way for me to figure out how much CPU power each GPU uses, to see where the issue is.
Re: Project 7809, TPF increases until restart
Posted: Wed Mar 27, 2013 1:21 am
by bruce
What CPU percentages does Task Manager report are being used by the various active programs?
Re: Project 7809, TPF increases until restart
Posted: Wed Mar 27, 2013 2:52 am
by Napoleon
I've noticed myself that P807x GPU WUs require quite a lot of CPU time in practice - a lot more than other GPU WUs. It doesn't necessarily show up in Task Manager. I'd recommend dropping to SMP:10 on the 3930K CPU, since SMP:11 is known to cause CPU WU failures.
Re: Project 7809, TPF increases until restart
Posted: Wed Mar 27, 2013 5:05 am
by Xavier Zepherious
I'll have to remember to check Task Manager next time.
At the moment the A4 core is at 94-97%.
Baseline TPF (on my rig) on 7808/7809 is 8-12m with 3 GPUs running, and 5m20s when not running GPUs
(almost twice as long when my GPUs are running).
And that's fine; I'm not getting 45K like my SMP rig running the same WUs, nor do I expect to.
10K-16K is fine; 7808/7809 always gave me poor numbers with my GPUs running.
It's just that a 30 min to 1 hr TPF is not something I want to waste power or time on: a 2-4 day estimated time means practically no points.
Working an 8082 with a TPF of 2m35s (4.1K credit) with 3 GPUs: 23K PPD,
vs. the 7808 for 10K (which I just uploaded).
The issue also occurs on 8082/8083,
just to a lesser extent:
TPF on the 8082 goes up to 15 min after fetching a new WU (8070).
Found core A4 at 92%,
core 15 at 3%, 1%, 1%,
CPU idle 3%.
A4 is not picking up the extra cycles.
Two hours at increased TPF.
Restarted all WUs:
A4 is up to 95% now,
TPF back to 3 min.
Hmm, looks like I might have to restrict the CPU slot to SMP 10.
Edit: update
SMP 10 is no worse for TPF than SMP 12;
in fact it might be better, because the SMP may no longer be CPU-starved.
SMP at 83% CPU use gives improved TPF over SMP 12 at 95% CPU use on the A4 core.
TPF 7m, est. PPD on 7809 30K.
Looks good so far.
Re: Project 7809, TPF increases until restart
Posted: Wed Mar 27, 2013 3:56 pm
by Xavier Zepherious
Ya! I'll stay at SMP 10.
CPU usage on the GPUs dropped too, down to 1% combined rather than 5% before; both my GPU WUs and the SMP were starved before.
The SMP has improved remarkably.