p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 8:33 pm
by rickoic
The work unit apparently froze at around 98% for 8 date/time checks. I paused it and restarted it, and it picked up again. Is this something I should be on the lookout for?

*********************** Log Started 2013-05-16T22:05:26Z ***********************
22:05:28:WU00:FS04:Starting
22:05:28:WU00:FS04:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Rick/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 00 -suffix 01 -version 702 -lifeline 5756 -checkpoint 15 -np 8
22:05:28:WU00:FS04:Started FahCore on PID 4996
22:05:32:WU00:FS04:Core PID:1628
22:05:32:WU00:FS04:FahCore 0xa3 started
22:05:32:WU00:FS04:0xa3:
22:05:32:WU00:FS04:0xa3:*------------------------------*
22:05:32:WU00:FS04:0xa3:Folding@Home Gromacs SMP Core
22:05:32:WU00:FS04:0xa3:Version 2.27 (Dec. 15, 2010)
22:05:32:WU00:FS04:0xa3:
22:05:32:WU00:FS04:0xa3:Preparing to commence simulation
22:05:32:WU00:FS04:0xa3:- Ensuring status. Please wait.
22:05:42:WU00:FS04:0xa3:- Looking at optimizations...
22:05:42:WU00:FS04:0xa3:- Working with standard loops on this execution.
22:05:42:WU00:FS04:0xa3:- Previous termination of core was improper.
22:05:42:WU00:FS04:0xa3:- Going to use standard loops.
22:05:42:WU00:FS04:0xa3:- Files status OK
22:05:42:WU00:FS04:0xa3:- Expanded 3849801 -> 4382860 (decompressed 113.8 percent)
22:05:43:WU00:FS04:0xa3:Called DecompressByteArray: compressed_data_size=3849801 data_size=4382860, decompressed_data_size=4382860 diff=0
22:05:43:WU00:FS04:0xa3:- Digital signature verified
22:05:43:WU00:FS04:0xa3:
22:05:43:WU00:FS04:0xa3:Project: 8568 (Run 0, Clone 8, Gen 7)
22:05:43:WU00:FS04:0xa3:
22:05:43:WU00:FS04:0xa3:Entering M.D.
22:05:49:WU00:FS04:0xa3:Using Gromacs checkpoints
22:05:49:WU00:FS04:0xa3:Mapping NT from 8 to 8 
22:05:54:WU00:FS04:0xa3:Resuming from checkpoint
22:05:54:WU00:FS04:0xa3:Verified 00/wudata_01.log
22:05:56:WU00:FS04:0xa3:Verified 00/wudata_01.trr
22:05:56:WU00:FS04:0xa3:Verified 00/wudata_01.edr
22:05:57:WU00:FS04:0xa3:Completed 483225 out of 500000 steps  (96%)
22:57:12:WU00:FS04:0xa3:Completed 485000 out of 500000 steps  (97%)
23:24:25:WU00:FS04:0xa3:Completed 490000 out of 500000 steps  (98%)
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
******************************** Date: 17/05/13 ********************************
19:41:13:FS04:Paused
19:41:13:FS04:Shutting core down
19:41:16:WU00:FS04:0xa3:Client no longer detected. Shutting down core 
19:41:16:WU00:FS04:0xa3:
19:41:16:WU00:FS04:0xa3:Folding@home Core Shutdown: CLIENT_DIED
19:41:16:WU00:FS04:FahCore returned: INTERRUPTED (102 = 0x66)
19:41:28:FS04:Unpaused
19:41:28:WU00:FS04:Starting
19:41:28:WU00:FS04:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Rick/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 00 -suffix 01 -version 702 -lifeline 5756 -checkpoint 15 -np 8
19:41:28:WU00:FS04:Started FahCore on PID 36356
19:41:28:WU00:FS04:Core PID:34404
19:41:28:WU00:FS04:FahCore 0xa3 started
19:41:29:WU00:FS04:0xa3:
19:41:29:WU00:FS04:0xa3:*------------------------------*
19:41:29:WU00:FS04:0xa3:Folding@Home Gromacs SMP Core
19:41:29:WU00:FS04:0xa3:Version 2.27 (Dec. 15, 2010)
19:41:29:WU00:FS04:0xa3:
19:41:29:WU00:FS04:0xa3:Preparing to commence simulation
19:41:29:WU00:FS04:0xa3:- Looking at optimizations...
19:41:29:WU00:FS04:0xa3:- Files status OK
19:41:29:WU00:FS04:0xa3:- Expanded 3849801 -> 4382860 (decompressed 113.8 percent)
19:41:29:WU00:FS04:0xa3:Called DecompressByteArray: compressed_data_size=3849801 data_size=4382860, decompressed_data_size=4382860 diff=0
19:41:29:WU00:FS04:0xa3:- Digital signature verified
19:41:29:WU00:FS04:0xa3:
19:41:29:WU00:FS04:0xa3:Project: 8568 (Run 0, Clone 8, Gen 7)
19:41:29:WU00:FS04:0xa3:
19:41:29:WU00:FS04:0xa3:Assembly optimizations on if available.
19:41:29:WU00:FS04:0xa3:Entering M.D.
19:41:35:WU00:FS04:0xa3:Using Gromacs checkpoints
19:41:35:WU00:FS04:0xa3:Mapping NT from 8 to 8 
19:48:10:WU00:FS04:0xa3:Resuming from checkpoint
19:48:10:WU00:FS04:0xa3:Verified 00/wudata_01.log
19:48:10:WU00:FS04:0xa3:Verified 00/wudata_01.trr
19:48:10:WU00:FS04:0xa3:Verified 00/wudata_01.edr
19:48:55:WU00:FS04:0xa3:Completed 492865 out of 500000 steps  (98%)
19:48:56:WARNING:WU00:FS04:Detected clock skew (7 mins 27 secs), adjusting time estimates
20:18:38:WU00:FS04:0xa3:Completed 495000 out of 500000 steps  (99%)

Re: p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 8:45 pm
by rickoic
Noticed that it's showing an estimated TPF of 2.11 days right now, and from 20:18:38 to now (20:45:00) it has only reached 99.01%.
Does this sound like a WU gone bad? I will continue to let it fold to see what happens.
Tks
Rick

Re: p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 8:47 pm
by rickoic
Never mind. Just finished up a minute later at 100% and has been sent back.

Tks
Rick

Re: p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 8:52 pm
by Joe_H
rick1941 wrote:Work unit apparently froze around 98% for 8 date/time checks. Paused it and restarted it and it picked up. Is this something to be on the lookout for frequently?
I have never seen that happen myself, and there are very few reports of anything similar from other folders, so you should not normally see this. One thing that can cause it is the system slowing the processor, or putting it into sleep or hibernate mode, when it detects no keyboard or mouse activity. To prevent that, set the system to maximum performance in the power settings.
rick1941 wrote:Noticed that it's saying Estimated TPF 2.11 days right now and from 20:18:38 to right now 20:45:00 it's only completed 99.01%.
Does this sound like a wu gone bad? Will continue to let it fold to see what happens.
No, it is more likely that the system has slept. The estimated time also takes a while to catch up after a restart; it is probably still counting all of those periods when little or no processing was done on the WU.
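
As a rough illustration of why the estimate can look alarming after a stall, here is a minimal Python sketch. It assumes the estimate behaves like a plain average of observed frame times, which is a guess and not the client's documented method. The helper `frame_seconds` is hypothetical; the timestamps are taken from the log above, and the 98% to 99% interval is assumed to span exactly one date change.

```python
from datetime import datetime, timedelta

def frame_seconds(start, end, days_between=0):
    """Seconds between two HH:MM:SS log timestamps, optionally spanning days."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return (delta + timedelta(days=days_between)).total_seconds()

# 97% -> 98% in the posted log: a normal frame of about 27 minutes.
normal = frame_seconds("22:57:12", "23:24:25")

# 98% -> 99%: the log crosses into 17/05/13 and includes the stall
# (assumption: exactly one day boundary between the two lines).
stalled = frame_seconds("23:24:25", "20:18:38", days_between=1)

# Hypothetical estimator: plain average of the two observed frames.
average = (normal + stalled) / 2

print(f"normal frame:  {normal / 60:.1f} min")
print(f"stalled frame: {stalled / 3600:.1f} h")
print(f"naive average: {average / 3600:.1f} h")
```

With one stalled frame in the window, the naive average ends up many times longer than a normal frame, so an estimate built this way would stay inflated until enough normal frames accumulate.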

Re: p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 8:54 pm
by 7im
If it doesn't error out, it looks more like some other program was using all the CPU cycles during that period.

Also, I don't see how 8 of those date log entries can all have the same date. That date line gets logged every 6 hours, so you should never see more than 4-5 of those entries in a day. Very strange.
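
One way to check this on a saved log is sketched below in Python. The regular expression is inferred from the `Date:` lines in the posted log, and the threshold of 5 comes from the "4-5 per day" figure above; the function names `count_date_markers` and `suspicious_dates` are hypothetical and not part of any FAH tool.

```python
import re
from collections import Counter

# Matches lines such as "******** Date: 17/05/13 ********" (format inferred from the post).
DATE_LINE = re.compile(r"^\*+ Date: (\d{2}/\d{2}/\d{2}) \*+$")

def count_date_markers(log_text):
    """Count how many date marker lines appear for each date in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DATE_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

def suspicious_dates(counts, max_per_day=5):
    """Dates with more markers than a 6-hour interval could produce in one day."""
    return [date for date, n in counts.items() if n > max_per_day]

# Sample built to mirror the 8 identical markers described in the thread.
marker = "*" * 32 + " Date: 17/05/13 " + "*" * 32
sample = "\n".join([marker] * 8 + ["19:41:13:FS04:Paused"])

counts = count_date_markers(sample)
print(counts["17/05/13"], suspicious_dates(counts))
```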

Re: p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 9:20 pm
by rickoic
7im wrote:If it doesn't error out, it looks more like some other program was using all the CPU cycles during that period.

Also, I don't see how 8 of those date log entries can all have the same date. That date line gets logged every 6 hours, and you should never see more than 4-5 of those entries in a day. Very strange.
I don't know either. I was using some OCR software intermittently during that time, but not full time, and my 3 GPUs didn't seem to miss a beat during that period.


Anyway, the WU finished and was sent back to Stanford, so it's in their court now.

Tks
Rick

Re: p8568 (run 0,clone 8, gen 7)

Posted: Fri May 17, 2013 10:20 pm
by bruce
OCR software typically uses one or more CPUs very heavily. CPU folding is intentionally run at an extremely low priority so that it does not interfere with "normal" processing (including your OCR). Even if the foreground task uses only one CPU, CPU folding progresses only at the rate of its slowest thread, so the OCR might have locked out FAH while it was running. If you plan to do that sort of thing again, you might be better off reducing the number of CPUs allocated to FAH to leave plenty of resources for other heavy tasks. Changing the number of CPUs in the middle of an assignment can cause problems and is not recommended, so it's better to plan ahead or to wait until the WU finishes.
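
The "slowest thread" effect can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not how the core is implemented: each thread is assumed to get some fraction of a CPU, every step is assumed to wait for all threads, and the 5% share given to the contended thread is a made-up number. The full-speed rate is taken from the log above (5000 steps between 22:57:12 and 23:24:25).

```python
def step_rate(cpu_shares, full_speed_steps_per_sec):
    """Toy model: every step waits for all threads, so the smallest CPU share sets the pace."""
    return full_speed_steps_per_sec * min(cpu_shares)

# About 3 steps/s when nothing else is running (5000 steps in 1633 s, from the log).
full_speed = 5000 / 1633

# All 8 threads have a CPU to themselves.
shares = [1.0] * 8
baseline = step_rate(shares, full_speed)

# Hypothetical: a foreground program takes over one CPU, leaving that
# low-priority folding thread only 5% of it.
shares[0] = 0.05
contended = step_rate(shares, full_speed)

print(f"baseline:  {baseline:.2f} steps/s")
print(f"contended: {contended:.2f} steps/s")
```

In this model, allocating 7 CPUs to FAH and leaving one free for the foreground task would keep every folding thread at full share, which is the reasoning behind reducing the CPU count before starting heavy work.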