Page 1 of 2

Timeout/Expiration limits and specific work units.

Posted: Tue Feb 04, 2014 12:59 am
by twizzle
As mentioned in another thread, I recently had to dump a series of work units recently because not only couldn't I complete within the timeout limit, they wouldn't make it before hitting the expiration limit either.

The box is a quad-core, it's available for ~ 8.5 hours per day, and a lot of the time I can let fahclient have all of the CPU's, but sometimes (like at the moment) I'm playing with VM's and need to drop it back to two or three cores otherwise the box is unusable. With this particular WU, it was assigned on a Thursday, spent most of Friday processing... and then I realized that it was set to timeout on the weekend and expire on the Tuesday, and at the rate it had progressed it needed around 34 - 36 hours of processing in total (two cores). Only a few days before I had to dump a WU because on the full four cores it would have required ~28 hours to complete, but on two cores it was heading towards ~100 hours of processing.

Usually, expiration and timeout values are more than sufficient to complete the work, but on occasions they just seem completely unrealistic, and recently it has been more than 'occasional'. Possibly this is because the performance on two cores isn't 50% of four cores, it's more like 30% (based on some back-of-envelope calculations).

Log files below :

Code: Select all

*********************** Log Started 2014-01-30T22:17:58Z ***********************
22:17:58:************************* Folding@home Client *************************
22:17:58:      Website: http://folding.stanford.edu/
22:17:58:    Copyright: (c) 2009-2013 Stanford University
22:17:58:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:17:58:         Args: 
22:17:58:       Config: C:/LocalFiles/FAHClient/config.xml
22:17:58:******************************** Build ********************************
22:17:58:      Version: 7.3.6
22:17:58:         Date: Feb 18 2013
22:17:58:         Time: 15:25:17
22:17:58:      SVN Rev: 3923
22:17:58:       Branch: fah/trunk/client
22:17:58:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
22:17:58:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
22:17:58:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
22:17:58:     Platform: win32 XP
22:17:58:         Bits: 32
22:17:58:         Mode: Release
22:17:58:******************************* System ********************************
22:17:58:          CPU: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz
22:17:58:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
22:17:58:         CPUs: 4
22:17:58:       Memory: 3.95GiB
22:17:58:  Free Memory: 1.76GiB
22:17:58:      Threads: WINDOWS_THREADS
22:17:58:  Has Battery: false
22:17:58:   On Battery: false
22:17:58:   UTC offset: 11
22:17:58:          PID: 6280
22:17:58:          CWD: C:/LocalFiles/FAHClient
22:17:58:           OS: Windows 7 Enterprise
22:17:58:      OS Arch: AMD64
22:17:58:         GPUs: 2
22:17:58:        GPU 0: ATI:5 Caicos [Radeon HD 7000 Series]
22:17:58:        GPU 1: ATI:4 Cedar PRO [Radeon HD 5450]
22:17:58:         CUDA: Not detected
22:17:58:Win32 Service: false
22:17:58:***********************************************************************
22:17:58:<config>
22:17:58:  <!-- Folding Core -->
22:17:58:  <checkpoint v='5'/>
22:17:58:  <cpu-affinity v='true'/>
22:17:58:  <cpu-usage v='75'/>
22:17:58:
22:17:58:  <!-- Folding Slot Configuration -->
22:17:58:  <cause v='CANCER'/>
22:17:58:
22:17:58:  <!-- Network -->
xxxxxxxxxxx
22:17:58:
22:17:58:  <!-- User Information -->
22:17:58:  <user v='Twizzle'/>
22:17:58:
22:17:58:  <!-- Folding Slots -->
22:17:58:  <slot id='0' type='CPU'>
22:17:58:    <cpus v='-1'/>
22:17:58:  </slot>
22:17:58:</config>
22:17:58:Trying to access database...
22:17:58:Successfully acquired database lock
22:17:58:Enabled folding slot 00: READY cpu:3
22:17:58:WU01:FS00:Starting
22:17:58:WARNING:WU01:FS00:Changed SMP threads from 2 to 3 this can cause some work units to fail
22:17:58:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/LocalFiles/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 01 -suffix 01 -version 703 -lifeline 6280 -checkpoint 5 -cpu 75 -np 3
22:17:58:WU01:FS00:Started FahCore on PID 5252
22:17:58:WU01:FS00:Core PID:6836
22:17:58:WU01:FS00:FahCore 0xa3 started
22:17:59:WU01:FS00:0xa3:
22:17:59:WU01:FS00:0xa3:*------------------------------*
22:17:59:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
22:17:59:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
22:17:59:WU01:FS00:0xa3:
22:17:59:WU01:FS00:0xa3:Preparing to commence simulation
22:17:59:WU01:FS00:0xa3:- Looking at optimizations...
22:17:59:WU01:FS00:0xa3:- Files status OK
22:17:59:WU01:FS00:0xa3:- Expanded 1766091 -> 2700840 (decompressed 152.9 percent)
22:17:59:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=1766091 data_size=2700840, decompressed_data_size=2700840 diff=0
22:17:59:WU01:FS00:0xa3:- Digital signature verified
22:17:59:WU01:FS00:0xa3:
22:17:59:WU01:FS00:0xa3:Project: 7504 (Run 5, Clone 34, Gen 64)
22:17:59:WU01:FS00:0xa3:
22:17:59:WU01:FS00:0xa3:Assembly optimizations on if available.
22:17:59:WU01:FS00:0xa3:Entering M.D.
22:18:05:WU01:FS00:0xa3:Using Gromacs checkpoints
22:18:06:WU01:FS00:0xa3:Mapping NT from 3 to 3 
22:18:06:WU01:FS00:0xa3:Resuming from checkpoint
22:18:07:WU01:FS00:0xa3:Verified 01/wudata_01.log
22:18:07:WU01:FS00:0xa3:Verified 01/wudata_01.trr
22:18:07:WU01:FS00:0xa3:Verified 01/wudata_01.xtc
22:18:07:WU01:FS00:0xa3:Verified 01/wudata_01.edr
22:18:07:WU01:FS00:0xa3:Completed 29130 out of 500000 steps  (5%)
22:19:45:WU01:FS00:0xa3:Completed 30000 out of 500000 steps  (6%)
22:28:23:WU01:FS00:0xa3:Completed 35000 out of 500000 steps  (7%)
22:35:34:FS00:Shutting core down
22:35:35:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
22:35:35:WU01:FS00:Starting
22:35:35:WARNING:WU01:FS00:Changed SMP threads from 3 to 4 this can cause some work units to fail
22:35:35:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/LocalFiles/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 01 -suffix 01 -version 703 -lifeline 6280 -checkpoint 5 -cpu 75 -np 4
22:35:35:WU01:FS00:Started FahCore on PID 6520
22:35:35:WU01:FS00:Core PID:5616
22:35:35:WU01:FS00:FahCore 0xa3 started
22:35:36:WU01:FS00:0xa3:
22:35:36:WU01:FS00:0xa3:*------------------------------*
22:35:36:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
22:35:36:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
22:35:36:WU01:FS00:0xa3:
22:35:36:WU01:FS00:0xa3:Preparing to commence simulation
22:35:36:WU01:FS00:0xa3:- Looking at optimizations...
22:35:36:WU01:FS00:0xa3:- Files status OK
22:35:36:WU01:FS00:0xa3:- Expanded 1766091 -> 2700840 (decompressed 152.9 percent)
22:35:36:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=1766091 data_size=2700840, decompressed_data_size=2700840 diff=0
22:35:36:WU01:FS00:0xa3:- Digital signature verified
22:35:36:WU01:FS00:0xa3:
22:35:36:WU01:FS00:0xa3:Project: 7504 (Run 5, Clone 34, Gen 64)
22:35:36:WU01:FS00:0xa3:
22:35:36:WU01:FS00:0xa3:Assembly optimizations on if available.
22:35:36:WU01:FS00:0xa3:Entering M.D.
22:35:42:WU01:FS00:0xa3:Using Gromacs checkpoints
22:35:42:WU01:FS00:0xa3:Mapping NT from 4 to 4 
22:35:44:WU01:FS00:0xa3:Resuming from checkpoint
22:35:44:WU01:FS00:0xa3:Verified 01/wudata_01.log
22:35:44:WU01:FS00:0xa3:Verified 01/wudata_01.trr
22:35:44:WU01:FS00:0xa3:Verified 01/wudata_01.xtc
22:35:44:WU01:FS00:0xa3:Verified 01/wudata_01.edr
22:35:44:WU01:FS00:0xa3:Completed 37770 out of 500000 steps  (7%)
22:38:49:WU01:FS00:0xa3:Completed 40000 out of 500000 steps  (8%)
22:45:27:WU01:FS00:0xa3:Completed 45000 out of 500000 steps  (9%)
22:52:06:WU01:FS00:0xa3:Completed 50000 out of 500000 steps  (10%)
22:58:45:WU01:FS00:0xa3:Completed 55000 out of 500000 steps  (11%)
23:05:22:WU01:FS00:0xa3:Completed 60000 out of 500000 steps  (12%)
23:13:16:WU01:FS00:0xa3:Completed 65000 out of 500000 steps  (13%)
23:21:25:WU01:FS00:0xa3:Completed 70000 out of 500000 steps  (14%)
23:28:28:WU01:FS00:0xa3:Completed 75000 out of 500000 steps  (15%)
23:35:23:WU01:FS00:0xa3:Completed 80000 out of 500000 steps  (16%)
23:42:12:WU01:FS00:0xa3:Completed 85000 out of 500000 steps  (17%)
23:49:21:WU01:FS00:0xa3:Completed 90000 out of 500000 steps  (18%)
23:55:56:WU01:FS00:0xa3:Completed 95000 out of 500000 steps  (19%)
00:02:55:WU01:FS00:0xa3:Completed 100000 out of 500000 steps  (20%)
00:09:39:WU01:FS00:0xa3:Completed 105000 out of 500000 steps  (21%)
00:17:03:WU01:FS00:0xa3:Completed 110000 out of 500000 steps  (22%)
00:19:51:FS00:Shutting core down
00:19:54:WU01:FS00:0xa3:Client no longer detected. Shutting down core 
00:19:54:WU01:FS00:0xa3:
00:19:54:WU01:FS00:0xa3:Folding@home Core Shutdown: CLIENT_DIED
00:19:55:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
00:19:55:WU01:FS00:Starting
00:19:55:WARNING:WU01:FS00:Changed SMP threads from 4 to 2 this can cause some work units to fail
00:19:55:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/LocalFiles/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 01 -suffix 01 -version 703 -lifeline 6280 -checkpoint 5 -cpu 75 -np 2
00:19:55:WU01:FS00:Started FahCore on PID 7748
00:19:55:WU01:FS00:Core PID:668
00:19:55:WU01:FS00:FahCore 0xa3 started
00:19:56:WU01:FS00:0xa3:
00:19:56:WU01:FS00:0xa3:*------------------------------*
00:19:56:WU01:FS00:0xa3:Folding@Home Gromacs SMP Core
00:19:56:WU01:FS00:0xa3:Version 2.27 (Dec. 15, 2010)
00:19:56:WU01:FS00:0xa3:
00:19:56:WU01:FS00:0xa3:Preparing to commence simulation
00:19:56:WU01:FS00:0xa3:- Looking at optimizations...
00:19:56:WU01:FS00:0xa3:- Files status OK
00:19:56:WU01:FS00:0xa3:- Expanded 1766091 -> 2700840 (decompressed 152.9 percent)
00:19:56:WU01:FS00:0xa3:Called DecompressByteArray: compressed_data_size=1766091 data_size=2700840, decompressed_data_size=2700840 diff=0
00:19:56:WU01:FS00:0xa3:- Digital signature verified
00:19:56:WU01:FS00:0xa3:
00:19:56:WU01:FS00:0xa3:Project: 7504 (Run 5, Clone 34, Gen 64)
00:19:56:WU01:FS00:0xa3:
00:19:56:WU01:FS00:0xa3:Assembly optimizations on if available.
00:19:56:WU01:FS00:0xa3:Entering M.D.
00:20:02:WU01:FS00:0xa3:Using Gromacs checkpoints
00:20:02:WU01:FS00:0xa3:Mapping NT from 2 to 2 
00:20:03:WU01:FS00:0xa3:Resuming from checkpoint
00:20:03:WU01:FS00:0xa3:Verified 01/wudata_01.log
00:20:03:WU01:FS00:0xa3:Verified 01/wudata_01.trr
00:20:03:WU01:FS00:0xa3:Verified 01/wudata_01.xtc
00:20:03:WU01:FS00:0xa3:Verified 01/wudata_01.edr
00:20:03:WU01:FS00:0xa3:Completed 109200 out of 500000 steps  (21%)
00:22:46:WU01:FS00:0xa3:Completed 110000 out of 500000 steps  (22%)
00:44:48:WU01:FS00:0xa3:Completed 115000 out of 500000 steps  (23%)
01:07:12:WU01:FS00:0xa3:Completed 120000 out of 500000 steps  (24%)
01:25:02:WU01:FS00:0xa3:Completed 125000 out of 500000 steps  (25%)
01:39:06:WU01:FS00:0xa3:Completed 130000 out of 500000 steps  (26%)
01:55:34:WU01:FS00:0xa3:Completed 135000 out of 500000 steps  (27%)
02:10:59:WU01:FS00:0xa3:Completed 140000 out of 500000 steps  (28%)
02:32:36:WU01:FS00:0xa3:Completed 145000 out of 500000 steps  (29%)
02:57:30:WU01:FS00:0xa3:Completed 150000 out of 500000 steps  (30%)
03:20:38:WU01:FS00:0xa3:Completed 155000 out of 500000 steps  (31%)
03:47:18:WU01:FS00:0xa3:Completed 160000 out of 500000 steps  (32%)
04:10:45:WU01:FS00:0xa3:Completed 165000 out of 500000 steps  (33%)
******************************* Date: 2014-01-31 *******************************
04:32:39:WU01:FS00:0xa3:Completed 170000 out of 500000 steps  (34%)
04:57:34:WU01:FS00:0xa3:Completed 175000 out of 500000 steps  (35%)
05:25:57:WU01:FS00:0xa3:Completed 180000 out of 500000 steps  (36%)
05:54:35:WU01:FS00:0xa3:Completed 185000 out of 500000 steps  (37%)
05:59:35:Removing old file 'configs/config-20130830-053052.xml'
05:59:35:Saving configuration to config.xml
05:59:35:<config>
05:59:35:  <!-- Folding Core -->
05:59:35:  <checkpoint v='5'/>
05:59:35:  <cpu-affinity v='true'/>
05:59:35:  <cpu-usage v='75'/>
05:59:35:
05:59:35:  <!-- Folding Slot Configuration -->
05:59:35:  <cause v='CANCER'/>
05:59:35:  <power v='LIGHT'/>
05:59:35:
05:59:35:  <!-- Network -->
05:59:35:  <proxy v='172.25.105.80:8080'/>
05:59:35:  <proxy-enable v='true'/>
05:59:35:  <proxy-pass v='*******'/>
05:59:35:  <proxy-user v='exitm2'/>
05:59:35:
05:59:35:  <!-- User Information -->
05:59:35:  <user v='Twizzle'/>
05:59:35:
05:59:35:  <!-- Folding Slots -->
05:59:35:</config>
05:59:35:FS00:Shutting core down
05:59:39:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
05:59:39:WARNING:WU01:Slot ID 0 no longer exists and there are no other matching slots, dumping
05:59:39:WU01:Sending unit results: id:01 state:SEND error:DUMPED project:7504 run:5 clone:34 gen:64 core:0xa3 unit:0x0000004ffbcb017d4e39b921cbbceeb5
05:59:39:WU01:Connecting to 172.25.105.80:8080
05:59:39:WARNING:WU01:Exception: Failed to send results to work server: Transfer failed
05:59:39:WU01:Trying to send results to collection server
05:59:39:WU01:Connecting to 172.25.105.80:8080
05:59:40:WU01:Server responded WORK_ACK (400)
05:59:40:WU01:Cleaning up

And here's the previous one I dumped after realizing that it just wasn't going to complete in time (there was a long weekend in the middle etc. etc.).

Code: Select all

22:48:43:WU01:FS02:0xa3:*------------------------------*
22:48:43:WU01:FS02:0xa3:Folding@Home Gromacs SMP Core
22:48:43:WU01:FS02:0xa3:Version 2.27 (Dec. 15, 2010)
22:48:43:WU01:FS02:0xa3:
22:48:43:WU01:FS02:0xa3:Preparing to commence simulation
22:48:43:WU01:FS02:0xa3:- Looking at optimizations...
22:48:43:WU01:FS02:0xa3:- Files status OK
22:48:43:WU01:FS02:0xa3:- Expanded 3849950 -> 4393460 (decompressed 114.1 percent)
22:48:43:WU01:FS02:0xa3:Called DecompressByteArray: compressed_data_size=3849950 data_size=4393460, decompressed_data_size=4393460 diff=0
22:48:44:WU01:FS02:0xa3:- Digital signature verified
22:48:44:WU01:FS02:0xa3:
22:48:44:WU01:FS02:0xa3:Project: 8573 (Run 0, Clone 5, Gen 383)
22:48:44:WU01:FS02:0xa3:
22:48:44:WU01:FS02:0xa3:Assembly optimizations on if available.
22:48:44:WU01:FS02:0xa3:Entering M.D.
22:48:50:WU01:FS02:0xa3:Using Gromacs checkpoints
22:48:50:WU01:FS02:0xa3:Mapping NT from 4 to 4 
22:48:51:WU01:FS02:0xa3:Resuming from checkpoint
22:48:51:WU01:FS02:0xa3:Verified 01/wudata_01.log
22:48:52:WU01:FS02:0xa3:Verified 01/wudata_01.trr
22:48:52:WU01:FS02:0xa3:Verified 01/wudata_01.edr
22:48:52:WU01:FS02:0xa3:Completed 124685 out of 500000 steps  (24%)
22:49:58:WU01:FS02:0xa3:Completed 125000 out of 500000 steps  (25%)
23:07:14:WU01:FS02:0xa3:Completed 130000 out of 500000 steps  (26%)
23:24:32:WU01:FS02:0xa3:Completed 135000 out of 500000 steps  (27%)
23:41:50:WU01:FS02:0xa3:Completed 140000 out of 500000 steps  (28%)
23:58:51:WU01:FS02:0xa3:Completed 145000 out of 500000 steps  (29%)
00:16:00:WU01:FS02:0xa3:Completed 150000 out of 500000 steps  (30%)
00:33:07:WU01:FS02:0xa3:Completed 155000 out of 500000 steps  (31%)
00:50:17:WU01:FS02:0xa3:Completed 160000 out of 500000 steps  (32%)
01:07:20:WU01:FS02:0xa3:Completed 165000 out of 500000 steps  (33%)
01:24:32:WU01:FS02:0xa3:Completed 170000 out of 500000 steps  (34%)
01:42:00:WU01:FS02:0xa3:Completed 175000 out of 500000 steps  (35%)
01:59:36:WU01:FS02:0xa3:Completed 180000 out of 500000 steps  (36%)
02:17:08:WU01:FS02:0xa3:Completed 185000 out of 500000 steps  (37%)
02:34:13:WU01:FS02:0xa3:Completed 190000 out of 500000 steps  (38%)
02:51:46:WU01:FS02:0xa3:Completed 195000 out of 500000 steps  (39%)
03:11:09:WU01:FS02:0xa3:Completed 200000 out of 500000 steps  (40%)
03:31:46:WU01:FS02:0xa3:Completed 205000 out of 500000 steps  (41%)
03:49:23:WU01:FS02:0xa3:Completed 210000 out of 500000 steps  (42%)
04:06:54:WU01:FS02:0xa3:Completed 215000 out of 500000 steps  (43%)
04:24:43:WU01:FS02:0xa3:Completed 220000 out of 500000 steps  (44%)
******************************* Date: 2014-01-22 *******************************
04:41:57:WU01:FS02:0xa3:Completed 225000 out of 500000 steps  (45%)
04:59:12:WU01:FS02:0xa3:Completed 230000 out of 500000 steps  (46%)
05:18:09:WU01:FS02:0xa3:Completed 235000 out of 500000 steps  (47%)
05:36:34:WU01:FS02:0xa3:Completed 240000 out of 500000 steps  (48%)
05:53:45:WU01:FS02:0xa3:Completed 245000 out of 500000 steps  (49%)
Mod Edit: Added Code Tags - PantherX

Re: Timeout/Expiration limits and specific work units.

Posted: Tue Feb 04, 2014 1:08 am
by twizzle
And, while I'm on the subject, I just had a look at my 'other' pc which is used to drive some screens up on the wall - so it's available 24x7 running on an old hyper-threaded P4 @ 3.2GHz where it gets > 90% of the processor. The current WU will probably just miss the timeout after 8.5 days of processing.

Re: Timeout/Expiration limits and specific work units.

Posted: Tue Feb 04, 2014 1:59 am
by PantherX
From your configuration it seems that you are using these two settings:
05:59:35: <cpu-affinity v='true'/>
05:59:35: <cpu-usage v='75'/>

I would recommend that you remove the CPU-affinity since it may cause more harm than good. Also, the CPU Usage is only applicable for older FahCores which use a single CPU only. The newer FahCores don't use this setting so you can leave it as it is.

Also, are you basing your estimates of missing the Expiration Deadline on Advanced Control (AKA FAHControl), manually by calculating the TPF or by using HFM.NET?

On my non-dedicated Intel Pentium E5500 @ stock, it takes 24 hours to 30 hours to finish a WU from Project 7504. I would suggest that before the weekends, you could use the Finish button to ensure that you will not be assigned any new WUs which will not be processed over the weekend.

Re: Timeout/Expiration limits and specific work units.

Posted: Tue Feb 04, 2014 5:49 am
by twizzle
My calculation are based on steps completed vs minutes taken, then compared against the timeout/expiration date.

I've removed CPU affinity from my normal desktop (not used on the dedicated box), I'll see how I go with the current WU (Project 8583) on the 4-way. By my calculation, it will take 39.09 hours running as a 3-way task, it's got another 30.1 hours of processing to go, and it has about 40 hours of processing time available - if I leave it running on three processors which *is* affecting the desktop.

Re: Timeout/Expiration limits and specific work units.

Posted: Tue Feb 04, 2014 12:11 pm
by PantherX
Could you please describe how your desktop is being affected? CPU folding is done with idle setting by default so virtually all processes gets priority over it.

If your CPU is being over-subscribed, then CPU folding would most likely be effected in a negative manner. Thus, you may want to reduce the CPUs allocated to the CPU Slot.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 12:49 am
by twizzle
Horribly laggy desktop response, and the VM desktop (another copy of Win7) becomes non-responsive. Despite the process priority being "low", Win7 doesn't appear to work well with fahclient, and at times I'm forced to set folding to "idle" or "off" to get work done. And if I drop from "medium" to "light" (NT=2), I greatly increase the chances of missing the timeout & the expiration because the processing times blow out (ie ~ 30 hours to ~ 100 hours). And "Idle" immediately suspends the processing unless the screensaver is active.

And none of this addresses the basic problem : frequently, work units are taking so long to process that 40 hours of availability a week isn't sufficient to complete before the timeout/expiration times, even when processing at a level which impacts the desktop. This wasn't a problem I noticed a few months ago. Just like the GPU's - Core 17 came out, and it became impossible to process GPU WU's on this desktop because it stopped the attached screens from responding and it was impossible to complete before the expiration times.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 1:27 am
by 7im
VM has a lot of overhead that folding does not and cannot account for. Same with running any other resource hungry applications. At that point, when advanced applications are in use, advanced decisions need to be made about resource allocation.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 1:35 am
by PantherX
May I suggest that set your number of CPUs to 1 and see if you can meet all the deadlines of the WUs being assigned?

Given your unique workload, you may be better off running a single CPU Slot with 1 CPU allocated to it. That way, you will meet all the deadlines and hopefully, your work will not be seriously impacted by this.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 5:11 am
by twizzle
So... I set it to do uni-processing, and I won't get any SMP WU's? I don't see how this deals with the underlying issue. Given my 24x7 box will have trouble finishing the current WU before the eight day timeout (unlikely now, as I have to power it down tomorrow night for a power outage), how many desktops are out there dumping WU's because they almost certainly *aren't* running 24x7 "full"?

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 5:25 am
by 7im
Assuring the client is meeting the deadlines has always been up to the donor, not the project. There is no way for fah to know you only fold a few hours a day, or that you run resource intensive applications in addition to fah.

Thats like asking your car to get good gas mileage while towing a huge boat. And then blaming the boat maker for a bigger gas bill.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 6:24 am
by PantherX
SMP WUs have shorter deadlines and are usually meant for systems that are powered on for several hours a day. UNI WUs on the other hand, have longer deadlines and are more suited for systems that run few hours a day. Thus, with longer deadlines, you have plenty of time to finish WUs.

Moreover, v6 by default would always start in UNI mode to prevent this issue. Earlier versions of V7 had separate UNI and SMP Slots. It is only in the recent versions that both are merged into a single CPU Slot which the user can configure since it is now easy to do so.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 9:58 pm
by twizzle
7im wrote:Assuring the client is meeting the deadlines has always been up to the donor, not the project. There is no way for fah to know you only fold a few hours a day, or that you run resource intensive applications in addition to fah.

Thats like asking your car to get good gas mileage while towing a huge boat. And then blaming the boat maker for a bigger gas bill.
I'm sorry, but that's a really dumb analogy. Don't hitch a boat to my car and then expect me to be able to deliver it within YOUR timeframe if you set your expectation based on the car being a V8 and the driver never sleeping.

I'm not blaming anyone, I just want realistic timeout and expiration dates that an "average" PC can deliver!

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 10:22 pm
by twizzle
PantherX wrote:SMP WUs have shorter deadlines and are usually meant for systems that are powered on for several hours a day. UNI WUs on the other hand, have longer deadlines and are more suited for systems that run few hours a day. Thus, with longer deadlines, you have plenty of time to finish WUs.

Moreover, v6 by default would always start in UNI mode to prevent this issue. Earlier versions of V7 had separate UNI and SMP Slots. It is only in the recent versions that both are merged into a single CPU Slot which the user can configure since it is now easy to do so.
Thanks for the info. When this WU completes, I will reconfigure the PC as three 1-way slots.

Re: Timeout/Expiration limits and specific work units.

Posted: Wed Feb 05, 2014 10:38 pm
by 7im
All multi core fah work units are average, with average timeouts and deadlines. Made to run on average systems, running an average amount of time per day.

FAH looks at your hardware config, and client settings, and sends work that it expects the client can complete by folding an average amount of hours per day. If you have slower hardware, or only fold part time, or run other applications at the same time, the defininition of average is not the same as if you fold on faster hardware, or fold full time, and don't run other applications. That puts the time management in the hands of the donor.

The average uniprocessor system is several times slower than the average SMP system which has several CPUs that can work in parallel. Deadlines are intentionally shorter for average systems with several cores.

Try this to get some below average (in the deadline) work units. Create a single CPU slot. Configure that slot to use one CPU core. On the Expert tab, add the extra-args-option "-np x" where x is the number of CPUs you want to use for folding (minus the quotes). I assume that to be 3. You will tend to get work units with a longer deadline, but fold them using multiple CPU cores, and thus get more points than folding 3 individual WUs. You should miss fewer deadlines. But again, the time management is up to you. You may still need to fold more hours per day. The availability of lower end work units is limited.

Re: Timeout/Expiration limits and specific work units.

Posted: Thu Feb 06, 2014 12:42 am
by jimerickson
scientific constraints have determined that time is of the essence. try
a uniprocessor slot.