Downtime between WUs?

If you're new to FAH and need help getting started or you have very basic questions, start here.

Moderators: Site Moderators, FAHC Science Team

csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Downtime between WUs?

Post by csvanefalk »

I mostly get 8004 WUs, and currently it takes about 1-2 minutes for a new WU to start folding after one finishes...is there a way to get around this? I am not sure, but it seems that the client waits to download and start folding the next WU until the old one has been successfully uploaded to the server. Would it not be possible to relegate the verification and uploading to a separate process, so that the new WU can start folding immediately?

The typical downtime message I am referring to is this:

Code: Select all

23:33:54:WU00:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
23:34:13:WU00:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
23:34:13:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
23:34:23:WU00:FS00:0xa4:
23:34:23:WU00:FS00:0xa4:Finished Work Unit:
23:34:23:WU00:FS00:0xa4:- Reading up to 770532 from "00/wudata_01.trr": Read 770532
23:34:23:WU00:FS00:0xa4:trr file hash check passed.
23:34:23:WU00:FS00:0xa4:- Reading up to 456640 from "00/wudata_01.xtc": Read 456640
23:34:23:WU00:FS00:0xa4:xtc file hash check passed.
23:34:23:WU00:FS00:0xa4:edr file hash check passed.
23:34:23:WU00:FS00:0xa4:logfile size: 22476
23:34:23:WU00:FS00:0xa4:Leaving Run
23:34:24:WU00:FS00:0xa4:- Writing 1255052 bytes of core data to disk...
23:34:24:WU00:FS00:0xa4:Done: 1254540 -> 1194750 (compressed to 95.2 percent)
23:34:24:WU00:FS00:0xa4:  ... Done.
23:35:08:WU00:FS00:0xa4:- Shutting down core
23:35:08:WU00:FS00:0xa4:
23:35:08:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
23:35:13:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
23:35:13:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:8004 run:121 clone:10 gen:192 core:0xa4 unit:0x000001256652edcb4ee901ca46b1295e
23:35:13:WU00:FS00:Uploading 1.14MiB to 171.67.108.59
23:35:13:WU00:FS00:Connecting to 171.67.108.59:8080
23:35:13:WU01:FS00:Connecting to assign3.stanford.edu:8080
23:35:14:WU01:FS00:News: Welcome to Folding@Home
23:35:14:WU01:FS00:Assigned to work server 171.67.108.59
23:35:14:WU01:FS00:Requesting new work unit for slot 00: READY smp:12 from 171.67.108.59
23:35:14:WU01:FS00:Connecting to 171.67.108.59:8080
23:35:15:WU01:FS00:Downloading 531.56KiB
23:35:16:WU00:FS00:Upload complete
23:35:16:WU00:FS00:Server responded WORK_ACK (400)
23:35:16:WU00:FS00:Final credit estimate, 922.00 points
23:35:16:WU00:FS00:Cleaning up
23:35:22:WU01:FS00:Download 96.32%
23:35:22:WU01:FS00:Download complete
23:35:22:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8004 run:175 clone:16 gen:86 core:0xa4 unit:0x0000007d6652edcb4ee9037454d514d4
23:35:22:WU01:FS00:Starting
23:35:22:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 956 -checkpoint 30 -np 12
23:35:22:WU01:FS00:Started FahCore on PID 1630
23:35:22:Started thread 9 on PID 956
23:35:22:WU01:FS00:Core PID:1634
23:35:22:WU01:FS00:FahCore 0xa4 started
23:35:22:WU01:FS00:0xa4:
23:35:22:WU01:FS00:0xa4:*------------------------------*
23:35:22:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
23:35:22:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
23:35:22:WU01:FS00:0xa4:
23:35:22:WU01:FS00:0xa4:Preparing to commence simulation
23:35:22:WU01:FS00:0xa4:- Looking at optimizations...
23:35:22:WU01:FS00:0xa4:- Created dyn
23:35:22:WU01:FS00:0xa4:- Files status OK
23:35:22:WU01:FS00:0xa4:- Expanded 543810 -> 1303296 (decompressed 239.6 percent)
23:35:22:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=543810 data_size=1303296, decompressed_data_size=1303296 diff=0
23:35:22:WU01:FS00:0xa4:- Digital signature verified
23:35:22:WU01:FS00:0xa4:
23:35:22:WU01:FS00:0xa4:Project: 8004 (Run 175, Clone 16, Gen 86)
23:35:22:WU01:FS00:0xa4:
23:35:22:WU01:FS00:0xa4:Assembly optimizations on if available.
23:35:22:WU01:FS00:0xa4:Entering M.D.
23:35:28:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
23:35:47:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Downtime between WUs?

Post by 7im »

What is the value of this option in your log?

<next-unit-percentage v=' ? '/>

Set it one percentage point lower so the next WU downloads just a bit sooner.

Please also note that 1-2 minutes is a vast improvement over the previous v6 client. As you can see in your log, the current WU and the next WU are uploading/downloading concurrently. The v6 client uploaded first, then downloaded next, causing much longer pauses between work units. ;)
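
For reference, on a v7 client that option lives in config.xml (the client also prints its configuration near the top of the log, which is where you can read the current value). A minimal sketch of the line -- the 99 is just an example value, use whatever percentage you settle on:

Code: Select all

<!-- download the next WU when the current one reaches this percentage -->
<next-unit-percentage v='99'/>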
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

Hehe, I guess I am getting spoiled already ;) I just itch to optimize stuff...will set the variable you pointed out and see what happens :)
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: Downtime between WUs?

Post by Jesse_V »

Usually next-unit-percentage is set to 99 (the default), so that when a WU gets to 99%, the client downloads the next WU. Then when the current WU finishes, the next one is ready to go. Like 7im said, you can adjust this value to download the next WU sooner or later. But the clock starts when the WU downloads, so it's your call. :D
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

Jesse_V wrote:But the clock starts when the WU downloads
Can you elaborate on this a bit? Could there be problems?
Zagen30
Posts: 823
Joined: Tue Mar 25, 2008 12:45 am
Hardware configuration: Core i7 3770K @3.5 GHz (not folding), 8 GB DDR3 @2133 MHz, 2xGTX 780 @1215 MHz, Windows 7 Pro 64-bit running 7.3.6 w/ 1xSMP, 2xGPU

4P E5-4650 @3.1 GHz, 64 GB DDR3 @1333MHz, Ubuntu Desktop 13.10 64-bit

Re: Downtime between WUs?

Post by Zagen30 »

csvanefalk wrote:
Jesse_V wrote:But the clock starts when the WU downloads
Can you elaborate on this a bit? Could there be problems?
I assume he was referring to how it affects bonus points. The return time is calculated from when the WU is first downloaded, not when the client starts processing it, so by having WUs downloaded at 99% you'll lose some bonus points as the previous WU is being finished. It's not that much, though, so I wouldn't worry about it.
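
To put a rough number on it, here is a quick sketch (my own illustration only: the base credit, k and deadline values below are made up, and I am assuming the published bonus formula, credit = base * max(1, sqrt(k * deadline / elapsed)), with elapsed measured from download to upload):

Code: Select all

import math

def credited_points(base, k, deadline_days, elapsed_days):
    # Bonus factor from the published quick-return bonus formula (as I understand it).
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base * bonus

# Made-up numbers: a WU that folds in 4 hours, 3-day deadline, base credit 500, k = 2.
fold_time = 4 / 24                      # days spent actually folding
one_frame_early = 2.5 / (60 * 24)       # ~2.5 extra minutes on the clock

print(credited_points(500, 2, 3, fold_time))                    # downloaded at 100%: ~3000
print(credited_points(500, 2, 3, fold_time + one_frame_early))  # one frame early: ~2985

A couple of minutes of extra clock time costs well under 1% of the credit in this example, which is why I wouldn't worry about it.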
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: Downtime between WUs?

Post by iceman1992 »

csvanefalk wrote:
Jesse_V wrote:But the clock starts when the WU downloads
Can you elaborate on this a bit? Could there be problems?
Just that if you set the client to download a new unit at 91%, for example, and it finishes downloading at, say, 94%, it will sit idle at least until the current one finishes. That drops your QRB :wink:
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

Ah ok, thanks! :D
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

EDIT: this has occurred for all WUs since I changed the setting, so I am guessing I am doing something wrong :-/

Alright, so I set it to 98%...and now I am having trouble with the client being assigned the same WU it is already working on?? Can I fix this?

Code: Select all

17:08:51:WU00:FS00:0xa4:Completed 242500 out of 250000 steps  (97%)
17:09:10:WU00:FS00:0xa4:Completed 245000 out of 250000 steps  (98%)
17:09:10:WU01:FS00:Connecting to assign3.stanford.edu:8080
17:09:11:WU01:FS00:News: Welcome to Folding@Home
17:09:11:WU01:FS00:Assigned to work server 171.67.108.59
17:09:11:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:12 from 171.67.108.59
17:09:11:WU01:FS00:Connecting to 171.67.108.59:8080
17:09:12:ERROR:WU01:FS00:Exception: Have already seen this work unit 0x0000004d6652edcb4ee90039505df21a aborting download
17:09:12:WU01:FS00:Connecting to assign3.stanford.edu:8080
17:09:12:WU01:FS00:News: Welcome to Folding@Home
17:09:12:WU01:FS00:Assigned to work server 171.67.108.59
17:09:12:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:12 from 171.67.108.59
17:09:12:WU01:FS00:Connecting to 171.67.108.59:8080
17:09:13:ERROR:WU01:FS00:Exception: Have already seen this work unit 0x0000004d6652edcb4ee90039505df21a aborting download
17:09:28:WU00:FS00:0xa4:Completed 247500 out of 250000 steps  (99%)
17:09:47:WU00:FS00:0xa4:Completed 250000 out of 250000 steps  (100%)
17:09:47:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
17:09:57:WU00:FS00:0xa4:
17:09:57:WU00:FS00:0xa4:Finished Work Unit:
17:09:57:WU00:FS00:0xa4:- Reading up to 769884 from "00/wudata_01.trr": Read 769884
17:09:57:WU00:FS00:0xa4:trr file hash check passed.
17:09:57:WU00:FS00:0xa4:- Reading up to 456868 from "00/wudata_01.xtc": Read 456868
17:09:57:WU00:FS00:0xa4:xtc file hash check passed.
17:09:57:WU00:FS00:0xa4:edr file hash check passed.
17:09:57:WU00:FS00:0xa4:logfile size: 22561
17:09:57:WU00:FS00:0xa4:Leaving Run
17:09:58:WU00:FS00:0xa4:- Writing 1254717 bytes of core data to disk...
17:09:58:WU00:FS00:0xa4:Done: 1254205 -> 1194511 (compressed to 95.2 percent)
17:09:58:WU00:FS00:0xa4:  ... Done.
17:10:12:WU01:FS00:Connecting to assign3.stanford.edu:8080
17:10:12:WU01:FS00:News: Welcome to Folding@Home
17:10:12:WU01:FS00:Assigned to work server 171.67.108.59
17:10:12:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:12 from 171.67.108.59
17:10:12:WU01:FS00:Connecting to 171.67.108.59:8080
17:10:13:ERROR:WU01:FS00:Exception: Have already seen this work unit 0x0000004d6652edcb4ee90039505df21a aborting download
17:10:42:WU00:FS00:0xa4:- Shutting down core
17:10:42:WU00:FS00:0xa4:
17:10:42:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
17:10:48:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
17:10:48:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:8004 run:69 clone:16 gen:54 core:0xa4 unit:0x0000004d6652edcb4ee90039505df21a
17:10:48:WU00:FS00:Uploading 1.14MiB to 171.67.108.59
17:10:48:WU00:FS00:Connecting to 171.67.108.59:8080
17:10:51:WU00:FS00:Upload complete
17:10:51:WU00:FS00:Server responded WORK_ACK (400)
17:10:51:WU00:FS00:Final credit estimate, 909.00 points
17:10:51:WU00:FS00:Cleaning up
17:11:49:WU01:FS00:Connecting to assign3.stanford.edu:8080
17:11:50:WU01:FS00:News: Welcome to Folding@Home
17:11:50:WU01:FS00:Assigned to work server 171.67.108.59
17:11:50:WU01:FS00:Requesting new work unit for slot 00: READY smp:12 from 171.67.108.59
17:11:50:WU01:FS00:Connecting to 171.67.108.59:8080
17:11:50:WU01:FS00:Downloading 532.48KiB
17:11:57:WU01:FS00:Download 48.08%
17:12:00:WU01:FS00:Download complete
17:12:00:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8004 run:66 clone:51 gen:18 core:0xa4 unit:0x000000146652edcb4eee56383fcdfed7
17:12:00:WU01:FS00:Starting
17:12:00:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 956 -checkpoint 30 -np 12
17:12:00:WU01:FS00:Started FahCore on PID 2140
17:12:00:Started thread 36 on PID 956
17:12:00:WU01:FS00:Core PID:2144
17:12:00:WU01:FS00:FahCore 0xa4 started
17:12:00:WU01:FS00:0xa4:
17:12:00:WU01:FS00:0xa4:*------------------------------*
17:12:00:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
17:12:00:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
17:12:00:WU01:FS00:0xa4:
17:12:00:WU01:FS00:0xa4:Preparing to commence simulation
17:12:00:WU01:FS00:0xa4:- Looking at optimizations...
17:12:00:WU01:FS00:0xa4:- Created dyn
17:12:00:WU01:FS00:0xa4:- Files status OK
17:12:00:WU01:FS00:0xa4:- Expanded 544749 -> 1305312 (decompressed 239.6 percent)
17:12:00:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=544749 data_size=1305312, decompressed_data_size=1305312 diff=0
17:12:00:WU01:FS00:0xa4:- Digital signature verified
17:12:00:WU01:FS00:0xa4:
17:12:00:WU01:FS00:0xa4:Project: 8004 (Run 66, Clone 51, Gen 18)
17:12:00:WU01:FS00:0xa4:
17:12:00:WU01:FS00:0xa4:Assembly optimizations on if available.
17:12:00:WU01:FS00:0xa4:Entering M.D.
17:12:06:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
17:12:25:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
17:12:45:WU01:FS00:0xa4:Completed 5000 out of 250000 steps  (2%)
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: Downtime between WUs?

Post by Jesse_V »

Apologies, yes, I meant bonus points...
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
GreyWhiskers
Posts: 660
Joined: Mon Oct 25, 2010 5:57 am
Hardware configuration: a) Main unit
Sandybridge in HAF922 w/200 mm side fan
--i7 2600K@4.2 GHz
--ASUS P8P67 DeluxeB3
--4GB ADATA 1600 RAM
--750W Corsair PS
--2Seagate Hyb 750&500 GB--WD Caviar Black 1TB
--EVGA 660GTX-Ti FTW - Signature 2 GPU@ 1241 Boost
--MSI GTX560Ti @900MHz
--Win7Home64; FAH V7.3.2; 327.23 drivers

b) 2004 HP a475c desktop, 1 core Pent 4 HT@3.2 GHz; Mem 2GB;HDD 160 GB;Zotac GT430PCI@900 MHz
WinXP SP3-32 FAH v7.3.6 301.42 drivers - GPU slot only

c) 2005 Toshiba M45-S551 laptop w/2 GB mem, 160GB HDD;Pent M 740 CPU @ 1.73 GHz
WinXP SP3-32 FAH v7.3.6 [Receiving Core A4 work units]
d) 2011 lappy-15.6"-1920x1080;i7-2860QM,2.5;IC Diamond Thermal Compound;GTX 560M 1,536MB u/c@700;16GB-1333MHz RAM;HDD:500GBHyb w/ 4GB SSD;Win7HomePrem64;320.18 drivers FAH 7.4.2ß
Location: Saratoga, California USA

Re: Downtime between WUs?

Post by GreyWhiskers »

Here's a post from a while ago.

It helped me understand what happens between the "next-unit-percentage" trigger and the point at which a new WU can actually start folding, once the previous WU has completed. It's important to know what kind of WUs you are folding (e.g., are you doing bigadv WUs with 25 MB or 50 MB downloads, or smaller WUs?) and what your internet connection is like. FWIW, I am doing a mix of regular SMP WUs and Core 15 Fermi GPU WUs, none of which have very big downloads. BTW, it doesn't matter how big the uploads are - those happen in parallel with processing the next WU. I have a Comcast cable internet connection in Silicon Valley, which gives me maybe a couple of Mbps upload and 15 Mbps download, so the downloads finish very, very quickly.

Finally, with the latest SMP WUs being very small - on my i7 2600K, less than an hour per WU, and with short deadlines of 1.6 days - downloading too soon will be noticeable in the QRB points.

GreyWhiskers wrote:
13:38:43:WU01:FS02:0xa4:Completed 250000 out of 250000 steps (100%)
13:38:43:WU01:FS02:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
13:38:44:WU03:FS02:Connecting to assign3.stanford.edu:8080
13:38:44:WU03:FS02:News: Welcome to Folding@Home
13:38:44:WU03:FS02:Assigned to work server 171.67.108.58
13:38:44:WU03:FS02:Requesting new work unit for slot 02: RUNNING smp:6 from 171.67.108.58
13:38:44:WU03:FS02:Connecting to 171.67.108.58:8080
13:38:45:WU03:FS02:Downloading 531.87KiB
13:38:46:WU03:FS02:Download complete
13:38:46:WU03:FS02:Received Unit: id:03 state:DOWNLOAD error:OK project:8001 run:21 clone:53 gen:10 core:0xa4 unit:0x000000126652edca4eded91fac1d64a7
13:38:53:WU01:FS02:0xa4:
13:38:53:WU01:FS02:0xa4:Finished Work Unit:
Especially for SMP, where the Quick Return Bonus factors into the points calculation, I have chosen to set the "next unit percentage" to 100%. Here are some of the reasons:
- The QRB clock starts when your WU downloads and stops when the upload of your finished results completes.
- There are at least 10 seconds, and usually much longer, between the 100% mark and "Finished Work Unit". See the "sleep" line. After the sleep, this time is used to package the various log files and completed data files created during execution of your WU into one compressed file for upload to the server, and to clean up the WU.
- Your new WU will start only after the "Finished Work Unit" time.
- If you have a decently fast internet connection, you can use this time between 100% and "Finished Work Unit" to download your next WU.
- I don't remember an instance where the actual download time exceeded this window. I folded bigadv SMP WUs with 25 MByte downloads for quite some time - but that was on v6, where I couldn't take advantage of this overlap.
- Even if your download did take a few seconds longer than the 100%-to-"Finished Work Unit" time, those few seconds are undoubtedly MUCH shorter than the TPF (frame time). If your TPF for a big WU is, say, 8 minutes, you download at 99%, and the download takes, say, 25 seconds, your QRB clock started at least 8 minutes earlier than it needed to - you can't start folding the new WU until you reach the "Finished Work Unit" time anyway.

In the first example you included, the SMP download was only 48.90 KiB, which was downloaded in about a second. Your log example didn't show what your TPF for that WU was, but if it was several minutes, you were holding onto the downloaded WU for all that time, PLUS the 100% to "Finished Work Unit" time, while the QRB clock was ticking.

In your second example, which DID start at the download at 100%, you downloaded a much larger WU, 531.87 KiB. The download took about 2 seconds: from 13:38:44 (connecting to assign) to 13:38:46 (Download complete). Even though you had only 10 seconds between 100% and "Finished Work Unit" (the previous WU must have been very small), you still had 7 extra seconds before folding could start. Your TPF for the finished work unit was 64 seconds. If you HAD downloaded at 99%, you would have been sitting on the next WU for 64 + 10 - 2 = 72 seconds while the QRB clock ticked, [EDIT] vs the 7 seconds you did wait[/EDIT]. I've seen some short WUs where that 65 seconds could have made a difference in your points.

Anyway, that's my long-winded 2 cents.
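
If it helps, here is the same arithmetic as a quick back-of-the-envelope sketch (my own illustration; the 64 s TPF, ~2 s download and ~10 s wrap-up figures are the ones from the example above):

Code: Select all

def idle_seconds(tpf_s, download_s, next_unit_pct, wrapup_s=10):
    # Seconds a pre-downloaded WU waits between "Download complete" and the point
    # where it can actually start folding (old WU finishing + packaging time).
    frames_left = 100 - next_unit_pct
    return max(0, frames_left * tpf_s + wrapup_s - download_s)

# Numbers from the second example above: 64 s TPF, ~2 s download, ~10 s wrap-up.
print(idle_seconds(64, 2, 99))    # ~72 s of waiting while the QRB clock ticks
print(idle_seconds(64, 2, 100))   # ~8 s (the log above shows about 7)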
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Downtime between WUs?

Post by bruce »

csvanefalk wrote:EDIT: this has occurred for all WUs since I changed the setting, so I am guessing I am doing something wrong :-/

Alright, so I set it to 98%...and now I am having trouble with the client being assigned the same WU it is already working on?? Can I fix this?
It takes you 19 seconds to complete a frame for this project. You can change next-unit-percentage from 98% to 100% and it will improve your QRB by 38 seconds -- probably not something you'll even notice. The download actually took 11 seconds (from 17:11:49 to 17:12:00), so it normally won't have any trouble downloading between when the WU reaches 100% and when the FahCore finishes organizing the data for upload (61 seconds, from 17:09:47 to 17:10:48).

The data uploaded in 3 seconds (from 17:10:48 to 17:10:51).

I don't see a lot of room for improvement. I suppose if you replaced your HD with an SSD you could save a reasonable percentage of the 61 seconds.

Some servers are programmed to reissue the same WU if the same client ID requests new work before it has finished the WU it is already processing for that slot. It's much better for the client to reject it with the "already seen" message than to accept it and process it a second time for zero credit.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

Thanks for all the replies so far! You guys have really helped me understand the inter-WU processing period a lot better.

I noticed one issue I am wondering about, though. Below you can find my logfile as I finish folding a 7905 WU...notice the whopping 6-minute gap between "... Done" and "Shutting down core". I was actively monitoring CPU and memory usage (with top) during this period, and the system was more or less completely idle. There were occasional (5-6) spikes of activity that involved the client, but these were extremely brief and used no more than a few percentage points of the total CPU capacity (9%).

Can someone help me understand what the Client is doing during this interval? What is it waiting for?

Code: Select all

08:30:30:WU00:FS00:0xa4:Completed 495000 out of 500000 steps  (99%)
08:30:31:WU01:FS00:Connecting to assign3.stanford.edu:8080
08:30:32:WU01:FS00:News: Welcome to Folding@Home
08:30:32:WU01:FS00:Assigned to work server 128.113.12.163
08:30:32:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:12 from 128.113.12.163
08:30:32:WU01:FS00:Connecting to 128.113.12.163:8080
08:30:33:WU01:FS00:Downloading 1.85MiB
08:30:39:WU01:FS00:Download 37.24%
08:30:45:WU01:FS00:Download 67.70%
08:30:48:WU01:FS00:Download complete
08:30:48:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:7905 run:89 clone:3 gen:7 core:0xa4 unit:0x0000000b00ac9c234e4d8516efac4ed5
08:32:09:WU00:FS00:0xa4:Completed 500000 out of 500000 steps  (100%)
08:32:09:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
08:32:19:WU00:FS00:0xa4:
08:32:19:WU00:FS00:0xa4:Finished Work Unit:
08:32:20:WU00:FS00:0xa4:- Reading up to 24721224 from "00/wudata_01.trr": Read 24721224
08:32:20:WU00:FS00:0xa4:trr file hash check passed.
08:32:20:WU00:FS00:0xa4:edr file hash check passed.
08:32:20:WU00:FS00:0xa4:logfile size: 25640
08:32:20:WU00:FS00:0xa4:Leaving Run
08:32:21:WU00:FS00:0xa4:- Writing 24754220 bytes of core data to disk...
08:32:24:WU00:FS00:0xa4:Done: 24753708 -> 19664894 (compressed to 79.4 percent)
08:32:24:WU00:FS00:0xa4:  ... Done.
08:39:06:WU00:FS00:0xa4:- Shutting down core
08:39:06:WU00:FS00:0xa4:
08:39:06:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
08:39:29:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:39:29:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7905 run:110 clone:0 gen:12 core:0xa4 unit:0x0000000d00ac9c234e4d8589ed06c89f
08:39:29:WU00:FS00:Uploading 18.75MiB to 128.113.12.163
08:39:29:WU00:FS00:Connecting to 128.113.12.163:8080
08:39:29:WU01:FS00:Starting
08:39:29:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 1129 -checkpoint 30 -np 12
08:39:29:WU01:FS00:Started FahCore on PID 11400
08:39:29:Started thread 16 on PID 1129
08:39:29:WU01:FS00:Core PID:11404
08:39:29:WU01:FS00:FahCore 0xa4 started
08:39:30:WU01:FS00:0xa4:
08:39:30:WU01:FS00:0xa4:*------------------------------*
08:39:30:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
08:39:30:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
08:39:30:WU01:FS00:0xa4:
08:39:30:WU01:FS00:0xa4:Preparing to commence simulation
08:39:30:WU01:FS00:0xa4:- Looking at optimizations...
08:39:30:WU01:FS00:0xa4:- Created dyn
08:39:30:WU01:FS00:0xa4:- Files status OK
08:39:30:WU01:FS00:0xa4:- Expanded 1935474 -> 2552108 (decompressed 131.8 percent)
08:39:30:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1935474 data_size=2552108, decompressed_data_size=2552108 diff=0
08:39:30:WU01:FS00:0xa4:- Digital signature verified
08:39:30:WU01:FS00:0xa4:
08:39:30:WU01:FS00:0xa4:Project: 7905 (Run 89, Clone 3, Gen 7)
08:39:30:WU01:FS00:0xa4:
08:39:30:WU01:FS00:0xa4:Assembly optimizations on if available.
08:39:30:WU01:FS00:0xa4:Entering M.D.
08:39:35:WU00:FS00:Upload 29.66%
08:39:36:WU01:FS00:0xa4:Completed 0 out of 500000 steps  (0%)
08:39:41:WU00:FS00:Upload 50.99%
08:39:47:WU00:FS00:Upload 67.32%
08:39:53:WU00:FS00:Upload 95.31%
08:39:57:WU00:FS00:Upload complete
08:39:57:WU00:FS00:Server responded WORK_ACK (400)
08:39:57:WU00:FS00:Final credit estimate, 4633.00 points
08:39:58:WU00:FS00:Cleaning up
Nathan_P
Posts: 1165
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: Downtime between WUs?

Post by Nathan_P »

There is a known problem if you are using the ext4 file system on a Linux machine. For reasons that I don't know, the ext4 file system takes a very long time to write out the WU files to a standard HDD at the end of processing. This does not happen on an SSD with ext4, or with the ext3 file system.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

Nathan_P wrote:There is a known problem if you are using the ext4 file system on a Linux machine. For reasons that I don't know, the ext4 file system takes a very long time to write out the WU files to a standard HDD at the end of processing. This does not happen on an SSD with ext4, or with the ext3 file system.
That would fit my situation exactly - I am running ext4 on a standard HDD. I am going to google this a bit and see if I can find a solution.

EDIT: it seems running the client on a RAM disk should resolve this...I am going to try that setup and see what happens.
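
In case anyone else wants to try the same experiment, here is a rough sketch of what I have in mind (just an illustration, not a tested recipe: the /var/lib/fahclient path is the data directory shown in my logs, the 512m size is a guess, and anything on tmpfs disappears at reboot, so checkpoints and any unfinished WU would be lost with it):

Code: Select all

# 1. Stop the FAHClient service first (how depends on your distribution/install).
# 2. Keep a copy of the existing data directory:
sudo cp -a /var/lib/fahclient /var/lib/fahclient.bak
# 3. Mount a RAM-backed tmpfs over the data directory and copy the data back in:
sudo mount -t tmpfs -o size=512m tmpfs /var/lib/fahclient
sudo cp -a /var/lib/fahclient.bak/. /var/lib/fahclient/
# 4. If the client runs as its own user (often 'fahclient'), make sure that user
#    still owns the mount point, then restart the client and watch the end-of-WU
#    write times in the log.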