Downtime between WUs?

7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760W Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Downtime between WUs?

Post by 7im »

Was a RAM disk a solution you found with Google?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

7im wrote: Was a RAM disk a solution you found with Google?
In passing mostly. It was reasonable to try it first since it was a very simple solution. Will post here later on how it went.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

EDIT: Not wanting to sound extreme, but is there any way to get rid of the 10 second sleep period that follows a finished folding?

Using a RAM disk indeed solves the issue and leads to streamlined performance. I would guess it also shortened the overall transition period, due to reduced I/O latency.

15:21:24:WU00:FS00:0xa4:Completed 490000 out of 500000 steps  (98%)
15:23:07:WU00:FS00:0xa4:Completed 495000 out of 500000 steps  (99%)
15:23:08:WU01:FS00:Connecting to assign3.stanford.edu:8080
15:23:10:WU01:FS00:News: Welcome to Folding@Home
15:23:10:WU01:FS00:Assigned to work server 171.67.108.58
15:23:10:WU01:FS00:Requesting new work unit for slot 00: RUNNING smp:12 from 171.67.108.58
15:23:10:WU01:FS00:Connecting to 171.67.108.58:8080
15:23:11:WU01:FS00:Downloading 478.52KiB
15:23:13:WU01:FS00:Download complete
15:23:13:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:OK project:8047 run:169 clone:12 gen:11 core:0xa4 unit:0x0000000f6652edca4fda1bedd712161d
15:23:13:WU01:FS00:Downloading project 8047 description
15:23:13:WU01:FS00:Connecting to fah-web.stanford.edu:80
15:23:14:WU01:FS00:Project 8047 description downloaded successfully
15:24:50:WU00:FS00:0xa4:Completed 500000 out of 500000 steps  (100%)
15:24:51:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
15:25:01:WU00:FS00:0xa4:
15:25:01:WU00:FS00:0xa4:Finished Work Unit:
15:25:01:WU00:FS00:0xa4:- Reading up to 24721224 from "00/wudata_01.trr": Read 24721224
15:25:01:WU00:FS00:0xa4:trr file hash check passed.
15:25:01:WU00:FS00:0xa4:edr file hash check passed.
15:25:01:WU00:FS00:0xa4:logfile size: 25645
15:25:01:WU00:FS00:0xa4:Leaving Run
15:25:03:WU00:FS00:0xa4:- Writing 24754225 bytes of core data to disk...
15:25:06:WU00:FS00:0xa4:Done: 24753713 -> 19667030 (compressed to 79.4 percent)
15:25:06:WU00:FS00:0xa4:  ... Done.
15:25:06:WU00:FS00:0xa4:- Shutting down core
15:25:06:WU00:FS00:0xa4:
15:25:06:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
15:25:06:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
15:25:06:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7905 run:74 clone:0 gen:15 core:0xa4 unit:0x0000001100ac9c234e4d84c0ed2a54c3
15:25:06:WU00:FS00:Uploading 18.76MiB to 128.113.12.163
15:25:06:WU00:FS00:Connecting to 128.113.12.163:8080
15:25:06:WU01:FS00:Starting
15:25:06:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/ramdisk/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 2405 -checkpoint 30 -np 12
15:25:06:WU01:FS00:Started FahCore on PID 7801
15:25:06:Started thread 9 on PID 2405
15:25:06:WU01:FS00:Core PID:7805
15:25:06:WU01:FS00:FahCore 0xa4 started
15:25:07:WU01:FS00:0xa4:
15:25:07:WU01:FS00:0xa4:*------------------------------*
15:25:07:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
15:25:07:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
15:25:07:WU01:FS00:0xa4:
15:25:07:WU01:FS00:0xa4:Preparing to commence simulation
15:25:07:WU01:FS00:0xa4:- Looking at optimizations...
15:25:07:WU01:FS00:0xa4:- Created dyn
15:25:07:WU01:FS00:0xa4:- Files status OK
15:25:07:WU01:FS00:0xa4:- Expanded 489490 -> 1142300 (decompressed 233.3 percent)
15:25:07:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=489490 data_size=1142300, decompressed_data_size=1142300 diff=0
15:25:07:WU01:FS00:0xa4:- Digital signature verified
15:25:07:WU01:FS00:0xa4:
15:25:07:WU01:FS00:0xa4:Project: 8047 (Run 169, Clone 12, Gen 11)
15:25:07:WU01:FS00:0xa4:
15:25:07:WU01:FS00:0xa4:Assembly optimizations on if available.
15:25:07:WU01:FS00:0xa4:Entering M.D.
15:25:12:WU00:FS00:Upload 28.99%
15:25:13:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
15:25:18:WU00:FS00:Upload 72.97%
15:25:25:WU00:FS00:Upload complete
15:25:25:WU00:FS00:Server responded WORK_ACK (400)
15:25:25:WU00:FS00:Final credit estimate, 4611.00 points
15:25:25:WU00:FS00:Cleaning up
15:25:27:WU01:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
15:25:42:WU01:FS00:0xa4:Completed 5000 out of 250000 steps  (2%)
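
The transition in the log above can be measured directly from its timestamps. The following is a minimal sketch, not part of the FAH client: it copies six lines from the log and assumes every line starts with an HH:MM:SS stamp from the same day. The helper names `parse` and `first_time` are made up for illustration.

```python
import re
from datetime import datetime

# A few lines copied from the log above (timestamps are HH:MM:SS, same day).
LOG = """\
15:24:50:WU00:FS00:0xa4:Completed 500000 out of 500000 steps  (100%)
15:24:51:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
15:25:01:WU00:FS00:0xa4:Finished Work Unit:
15:25:06:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
15:25:06:WU01:FS00:Starting
15:25:13:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
"""

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")


def parse(log):
    """Return (time, message) pairs for every line that starts with HH:MM:SS."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


def first_time(entries, needle):
    """Timestamp of the first entry whose message contains `needle`."""
    for stamp, message in entries:
        if needle in message:
            return stamp
    raise ValueError(f"no log line contains {needle!r}")


entries = parse(LOG)
last_step = first_time(entries, "Completed 500000 out of 500000")
sleep_start = first_time(entries, "sleep=10000")
sleep_end = first_time(entries, "WU00:FS00:0xa4:Finished Work Unit:")
next_first_step = first_time(entries, "Completed 0 out of 250000")

sleep_seconds = (sleep_end - sleep_start).total_seconds()
gap_seconds = (next_first_step - last_step).total_seconds()
print(f"wrapper sleep: {sleep_seconds:.0f} s")
print(f"last step of WU00 to first step of WU01: {gap_seconds:.0f} s")
```

On these lines the sleep accounts for 10 s of a 23 s gap between the last step of WU00 and the first step of WU01. The log only has one-second resolution, so both figures are approximate.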
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm

Re: Downtime between WUs?

Post by 7im »

You'll fold faster setting the checkpoint to 30 minutes than worrying about those 10 seconds you can't change.
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

If I instead set the checkpoint interval to 0 in the config file (FAHControl only lets me go down to 3 minutes), will this turn it off altogether?
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: Downtime between WUs?

Post by Jesse_V »

7im wrote: You'll fold faster setting the checkpoint to 30 minutes than worrying about those 10 seconds you can't change.
Makes sense, but of course the downside is that you may have to repeat 29 minutes of work if you resume from a checkpoint.

I have also noticed the 10 second delay. Simply out of curiosity, why is it there? What purpose does it serve?
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm

Re: Downtime between WUs?

Post by 7im »

Jesse_V wrote:
7im wrote: You'll fold faster setting the checkpoint to 30 minutes than worrying about those 10 seconds you can't change.
Makes sense, but of course the downside is that you may have to repeat 29 minutes of work if you resume from a checkpoint.

I have also noticed the 10 second delay. Simply out of curiosity, why is it there? What purpose does it serve?

The downside is obvious. But anyone worrying about 10 seconds between work units obviously will NEVER turn the client or computer off! Otherwise, even a single restart would wipe out many weeks' worth of those saved 10 seconds, and chasing them would be like pursuing an undomesticated water fowl.

And again, the 10 seconds won't be changed, so csvanefalk will find it much more productive to find ways of never turning off the client or computer. And even if the 10 seconds could be changed, csvanefalk is still better off finding ways of never turning off the client or computer, like investing in a UPS. ;)
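
The "many weeks" figure can be checked with rough arithmetic. This is only a sketch under stated assumptions: the 10 s delay and the 29 minutes of repeated work come from the thread, while the pace of about 103 s per 1% is read off WU00 in the log above and will differ for other projects. It ignores the cost of writing checkpoints and any points bonus. The function names are made up for illustration.

```python
# Figures taken from the thread.
DELAY_PER_WU_S = 10            # fixed sleep after each finished work unit
WORST_CASE_REWORK_S = 29 * 60  # work repeated after one badly timed restart

# Assumption: WU00's pace in the log above, about 103 s per 1% of progress
# (15:21:24 -> 15:23:07 -> 15:24:50). Other projects run at other rates.
SECONDS_PER_PERCENT = 103


def break_even_work_units(rework_s, delay_s):
    """Number of work units whose accumulated delay equals one restart's rework."""
    return rework_s / delay_s


def folding_days(work_units, seconds_per_wu):
    """Wall-clock days needed to fold `work_units` back to back."""
    return work_units * seconds_per_wu / 86_400


units = break_even_work_units(WORST_CASE_REWORK_S, DELAY_PER_WU_S)
days = folding_days(units, 100 * SECONDS_PER_PERCENT)
print(f"{units:.0f} work units, about {days:.1f} days of folding")
```

Under these assumptions, one worst-case restart costs as much as the delay on 174 work units, which is roughly three weeks of continuous folding at WU00's pace.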
Last edited by 7im on Sun Jun 24, 2012 4:40 pm, edited 2 times in total.
Joe_H
Site Admin
Posts: 8002
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: Downtime between WUs?

Post by Joe_H »

Jesse_V wrote: I have also noticed the 10 second delay. Simply out of curiosity, why is it there? What purpose does it serve?
From prior version log messages, my assumption is that the delay is to allow all threads to complete and exit. It was more visible with older SMP cores, which used MPI for communication between threads that ran as separate processes.
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm

Re: Downtime between WUs?

Post by 7im »

Not all FahCore types and/or their threads shut down at the same rate. It's a shutdown/startup buffer between WUs, as Joe_H describes. There should be a ticket on this, if anyone is curious.
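
The idea of a shutdown buffer can be pictured with a small sketch. This is not FAH code: the log only shows the wrapper reporting `sleep=10000`, which looks like a fixed 10-second sleep. The sketch swaps in a different technique, a join with one shared deadline, to show why workers that wind down at different rates need some grace period before the next step. `wait_for_exit` and `simulated_worker` are hypothetical names, and the timings are invented.

```python
import threading
import time


def wait_for_exit(workers, grace_seconds):
    """Join every worker within one shared deadline; return those still alive."""
    deadline = time.monotonic() + grace_seconds
    for worker in workers:
        remaining = max(0.0, deadline - time.monotonic())
        worker.join(timeout=remaining)
    return [worker for worker in workers if worker.is_alive()]


def simulated_worker(shutdown_seconds):
    """Stand-in for a compute thread that needs some time to wind down."""
    time.sleep(shutdown_seconds)


# Workers that shut down at different rates (invented timings).
workers = [
    threading.Thread(target=simulated_worker, args=(seconds,), daemon=True)
    for seconds in (0.01, 0.05, 0.10)
]
for worker in workers:
    worker.start()

stragglers = wait_for_exit(workers, grace_seconds=2.0)
if stragglers:
    print(f"{len(stragglers)} worker(s) still running after the grace period")
else:
    print("all workers exited within the grace period")
```

A fixed sleep is simpler and also covers separate processes that the parent cannot join directly, at the price of always waiting the full period.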
csvanefalk
Posts: 147
Joined: Mon May 21, 2012 10:28 am

Re: Downtime between WUs?

Post by csvanefalk »

A man's gotta optimize what a man's gotta optimize.
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Location: Western Washington

Re: Downtime between WUs?

Post by Jesse_V »

csvanefalk wrote: A man's gotta optimize what a man's gotta optimize.
If you're willing to make the effort, by all means please do, especially for F@h. :D
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Downtime between WUs?

Post by bruce »

The programmer put that 10 second pause in there for a reason. I don't know what it was, but my best guess is that somebody discovered that without it, there was a reasonably high probability that something could happen that would corrupt the WU as it was being written. Would you rather have an occasional WU be corrupted after reaching 100% and be discarded, or would you rather wait 10 seconds after WUs which are not corrupt?