Same sort of issue here. Here's the log from the point the work unit finished until I manually shut the client down and restarted it. Note that this machine has only been running for two weeks and has never encountered an a3 before.
[23:53:21] Completed 250000 out of 250000 steps (100%)
Writing final coordinates.
Average load imbalance: 9.2 %
Part of the total run time spent waiting due to load imbalance: 3.9 %
Parallel run - timing based on wallclock.
NODE (s) Real (s) (%)
Time: 153720.126 153720.126 100.0
1d18h42:00
(Mnbf/s) (GFlops) (ns/day) (hour/ns)
Performance: 1411.795 74.260 0.559 42.951
Thanx for Using GROMACS - Have a Nice Day
[23:53:42] DynamicWrapper: Finished Work Unit: sleep=10000
[23:53:52]
[23:53:52] Finished Work Unit:
[23:53:52] - Reading up to 121622496 from "work/wudata_05.trr": Read 121622496
[23:53:53] trr file hash check passed.
[23:53:53] - Reading up to 108720676 from "work/wudata_05.xtc": Read 108720676
[23:53:54] xtc file hash check passed.
[23:53:54] edr file hash check passed.
[23:53:54] logfile size: 208162
[23:53:54] Leaving Run
[23:53:57] - Writing 230724326 bytes of core data to disk...
[23:55:01] Done: 230723814 -> 222366262 (compressed to 3.3 percent)
[23:55:02] ... Done.
[23:55:22] - Shutting down core
[23:55:22]
[23:55:22] Folding@home Core Shutdown: FINISHED_UNIT
[23:55:24] CoreStatus = 64 (100)
[23:55:24] Unit 5 finished with 85 percent of time to deadline remaining.
[23:55:24] Updated performance fraction: 0.779874
[23:55:24] Sending work to server
[23:55:24] Project: 6903 (Run 10, Clone 12, Gen 72)
[23:55:24] + Attempting to send results [January 16 23:55:24 UTC]
[23:55:24] - Reading file work/wuresults_05.dat from core
[23:55:24] (Read 222366774 bytes from disk)
[23:55:24] Connecting to http://130.237.232.237:8080/
[00:26:31] - Couldn't send HTTP request to server
[00:26:31] + Could not connect to Work Server (results)
[00:26:31] (130.237.232.237:8080)
[00:26:31] + Retrying using alternative port
[00:26:31] Connecting to http://130.237.232.237:80/
[00:26:31] - Couldn't send HTTP request to server
[00:26:31] + Could not connect to Work Server (results)
[00:26:31] (130.237.232.237:80)
[00:26:31] - Error: Could not transmit unit 05 (completed January 16) to work server.
[00:26:31] - 1 failed uploads of this unit.
[00:26:31] Keeping unit 05 in queue.
[00:26:31] Trying to send all finished work units
[00:26:31] Project: 6903 (Run 10, Clone 12, Gen 72)
[00:26:31] + Attempting to send results [January 17 00:26:31 UTC]
[00:26:31] - Reading file work/wuresults_05.dat from core
[00:26:31] (Read 222366774 bytes from disk)
[00:26:31] Connecting to http://130.237.232.237:8080/
[00:26:32] - Couldn't send HTTP request to server
[00:26:32] + Could not connect to Work Server (results)
[00:26:32] (130.237.232.237:8080)
[00:26:32] + Retrying using alternative port
[00:26:32] Connecting to http://130.237.232.237:80/
[00:26:32] - Couldn't send HTTP request to server
[00:26:32] + Could not connect to Work Server (results)
[00:26:32] (130.237.232.237:80)
[00:26:32] - Error: Could not transmit unit 05 (completed January 16) to work server.
[00:26:32] - 2 failed uploads of this unit.
[00:26:32] + Attempting to send results [January 17 00:26:32 UTC]
[00:26:32] - Reading file work/wuresults_05.dat from core
[00:26:32] (Read 222366774 bytes from disk)
[00:26:32] Connecting to http://130.237.165.141:8080/
[00:26:32] - Couldn't send HTTP request to server
[00:26:32] + Could not connect to Work Server (results)
[00:26:32] (130.237.165.141:8080)
[00:26:32] + Retrying using alternative port
[00:26:32] Connecting to http://130.237.165.141:80/
[00:42:21] - Couldn't send HTTP request to server
[00:42:21] + Could not connect to Work Server (results)
[00:42:21] (130.237.165.141:80)
[00:42:21] Could not transmit unit 05 to Collection server; keeping in queue.
[00:42:21] + Sent 0 of 1 completed units to the server
[00:42:21] - Preparing to get new work unit...
[00:42:21] Cleaning up work directory
[00:42:21] + Attempting to get work packet
[00:42:21] Passkey found
[00:42:21] - Will indicate memory of 16075 MB
[00:42:21] - Connecting to assignment server
[00:42:21] Connecting to http://assign.stanford.edu:8080/
[00:42:22] Posted data.
[00:42:22] Initial: 8F80; - Successful: assigned to (128.143.199.96).
[00:42:22] + News From Folding@Home: Welcome to Folding@Home
[00:42:22] Loaded queue successfully.
[00:42:22] Sent data
[00:42:22] Connecting to http://128.143.199.96:8080/
[00:42:23] Posted data.
[00:42:23] Initial: 0000; - Receiving payload (expected size: 1767764)
[00:42:25] - Downloaded at ~863 kB/s
[00:42:25] - Averaged speed for that direction ~4971 kB/s
[00:42:25] + Received work.
[00:42:25] Trying to send all finished work units
[00:42:25] Project: 6903 (Run 10, Clone 12, Gen 72)
[00:42:25] + Attempting to send results [January 17 00:42:25 UTC]
[00:42:25] - Reading file work/wuresults_05.dat from core
[00:42:25] (Read 222366774 bytes from disk)
[00:42:25] Connecting to http://130.237.232.237:8080/
[03:36:07] - Autosending finished units... [January 17 03:36:07 UTC]
[03:36:07] Trying to send all finished work units
[03:36:07] - Already sending work
[03:36:07] + Sent 0 of 1 completed units to the server
[03:36:07] - Autosend completed
^C[05:52:45] ***** Got an Activate signal (2)
[05:52:46] Killing all core threads
Folding@Home Client Shutdown.
rick@Server5:~/fah$ ./fah6
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
24 cores detected
--- Opening Log file [January 17 05:52:59 UTC]
# Linux SMP Console Edition ###################################################
###############################################################################
Folding@Home Client Version 6.34
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/rick/fah
Executable: ./fah6
Arguments: -verbosity 9 -smp -bigadv
[05:52:59] - Ask before connecting: No
[05:52:59] - User name: rhavern (Team 33)
[05:52:59] - User ID: 7B5589202D84D214
[05:52:59] - Machine ID: 1
[05:52:59]
[05:52:59] Loaded queue successfully.
[05:52:59]
[05:52:59] + Processing work unit
[05:52:59] Core required: FahCore_a3.exe
[05:52:59] - Autosending finished units... [05:52:59]
[05:52:59] Core not found.
[05:52:59] Trying to send all finished work units
[05:52:59] - Core is not present or corrupted.
[05:52:59] Project: 6903 (Run 10, Clone 12, Gen 72)
[05:52:59] - Attempting to download new core...
[05:52:59] + Attempting to send results [January 17 05:52:59 UTC]
[05:52:59] + Downloading new core: FahCore_a3.exe
[05:52:59] - Reading file work/wuresults_05.dat from core
[05:52:59] Downloading core (/~pande/Linux/AMD64/Core_a3.fah from www.stanford.edu)
[05:52:59] (Read 222366774 bytes from disk)
[05:52:59] Connecting to http://130.237.232.237:8080/
[05:53:12] Initial: AFDE; + 10240 bytes downloaded
<snip>
[05:53:13] Initial: 9274; + 2683199 bytes downloaded
[05:53:13] Verifying core Core_a3.fah...
[05:53:13] Signature is VALID
[05:53:13]
[05:53:13] Trying to unzip core FahCore_a3.exe
[05:53:13] Decompressed FahCore_a3.exe (6272504 bytes) successfully
[05:53:13] + Core successfully engaged
[05:53:18]
[05:53:18] + Processing work unit
[05:53:18] Core required: FahCore_a3.exe
[05:53:18] Core found.
[05:53:18] Working on queue slot 06 [January 17 05:53:18 UTC]
[05:53:18] + Working ...
[05:53:18] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 24 -checkpoint 15 -verbose -lifeline 6888 -version 634'
[05:53:19]
[05:53:19] *------------------------------*
[05:53:19] Folding@Home Gromacs SMP Core
[05:53:19] Version 2.27 (Dec. 15, 2010)
[05:53:19]
[05:53:19] Preparing to commence simulation
[05:53:19] - Looking at optimizations...
[05:53:19] - Created dyn
[05:53:19] - Files status OK
[05:53:19] - Expanded 1767252 -> 1951112 (decompressed 110.4 percent)
[05:53:19] Called DecompressByteArray: compressed_data_size=1767252 data_size=1951112, decompressed_data_size=1951112 diff=0
[05:53:19] - Digital signature verified
[05:53:19]
[05:53:19] Project: 6941 (Run 0, Clone 83, Gen 452)
[05:53:19]
[05:53:19] Assembly optimizations on if available.
[05:53:19] Entering M.D.
:-) G R O M A C S (-:
Groningen Machine for Chemical Simulation
:-) VERSION 4.5.3 (-:
Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra,
Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff,
Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
Michael Shirts, Alfons Sijbers, Peter Tieleman,
Berk Hess, David van der Spoel, and Erik Lindahl.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2010, The GROMACS development team at
Uppsala University & The Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
:-) Gromacs (-:
Reading file work/wudata_06.tpr, VERSION 4.5.3-dev-20101113-8af87 (single precision)
Starting 24 threads
[05:53:25] Mapping NT from 24 to 24
Making 2D domain decomposition 6 x 4 x 1
starting mdrun 'Mutant_scan'
226500016 steps, 453000.0 ps (continuing from step 226000016, 452000.0 ps).
[05:53:26] Completed 0 out of 500000 steps (0%)
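The long waits in a log like this are easy to miss by eye, since each stalled connection attempt only shows up as a jump in the timestamps. Below is a minimal Python sketch, assuming the `[HH:MM:SS]` prefix seen in the log above, that flags gaps of five minutes or more between consecutive timestamped lines. The names `find_stalls` and `min_gap` are made up for illustration and are not part of the client; the sample lines are copied from the log.

```python
import re
from datetime import timedelta

# Matches the "[HH:MM:SS]" prefix the client puts on its own log lines.
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]")

def find_stalls(lines, min_gap=timedelta(minutes=5)):
    """Return (gap, line_before, line_after) for every gap >= min_gap
    between consecutive timestamped lines."""
    stalls = []
    prev_time = None
    prev_line = None
    for raw in lines:
        line = raw.strip()
        m = STAMP.match(line)
        if not m:
            continue  # GROMACS output and shell lines carry no timestamp
        h, mi, s = map(int, m.groups())
        now = timedelta(hours=h, minutes=mi, seconds=s)
        if prev_time is not None:
            gap = now - prev_time
            if gap < timedelta(0):
                # Assumption: a backwards jump means the clock passed midnight once.
                gap += timedelta(days=1)
            if gap >= min_gap:
                stalls.append((gap, prev_line, line))
        prev_time, prev_line = now, line
    return stalls

# A few lines copied from the log above.
sample = [
    "[23:55:24] Connecting to http://130.237.232.237:8080/",
    "[00:26:31] - Couldn't send HTTP request to server",
    "[00:26:32] Connecting to http://130.237.165.141:80/",
    "[00:42:21] - Couldn't send HTTP request to server",
    "[00:42:25] Connecting to http://130.237.232.237:8080/",
    "[03:36:07] - Autosending finished units... [January 17 03:36:07 UTC]",
]

stalls = find_stalls(sample)
for gap, before, after in stalls:
    print(gap, "| waited after:", before)
```

To run it over a whole log, pass the open file instead of `sample`, for example `find_stalls(open("FAHlog.txt"))`; the file name here is an assumption, so use whatever your client actually writes.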