I have two machines stuck in a state where everything seems OK but the upload will not proceed. I stopped the client on one, restarted it, and watched it upload ~100MB of data; now it has just stopped. The connection that the FAH client is making is:
TCP 192.168.1.21:63399 130.237.232.141:8080 ESTABLISHED [FAH-634.exe]
Here are the logs.
Machine 1:
[06:43:04] - Preparing to get new work unit...
[06:43:04] Cleaning up work directory
[06:43:09] + Attempting to get work packet
[06:43:09] Passkey found
[06:43:09] - Connecting to assignment server
[06:43:09] - Successful: assigned to (130.237.232.141).
[06:43:09] + News From Folding@Home: Welcome to Folding@Home
[06:43:09] Loaded queue successfully.
[06:46:40] + Closed connections
[06:46:40]
[06:46:40] + Processing work unit
[06:46:40] Core required: FahCore_a5.exe
[06:46:40] Core found.
[06:46:40] Working on queue slot 04 [September 2 06:46:40 UTC]
[06:46:40] + Working ...
[06:46:40]
[06:46:40] *------------------------------*
[06:46:40] Folding@Home Gromacs SMP Core
[06:46:40] Version 2.27 (Mar 12, 2010)
[06:46:40]
[06:46:40] Preparing to commence simulation
[06:46:40] - Looking at optimizations...
[06:46:40] - Created dyn
[06:46:40] - Files status OK
[06:46:45] - Expanded 24862433 -> 30796292 (decompressed 123.8 percent)
[06:46:45] Called DecompressByteArray: compressed_data_size=24862433 data_size=30796292, decompressed_data_size=30796292 diff=0
[06:46:45] - Digital signature verified
[06:46:45]
[06:46:45] Project: 6900 (Run 21, Clone 12, Gen 32)
[06:46:45]
[06:46:45] Assembly optimizations on if available.
[06:46:45] Entering M.D.
[06:46:51] Mapping NT from 12 to 12
[06:46:54] Completed 0 out of 250000 steps (0%)
[07:11:06] Completed 2500 out of 250000 steps (1%)
[07:35:31] Completed 5000 out of 250000 steps (2%)
[08:01:10] Completed 7500 out of 250000 steps (3%)
[08:24:50] Completed 10000 out of 250000 steps (4%)
...
[21:52:21] Completed 240000 out of 250000 steps (96%)
[22:15:56] Completed 242500 out of 250000 steps (97%)
[22:39:32] Completed 245000 out of 250000 steps (98%)
[23:03:10] Completed 247500 out of 250000 steps (99%)
[23:26:46] Completed 250000 out of 250000 steps (100%)
[23:26:59] DynamicWrapper: Finished Work Unit: sleep=10000
[23:27:09]
[23:27:09] Finished Work Unit:
[23:27:09] - Reading up to 52713120 from "work/wudata_04.trr": Read 52713120
[23:27:11] trr file hash check passed.
[23:27:11] - Reading up to 46989724 from "work/wudata_04.xtc": Read 46989724
[23:27:12] xtc file hash check passed.
[23:27:12] edr file hash check passed.
[23:27:12] logfile size: 204797
[23:27:12] Leaving Run
[23:27:12] - Writing 100075581 bytes of core data to disk...
[23:27:14] ... Done.
[23:27:55] - Shutting down core
[23:27:55]
[23:27:55] Folding@home Core Shutdown: FINISHED_UNIT
[23:28:05] CoreStatus = 64 (100)
[23:28:05] Sending work to server
[23:28:05] Project: 6900 (Run 21, Clone 12, Gen 32)
[23:28:05] + Attempting to send results [September 3 23:28:05 UTC]
Folding@Home Client Shutdown at user request.
Folding@Home Client Shutdown.
--- Opening Log file [September 4 15:16:46 UTC]
# Windows CPU Console Edition #################################################
###############################################################################
Folding@Home Client Version 6.34
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: c:\FAH\SMP
Executable: fah-634
Arguments: -send all
[15:16:46] - Ask before connecting: No
[15:16:46] - User name: DrSpalding (Team 48083)
[15:16:46] - User ID: 60DDBAE11FB3894D
[15:16:46] - Machine ID: 2
[15:16:46]
[15:16:46] Loaded queue successfully.
[15:16:46] Attempting to return result(s) to server...
[15:16:46] Project: 6900 (Run 21, Clone 12, Gen 32)
[15:16:46] - Read packet limit of 540015616... Set to 524286976.
[15:16:46] + Attempting to send results [September 4 15:16:46 UTC]
Machine 2:
[17:25:14] - Preparing to get new work unit...
[17:25:14] Cleaning up work directory
[17:25:14] + Attempting to get work packet
[17:25:14] Passkey found
[17:25:14] - Connecting to assignment server
[17:25:14] - Successful: assigned to (130.237.232.141).
[17:25:14] + News From Folding@Home: Welcome to Folding@Home
[17:25:14] Loaded queue successfully.
[17:26:20] + Closed connections
[17:26:20]
[17:26:20] + Processing work unit
[17:26:20] Core required: FahCore_a5.exe
[17:26:20] Core found.
[17:26:20] Working on queue slot 07 [August 31 17:26:20 UTC]
[17:26:20] + Working ...
[17:26:20]
[17:26:20] *------------------------------*
[17:26:20] Folding@Home Gromacs SMP Core
[17:26:20] Version 2.27 (Mar 12, 2010)
[17:26:20]
[17:26:20] Preparing to commence simulation
[17:26:20] - Looking at optimizations...
[17:26:20] - Created dyn
[17:26:20] - Files status OK
[17:26:26] - Expanded 24865338 -> 30796292 (decompressed 123.8 percent)
[17:26:26] Called DecompressByteArray: compressed_data_size=24865338 data_size=30796292, decompressed_data_size=30796292 diff=0
[17:26:26] - Digital signature verified
[17:26:26]
[17:26:26] Project: 6900 (Run 4, Clone 11, Gen 35)
[17:26:26]
[17:26:26] Assembly optimizations on if available.
[17:26:26] Entering M.D.
[17:26:33] Mapping NT from 8 to 8
[17:26:56] Completed 0 out of 250000 steps (0%)
[18:14:03] Completed 2500 out of 250000 steps (1%)
[19:00:59] Completed 5000 out of 250000 steps (2%)
[19:47:53] Completed 7500 out of 250000 steps (3%)
[20:34:45] Completed 10000 out of 250000 steps (4%)
...
[20:42:41] Completed 240000 out of 250000 steps (96%)
[21:29:38] Completed 242500 out of 250000 steps (97%)
[22:16:33] Completed 245000 out of 250000 steps (98%)
[23:03:31] Completed 247500 out of 250000 steps (99%)
[23:50:29] Completed 250000 out of 250000 steps (100%)
[23:50:46] DynamicWrapper: Finished Work Unit: sleep=10000
[23:50:56]
[23:50:56] Finished Work Unit:
[23:50:56] - Reading up to 52713120 from "work/wudata_07.trr": Read 52713120
[23:50:57] trr file hash check passed.
[23:50:57] - Reading up to 46992300 from "work/wudata_07.xtc": Read 46992300
[23:50:57] xtc file hash check passed.
[23:50:57] edr file hash check passed.
[23:50:57] logfile size: 208764
[23:50:57] Leaving Run
[23:50:58] - Writing 100082124 bytes of core data to disk...
[23:51:04] ... Done.
[23:51:52] - Shutting down core
[23:51:52]
[23:51:52] Folding@home Core Shutdown: FINISHED_UNIT
[23:52:03] CoreStatus = 64 (100)
[23:52:03] Sending work to server
[23:52:03] Project: 6900 (Run 4, Clone 11, Gen 35)
[23:52:03] + Attempting to send results [September 3 23:52:03 UTC]
Both machines have been sitting at that point since around 16:30 PDT on 3 Sept 2011 and haven't budged. The one exception is Machine 1: I stopped the client, restarted it, and watched it upload ~100MB of data, and it has now been sitting for about 45 minutes since the network traffic stopped.
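For anyone who wants to check their own logs for the same stall, here is a minimal sketch in Python. It assumes the log text looks like the excerpts above: it collects every "+ Attempting to send results [... UTC]" line that is not followed by "+ Results successfully sent" before the next attempt or the end of the log. The function name `unfinished_uploads` is my own and not part of the FAH client.

```python
import re

# Patterns taken from the v6.34 log lines quoted in this post.
SEND_RE = re.compile(r"\+ Attempting to send results \[(.+?) UTC\]")
DONE_RE = re.compile(r"\+ Results successfully sent")


def unfinished_uploads(log_text):
    """Return the date stamps of send attempts that were never confirmed.

    Hypothetical helper: an attempt counts as unfinished if no
    "Results successfully sent" line appears before the next attempt
    or the end of the log.
    """
    pending = None
    stalled = []
    for line in log_text.splitlines():
        match = SEND_RE.search(line)
        if match:
            if pending is not None:
                # A new attempt started while the previous one was unconfirmed.
                stalled.append(pending)
            pending = match.group(1)
        elif pending is not None and DONE_RE.search(line):
            pending = None
    if pending is not None:
        stalled.append(pending)
    return stalled


if __name__ == "__main__":
    sample = (
        "[23:28:05] + Attempting to send results [September 3 23:28:05 UTC]\n"
        "Folding@Home Client Shutdown.\n"
        "[15:16:46] + Attempting to send results [September 4 15:16:46 UTC]\n"
    )
    for stamp in unfinished_uploads(sample):
        print("no confirmation after attempt at", stamp, "UTC")
```

Run against the Machine 1 log it should flag both attempts; against the Machine 3 log it should flag nothing, since that upload was confirmed.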
An odd coincidence is that all three of my -bigadv machines started uploading at about the same time (16:00 PDT yesterday), but the third one uploaded fine; its P2692 WU came from 171.67.108.22. It was then assigned to 130.237.232.141 for its next WU and couldn't connect.
Machine 3:
[23:23:20] Folding@home Core Shutdown: FINISHED_UNIT
[23:23:25] CoreStatus = 64 (100)
[23:23:25] Sending work to server
[23:23:25] Project: 2692 (Run 1, Clone 21, Gen 148)
[23:23:25] + Attempting to send results [September 3 23:23:25 UTC]
[23:30:21] + Results successfully sent
[23:30:21] Thank you for your contribution to Folding@Home.
[23:30:21] + Number of Units Completed: 71
[23:30:26] - Preparing to get new work unit...
[23:30:26] Cleaning up work directory
[23:30:28] + Attempting to get work packet
[23:30:28] Passkey found
[23:30:28] - Connecting to assignment server
[23:30:29] - Successful: assigned to (130.237.232.141).
[23:30:29] + News From Folding@Home: Welcome to Folding@Home
[23:30:30] Loaded queue successfully.
[00:12:30] + Could not connect to Work Server
[00:12:30] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[00:12:46] + Attempting to get work packet
[00:12:46] Passkey found
[00:12:46] - Connecting to assignment server
[00:12:47] - Successful: assigned to (130.237.232.141).
[00:12:47] + News From Folding@Home: Welcome to Folding@Home
[00:12:47] Loaded queue successfully.
[00:37:05] + Could not connect to Work Server
[00:37:05] - Attempt #2 to get work failed, and no other work to do.
Waiting before retry.
[00:37:19] + Attempting to get work packet
[00:37:19] Passkey found
[00:37:19] - Connecting to assignment server
[00:37:20] - Successful: assigned to (171.67.108.22).
[00:37:20] + News From Folding@Home: Welcome to Folding@Home
[00:37:20] Loaded queue successfully.
[00:38:01] + Closed connections
[00:38:01]
[00:38:01] + Processing work unit
[00:38:01] Core required: FahCore_a5.exe
[00:38:01] Core found.
[00:38:01] Working on queue slot 08 [September 4 00:38:01 UTC]
[00:38:01] + Working ...
[00:38:01]
[00:38:01] *------------------------------*
[00:38:01] Folding@Home Gromacs SMP Core
[00:38:01] Version 2.27 (Mar 12, 2010)
[00:38:01]
[00:38:01] Preparing to commence simulation
[00:38:01] - Looking at optimizations...
[00:38:01] - Created dyn
[00:38:01] - Files status OK
[00:38:07] - Expanded 26694550 -> 33054789 (decompressed 123.8 percent)
[00:38:07] Called DecompressByteArray: compressed_data_size=26694550 data_size=33054789, decompressed_data_size=33054789 diff=0
[00:38:07] - Digital signature verified
[00:38:07]
[00:38:07] Project: 2685 (Run 7, Clone 17, Gen 82)
[00:38:07]
[00:38:07] Assembly optimizations on if available.
[00:38:07] Entering M.D.
[00:38:13] Mapping NT from 8 to 8
[00:38:16] Completed 0 out of 250000 steps (0%)
[01:16:00] Completed 2500 out of 250000 steps (1%)
[01:53:47] Completed 5000 out of 250000 steps (2%)
[02:31:33] Completed 7500 out of 250000 steps (3%)
[03:09:33] Completed 10000 out of 250000 steps (4%)
I hope that two machines behind a NAT, simultaneously uploading completed 100MB data sets to the same server for the same project, didn't break it somehow.
We could use some help here to get these WUs uploaded successfully and to get the clients past the upload phase.
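If anyone wants to test from their own network whether the work server is accepting connections at all, here is a rough Python sketch. The address and port come from the netstat line at the top of this post; the helper name `can_connect` and the 10-second timeout are my own choices. It only checks that a TCP connection can be opened, so a success here does not rule out the stall I am seeing, where the connection is ESTABLISHED but no data moves.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical helper: this only completes the TCP handshake and then
    closes the socket; it does not speak the FAH upload protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Work server address and port as seen in the netstat output above.
    host, port = "130.237.232.141", 8080
    state = "reachable" if can_connect(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```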