Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 11:48 am
by Torell
same issue here

Code:

[11:44:30] + Attempting to send results [September 4 11:44:30 UTC]
[11:44:30] - Reading file work/wuresults_01.dat from core
[11:44:30]   (Read 100197061 bytes from disk)
[11:44:30] Connecting to http://130.237.165.141:8080/
[11:44:31] - Couldn't send HTTP request to server
[11:44:31] + Could not connect to Work Server (results)
[11:44:31]     (130.237.165.141:8080)
[11:44:31] + Retrying using alternative port
[11:44:31] Connecting to http://130.237.165.141:80/
[11:44:31] - Couldn't send HTTP request to server
[11:44:31] + Could not connect to Work Server (results)
[11:44:31]     (130.237.165.141:80)
[11:44:31]   Could not transmit unit 01 to Collection server; keeping in queue.
[11:44:31] + Sent 0 of 1 completed units to the server
[11:44:31] - Autosend completed
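
One way to tell whether the work server is accepting connections at all, independent of the FAH client, is to attempt a plain TCP connection to the same ports the log shows (8080, then 80). This is only a minimal sketch: the function names are hypothetical and are not part of any FAH tool.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def first_reachable_port(host, ports=(8080, 80), timeout=5.0):
    """Try each port in order, mirroring the 8080-then-80 retry seen in the log."""
    for port in ports:
        if can_connect(host, port, timeout):
            return port
    return None
```

For example, `first_reachable_port("130.237.165.141")` (the address from the log above) returns `None` when neither port accepts a connection. A successful connect only shows that the port is open, not that an upload will complete.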

Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 3:04 pm
by the animal
I finished a 6900 about 6 hours ago, still not uploaded.

Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 4:12 pm
by DrSpalding
I apparently have two machines stuck in this state where everything seems OK, but the upload will not proceed. I stopped one, restarted it, and watched it upload ~100MB of data and now it is just stopped. The connection that the FAH client is making is:

TCP 192.168.1.21:63399 130.237.232.141:8080 ESTABLISHED [FAH-634.exe]

Here are the logs.

Machine 1:

Code:

[06:43:04] - Preparing to get new work unit...
[06:43:04] Cleaning up work directory
[06:43:09] + Attempting to get work packet
[06:43:09] Passkey found
[06:43:09] - Connecting to assignment server
[06:43:09] - Successful: assigned to (130.237.232.141).
[06:43:09] + News From Folding@Home: Welcome to Folding@Home
[06:43:09] Loaded queue successfully.
[06:46:40] + Closed connections
[06:46:40]
[06:46:40] + Processing work unit
[06:46:40] Core required: FahCore_a5.exe
[06:46:40] Core found.
[06:46:40] Working on queue slot 04 [September 2 06:46:40 UTC]
[06:46:40] + Working ...
[06:46:40]
[06:46:40] *------------------------------*
[06:46:40] Folding@Home Gromacs SMP Core
[06:46:40] Version 2.27 (Mar 12, 2010)
[06:46:40]
[06:46:40] Preparing to commence simulation
[06:46:40] - Looking at optimizations...
[06:46:40] - Created dyn
[06:46:40] - Files status OK
[06:46:45] - Expanded 24862433 -> 30796292 (decompressed 123.8 percent)
[06:46:45] Called DecompressByteArray: compressed_data_size=24862433 data_size=30796292, decompressed_data_size=30796292 diff=0
[06:46:45] - Digital signature verified
[06:46:45]
[06:46:45] Project: 6900 (Run 21, Clone 12, Gen 32)
[06:46:45]
[06:46:45] Assembly optimizations on if available.
[06:46:45] Entering M.D.
[06:46:51] Mapping NT from 12 to 12
[06:46:54] Completed 0 out of 250000 steps  (0%)
[07:11:06] Completed 2500 out of 250000 steps  (1%)
[07:35:31] Completed 5000 out of 250000 steps  (2%)
[08:01:10] Completed 7500 out of 250000 steps  (3%)
[08:24:50] Completed 10000 out of 250000 steps  (4%)
...
[21:52:21] Completed 240000 out of 250000 steps  (96%)
[22:15:56] Completed 242500 out of 250000 steps  (97%)
[22:39:32] Completed 245000 out of 250000 steps  (98%)
[23:03:10] Completed 247500 out of 250000 steps  (99%)
[23:26:46] Completed 250000 out of 250000 steps  (100%)
[23:26:59] DynamicWrapper: Finished Work Unit: sleep=10000
[23:27:09]
[23:27:09] Finished Work Unit:
[23:27:09] - Reading up to 52713120 from "work/wudata_04.trr": Read 52713120
[23:27:11] trr file hash check passed.
[23:27:11] - Reading up to 46989724 from "work/wudata_04.xtc": Read 46989724
[23:27:12] xtc file hash check passed.
[23:27:12] edr file hash check passed.
[23:27:12] logfile size: 204797
[23:27:12] Leaving Run
[23:27:12] - Writing 100075581 bytes of core data to disk...
[23:27:14]   ... Done.
[23:27:55] - Shutting down core
[23:27:55]
[23:27:55] Folding@home Core Shutdown: FINISHED_UNIT
[23:28:05] CoreStatus = 64 (100)
[23:28:05] Sending work to server
[23:28:05] Project: 6900 (Run 21, Clone 12, Gen 32)


[23:28:05] + Attempting to send results [September 3 23:28:05 UTC]

Folding@Home Client Shutdown at user request.

Folding@Home Client Shutdown.


--- Opening Log file [September 4 15:16:46 UTC] 


# Windows CPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: c:\FAH\SMP
Executable: fah-634
Arguments: -send all 

[15:16:46] - Ask before connecting: No
[15:16:46] - User name: DrSpalding (Team 48083)
[15:16:46] - User ID: 60DDBAE11FB3894D
[15:16:46] - Machine ID: 2
[15:16:46] 
[15:16:46] Loaded queue successfully.
[15:16:46] Attempting to return result(s) to server...
[15:16:46] Project: 6900 (Run 21, Clone 12, Gen 32)
[15:16:46] - Read packet limit of 540015616... Set to 524286976.


[15:16:46] + Attempting to send results [September 4 15:16:46 UTC]

Machine 2:

Code:

[17:25:14] - Preparing to get new work unit...
[17:25:14] Cleaning up work directory
[17:25:14] + Attempting to get work packet
[17:25:14] Passkey found
[17:25:14] - Connecting to assignment server
[17:25:14] - Successful: assigned to (130.237.232.141).
[17:25:14] + News From Folding@Home: Welcome to Folding@Home
[17:25:14] Loaded queue successfully.
[17:26:20] + Closed connections
[17:26:20]
[17:26:20] + Processing work unit
[17:26:20] Core required: FahCore_a5.exe
[17:26:20] Core found.
[17:26:20] Working on queue slot 07 [August 31 17:26:20 UTC]
[17:26:20] + Working ...
[17:26:20]
[17:26:20] *------------------------------*
[17:26:20] Folding@Home Gromacs SMP Core
[17:26:20] Version 2.27 (Mar 12, 2010)
[17:26:20]
[17:26:20] Preparing to commence simulation
[17:26:20] - Looking at optimizations...
[17:26:20] - Created dyn
[17:26:20] - Files status OK
[17:26:26] - Expanded 24865338 -> 30796292 (decompressed 123.8 percent)
[17:26:26] Called DecompressByteArray: compressed_data_size=24865338 data_size=30796292, decompressed_data_size=30796292 diff=0
[17:26:26] - Digital signature verified
[17:26:26]
[17:26:26] Project: 6900 (Run 4, Clone 11, Gen 35)
[17:26:26]
[17:26:26] Assembly optimizations on if available.
[17:26:26] Entering M.D.
[17:26:33] Mapping NT from 8 to 8
[17:26:56] Completed 0 out of 250000 steps  (0%)
[18:14:03] Completed 2500 out of 250000 steps  (1%)
[19:00:59] Completed 5000 out of 250000 steps  (2%)
[19:47:53] Completed 7500 out of 250000 steps  (3%)
[20:34:45] Completed 10000 out of 250000 steps  (4%)
...
[20:42:41] Completed 240000 out of 250000 steps  (96%)
[21:29:38] Completed 242500 out of 250000 steps  (97%)
[22:16:33] Completed 245000 out of 250000 steps  (98%)
[23:03:31] Completed 247500 out of 250000 steps  (99%)
[23:50:29] Completed 250000 out of 250000 steps  (100%)
[23:50:46] DynamicWrapper: Finished Work Unit: sleep=10000
[23:50:56]
[23:50:56] Finished Work Unit:
[23:50:56] - Reading up to 52713120 from "work/wudata_07.trr": Read 52713120
[23:50:57] trr file hash check passed.
[23:50:57] - Reading up to 46992300 from "work/wudata_07.xtc": Read 46992300
[23:50:57] xtc file hash check passed.
[23:50:57] edr file hash check passed.
[23:50:57] logfile size: 208764
[23:50:57] Leaving Run
[23:50:58] - Writing 100082124 bytes of core data to disk...
[23:51:04]   ... Done.
[23:51:52] - Shutting down core
[23:51:52]
[23:51:52] Folding@home Core Shutdown: FINISHED_UNIT
[23:52:03] CoreStatus = 64 (100)
[23:52:03] Sending work to server
[23:52:03] Project: 6900 (Run 4, Clone 11, Gen 35)


[23:52:03] + Attempting to send results [September 3 23:52:03 UTC]

These machines have been sitting at that point since around 16:30 PDT on 3 Sept 2011 and haven't budged. The one exception is Machine 1: I stopped the client, restarted it, and watched it upload ~100MB of data, and it has now been sitting for about 45 minutes since the network traffic stopped.

An oddly coincidental happening is that all three of my -bigadv machines started uploading about the same time (16:00 PDT yesterday) but the third one worked fine with a P2692 WU that came from 171.67.108.22. It was then assigned to 130.237.232.141 for its next WU and couldn't connect.

Machine 3:

Code:

[23:23:20] Folding@home Core Shutdown: FINISHED_UNIT
[23:23:25] CoreStatus = 64 (100)
[23:23:25] Sending work to server
[23:23:25] Project: 2692 (Run 1, Clone 21, Gen 148)


[23:23:25] + Attempting to send results [September 3 23:23:25 UTC]
[23:30:21] + Results successfully sent
[23:30:21] Thank you for your contribution to Folding@Home.
[23:30:21] + Number of Units Completed: 71

[23:30:26] - Preparing to get new work unit...
[23:30:26] Cleaning up work directory
[23:30:28] + Attempting to get work packet
[23:30:28] Passkey found
[23:30:28] - Connecting to assignment server
[23:30:29] - Successful: assigned to (130.237.232.141).
[23:30:29] + News From Folding@Home: Welcome to Folding@Home
[23:30:30] Loaded queue successfully.
[00:12:30] + Could not connect to Work Server
[00:12:30] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[00:12:46] + Attempting to get work packet
[00:12:46] Passkey found
[00:12:46] - Connecting to assignment server
[00:12:47] - Successful: assigned to (130.237.232.141).
[00:12:47] + News From Folding@Home: Welcome to Folding@Home
[00:12:47] Loaded queue successfully.
[00:37:05] + Could not connect to Work Server
[00:37:05] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[00:37:19] + Attempting to get work packet
[00:37:19] Passkey found
[00:37:19] - Connecting to assignment server
[00:37:20] - Successful: assigned to (171.67.108.22).
[00:37:20] + News From Folding@Home: Welcome to Folding@Home
[00:37:20] Loaded queue successfully.
[00:38:01] + Closed connections
[00:38:01]
[00:38:01] + Processing work unit
[00:38:01] Core required: FahCore_a5.exe
[00:38:01] Core found.
[00:38:01] Working on queue slot 08 [September 4 00:38:01 UTC]
[00:38:01] + Working ...
[00:38:01]
[00:38:01] *------------------------------*
[00:38:01] Folding@Home Gromacs SMP Core
[00:38:01] Version 2.27 (Mar 12, 2010)
[00:38:01]
[00:38:01] Preparing to commence simulation
[00:38:01] - Looking at optimizations...
[00:38:01] - Created dyn
[00:38:01] - Files status OK
[00:38:07] - Expanded 26694550 -> 33054789 (decompressed 123.8 percent)
[00:38:07] Called DecompressByteArray: compressed_data_size=26694550 data_size=33054789, decompressed_data_size=33054789 diff=0
[00:38:07] - Digital signature verified
[00:38:07]
[00:38:07] Project: 2685 (Run 7, Clone 17, Gen 82)
[00:38:07]
[00:38:07] Assembly optimizations on if available.
[00:38:07] Entering M.D.
[00:38:13] Mapping NT from 8 to 8
[00:38:16] Completed 0 out of 250000 steps  (0%)
[01:16:00] Completed 2500 out of 250000 steps  (1%)
[01:53:47] Completed 5000 out of 250000 steps  (2%)
[02:31:33] Completed 7500 out of 250000 steps  (3%)
[03:09:33] Completed 10000 out of 250000 steps  (4%)

I hope that two machines behind a NAT and simultaneously uploading completed 100MB data sets to the same server for the same project didn't break it somehow. :(

We could use some help here to get these WUs uploaded successfully and the clients to proceed past the upload phase.

Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 4:29 pm
by bollix47
There appear to be a few problems associated with .141. Watching my last upload attempt, the WU took a normal time to upload judging by the transmission lights, then the sending process hung as if it never received an acknowledgement from the server. The server currently has a high net load.

The interesting part is that the client continues to 'think' it's sending, so if, for example, you finished a regular smp WU, it won't upload either, even though it goes to a different server, because the client is 'already sending' from hours earlier.

I had that situation earlier and finally used -send XX (-send all won't work in this situation) to send the regular smp WU. Just restarting the client doesn't help because it tries to send to .141 and eventually hangs the send process again, never getting to the next WU that is ready to upload.

Initial upload failure to .141:

viewtopic.php?p=194653#p194653
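
The hang described above shows up in the client log as an "Attempting to send results" line with nothing after it. A minimal Python sketch for spotting that pattern follows; the function name is hypothetical, and the list of outcome messages is an assumption drawn only from the log excerpts in this thread, so it may not cover every message the client can print.

```python
SEND_START = "+ Attempting to send results"

# Messages that indicate a send attempt finished one way or the other.
# Assumption: taken from the logs quoted in this thread, not from client docs.
SEND_OUTCOMES = (
    "+ Results successfully sent",
    "+ Could not connect to Work Server",
    "Could not transmit unit",
)


def find_hung_send(log_lines):
    """Return the last send-attempt line that has no outcome after it, else None."""
    pending = None
    for line in log_lines:
        if SEND_START in line:
            pending = line.rstrip()
        elif pending is not None and any(msg in line for msg in SEND_OUTCOMES):
            pending = None
    return pending
```

Calling `find_hung_send(open("FAHlog.txt"))` (the log file name is an assumption) returns the timestamped line of a send that never reported success or failure, which tells you how long the client has been stuck.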

Re: Problems receiving work from 130.237.232.141

Posted: Sun Sep 04, 2011 5:18 pm
by GreyWhiskers
I'm not able to directly monitor my Windows 7 home computer (I'm out of town for a few days), but I did leave with a p6900 processing that completed "on time" at about 6am PST on Saturday, at the time when forum reports showed all the trouble. Looking at the Stanford and EOC stats pages, the completed WU did apparently upload, and gave me the expected credit, at about the time I expected. (When I'm home, I can see the exact time with HFM.net, but I don't have that set up for remote viewing.)

I won't know for a couple of days whether it was able to download a new -bigadv work unit or if it is stalled too. But, it did upload a p6900 successfully Sat morning.

Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 5:20 pm
by ei57
The server status page reports "no response". Seems like 130.237.232.237 is in trouble too.

Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 5:22 pm
by DrSpalding
Update: Now the server 130.237.232.141 (folding-4) is refusing connections and my two idle machines moved past it and downloaded new WUs. The -bigadv WUs must be in short supply due to the server being down and they immediately got normal SMP WUs instead.

I hope that the server comes up in time to upload the WUs for bonus credit, but one machine will probably not get any bonus, as the 4-day mark hits about 10:26 PDT today. I.e. now. :(
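
The 10:26 PDT figure can be reproduced from the Machine 2 log. The sketch below is a worked check only: it assumes the 4-day clock starts at the moment the log shows the unit beginning to process (August 31 17:26:20 UTC), which the thread does not state explicitly, and it uses a fixed UTC-7 offset for PDT. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")


def bonus_deadline(start_utc, days=4):
    """Return the time at which the bonus window closes."""
    return start_utc + timedelta(days=days)


# Assumed start: "Working on queue slot 07 [August 31 17:26:20 UTC]"
start = datetime(2011, 8, 31, 17, 26, 20, tzinfo=timezone.utc)
deadline = bonus_deadline(start)
print(deadline.astimezone(PDT).strftime("%Y-%m-%d %H:%M %Z"))
# prints "2011-09-04 10:26 PDT"
```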

Re: 130.237.232.141:8080 Upload Problems

Posted: Sun Sep 04, 2011 7:51 pm
by kasson
Yes--the servers appear down. Please see separate thread. I've contacted our people on the ground there, and we'll try to get it up as soon as we can. Expecting no sooner than Monday morning CET, though. It's 10pm Sunday night there right now.

Re: Problems receiving work from 130.237.232.141

Posted: Mon Sep 05, 2011 4:36 pm
by kasson
We found the problem and removed the corrupt files. I'm hoping the server can auto-rewind and generate the correct files; if not, we might be low on work until I can generate the right ones myself. Let us know how things look.

Re: Problems receiving work from 130.237.232.141

Posted: Mon Sep 05, 2011 5:00 pm
by Drugless
Thank you, kind sir!
Retrying now, will post progress update shortly.

Re: Problems receiving work from 130.237.232.141

Posted: Mon Sep 05, 2011 5:16 pm
by Dave_Goodchild
Just managed to upload all the units waiting on server .141

Thanks for all the efforts to get this server back online.

Re: Problems receiving work from 130.237.232.141

Posted: Mon Sep 05, 2011 5:38 pm
by Jesse_V
Dave_Goodchild wrote: Just managed to upload all the units waiting on server .141

Thanks for all the efforts to get this server back online.

Yep, thanks to kasson, the Pande Group, and the server people over in Sweden. viewtopic.php?f=18&t=19522#p194706

Re: Problems receiving work from 130.237.232.141

Posted: Mon Sep 05, 2011 8:31 pm
by HaloJones
Appreciate the effort but still cannot get anything down from this server that isn't corrupt. :( Still on a3 units for now.

Re: Problems receiving work from 130.237.232.141

Posted: Wed Sep 07, 2011 10:29 pm
by Dave_Goodchild
Looks like this server is out of work, and with .22 low on work, all my machines with fewer than 12 cores are running standard SMP. Is there any update on the situation, please?

Re: Problems receiving work from 130.237.232.141

Posted: Thu Sep 08, 2011 12:25 am
by kasson
We took the server off assign because we need to drain the work units before we can correct one of the issues regarding corrupt work units. There may be a shortage of bigadv work units until it comes back on assign.