Server upload problem - 130.237.232.237

Post by **Horvat** » Sun Nov 20, 2011 7:05 am

I would like to report a problem with the servers not accepting uploads, or whatever the true nature of the problem is. This is part of one of my client logs that shows what is happening. I have partially discussed this in another thread I created about problems with a WU.

[15:58:11] - Couldn't send HTTP request to server
[15:58:11] + Could not connect to Work Server (results)
[15:58:11] (130.237.232.237:8080)
[15:58:11] + Retrying using alternative port
[16:04:45] Completed 95000 out of 250000 steps (38%)
[16:20:06] Completed 97500 out of 250000 steps (39%)
[16:35:25] Completed 100000 out of 250000 steps (40%)
[16:38:20] - Couldn't send HTTP request to server
[16:38:20] + Could not connect to Work Server (results)
[16:38:20] (130.237.232.237:80)
[16:38:20] - Error: Could not transmit unit 03 (completed November 13) to work server.

All four of my systems are set up the same, and this does not occur with all the systems, nor all the time with this particular system. It was commented in the thread I started about problems with a WU that it took over 5 days to return a WU and that I only received base points for it. Not true: I posted the log for that WU and it showed it being returned in less than 5 days. Additionally, Grandpa stated, based on the part of the log I posted, that the server was not communicating back to my system that the WU had been received. This is not a problem with my hardware but with the upload servers. This has already cost me almost a million points and I do not feel I should continue to be penalized for a problem that I have no control over. Please advise. Thank you.

Post by **bruce** » Mon Nov 21, 2011 8:55 pm

I see you've gotten no response. I have not been on the forum for a couple of days so I'm just checking to see if some action has already been taken or if this still needs attention. You did get credit for a 6904 not long after you asked the question so it's likely that's the WU that you were looking for.

Post by **Macaholic** » Mon Nov 21, 2011 9:23 pm

Other threads posted about this machine/issue are:

viewtopic.php?f=19&t=20052
viewtopic.php?f=19&t=20071

Are you using Langouste?

Post by **Horvat** » Tue Nov 22, 2011 11:31 pm

Thank you for the response. However, I need some help with what seems to be the root of my problems. Here is an excerpt from one of my logs:


[22:15:15] Completed 250000 out of 250000 steps (100%) 
[22:15:43] DynamicWrapper: Finished Work Unit: sleep=10000 
[22:15:53] 
[22:15:53] Finished Work Unit: 
[22:15:53] - Reading up to 121622496 from "work/wudata_01.trr": Read 121622496 
[22:15:54] trr file hash check passed. 
[22:15:54] - Reading up to 108761932 from "work/wudata_01.xtc": Read 108761932 
[22:15:55] xtc file hash check passed. 
[22:15:55] edr file hash check passed. 
[22:15:55] logfile size: 215383 
[22:15:55] Leaving Run 
[22:15:59] - Writing 230772803 bytes of core data to disk... 
[22:16:47] Done: 230772291 -> 222420165 (compressed to 3.3 percent) 
[22:16:47] ... Done. 
[22:17:13] - Shutting down core 
[22:17:13] 
[22:17:13] Folding@home Core Shutdown: FINISHED_UNIT 
[22:17:15] CoreStatus = 64 (100) 
[22:17:15] Sending work to server 
[22:17:15] Project: 6903 (Run 8, Clone 9, Gen 29) 


[22:17:15] + Attempting to send results [November 22 22:17:15 UTC] 
[23:04:01] - Couldn't send HTTP request to server 
[23:04:01] + Could not connect to Work Server (results) 
[23:04:01] (130.237.232.237:8080) 
[23:04:01] + Retrying using alternative port

What is happening is that this continues on and on until I reboot the system. At that point I get an entry "Attempting to send results", then I download a new WU. While processing the new WU, the client periodically logs another attempt to send the unit, until eventually I get an entry to the effect of "this WU has already been received". I am getting credit for the WU at this time even though my systems continue to try to upload them.


# Linux SMP Console Edition ################################################### 
############################################################################### 

Folding@Home Client Version 6.34 

http://folding.stanford.edu 

############################################################################### 
############################################################################### 

Launch directory: /usr/local/fah 
Executable: ./fah6 
Arguments: -bigadv -smp 

[23:08:19] - Ask before connecting: No 
[23:08:19] - User name: Horvat (Team 111065) 
[23:08:19] - User ID: xxxxxxxxxxxxx
[23:08:19] - Machine ID: 1 
[23:08:19] 
[23:08:19] Loaded queue successfully. 
[23:08:19] - Preparing to get new work unit... 
[23:08:19] Cleaning up work directory 
[23:08:19] Project: 6903 (Run 8, Clone 9, Gen 29) 
[23:08:19] + Attempting to get work packet 
[23:08:19] Passkey found 
[23:08:19] - Connecting to assignment server 


[23:08:19] + Attempting to send results [November 22 23:08:19 UTC] 
[23:08:20] - Successful: assigned to (130.237.232.237). 
[23:08:23] + News From Folding@Home: Welcome to Folding@Home 
[23:08:23] Loaded queue successfully. 
[23:10:51] + Closed connections 
[23:10:51] 
[23:10:51] + Processing work unit 
[23:10:51] Core required: FahCore_a5.exe 
[23:10:51] Core found. 
[23:10:51] Working on queue slot 02 [November 22 23:10:51 UTC] 
[23:10:52] + Working ... 
[23:10:52] 
[23:10:52] *------------------------------* 
[23:10:52] Folding@Home Gromacs SMP Core 
[23:10:52] Version 2.27 (Thu Feb 10 09:46:40 PST 2011) 
[23:10:52] 
[23:10:52] Preparing to commence simulation 
[23:10:52] - Looking at optimizations... 
[23:10:52] - Created dyn 
[23:10:52] - Files status OK 
[23:10:58] - Expanded 57243802 -> 71846524 (decompressed 50.4 percent) 
[23:10:58] Called DecompressByteArray: compressed_data_size=57243802 data_size=71846524, decompressed_data_size=71846524 diff=0 
[23:10:58] - Digital signature verified 
[23:10:58] 
[23:10:58] Project: 6903 (Run 2, Clone 23, Gen 1) 
[23:10:59] 
[23:10:59] Assembly optimizations on if available. 
[23:10:59] Entering M.D. 
[23:11:07] Mapping NT from 24 to 24 
[23:11:11] Completed 0 out of 250000 steps (0%)

All 4 of my systems continue to do this even after I have completely reinstalled the OS and FAH client. My systems are dual-socket Intel systems running native Linux. Any help on this?

Mod Edit: Added Code Tags - PantherX

Post by **Horvat** » Wed Nov 23, 2011 3:12 am

As an update, here is a continuation of the log posted above. I would like to note that it only does this with the 6903 and 6904 WUs. I have no problems with the 6901 and SMP WUs. I have a 7MB DSL line with 1MB upload speed; I mention this because someone in another thread suggested the cause could be my internet connection.

[23:11:07] Mapping NT from 24 to 24
[23:11:11] Completed 0 out of 250000 steps (0%)
[23:35:58] - Couldn't send HTTP request to server
[23:35:58] + Could not connect to Work Server (results)
[23:35:58] (130.237.232.237:8080)
[23:35:58] + Retrying using alternative port
[23:46:00] Completed 2500 out of 250000 steps (1%)
[00:20:41] Completed 5000 out of 250000 steps (2%)
[00:28:02] - Couldn't send HTTP request to server
[00:28:02] + Could not connect to Work Server (results)
[00:28:02] (130.237.232.237:80)
[00:28:02] - Error: Could not transmit unit 01 (completed November 22) to work server.
[00:28:02] Keeping unit 01 in queue.
[00:55:24] Completed 7500 out of 250000 steps (3%)
[01:30:10] Completed 10000 out of 250000 steps (4%)
[02:04:55] Completed 12500 out of 250000 steps (5%)
[02:39:55] Completed 15000 out of 250000 steps (6%)

So at this point I did not get credit for this WU. [22:17:15] Project: 6903 (Run 8, Clone 9, Gen 29)

Post by **Macaholic** » Wed Nov 23, 2011 4:00 am

No. P6903 (R8, C9, G29) is apparently still stuck in queue for you. Thus, no credit yet. Could your DSL carrier be capping your upload file size?

Post by **Horvat** » Wed Nov 23, 2011 4:42 am

No on the upload file size restriction. They actually advertise it as being larger than what I posted earlier but I have confirmed with a 3rd party bandwidth tester it is not.

I have an update: here is a continuation of the log, which shows that it finally uploaded. So what is the problem here? It is not the Internet connection, and I have ruled out the OS, FAH client and setup. Would you please consider that the server is having some type of issue?


[23:46:00] Completed 2500 out of 250000 steps (1%) 
[00:20:41] Completed 5000 out of 250000 steps (2%) 
[00:28:02] - Couldn't send HTTP request to server 
[00:28:02] + Could not connect to Work Server (results) 
[00:28:02] (130.237.232.237:80) 
[00:28:02] - Error: Could not transmit unit 01 (completed November 22) to work server. 
[00:28:02] Keeping unit 01 in queue. 
[00:55:24] Completed 7500 out of 250000 steps (3%) 
[01:30:10] Completed 10000 out of 250000 steps (4%) 
[02:04:55] Completed 12500 out of 250000 steps (5%) 
[02:39:55] Completed 15000 out of 250000 steps (6%) 


--- Opening Log file [November 23 03:14:52 UTC] 


# Linux SMP Console Edition ################################################### 
############################################################################### 

Folding@Home Client Version 6.34 

http://folding.stanford.edu 

############################################################################### 
############################################################################### 

Launch directory: /usr/local/fah 
Executable: ./fah6 
Arguments: -bigadv -smp 

[03:14:52] - Ask before connecting: No 
[03:14:52] - User name: Horvat (Team 111065) 
[03:14:52] - User ID: xxxxxxxxxxxxxx
[03:14:52] - Machine ID: 1 
[03:14:52] 
[03:14:53] Loaded queue successfully. 
[03:14:53] 
[03:14:53] + Processing work unit 
[03:14:53] Core required: FahCore_a5.exe 
[03:14:53] Core found. 
[03:14:53] Project: 6903 (Run 8, Clone 9, Gen 29) 


[03:14:53] Working on queue slot 02 [November 23 03:14:53 UTC] 
[03:14:53] + Working ... 
[03:14:53] + Attempting to send results [November 23 03:14:53 UTC] 
[03:14:53] 
[03:14:56] *------------------------------* 
[03:14:56] Folding@Home Gromacs SMP Core 
[03:14:56] Version 2.27 (Thu Feb 10 09:46:40 PST 2011) 
[03:14:57] 
[03:14:57] Preparing to commence simulation 
[03:14:57] - Ensuring status. Please wait. 
[03:15:03] - Looking at optimizations... 
[03:15:03] - Working with standard loops on this execution. 
[03:15:03] - Previous termination of core was improper. 
[03:15:03] - Files status OK 
[03:15:10] - Expanded 57243802 -> 71846524 (decompressed 50.4 percent) 
[03:15:10] Called DecompressByteArray: compressed_data_size=57243802 data_size=71846524, decompressed_data_size=71846524 diff=0 
[03:15:10] - Digital signature verified 
[03:15:10] 
[03:15:10] Project: 6903 (Run 2, Clone 23, Gen 1) 
[03:15:10] 
[03:15:10] Entering M.D. 
[03:15:16] Using Gromacs checkpoints 
[03:15:22] Mapping NT from 24 to 24 
[03:15:54] Resuming from checkpoint 
[03:16:12] Verified work/wudata_02.log 
[03:16:13] Verified work/wudata_02.trr 
[03:16:13] Verified work/wudata_02.xtc 
[03:16:13] Verified work/wudata_02.edr 
[03:16:14] Completed 17245 out of 250000 steps (6%) 
[03:19:51] Completed 17500 out of 250000 steps (7%) 
[03:54:30] + Results successfully sent 
[03:54:30] Thank you for your contribution to Folding@Home. 
[03:54:30] + Starting local stats count at 1 
[03:55:36] Completed 20000 out of 250000 steps (8%) 
[04:31:19] Completed 22500 out of 250000 steps (9%)

Mod Edit: Added Code Tags - PantherX
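
The send attempts in the logs above can be timed mechanically rather than by eye. The following is a minimal sketch, not part of the FAH client: the function name `upload_attempts` is hypothetical, and it assumes that each "Attempting to send results" or "Retrying using alternative port" line starts an attempt and that the next "Couldn't send HTTP request" or "Results successfully sent" line ends it. Because the log timestamps carry no date, a clock value that goes backwards is treated as a midnight rollover.

```python
import re

# Matches the "[HH:MM:SS] message" shape used by the client log lines above.
LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")


def upload_attempts(log_text):
    """Return a list of (duration_seconds, outcome) for each send attempt."""
    attempts = []
    start = None      # time the current attempt began, or None
    day = 0           # number of midnight rollovers seen so far
    prev_secs = None  # previous time-of-day, to detect rollover
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # banner lines and blank lines carry no timestamp
        secs = int(m[1]) * 3600 + int(m[2]) * 60 + int(m[3])
        if prev_secs is not None and secs < prev_secs:
            day += 1  # assumption: clock going backwards means a new day
        prev_secs = secs
        now = day * 86400 + secs
        msg = m[4]
        if "Attempting to send results" in msg or "Retrying using alternative port" in msg:
            start = now
        elif start is not None and "Couldn't send HTTP request" in msg:
            attempts.append((now - start, "failed"))
            start = None
        elif start is not None and "Results successfully sent" in msg:
            attempts.append((now - start, "sent"))
            start = None
    return attempts


SAMPLE = """\
[22:17:15] + Attempting to send results [November 22 22:17:15 UTC]
[23:04:01] - Couldn't send HTTP request to server
[23:04:01] + Could not connect to Work Server (results)
[23:04:01] (130.237.232.237:8080)
[23:04:01] + Retrying using alternative port
"""

for seconds, outcome in upload_attempts(SAMPLE):
    minutes, rest = divmod(seconds, 60)
    print(f"{outcome} after {minutes}m{rest:02d}s")  # failed after 46m46s
```

Run over the excerpts in this thread, the same function gives a failed retry on port 80 from 23:35:58 to 00:28:02 and the successful send from 03:14:53 to 03:54:30.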

Post by **Horvat** » Sun Nov 27, 2011 9:52 pm

I will end this thread since no one cares to step up. All of a sudden my servers are magically connecting and sending the WU's with no problem. Thanks for nothing. I know, it was my hardware and it magically fixed itself.

Post by **Grandpa_01** » Sun Nov 27, 2011 10:23 pm

Horvat, if anybody on this forum had been able to help, they would have. Sorry you feel you were neglected, but nobody had an answer to your problem.

Post by **Horvat** » Sun Nov 27, 2011 11:47 pm

Thank you Grandpa. It seems it was not a matter of me needing the help. I think it was a problem with their server. If I changed nothing on my part on 4 separate computers that were all having the same problem, and now they all upload properly, the problem was not mine, it was the PG servers. That was the inference of my last post.

Post by **k1wi** » Mon Nov 28, 2011 1:19 am

The problem could have been any of the numerous hops between your router and Sweden, where the server is located...

I'm not sure of your level of technical knowledge, so I'm writing a fairly comprehensive post that may be overly informative.

In any event, from your logs I can see that your machine spent over 45 minutes attempting to upload the larger work unit to the server. That indicates an upload speed of less than ~35kB/s, or ~0.285Mbit/s (this assumes that the file size was ~100MB; we do not know how much was actually uploaded in that period). Given that your upload speed is at least 1Mbit/s, as confirmed by your third-party tester, there is a bottleneck somewhere between your router and PG. The smaller file size of the 6901 work unit means that, even at this rate, it likely still uploaded in a decent amount of time.
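
The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch under the same assumptions as the paragraph above: a ~100MB payload (a guess, not a measured value) and the 22:17:15 to 23:04:01 window from the posted log, which is 2806 seconds. The helper names are made up for illustration, and the second calculation ignores protocol overhead.

```python
def upload_rate(bytes_sent, seconds):
    """Return (kB/s, Mbit/s) for a transfer, using decimal units."""
    kb_per_s = bytes_sent / seconds / 1_000
    mbit_per_s = bytes_sent * 8 / seconds / 1_000_000
    return kb_per_s, mbit_per_s


def transfer_seconds(size_bytes, link_mbit_per_s):
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)


# Window between "Attempting to send results" (22:17:15) and the failure (23:04:01).
WINDOW_S = (23 * 3600 + 4 * 60 + 1) - (22 * 3600 + 17 * 60 + 15)  # 2806 s

# Assumed payload of ~100MB, as in the post; the true amount sent is unknown.
kb, mbit = upload_rate(100_000_000, WINDOW_S)
print(f"{kb:.1f} kB/s, {mbit:.3f} Mbit/s")  # 35.6 kB/s, 0.285 Mbit/s

# For comparison: the compressed result size reported in the log
# ("Done: 230772291 -> 222420165") at a full 1 Mbit/s uplink.
print(f"{transfer_seconds(222_420_165, 1.0) / 60:.1f} minutes")  # 29.7 minutes
```

The second figure shows that even a healthy 1Mbit/s uplink needs roughly half an hour for a result of the size the log reports, so any slowdown along the route stretches an already long transfer.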

One reason to think it wasn't PG's servers is that there weren't a lot of other users posting in this thread reporting similar experiences, yet all of your machines were experiencing the issue, continuously, for an extended period of time. If the bottleneck were at PG's server, or at their network connection to the outside world, then all users would be affected and all would be unable to upload their work units. Four machines in one location failing to connect, with no systemic reports of issues from multiple locations globally, does not immediately suggest that the problem is with PG's servers. Rather, it points to an issue with the connection between that one location and the servers. Indeed, the lack of other people reporting an issue tends to suggest that the issue is closer to your end than to PG's: the closer the issue is to PG's server, the more people will be affected. (If the problem were the interconnect switch between Sweden and France, then all donors in the US and in many other countries whose ISP has interconnect deals with a carrier that routes to Sweden through that switch in France would be affected.)

As I said, any number of things could cause an issue between your location and PG's servers, 99% of them being out of PG's hands. For one thing, most speed tests measure only the path from you to your ISP (or to their closest testing server). Tests from you to an overseas location, such as across the Atlantic, often yield slower and fluctuating results. This can be due to a faulty interconnect causing odd routings that degrade performance, to your ISP not purchasing enough carrier bandwidth to accommodate demand, or to network congestion on one or more hops, amongst many other causes.
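
One way to test the path to the work server itself, rather than to an ISP's speed-test node, is to time a plain TCP connection to the address and ports that appear in the logs. The sketch below is an illustration only; `tcp_connect_time` is a hypothetical helper, not a FAH tool. It measures how long the connection takes to open and says nothing about sustained upload throughput, so a fast connect does not rule out a slow transfer.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=10.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def report(host, ports):
    """Print one line per port: connect time, or a failure notice."""
    for port in ports:
        elapsed = tcp_connect_time(host, port)
        if elapsed is None:
            print(f"{host}:{port} could not connect")
        else:
            print(f"{host}:{port} connected in {elapsed * 1000:.0f} ms")
```

Calling `report("130.237.232.237", (8080, 80))` from the affected machine, and again from a machine on a different ISP, would help show whether the failure follows the location or the server.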

The issue magically resolving itself leads me to suspect that there was an issue somewhere between your router and PG that was resolved. 'Magical connecting' does not mean that PG fixed something without saying anything; it means something was fixed somewhere. It does not follow that the issue was PG's servers.

But then this is just the view of someone who'd be blinded by the loss of light if Pande Group comes to an abrupt stop.
