Can't send work units for weeks now

Posted: Tue Feb 03, 2009 12:14 am
by eberlyml
I have been having problems sending work units back since early January, and have posted about specific server issues a few times. Rather than go back through all of that, I will paste from the log of the latest failure to see if anyone can spot a reason for the problems I'm having. It started with the Windows 6.20 client, and after losing about 6 of the last 7 units I decided to remove the 6.20 client and install 6.23. Well, guess what! You guessed it: same thing on the first unit that finished with it. I have been folding with this same machine on the same network for close to a year now with no problems before this, so I question whether it has anything to do with the servers anymore. Can someone please take a look at the log below and see if anything indicates what is going on here? I am aware of no changes in my network that could be causing this. All the server addresses give me an "OK" when put in a browser.

TIA,
PME

[19:24:09] Completed 198000 out of 200000 steps  (99%)
[19:39:12] Timer requesting checkpoint
[19:49:39] Completed 200000 out of 200000 steps  (100%)
[19:50:43] 
[19:50:43] Finished Work Unit:
[19:50:43] - Reading up to 2269856 from "work/wudata_01.trr": Read 2269856
[19:50:44] - Reading up to 121256 from "work/wudata_01.xtc": Read 121256
[19:50:44] xvg file size: 65369
[19:50:44] logfile size: 38466
[19:50:44] Leaving Run
[19:50:49] - Writing 2581287 bytes of core data to disk...
[19:50:49]   ... Done.
[19:50:49] - Shutting down core
[19:50:49] 
[19:50:49] Folding@home Core Shutdown: FINISHED_UNIT
[19:50:52] CoreStatus = 64 (100)
[19:50:52] Sending work to server
[19:50:52] Project: 3858 (Run 6376, Clone 0, Gen 28)


[19:50:52] + Attempting to send results [February 1 19:50:52 UTC]
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:8080)
[19:50:53] + Retrying using alternative port
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:80)
[19:50:53] - Error: Could not transmit unit 01 (completed February 1) to work server.
[19:50:53]   Keeping unit 01 in queue.
[19:50:53] Project: 3858 (Run 6376, Clone 0, Gen 28)


[19:50:53] + Attempting to send results [February 1 19:50:53 UTC]
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:8080)
[19:50:53] + Retrying using alternative port
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:80)
[19:50:53] - Error: Could not transmit unit 01 (completed February 1) to work server.


[19:50:53] + Attempting to send results [February 1 19:50:53 UTC]
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (171.65.103.100:8080)
[19:50:53] + Retrying using alternative port
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (171.65.103.100:80)
[19:50:53]   Could not transmit unit 01 to Collection server; keeping in queue.
[19:50:53] - Preparing to get new work unit...
[19:50:53] + Attempting to get work packet
[19:50:53] - Connecting to assignment server
[19:50:53] - Successful: assigned to (128.59.74.4).
[19:50:53] + News From Folding@Home: Welcome to Folding@Home
[19:50:54] Loaded queue successfully.
[19:50:56] Project: 3858 (Run 6376, Clone 0, Gen 28)


[19:50:56] + Attempting to send results [February 1 19:50:56 UTC]
[19:50:56] - Couldn't send HTTP request to server
[19:50:56] + Could not connect to Work Server (results)
[19:50:56]     (128.59.74.4:8080)
[19:50:56] + Retrying using alternative port
[19:50:56] - Couldn't send HTTP request to server
[19:50:56] + Could not connect to Work Server (results)
[19:50:56]     (128.59.74.4:80)
[19:50:56] - Error: Could not transmit unit 01 (completed February 1) to work server.


[19:50:56] + Attempting to send results [February 1 19:50:56 UTC]
[19:50:56] - Couldn't send HTTP request to server
[19:50:56] + Could not connect to Work Server (results)
[19:50:56]     (171.65.103.100:8080)
[19:50:56] + Retrying using alternative port
[19:50:56] - Couldn't send HTTP request to server
[19:50:56] + Could not connect to Work Server (results)
[19:50:56]     (171.65.103.100:80)
[19:50:56]   Could not transmit unit 01 to Collection server; keeping in queue.
[19:50:56] + Closed connections
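
For anyone comparing logs like the one above, a short script can tally which server and port combinations the client failed to reach. This is only a sketch: the function name `failed_endpoints` is made up for illustration, and the pattern is based solely on the log lines pasted in this post, not on any documented FAHlog format.

```python
import re
from collections import Counter

# Matches the indented "(host:port)" line that follows a failed attempt,
# e.g. "[19:50:53]     (128.59.74.4:8080)". Pattern inferred from this log only.
ENDPOINT_RE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$"
)


def failed_endpoints(log_text):
    """Count each (host, port) printed right after a 'Could not connect' line."""
    counts = Counter()
    previous_failed = False
    for line in log_text.splitlines():
        match = ENDPOINT_RE.match(line)
        if match and previous_failed:
            counts[(match.group(1), int(match.group(2)))] += 1
        previous_failed = "Could not connect to Work Server" in line
    return counts


SAMPLE = """\
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:8080)
[19:50:53] + Retrying using alternative port
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:80)
"""

for (host, port), n in sorted(failed_endpoints(SAMPLE).items()):
    print(f"{host}:{port} failed {n} time(s)")
```

Running it over the full log would show whether every failure involves the same two addresses or whether other servers are affected too.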

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 12:48 am
by DanGe
Are you able to download WUs?

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 12:59 am
by eberlyml
DanGe wrote:Are you able to download WUs?

If you notice the following lines are in the middle of the log portion that I included in my original post.

[19:50:53] Could not transmit unit 01 to Collection server; keeping in queue.
[19:50:53] - Preparing to get new work unit...
[19:50:53] + Attempting to get work packet
[19:50:53] - Connecting to assignment server
[19:50:53] - Successful: assigned to (128.59.74.4).
[19:50:53] + News From Folding@Home: Welcome to Folding@Home
[19:50:54] Loaded queue successfully.
[19:50:56] Project: 3858 (Run 6376, Clone 0, Gen 28)


Does the new unit come from the same server I can't connect to when sending in a work unit, or is the assignment server different, and it just assigns this unit to be returned here? All seems kinda strange to me! :?

Thanks for looking,
PME

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 2:30 am
by DanGe
eberlyml wrote:If you notice the following lines
Oops, lol :oops: Didn't see that...

According to the server status page right now, 128.59.74.4 has a load of 97 while 171.65.103.100 has a load of 208. I can see why you can't connect to the second one (208 is rather high), but I'm not sure if 97 is high for the first server.

When you typed the server addresses in your browser, did you enter in the port numbers (e.g. 171.65.103.100:8080)?

If you did all that and if it is not the servers, then the last thing I can think of is your firewall.
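
The browser test with explicit port numbers can also be done from a script, which rules out browser caching or proxy settings. The sketch below only checks whether a TCP connection opens. It does not attempt an actual upload, so a success here does not prove the client's send would work. The helper name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

To check the servers from the log, call it for each address and port, for example `can_connect("128.59.74.4", 8080)` and `can_connect("171.65.103.100", 80)`.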

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 2:48 am
by eberlyml
DanGe wrote:
eberlyml wrote:If you notice the following lines
Oops, lol :oops: Didn't see that...

According to the server status page right now, 128.59.74.4 has a load of 97 while 171.65.103.100 has a load of 208. I can see why you can't connect to the second one (208 is rather high), but I'm not sure if 97 is high for the first server.

When you typed the server addresses in your browser, did you enter in the port numbers (e.g. 171.65.103.100:8080)?

Yes.

DanGe wrote:If you did all that and if it is not the servers, then the last thing I can think of is your firewall.
I checked the firewall logs numerous times, and tried turning the Windows firewall off a few times and then sending the finished WUs in. No change. I don't know why it would all of a sudden become a problem, or why it does not affect downloads.
Anyway, thanks for thinking through this with me. Looks like this machine will be taken out of the folding mix, and maybe retired, as it serves no other real purpose anymore (it's the only Windows machine left here anyway).

PME

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 9:25 pm
by codysluder
eberlyml wrote:Does the new unit come from the same server I can't connect to when sending in a work unit, or is the assignment server different, and it just assigns this unit to be returned here? All seems kinda strange to me! :?
Yes, it takes some time to understand how all the parts of FAH work together.

A new WU can come from a number of different servers. It must be returned to that same server or to a collection server. (The collection servers are supposed to be used only when the primary server is down or is overloaded.)

FAH has been experiencing a lot of server overload recently. The new server code will help that a lot when it comes on-line.
http://folding.typepad.com/news/2008/11/index.html
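
The return path described above (work server first, collection server as fallback) can be written out as an ordered list of attempts. This is a sketch inferred from the order of attempts visible in the log in the first post, not from the client's source code. The function name and the "work"/"collection" labels are made up for illustration.

```python
def upload_attempts(work_server, collection_server, ports=(8080, 80)):
    """List (role, host, port) tuples in the order the log suggests they are tried.

    Assumption: the primary port is 8080 and the "alternative port" is 80,
    as seen in the pasted log.
    """
    attempts = []
    for role, host in (("work", work_server), ("collection", collection_server)):
        if host is None:
            continue  # no server of this kind known for the unit
        for port in ports:
            attempts.append((role, host, port))
    return attempts


for role, host, port in upload_attempts("128.59.74.4", "171.65.103.100"):
    print(f"{role:<10} {host}:{port}")
```

In the original log all four attempts fail within the same second, which is part of what makes the problem look local rather than load-related.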

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 9:47 pm
by eberlyml
So if that is the case, I am getting a new work unit within seconds of not being able to send a finished unit back to the same server. Something is certainly wrong with that scenario!

Any other ideas certainly welcome here,
PME

Re: Can't send work units for weeks now

Posted: Tue Feb 03, 2009 10:06 pm
by eberlyml
Just got to a place where I could check this machine again. It finished another WU today and again is unable to send it in. I let it go with the 6.20 client until it had five units it was not able to send back. Then I took off 6.20 and installed 6.23, and I'm starting the same process all over again. It has no problem getting a new work unit to fold, but can send very few back anymore. I hate to stop folding on that machine, but it's not doing anyone any good this way either.

PME

Re: Can't send work units for weeks now

Posted: Fri Feb 06, 2009 6:00 pm
by bruce
Nobody has asked you about your firewall / security software. There's a reasonable probability that if you have some good security software (i.e., something more effective than what is supplied by Microsoft), it's blocking outgoing connections.

Shut down any program that might be a security risk if it connects to the internet. Temporarily disable all non-MS security software. Run fah with the -send all flag. If something uploads, you'll need to reconfigure your security software to give the FAH software more rights to connect to Stanford.

Whether it works or not, reboot to set everything back to normal and post that segment of FAHlog.

Re: Can't send work units for weeks now

Posted: Fri Feb 06, 2009 11:11 pm
by eberlyml
OK, here's what I've done so far. I have disabled/routed around everything that could be a problem including turning off the firewall in my dsl modem. I am still getting the same report in the FAHlog as I initially reported.

Now, one thing I have found. A little while ago I enabled logging on my DSL modem to see if it had anything to say about my problems. The security logs are showing this:
FIREWALL icmp check (1 of 2): Protocol: ICMP Src ip: 128.59.74.4 Dst ip: xx.xx.my.ip Type: Destination Unreachable Code: Communication with Destination Host is Administratively Prohibited
This happens every time I restart the fah client and it tries to send the finished work units.

So it logs it on the modem firewall even when the firewall is turned off on the modem. Does this have anything to do with my problems here?

ideas welcome,
PME
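
The modem log entry above can be split into its fields to make it easier to compare with other entries. The sketch below assumes the field labels (`Protocol`, `Src ip`, `Dst ip`, `Type`, `Code`) appear exactly as in the quoted line. The function name is hypothetical, and the numeric codes come from the standard ICMP "Destination Unreachable" (type 3) code assignments, not from the modem's documentation.

```python
import re

FIELD_NAMES = ("Protocol", "Src ip", "Dst ip", "Type", "Code")
_NAMES = "|".join(re.escape(name) for name in FIELD_NAMES)
# A field value runs until the next known label or the end of the line.
FIELD_RE = re.compile(rf"({_NAMES}):\s*(.*?)(?=\s+(?:{_NAMES}):|$)")

# Standard ICMP type 3 codes for the "administratively prohibited" family.
DEST_UNREACHABLE_CODES = {
    "Communication with Destination Network is Administratively Prohibited": 9,
    "Communication with Destination Host is Administratively Prohibited": 10,
    "Communication Administratively Prohibited": 13,
}


def parse_modem_line(line):
    """Return the labelled fields of one modem security-log line as a dict."""
    return dict(FIELD_RE.findall(line.strip()))


SAMPLE_LINE = (
    "FIREWALL icmp check (1 of 2): Protocol: ICMP Src ip: 128.59.74.4 "
    "Dst ip: xx.xx.my.ip Type: Destination Unreachable "
    "Code: Communication with Destination Host is Administratively Prohibited"
)

fields = parse_modem_line(SAMPLE_LINE)
print(fields["Src ip"], "->", fields["Dst ip"])
print("ICMP type 3 code", DEST_UNREACHABLE_CODES.get(fields["Code"]))
```

Note that the `Src ip` field carries the work server's address, which suggests the ICMP message arrived from outside rather than being generated by the modem itself. That reading is an inference from this single log line and may not hold for every modem.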

Re: Can't send work units for weeks now

Posted: Sat Feb 07, 2009 5:14 am
by Amaruk

[19:50:49] Folding@home Core Shutdown: FINISHED_UNIT
[19:50:52] CoreStatus = 64 (100)
[19:50:52] Sending work to server
[19:50:52] Project: 3858 (Run 6376, Clone 0, Gen 28)


[19:50:52] + Attempting to send results [February 1 19:50:52 UTC]
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:8080)
[19:50:53] + Retrying using alternative port
[19:50:53] - Couldn't send HTTP request to server
[19:50:53] + Could not connect to Work Server (results)
[19:50:53]     (128.59.74.4:80)
[19:50:53] - Error: Could not transmit unit 01 (completed February 1) to work server.
[19:50:53]   Keeping unit 01 in queue.
[19:50:53] Project: 3858 (Run 6376, Clone 0, Gen 28)

Project 3858 uses FahCore_7c.exe. Of the five cores I've used, this is the only one that required internet access.


If you haven't done so already, try giving FahCore_7c.exe permission to access the net.

Re: Can't send work units for weeks now

Posted: Sat Feb 07, 2009 5:43 am
by eberlyml
Thanks for your response, but I guess I don't follow you here. I have tried sending many units many times without firewalls in place and always get the same message. It can always download a new work unit, and it keeps trying to send finished units over and over but never succeeds. Do I need to do something with permissions on the core? (Not much of a Windows guy here.)

Re: Can't send work units for weeks now

Posted: Sat Feb 07, 2009 6:34 am
by John_Weatherman
eberlyml wrote:

Now, one thing I have found. A little while ago I enabled logging on my DSL modem to see if it had anything to say about my problems. The security logs are showing this:
FIREWALL icmp check (1 of 2): Protocol: ICMP Src ip: 128.59.74.4 Dst ip: xx.xx.my.ip Type: Destination Unreachable Code: Communication with Destination Host is Administratively Prohibited
This happens every time I restart the fah client and it tries to send the finished work units.

So it logs it on the modem firewall even when the firewall is turned off on the modem. Does this have anything to do with my problems here?

ideas welcome,
PME
Just a thought, are you actually turning off the firewall? Do you need to log in, for example, as an Administrator to turn it off? What is the type of modem you have, in case anyone else has had the same problem?

Re: Can't send work units for weeks now

Posted: Sat Feb 07, 2009 12:18 pm
by eberlyml
John_Weatherman wrote:Just a thought, are you actually turning off the firewall? Do you need to log in, for example, as an Administrator to turn it off? What is the type of modem you have, in case anyone else has had the same problem?

I am logging in as admin and turning off the firewall. I have multiple firewalls on my network and have at one point routed around every one of them, and wired the machine in question directly to the modem, turned off the modem firewall and the Windows firewall for a very short time and tried sending the finished units. Got the same exact report from FAHlog as I do with everything in place. If indeed it is on this end it must be some Windows setting or some setting I've yet to find on the modem, which by the way is a Speedtouch 546. Could the ICMP protocol check in the modem have anything to do with this mess?
The modem indicates that this can be turned off only via the CLI. As of yet I've not found any way of doing so, but I am still looking into that.

As always, thanks for your ideas

PME

Re: Can't send work units for weeks now

Posted: Sat Feb 07, 2009 1:57 pm
by Xilikon
Nobody asked if you added a rule to allow port 8080 since uploads use that port and downloads use port 80. I forgot to enable port 8080 before and I was able to download a unit but not upload.