130.237.232.237 going down for maintenance
Posted: Thu Apr 05, 2012 10:49 am
by tofuwombat
I cannot ping this work server.
Is there a way for me to send this finished WU to a machine that wants it?
Seems odd that the collection server is 0.0.0.0
Code:
Slot 05 Done
Project: 6903 (Run 5, Clone 7, Gen 74), Core: a5
Work server: 130.237.232.237:8080
Collection server: 0.0.0.0
Download date: April 1 23:58:50
Finished date: April 5 01:30:24
Failed uploads: 9
Code:
./fah6 -send all -bigadv
Note: Please read the license agreement (fah6 -license). Further
use of this software requires that you have read and accepted this agreement.
--- Opening Log file [April 5 10:28:05 UTC]
# Linux Console Edition #######################################################
###############################################################################
Folding@Home Client Version 6.34
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/tofuwombat/fah
Executable: ./fah6
Arguments: -send all -bigadv
[10:28:05] - Ask before connecting: No
[10:28:05] - User name: tofuwombat (Team 155278)
[10:28:05] - User ID:*********************************
[10:28:05] - Machine ID: 4
[10:28:05]
[10:28:05] Loaded queue successfully.
[10:28:05] Attempting to return result(s) to server...
[10:28:05] Project: 6903 (Run 5, Clone 7, Gen 74)
[10:28:05] - Read packet limit of 540015616... Set to 524286976.
[10:28:05] + Attempting to send results [April 5 10:28:05 UTC]
[10:29:57] - Couldn't send HTTP request to server
[10:29:57] + Could not connect to Work Server (results)
[10:29:57] (130.237.232.237:8080)
[10:29:57] + Retrying using alternative port
[10:31:40] - Couldn't send HTTP request to server
[10:31:40] + Could not connect to Work Server (results)
[10:31:40] (130.237.232.237:80)
[10:31:40] - Error: Could not transmit unit 05 (completed April 5) to work server.
[10:31:40] Keeping unit 05 in queue.
[10:31:40] - Failed to send all units to server
Folding@Home Client Shutdown.
tofuwombat@schnellzug:~/fah$
Re: Failed uploads: 9 cannot ping 130.237.232.237:8080
Posted: Thu Apr 05, 2012 11:09 am
by bollix47
Something is definitely going on with that server. The client type has changed from SMP to classic in the past few hours and the info at the end of the status line is mostly missing. Possibly the server is being worked on and might be back to 'normal' soon.
Did your next WU download okay?
Re: Failed uploads: 9 cannot ping 130.237.232.237:8080
Posted: Thu Apr 05, 2012 1:21 pm
by bollix47
Just finished downloading a WU from this server and another WU is currently uploading to it. The client designation is back to SMP. Appears to be working 'normal' again.
Re: Failed uploads: 9 cannot ping 130.237.232.237:8080
Posted: Thu Apr 05, 2012 5:30 pm
by tofuwombat
Yes, the next WU was downloaded, but I still cannot return the finished one.
Machine is pingable with ping from the command line. Cannot upload, cannot get an "OK" with Firefox.
Thanks for the news that it is working for someone.
I'm not convinced my new machine is set up right. Most of my folding has been on Windows with GPUs. This box is Kubuntu 10.10/kraken/langouste, built especially for high-value BigAdv.
It runs fine on SMP without -bigadv, but it is frustrating to fold long enough to send in twelve SMP WUs (3 days) and have nothing to show for it but a stranded file.
Bonus points evaporate fast.
IF this is NOT my error, is BigAdv STILL broken on the server end? If so (and this is a frustrating way to find out) will an "all clear" message get posted, when it is really fixed?
I expected some teething problems on my end with the new box, but how can I tell if the connection fault is mine or not?
It feels like it would be handy to be able to redirect a finished WU to a slightly smarter collection server (or a long list of dumb ones): some flag that is the analog of talking to the manager at a store.
Like
-sendboss x
or
-ipsend x blah.blah.blah.blah
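On the question of whether the connection fault is local: ping only exercises ICMP, so a host can answer pings while its TCP port 8080 is still unreachable from your side. A minimal Python sketch of a direct TCP check follows. The function name `tcp_port_open` and the 5-second timeout are illustrative assumptions, not part of the fah6 client.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tcp_port_open("130.237.232.237", 8080)` and then the same with port 80 mirrors the two attempts shown in the client log. If both fail from your box but succeed from another network, the fault is more likely on your end.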
Re: Failed uploads: 9 cannot ping 130.237.232.237:8080
Posted: Thu Apr 05, 2012 6:15 pm
by bollix47
It's been a while since I used langouste but I seem to recall that once langouste takes the file for upload you can't use the -send all because the results are no longer in the work directory. I think it moves the results to a langouste sub-directory but since it's been so long I don't remember the location. Maybe ask in the langouste thread how to resend a failed upload:
viewtopic.php?f=14&t=11615&hilit=langouste
It could be that when the client's autosend kicks in (can be up to 6 hours) then langouste will try to send again but you'll get more support in that 3rd party thread.
Re: Failed uploads: 9 cannot ping 130.237.232.237:8080
Posted: Thu Apr 05, 2012 7:45 pm
by bruce
tofuwombat wrote:I cannot ping this work server.
Is there a way for me to send this finished WU to a machine that wants it?
Seems odd that the collection server is 0.0.0.0
The machine that wants it is never the Collection Server (CS), it's your Work Server (WS). A CS is a redundant server that can sometimes accept an upload and forward it to the actual WS later. Not having a working CS isn't a problem except after the assigned WS fails.
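The order of attempts can be sketched roughly as follows. This is only an illustration of the WS-first, CS-as-fallback idea and of the 8080-then-80 retry visible in the log above, not the client's actual code. The names `upload_targets` and `try_upload`, the treatment of 0.0.0.0 as "no CS assigned", and the use of the same two ports for the CS are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]


def upload_targets(work_server: str, collection_server: str) -> List[Endpoint]:
    """List endpoints in the order to try: WS on 8080, WS on 80, then the CS if any."""
    targets = [(work_server, 8080), (work_server, 80)]
    # Assumption: a CS of 0.0.0.0 means no collection server was assigned.
    if collection_server != "0.0.0.0":
        targets += [(collection_server, 8080), (collection_server, 80)]
    return targets


def try_upload(targets: List[Endpoint],
               send: Callable[[str, int], bool]) -> Optional[Endpoint]:
    """Try each endpoint in turn; return the one that accepted the upload, else None."""
    for host, port in targets:
        if send(host, port):
            return (host, port)
    # Nothing accepted the result: the unit stays in the queue for a later retry.
    return None
```

With a CS of 0.0.0.0, only the two WS attempts are made, which matches the "Keeping unit 05 in queue" outcome in the log.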
tofuwombat wrote:Machine is pingable with ping from the command line. Cannot upload, cannot get an "OK" with Firefox.
The OK message is gradually being removed since it's unnecessary. If you get a 404 error, the server is down. If you get either an OK message or a blank screen, the server is operational at some basic level.
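That rule of thumb can be written down as a small Python function. This is a sketch only: `classify_probe` is a hypothetical helper, and the "unreachable" and "unknown" outcomes are added assumptions to cover cases the rule does not mention (no HTTP response at all, or any other status).

```python
from typing import Optional


def classify_probe(status: Optional[int], body: str = "") -> str:
    """Map a browser-style probe of the server to a rough state."""
    if status is None:
        # Assumption: None stands for "no HTTP response was received".
        return "unreachable"
    if status == 404:
        return "down"
    if status == 200 and body.strip() in ("", "OK"):
        # An OK message or a blank page both count as basically operational.
        return "operational"
    return "unknown"
```

A blank page therefore counts as good news, while a request that never gets a response is a different case from a 404.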
Re: Failed uploads: 9 cannot ping 130.237.232.237:8080
Posted: Thu Apr 05, 2012 10:39 pm
by tofuwombat
Beautiful, timely answers! Thank you all.
130.237.232.237 going down for maintenance
Posted: Mon Apr 09, 2012 3:57 pm
by kasson
It looks like we might have a problem on 130.237.232.237; we're taking it down in order to investigate.
Re: 130.237.232.237 going down for maintenance
Posted: Mon Apr 09, 2012 6:47 pm
by kasson
We're bringing the server back up now; we'll see how it does.
Re: 130.237.232.237 going down for maintenance
Posted: Mon Apr 09, 2012 11:45 pm
by KMac
I am now running again, so this is just FYI. It occurred after the server was brought back online. I see the FILE_IO_ERROR, but I thought the file transfer size might be related to the server issue, since a reboot did not resolve the error.
One of my machines entered the following loop. I deleted the log, queue and machinedependent.dat to resolve the issue.
Code:
--- Opening Log file [April 9 22:47:03 UTC]
# Linux SMP Console Edition ###################################################
###############################################################################
Folding@Home Client Version 6.34
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/kevin/fah
Executable: ./fah6
Arguments: -smp -bigadv -verbosity 9
[22:47:03] - Ask before connecting: No
[22:47:03] - User name: KMac (Team 33)
[22:47:03] - User ID: ******************
[22:47:03] - Machine ID: 1
[22:47:03]
[22:47:03] Loaded queue successfully.
[22:47:03]
[22:47:03] - Autosending finished units... [April 9 22:47:03 UTC]
[22:47:03] + Processing work unit
[22:47:03] Trying to send all finished work units
[22:47:03] Core required: FahCore_a5.exe
[22:47:03] + No unsent completed units remaining.
[22:47:03] - Autosend completed
[22:47:03] Core found.
[22:47:03] Working on queue slot 07 [April 9 22:47:03 UTC]
[22:47:03] + Working ...
[22:47:03] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 07 -np 48 -priority 96 -checkpoint 15 -verbose -lifeline 5275 -version 634'
[22:47:03]
[22:47:03] *------------------------------*
[22:47:03] Folding@Home Gromacs SMP Core
[22:47:03] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[22:47:03]
[22:47:03] Preparing to commence simulation
[22:47:03] - Looking at optimizations...
[22:47:03] - Created dyn
[22:47:03] - Files status OK
[22:47:03] Couldn't Decompress
[22:47:03] Called DecompressByteArray: compressed_data_size=0 data_size=0, decompressed_data_size=0 diff=0
[22:47:03] -Error: Couldn't update checksum variables
[22:47:03] Error: Could not open work file
[22:47:03]
[22:47:03] Folding@home Core Shutdown: FILE_IO_ERROR
[22:47:03] CoreStatus = 75 (117)
[22:47:03] Error opening or reading from a file.
[22:47:03] Deleting current work unit & continuing...
[22:47:03] Trying to send all finished work units
[22:47:03] + No unsent completed units remaining.
[22:47:03] - Preparing to get new work unit...
[22:47:03] Cleaning up work directory
[22:47:03] + Attempting to get work packet
[22:47:03] Passkey found
[22:47:03] - Will indicate memory of 25545 MB
[22:47:03] - Connecting to assignment server
[22:47:03] Connecting to http://assign.stanford.edu:8080/
[22:47:04] Posted data.
[22:47:04] Initial: ED82; - Successful: assigned to (130.237.232.237).
[22:47:04] + News From Folding@Home: Welcome to Folding@Home
[22:47:04] Loaded queue successfully.
[22:47:04] Sent data
[22:47:04] Connecting to http://130.237.232.237:8080/
[22:47:04] Posted data.
[22:47:04] Initial: 0000; - Receiving payload (expected size: 512)
[22:47:04] Conversation time very short, giving reduced weight in bandwidth avg
[22:47:04] - Downloaded at ~1 kB/s
[22:47:04] - Averaged speed for that direction ~1 kB/s
[22:47:04] + Received work.
[22:47:04] + Closed connections
[22:47:09]
[22:47:09] + Processing work unit
[22:47:09] Core required: FahCore_a5.exe
[22:47:09] Core found.
[22:47:09] Working on queue slot 08 [April 9 22:47:09 UTC]
[22:47:09] + Working ...
[22:47:09] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 08 -np 48 -priority 96 -checkpoint 15 -verbose -lifeline 5275 -version 634'
[22:47:09]
[22:47:09] *------------------------------*
[22:47:09] Folding@Home Gromacs SMP Core
[22:47:09] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[22:47:09]
[22:47:09] Preparing to commence simulation
[22:47:09] - Looking at optimizations...
[22:47:09] - Created dyn
[22:47:09] - Files status OK
[22:47:09] Couldn't Decompress
[22:47:09] Called DecompressByteArray: compressed_data_size=0 data_size=0, decompressed_data_size=0 diff=0
[22:47:09] -Error: Couldn't update checksum variables
[22:47:09] Error: Could not open work file
[22:47:09]
[22:47:09] Folding@home Core Shutdown: FILE_IO_ERROR
[22:47:10] CoreStatus = 75 (117)
[22:47:10] Error opening or reading from a file.
[22:47:10] Deleting current work unit & continuing...
[22:47:10] Trying to send all finished work units
[22:47:10] + No unsent completed units remaining.
[22:47:10] - Preparing to get new work unit...
[22:47:10] Cleaning up work directory
[22:47:10] + Attempting to get work packet
[22:47:10] Passkey found
[22:47:10] - Will indicate memory of 25545 MB
[22:47:10] - Connecting to assignment server
[22:47:10] Connecting to http://assign.stanford.edu:8080/
[22:47:10] Posted data.
[22:47:10] Initial: ED82; - Successful: assigned to (130.237.232.237).
[22:47:10] + News From Folding@Home: Welcome to Folding@Home
[22:47:10] Loaded queue successfully.
[22:47:10] Sent data
[22:47:10] Connecting to http://130.237.232.237:8080/
[22:47:10] Posted data.
[22:47:10] Initial: 0000; - Receiving payload (expected size: 512)
[22:47:10] Conversation time very short, giving reduced weight in bandwidth avg
[22:47:10] - Downloaded at ~1 kB/s
[22:47:10] - Averaged speed for that direction ~1 kB/s
[22:47:10] + Received work.
[22:47:10] + Closed connections
[22:47:15]
[22:47:15] + Processing work unit
[22:47:15] Core required: FahCore_a5.exe
[22:47:15] Core found.
[22:47:15] Working on queue slot 09 [April 9 22:47:15 UTC]
[22:47:15] + Working ...
[22:47:15] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 09 -np 48 -priority 96 -checkpoint 15 -verbose -lifeline 5275 -version 634'
[22:47:16]
[22:47:16] *------------------------------*
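A loop like the one above can be spotted by counting two markers in the log: core shutdowns with FILE_IO_ERROR, and downloads whose expected payload size is implausibly small (512 bytes here, against tens of megabytes for a normal bigadv unit). A minimal Python sketch follows. `summarize_log` and the 1024-byte threshold are assumptions chosen for illustration, and the patterns are taken only from the log lines shown in this thread.

```python
import re
from typing import Dict, Iterable

# Matches lines such as "... Receiving payload (expected size: 512)".
PAYLOAD_RE = re.compile(r"Receiving payload \(expected size: (\d+)\)")


def summarize_log(lines: Iterable[str], tiny_bytes: int = 1024) -> Dict[str, int]:
    """Count FILE_IO_ERROR shutdowns and payloads smaller than `tiny_bytes`."""
    io_errors = 0
    tiny_payloads = 0
    for line in lines:
        if "Core Shutdown: FILE_IO_ERROR" in line:
            io_errors += 1
        match = PAYLOAD_RE.search(line)
        if match and int(match.group(1)) < tiny_bytes:
            tiny_payloads += 1
    return {"io_errors": io_errors, "tiny_payloads": tiny_payloads}
```

Several FILE_IO_ERROR shutdowns paired with equally many tiny payloads within a few seconds suggest the server is handing out empty work units rather than a local disk problem.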
Re: 130.237.232.237 going down for maintenance
Posted: Tue Apr 10, 2012 1:58 am
by EXT64
I am having trouble uploading a 6901 WU to this server. It has been trying for most of the afternoon/evening without success. Should this server be back up now, or is this WU now garbage?
Re: 130.237.232.237 going down for maintenance
Posted: Tue Apr 10, 2012 7:23 am
by noname2
It looks like the server is already running today. Thanks.
[01:56:55] Project: 6098 (Run 0, Clone 20, Gen 116)
[01:56:55] + Attempting to send results [April 10 01:56:55 UTC]
[01:57:29] + Results successfully sent
[01:57:29] Thank you for your contribution to Folding@Home.
[01:57:29] + Number of Units Completed: 14
[01:57:30] - Preparing to get new work unit...
[01:57:30] Cleaning up work directory
[01:57:30] + Attempting to get work packet
[01:57:30] Passkey found
[01:57:30] - Connecting to assignment server
[01:57:32] - Successful: assigned to (130.237.232.237).
[01:57:32] + News From Folding@Home: Welcome to Folding@Home
[01:57:32] Loaded queue successfully.
[01:57:48] + Closed connections
[01:57:48]
[01:57:48] + Processing work unit
[01:57:48] Core required: FahCore_a5.exe
[01:57:48] Core not found.
[01:57:48] - Core is not present or corrupted.
[01:57:48] - Attempting to download new core...
[01:57:48] + Downloading new core: FahCore_a5.exe
[01:57:49] + 10240 bytes downloaded
[...]
[01:58:11] + 2776254 bytes downloaded
[01:58:11] Verifying core Core_a5.fah...
[01:58:11] Signature is VALID
[01:58:11]
[01:58:11] Trying to unzip core FahCore_a5.exe
[01:58:11] Decompressed FahCore_a5.exe (6272504 bytes) successfully
[01:58:11] + Core successfully engaged
[01:58:16]
[01:58:16] + Processing work unit
[01:58:16] Core required: FahCore_a5.exe
[01:58:16] Core found.
[01:58:16] Working on queue slot 02 [April 10 01:58:16 UTC]
[01:58:16] + Working ...
[01:58:17]
[01:58:17] *------------------------------*
[01:58:17] Folding@Home Gromacs SMP Core
[01:58:17] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[01:58:17]
[01:58:17] Preparing to commence simulation
[01:58:17] - Assembly optimizations manually forced on.
[01:58:17] - Not checking prior termination.
[01:58:18] - Expanded 24865295 -> 30796292 (decompressed 123.8 percent)
[01:58:18] Called DecompressByteArray: compressed_data_size=24865295 data_size=30796292, decompressed_data_size=30796292 diff=0
[01:58:19] - Digital signature verified
[01:58:19]
[01:58:19] Project: 6901 (Run 2, Clone 8, Gen 113)
[01:58:19]
[01:58:19] Assembly optimizations on if available.
[01:58:19] Entering M.D.
[01:58:25] Mapping NT from 12 to 12
[01:58:29] Completed 0 out of 250000 steps (0%)
[02:19:17] Completed 2500 out of 250000 steps (1%)
[02:40:07] Completed 5000 out of 250000 steps (2%)
[03:00:54] Completed 7500 out of 250000 steps (3%)
[03:21:39] Completed 10000 out of 250000 steps (4%)
[03:42:27] Completed 12500 out of 250000 steps (5%)
[04:03:13] Completed 15000 out of 250000 steps (6%)
[04:23:59] Completed 17500 out of 250000 steps (7%)
[04:44:49] Completed 20000 out of 250000 steps (8%)
[05:05:45] Completed 22500 out of 250000 steps (9%)
[05:27:06] Completed 25000 out of 250000 steps (10%)
[05:48:25] Completed 27500 out of 250000 steps (11%)
[06:09:36] Completed 30000 out of 250000 steps (12%)
[06:30:43] Completed 32500 out of 250000 steps (13%)
[06:52:28] Completed 35000 out of 250000 steps (14%)
Re: 130.237.232.237 going down for maintenance
Posted: Tue Apr 10, 2012 11:10 am
by EXT64
I can also confirm it is back up and accepting. It turns out this was just a coincidence and the problem was actually on my end. I am using Ubuntu 12.04, which was working great. Yesterday though, I could connect to download (to F@H and other servers) but not upload (even though I could browse the web). I tried restarting the client a couple times but no luck. Finally, I stopped the client, installed the latest updates, and rebooted. Now it appears to be uploading and downloading normally. Not sure if it just needed a reboot or if one of the older updates was actually defective, but now it seems to be fixed. Thanks for keeping this post up to date so that I could realize the problem was on my end and not yours.
Re: 130.237.232.237 going down for maintenance
Posted: Tue Apr 10, 2012 3:00 pm
by bruce
Updates for Ubuntu 12.04 or for something else?
Re: 130.237.232.237 going down for maintenance
Posted: Thu Apr 12, 2012 3:26 am
by EXT64
It was either Ubuntu (12.04) or the computer just needed a reboot. Here is what happened:
Previous week, folding was normal on Ubuntu 12.04
Installed updates to 12.04 (reboot)
Started folding again, could download WUs but not upload. Then tried an internet speed test. I could again download but not upload. Restarted folding client several times, problem remained.
Stopped folding, installed new 12.04 updates (it had been a couple days), (reboot)
Started folding again and everything worked (and has ever since I think).
So I can't be 100% sure, but it seems that one Ubuntu 12.04 update may have broken it and a later one fixed it. I'm not sure which update it was (driver, firmware, network manager, etc.).