RESOLVED: 171.64.65.54 overloaded / NOT accepting

Moderators: Site Moderators, FAHC Science Team

noorman
Posts: 270
Joined: Sun Dec 02, 2007 2:26 pm
Hardware configuration: Folders: Intel C2D E6550 @ 3.150 GHz + GPU XFX 9800GTX+ @ 765 MHz w. WinXP-GPU
AMD A2X64 3800+ @ stock + GPU XFX 9800GTX+ @ 775 MHz w. WinXP-GPU
Main rig: an old Athlon Barton 2500+ @2.25 GHz & 2* 512 MB RAM Apacer, Radeon 9800Pro, WinXP SP3+
Location: Belgium, near the International Sea-Port of Antwerp

RESOLVED: 171.64.65.54 overloaded / NOT accepting

Post by noorman »

.

Code:

8 cores detected
If you see this twice, MPI is working
If you see this twice, MPI is working


--- Opening Log file [June 14 15:20:45 UTC]


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding-at-Home
Executable: C:\Folding-at-Home\FaH6.exe
Arguments: -verbosity 9 -smp -send all

[15:20:45] - Ask before connecting: No
[15:20:45] - User name: noorman (Team 734)
[15:20:45] - User ID: 7D1BA32F532694B4
[15:20:45] - Machine ID: 1
[15:20:45]
[15:20:45] Loaded queue successfully.
[15:20:45] Attempting to return result(s) to server...
[15:20:45] Trying to send all finished work units
[15:20:45] Project: 6041 (Run 0, Clone 51, Gen 30)


[15:20:45] + Attempting to send results [June 14 15:20:45 UTC]
[15:20:45] - Reading file work/wuresults_05.dat from core
[15:20:45]   (Read 63945075 bytes from disk)
[15:20:45] Connecting to http://171.64.65.54:8080/

It just hangs there. I wasn't able to send my recently finished A3 core results when the client was running 'normally' either!
I just tried it with -send all: the same story (of course),
but this way I can restart it faster to try to get the results sent off.


.
Last edited by noorman on Thu Jun 17, 2010 6:18 pm, edited 2 times in total.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: 171.64.65.54 NOT Accepting

Post by kasson »

It's assigning and accepting right now. The server is under fairly heavy load--it's possible that all the work threads were busy when you tried to connect.
noorman

Re: 171.64.65.54 NOT Accepting

Post by noorman »

.

Code:

8 cores detected
If you see this twice, MPI is working
If you see this twice, MPI is working


--- Opening Log file [June 14 16:37:54 UTC]


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding-at-Home
Executable: C:\Folding-at-Home\FaH6.exe
Arguments: -verbosity 9 -smp -send all

[16:37:54] - Ask before connecting: No
[16:37:54] - User name: noorman (Team 734)
[16:37:54] - User ID: 7D1BA32F532694B4
[16:37:54] - Machine ID: 1
[16:37:54]
[16:37:54] Loaded queue successfully.
[16:37:54] Attempting to return result(s) to server...
[16:37:54] Trying to send all finished work units
[16:37:54] Project: 6041 (Run 0, Clone 51, Gen 30)


[16:37:54] + Attempting to send results [June 14 16:37:54 UTC]
[16:37:54] - Reading file work/wuresults_05.dat from core
[16:37:57]   (Read 63945075 bytes from disk)
[16:37:57] Connecting to http://171.64.65.54:8080/
[16:38:03] Posted data.
[16:38:03] Initial: 683C; + Could not connect to Work Server (results)
[16:38:03]     (171.64.65.54:8080)
[16:38:03] + Retrying using alternative port
[16:38:03] Connecting to http://171.64.65.54:80/
[16:38:08] Posted data.
[16:38:19] Initial: 683C; + Could not connect to Work Server (results)
[16:38:19]     (171.64.65.54:80)
[16:38:19] - Error: Could not transmit unit 05 (completed June 14) to work server.
[16:38:19] - 2 failed uploads of this unit.


[16:38:19] + Attempting to send results [June 14 16:38:19 UTC]
[16:38:19] - Reading file work/wuresults_05.dat from core
[16:38:19]   (Read 63945075 bytes from disk)
[16:38:19] Connecting to http://171.67.108.25:8080/

.

Still not doing it for me ...


.
noorman

Re: 171.64.65.54 NOT Accepting

Post by noorman »

.

After about 7 hours, my SMP Results finally got uploaded :shock:


.
Davabled
Posts: 2
Joined: Wed Jun 17, 2009 4:07 pm

Re: 171.64.65.54 overloaded / NOT accepting

Post by Davabled »

Still not accepting results for me as of 3 p.m. Pacific; I've been trying for a couple of hours. Same for server 171.67.108.25.
Snippet from log file:

Code:

[22:06:37] + Attempting to send results [June 14 22:06:37 UTC]
[22:06:58] - Couldn't send HTTP request to server
[22:06:58] + Could not connect to Work Server (results)
[22:06:58]     (171.64.65.54:8080)
[22:06:58] + Retrying using alternative port
[22:07:19] - Couldn't send HTTP request to server
[22:07:19] + Could not connect to Work Server (results)
[22:07:19]     (171.64.65.54:80)
[22:07:19] - Error: Could not transmit unit 01 (completed June 14) to work server.

[22:10:02] + Attempting to send results [June 14 22:10:02 UTC]
[22:10:06] + Could not connect to Work Server (results)
[22:10:06]     (171.67.108.25:8080)
[22:10:06] + Retrying using alternative port
[22:10:15] + Could not connect to Work Server (results)
[22:10:15]     (171.67.108.25:80)
[22:10:15]   Could not transmit unit 01 to Collection server; keeping in queue.
[22:10:45] Project: 6041 (Run 0, Clone 68, Gen 23)
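The log above shows the order in which the v6 client appears to try endpoints when returning a result: the work server on port 8080, then the "alternative port" 80, and, on a later attempt, the collection server 171.67.108.25 on the same two ports, after which the unit is kept in the queue. The sketch below models only that ordering as read from these logs; it is not the client's actual code, and the `send` callback and the function names are hypothetical. Timing is ignored (in noorman's log the collection server was tried immediately, in this one about three minutes later).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model of the upload order seen in the logs; NOT the F@H client code.
WORK_SERVER = "171.64.65.54"
COLLECTION_SERVER = "171.67.108.25"
PORTS = (8080, 80)  # primary port first, then the "alternative port"

Endpoint = Tuple[str, int]


def upload_order(work: str, collection: Optional[str]) -> List[Endpoint]:
    """Endpoints in the order the logs show them being tried."""
    hosts = [work] + ([collection] if collection else [])
    return [(host, port) for host in hosts for port in PORTS]


def try_upload(send: Callable[[str, int], bool],
               work: str = WORK_SERVER,
               collection: Optional[str] = COLLECTION_SERVER) -> Optional[Endpoint]:
    """Try each endpoint in turn; return the one that accepted the result,
    or None, in which case the unit would stay in the queue."""
    for host, port in upload_order(work, collection):
        if send(host, port):
            return (host, port)
    return None
```

For example, `try_upload(lambda host, port: False)` returns `None` (unit stays queued), while a `send` that succeeds only for `("171.67.108.25", 80)` returns that endpoint after three failed tries.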
kasson

Re: 171.64.65.54 overloaded / NOT accepting

Post by kasson »

Server .54 is currently down for maintenance on a RAID. Our sysadmins are aware this is a time-critical issue, and we'll get this up as soon as we can. No ETA, though.

@noorman, glad it worked. The system logs were a bit weird--in the middle of a bunch of accepts and assigns, there were connection attempts from your IP with nothing to follow. I'm not sure what to make of that.
Datsun 1600
Posts: 33
Joined: Mon May 05, 2008 2:42 am

Re: 171.64.65.54 overloaded / NOT accepting

Post by Datsun 1600 »

No joy here either returning WUs; at least with only one boxen on ATM, I am not overloading the system. I will see how your points allocation on P6701 turns out and decide whether I will continue racking up a large power bill.
noorman

Re: 171.64.65.54 overloaded / NOT accepting

Post by noorman »

.
kasson wrote:Server .54 is currently down for maintenance on a RAID. Our sysadmins are aware this is a time-critical issue, and we'll get this up as soon as we can. No ETA, though.

@noorman, glad it worked. The system logs were a bit weird--in the middle of a bunch of accepts and assigns, there were connection attempts from your IP with nothing to follow. I'm not sure what to make of that.
.

Here's where my log indicated that it was uploaded:

Code:

[17:19:08] + Attempting to send results [June 14 17:19:08 UTC]
[17:19:08] Core found.
[17:19:08] - Reading file work/wuresults_05.dat from core
[17:19:08] Working on queue slot 06 [June 14 17:19:08 UTC]
[17:19:10] + Working ...
[17:19:10]   (Read 63945075 bytes from disk)
[17:19:10] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 8 -nocpulock -checkpoint 3 -verbose -lifeline 596 -version 629'

[17:19:10] Connecting to http://171.64.65.54:8080/
[17:19:12] 
[17:19:12] *------------------------------*
[17:19:12] Folding@Home Gromacs SMP Core
[17:19:12] Version 2.19 (Mar 12, 2010)
[17:19:12] 
[17:19:12] Preparing to commence simulation
[17:19:12] - Ensuring status. Please wait.
[17:19:21] - Looking at optimizations...
[17:19:21] - Working with standard loops on this execution.
[17:19:21] - Previous termination of core was improper.
[17:19:21] - Going to use standard loops.
[17:19:21] - Files status OK
[17:19:22] - Expanded 1795892 -> 2078149 (decompressed 115.7 percent)
[17:19:22] Called DecompressByteArray: compressed_data_size=1795892 data_size=2078149, decompressed_data_size=2078149 diff=0
[17:19:22] - Digital signature verified
[17:19:22] 
[17:19:22] Project: 6012 (Run 2, Clone 319, Gen 125)
[17:19:22] 
[17:19:22] Entering M.D.
[17:19:28] Using Gromacs checkpoints
[17:19:30] Resuming from checkpoint
[17:19:30] Verified work/wudata_06.log
[17:19:30] Verified work/wudata_06.trr
[17:19:30] Verified work/wudata_06.edr
[17:19:31] Completed 44426 out of 500000 steps  (8%)
[17:21:55] Completed 45000 out of 500000 steps  (9%)
[17:43:53] Completed 50000 out of 500000 steps  (10%)
[17:55:19] Completed 55000 out of 500000 steps  (11%)
[17:59:29] Completed 60000 out of 500000 steps  (12%)
[18:03:21] Completed 65000 out of 500000 steps  (13%)
[18:07:11] Completed 70000 out of 500000 steps  (14%)
[18:11:02] Completed 75000 out of 500000 steps  (15%)
[18:14:57] Completed 80000 out of 500000 steps  (16%)
[18:18:52] Completed 85000 out of 500000 steps  (17%)
[18:22:44] Completed 90000 out of 500000 steps  (18%)
[18:26:36] Completed 95000 out of 500000 steps  (19%)
[18:30:23] Completed 100000 out of 500000 steps  (20%)
[18:30:45] Posted data.
[18:30:46] Initial: 0000; + Could not connect to Work Server (results)
[18:30:46]     (171.64.65.54:8080)
[18:30:46] + Retrying using alternative port
[18:30:46] Connecting to http://171.64.65.54:80/
[18:34:16] Completed 105000 out of 500000 steps  (21%)
[18:38:06] Posted data.
[18:38:06] Initial: 0000; Completed 110000 out of 500000 steps  (22%)
[18:38:07] + Results successfully sent
[18:38:07] Thank you for your contribution to Folding@Home.
[18:38:07] + Number of Units Completed: 15

[18:38:09] + Sent 1 of 1 completed units to the server
[18:38:09] - Autosend completed
[18:41:52] Completed 115000 out of 500000 steps  (23%)
[18:45:39] Completed 120000 out of 500000 steps  (24%)
.


.
noorman

Re: 171.64.65.54 overloaded / NOT accepting

Post by noorman »

kasson wrote:Server .54 is currently down for maintenance on a RAID. Our sysadmins are aware this is a time-critical issue, and we'll get this up as soon as we can. No ETA, though.

@noorman, glad it worked. The system logs were a bit weird--in the middle of a bunch of accepts and assigns, there were connection attempts from your IP with nothing to follow. I'm not sure what to make of that.
.

I had been trying to send those results by using a shortcut with the -send all switch.

Since that didn't work either, I gave up and relaunched F@H to try to fold some more.

BUT, as someone else also reported, folding seemed to get stuck when an automatic send sequence was initiated during folding!

Why would this happen, and is this a BUG?
I have never known a folding run to be 'sort of' paused while it tries to upload previous results from the queue ... :shock:

In those cases too, I stopped the client and restarted it shortly afterwards!


.
AlanH
Posts: 57
Joined: Mon Dec 03, 2007 9:54 pm

Re: 171.64.65.54 overloaded / NOT accepting

Post by AlanH »

My system returned a unit to this server yesterday but it is not shown as credited.

Code:


[19:05:46] Folding@home Core Shutdown: FINISHED_UNIT
[19:05:47] CoreStatus = 64 (100)
[19:05:47] Unit 6 finished with 90 percent of time to deadline remaining.
[19:05:47] Updated performance fraction: 0.877645
[19:05:47] Sending work to server
[19:05:47] Project: 6060 (Run 0, Clone 4, Gen 80)


[19:05:47] + Attempting to send results [June 14 19:05:47 UTC]
[19:05:47] - Reading file work/wuresults_06.dat from core
[19:05:47]   (Read 3801645 bytes from disk)
[19:05:47] Connecting to http://171.64.65.54:8080/
[19:07:08] Posted data.
[19:07:09] Initial: 0000; - Uploaded at ~45 kB/s
[19:07:09] - Averaged speed for that direction ~22 kB/s
[19:07:09] + Results successfully sent
[19:07:09] Thank you for your contribution to Folding@Home.
[19:07:09] + Number of Units Completed: 87
Folding for TeamCFC
- Mac Pro Dual 2.66GHz Xeon, 4 GBytes running Mac SMP2 client
Mactin
Posts: 222
Joined: Sun Dec 02, 2007 1:08 pm
Location: Côte-des-Neiges, Montréal, Québec

Re: 171.64.65.54 overloaded / NOT accepting

Post by Mactin »

I've been trying to send work since last evening (22h00 Eastern, 02h00 GMT).

I came into work and saw:

Code:

[02:12:19] Folding@home Core Shutdown: FINISHED_UNIT
[02:12:22] CoreStatus = 64 (100)
[02:12:22] Unit 7 finished with 86 percent of time to deadline remaining.
[02:12:22] Updated performance fraction: 0.865615
[02:12:22] Sending work to server
[02:12:22] Project: 6053 (Run 0, Clone 59, Gen 51)


[02:12:22] + Attempting to send results [June 15 02:12:22 UTC]
[02:12:22] - Reading file work/wuresults_07.dat from core
[02:12:22]   (Read 3799048 bytes from disk)
[02:12:22] Connecting to http://171.64.65.54:8080/
[02:12:43] - Couldn't send HTTP request to server
[02:12:43] + Could not connect to Work Server (results)
[02:12:43]     (171.64.65.54:8080)
[02:12:43] + Retrying using alternative port
[02:12:43] Connecting to http://171.64.65.54:80/
[02:13:05] - Couldn't send HTTP request to server
[02:13:05] + Could not connect to Work Server (results)
[02:13:05]     (171.64.65.54:80)
[02:13:05] - Error: Could not transmit unit 07 (completed June 15) to work server.
[02:13:05] - 1 failed uploads of this unit.
[02:13:05]   Keeping unit 07 in queue.
...
[13:10:34] Completed 480000 out of 2000000 steps  (24%)
[13:12:29] - Autosending finished units... [June 15 13:12:29 UTC]
[13:12:29] Trying to send all finished work units
[13:12:29] Project: 6053 (Run 0, Clone 59, Gen 51)

[13:12:29] + Attempting to send results [June 15 13:12:29 UTC]
[13:12:29] - Reading file work/wuresults_07.dat from core
[13:12:29]   (Read 3799048 bytes from disk)
[13:12:29] Connecting to http://171.64.65.54:8080/
[13:12:51] - Couldn't send HTTP request to server
[13:12:51] + Could not connect to Work Server (results)
[13:12:51]     (171.64.65.54:8080)
[13:12:51] + Retrying using alternative port
[13:12:51] Connecting to http://171.64.65.54:80/
[13:13:12] - Couldn't send HTTP request to server
[13:13:12] + Could not connect to Work Server (results)
[13:13:12]     (171.64.65.54:80)
[13:13:12] - Error: Could not transmit unit 07 (completed June 15) to work server.
[13:13:12] - 5 failed uploads of this unit.


[13:13:12] + Attempting to send results [June 15 13:13:12 UTC]
[13:13:12] - Reading file work/wuresults_07.dat from core
[13:13:12]   (Read 3799048 bytes from disk)
[13:13:12] Connecting to http://171.67.108.25:8080/
[13:13:15] Posted data.
[13:13:15] Initial: 0000; + Could not connect to Work Server (results)
[13:13:15]     (171.67.108.25:8080)
[13:13:15] + Retrying using alternative port
[13:13:15] Connecting to http://171.67.108.25:80/
[13:13:19] Posted data.
[13:13:19] Initial: 0000; + Could not connect to Work Server (results)
[13:13:19]     (171.67.108.25:80)
[13:13:19]   Could not transmit unit 07 to Collection server; keeping in queue.
[13:13:19] + Sent 0 of 1 completed units to the server
[13:13:19] - Autosend completed
Like I said before, my head does not care, but my heart desperately cares about the points that I'm losing!

This is why I HATE the bonus scheme. In the past I couldn't have cared less about this; now I care a lot, because for every second that a PG server is down, PG takes points away. All the other positives go out the door.

Keep on folding
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5GHz, 96GB G.Skill DDR3 1333MHz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4GHz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3GHz DDR3 2000 2-500GB Seagate 7200.11 RAID 0 Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86GHz ATI 5870M

Re: 171.64.65.54 overloaded / NOT accepting

Post by Grandpa_01 »

Just remember it is not just you losing; we all are. The server does not care who you are or who I am, so we all get to lose equally. :lol:
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9GHz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15GHz
2 - i7 980X 4.4GHz 2-GTX680
1 - 2700K 4.4GHz GTX680
Total = 464 cores folding
lanbrown
Posts: 104
Joined: Thu Jul 09, 2009 1:21 am

Re: 171.64.65.54 overloaded / NOT accepting

Post by lanbrown »

Mactin,

I agree. Points are being lost and there is no recourse. For the bonus scheme to actually work, they need to change the collection method. Here are two ways I can think of off the top of my head:

1) Redundant servers. That could be harder than it sounds, though, and could require major code changes. A load balancer could be an option, but both servers would need to be in constant communication with each other so that each knows of every WU that has been assigned.

2) If the collection server is offline, the client communicates through an encrypted session with another server, which could even be the assignment server. In this secured session, a timestamp or hash key is provided and added to the WU. When the collection server is back online, the WU gets sent, and the server takes the timestamp/hash key into account to determine when the WU was actually completed. This prevents someone from changing the time on the machine to get higher bonus points.
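Proposal 2 can be sketched with a server-side HMAC. This is a swapped-in technique (the post only asks for "a timestamp or hashkey"): the fallback server stamps the completion time from its own clock and signs it together with a digest of the result file, and the collection server later checks the signature before using that time for the bonus. `SERVER_SECRET`, the token layout, and all function names are hypothetical. Because the time comes from the server's clock and the tag covers the result digest, changing the client's clock, editing the timestamp, or reusing a token for a different result gains nothing.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret known only to Stanford-side servers, never to clients.
SERVER_SECRET = b"hypothetical-server-side-secret"


def _tag(wu_id: str, result_digest: str, timestamp: int) -> str:
    """HMAC-SHA256 over the WU id, the result digest and the server timestamp."""
    msg = f"{wu_id}|{result_digest}|{timestamp}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()


def issue_completion_token(wu_id: str, result_digest: str,
                           now: Optional[float] = None) -> dict:
    """Fallback (e.g. assignment) server: record, with its own clock, when a
    finished WU was reported while the collection server was offline."""
    ts = int(time.time() if now is None else now)
    return {"wu_id": wu_id, "timestamp": ts, "tag": _tag(wu_id, result_digest, ts)}


def verify_completion_token(token: dict, result_bytes: bytes) -> Optional[int]:
    """Collection server, once back online: return the server-issued
    completion time, or None if the token does not match this result."""
    digest = hashlib.sha256(result_bytes).hexdigest()
    expected = _tag(token["wu_id"], digest, token["timestamp"])
    if not hmac.compare_digest(expected, token["tag"]):
        return None
    return token["timestamp"]
```

Verification returns the stamped time only for the result the token was issued for; a different result file or an edited timestamp yields `None`.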

There are times when servers are taken offline for maintenance, which causes bonus point issues. The timelines are set short for SMP units, and those are currently the only units eligible for bonus points. If maintenance is planned and the final deadline is six days, then starting six days before the planned maintenance, the assignment server should no longer send clients to that server. This gives every client the full amount of time to complete the WU and send it back before it times out. It also gives the admins as much time as they require to finish the work. The same should apply to problem servers as well.
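The assignment cutoff proposed above is simple date arithmetic: stop handing out WUs for a server one final deadline before its planned maintenance starts. A minimal sketch, with hypothetical function names and assuming the final deadline is known per project:

```python
from datetime import datetime, timedelta


def assignment_cutoff(maintenance_start: datetime, final_deadline: timedelta) -> datetime:
    """Moment (exclusive) after which no new WUs for this server are assigned."""
    return maintenance_start - final_deadline


def should_assign(now: datetime, maintenance_start: datetime,
                  final_deadline: timedelta) -> bool:
    # A WU assigned before the cutoff has its final deadline before the
    # maintenance window opens, so its whole deadline window is covered.
    return now < assignment_cutoff(maintenance_start, final_deadline)
```

For example, with maintenance planned for 2010-06-20 00:00 and a six-day deadline, the cutoff is 2010-06-14 00:00: a request at 2010-06-13 23:00 is still assigned, one at 2010-06-14 01:00 is not.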
lanbrown

Re: 171.64.65.54 overloaded / NOT accepting

Post by lanbrown »

Grandpa_01 wrote:Just remember it is not just you losing; we all are. The server does not care who you are or who I am, so we all get to lose equally. :lol:
Not true at all. Let's say it takes 26 hours to complete a WU and the server is down for 24 hours.

Contributor A gets a WU 25 hours before the server goes down. So an hour after the server goes down, the WU is completed, and the server will stay down for the rest of the day.

Contributor B has a system with equivalent performance and gets a new WU an hour before the server goes down. The server will be back on-line before the WU is completed.

Contributor A loses bonus points, contributor B does not.
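The example above can be checked with a few lines of arithmetic. Times are in hours relative to the moment the server goes down (t = 0); the 26-hour WU and 24-hour outage are the numbers from the post, and the function name is hypothetical.

```python
OUTAGE_START = 0.0   # hours; the server goes down at t = 0
OUTAGE_END = 24.0    # and is back online 24 hours later
WU_HOURS = 26.0      # time to complete one WU on either machine


def return_delay(assigned_at: float) -> float:
    """Hours a finished WU has to wait for the server to come back."""
    finished = assigned_at + WU_HOURS
    if OUTAGE_START <= finished < OUTAGE_END:
        return OUTAGE_END - finished
    return 0.0
```

`return_delay(-25.0)` gives 23.0 hours of waiting for Contributor A (finished at t = 1), while `return_delay(-1.0)` gives 0.0 for Contributor B (finished at t = 25, after the server is back).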
noorman

Re: 171.64.65.54 overloaded / NOT accepting

Post by noorman »

lanbrown wrote:
Grandpa_01 wrote:Just remember it is not just you losing; we all are. The server does not care who you are or who I am, so we all get to lose equally. :lol:
Not true at all. Let's say it takes 26 hours to complete a WU and the server is down for 24 hours.

Contributor A gets a WU 25 hours before the server goes down. So an hour after the server goes down, the WU is completed, and the server will stay down for the rest of the day.

Contributor B has a system with equivalent performance and gets a new WU an hour before the server goes down. The server will be back on-line before the WU is completed.

Contributor A loses bonus points, contributor B does not.
.

It's indeed all about timing!

The same goes for stats: Stanford now has a 2-hour refresh rate (I believe), while EOC Stats, for example, still has a 3-hour cycle, which also skews data and points!
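To see how two refresh cycles skew what is displayed, assume (hypothetically) that Stanford publishes every 2 hours and EOC polls every 3 hours, both starting from the same t = 0. The real schedules and their offsets are not given in this thread, and the function names are hypothetical.

```python
import math

STANFORD_CYCLE_H = 2  # "2 hour refresh rate (I believe)"
EOC_CYCLE_H = 3       # EOC's 3-hour cycle


def next_update(t: float, cycle: float) -> float:
    """First update time >= t, assuming updates at 0, cycle, 2*cycle, ..."""
    return math.ceil(t / cycle) * cycle


def visible_on_eoc(credited_at: float) -> float:
    """Hour at which points credited at `credited_at` show up on EOC."""
    on_stanford = next_update(credited_at, STANFORD_CYCLE_H)
    return next_update(on_stanford, EOC_CYCLE_H)
```

Under these assumptions, points credited at t = 0.5 h appear on EOC at 3 h, while points credited at 2.5 h and at 4.5 h both appear at 6 h. A WU credited just after Stanford's update at t = 2 waits almost 4 hours, one credited just after t = 4 only about 2, and the pattern repeats every 6 hours (the least common multiple of 2 and 3).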

.