
Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 1:07 pm
by noorman
.

PM sent to Pande Group, V.P.


.

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 2:11 pm
by Foxbat
I think 171.67.108.21 needs a Boot to the Head!

Looking back, nVIDIA Clients the world over will refer to this as the St. Valentine's Day Points Massacre...

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 3:26 pm
by VijayPande
Thanks for the posts. It's early AM in California (that's why this went unfixed for several hours), but I think we've got everything going again. I've contacted Joe regarding this issue: there was a WS bug.

I've also balanced the weights so the other NV WS's can get into the mix better and improve the redundancy.

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 3:38 pm
by noorman
.

Things seem to be coming 'online' again indeed; I restarted my GPU-client a few minutes ago and it got a new P10105 project from another server ...

It's working again / thanks Professor


.

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 4:02 pm
by caintry_boy
VijayPande wrote: Thanks for the posts. It's early AM in California (that's why this went unfixed for several hours), but I think we've got everything going again. I've contacted Joe regarding this issue: there was a WS bug.

I've also balanced the weights so the other NV WS's can get into the mix better and improve the redundancy.
Thanks Vijay!!!


:)

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 5:17 pm
by uncle fuzzy
My issue hasn't been a lack of work, but rather my completed WUs uploading to a black hole. Reviewing recent logs shows no long idle periods. With very few exceptions, the WUs upload immediately. The problem is that I am running 4 quads with 9 GPU clients, each box getting between 15K and 28K PPD, but my total yesterday was 17K, rather than the "normal" 70K+. Today is following yesterday, point-wise. I seem to only be receiving the SMP2 points.

Is this server talking to the stats server?
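
For anyone comparing their own numbers, the gap described above can be worked out roughly as follows. This is only a sketch of the arithmetic using the figures quoted in the post; the variable names are made up for illustration, and "70K+" is treated as a lower bound on the normal daily total.

```python
# Figures quoted in the post above (points per day).
boxes = 4
ppd_low, ppd_high = 15_000, 28_000   # per-box range
normal_total = 70_000                # "normal" daily total, stated as 70K+
observed_total = 17_000              # yesterday's credited total

# Range the four boxes could produce if every WU were credited.
expected_low = boxes * ppd_low       # 60000
expected_high = boxes * ppd_high     # 112000

# Shortfall against the stated normal total.
missing = normal_total - observed_total   # 53000
share = observed_total / normal_total

print(f"expected range: {expected_low}-{expected_high} PPD")
print(f"missing vs. normal: {missing} PPD (credited about {share:.0%})")
```

Roughly a quarter of the usual points being credited fits the guess that only some of the clients' results are reaching the stats.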

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 6:10 pm
by subego
All are working now. Thanks much!

Re: 171.67.108.21 is Reject;

Posted: Mon Feb 15, 2010 8:44 pm
by noorman
uncle fuzzy wrote: My issue hasn't been a lack of work, but rather my completed WUs uploading to a black hole. Reviewing recent logs shows no long idle periods. With very few exceptions, the WUs upload immediately. The problem is that I am running 4 quads with 9 GPU clients, each box getting between 15K and 28K PPD, but my total yesterday was 17K, rather than the "normal" 70K+. Today is following yesterday, point-wise. I seem to only be receiving the SMP2 points.

Is this server talking to the stats server?
.

Maybe something you ought to communicate to the Pande Group ...


.

Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 1:22 am
by SandStar

Code:

[23:40:32] + Attempting to send results [February 15 23:40:32 UTC]
[23:40:32] - Reading file work/wuresults_04.dat from core
[23:40:32]   (Read 167567 bytes from disk)
[23:40:32] Connecting to http://171.67.108.21:8080/
[23:47:10] Completed 68%
[23:47:24] - Couldn't send HTTP request to server
[23:47:24] + Could not connect to Work Server (results)
[23:47:24]     (171.67.108.21:8080)
[23:47:24] + Retrying using alternative port
[23:47:24] Connecting to http://171.67.108.21:80/
[23:50:33] - Couldn't send HTTP request to server
[23:50:33] + Could not connect to Work Server (results)
[23:50:33]     (171.67.108.21:80)
[23:50:33] - Error: Could not transmit unit 04 (completed February 15) to work server.
[23:50:33] - 6 failed uploads of this unit.
[23:50:33] - Read packet limit of 540015616... Set to 524286976.
[23:50:33] + Attempting to send results [February 15 23:50:33 UTC]
[23:50:33] - Reading file work/wuresults_04.dat from core
[23:50:33]   (Read 167567 bytes from disk)
[23:50:33] Connecting to http://171.67.108.26:8080/
[23:53:33] Completed 69%
[23:59:38] Completed 70%
[00:06:39] Completed 71%
[00:12:50] Completed 72%
[00:19:24] Completed 73%
[00:20:20] Posted data.
[00:20:20] Initial: 0000; - Uploaded at ~0 kB/s
[00:20:20] - Averaged speed for that direction ~18 kB/s
[00:20:20] - Server does not have record of this unit. Will try again later.
[00:20:20]   Could not transmit unit 04 to Collection server; keeping in queue.
[00:20:20] + Sent 0 of 1 completed units to the server
[00:20:20] - Autosend completed
I finally managed to get work from another server but .21 still isn't doing it for me...
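
A log like the one above can be summarized per server with a short script. This is a minimal sketch, assuming the log lines look exactly like the ones quoted in this thread; `summarize_uploads` and `SAMPLE_LOG` are hypothetical names, not part of the client.

```python
import re
from collections import Counter

# Patterns taken from the client log lines quoted in this thread.
CONNECT = re.compile(r"^\[\d\d:\d\d:\d\d\] Connecting to http://([\d.]+):(\d+)/")
FAILURES = (
    "Couldn't send HTTP request to server",
    "Server does not have record of this unit",
)
SUCCESS = "Results successfully sent"

def summarize_uploads(log_text):
    """Count upload outcomes per (host, port) found in a client log."""
    outcomes = Counter()
    target = None  # server of the most recent "Connecting to" line
    for raw in log_text.splitlines():
        line = raw.strip()
        m = CONNECT.match(line)
        if m:
            target = (m.group(1), int(m.group(2)))
            continue
        if target is None:
            continue  # nothing to attribute this line to
        if any(text in line for text in FAILURES):
            outcomes[(target, "failed")] += 1
            target = None
        elif SUCCESS in line:
            outcomes[(target, "sent")] += 1
            target = None
    return outcomes

# A few lines copied from the log above, shortened for the example.
SAMPLE_LOG = "\n".join([
    "[23:40:32] Connecting to http://171.67.108.21:8080/",
    "[23:47:10] Completed 68%",
    "[23:47:24] - Couldn't send HTTP request to server",
    "[23:47:24] + Retrying using alternative port",
    "[23:47:24] Connecting to http://171.67.108.21:80/",
    "[23:50:33] - Couldn't send HTTP request to server",
    "[23:50:33] Connecting to http://171.67.108.26:8080/",
    "[00:20:20] Posted data.",
    "[00:20:20] - Server does not have record of this unit. Will try again later.",
])

for (server, result), count in sorted(summarize_uploads(SAMPLE_LOG).items()):
    print(server, result, count)
```

The sample shows the order the client tries things in this log: port 8080 on the work server, then port 80, then the second address on 8080. All three attempts count as failed here.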

Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 2:09 am
by Foxbat
I, too, have a WU that I can't upload to 171.67.108.26:8080. When I try going to the port in Firefox, it just sits there. The Stats page shows the server as Accepting, but the Net Load is 388. Maybe I just need to be patient...

Code:

[01:53:06] + Attempting to send results [February 16 01:53:06 UTC]
[01:53:06] - Reading file work/wuresults_03.dat from core
[01:53:06]   (Read 65624 bytes from disk)
[01:53:06] Connecting to http://171.67.108.21:8080/
[01:53:08] Posted data.
[01:53:08] Initial: 0000; - Uploaded at ~32 kB/s
[01:53:08] - Averaged speed for that direction ~48 kB/s
[01:53:08] - Server does not have record of this unit. Will try again later.
[01:53:08] - Error: Could not transmit unit 03 (completed February 16) to work server.
[01:53:08] - 1 failed uploads of this unit.
[01:53:08]   Keeping unit 03 in queue.
[01:53:08] Trying to send all finished work units
[01:53:08] Project: 3470 (Run 24, Clone 149, Gen 0)
[01:53:08] - Read packet limit of 540015616... Set to 524286976.


[01:53:08] + Attempting to send results [February 16 01:53:08 UTC]
[01:53:08] - Reading file work/wuresults_03.dat from core
[01:53:08]   (Read 65624 bytes from disk)
[01:53:08] Connecting to http://171.67.108.21:8080/
[01:53:09] Posted data.
[01:53:09] Initial: 0000; - Uploaded at ~65 kB/s
[01:53:09] - Averaged speed for that direction ~51 kB/s
[01:53:09] - Server does not have record of this unit. Will try again later.
[01:53:09] - Error: Could not transmit unit 03 (completed February 16) to work server.
[01:53:09] - 2 failed uploads of this unit.
[01:53:09] - Read packet limit of 540015616... Set to 524286976.


[01:53:09] + Attempting to send results [February 16 01:53:09 UTC]
[01:53:09] - Reading file work/wuresults_03.dat from core
[01:53:09]   (Read 65624 bytes from disk)
[01:53:09] Connecting to http://171.67.108.26:8080/
[01:58:21] Posted data.
Oh, that can't be good... "Server does not have record of this unit. Will try again later."
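
Instead of waiting on a browser, the same port check can be done with a timeout. This is a rough sketch using only Python's standard `socket` module; `can_connect` is a made-up helper name, and it only tests whether a TCP connection opens, not whether the server will accept the upload.

```python
import socket

def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, `can_connect("171.67.108.26", 8080)` would report whether the port answers at all. As the log above shows, a server can accept the connection ("Posted data.") and still reply that it has no record of the unit, so a True result does not mean the upload will go through.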

Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 2:51 am
by Foxbat
Foxbat wrote: Oh, that can't be good... "Server does not have record of this unit. Will try again later."
Looks like all is well now:

Code:

[02:12:27] Trying to send all finished work units
[02:12:27] Project: 3470 (Run 24, Clone 149, Gen 0)
[02:12:27] - Read packet limit of 540015616... Set to 524286976.


[02:12:27] + Attempting to send results [February 16 02:12:27 UTC]
[02:12:27] - Reading file work/wuresults_03.dat from core
[02:12:27]   (Read 65624 bytes from disk)
[02:12:27] Connecting to http://171.67.108.21:8080/
[02:12:28] Posted data.
[02:12:28] Initial: 0000; - Uploaded at ~65 kB/s
[02:12:28] - Averaged speed for that direction ~54 kB/s
[02:12:28] + Results successfully sent
[02:12:28] Thank you for your contribution to Folding@Home.
[02:12:28] + Number of Units Completed: 3802

[02:12:28] + Sent 1 of 1 completed units to the server
[02:12:28] + Closed connections
Currently at 42% and Folding away!

Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 1:32 pm
by SandStar
Still a problem here:

Code:

[12:27:49] - Autosending finished units... [February 16 12:27:49 UTC]
[12:27:49] Trying to send all finished work units
[12:27:49] Project: 5785 (Run 3, Clone 52, Gen 17)
[12:27:49] - Read packet limit of 540015616... Set to 524286976.


[12:27:49] + Attempting to send results [February 16 12:27:49 UTC]
[12:27:49] - Reading file work/wuresults_04.dat from core
[12:27:49]   (Read 167567 bytes from disk)
[12:27:49] Connecting to http://171.67.108.21:8080/
[12:27:51] - Couldn't send HTTP request to server
[12:27:51] + Could not connect to Work Server (results)
[12:27:51]     (171.67.108.21:8080)
[12:27:51] + Retrying using alternative port
[12:27:51] Connecting to http://171.67.108.21:80/
[12:29:34] Completed 67%
[12:31:00] - Couldn't send HTTP request to server
[12:31:00] + Could not connect to Work Server (results)
[12:31:00]     (171.67.108.21:80)
[12:31:00] - Error: Could not transmit unit 04 (completed February 15) to work server.
[12:31:00] - 10 failed uploads of this unit.
[12:31:00] - Read packet limit of 540015616... Set to 524286976.


[12:31:00] + Attempting to send results [February 16 12:31:00 UTC]
[12:31:00] - Reading file work/wuresults_04.dat from core
[12:31:00]   (Read 167567 bytes from disk)
[12:31:00] Connecting to http://171.67.108.26:8080/
[12:35:49] - Couldn't send HTTP request to server
[12:35:49] + Could not connect to Work Server (results)
[12:35:49]     (171.67.108.26:8080)
[12:35:49] + Retrying using alternative port
[12:35:49] Connecting to http://171.67.108.26:80/
[12:35:50] - Couldn't send HTTP request to server
[12:35:50] + Could not connect to Work Server (results)
[12:35:50]     (171.67.108.26:80)
[12:35:50]   Could not transmit unit 04 to Collection server; keeping in queue.
[12:35:50] + Sent 0 of 1 completed units to the server
[12:35:50] - Autosend completed


Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 1:49 pm
by noorman
.

171.67.108.26 is CS6, and its network load is currently high.

A message about that was sent earlier, but the time difference means that Stanford is just waking up ...

.

Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 2:11 pm
by noorman
.

CS6 is now in REJECT ...

EDIT: Accepting again ...


.

Re: 171.67.108.21 is Reject;

Posted: Tue Feb 16, 2010 4:36 pm
by bruce
Please watch for further updates here: viewtopic.php?f=24&t=13474