
Re: 171.64.65.20 and 171.67.108.25

Posted: Fri Jan 16, 2009 11:48 pm
by paulwarden
When I checked my router I found:
2009-01-16T23:13:12Z info src=171.64.65.20 dst=81.159.137.24 ipprot=1 icmp_type=3 icmp_code=10 ICMP Dest Unreachable, session terminated
2009-01-16T23:15:22Z info src=171.64.65.20 dst=81.159.137.24 ipprot=1 icmp_type=3 icmp_code=10 ICMP Dest Unreachable, session terminated

It's funny that there should be a problem sending results when I was having no trouble whatsoever last weekend, so there must be a problem at your end, not mine. I think I will stop folding until everything is back up and running again, which is a shame, as I had just ordered another XFX 9600 GSO graphics card to go alongside my other one for folding.
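
For anyone trying to read router entries like the two above, here is a minimal Python sketch that pulls the key=value fields out of a line and looks up the ICMP code. The function name `decode` and the table name are made up for illustration, and the field layout is assumed from these two lines only. The code meanings are the standard ICMP type 3 assignments.

```python
import re

# ICMP type 3 (Destination Unreachable) codes, per RFC 792, RFC 1122 and RFC 1812.
# Only a subset is listed here.
DEST_UNREACHABLE_CODES = {
    0: "network unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed but DF set",
    5: "source route failed",
    9: "destination network administratively prohibited",
    10: "destination host administratively prohibited",
    13: "communication administratively prohibited",
}

# Matches key=value pairs such as src=171.64.65.20 (layout assumed from the log above).
FIELD = re.compile(r"(\w+)=(\S+)")


def decode(line):
    """Return (src, dst, meaning) for one router log line."""
    fields = dict(FIELD.findall(line))
    icmp_type = int(fields["icmp_type"])
    code = int(fields["icmp_code"])
    if icmp_type == 3:
        meaning = DEST_UNREACHABLE_CODES.get(code, f"unknown code {code}")
    else:
        meaning = f"ICMP type {icmp_type}, code {code}"
    return fields["src"], fields["dst"], meaning


line = ("2009-01-16T23:13:12Z info src=171.64.65.20 dst=81.159.137.24 "
        "ipprot=1 icmp_type=3 icmp_code=10 ICMP Dest Unreachable, session terminated")
print(decode(line))
```

Code 10 is normally generated by a filter or firewall rule rather than by a host that is simply switched off, though the log alone does not say which device sent it.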

Re: 171.64.65.20 and 171.67.108.25

Posted: Sat Jan 17, 2009 9:49 am
by Fuzzy-Felt Bloke
I have had my SMP machines turned off for the past 4 days. I ran them again, and they are still not uploading. It's not an individual client problem; it is a server problem.

I've turned them off again.

Fuzzy-Felt Bloke :D :D

Re: 171.64.65.20 and 171.67.108.25

Posted: Sat Jan 17, 2009 9:58 pm
by eberlyml
I am continuing to receive work from this combination of work/collection servers: 171.67.108.12 and 171.67.108.25. Why does the work server continue to give out work that will not be received back? The -queueinfo output is included below. See the number of failed uploads (a lot of them were manual tries); this has been going on for more than 2 weeks now. Am I the only one having so much trouble? Could it be something client-side here?


Slot 05 Done
Project: 5113 (Run 83, Clone 38, Gen 6), Core: a0
Work server: 171.67.108.12:8080
Collection server: 171.67.108.25
Download date: January 3 07:08:22
Finished date: January 7 12:44:27
Failed uploads: 119

Slot 06 Done
Project: 5113 (Run 96, Clone 0, Gen 10), Core: a0
Work server: 171.67.108.12:8080
Collection server: 171.67.108.25
Download date: January 7 15:24:19
Finished date: January 11 21:05:19
Failed uploads: 77

Slot 07 Done
Project: 5113 (Run 88, Clone 90, Gen 11), Core: a0
Work server: 171.67.108.12:8080
Collection server: 171.67.108.25
Download date: January 11 21:05:26
Finished date: January 16 04:27:04
Failed uploads: 36
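
To put a number on the listing above, here is a rough Python sketch that tallies the "Failed uploads" counters per slot. The helper name `failed_uploads` is hypothetical, and the parsing assumes only the line shapes visible in this post, not the full -queueinfo format.

```python
import re

# The three slots exactly as posted above (only the lines the parser needs).
QUEUEINFO = """\
Slot 05 Done
Project: 5113 (Run 83, Clone 38, Gen 6), Core: a0
Failed uploads: 119

Slot 06 Done
Project: 5113 (Run 96, Clone 0, Gen 10), Core: a0
Failed uploads: 77

Slot 07 Done
Project: 5113 (Run 88, Clone 90, Gen 11), Core: a0
Failed uploads: 36
"""

SLOT = re.compile(r"^Slot (\d+) (\w+)$")
FAILED = re.compile(r"^Failed uploads: (\d+)$")


def failed_uploads(text):
    """Map slot number -> failed upload count for every slot in the text."""
    counts = {}
    slot = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SLOT.match(line)
        if m:
            slot = m.group(1)  # remember which slot the following lines belong to
            continue
        m = FAILED.match(line)
        if m and slot is not None:
            counts[slot] = int(m.group(1))
    return counts


counts = failed_uploads(QUEUEINFO)
print(counts, sum(counts.values()))
```

For these three slots the counters add up to 232 failed attempts, manual tries included.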

Re: 171.64.65.20 and 171.67.108.25

Posted: Fri Feb 27, 2009 4:55 pm
by bruce
eberlyml wrote: I am continuing to receive work from this combination of work/collection servers: 171.67.108.12 and 171.67.108.25. Why does the work server continue to give out work that will not be received back?
No, it's not a client problem, and the "will not be received back" is incorrect.

The client is designed to return the WU to the same server that issued it. If, for some reason, that server is down or is excessively busy, the V4 client was designed to hold the WU in queue and retry until the server came back up or the temporary overload dissipated. V5 and V6 also allow for a Collection Server which can accept the WU under whatever conditions cause the upload to the primary Work Server to fail.
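
The upload order described above can be sketched roughly as follows. This is an illustration only, not the actual client code: `try_upload`, `fake_send` and the `down` set are invented names, and `send` stands in for whatever the client really does to transmit a result.

```python
def try_upload(wu, work_server, collection_server, send):
    """Attempt one upload pass; return the server that accepted the WU, or None."""
    # The work server that issued the WU is always tried first.
    if send(work_server, wu):
        return work_server
    # V5/V6 behaviour as described above: fall back to the Collection Server.
    # Pass collection_server=None to model a V4 client, which has no fallback.
    if collection_server is not None and send(collection_server, wu):
        return collection_server
    # Nobody accepted it: the WU stays in the queue and is retried later.
    return None


# Hypothetical stand-in for the network: a server "accepts" unless it is marked down.
down = {"171.67.108.12:8080"}


def fake_send(server, wu):
    return server not in down


print(try_upload("P5113 R83 C38 G6", "171.67.108.12:8080", "171.67.108.25", fake_send))
```

With only the Work Server marked down, the result goes to the Collection Server. With both marked down, the function returns None and the WU simply waits for the next retry.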

First, the total workload has continued to grow while no new servers have been added, so many of the servers are now operating at full capacity and the overload conditions are no longer temporary. Second, (see the announcements) a couple of critical servers are down, which redirects a lot more work to the Collection Servers.

The plan to provide a permanent solution depends on the new server code (see the News Blog) which has been in testing for a while now. It provides a very significant increase in the number of connections that a given server can handle and I expect that the permanent overload conditions will be alleviated. I'm not aware of the details of the roll-out of the new server code, but I expect it soon. In the meantime, every time your client tries to upload, there's a statistical chance that someone else just finished uploading so you'll be successfully connected.

Servers SHOULD be assigning new work, even if the previous WUs have not yet uploaded. The client's queue is designed to hold up to 9 completed WUs and it might be able to upload all of them at one time, depending on the server status. If you do NOT get new work, your computer will be idle waiting for the upload of the first WU and you might be up to 8 WUs farther behind when that happens.
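
As a rough model of that queue behaviour, here is a small Python sketch. The class `ResultQueue` and its methods are invented for illustration, and the only number taken from the client is the limit of 9 completed WUs mentioned above. How the real client manages its slots may differ.

```python
MAX_DONE = 9  # completed WUs the queue can hold, as stated above


class ResultQueue:
    def __init__(self):
        self.done = []  # completed WUs still waiting to upload

    def can_start_new_work(self):
        # Assumption: the client only goes idle once all 9 places are taken.
        return len(self.done) < MAX_DONE

    def finish(self, wu):
        if not self.can_start_new_work():
            raise RuntimeError("queue full: client must wait for an upload")
        self.done.append(wu)

    def flush(self, send):
        """Try to upload every waiting WU; return how many are still waiting."""
        self.done = [wu for wu in self.done if not send(wu)]
        return len(self.done)


q = ResultQueue()
for n in range(9):
    q.finish(f"WU {n}")
print(q.can_start_new_work())        # queue is full, client would idle
print(q.flush(lambda wu: True))      # server back up: everything uploads at once
print(q.can_start_new_work())
```

The point of the model is that nothing is lost while fewer than 9 results are waiting, and one successful connection can clear the whole backlog.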

I'm sure the Pande Group is doing all they can to correct these issues. In the meantime, all you can do is be patient.

Re: 171.64.65.20 and 171.67.108.25

Posted: Fri Feb 27, 2009 7:21 pm
by MtM
Not sure if this would be an 'approved' idea, but I'm thinking that in these cases you might create a shadow copy of the client + work folder, then delete queue.dat and the work folder in either the original location or the shadow copy location, allowing you to process some new WUs while keeping tabs on the servers' status. When the collection/work server goes back up, start the client which has the backlog of completed WUs with sendall to force sending all completed WUs?

Would mean you can process some new work while waiting, idk how PG would look at this though?

Re: 171.64.65.20 and 171.67.108.25

Posted: Sat Feb 28, 2009 7:09 pm
by bruce
MtM wrote: Not sure if this would be an 'approved' idea, but I'm thinking that in these cases you might create a shadow copy of the client + work folder, then delete queue.dat and the work folder in either the original location or the shadow copy location, allowing you to process some new WUs while keeping tabs on the servers' status. When the collection/work server goes back up, start the client which has the backlog of completed WUs with sendall to force sending all completed WUs?

Would mean you can process some new work while waiting, idk how PG would look at this though?
I don't see how that serves any purpose. You can already process new work while waiting. How do you suppose the second and third "Done" WUs listed by eberlyml above got processed?

As I said, up to 9 WUs can be waiting to upload before there's any real problem, except for the messages, which tend to cause unnecessary irritation to the people who see them. By making another copy, you'll avoid seeing the messages, but then if the server does come online while the donor is sleeping or at work or simply not checking on the status of the server, the WUs might not get uploaded as soon. And you would also be excluding the (slim) chance that the client will be able to upload to the Collection Server before the primary server is repaired.

Is anybody approaching the 9 "Done" WU limit? That's the only case where making a shadow copy would be useful.

The first order of business is to fix the server, but just a simple fsck takes days, not hours.