Page 2 of 2

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 4:14 pm
by bruce
If you look at serverstat, you'll see 171.64.122.70 classic VSP03 - accept DOWN - so it's perfectly reasonable that you won't be able to return a WU to that server. (I've got a WU on one of my clients waiting for the Pande Group to fix that server, too.)

Please note that the "no record of this WU" message is a Collection Server problem, not a Work Server problem. From your log:
Mactin wrote:
[12:42:24] + Attempting to send results [July 24 12:42:24 UTC]
[12:42:26] - Couldn't send HTTP request to server
[12:42:26] + Could not connect to Work Server (results)
[12:42:26] (143.89.28.70:8080)
[12:42:26] + Retrying using alternative port
[12:42:28] - Couldn't send HTTP request to server
[12:42:28] + Could not connect to Work Server (results)
[12:42:28] (143.89.28.70:80)
[12:42:28] - Error: Could not transmit unit 04 (completed July 24) to work server.

[12:42:28] + Attempting to send results [July 24 12:42:28 UTC]
[12:42:34] - Server does not have record of this unit. Will try again later.
[12:42:34] Could not transmit unit 04 to Collection server; keeping in queue.

[12:42:34] + Closed connections
If the Work Server says it has no record of the WU, then the message is important (but that's not happening).

When a Collection Server says it has no record of the WU, ignore the message. That's a known problem. (If the client can't upload to a CS, that may be the only message that the client can give you.) I see no reason why the WU won't upload to the WS once it is repaired.
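To make the WS/CS distinction easier to check, here is a minimal Python sketch that attributes each failed upload attempt in a log excerpt to the server type that was tried. It assumes the line wording seen in the logs quoted in this thread (an attempt opens with "Attempting to send results" and closes with a "Could not transmit unit ... to work server" or "... to Collection server" line). The function name `classify_attempts` and the `SAMPLE` text are hypothetical, and this is not an official client tool.

```python
import re

# Assumption: the closing line of a failed attempt names the server type tried.
FAILED = re.compile(
    r"Could not transmit unit (\d+).* to (work|collection) server",
    re.IGNORECASE,
)
NO_RECORD = "Server does not have record of this unit"


def classify_attempts(log_text):
    """Return one (unit, server, no_record) tuple per failed upload attempt."""
    attempts = []
    no_record = False
    for line in log_text.splitlines():
        if "Attempting to send results" in line:
            no_record = False  # a new attempt starts here
        elif NO_RECORD in line:
            no_record = True  # remember it until the closing line names the server
        else:
            match = FAILED.search(line)
            if match:
                attempts.append((match.group(1), match.group(2).lower(), no_record))
                no_record = False
    return attempts


# Hypothetical sample, shortened from the log excerpt quoted above.
SAMPLE = """\
[12:42:24] + Attempting to send results [July 24 12:42:24 UTC]
[12:42:26] - Couldn't send HTTP request to server
[12:42:28] - Error: Could not transmit unit 04 (completed July 24) to work server.
[12:42:28] + Attempting to send results [July 24 12:42:28 UTC]
[12:42:34] - Server does not have record of this unit. Will try again later.
[12:42:34] Could not transmit unit 04 to Collection server; keeping in queue.
"""

for unit, server, no_record in classify_attempts(SAMPLE):
    print(unit, server, "no record" if no_record else "other failure")
```

For this sample, the "no record" flag is set only on the Collection Server attempt. The Work Server attempt failed because the connection could not be made.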

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 4:22 pm
by 7im

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 5:26 pm
by Mactin
Bruce,
Just look a little further down the log file.
Yes, at first it failed to connect, but then you will clearly see the "no record of this WU" message from the WS.
I should have mentioned this at first, sorry.

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 5:27 pm
by Mactin
Thank you, but neither WU has been credited.

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 5:40 pm
by 7im
Mactin wrote:
Bruce,
Just look a little further down the log file.
Yes, at first it failed to connect, but then you will clearly see the "no record of this WU" message from the WS.
I should have mentioned this at first, sorry.
Sorry, I don't see that message coming from the WS in the log posted on page 1. It only comes from the CS. Did I miss something?

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 6:13 pm
by Mactin
For the 2nd one:
[21:22:18] Folding@home Core Shutdown: FINISHED_UNIT
[21:22:21] CoreStatus = 64 (100)
[21:22:21] Sending work to server
[21:22:21] Project: 2969 (Run 10, Clone 64, Gen 8)
[21:22:21] + Attempting to send results [July 24 21:22:21 UTC]
[21:22:31] - Server does not have record of this unit. Will try again later.
[21:22:31] - Error: Could not transmit unit 09 (completed July 24) to work server.
[21:22:31] Keeping unit 09 in queue.

For the first one, I'm sorry, I did not include enough of the log. My mistake:
[12:36:18] Project: 2973 (Run 2, Clone 55, Gen 10)
[12:36:18] + Attempting to send results [July 27 12:36:18 UTC]
[12:36:23] - Server does not have record of this unit. Will try again later.
[12:36:23] - Error: Could not transmit unit 04 (completed July 24) to work server.
[12:36:23] + Attempting to send results [July 27 12:36:23 UTC]
[12:36:33] - Server does not have record of this unit. Will try again later.
[12:36:33] Could not transmit unit 04 to Collection server; keeping in queue.

Re: 143.89.28.70

Posted: Tue Jul 27, 2010 6:24 pm
by 7im
Okay, missed that 2nd part. Sorry.

That is a concern. Technically, I suppose that if the hard drive had a bad sector or something, the record of that WU could have been lost when they scanned the disk. Or, if they restored a backup, the backup may have been saved before your WU was sent out. Another possible reason is that the WU was returned to a CS, and the CS flagged the WS to stop accepting that WU because it had already been returned.

When servers have issues like that, it's difficult to tell exactly what happened from the fahlog.txt file alone. I assume that PG is already looking into the problem, and they have access to server logs that hopefully explain this, so they can fix it.

Re: 143.89.28.70

Posted: Thu Jul 29, 2010 2:43 pm
by Mactin
Update:
Both WUs were updated during the night at 0637U and 0704U.
However, 7 hours later, neither has been credited.