Re: Quadro NVS 160M Failing
Posted: Wed Jul 25, 2012 6:11 pm
by gwildperson
A WU can remain in the work queue if there's a problem sending it. Your best option is to post the portion of the log near where Project 5769 (Run 7, Clone 91, Gen 4467) finished and the client started trying to send it.
The "Unknown" credit problem should be ignored. You'll still get the right number of points when the WU is uploaded, but the server never informed the client what that number would be. A server upgrade will fix that someday.
Re: Quadro NVS 160M Failing
Posted: Thu Jul 26, 2012 2:05 pm
by AYColumbia
Here's the first time it occurs in the log:
21:00:29:WU01:FS01:FahCore 0xa4 started
21:00:29:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5769 run:7 clone:91 gen:4467 core:0x11 unit:0x344702ac50086e8e1173005b00071689
21:00:29:WU00:FS00:Uploading 6.46KiB to 171.67.108.11
21:00:29:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
21:00:29:WU00:FS00:Connecting to 171.67.108.11:8080
21:00:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
21:00:29:WU00:FS00:Trying to send results to collection server
21:00:29:WU00:FS00:Uploading 6.46KiB to 171.67.108.25
21:00:29:WU00:FS00:Connecting to 171.67.108.25:8080
21:00:29:WU02:FS00:News: Welcome to Folding@Home
21:00:29:WU02:FS00:Assigned to work server 171.67.108.21
21:00:29:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:"G98M [Quadro NVS 160M]" from 171.67.108.21
21:00:29:WU02:FS00:Connecting to 171.67.108.21:8080
Here's the last entry (there were retry entries like this between them):
11:01:56:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5769 run:7 clone:91 gen:4467 core:0x11 unit:0x344702ac50086e8e1173005b00071689
11:01:56:WU00:FS00:Uploading 6.46KiB to 171.67.108.11
11:01:57:WU00:FS00:Connecting to 171.67.108.11:8080
11:01:57:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
11:01:57:WU00:FS00:Trying to send results to collection server
11:01:57:WU00:FS00:Uploading 6.46KiB to 171.67.108.25
11:01:57:WU00:FS00:Connecting to 171.67.108.25:8080
11:01:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
11:01:58:WU00:FS00:Connecting to 171.67.108.25:80
11:01:59:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.25:80: No connection could be made because the target machine actively refused it.
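For anyone who wants to pull the failing WU's identifiers out of a long log instead of scanning for them by eye, here is a minimal sketch. It assumes the "Sending unit results" lines always look like the two quoted above; the pattern is inferred from those lines only, not from any FAHClient specification, and `parse_send_line` is a made-up helper name.

```python
import re

# Field layout inferred from the log lines quoted in this thread (an assumption).
SEND_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"Sending unit results: .*?error:(?P<error>\S+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)


def parse_send_line(line):
    """Return time, error state, and (project, run, clone, gen), or None."""
    m = SEND_RE.match(line)
    if m is None:
        return None
    prcg = tuple(int(m.group(k)) for k in ("project", "run", "clone", "gen"))
    return {"time": m.group("time"), "error": m.group("error"), "prcg": prcg}


if __name__ == "__main__":
    sample = (
        "21:00:29:WU00:FS00:Sending unit results: id:00 state:SEND "
        "error:DUMPED project:5769 run:7 clone:91 gen:4467 core:0x11 "
        "unit:0x344702ac50086e8e1173005b00071689"
    )
    print(parse_send_line(sample))
```

Lines that are not "Sending unit results" entries return None, so the helper can be mapped over a whole log file and the None values filtered out.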
Re: Quadro NVS 160M Failing
Posted: Thu Jul 26, 2012 5:51 pm
by bruce
I'm not sure anything can be done about this WU. It was reissued and has been successfully completed by others, so the original WU was not corrupt. As far as determining whether the problem was introduced by something that happened on your system (such as your 160M) or by something that happened on the server, I simply do not know. The "Failed to read stream" isn't a message I fully understand. I guess we'll just have to forget about this one and see if it happens again to either you or others.
If it still shows in your Work Queue, I would either run ./FAHClient --dump 00 or delete the 00 folder inside the work directory, but that's not particularly important.
Re: Quadro NVS 160M Failing
Posted: Thu Jul 26, 2012 7:28 pm
by AYColumbia
Ah, thanks for the tip on getting rid of it. BTW, how quickly does something get reassigned? Sometimes I pause the work process or I may need to reboot my laptop. Does it get reassigned during these short windows?
Re: Quadro NVS 160M Failing
Posted: Thu Jul 26, 2012 7:41 pm
by bruce
A pause or a reboot has nothing to do with reassignment, since the server has no knowledge of either.
1) If the client uploads a successful result, the project moves on.
2) If the client uploads an error report or a report that the WU has been deleted, the same WU is reassigned quickly.
3) If somehow the WU simply disappears (e.g., reformat/reinstall OS and FAH, or whatever happened to you), the server waits until the Timeout (Preferred Deadline) to decide it's probably not going to see a result from the machine it was assigned to and reassigns it to someone else.
That's why the QRB is based on the Timeout. The extra delay waiting for the WU to expire plus the cost of having someone else process the lost WU adds up over time, and it's nice to avoid it whenever possible.
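The three cases can be sketched roughly as the decision below. This is only an illustration of the behavior described in this post, not the actual server code; `Report`, `server_action`, and the hour-based parameters are all hypothetical names chosen for the example.

```python
from enum import Enum


class Report(Enum):
    """What the server has heard back about an assigned WU (hypothetical)."""
    SUCCESS = "success"                  # case 1: good result uploaded
    ERROR_OR_DUMPED = "error_or_dumped"  # case 2: error or deletion reported
    NONE = "none"                        # case 3: nothing heard at all


def server_action(report, hours_since_assignment, timeout_hours):
    """Return the action for one WU under the three cases described above."""
    if report is Report.SUCCESS:
        return "advance"          # the project moves on
    if report is Report.ERROR_OR_DUMPED:
        return "reassign_now"     # the same WU is reassigned quickly
    # No report: a pause or reboot is invisible here; only elapsed time matters.
    if hours_since_assignment >= timeout_hours:
        return "reassign_after_timeout"
    return "wait"


if __name__ == "__main__":
    # Illustrative numbers only; real Timeout values vary by project.
    print(server_action(Report.NONE, 5, 48))
    print(server_action(Report.NONE, 50, 48))
```

Note that the silent case depends only on elapsed time versus the Timeout, which is why a short pause or reboot never triggers reassignment by itself.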
Re: Quadro NVS 160M Failing
Posted: Fri Jul 27, 2012 11:56 am
by AYColumbia
Thanks bruce.