Re: 54 failed attempts to upload to 66.170.111.50
Posted: Sun Oct 04, 2020 8:27 pm
by Neil-B
Collection Servers are simply a backup mechanism, originally aimed at covering for a server that was taken down for some reason, but they have also been used to cope with surges in comms .. WUs still need to make their way back to the original Work Server for processing, so the best case is that they are never used (even if one has been set) .. The researchers choose whether to set one.
Whether or not a CS exists has no bearing on the importance of WUs or their priority.
Posted: Sun Oct 04, 2020 11:11 pm
by bruce
When a server is initialized, it is given a list of potential Collection Servers with which it can make a connection. If connections to that/those server(s) fail, it can stop accepting WUs that might be uploaded to that CS. I didn't check the status of aws3foldingathome.org but if it went off-line or disk storage got full, it can cease being available as a CS until everything is working again and the primary WS is restarted.
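To make the fallback arrangement concrete, here is a minimal Python sketch. It is not the actual Folding@home server code: the `WorkServer` class, its method names and the hostnames are hypothetical. It assumes the WS itself is always the first upload target (a CS is only a backup) and that a CS that fails stays out of the list until the WS is restarted, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkServer:
    """Hypothetical model of a WS and its list of candidate Collection Servers."""
    address: str
    candidate_cs: list[str] = field(default_factory=list)
    active_cs: list[str] = field(default_factory=list)

    def initialize(self, reachable: set[str]) -> None:
        # At start-up, keep only the candidate CSs that can actually be reached.
        self.active_cs = [cs for cs in self.candidate_cs if cs in reachable]

    def report_cs_failure(self, cs: str) -> None:
        # A failed CS is dropped and only comes back when initialize() runs again,
        # i.e. when the WS is restarted.
        if cs in self.active_cs:
            self.active_cs.remove(cs)

    def upload_targets(self) -> list[str]:
        # The WS itself comes first; any CS is only a fallback.
        return [self.address] + self.active_cs


ws = WorkServer("ws.example.org", ["cs1.example.org", "cs2.example.org"])
ws.initialize({"cs1.example.org", "cs2.example.org"})
ws.report_cs_failure("cs1.example.org")
print(ws.upload_targets())  # ['ws.example.org', 'cs2.example.org']
```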
There have been some recent changes to the lists of active servers but I can't point to a specific issue there. New servers are replacing old ones and unreliable cloud servers are being disabled but the primary list of work servers hasn't changed much.
As Neil said, a connection to a CS improves reliability but it isn't a necessity.
Posted: Mon Oct 05, 2020 1:52 am
by Colonel_Klink
@bruce
Again, thanks for the explanation. It looks as though aws3foldingathome.org has had recent problems that may have recurred. viewtopic.php?f=24&t=36103
If a server is both assigning new work and receiving completed work, does assigning new work take priority over receiving completed work when the server is under heavy load?
Posted: Mon Oct 05, 2020 4:34 am
by PantherX
Colonel_Klink wrote:...point me to a tutorial on how the collection and assignment servers work?...
Neil-B has provided a decent summary. However, if you want additional details on how everything fits together, have a read here: viewtopic.php?f=18&t=17794
Posted: Mon Oct 05, 2020 4:03 pm
by Colonel_Klink
@bruce
Thanks for the response; however, I have read that post many times and do not see the answer to my question. If a collection server is not assigned to an assignment server, does the assignment server, when acting as its own collection server, give priority to the assignment task or the collection task?
Also, last night several of the WUs for 140.163.4.231 that had not uploaded finally went through, about 5 hours ago. When I checked whether I had received credit for WU 11752 (0,6632,52), I saw that someone else had also received credit for it. The other folder appears to have been issued the WU after I had completed it and after an upload attempt had connected to the server and failed for some reason, but before my result was finally uploaded.
https://apps.foldingathome.org/wu#proje ... 632&gen=52 I believe this problem is one that you may be commenting on in a GitHub issue.
I still think that the aws3foldingathome.org server being shown as a failed collection server for both 140.163.4.231 and 66.170.111.50 is possibly causing the delays in uploading the completed WUs that I am concerned about. Now I wonder whether aws3foldingathome.org is a contributing cause of the GitHub issue about the same WUs being assigned to multiple folders. Does the connection to a collection server set a flag that a WU has been completed and returned, or does the flag get set only after the WU has fully uploaded?
Posted: Mon Oct 05, 2020 4:28 pm
by Joe_H
If your WU never attempts to connect to aws3, then it was never set to use that CS for uploading returns. So that would not have any effect on your upload. The aws3 server may have been removed, or never set, for the project the WUs came from as it is low on space and will probably be decommissioned as a WS in the near future as projects on it finish or the server arrangement with Amazon ends.
Priority on a WS is given to creating the next generation for a WU that has been returned, then to handling incoming and outgoing connections. Given that most of the WU uploads you have problems with do eventually go through overnight, I suspect some part of the network path between you and 66.170.111.50 is getting saturated during the daytime and dropping the packet ACKs connected with your upload.
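A toy model of that ordering, using a standard priority queue. The task kinds and priority values are hypothetical, and treating incoming and outgoing connections as equal priority is an assumption; the thread does not describe the real scheduler.

```python
import heapq
import itertools

# Lower number = handled first. Hypothetical values mirroring the description above.
PRIORITY = {"create_next_gen": 0, "incoming_upload": 1, "outgoing_assignment": 1}


def process_order(tasks):
    """Return task names in the order a WS following this rule would handle them."""
    counter = itertools.count()  # tie-breaker: FIFO order within one priority level
    heap = []
    for kind, name in tasks:
        heapq.heappush(heap, (PRIORITY[kind], next(counter), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


print(process_order([
    ("incoming_upload", "upload A"),
    ("outgoing_assignment", "assign B"),
    ("create_next_gen", "gen 53"),
]))  # ['gen 53', 'upload A', 'assign B']
```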
Posted: Mon Oct 05, 2020 4:33 pm
by Neil-B
The 2nd folder was assigned your WU just after the 1-day timeout ... your completed WU was received by the WS almost a day after the other folder had returned it ... what matters is not when you complete the WU but when the WS receives it back .. So the sequence is:
Assigned to you .. 2020-10-03 09:17:29
Assigned to other folder ... 2020-10-04 09:22:34 ... just over 24hr after it was assigned to you
Returned by other folder ... 2020-10-04 13:12:07 ... please note that the column headings for returned and credited are switched
Returned by you ...2020-10-05 11:50:53
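The intervals in that sequence can be checked with Python's standard `datetime` module, using the timestamps quoted above and the 1-day timeout for this WU:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"    # date format used on the WU status page
TIMEOUT = timedelta(days=1)  # timeout for this particular WU


def ts(s: str) -> datetime:
    return datetime.strptime(s, FMT)


assigned_you = ts("2020-10-03 09:17:29")
assigned_other = ts("2020-10-04 09:22:34")
returned_other = ts("2020-10-04 13:12:07")
returned_you = ts("2020-10-05 11:50:53")

print(assigned_other - assigned_you)            # 1 day, 0:05:05
print(assigned_other - assigned_you > TIMEOUT)  # True: reissued only after the timeout
print(returned_you - assigned_you)              # 2 days, 2:33:24
print(returned_other < returned_you)            # True: the other result reached the WS first
```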
If a WU is sent to a CS, then until it gets forwarded to the WS in question, the WS does not know it has been returned and will reassign and reissue it if the timeout passes ... but this isn't a CS/WS issue from what I can see.
Posted: Mon Oct 05, 2020 4:48 pm
by Colonel_Klink
Neil-B
I get your point and recognized it before. The issue I have is that my system was trying to upload multiple times before the WU was issued to the other folder. The problem is that the same WU was issued to two folders before the timeout. Issuing the same WU to two folders does not make sense unless the timeout period has passed.
Posted: Mon Oct 05, 2020 4:51 pm
by Neil-B
Timeout was 24 hrs for that WU ... the gap between it being assigned to you and to the next folder was 24 hr 5 min and 5 seconds ... it wasn't reissued before timeout ... Date format is YYYY-MM-DD HH:MM:SS
Until the WS receives your completed WU, it treats the WU as still outstanding, and if the timeout passes it will put the WU in the queue for reassignment ... Now, I understand it is frustrating for you to have WUs that finish folding well before timeout but don't upload until after timeout, but the WS is doing precisely what it is meant to.
The resolution is not to change any behaviour on the WS (which is simply following due process) but to find and sort out whatever is causing your connections to the server to hang/drop/fail ... once your uploads succeed on the 1st attempt (as the vast majority of uploads should), your WUs will be returned before timeout and so the WS will not reissue them.
If, as I think you might be suggesting, the WS should recognise that you are trying to upload and therefore not reissue, the problem with that approach is that your client might keep trying to upload past timeout (1 day) all the way to expiration (8.2 days), until the client finally dumps the WU ... this would put intolerable delays into the progress of science ... the rule implemented is simply that if the WS has not itself received the completed WU (either directly or via a CS) by the timeout, the WU is queued for reissue, to minimise the potential for further delays of unknown length (either a totally lost WU or just very slow processing).
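The rule can be sketched as two small predicates, one for the WS and one for the client. This is a hypothetical model, not F@H code; the names are illustrative, and it assumes both the timeout and the expiration are measured from the assignment time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

TIMEOUT = timedelta(days=1)       # values for the WU discussed in this thread
EXPIRATION = timedelta(days=8.2)


@dataclass
class Assignment:
    assigned_at: datetime
    # Set only once the WS itself holds the complete result (directly or via a CS).
    received_at: Optional[datetime] = None


def should_reissue(a: Assignment, now: datetime) -> bool:
    # Upload attempts in progress are invisible here; only a completed receipt counts.
    return a.received_at is None and now - a.assigned_at > TIMEOUT


def client_should_keep_trying(assigned_at: datetime, now: datetime) -> bool:
    # Client side: retries continue until expiration, after which the WU is dumped.
    return now - assigned_at < EXPIRATION
```

Keeping the WS rule independent of upload attempts is what bounds the delay at the timeout instead of the much longer expiration.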
Posted: Mon Oct 05, 2020 5:46 pm
by bruce
As far as the server recognizing when a WU has been returned is concerned, the server may have incoming WU results that have started to upload but have not yet been "received." It's not officially received until the upload finishes and the server notes the PRCG, the name/team/etc associated with the upload, and it gives it a timestamp indicating when it was received. Incomplete/partial uploads are dumped/ignored until the complete results file is received.
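A minimal sketch of the "received only when complete" behaviour. The `Upload`/`Receipt` types are hypothetical, and using a simple size check to decide completeness is an assumption for illustration; the real server may verify results differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Upload:
    prcg: tuple[int, int, int, int]  # (project, run, clone, gen)
    donor: str
    team: int
    expected_size: int
    data: bytes = b""


@dataclass
class Receipt:
    prcg: tuple[int, int, int, int]
    donor: str
    team: int
    received_at: datetime


def finish_upload(
    u: Upload,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> Optional[Receipt]:
    if len(u.data) < u.expected_size:
        return None  # partial upload: ignored, nothing is recorded as received
    # Only a complete result gets a receipt with PRCG, donor/team and a timestamp.
    return Receipt(u.prcg, u.donor, u.team, clock())
```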
The same process is used on either the WS or a CS, so the recorded time that the upload was completed can be reported by the clock on a CS before the WU and the credit record are actually transferred from the CS to the WS. If that transfer of information has not been completed promptly, the WS won't know about it so a duplicate WU might be assigned, but the duration of that interval is generally quite short.
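The window where a CS already holds the result but the WS has not yet heard about it can be expressed as a single comparison. The function below is hypothetical; it assumes the timeout is measured from assignment and that the WS only learns of the return when the CS forwards it.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(days=1)


def duplicate_possible(assigned_at: datetime,
                       cs_received_at: datetime,
                       forwarded_to_ws_at: datetime) -> bool:
    """True if the WS could reissue even though a CS received the result in time.

    The credit record carries the CS timestamp, but the WS only knows about
    the return from forwarded_to_ws_at onward.
    """
    deadline = assigned_at + TIMEOUT
    return cs_received_at <= deadline < forwarded_to_ws_at
```

A prompt transfer from the CS makes this window short, which matches the point above that the interval is generally quite short.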
Since you're asking about a server configuration without a CS, none of this matters. All WUs have to be returned to the WS they came from.