Lost Time
Posted: Mon Jun 23, 2008 5:56 am
by Tom
I hope this is the right place to post this topic; it looked like a good spot to me.
I have been wondering why we couldn't be working on the next work unit while the finished one uploads? There is a lot of wasted time every day waiting, and uploading is a slow process compared to downloading. Is there a way to make this work? It would allow more work to get done in the same time period.
Re: Lost Time
Posted: Mon Jun 23, 2008 9:56 am
by toTOW
I feel the same way. It matters even more when the WUs are short (many uploads and downloads) or when huge files need to be sent or downloaded (I'm thinking of SMP WUs and some BigWu too).
It should be easy to reverse the process: download a WU, then send the results back while working on the new one (instead of sending the result, getting a new WU, and then processing it).
If it's not possible to change this behavior, another solution would be to take upload and download time into account when benchmarking the WU.
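To put rough numbers on the reordering described above, here is a minimal sketch comparing the two orderings. All timings are hypothetical, the function names are made up for illustration, and the model assumes a background upload does not slow the computation down, which may not hold in practice.

```python
def sequential_cycle(upload_s, download_s, compute_s):
    """Current order: upload the result, download a new WU, then compute."""
    return upload_s + download_s + compute_s


def overlapped_cycle(upload_s, download_s, compute_s):
    """Proposed order: download the new WU first, then upload the old
    result in the background while the new WU is being computed."""
    # The upload overlaps with computation, so it only adds time when it
    # takes longer than the compute phase itself.
    return download_s + max(compute_s, upload_s)


# Hypothetical figures: 10 min upload, 1 min download, 10 h of computation.
upload_s, download_s, compute_s = 600, 60, 36_000

saved_s = (sequential_cycle(upload_s, download_s, compute_s)
           - overlapped_cycle(upload_s, download_s, compute_s))
print(f"Time saved per WU: {saved_s} s")
```

Under these assumptions the saving per WU is simply the upload time, so the benefit grows with short WUs and slow uplinks.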
Re: Lost Time
Posted: Mon Jun 23, 2008 1:28 pm
by codysluder
I don't think that Stanford can take the upload and download time into account. Are you proposing that FAH somehow compensate everyone for their network time at maybe 1 Gb (which is essentially zero) while the rest of us are somewhere between 47 KB (modem) for some people and maybe 500/1500 KB or better for others?
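As a rough illustration of how far apart those connections are, the sketch below estimates the upload time for one result file. The 25 MB file size is a hypothetical value, and treating the rates as kilobits per second is an assumption, since the units in the post are ambiguous.

```python
def transfer_seconds(size_megabytes, rate_kilobits_per_s):
    """Idealized transfer time: no protocol overhead, no retries."""
    bits = size_megabytes * 8_000_000          # decimal megabytes to bits
    return bits / (rate_kilobits_per_s * 1000)  # kilobits/s to bits/s


# Hypothetical 25 MB result file over three assumed link speeds.
for label, rate in [("modem", 47), ("DSL uplink", 500), ("1 Gb link", 1_000_000)]:
    print(f"{label:>10}: {transfer_seconds(25, rate):10.1f} s")
```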
Re: Lost Time
Posted: Mon Jun 23, 2008 4:17 pm
by 7im
Good idea. It's been suggested many times before. However, the best time to have added that feature was 4 years ago when everyone was still on dialup and connection speeds were slow. Now that everyone is moving to faster and faster broadband, the advantage gets to be less and less, and the idea slowly falls lower and lower on the priority list for new developments. Stanford has limited resources, so they have to set priorities.
There used to be some counter-arguments too, but I don't remember what they were, or even if they still apply.
Re: Lost Time
Posted: Mon Jun 23, 2008 4:25 pm
by Tom
I wasn't thinking about compensation; points are paid for work done. I was just wondering if it was possible to streamline the process to allow work to continue through the upload process. I don't know how the upload process works (whether it needs more computer resources than just a file transfer), but I thought it would help the effort if we could eliminate the lost time.
Truly the point system is a great tool. It allows Stanford to direct the flow by awarding more points where the work is needed most. It also gives people a chance to see what they have done and combine their efforts with teams adding fun competition. But honestly if the points went away tomorrow, I would still be folding. In the end, it's about building a better future. The more work we can get done, the closer we get to that goal.
Re: Lost Time
Posted: Mon Jun 23, 2008 6:12 pm
by gwildperson
One very simple change would help a lot: download a new WU BEFORE uploading the result. Then we can get started on the next assignment while the upload is processed, and if the upload fails two or three times, we don't lose as much time. Downloads are often small compared to uploads, too, and my DSL download speed is higher than my upload speed.
Re: Lost Time
Posted: Mon Jun 23, 2008 7:50 pm
by MstrBlstr
It doesn't work that way, folks.
Not as it is set up, and has been for a long time.
I will explain it as simply as I can.
They (the servers) need the unit that you crunched back first. And there is more than one reason for this.
If you don't send back the one that the AS last assigned you (on that system), you will be assigned the same unit again. The exception to this is if the client reports that the unit was completed but is holding it in your queue because it could not connect to the WS or CS to upload the completed unit.
After this, I would assume that there is some sort of communication to the effect of "OK got prior unit returned, send new work unit", or "Holding prior completed unit for upload later, send new work unit."
I could be way off base here. But what you are suggesting would circumvent the checks that are in place to make sure that they get the completed assigned work back first, which defeats the purpose of having them in the first place.
As I stated above, there is more than one reason for these checks to be in place. But, I will not go into the details of the others.
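The check described above can be sketched roughly as follows. This is only an illustration of the behaviour as described in this post, not the actual assignment server logic; `ClientState`, `assign`, and the field names are all hypothetical, and the sketch tracks only the most recently assigned unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientState:
    # Hypothetical record the server keeps per client system.
    outstanding_wu: Optional[str] = None  # last WU assigned, not yet returned
    reported_complete: bool = False       # client finished it but could not upload


def assign(state: ClientState, next_wu: str) -> str:
    """Return the WU this client should work on next."""
    if state.outstanding_wu is None:
        # Prior work was returned (or there was none): hand out new work.
        state.outstanding_wu = next_wu
        return next_wu
    if state.reported_complete:
        # Exception: the finished unit is held in the client's queue for a
        # later upload, so new work is sent anyway.
        state.outstanding_wu = next_wu
        state.reported_complete = False
        return next_wu
    # Otherwise the same unit is assigned again.
    return state.outstanding_wu


state = ClientState()
print(assign(state, "WU-A"))   # new work
print(assign(state, "WU-B"))   # WU-A not returned, so WU-A again
state.reported_complete = True
print(assign(state, "WU-B"))   # held for later upload, so WU-B
```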
Re: Lost Time
Posted: Mon Jun 23, 2008 7:58 pm
by 7im
Aha, one of those counter-arguments I mentioned... I remember now. There's no point in sending out a new WU if the last WU you completed is actually screwed up because of too much overclocking or whatever. There is actually a small benefit in waiting to see how the previous WU turns out before sending out the next WU.
Re: Lost Time
Posted: Tue Jun 24, 2008 12:01 am
by MrVTEC
Judging by the simplified process explained above, it seems possible, since a user can in fact get a new WU before sending the results of the current one. Why not just have the client request a new WU at 99% and tell the server that the finished one is waiting in the queue, when in reality it's only 99% or even 99.5% complete? With a broadband connection, though, I can't see this time loss being too much unless you have hundreds of boxes.
Re: Lost Time
Posted: Tue Jun 24, 2008 1:14 am
by Tom
Ah, I see there is a reason why it works the way it does.
Thanks for answering my question, I thought it was worth asking.
Re: Lost Time
Posted: Tue Jun 24, 2008 4:03 am
by jrweiss
7im wrote: Aha, one of those counter-arguments I mentioned... I remember now. There's no point in sending out a new WU if the last WU you completed is actually screwed up because of too much overclocking or whatever. There is actually a small benefit in waiting to see how the previous WU turns out before sending out the next WU.
Should be fairly simple to have the event that triggers the "FINISHED UNIT" message (or any other "cleanup chore" if applicable) in the log also trigger a "download new WU first" switch in the client. After all, after the FINISHED UNIT message is triggered, the client will still download a WU if the old WU is, for any reason, unable to be immediately uploaded...
Re: Lost Time
Posted: Tue Jun 24, 2008 12:01 pm
by noorman
I think the crux is someplace else;
Since I've re-joined Linux SMP (with the v6 client), I've encountered a few WU finishes (at 100%) where the client then just stops its natural sequence and halts.
It's still running, but not doing anything. You need to stop it with CTRL+C, then run the repair sequence: Qfix, delete queue entry, then Qfix again ...
I just did 2 of those repairs on both my SMP rigs!
It's very odd that the sequence for the queue entry delete runs for almost exactly 4 minutes; sounds familiar.
The waiting time after a WU finishes, after the upload of results and before a new WU is downloaded, is ... the same amount of time!
There's a bug somewhere; they just don't deem it important enough to fix it.
PROOF: ---> After you've done the above repair sequence, you just restart the client; it will indicate that results are going to be sent back, and in the seconds following that message, a WU is downloaded and started.
After the upload has finished, you get to see the message confirming the successful upload and the "Thank You ..." message too (as usual).
So in this last case, upload and download can happen simultaneously!
Maybe a good pointer for the coders!
Re: Lost Time
Posted: Tue Jun 24, 2008 2:14 pm
by bruce
noorman wrote: There's a bug somewhere; they just don't deem it important enough to fix it.
Oh, come now. It's not a question of importance, it's a question of reproducibility. Every time they test their fix, it's going to work correctly but then when it gets out in the field, it's going to fail 1% of the time (or however often it fails now.)
They can't fix a bug that doesn't happen when they test it.
If you can demonstrate a reproducible method to make this happen, they'd be glad to fix it -- and quickly, I suppose.
Re: Lost Time
Posted: Wed Jun 25, 2008 5:47 pm
by leexgx
lost time
Pre-downloading may not be a good idea, but what should happen is that when we hit 100%, the client sends the result and downloads the next unit at the same time (or at least downloads the next work unit first, then sends the completed one once the download has finished), so that once the new unit has downloaded, it can start working on that project. If the project failed and needs sending back, pre-downloading should not be used. (Basically, use FINISHED_UNIT to trigger the download; on a core error, don't start a new unit and send only, just in case it's failing all the time. The same goes if Pause is used or -oneunit is set.)
This should be done for SMP and for some single console projects, as some of them are 25MB+ in size to send back to the server. The time wasted waiting for the project to be sent back could have been used folding the next project. Ignoring the points you get, if you add up all that lost time idling while waiting for the unit to send (which might fail if it's very big), that's a lot of folding time that could have been used.
Running 2 clients is an easy fix to this problem, but you should not need to do that, and it makes the projects take longer to return to the server.
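The rule suggested in this post could be sketched as a small decision function. This is only an illustration of the suggestion, not how the client actually behaves; the `CORE_ERROR` name, the action strings, and the function itself are hypothetical, while FINISHED_UNIT, Pause, and -oneunit come from the discussion above.

```python
from enum import Enum


class Outcome(Enum):
    FINISHED_UNIT = "FINISHED_UNIT"
    CORE_ERROR = "CORE_ERROR"  # hypothetical name for a failed unit


def next_actions(outcome, paused=False, one_unit=False):
    """Decide what to do once the current work unit ends."""
    if outcome is Outcome.FINISHED_UNIT and not paused and not one_unit:
        # Fetch the next unit first, start folding it, and send the
        # finished result while the new unit is running.
        return ["download_next", "start_next", "upload_result"]
    # Failed unit, Pause, or -oneunit: send the result only.
    return ["upload_result"]


print(next_actions(Outcome.FINISHED_UNIT))
print(next_actions(Outcome.CORE_ERROR))
print(next_actions(Outcome.FINISHED_UNIT, one_unit=True))
```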
Re: Lost Time
Posted: Wed Jun 25, 2008 8:25 pm
by 7im
It's more helpful to the project to get the completed work unit back to them for study before downloading the next WU. Delaying the upload until after the download adds a small delay, but repeated many times over, the delay becomes very large. Even sharing the bandwidth between concurrent uploads and downloads adds a small delay.