Unable to finish
Posted: Fri Apr 02, 2010 8:29 pm
by theteofscuba
If a WU does not meet its deadline, does the client send back the partially completed WU?
I really hope so. Because getting back a partially completed WU is far better than it going to waste entirely.
Re: Unable to finish
Posted: Fri Apr 02, 2010 9:04 pm
by 7im
No.
If the work unit is so old as to have passed the final deadline, the value of that data is next to nothing. Please note, as described in the FAQs, that a work unit is reissued to a 2nd person when the preferred deadline has expired. So by the time your WU passes the final deadline, that other person has returned, or is about to return, that 2nd copy, fully completed. So your partial return, as I mentioned, is all but worthless.
Note the FAQs page is linked at the top of this forum page.
Re: Unable to finish
Posted: Fri Apr 02, 2010 9:35 pm
by theteofscuba
my thoughts:
it says that you can still send in results after the preferred deadline without credit.
i imagine that the server should put this unfinished WU back into the queue at the back of the list (fifo) once determined to be late.
if the client uploads the partially completed WU (before the final deadline), the server should remove that WU from the queue. then, the unfinished WU should be changed to pick up where the partially completed WU left off, and put back into the queue.
this is with the hope that the partially completed WU will be uploaded before the *entire* WU is issued to someone else!
EDIT: I suggest that every time the client performs a check point it should check if the WU is late and if so, stop the WU and send it in immediately.
EDIT 2: If the system based on Clones, Gen, Run is not compatible with this suggestion, the client still has this checkpoint format to allow you to pick up where you left off. We could just upload our checkpoint data and let someone else finish it.
Re: Unable to finish
Posted: Sat Apr 03, 2010 2:25 am
by 7im
Sorry, WU handling doesn't quite work like that. There is no way to tell if the partial WU was caused by a WU problem, or by unstable hardware, or a communication problem.
It's better and safer to start the WU from the beginning to assure data integrity. Scientific projects depend on data integrity.
Re: Unable to finish
Posted: Sat Apr 03, 2010 5:11 am
by bruce
As far as the philosophy of FAH is concerned, you're talking about two independent issues.
You are asking about a client that uploads a "partially completed WU". There is no plan for the client to do that under any circumstances. (Deadlines are a totally independent discussion.) The minimum unit of work is 100% of one WU. The only exception to this rule is when the WU contains an error and cannot be completed -- and then you're generally awarded partial credit for the amount of work that could actually be completed and uploaded along with an error code describing what went wrong.
Like 7im said, it's better and safer to start the WU from the beginning to assure data integrity, and I'd also add that since the next WU will encompass the next unit of time, it's better and safer to end the WU at its natural end.
Re: Unable to finish
Posted: Sat Apr 03, 2010 5:39 am
by theteofscuba
7im wrote:Sorry, WU handling doesn't quite work like that. There is no way to tell if the partial WU was caused by a WU problem, or by unstable hardware, or a communication problem.
It's better and safer to start the WU from the beginning to assure data integrity. Scientific projects depend on data integrity.
there is a way to know if a partial WU was due to being late -- you program it to know. I believe I suggested performing a check on deadlines whenever the automated checkpoint is done, which I believe defaults to once every 15 minutes. if fahmon knows the deadline of a WU, then the folding@home client should know this too! so every time a checkpoint is done, you simply check the current datetime and compare it to the WU's deadline, and then you know whether it has expired. from that point on, FAH already has working code to create checkpoints for saving state: compress the checkpoint and upload it, and let someone else finish it for credit, unless you don't trust the checkpoint code to have any integrity. if you have to, make sure the client has a way to notify the server, when it sends checkpoint data back, that the WU was only partially completed due to timeout.
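A minimal sketch of the per-checkpoint deadline check described above. Everything here is hypothetical: the function name, the three return values, and the choice to treat the preferred deadline as the upload trigger and the final deadline as the cut-off are assumptions for illustration, not how the real Folding@home client behaves.

```python
from datetime import datetime, timedelta


def action_at_checkpoint(now, preferred_deadline, final_deadline):
    """Decide what a hypothetical client would do each time it writes a checkpoint.

    All names and return values are illustrative assumptions,
    not part of the actual Folding@home client.
    """
    if now >= final_deadline:
        # Past the final deadline the result is no longer useful.
        return "abandon"
    if now >= preferred_deadline:
        # Late but not expired: stop folding and send the checkpoint in.
        return "upload_checkpoint"
    # Still on time: keep folding until the next checkpoint.
    return "continue"


# Example with made-up deadlines of 3 and 6 days after download.
downloaded = datetime(2010, 4, 2, 20, 29)
preferred = downloaded + timedelta(days=3)
final = downloaded + timedelta(days=6)
decision = action_at_checkpoint(downloaded + timedelta(days=4), preferred, final)
```

The check itself is cheap; the objections raised elsewhere in this thread concern data integrity and development priorities rather than the cost of the comparison.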
proof of concept:
i actually have a uniprocessor client running right now. If I decided to quit FAH, I could simply terminate the process. I could simply .zip the entire directory and email it to someone else who can then finish the current WU where I left off. I could leave it at 90% complete, and i'm sure there are some competitive folks out there who would love to capitalize on a 90% completed WU for full credit, and/or the altruism of full science.
Re: Unable to finish
Posted: Sat Apr 03, 2010 6:00 am
by bruce
There's no doubt that your concept COULD BE made to work. It just doesn't fit Stanford's current philosophy, and frankly, I doubt that it ever will. (As a general policy, they don't want you manipulating FAH's important files.)
There are almost 400 000 active clients, each producing X WUs per week. If Stanford wrote a procedure that would enable people to email 90% of a WU to someone else and a policy that allowed those WUs to be accepted by the servers from someone to whom the WU had not been assigned, there might be a few more WUs per week completed. The total increase in global FAH performance would be an insignificant percentage improvement. I'll bet that if someone at Stanford spent the same number of hours working on some relatively minor software improvement, it would probably produce a more significant increase in global FAH performance.
Re: Unable to finish
Posted: Sat Apr 03, 2010 6:32 am
by theteofscuba
I've abandoned a few WUs in the past when making changes to my network. I'm sure there are some people out there who probably don't keep their computers running long enough to finish even the smallest WU, but that should still count for something, ja? some people might give the -bigadv program a chance and fail. some people might decide to quit FAH altogether in the middle of a large WU, but would still like to submit the unfinished WU anyway.
it isn't a lot, but it might add up in the long run. as far as I am concerned, it is wasted science. I see this as an investment in the future more so than immediate gain. i just hope that the client code isn't so prohibitively expensive to work on. I'm not making any demands, but I want to make sure my ideas are heard and understood for those times of brainstorming in the development of FAH. I hope that in the future that the smallest contributions add up, especially for a project of this scale. I want to maximize the contributions I can get, even if it isn't much at all.
Re: Unable to finish
Posted: Sat Apr 03, 2010 10:22 pm
by Wrish
theteofscuba wrote:it isn't a lot, but it might add up in the long run. as far as I am concerned, it is wasted science.
That's not their standard. Whenever they code something sizable, the expected amount of science gained must be substantial. Otherwise they are incurring too high an opportunity cost.
Rather than coding an elaborate partial unit upload and credit system, they could spend their time enhancing the stability or performance of an existing core that a bunch of people constantly run. In the meantime, all of us are told, for example, not to embark on -bigadv if we turn our systems off half the time. Rewarding those that break these recommendations would only slow down the relative science as more deadlines get necessarily extended.
Re: Unable to finish
Posted: Sun Apr 04, 2010 1:25 am
by theteofscuba
Wrish wrote:
Rewarding those that break these recommendations would only slow down the relative science as more deadlines get necessarily extended.
Under this suggestion, deadlines are -not- changed. Let me try to explain this better.
first, the FAH servers are occasionally going to scan a list of WUs that have been issued to the wild. they will be scanning for WUs that are past deadline.
when a WU fails to meet this deadline, the WU is put back into the assignment queue. AT THE BACK OF IT because the assignments should be reissued to the wild in the order they were added to the assignment queue.
Now, a preferred deadline has passed. the WU is back into the assignment queue.
Meanwhile, grandma turned on her computer after a few days of not using it. the FAH client gets started, FAH client detects if the deadline has passed -- it would be stupid for FAH to continue on a WU that is past deadline.
under this idea, the FAH client will take grandma's checkpoint, and upload it to the server. the FAH servers already put grandma's original WU into the assignment queue.
now this is where this idea kicks in:
the WU that expired on grandma's computer hasn't been redistributed yet! there could be thousands of WUs that must be issued before grandma's WU will finally get reissued.
yet, the collection server has received her checkpoint! so the collection server would have the opportunity to remove the "entire" WU from the assignment queue before it is reissued.
then, with its checkpoint data available, the WU is ready to be issued to someone else. the WU with the checkpoint data gets put back into the assignment queue.
rinse, repeat.
and one final note.
if grandma's WU gets reissued before checkpoint is returned, then there is no harm done.
under the current system it says that you can indeed submit the finished WU after the preferred deadline, but before the final deadline and get no points.
for now, I hope that the collection server already tries to remove the finished but late WU from the assignment queue, because it would be redundant to reissue a completed WU even if late.
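The queue handling proposed in this post can be sketched as a toy model. This is only an illustration of the suggestion as written: the class name, method names, and the string identifiers are all hypothetical, and nothing here reflects how the real FAH assignment or collection servers are implemented.

```python
from collections import deque


class AssignmentQueue:
    """Toy FIFO assignment queue for the proposal above (hypothetical, not FAH code)."""

    def __init__(self):
        # Each entry is (wu_id, checkpoint); checkpoint is None for a full WU.
        self._queue = deque()

    def requeue_expired(self, wu_id):
        """Preferred deadline passed: put the whole WU at the back of the queue."""
        self._queue.append((wu_id, None))

    def receive_checkpoint(self, wu_id, checkpoint):
        """Swap a still-queued full WU for a resumable one.

        Returns True if the swap happened, or False if the full WU was
        already reissued, in which case the late checkpoint is ignored
        and nothing is worse than before.
        """
        full_entry = (wu_id, None)
        if full_entry not in self._queue:
            return False
        self._queue.remove(full_entry)
        self._queue.append((wu_id, checkpoint))
        return True

    def assign_next(self):
        """Hand the WU at the front of the queue to the next requesting client."""
        return self._queue.popleft() if self._queue else None


queue = AssignmentQueue()
queue.requeue_expired("grandma-wu")
queue.requeue_expired("other-wu")
swapped = queue.receive_checkpoint("grandma-wu", "checkpoint-data")
```

In this model the checkpointed WU goes to the back of the queue again; whether it should instead keep its original position is a design choice the post leaves open.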
Re: Unable to finish
Posted: Sun Apr 04, 2010 2:35 am
by bruce
theteofscuba wrote:when a WU fails to meet this deadline, the WU is put back into the assignment queue. AT THE BACK OF IT because the assignments should be reissued to the wild in the order they were added to the assignment queue.
Most of what you say makes reasonable sense . . . except the part that I've quoted. There's no reason why the assignment needs to be FIFO. Certainly a WU that has been delayed has every reason to be inserted onto the front of the queue (LIFO), even if the newly generated WUs which are derived from WUs that have been completed successfully are put on the back of the queue (FIFO).
That said, I do not believe that the queues for all projects are necessarily managed the same way. That would depend on the scientific need. (I can envision different cases where the most effective management methods would be different. Even so, I know very little about the scientific needs of FAH and enough about system management to be able to say that the PG knows more about the best ways to run their projects than I do.)
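A minimal sketch of the mixed ordering described above, with delayed WUs inserted at the front and newly generated WUs appended at the back. The function names are made up for illustration, and this is not a description of how any actual FAH project manages its queue.

```python
from collections import deque


def add_new_wu(queue, wu_id):
    """Newly generated WU: joins the back of the queue (FIFO)."""
    queue.append(wu_id)


def add_delayed_wu(queue, wu_id):
    """WU that missed its deadline: jumps to the front of the queue."""
    queue.appendleft(wu_id)


def next_wu(queue):
    """Issue whichever WU is currently at the front."""
    return queue.popleft()


queue = deque()
add_new_wu(queue, "new-1")
add_new_wu(queue, "new-2")
add_delayed_wu(queue, "late-1")
first_issued = next_wu(queue)  # the delayed WU is issued before the new ones
```

Note that with this scheme, several delayed WUs arriving in a row are issued most-recent-first among themselves, while new WUs keep their arrival order.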
Re: Unable to finish
Posted: Sun Apr 04, 2010 5:28 am
by theteofscuba
Sure, there can be special exceptions to the rule. you could prioritize while still on first come first serve basis. there should be a good reason why you would prioritize one WU over any other. and that isn't a major change to this suggestion.
when I go to the grocery store check out lane, I go to the back of the line because there were people there before me. maybe someone needs to skip ahead of me because of a life/death matter, so we prioritize for that. but when we naturally go to the front of the check out line in LIFO order and there are more people getting in line faster than the cashier can make sales (analogous to new WUs being added faster than they are issued in LIFO), then the people at the back of the line get starved of any chance of getting rung up in a fair amount of time. they were there first so why is it that they are waiting longer than most other people? LIFO isn't really fair in that sense, despite the law of the land demanding some very specific exceptions for skipping ahead in line.
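The starvation worry in the checkout-line analogy can be shown with a small deterministic simulation. It is a sketch under simplified assumptions that are not from the thread: two arrivals per tick, one served per tick, and a queue that is either purely FIFO or purely LIFO. The function name and parameters are invented for illustration.

```python
from collections import deque


def served_order(discipline, ticks, arrivals_per_tick=2):
    """Return the ids served when arrivals outpace service (one served per tick).

    discipline is "fifo" or "lifo"; ids are assigned in arrival order.
    """
    line = deque()
    served = []
    next_id = 0
    for _ in range(ticks):
        # New people (WUs) join the line.
        for _ in range(arrivals_per_tick):
            line.append(next_id)
            next_id += 1
        # The cashier serves exactly one per tick.
        if discipline == "lifo":
            served.append(line.pop())      # newest first
        else:
            served.append(line.popleft())  # oldest first
    return served


fifo_result = served_order("fifo", ticks=3)  # [0, 1, 2]
lifo_result = served_order("lifo", ticks=3)  # [1, 3, 5]
```

Under pure LIFO with this arrival rate, id 0 is never served no matter how many ticks are run, which is the unfairness the analogy points at. A scheme that only moves delayed WUs to the front does not necessarily behave this way.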
my grocery store analogy isn't perfect. imagine I'm in line at the grocery store. I'm with my wife, and we realize we forgot something. my wife can go get the item we forgot before we get to the cashier, so we won't lose our place in line. in this analogy, let's just say that forgetting an item on our list means we didn't finish our WU in the time we had before we got in the checkout line.
if my wife gets back to me before I get to the cashier, then the cashier will take our checkpoint data and put it into the assignment queue. my wife getting back to me is analogous to receiving the checkpoint data.
for the case when my wife doesn't find the item by the time I get to the cashier, I have to put all my items back on the shelf, leave the store, then re-enter to try it all again FRESH, which is analogous to reissuing the entire WU to a new person. (although it would be more analogous to having someone else do my shopping for me)
Getting to the cashier is analogous to when a FAH user requests a work unit. the cashier gives the WU based on people like me and my wife providing checkpoint data or not.
this worst case scenario isn't any worse than the way FAH works already. it will either be *better* in some cases, and in others, it will be the same as it has been already.
I hope this helps.