What happens when a work unit fails?

Posted: Sun Mar 18, 2012 3:18 pm
by iBozz
Recently, I've lost three or four(?) work units due to an ongoing problem with installing some software (not that it's relevant, but a TechTool eDrive on an i7 iMac which is being investigated by Micromat).

After recovering from a freeze and a restart after cutting the power on each of these occasions, I've noticed that the then current unit has been "deleted" and this set me wondering ...

1) How much inconvenience does the loss of a work unit cause to the total project;

2) Is there more than one contributor working the same unit so that any one loss doesn't actually affect anything because another Folder is unlikely to lose the same unit; and

3) On some occasions, I think that the same unit has been restarted even though shown as deleted (same project etc. numbers) but on others a different unit has started. Why the difference and is there a setting or somesuch which will allow for the same unit to be restarted each time?

4) Is there a method of advising the project that a unit has been lost without them having to sit there waiting until after the final deadline?

I'm using the MacOSX "preference Panel" method of running F@H.

These are just idle thoughts, but can anyone satisfy my curiosity?

Thanks!

Re: What happens when a work unit fails?

Posted: Sun Mar 18, 2012 3:30 pm
by toTOW
1) the trajectory is delayed until the unit is reassigned, which may not happen until the unit's deadline passes.

2) there is only one contributor running a single WU unless something goes wrong (the WU passes its preferred deadline or the client reports the failure)

3) sometimes, it's just the checkpoint that got corrupted, so the WU is not lost and it restarts from scratch

4) nothing that you can really do. You can still report the WU here : viewforum.php?f=19 so we'll tell you if the WU is defective or if someone else has been able to complete it.

Re: What happens when a work unit fails?

Posted: Sun Mar 18, 2012 4:16 pm
by iBozz
Many thanks.

Re: What happens when a work unit fails?

Posted: Mon Mar 19, 2012 4:48 am
by bruce
Additional facts:

Q1) How much inconvenience does the loss of a work unit make to the total project;
A1) the trajectory is delayed until the unit is reassigned, which may not happen until the unit's deadline passes.

The version 6 client sometimes reports the failure and sometimes it doesn't. If the client does report the failure, the servers can reassign the WU immediately rather than waiting for the WU to expire. One of the improvements in the upcoming V7 client is that more of the failures are reported, reducing the impact of lost WUs like the ones you're describing. (There's always going to be zero progress on the trajectory for however long the WU was assigned to your machine, but not necessarily for the additional time until it expires.)

To complete a trajectory of a certain number of Gens, you have to add the total of all the productive simulation time, when somebody is working on a WU, to the total of all the lost time, when somebody has the WU assigned to them but it then has to be reassigned. That lost time can be "expensive".
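
To put rough numbers on that sum, here is a small Python sketch. Everything in it is made up for illustration: the function name, the Gen count, the run time per WU, the number of failures and the deadline are assumptions, not real project figures. It only shows how the lost time shrinks when a failure is reported instead of waiting for the WU to expire.

```python
def trajectory_wall_time(gens, run_days, failures, lost_days_per_failure):
    """Wall-clock days to finish a trajectory whose Gens run one after another.

    All arguments are hypothetical illustration values, not project data.
    """
    productive = gens * run_days                 # time somebody is actually simulating
    lost = failures * lost_days_per_failure      # time a WU sat assigned but was never returned
    return productive + lost


# Illustrative numbers only (assumptions).
gens = 100        # Gens in the trajectory
run_days = 2.0    # days to complete one WU
failures = 5      # WUs lost along the way

# Failure not reported: the WU waits out an assumed 6-day deadline before reassignment.
unreported = trajectory_wall_time(gens, run_days, failures, 6.0)

# Failure reported after an assumed 1 day of work: only that day is lost.
reported = trajectory_wall_time(gens, run_days, failures, 1.0)

print(unreported, reported)  # 230.0 205.0
```

With these invented numbers the productive time is the same 200 days either way; the only difference is how long each lost WU sits before somebody else gets it.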

Q4) Is there a method of advising the project that a unit has been lost without them having to sit there waiting until after the final deadline?
A4) nothing that you can really do. You can still report the WU here : viewforum.php?f=19 so we'll tell you if the WU is defective or if someone else has been able to complete it.

Do report it if you suspect the WU may be defective. Don't bother to report it if you know why the failure happened (such as your installing that software or something else happening that can be fixed).