What happens when a work unit fails?

Moderators: Site Moderators, FAHC Science Team

iBozz
Posts: 89
Joined: Wed Nov 26, 2008 7:01 pm
Hardware configuration: iMac (Retina 5K, 27-inch, 2017), 3.8 GHz Quad-Core Intel Core i5, 64 GB 2400 MHz DDR4, 2TB HD running under macOS Catalina v10.15.7 (19G2021)
Location: NW England, UK

What happens when a work unit fails?

Post by iBozz »

Recently, I've lost three or four(?) work units due to an ongoing problem with installing some software (not that it's relevant, but a TechTool eDrive on an i7 iMac which is being investigated by Micromat).

On each of these occasions the machine froze and I had to cut the power and restart. Afterwards I noticed that the then-current unit had been "deleted", and this set me wondering ...

1) How much inconvenience does the loss of a work unit make to the total project;

2) Is more than one contributor working on the same unit, so that any one loss doesn't actually affect anything because another Folder is unlikely to lose the same unit; and

3) On some occasions, I think that the same unit has been restarted even though it was shown as deleted (same project etc. numbers), but on others a different unit has started. Why the difference, and is there a setting or some such which will allow the same unit to be restarted each time?

4) Is there a method of advising the project that a unit has been lost without them having to sit there waiting until after the final deadline?

I'm using the Mac OS X "Preference Panel" method of running F@H.

These are just idle thoughts, but can anyone satisfy my curiosity?

Thanks!
iMac (Retina 5K, 27-inch, 2017), 3.8 GHz Quad-Core Intel Core i5, 64 GB 2400 MHz DDR4, 2TB HD, macOS Catalina v10.15.7
toTOW
Site Moderator
Posts: 6394
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: What happens when a work unit fails?

Post by toTOW »

1) The trajectory will be delayed until the unit is reassigned, which can take as long as the unit's deadline.

2) Only one contributor runs a given WU unless something goes wrong (the WU passes its preferred deadline or the client reports the failure).

3) Sometimes it's just the checkpoint that got corrupted, so the WU is not lost and it restarts from scratch.

4) There's nothing that you can really do. You can still report the WU here: viewforum.php?f=19 and we'll tell you if the WU is defective or if someone else has been able to complete it.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
iBozz
Posts: 89
Joined: Wed Nov 26, 2008 7:01 pm
Hardware configuration: iMac (Retina 5K, 27-inch, 2017), 3.8 GHz Quad-Core Intel Core i5, 64 GB 2400 MHz DDR4, 2TB HD running under macOS Catalina v10.15.7 (19G2021)
Location: NW England, UK

Re: What happens when a work unit fails?

Post by iBozz »

Many thanks.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: What happens when a work unit fails?

Post by bruce »

Additional facts:

Q1) How much inconvenience does the loss of a work unit make to the total project;
A1) The trajectory will be delayed until the unit is reassigned, which can take as long as the unit's deadline.

The version 6 client sometimes reports the failure and sometimes it doesn't. If the client does report the failure, the servers can reassign the WU immediately rather than waiting for the WU to expire. One of the improvements in the upcoming V7 client is that more of the failures are reported, reducing the impact of lost WUs like the ones you're describing. (There's always going to be zero progress on the trajectory for however long the WU was assigned to your machine, but not necessarily for the additional time until it expires.)
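As a rough illustration of that difference, here is a minimal Python sketch. The function name, its parameters, and the numbers are all invented for the example; this is not the servers' actual logic, just the idea that a reported failure costs only the time the WU was held, while an unreported one costs the whole time until the WU expires.

```python
def stalled_hours(hours_held, deadline_hours, failure_reported):
    """Hypothetical model of how long a trajectory makes no progress.

    hours_held       -- how long the WU sat on the failing machine
    deadline_hours   -- hours from assignment until the WU expires
    failure_reported -- True if the client told the server about the failure
    """
    if failure_reported:
        # The server can reassign as soon as it hears about the failure.
        return hours_held
    # Otherwise the server only reassigns once the WU has expired.
    return deadline_hours


# Made-up numbers: WU held for 10 hours, expiry 96 hours after assignment.
print(stalled_hours(10, 96, failure_reported=True))   # 10
print(stalled_hours(10, 96, failure_reported=False))  # 96
```

With these assumed figures, an unreported failure stalls the trajectory for 86 hours longer than a reported one.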

The time needed to complete a trajectory of a certain number of Gens is the total of all the productive simulation time, when somebody is working on a WU, plus the total of all the lost time, when somebody has the WU assigned to them but it then has to be reassigned. That lost time can be "expensive".
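The arithmetic can be sketched in a few lines of Python. The function name and the hour figures below are hypothetical, chosen only to show how a single lost WU can dominate the total; they are not measurements from any real project.

```python
def trajectory_wall_hours(productive_hours, lost_time_hours):
    """Total elapsed time for a trajectory: productive time plus lost time."""
    return sum(productive_hours) + sum(lost_time_hours)


# Assumed example: three Gens completed in 20, 22 and 19 hours,
# plus one WU that was lost and sat for 96 hours before reassignment.
productive = [20, 22, 19]
lost = [96]

total = trajectory_wall_hours(productive, lost)
print(total)  # 157
```

In this made-up case the one lost WU accounts for 96 of the 157 hours, more than all the productive work combined.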

Q4) Is there a method of advising the project that a unit has been lost without them having to sit there waiting until after the final deadline?
A4) There's nothing that you can really do. You can still report the WU here: viewforum.php?f=19 and we'll tell you if the WU is defective or if someone else has been able to complete it.

Do report it if you suspect the WU may be defective. Don't bother to report it if you know why the failure happened (such as your installing that software or something else happening that can be fixed).