Hi Everyone,
Windows updates causing WU's to fail, and strange PPD to be recorded
I've noticed on rare occasions that when I've let Windows restart the computer while I'm folding, the 0xa8 WUs have failed, and after the restart the 0x27 shows me running at something silly like 300,000,000 PPD. It then drops as the WU completes, depending on the amount of time left, but can end at 12,000,000 PPD, which is not possible.
I'm just wondering whether Windows is forcibly ending FAHClient after the client fails to respond to a request to close?
In the early hours of this morning, Windows decided to install a pending update that I had ignored for a few days, so it restarted, killed my WU, and FAHClient didn't restart until I logged in.
Note to self: don't leave updates waiting more than a few days, and always pause folding before restarting a machine.
Many thanks,
David
muziqaz
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
The PPD issue is known.
As for Windows updates, you will have to control them better. When you see them lined up, pause folding, then update Windows. Windows updates are annoying, I know, but they are also lethal for Windows apps, because the update ignores running apps and just restarts the PC anyway.
Joe_H
- Site Admin
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
Yes, Windows Update is a known problem, and not just for F@h. F@h contains code that the Windows shutdown process is supposed to observe, but frequently does not. The v8.5 public beta improves that code; I haven't seen wide reports yet on how much better it works.
Essentially, this has been a problem with Windows Update and shutdown for decades. It ignores running apps' signals to wait, even those using MS-documented methods, never mind auto-updating drivers in the middle of operations. Best practice has been to turn off automatic updates and run them manually at intervals, when apps can first be shut down cleanly. Then check afterwards to make certain MS has not, in its we-know-best manner, turned automatic updating back on with one of the updates.
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
Hi Everyone,
Would it be better to accept that if Windows is going to force FAHClient to close, the software should save its current WUs, or keep some kind of memory file in place so it could potentially recover?
I appreciate that there is probably a reason why FAHClient is built the way it is, but I'm just interested.
After 6 hours of processing a WU, having it fail in the middle is a loss, but from what I understand, corrupted WUs are not helpful either.
I've seen it mentioned that FAHClient is good for testing GPU cards after repair, because faulty GPUs won't cope with folding.
Are the WUs failed by FAHClient because it senses instability or corruption?
Many thanks.
David
Joe_H
- Site Admin
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
The client is in the process of doing just that when Windows comes along and simply terminates the processes. That leaves files open and partly written, or corrupted by the time the restart is done. The GPU core writes out checkpoint files every so many steps. Those are usually usable for restarting, but not if the Windows process termination happens to coincide with writing a checkpoint. The CPU folding cores write a checkpoint every 15 minutes by default, and also start a checkpoint when paused. This works fine on Linux and macOS, but on Windows the shutdown does not observe the code paths that MS documents for making a shutdown wait until an app has finished. Instead, the shutdown just terminates the processes without waiting.
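The checkpoint policy described for the CPU cores (a checkpoint on a timer, plus one when paused) can be sketched as a simple loop. This is a minimal illustration, not FAHcore code: `run_core`, `CHECKPOINT_INTERVAL`, and the injected `clock`, `pause_requested`, and `write_checkpoint` callbacks are hypothetical names, and only the 15-minute default comes from the post.

```python
from typing import Callable, List

# Value taken from the post: CPU cores checkpoint every 15 minutes by default.
CHECKPOINT_INTERVAL = 15 * 60  # seconds


def run_core(total_steps: int,
             clock: Callable[[], float],
             pause_requested: Callable[[int], bool],
             write_checkpoint: Callable[[int], None]) -> List[int]:
    """Hypothetical core loop that checkpoints on a timer and when paused.

    Returns the steps at which a checkpoint was written.
    """
    written: List[int] = []
    last = clock()
    for step in range(1, total_steps + 1):
        # ... one simulation step would run here ...
        if pause_requested(step):
            # A pause triggers a checkpoint before the core stops.
            write_checkpoint(step)
            written.append(step)
            return written
        now = clock()
        if now - last >= CHECKPOINT_INTERVAL:
            # The timer has elapsed: save progress and restart the timer.
            write_checkpoint(step)
            written.append(step)
            last = now
    return written
```

Nothing in a loop like this helps if the process is killed while `write_checkpoint` is still running, which is the case Windows Update keeps triggering.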
muziqaz
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
Look at FAH as a video/3D rendering workload. What happens to your render or video if Windows decides to restart in the middle of the job? You have to start over from the beginning, because the partly rendered file is unreadable after the untimely termination of the workload. I know FAH has checkpoints, but as Joe mentioned, Windows uses a sledgehammer to shut down for the update.
D.Record wrote (Tue Nov 25, 2025 10:36 pm):
I've seen it mentioned that FAHClient is good for testing GPU cards after repair, because faulty GPUs won't cope with folding.
Are the WUs failed by FAHClient because it senses instability or corruption?
FAHClient itself does not do any heavy lifting. It just controls the FAHcores, which actually simulate the protein folding. FAHcores are very sensitive to hardware instability, and they are very good at loading GPUs to their limits. However, please note that the FAH app should not be used as a benchmarking or stress-testing tool for OC attempts, or to find stable PC settings. It should be used only after the user has made sure their hardware is 100% stable. In other words, FAH should not be the first part of a benchmarking suite, only the last one, once everything else shows the hardware is stable.
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
There's an easy fix for that. Ideally you'd keep two sets of checkpoints and write to them alternately (disk space should not really be a concern); that way there's always at least one working checkpoint (after the first one, of course).
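The alternating two-slot scheme can be sketched as follows, assuming a hypothetical file layout (`checkpoint_a.bin`, `checkpoint_b.bin`) and a small header holding a sequence number and a CRC32. Assuming sequence numbers increase by one per save, each save overwrites the older of the two slots, and loading returns the newest slot whose checksum verifies, so a write torn by a forced restart falls back to the previous checkpoint. This illustrates the idea only; it is not how FAHcores or GROMACS store checkpoints, and CRC32 is a cheap integrity check rather than a guarantee.

```python
import os
import struct
import zlib
from typing import Optional, Tuple

SLOTS = ("checkpoint_a.bin", "checkpoint_b.bin")  # hypothetical file names
HEADER = struct.Struct(">QI")  # sequence number, CRC32 of (seq + payload)


def _slot_path(directory: str, seq: int) -> str:
    # Even sequence numbers go to slot A, odd ones to slot B.
    return os.path.join(directory, SLOTS[seq % 2])


def save_checkpoint(directory: str, seq: int, payload: bytes) -> None:
    crc = zlib.crc32(struct.pack(">Q", seq) + payload)
    # Only this slot is truncated; the other slot keeps the previous checkpoint.
    with open(_slot_path(directory, seq), "wb") as f:
        f.write(HEADER.pack(seq, crc) + payload)
        f.flush()
        os.fsync(f.fileno())


def _read_slot(path: str) -> Optional[Tuple[int, bytes]]:
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) < HEADER.size:
        return None  # header itself was cut short by an interrupted write
    seq, crc = HEADER.unpack_from(data)
    payload = data[HEADER.size:]
    if zlib.crc32(struct.pack(">Q", seq) + payload) != crc:
        return None  # partially written or corrupted slot
    return seq, payload


def load_latest(directory: str) -> Optional[Tuple[int, bytes]]:
    """Return (seq, payload) of the newest valid slot, or None if neither is valid."""
    valid = [r for r in (_read_slot(os.path.join(directory, n)) for n in SLOTS)
             if r is not None]
    return max(valid, key=lambda r: r[0], default=None)
```

The extra recovery logic in `load_latest` is small here, but in a real client it also has to decide whether to resume from the older checkpoint, which is part of the complexity Joe_H mentions later in the thread.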
One could also write new checkpoints to temporary file(s), then delete the old checkpoint and rename the temp file(s) to the proper name. That is not as good a method, since there's still a chance Windows could sneak in a restart between finishing the file write(s) and the delete/rename step. It offers less security, and its only "advantage" is keeping just one set of checkpoint files permanently on disk (you still need twice the space initially). Nobody should miss the storage space for keeping two sets of checkpoints at all times anymore; it's not the 1980s any longer.
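The temp-file approach can be sketched in Python with `tempfile.mkstemp` and `os.replace`. This variant swaps the new file over the old one in a single `os.replace` call instead of a separate delete and rename, which removes the moment where neither file exists. Python's documentation describes a successful `os.replace` as atomic, noting this is a POSIX requirement; whether the same holds on every Windows filesystem is less clearly documented, and a fully durable version would also fsync the containing directory, which this sketch omits. `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, payload: bytes) -> None:
    """Write payload to path via a temp file and a single os.replace()."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # data reaches the disk before the swap
        # One call replaces the old file: no separate delete step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temp file if anything failed before the replace.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```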
calxalot
- Site Moderator
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
It’s not just a matter of atomic checkpoints.
The client cannot distinguish a core being brutally killed from a crash of unknown cause.
Joe_H
- Site Admin
Re: Windows updates causing WU's to fail, and strange PPD to be recorded.
The GROMACS code that the CPU folding cores are based on used to support double checkpoints, but no one used the feature. I have no idea whether that feature is still available. However, recovering a process from an issue that requires the second checkpoint is more complicated and would increase the complexity of the client and core code. The decision was made years ago not to spend the single full-time paid developer's effort in that direction for a rarely used feature.
Disk space was never a consideration.
The multi-step method you propose can also be interrupted, leaving things in an inconsistent, i.e. for all intents and purposes corrupted, state.