Creation of checkpoints taking over my CPU!

Posted: Mon Apr 21, 2025 6:36 am
by Schrödinger's cat
Why does my GPU create a checkpoint every two percent that uses almost, if not all, of my CPU? This makes watching videos painful: they hang with a loud buzzing sound while the checkpoint is being created.

None of my RPi 5s create a single checkpoint.

Is it possible to set when a checkpoint is created, say once every 5%, or even disable the creation of checkpoints completely?

Re: Creation of checkpoints taking over my CPU!

Posted: Mon Apr 21, 2025 6:48 am
by arisu
It's not the checkpoint taking all the CPU, it's sanity checks that are being done to verify the accuracy of the work. They just happen to be done at the same time as checkpoints. They're important. You'll see things like this in the science.log file in the work folder:

Completed 1800000 out of 2500000 steps (72%)
  Performance since last checkpoint: 11.30890052 ns/day
  Running tests
  All tests passed.
  Appending to XTC file positions.xtc
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
It is between the "Running tests" and "All tests passed." lines that all the CPU is being used. And that part shouldn't be disabled.
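If you want to see how often these checks happen on your own work unit, the log excerpt above can be scanned with a few lines of Python. This is only a rough sketch: the regular expressions are based solely on the lines quoted in this post, the function name `parse_checkpoints` is made up for illustration, and real science.log files may contain other line formats that it simply ignores.

```python
import re

# Patterns taken from the science.log excerpt quoted in this post.
# Any other log format is an unknown and is not handled here.
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps \((\d+)%\)")
PERF_RE = re.compile(r"Performance since last checkpoint: ([\d.]+) ns/day")


def parse_checkpoints(log_text):
    """Return one dict per 'Completed ... steps' block found in log_text."""
    records = []
    for line in log_text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            records.append({
                "step": int(m.group(1)),
                "total": int(m.group(2)),
                "percent": int(m.group(3)),
                "ns_per_day": None,
                "tests_passed": False,
            })
            continue
        if not records:
            continue  # skip anything before the first progress line
        m = PERF_RE.search(line)
        if m:
            records[-1]["ns_per_day"] = float(m.group(1))
        elif "All tests passed" in line:
            records[-1]["tests_passed"] = True
    return records


SAMPLE = """\
Completed 1800000 out of 2500000 steps (72%)
  Performance since last checkpoint: 11.30890052 ns/day
  Running tests
  All tests passed.
  Appending to XTC file positions.xtc
  Writing binary checkpoint
  Binary checkpoint complete. Cleared numRetries file.
"""

if __name__ == "__main__":
    for record in parse_checkpoints(SAMPLE):
        print(record)
```

Feeding it the whole file instead of `SAMPLE` would give one record per checkpoint, so the step gap between consecutive records shows the checkpoint interval the project uses.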

Re: Creation of checkpoints taking over my CPU!

Posted: Mon Apr 21, 2025 7:01 am
by calxalot
Checkpoint interval is set by researchers and cannot be changed. AFAIK

Re: Creation of checkpoints taking over my CPU!

Posted: Mon Apr 21, 2025 7:08 am
by arisu
calxalot wrote: Mon Apr 21, 2025 7:01 am Checkpoint interval is set by researchers and cannot be changed. AFAIK
For the GPU cores, it can't be changed without editing core.xml. But I imagine the core will detect that the file has been tampered with and the unit would probably get dumped. For CPU cores, it can be changed just by passing a flag to the core, but the client doesn't have a way to do that (anymore).

It's only the OpenMM GPU cores that have a long and slow sanity check during the checkpoint anyway.

Re: Creation of checkpoints taking over my CPU!

Posted: Mon Apr 21, 2025 2:35 pm
by muziqaz
If a GPU sanity check is causing video playback issues, I have to say that the computer is not fit for FAH. I have yet to see any issues caused by sanity checks on any of my PCs.

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 3:24 am
by arisu
OP, try to lock the GPU folding thread to just one core. On Linux you can do that with taskset. On Windows you can use something called Process Lasso. That will make it use only one core for the sanity check. It will slow down the sanity check but it will free up most of your CPU cores, and it shouldn't impact GPU folding in between checks.
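For anyone who would rather script the Linux side than call taskset by hand, here is a minimal sketch using Python's standard library. The helper name `pin_to_single_cpu` is hypothetical, `os.sched_setaffinity` only exists on Linux, and how you find the PID of the GPU core process is left out because it depends on your setup.

```python
import os


def pin_to_single_cpu(pid=0, cpu=None):
    """Restrict process `pid` (0 means the calling process) to one logical CPU.

    Linux-only: os.sched_setaffinity is not available on Windows or macOS.
    Returns the affinity set in effect after the call.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is Linux-only")
    allowed = os.sched_getaffinity(pid)
    if cpu is None:
        cpu = min(allowed)  # lowest-numbered CPU the process may already use
    elif cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


# Example (hypothetical PID): pin_to_single_cpu(pid=12345, cpu=0)
```

One caveat: on Linux this call, like `taskset -p` without `-a`, changes only the thread whose ID equals the PID you pass. Threads that already exist in a running core keep their old affinity, so for a running multi-threaded process `taskset -a -p` is probably the simpler route.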

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 4:41 am
by muziqaz
arisu wrote: Tue Apr 22, 2025 3:24 am OP, try to lock the GPU folding thread to just one core. On Linux you can do that with taskset. On Windows you can use something called Process Lasso. That will make it use only one core for the sanity check. It will slow down the sanity check but it will free up most of your CPU cores, and it shouldn't impact GPU folding in between checks.
That is not a solution for a broken system.
When your minute everyday tasks lag because the CPU gets a bit loaded by some app, something is broken with the hardware or drivers.
Obviously, we would be able to guess better if OP posted system specs.
P.S. Just for sh*ts and giggles: the last time I experienced something like that was back in the very early 2000s, when my ATA HDD interface would drop from UDMA to PIO, and I would need to reset it in the HDD Properties :D That was on an Athlon XP 1700, I think. Ever since the SATA interface, I have never had anything like it.

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 7:56 am
by Schrödinger's cat
Thanks for letting me know about the sanity check. It would be useful if that was logged, so one knew about it.
muziqaz wrote: Mon Apr 21, 2025 2:35 pm If a GPU sanity check is causing video playback issues, I have to say that the computer is not fit for FAH.
Ha ha! Diagnosing something without any details. Very useful muziqaz!
arisu wrote: Tue Apr 22, 2025 3:24 am On Windows you can use something called Process Lasso
I'm using that, and have all the fah-cores set to Real-time. That caused the issue. When watching videos, I'll just set the processor priority to High instead of Real-time.

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 7:59 am
by muziqaz
Real time?
Why? 😲

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 8:08 am
by Schrödinger's cat
muziqaz wrote: Tue Apr 22, 2025 7:59 am Real time?
Why? 😲
To ensure folding has priority. Without Process Lasso, the fahcores had idle or background processor priority, making folding unnecessarily slow.

Just remember that sanity checks are undocumented, so how would I have adjusted my system to account for their effect?

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 8:23 am
by calxalot
Running folding with real time priority is not something I remember anyone trying before.

Most people want it in the background.

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 9:12 am
by muziqaz
calxalot wrote: Tue Apr 22, 2025 8:23 am Running folding with real time priority is not something I remember anyone trying before.

Most people want it in the background.
People didn't try it because it is completely unnecessary, and mainly for the reason shown in this thread, where stuff goes pear-shaped.

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 9:20 am
by foxpy
If you run a process with Realtime priority, then you should definitely expect audio hangs :)
And even if you pin the fah core to a single CPU, it might still cause issues, because a Realtime process might interfere with audio-related DPCs (deferred procedure calls).

So, yeah, don't use Realtime. If you want to give fah priority, High is more than enough. And remember that modifying priority doesn't magically give fah more performance. It will just increase latency for interactive tasks like web browsing, etc. Everything interactive that runs under lower priority will still consume the same amount of resources it needs, just more slowly. And if you run long batch jobs other than fah, I would rather suggest dedicating fewer CPUs to fah and fewer CPUs to other jobs, so they don't have to compete with each other.

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 3:45 pm
by Schrödinger's cat
arisu wrote: Tue Apr 22, 2025 3:24 am OP, try to lock the GPU folding thread to just one core.
Fah core's average CPU use is 14.74%, which is more than one thread (12.5%), because of the sanity checks. So locking the fah core to one thread would slow down folding by slowing down the sanity checks, unless the sanity checks run in the background of folding, which I don't think is the case.
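As a rough back-of-the-envelope check on those numbers, the figures above can be turned into an estimate of how much wall time goes to sanity checks. This is a sketch under loud assumptions that the thread does not confirm: 8 logical CPUs (inferred only from the 12.5% per-thread figure), exactly one fully busy thread while folding, and every thread saturated during a check. The function name is made up for illustration.

```python
def sanity_check_time_fraction(avg_cpu_pct, logical_cpus, folding_threads=1):
    """Estimate the fraction of wall time spent in all-core sanity checks.

    Two-state model (an assumption, not measured behaviour):
      folding  -> `folding_threads` threads busy
      checking -> all `logical_cpus` threads busy (100%)
    Solves avg = (1 - f) * base + f * 100 for f.
    """
    base_pct = 100.0 * folding_threads / logical_cpus
    return (avg_cpu_pct - base_pct) / (100.0 - base_pct)


if __name__ == "__main__":
    # 14.74% average and 12.5% per thread are the figures from this post.
    fraction = sanity_check_time_fraction(14.74, logical_cpus=8)
    print(f"{fraction:.2%}")  # prints 2.56%
```

Under that model, roughly 2.5% of the run is spent in checks, so if pinning to one core made each check about eight times longer, the overall slowdown would be noticeable but not dramatic. If the checks use fewer threads than assumed, the real share of time is larger.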

My understanding is that the process is this:
  • folding
  • stop/pause folding
  • checkpoint creation & sanity check
  • resume folding
  • repeat
Is this correct?
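The cycle in the list above can be written down as a toy loop. This is a sketch of the sequence only, not the actual core code: the function names and callbacks are hypothetical, the 2,500,000 total steps come from the log excerpt earlier in the thread, the 2% interval is the one reported in the first post, and running the tests before writing the checkpoint follows the order of the log lines.

```python
def run_work_unit(total_steps, checkpoint_every, fold, sanity_check, write_checkpoint):
    """Toy model of the cycle: fold, pause, sanity check, checkpoint, repeat."""
    step = 0
    while step < total_steps:
        chunk = min(checkpoint_every, total_steps - step)
        fold(chunk)                   # simulation steps run here
        step += chunk
        if not sanity_check(step):    # folding is paused while the tests run
            raise RuntimeError(f"sanity check failed at step {step}")
        write_checkpoint(step)        # written after the tests, as in the log
    return step


def demo(total_steps=2_500_000, percent_per_checkpoint=2):
    """Record the order of events for one simulated work unit."""
    events = []
    every = total_steps * percent_per_checkpoint // 100
    run_work_unit(
        total_steps,
        every,
        fold=lambda n: events.append(("fold", n)),
        sanity_check=lambda s: events.append(("check", s)) or True,
        write_checkpoint=lambda s: events.append(("checkpoint", s)),
    )
    return events


if __name__ == "__main__":
    events = demo()
    print(sum(1 for kind, _ in events if kind == "checkpoint"))  # prints 50
```

With a 2% interval the loop checkpoints every 50,000 steps, which lines up with the 1,800,000-step (72%) checkpoint in the quoted log.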

Re: Creation of checkpoints taking over my CPU!

Posted: Tue Apr 22, 2025 3:53 pm
by muziqaz
Yes, more or less.
Sanity checks can be multi-threaded. It depends on how the researcher set up the project.