It could be the RAM or the CPU memory controller, but most likely it is the GPU, the GPU memory controller, the VRAM, or the GPU VRM.
If this were Linux, we could easily blame the FAHcores, but on Windows things are relatively stable on that front.
F@H pausing itself?
Re: F@H pausing itself?
I upped the GPU voltage by 6mV to 963mV@1801MHz and it's been stable for ~50 WUs over the last few weeks. So apparently my perfectly stable GPU wasn't as perfectly stable as I had thought. Interesting that F@H seems to be more sensitive than any other usage.
Re: F@H pausing itself?
It is not that F@H is more sensitive; it is just properly loading your hardware.
Re: F@H pausing itself?
Mild instability when playing video games will cause artifacts that you probably won't notice. Mild instability when folding can cause mistakes in the simulation that can make it converge to an impossible or unrealistic state that will be caught by sanity checks (in this case the position of a particle has become NaN). Folding doesn't make a system less stable, but it will catch small instabilities that other usages will not.
I think this means that the incorrect calculation happened before the last checkpoint and the bad simulation state was saved to the checkpoint. When it tried to resume from the checkpoint with the bad data, it ended up in a state where a particle's position was NaN (an invalid floating-point value). It retried twice and reached that state each time, so the core concluded that the checkpoint itself had bad data (which was probably true).
Code:
05:04:10:I1:WU145:An exception occurred at step 17067: Particle coordinate is NaN. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan
05:04:10:I1:WU145:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
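
To illustrate why retrying doesn't help when the checkpoint itself is bad, here is a rough Python sketch of the resume-and-retry logic as I understand it from the log. This is not the actual FAHcore or OpenMM code: `step`, `run_from_checkpoint`, `resume_with_retries` and `NaNError` are all made-up names, and the "simulation" is a toy. The only thing taken from the log is the limit of 2 resume attempts.

```python
import math

# The "(2)" in the log line above; everything else here is hypothetical.
MAX_RESUME_ATTEMPTS = 2


class NaNError(Exception):
    """Raised when the sanity check finds a NaN coordinate."""


def step(state):
    # Toy deterministic update. Any NaN in the input stays NaN in the output,
    # which is how a single bad value poisons everything computed after it.
    return [x * 0.5 + 1.0 for x in state]


def run_from_checkpoint(checkpoint, n_steps):
    state = list(checkpoint)
    for i in range(n_steps):
        state = step(state)
        # Sanity check after every step, similar in spirit to the NaN check.
        if any(math.isnan(x) for x in state):
            raise NaNError(f"Particle coordinate is NaN at step {i}")
    return state


def resume_with_retries(checkpoint, n_steps):
    # Each attempt restarts from the SAME checkpoint. If the bad value is
    # already stored in the checkpoint, every attempt fails the same way.
    for _attempt in range(MAX_RESUME_ATTEMPTS):
        try:
            return run_from_checkpoint(checkpoint, n_steps)
        except NaNError:
            continue
    raise RuntimeError(
        "Max number of attempts to resume from last checkpoint reached. Aborting."
    )
```

With a clean checkpoint such as `[0.0, 2.0]` the run finishes normally. With `[0.0, float("nan")]` both attempts hit the NaN check and the work unit is aborted, which matches what the log shows.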