[Solved, kinda] Strange crash/reboot and CMOS corruption only with F@H

MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by MeeLee »

1- It might still be a PSU issue. It could be a bad capacitor or voltage regulator, or perhaps some dust or a broken cable lead.
2- I would start by setting your CPU to ECO mode. Then limit GPU power, either with a tool or through your PC's configuration, and see if it runs fine.
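
For the GPU side, here is a minimal sketch of one way to cap power. It assumes an NVIDIA card with the `nvidia-smi` tool on the PATH (the thread does not say which GPU is in use). The `-pl` option sets a power limit in watts and normally needs admin rights. The helper names and the 200 W value are placeholders, not recommendations.

```python
import shutil
import subprocess


def power_limit_command(watts: int, gpu_index: int = 0) -> list[str]:
    """Build the nvidia-smi call that caps one GPU's power draw (hypothetical helper)."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts: int, gpu_index: int = 0) -> None:
    """Run the command; assumes an NVIDIA driver is installed and admin rights."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found; this sketch assumes an NVIDIA GPU")
    subprocess.run(power_limit_command(watts, gpu_index), check=True)


# Only print the command here; 200 W is a placeholder value.
print(" ".join(power_limit_command(200)))
```

The card only accepts values inside the range its driver reports, so check that range before picking a limit.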
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by toTOW »

Did you try heavily underclocking the CPU?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Marius
Posts: 34
Joined: Thu Nov 04, 2021 3:08 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by Marius »

@FrankMB
Yes, my experience has been very similar. The silent reboot happens only in F@H; nothing else I tried causes it. It happens whether I fold with only the GPU, only the CPU, or CPU+GPU, and it behaves the same whether I fold in Windows or Linux. Like you, I upgraded every component in my system, so I know it's not hardware. I also got the largest liquid AIO I could find, with three 140 mm fans. That doesn't make any difference either, but it does give very low temps on everything else. I hope this starts getting noticed by the devs, since it's not affecting just me. As a software / firmware engineer with 38 years of experience, I know a software bug when I see one.
Marius
Posts: 34
Joined: Thu Nov 04, 2021 3:08 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by Marius »

@MeeLee, @toTOW
It's not a power supply issue. I had an EVGA SuperNOVA 1600 P2, and I upgraded it to the most powerful PSU I could find, the Corsair AX1600i. No difference.
It's not a CPU issue. I had an AMD Threadripper 1950X and replaced it with a 5950X. No go.
Let me say this again: the silent reset only happens when using F@H. If you follow the thread, you will see I tried many different stress tests for several days without problems.
This is a software issue.
FrankMB
Posts: 7
Joined: Sat Mar 12, 2022 12:38 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by FrankMB »

MeeLee wrote: 1- It might still be a PSU issue. It could be a bad capacitor or voltage regulator, or perhaps some dust or a broken cable lead.
2- I would start by setting your CPU to ECO mode. Then limit GPU power, either with a tool or through your PC's configuration, and see if it runs fine.
@MeeLee
Thank you for the Eco Mode suggestion!
I just found where the Eco Mode setting is in the BIOS menu of my Gigabyte X570 Aorus Master (under the AMD Overclocking menu). See this YouTube video for the BIOS setup (https://youtu.be/bOv7YCwzNk8?t=210)
I am running FAH in Eco Mode right now (CPU + GPU). I can already say that it severely limits power to the CPU (85 W vs. 140 W at stock settings). It also decreases my performance (PPD) quite dramatically: I am seeing a 40% decrease in PPD. So I would not consider it a definitive solution, but it is a really good test of motherboard stability.
I will report back with stability results.
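
As a rough check on those Eco Mode numbers, the arithmetic can be sketched in a few lines of Python. It uses only the figures quoted in the post (140 W stock, 85 W Eco, 40% PPD drop). One loud assumption: the PPD figure is treated as if it came from the CPU alone, although the post says CPU + GPU were both folding, so the result is only indicative.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


stock_watts, eco_watts = 140.0, 85.0  # CPU power figures quoted in the post
ppd_change = -0.40                    # "40% decrease in PPD" from the post

power_change = relative_change(stock_watts, eco_watts)

# PPD per watt in Eco Mode relative to stock (1.0 would mean unchanged efficiency).
efficiency_ratio = (1 + ppd_change) / (eco_watts / stock_watts)

print(f"CPU power change: {power_change:.1%}")
print(f"PPD per watt, Eco vs stock: {efficiency_ratio:.2f}x")
```

Under that assumption, power drops by about 39% for a 40% loss in PPD, so PPD per watt is roughly unchanged. That fits using Eco Mode as a stability test here, not as an efficiency gain.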
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by MeeLee »

Eco mode lowers the CPU's power limit to a 65 W TDP (about 88 W of actual package power), which results in lower boost clock speeds or, quite often, the CPU running at its rated base speed.
If it runs fine, you can fine-tune the ECO setting and gradually increase it.
Sometimes there's a threshold where you get max PPD while consuming the least amount of power (and preferably keeping the CPU below 75-80 °C).
The Eco setting is a quick preset that you can modify to your liking. Likewise, you can run in normal (non-eco) mode and throttle down the CPU's power (watts) by adjusting the current (amps) and other limits in the BIOS.
You can end up with a PC that won't boot if you set the values too low, so be careful with it!
Never stray too far below the ECO values, or too high above the max settings.
I think I messed up a mobo one time by trying out a 45 W eco setting on a Ryzen 3900X, and it stopped booting.
Without sufficient power, the CPU can't load the BIOS values and won't boot anymore...
Just tread with care!
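
To put numbers on the TDP settings, here is a small sketch of how the rated TDP relates to the power the socket is actually allowed to draw (PPT). It assumes the usual AM4 default of PPT = 1.35 x TDP; individual boards and BIOS versions may apply different limits, so treat the factor and the function name as assumptions.

```python
PPT_FACTOR = 1.35  # assumed AM4 default ratio between socket power limit (PPT) and TDP


def ppt_watts(tdp_watts: float) -> int:
    """Approximate package power limit in watts for a given TDP setting."""
    return round(tdp_watts * PPT_FACTOR)


for tdp in (65, 105):  # Eco mode TDP and stock TDP of a 105 W part such as the 5950X
    print(f"TDP {tdp} W -> PPT about {ppt_watts(tdp)} W")
```

This gives about 88 W for the 65 W Eco setting and about 142 W at stock, in line with the 85 W and 140 W readings FrankMB reported.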
FrankMB
Posts: 7
Joined: Sat Mar 12, 2022 12:38 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by FrankMB »

OK, here is my update on the crashes of my AMD 5950X on the Gigabyte X570 Aorus Master.
(See my previous post for my first 10 attempts: viewtopic.php?f=38&t=37535&start=30#p355460)

11) Tried Eco Mode: Failed. It took longer to crash, but it eventually crashed anyway. I cannot say whether the longer duration was caused by decreased folding performance or a small increase in stability.
12) Disabled multithreading (SMT): Failed. I wondered if multithreading (Hyper-Threading in Intel language) could cause the instability, but disabling it did not increase stability.

So I am back to square one. Any more ideas? I would like to test my CPU with other software. If I can replicate the crash with other software, it would confirm that I have a hardware issue with my motherboard or CPU (since every other component has been changed without success). What stress test software and settings would you recommend?

Thank you! Frank
Marius
Posts: 34
Joined: Thu Nov 04, 2021 3:08 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by Marius »

@FrankMB

I tried all that too, without success. For now, I am running BOINC with the GPUGrid, Rosetta@home and MilkyWay@home projects. Of those, the GPUGrid project puts the heaviest load on the GPU, and its power consumption is pretty similar to what I've seen in F@H. I have been running it for several months now, and I have not yet had any problems with BOINC and the projects mentioned above.
Another benchmark I have used to determine whether I had power delivery problems was FurMark. It puts about the same load on the GPU as F@H. I tested it in the past and left it running for several days without problems.
The only software where I see the silent reset issue is F@H.
FrankMB
Posts: 7
Joined: Sat Mar 12, 2022 12:38 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by FrankMB »

@Marius

New information: I was able to recreate the bug with Linpack Extreme at maximum settings! After many hours the PC froze solid. I was unable to restart it with the reset button or by holding the power button for many seconds, so I had to cut power to the PSU to restart the PC. After the restart, the BIOS was corrupted and automatically reset to default values. So Linpack Extreme in stress test mode at max settings, with 10 GB (the max supported) and 10000 cycles, is demanding enough to recreate the bug. That means my bug is hardware related and not a software bug in Folding@Home. Folding is simply really, really hard on the CPU.

Marius, you could try Linpack Extreme with max settings if you want to verify your system stability. On my side, every other stress test passed flawlessly.

For my part, I am contacting Gigabyte for a replacement motherboard. I will report back as soon as I can test the new motherboard. It may take a few weeks. Of course, I could be unlucky and simply have a bad CPU.
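
For crashes that take many hours to show up, a small heartbeat logger can help pin down when the machine went down, since a hard freeze or silent reboot leaves no error message. This is a minimal sketch in Python, not tied to F@H or Linpack. The function names, file name and interval are placeholders.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional


def write_heartbeat(path: str, note: str = "") -> str:
    """Append one timestamped line and force it to disk before returning."""
    line = f"{datetime.now(timezone.utc).isoformat()} {note}".rstrip() + "\n"
    with open(path, "a", encoding="utf-8") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())  # ask the OS to write the line out now
    return line


def run(path: str, interval_s: float = 10.0, beats: Optional[int] = None) -> None:
    """Write a heartbeat every interval_s seconds; beats=None runs until killed."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, note=f"beat={count}")
        count += 1
        time.sleep(interval_s)
```

Calling `run("heartbeat.log")` in a separate terminal while the stress test runs leaves a file whose last line gives the approximate time of the freeze, to within one interval.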
MeeLee
Posts: 1339
Joined: Tue Feb 19, 2019 10:16 pm

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by MeeLee »

Most of the crashes happen if you split PCIe power cables. I remember that on RTX GPUs I had to join 2x 6-pin PCIe connectors into a 1x 8-pin plug that would fit the GPU.
Splitting one 6- or 8-pin cable across 2 GPUs under full load will definitely result in failure.

The same perhaps applies to the CPU power connector.
What was the max registered CPU temperature?
Also, did you recently swap out the thermal paste on the CPU?
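
The point about splitting PCIe power cables can be sketched as a rough power budget. The figures are the nominal PCIe limits (75 W from the slot, 75 W per 6-pin, 150 W per 8-pin). The `cable_share` parameter is a simplification introduced here for illustration, not a measured property of any specific cable or PSU.

```python
# Nominal PCIe power limits in watts.
SLOT_W = 75
CONNECTOR_W = {"6pin": 75, "8pin": 150}


def gpu_budget(connectors: list[str], cable_share: float = 1.0) -> float:
    """Nominal power available to one GPU.

    cable_share is the fraction of each cable's rating this GPU gets,
    e.g. 0.5 when one cable is split across two cards (an assumed even split).
    """
    return SLOT_W + cable_share * sum(CONNECTOR_W[c] for c in connectors)


dedicated = gpu_budget(["8pin"])                # one 8-pin cable per GPU
shared = gpu_budget(["8pin"], cable_share=0.5)  # one 8-pin cable split over two GPUs

print(f"dedicated cable: {dedicated:.0f} W, split cable: {shared:.0f} W per GPU")
```

With a dedicated 8-pin cable the nominal budget is 225 W per card. With the cable split it falls to 150 W per card, which a GPU under full folding load can exceed.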
FrankMB
Posts: 7
Joined: Sat Mar 12, 2022 12:38 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by FrankMB »

@MeeLee
Max CPU temp: 70 °C.
New CPU thermal paste, with a Noctua NH-D15 CPU cooler.
My PSU has only one power rail, so a split rail should not be an issue.
Note that the issue also happened without GPU folding (CPU only) in Eco mode (65 W TDP, 50 °C) and with a new PSU. If it is a power issue, it is related to a bad component on the motherboard itself (bad VRM?).
Marius
Posts: 34
Joined: Thu Nov 04, 2021 3:08 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by Marius »

@FrankMB

I haven't tried Linpack yet, but the symptoms you describe are similar to what I get with F@H. My motherboard is a Gigabyte X570S Aorus Master. What is yours?
FrankMB
Posts: 7
Joined: Sat Mar 12, 2022 12:38 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by FrankMB »

@Marius
My motherboard is almost the same, just the previous version: the X570 Aorus Master V1.2 (not the X570S version).
Marius
Posts: 34
Joined: Thu Nov 04, 2021 3:08 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by Marius »

@FrankMB
Thanks for letting me know. I will set up Linpack and report back later.
Marius
Posts: 34
Joined: Thu Nov 04, 2021 3:08 am

Re: Strange crash/reboot and CMOS corruption only with F@H

Post by Marius »

@FrankMB

I have been running Linpack Extreme in Linux at max settings for 24 hours now, and there have been no problems so far. I'm tempted to leave it running for the weekend, just in case. I will update this post then.