GPU crashing at 99.9%

If you're new to FAH and need help getting started or you have very basic questions, start here.

Moderators: Site Moderators, FAHC Science Team

maxisme
Posts: 1
Joined: Mon Mar 16, 2020 10:00 am

GPU crashing at 99.9%

Post by maxisme »

Hello,

I am currently folding on my GPU, but for some reason it crashes at 99.9% of the job.

When running

nvidia-smi

I get:

Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU

My Estimated Points also just drop slowly.
Joe_H
Site Admin
Posts: 8002
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: GPU crashing at 99.9%

Post by Joe_H »

That sounds like your driver crashed. That could be caused by your GPU overheating, by overclocking, or by something else. After restarting, the WU should start up from the last checkpoint.
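To check whether heat or clocks are a factor, one option is to log the GPU temperature and SM clock while a WU is running. The following is only a rough sketch: the helper names (parse_readings, read_gpu) are made up for illustration, and it assumes nvidia-smi is on the PATH and prints one "temperature, clock" line per GPU for this query.

```python
import subprocess

# Query temperature (deg C) and SM clock (MHz) as bare CSV, one line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=temperature.gpu,clocks.sm",
         "--format=csv,noheader,nounits"]

def parse_readings(text):
    """Parse lines like '72, 1845' into (temp_c, sm_clock_mhz) tuples.

    Lines that are not two integers (for example a 'GPU is lost'
    error message) are skipped.
    """
    readings = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            readings.append((int(parts[0]), int(parts[1])))
    return readings

def read_gpu():
    """Run nvidia-smi once; return [] if it is missing, hangs, or fails."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, timeout=10)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []
    return parse_readings(out.stdout)

# Illustrative sample only, not real output from the poster's machine.
sample = "72, 1845\n"
print(parse_readings(sample))
```

Calling read_gpu() every few seconds and noting the last reading before it starts returning an empty list would show how hot the card was when the driver dropped out.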

The progress continuing to climb toward 99.99% is a bug: the client does not detect the driver crash and keeps estimating based on prior progress. The actual log entries will show that progress stopped at some point.
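If the log is long, a small script can pull out the last progress entry. This is a minimal sketch that assumes progress lines look roughly like "HH:MM:SS:WU00:FS01:0x22:Completed N out of M steps (P%)"; the sample lines below are invented for illustration, and last_progress is a hypothetical helper, not part of the client.

```python
import re

# Assumed shape of a progress line (adjust if your log differs):
#   10:05:00:WU00:FS01:0x22:Completed 710000 out of 1000000 steps (71%)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):.*Completed (\d+) out of (\d+) steps"
)

def last_progress(log_text):
    """Return (timestamp, done, total) from the last progress line, or None."""
    last = None
    for line in log_text.splitlines():
        m = PROGRESS.match(line)
        if m:
            last = (m.group(1), int(m.group(2)), int(m.group(3)))
    return last

# Invented sample log, for illustration only.
sample = """\
10:00:00:WU00:FS01:0x22:Completed 700000 out of 1000000 steps (70%)
10:05:00:WU00:FS01:0x22:Completed 710000 out of 1000000 steps (71%)
10:09:12:WU00:FS01:Some unrelated message
"""

result = last_progress(sample)
if result:
    ts, done, total = result
    print(f"Last logged progress at {ts}: {done}/{total} steps")
```

If the timestamp of the last progress line is much older than the current time while the client still shows a rising percentage, the displayed progress is only the estimate described above.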
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU crashing at 99.9%

Post by bruce »

See below to post FAH's log

(99.99% is most likely a sign of another problem.)