GPU error.
Posted: Tue Mar 19, 2019 8:24 pm
by MeeLee
Is there a way to implement some sort of reset button for the client in FAHControl?
When FAH is configured for a certain number of graphics cards, and one fails to load or is removed for whatever reason, FAHControl gives an error, and it's impossible to access the slots list to remove one GPU slot.
I spent about 20 minutes trying, without success, to find the post explaining how to reset/reconfigure the client.
I even reinstalled the client and control.
I believe that would do it in Windows, but not in Linux.
In the end, I had to plug in a spare graphics card I had lying around, just to access the slots list and remove a slot.
It would be better if the slots list could be accessed even without internet access, or when something like an RMA of a card happens. Not everyone has an extra card lying around.
Re: GPU error.
Posted: Wed Mar 20, 2019 12:20 am
by bruce
Please post your log showing the error. I've removed a slot and don't remember an error.
The only way to change the hardware or to add a GPU is to reinstall FAHClient. That's the only time it detects which GPU(s) you have and creates the slots to match their characteristics.
Re: GPU error.
Posted: Wed Mar 20, 2019 2:49 am
by MeeLee
The issue is not removing a slot in software.
The issue is when a graphics card is removed from a PCIe slot without adjusting the software, and the software is then restarted.
If a hardware crash happens and a card needs to be removed, Linux users who forgot (or were unable) to remove the GPU slot BEFORE rebooting will have a tough time getting FAH to work again, short of finding a way to reset it.
Sometimes a hardware crash causes an automatic system reboot, or a full crash, which forces the user to restart FAH once the faulty hardware has been dealt with.
When the system reboots, FAH gets stuck at startup, showing a message along the lines of:
"...client "local" 127.0.0.1
Option 'gpu-index' has no default and is not set.." and something more...
Re: GPU error.
Posted: Wed Mar 20, 2019 5:27 am
by bruce
I don't understand why you'd need to remove the GPU. If rebooting resets it, you have no problem.
If rebooting cannot get the GPU running again, you're suggesting it's trash or RMA time. In either case, FAH cannot proceed so you might as well disable it or uninstall it until you have a working GPU again. Recovering the in-process WU is not possible unless you replace the GPU with an identical GPU that works.
There must be something I'm missing.
Re: GPU error.
Posted: Thu Mar 21, 2019 3:10 am
by MeeLee
Well,
Nvidia sees 4 cards.
System is shut down.
1 card is removed.
Now Nvidia sees 3 cards.
The OS and everything else have no problem with there being only 3 cards.
Except FAH, which seems to still be looking for the 4th card.
The settings can't be accessed.
Neither rebooting nor reinstalling helps.
Someone on this forum recently posted how to reset the client in Linux (by typing something in the terminal, after which I could enter my username, passkey, and team again), and FAHControl would restart with the stock configuration (1 CPU slot).
Then it was just a matter of adding the GPU slots again.
I don't remember what that command was anymore. I should have written it down when I saw it.
Re: GPU error.
Posted: Thu Mar 21, 2019 11:26 am
by des1957
I have experienced this error in both Windows and Linux. The folding software itself is folding fine; the error is in the control panel software. The only way I have found to repair this is to manually remove the slot from the config file. If you have 4 GPUs and physically remove one, FAHControl will still have 4 GPUs listed in the config file. You need to edit the file, remove the last GPU, save the file, and then restart FAH or reboot your system. I run 3 rigs with 4 GPUs each and have run into this issue many times. Good luck.
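The manual fix described above can be sketched in Python. This is a minimal sketch, not an official FAH tool: it assumes (hypothetically) that config.xml has a root `<config>` element whose GPU slots appear as `<slot ... type='GPU'>` children, and the helper names `remove_last_gpu_slot` and `edit_config_file` are made up for illustration. Stop FAHClient before editing, and check that the slot being removed really corresponds to the card that was pulled.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def remove_last_gpu_slot(config_xml: str) -> str:
    """Return config_xml with the last GPU <slot> element removed.

    ASSUMPTION: slots are direct <slot> children of the root element,
    and GPU slots carry a type attribute equal to 'GPU'.
    """
    root = ET.fromstring(config_xml)
    # Collect GPU slots in document order (case-insensitive type match).
    gpu_slots = [s for s in root.findall("slot")
                 if s.get("type", "").upper() == "GPU"]
    if not gpu_slots:
        raise ValueError("no GPU slot found in config")
    # "Remove the last GPU", as described in the post above.
    root.remove(gpu_slots[-1])
    return ET.tostring(root, encoding="unicode")


def edit_config_file(path: str) -> None:
    """Back up the file to <name>.bak, then drop its last GPU slot in place."""
    p = Path(path)
    shutil.copy2(p, p.with_name(p.name + ".bak"))  # keep the original
    p.write_text(remove_last_gpu_slot(p.read_text()))


if __name__ == "__main__":
    # Demo on an in-memory sample; this does not touch any real file.
    sample = """<config>
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>
  <slot id='2' type='GPU'/>
</config>"""
    print(remove_last_gpu_slot(sample))
```

ElementTree's default parser drops XML comments, so any comments in the original file are lost in the rewritten copy; the `.bak` file keeps the original untouched.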
Re: GPU error.
Posted: Thu Mar 21, 2019 3:45 pm
by foldy
Yes, I can reproduce the issue on Windows 7. This is a FAHControl bug:
https://github.com/FoldingAtHome/fah-issues/issues/1274
Re: GPU error.
Posted: Fri Mar 22, 2019 5:55 am
by MeeLee
@foldy,
In Windows, is the issue resolved by reinstalling the software?
I ask because in Linux, the configuration file stays behind somewhere.
Re: GPU error.
Posted: Fri Mar 22, 2019 6:48 pm
by bruce
Allowing the installer to re-create a configuration with slot(s) that match the hardware is a straightforward way of dealing with the problem.
We do not recommend manually editing config.xml but it's possible to delete the offending slot, replacing it with a slot configured for whatever supportable hardware is on your system.
Re: GPU error.
Posted: Sat Mar 23, 2019 9:10 am
by MeeLee
It would be easier to have a simple option to start from scratch (meaning zero slots configured), or access to the configuration, even without internet.
I don't know why an internet connection is required to change the configuration and slots, or to continue running a WU that's still good to fold...
Re: GPU error.
Posted: Sat Mar 23, 2019 11:52 am
by bruce
I'm not sure why having an internet connection is a concern. Whenever you reconfigure your hardware (like removing a GPU and adding a new one), you have to re-validate that the drivers are current, and then you'll immediately need to download a new WU. The WUs are closely tied to whatever hardware is going to process them. A WU that was obtained to be processed by GPU type X will need to be returned or dumped; you can't expect to be able to continue processing it on GPU Y. An internet connection will be required.
Re: GPU error.
Posted: Sat Mar 23, 2019 12:31 pm
by foldy
On my Ubuntu Linux, the FAH config file is here: /etc/init.d/config.xml
Re: GPU error.
Posted: Sat Mar 23, 2019 2:18 pm
by bollix47
foldy wrote: "On my Linux Ubuntu the fah config file is here /etc/init.d/config.xml"
Interesting... all my Ubuntu setups show my configuration file is in
/etc/fahclient/config.xml
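Since the two posts above report different locations, a small probe can check which one exists on a given machine. This is a minimal sketch that assumes only the two paths mentioned in this thread (the real location may differ by distribution or client version); `find_fah_config` is a hypothetical helper name.

```python
from pathlib import Path

# Locations reported in this thread; neither is guaranteed for every install.
CANDIDATE_PATHS = (
    "/etc/fahclient/config.xml",
    "/etc/init.d/config.xml",
)


def find_fah_config(candidates=CANDIDATE_PATHS):
    """Return the first candidate that is an existing regular file, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_fah_config()
    print(found if found else "config.xml not found in the checked locations")
```

The candidates are checked in order, so `/etc/fahclient/config.xml` wins if both happen to exist.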