The annoying "restart" incident.
Posted: Wed May 27, 2020 3:26 am
by Ibringapples
Hello all,
Due to the shortage of GPU WUs, I have to restart the Linux FAH client.
When I do, it is quite frustrating because the service just doesn't obey me.
I have to kill the process to try to restart it. Or, when I can finally stop it, the service comes back up by itself. So I don't feel I have control over this service.
Does anyone have a good way to do it?
Thanks a lot.
Re: The annoying "restart" incident.
Posted: Wed May 27, 2020 4:24 am
by JimboPalmer
I have not noticed a shortage of GPU WUs since about May 12. I run Windows boxes, no disease specified, 3 Nvidia GPUs: two Pascals and a Turing. No Beta, no Advanced.
Is there a chance you are restricting your WUs in some way?
Here is how to post your log:
viewtopic.php?f=24&t=26036
Re: The annoying "restart" incident.
Posted: Wed May 27, 2020 9:47 am
by NRT_AntiKytherA
Ibringapples wrote:I have to kill the process to try to restart it. Or, when I can finally stop it, the service comes back up by itself. So I don't feel I have control over this service.
Does anyone have a good way to do it?
Simplest would be to restart your machine, which will signal the client to terminate gracefully, preserving any running CPU work unit to the last save point.
More complex: restart the service and FAHClient using systemd. Either way, pay attention to bruce's thoughts on this subject:
bruce wrote:
pcwolf wrote:When I become impatient waiting for WU downloads (i.e. considerable minutes/hours passing not Folding) I have found if I go to Manjaro System Settings and go to the SystemD tab, I can restart the "foldingathome.service" and when both the service and F@H Client return ... *BOOM* I immediately receive a new WU.
This behavior is consistent and repeatable. I have two GPUs Folding and the previously engaged slot goes immediately back to a checkpoint and resumes flawlessly.
You may (or may not) be guilty of biased perception. Restarting the service does initiate a fresh attempt to get work rather than waiting up to an hour for the next automatic attempt, but I know of no reason why the restart would be any more likely to succeed than if the next attempt were initiated by the timer. It would seem most likely that the client simply says to the server "I'm asking for a new work unit for my hardware ( ... description)" rather than the request being equivalent to "I'm asking again for a new work unit for my hardware ( ... description)". Why would the "again" message (if it's there) actually reduce your chances of getting a new assignment?
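For anyone who prefers the terminal to the settings GUI, the restart pcwolf describes can be sketched roughly as below. This is only an illustration: the unit name foldingathome.service is taken from the quote above and may differ on other distributions, and restart_fah_unit and the SYSTEMCTL override are hypothetical names introduced here for the example.

```shell
#!/bin/sh
# Hypothetical helper: restart the FAH systemd unit, then ask systemd
# whether the unit is active again.
# The default unit name comes from the quote above (an assumption for
# any other distro). SYSTEMCTL can be overridden for a dry run.
restart_fah_unit() {
    unit="${1:-foldingathome.service}"
    systemctl_cmd="${SYSTEMCTL:-sudo systemctl}"
    # Left unquoted on purpose so "sudo systemctl" splits into two words.
    $systemctl_cmd restart "$unit" || return 1
    $systemctl_cmd is-active "$unit"
}
```

Calling restart_fah_unit with no arguments uses the default unit name. Running it as SYSTEMCTL='echo systemctl' restart_fah_unit only prints the commands it would run, which is a cheap way to check the unit name first.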
Re: The annoying "restart" incident.
Posted: Wed May 27, 2020 1:05 pm
by Ibringapples
Hello,
You're right...
Simply put, the WUs are not running properly.
Here are the logs:
https://pastebin.com/cq0qB5Fd
Can you help me?
Thanks a lot.
---update--- 01
Now the only one that is not running is the CPU WU
But it's unstable. Suddenly 2 days
---update--- 02
Now.. all are running but I've lost 2 of my 4 CPUs.
Re: The annoying "restart" incident.
Posted: Wed May 27, 2020 1:22 pm
by Ibringapples
NRT_AntiKytherA wrote:I have to kill the process to try to restart it. Or, when I can finally stop it, the service comes back up by itself. So I don't feel I have control over this service.
Does anyone have a good way to do it?
Simplest would be to restart your machine, which will signal the client to terminate gracefully, preserving any running CPU work unit to the last save point.
But rebooting the machine could be a problem because I have other services running on it.
So maybe with systemd? I have both OpenRC and systemd, but I prefer OpenRC...
Thanks.
Re: The annoying "restart" incident.
Posted: Sun May 31, 2020 2:15 am
by bruce
Ibringapples wrote:Now the only one that is not running is the CPU WU
But it's unstable. Suddenly 2 days
Now.. all are running but I've lost 2 of my 4 CPUs.
Each GPU requires one CPU thread to send and receive data between main RAM and the GPU. With 2 GPUs and 4 CPUs you can fold with the remaining two.
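The arithmetic behind that can be written as a tiny shell function. This is just a sketch of the rule of thumb stated above (one CPU thread reserved per GPU slot); folding_cpu_threads is a hypothetical name, not part of the FAH client.

```shell
#!/bin/sh
# Hypothetical helper: CPU threads left for CPU folding once each GPU
# slot has reserved one thread (rule of thumb from the post above).
folding_cpu_threads() {
    total_threads="$1"
    gpu_slots="$2"
    remaining=$((total_threads - gpu_slots))
    # Never report a negative thread count.
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}

folding_cpu_threads 4 2   # prints 2
```

On a live machine you could pass "$(nproc)" as the first argument instead of a hard-coded 4.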
Re: The annoying "restart" incident.
Posted: Sun May 31, 2020 5:40 am
by MeeLee
You can safely run a script to use systemd to restart the service.
You can also use ssh to start it remotely.
Supposedly fahcontrol has a way to connect to a remote client.
Re: The annoying "restart" incident.
Posted: Mon Jun 01, 2020 3:40 pm
by Ibringapples
MeeLee wrote:You can safely run a script to use systemd to restart the service.
You can also use ssh to start it remotely.
Supposedly fahcontrol has a way to connect to a remote client.
Actually no...
~$ sudo /etc/init.d/FAHClient restart
Stopping fahclient ... OK
Starting fahclient ... FAIL
That's the awful issue here ...
Re: The annoying "restart" incident.
Posted: Mon Jun 01, 2020 10:15 pm
by bruce
If you're running one or more CPU based slots (FAHCore_a7) that's not true.
MeeLee wrote:You can safely run a script to use systemd to restart the service.
Unfortunately, there's a bug in FAHCore_a7 which fails to sync its open files before shutting down. You have to pause all CPU slots and give them time to close their files.
Re: The annoying "restart" incident.
Posted: Thu Jun 04, 2020 12:32 am
by MeeLee
I was under the assumption that you needed to use systemd for restarts.
Or perhaps try fahclient stop and, on another line, fahclient start.
Re: The annoying "restart" incident.
Posted: Fri Jun 05, 2020 4:09 am
by bruce
When I PAUSE a FAHCore_a7 WU, I can watch its process for a bit before it reports that it has completed the stopping process. I have not evaluated whether that time varies with the project, but I'd guess that it might. You need to allow at least that long before restarting, whether or not you use systemd. I have not heard if the bug will be fixed in the next version of the FAHCore, but I sure hope so.
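Rather than guessing how long that takes, a script could poll until the core process has actually gone. The following is a minimal sketch, assuming pgrep is installed and that the core shows up under a process name matching FAHCore_a7 (matched case-insensitively, since the exact capitalization on disk is an assumption here). wait_for_exit is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process whose name matches NAME
# remains, or give up after TIMEOUT seconds (default 60).
# Returns 0 once the process is gone, 1 on timeout.
wait_for_exit() {
    name="$1"
    timeout="${2:-60}"
    waited=0
    # pgrep -i matches the process name case-insensitively.
    while pgrep -i "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A restart script might pause the CPU slots first (for example from FAHControl, or with FAHClient --send-pause if your client build supports that option), then run wait_for_exit FAHCore_a7 120 and only restart the service if it returns success.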
Re: The annoying "restart" incident.
Posted: Fri Jun 05, 2020 6:38 pm
by MeeLee
I never had any issues on my system using the 'restart' function in the terminal.
However, you could use the 'sleep' command to pause the script for a set number of seconds before moving on to the next command.
E.g.:
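# Script name capitalized as in the earlier post in this thread; Linux paths are case-sensitive.
sudo /etc/init.d/FAHClient stop
# Give the cores time to close their files; 5 seconds may be too short for FAHCore_a7.
sleep 5
sudo /etc/init.d/FAHClient start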