The annoying "restart" incident.
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 42
- Joined: Fri Apr 10, 2020 3:53 am
The annoying "restart" incident.
Hello all,
Due to the shortage of GPU WUs I have to restart the Linux FAH client.
When I do, it is quite frustrating because the service just doesn't obey me.
I have to kill the process to try to restart it. Or, when I finally manage to stop it, the service comes back up by itself. So I don't feel I have control over this service.
Does anyone have a good way to do it?
Thanks a lot.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: The annoying "restart" incident.
I have not noticed a shortage of GPU WUs since about May 12. I run Windows boxes, no disease specified, 3 Nvidia GPUs: two Pascals and a Turing. No Beta, no Advanced.
Is there a chance you are restricting your WUs in some way?
here is how to post your log
viewtopic.php?f=24&t=26036
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
-
- Posts: 107
- Joined: Sun May 10, 2020 11:50 pm
Re: The annoying "restart" incident.
Ibringapples wrote:I have to kill the process to try to restart it. Or, when I finally manage to stop it, the service comes back up by itself. So I don't feel I have control over this service.
Does anyone have a good way to do it?
Simplest would be to restart your machine, which will signal the client to terminate gracefully, preserving any running CPU work unit to the last save point.
More complex: restart the service and FAHClient using systemd. Either way, pay attention to bruce's thoughts on this subject:
pcwolf wrote:When I become impatient waiting for WU downloads (i.e. considerable minutes/hours passing not Folding) I have found if I go to Manjaro System Settings and go to the SystemD tab, I can restart the "foldingathome.service" and when both the service and F@H Client return ... *BOOM* I immediately receive a new WU. This behavior is consistent and repeatable. I have two GPUs Folding and the previously engaged slot goes immediately back to a checkpoint and resumes flawlessly.
bruce wrote:You may (or may not) be guilty of biased perception. Restarting the service does initiate a fresh attempt to get work rather than waiting up to an hour for the next automatic attempt, but I know of no reason why the restart would be any more likely to succeed than if the next attempt was initiated by the timer. It would seem most likely that the client simply says to the server "I'm asking for a new work unit for my hardware ( ... description)" rather than the request being equivalent to "I'm asking again for a new work unit for my hardware ( ... description)". Why would the "again" message (if it's there) actually reduce your chances of getting a new assignment?
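For anyone who wants to script the systemd restart discussed above, here is a minimal sketch. The unit name is taken from pcwolf's quote and may differ on your install; `restart_unit`, `UNIT` and `SYSTEMCTL` are hypothetical names used only for this illustration. It needs root privileges, and per bruce's point it only triggers a fresh work request, it does not make an assignment more likely.

```shell
#!/bin/sh
# Sketch only: restart a systemd unit and report whether it came back up.
# UNIT is an assumption (the name quoted in this thread); adjust as needed.
UNIT=${UNIT:-foldingathome.service}
# SYSTEMCTL can be overridden, e.g. to try the function without systemd.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

restart_unit() {
    # Ask systemd to stop and start the unit; give up if that fails.
    "$SYSTEMCTL" restart "$UNIT" || return 1
    # Exit status 0 only if the unit is active after the restart.
    "$SYSTEMCTL" is-active --quiet "$UNIT"
}
```

Calling `restart_unit` from a root shell returns success only when the unit reports active afterwards, which makes a failed restart visible to the calling script.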
-
- Posts: 42
- Joined: Fri Apr 10, 2020 3:53 am
Re: The annoying "restart" incident.
Hello,
You're right...
The WUs are simply not running properly.
Here are the logs:
https://pastebin.com/cq0qB5Fd
Can you help me?
Thanks a lot.
---update--- 01
Now the only one that is not running is the CPU WU
But it's unstable. Suddenly 2 days
---update--- 02
Now all are running, but I've lost 2 of the 4 CPUs I have.
Code:
~$ nproc --all
4
Last edited by Ibringapples on Wed May 27, 2020 2:01 pm, edited 2 times in total.
-
- Posts: 42
- Joined: Fri Apr 10, 2020 3:53 am
Re: The annoying "restart" incident.
NRT_AntiKytherA wrote:Simplest would be to restart your machine, which will signal the client to terminate gracefully, preserving any running CPU work unit to the last save point.
But rebooting the machine could be a problem because I have other services running on it.
Then, maybe with systemd? I have OpenRC and systemd but I prefer OpenRC...
Thanks.
Re: The annoying "restart" incident.
Ibringapples wrote:Now the only one that is not running is the CPU WU
But it's unstable. Suddenly 2 days
Ibringapples wrote:Now.. all are running but I've lost 2 CPU from the 4 ones I have.
Each GPU requires one CPU thread to send and receive data between main RAM and the GPU. With 2 GPUs and 4 CPUs you can fold with the remaining two.
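The arithmetic above can be sketched in a few lines of shell. This assumes exactly one reserved thread per GPU slot, as described in this post; `cpu_threads_for_folding` is a hypothetical helper name, not part of the FAH client.

```shell
#!/bin/sh
# Threads left for CPU folding once each GPU slot has reserved one thread.
# Arguments: total CPU threads, number of GPU slots (plain integers).
cpu_threads_for_folding() {
    total=$1
    gpus=$2
    left=$((total - gpus))
    # Never report a negative thread count.
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# Numbers from this thread: 4 threads and 2 GPUs leave 2 for the CPU slot.
cpu_threads_for_folding 4 2
```

On a live machine the first argument could come from `nproc --all`, for example `cpu_threads_for_folding "$(nproc --all)" 2`.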
Posting FAH's log:
How to provide enough info to get helpful support.
Re: The annoying "restart" incident.
You can safely run a script to use systemd to restart the service.
You can also use ssh to start it remotely.
Supposedly fahcontrol has a way to connect to a remote client.
-
- Posts: 42
- Joined: Fri Apr 10, 2020 3:53 am
Re: The annoying "restart" incident.
MeeLee wrote:You can safely run a script to use systemd to restart the service.
You can also use ssh to start it remotely.
Supposedly fahcontrol has a way to connect to a remote client.
Actually no...
Code:
~$ sudo /etc/init.d/FAHClient restart
Stopping fahclient ... OK
Starting fahclient ... FAIL
Re: The annoying "restart" incident.
MeeLee wrote:You can safely run a script to use systemd to restart the service.
If you're running one or more CPU-based slots (FAHCore_a7), that's not true.
Unfortunately, there's a bug in FAHCore_a7: it fails to sync its open files before shutting down. You have to pause all CPU slots and give them time to close their files.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: The annoying "restart" incident.
I was under the assumption that you needed to use systemd for restarts.
Or, perhaps try Fahclient stop, and on another line fahclient start.
Re: The annoying "restart" incident.
When I PAUSE a FAHCore_a7 WU, I can watch its process for a bit before it reports that it has completed the stopping process. I have not evaluated whether that time varies with the project, but I'd guess that it might. You need to allow at least that long before restarting, whether or not you use systemd. I have not heard whether the bug will be fixed in the next version of the FAHCore, but I sure hope so.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: The annoying "restart" incident.
I never had any issues on my system using the 'restart' function in terminal.
However, you could use the 'sleep' command to pause the script for a set number of seconds before going on to the next command.
E.g.:
Code:
sudo /etc/init.d/fahclient stop
sleep 5
sudo /etc/init.d/fahclient start
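Since the time a core needs to close its files may vary, a fixed `sleep 5` might be too short. A possible variation is to poll until the core processes are gone, with a timeout. This is only a sketch: `wait_for_exit`, `cores_running`, `PATTERN` and `TIMEOUT` are hypothetical names, and the `FahCore` pattern is an assumption about how the core processes are named on your system.

```shell
#!/bin/sh
# Sketch only: wait until no folding core process is left, up to TIMEOUT seconds.
PATTERN=${PATTERN:-FahCore}   # assumed process-name pattern, matched case-insensitively
TIMEOUT=${TIMEOUT:-60}        # illustrative default, in seconds

# Succeeds while at least one process name matches PATTERN.
cores_running() {
    pgrep -i "$PATTERN" >/dev/null 2>&1
}

wait_for_exit() {
    waited=0
    while cores_running; do
        if [ "$waited" -ge "$TIMEOUT" ]; then
            return 1    # cores still running after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0            # no matching process left
}
```

In the script above, the `sleep 5` line could then be replaced with `wait_for_exit`, so that the start command only runs once the cores have exited. Pausing the CPU slots first, as bruce advises, still applies.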