Multiple client stalled due to "database locked"

Posted: Fri Sep 27, 2019 4:47 am
by markfw
I have had my entire farm off-line most of the day, until I noticed that all the clients were stalled due to "database locked". After rebooting them (an all-day job) I was able to get them all working. But my question is: why did this happen, and will it happen again in the future? These were all Linux clients, and stopping and starting the client using

sudo service FAHClient stop
sudo service FAHClient start

did not clear the error; only a reboot did. Oh, and I did have one Windows client, and I also had to reboot it to get it to work.

Re: Multiple client stalled due to "database locked"

Posted: Fri Sep 27, 2019 5:52 pm
by bruce
Without seeing your recent logs, I can only guess.

The most common cause of "database locked" is starting a second copy of FAHClient while one is already running. Ordinarily FAHClient runs as a service, and two copies can't work at the same time.
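
One way to check for a second copy on a Linux host is to count the processes named FAHClient. The following is only a minimal sketch: the helper name count_instances is made up for illustration, and it assumes the client's process name is exactly FAHClient, as in the service commands quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many process names read from stdin
# exactly match the given name (one process name per line).
count_instances() {
    local name="$1"
    # -x: match the whole line, -F: treat the name as a literal string,
    # -c: print the number of matching lines.
    # grep exits with status 1 when the count is 0, so mask that status.
    grep -c -x -F -- "$name" || true
}

# Usage on a live host (assumes the process is named FAHClient):
#   ps -e -o comm= | count_instances FAHClient
# A result above 1 suggests more than one client process is running.
```

The FahCore worker processes started by the client have different names, so the exact match does not count them.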

Re: Multiple client stalled due to "database locked"

Posted: Fri Sep 27, 2019 6:56 pm
by markfw
Well, I restarted all my hosts, so the logs are gone, but NO, these 14 different hosts running Linux have all been running 24/7 for months, so only one instance was running. It just all of a sudden affected all my hosts, so I think something happened to the system.

Re: Multiple client stalled due to "database locked"

Posted: Fri Sep 27, 2019 8:17 pm
by bruce
One instance of FAHClient per host, right? Each host has its own database, which is locked by the FAHClient running on that host. So which database was locked, and what happened on that particular host?

Re: Multiple client stalled due to "database locked"

Posted: Fri Sep 27, 2019 9:54 pm
by Joe_H
markfw wrote: Well, I restarted all my hosts, so the logs are gone, ...
The client keeps copies of the last 16 log files by default, so unless you deleted them they should still be on your systems. The link in Bruce's sig includes directions on how to locate the log files depending on which OS you are running.
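
Once the log directory is located, a recursive grep can list which of the saved logs mention the error. This is a rough sketch under assumptions: the helper name find_in_logs is hypothetical, the path /var/lib/fahclient is only a guess at a typical Linux data directory, and the pattern is deliberately loose because the exact wording of the message in the log is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every .txt log file under a directory that
# matches a pattern (case-insensitive extended regular expression).
find_in_logs() {
    local dir="$1" pattern="$2"
    # -r: recurse into subdirectories, -l: print file names only,
    # -i: ignore case, -E: extended regex, --include: only *.txt files.
    # grep exits with status 1 when nothing matches, so mask that status.
    { grep -r -l -i -E --include='*.txt' -- "$pattern" "$dir" || true; } | sort
}

# Usage (the directory is an assumption; adjust it to your install):
#   find_in_logs /var/lib/fahclient 'database.*locked'
```

An empty result means none of the retained logs contain a matching line.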

Re: Multiple client stalled due to "database locked"

Posted: Sat Sep 28, 2019 4:54 am
by markfw
Well, I looked on 2 different hosts, and could not find "database locked". If I find it on some host I will reply back. Just know that a very experienced user (15 years, and number 21 overall) had this issue, and I think something happened with the host servers. All of my hosts that have been up for months could not all have had the same issue without a system problem.

Re: Multiple client stalled due to "database locked"

Posted: Sat Sep 28, 2019 8:50 am
by HaloJones
It sounds like you hit the server that won't provide units properly and for some reason can't be fixed.

Re: Multiple client stalled due to "database locked"

Posted: Sat Sep 28, 2019 3:44 pm
by rickoic
I've been having the same problem, and even after a reboot the download is extremely slow. I have 4 different computers running: 3 with 2 GPUs each and 1 (a laptop) with a single GPU.

Re: Multiple client stalled due to "database locked"

Posted: Sun Sep 29, 2019 5:44 am
by bruce
Which WUs are being downloaded from which work servers?

When was the last time you restarted your LAN (Router)?

Re: Multiple client stalled due to "database locked"

Posted: Tue Oct 01, 2019 2:25 pm
by rickoic
12:02:07:WU01:FS01:Connecting to 65.254.110.245:8080
12:02:07:WU01:FS01:Assigned to work server 155.247.166.220
12:02:07:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GP104 [GeForce GTX 1080] 8873 from 155.247.166.220
12:02:07:WU01:FS01:Connecting to 155.247.166.220:8080
12:02:08:WU01:FS01:Downloading 15.63MiB

And there it sat all night long.

Did a hard restart of LAN just 2-3 days ago.

Here's my log for this gpu after I rebooted.

14:33:13:WU00:FS01:Assigned to work server 128.252.203.10
14:33:14:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1080] 8873 from 128.252.203.10
14:33:14:WU00:FS01:Connecting to 128.252.203.10:8080
14:33:14:WU02:FS02:0x21: Found a checkpoint file
14:33:15:WU00:FS01:Downloading 69.98MiB
14:33:21:WU00:FS01:Download 24.74%
14:33:27:WU00:FS01:Download 50.19%
14:33:33:WU00:FS01:Download 79.93%
14:33:37:WU00:FS01:Download complete
14:33:38:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14230 run:418 clone:1 gen:86 core:0x21 unit:0x0000007080fccb0a5d654d6f4cc8e9bf
14:33:38:WU00:FS01:Starting
14:33:38:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\ricko\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 705 -lifeline 4752 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
14:33:38:WU00:FS01:Started FahCore on PID 9668
14:33:39:WU00:FS01:Core PID:9764
14:33:39:WU00:FS01:FahCore 0x21 started
14:33:42:WU00:FS01:0x21:*********************** Log Started 2019-10-01T14:33:42Z ***********************
14:33:42:WU00:FS01:0x21:Project: 14230 (Run 418, Clone 1, Gen 86)
14:33:42:WU00:FS01:0x21:Unit: 0x0000007080fccb0a5d654d6f4cc8e9bf
14:33:42:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
14:33:42:WU00:FS01:0x21:Machine: 1
14:33:42:WU00:FS01:0x21:Reading tar file core.xml
14:33:42:WU00:FS01:0x21:Reading tar file integrator.xml
14:33:42:WU00:FS01:0x21:Reading tar file state.xml

Tks

Re: Multiple client stalled due to "database locked"

Posted: Tue Oct 01, 2019 5:05 pm
by bruce
I see no messages in what you posted mentioning database locked.

I do see that your report is associated with work server 155.247.166.* which has been experiencing network congestion ... and people are actively working on that problem. (See other discussions.) Please do a better job of reporting your actual problem.
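
For reports like the one above, it can help to pull out just the downloads that started but never finished, together with the work server they were assigned to. Below is a minimal sketch, assuming log lines in the HH:MM:SS:WUnn:FSnn:message form shown in this thread; the function name stalled_downloads is made up for illustration, and a download that is simply still in progress at the end of the log will be listed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each WU/slot pair whose download started but
# has no later "Download complete" line, plus its assigned work server.
# Reads log lines from the files given as arguments, or from stdin.
stalled_downloads() {
    awk -F: '
        # Only consider lines tagged with a work unit and a folding slot.
        $4 ~ /^WU[0-9]+$/ && $5 ~ /^FS[0-9]+$/ {
            key = $4 ":" $5
            msg = $6
            if (msg ~ /^Assigned to work server /) {
                sub(/^Assigned to work server /, "", msg)
                server[key] = msg          # remember the server address
            } else if (msg ~ /^Downloading /) {
                pending[key] = 1           # download started
            } else if (msg ~ /^Download complete/) {
                delete pending[key]        # download finished
            }
        }
        END { for (k in pending) print k, server[k] }
    ' "$@" | sort
}

# Usage:
#   stalled_downloads log.txt
```

Run against the two excerpts posted earlier in the thread, this would list WU01:FS01 with 155.247.166.220 and skip WU00:FS01, whose download completed.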