Multiple client stalled due to "database locked"
I have had my entire farm offline most of the day, until I noticed that they were all stalled due to "database locked". After rebooting them (an all-day job) I was able to get them all working. But my question is: why did this happen, and will it happen again in the future? These were all Linux clients, and stopping and starting the client using
sudo service FAHClient stop
sudo service FAHClient start
did not clear the error; only a reboot did. Oh, and I did have one Windows client, which I also had to reboot to get working.
Re: Multiple client stalled due to "database locked"
Without seeing your recent logs, I can only guess.
The most common cause of "database locked" comes from starting a second copy of FAHClient (concurrently). Ordinarily FAHClient runs as a service and two copies can't work at the same time.
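A quick way to rule this out on a Linux host is to count the running FAHClient processes. The snippet below is only a rough sketch: it assumes the process is named FAHClient (as in the service name above) and that a procps-style ps is available. The helper names are made up for illustration.

```python
import subprocess


def find_fahclient_pids(ps_output, name="FAHClient"):
    """Return the PIDs whose command name equals `name`.

    `ps_output` is expected to look like the output of `ps -eo pid=,comm=`:
    one process per line, PID first, then the command name.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == name:
            pids.append(int(parts[0]))
    return pids


def current_ps_output():
    """Ask ps for every process as 'PID COMMAND' with no header (Linux only)."""
    result = subprocess.run(
        ["ps", "-eo", "pid=,comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_fahclient_pids(current_ps_output()) on a host should normally return exactly one PID. More than one would point to a second copy of the client.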
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Multiple client stalled due to "database locked"
Well, I restarted all my hosts, so the logs are gone. But no: these 14 different hosts running Linux have all been running 24/7 for months, so only one instance was running on each. It suddenly affected all my hosts at once, so I think something happened to the system.
Re: Multiple client stalled due to "database locked"
One instance of FAHClient per host, right? And each host has its own database that is locked by the FAHClient running on that host. So which database was locked, and what happened on that particular host?
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Multiple client stalled due to "database locked"
markfw wrote: Well, I restarted all my hosts, so the logs are gone, ...
The client keeps copies of the last 16 log files by default, so unless you deleted them they should still be on your systems. The link in Bruce's sig includes directions on how to locate the log files depending on which OS you are running.
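Once you have located the log directory, something like the following can search all of the retained logs for the message in one pass. It is a sketch under assumptions: the directory is whatever you pass in (the actual location depends on your OS and install), log files are assumed to end in .txt, and find_in_logs is just an illustrative name.

```python
from pathlib import Path


def find_in_logs(log_dir, needle="database locked"):
    """Return (file name, line number, line) for every log line containing `needle`.

    Searches `log_dir` and its subdirectories; matching ignores case.
    """
    hits = []
    for path in sorted(Path(log_dir).rglob("*.txt")):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if needle.lower() in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

An empty result from every host would suggest the message was never written to the logs there, which is worth mentioning when you report back.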
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: Multiple client stalled due to "database locked"
Well, I looked on 2 different hosts, and could not find "database locked". If I find it on some host I will reply back. Just know that a very experienced user (15 years, and number 21 overall) had this issue, and I think something happened with the host servers. All of my hosts that have been up for months could not all have had the same issue without a system problem.
Re: Multiple client stalled due to "database locked"
It sounds like you hit the server that won't provide units properly and, for some reason, can't be fixed.
single 1070
Re: Multiple client stalled due to "database locked"
I've been having the same problem, and even after a reboot the download is extremely slow. I have 4 different computers running, with 2 GPUs on 3 of them and 1 on the other (a laptop).
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
Re: Multiple client stalled due to "database locked"
Which WUs are being downloaded from which work servers?
When was the last time you restarted your LAN (Router)?
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Multiple client stalled due to "database locked"
12:02:07:WU01:FS01:Connecting to 65.254.110.245:8080
12:02:07:WU01:FS01:Assigned to work server 155.247.166.220
12:02:07:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GP104 [GeForce GTX 1080] 8873 from 155.247.166.220
12:02:07:WU01:FS01:Connecting to 155.247.166.220:8080
12:02:08:WU01:FS01:Downloading 15.63MiB
And there it sat all night long.
Did a hard restart of LAN just 2-3 days ago.
Here's my log for this gpu after I rebooted.
14:33:13:WU00:FS01:Assigned to work server 128.252.203.10
14:33:14:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GP104 [GeForce GTX 1080] 8873 from 128.252.203.10
14:33:14:WU00:FS01:Connecting to 128.252.203.10:8080
14:33:14:WU02:FS02:0x21: Found a checkpoint file
14:33:15:WU00:FS01:Downloading 69.98MiB
14:33:21:WU00:FS01:Download 24.74%
14:33:27:WU00:FS01:Download 50.19%
14:33:33:WU00:FS01:Download 79.93%
14:33:37:WU00:FS01:Download complete
14:33:38:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14230 run:418 clone:1 gen:86 core:0x21 unit:0x0000007080fccb0a5d654d6f4cc8e9bf
14:33:38:WU00:FS01:Starting
14:33:38:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\ricko\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 705 -lifeline 4752 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu 1
14:33:38:WU00:FS01:Started FahCore on PID 9668
14:33:39:WU00:FS01:Core PID:9764
14:33:39:WU00:FS01:FahCore 0x21 started
14:33:42:WU00:FS01:0x21:*********************** Log Started 2019-10-01T14:33:42Z ***********************
14:33:42:WU00:FS01:0x21:Project: 14230 (Run 418, Clone 1, Gen 86)
14:33:42:WU00:FS01:0x21:Unit: 0x0000007080fccb0a5d654d6f4cc8e9bf
14:33:42:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
14:33:42:WU00:FS01:0x21:Machine: 1
14:33:42:WU00:FS01:0x21:Reading tar file core.xml
14:33:42:WU00:FS01:0x21:Reading tar file integrator.xml
14:33:42:WU00:FS01:0x21:Reading tar file state.xml
Tks
I'm folding because Dec 2005 I had radical prostate surgery.
Lost brother to spinal cancer, brother-in-law to prostate cancer.
Several 1st cousins lost and a few who have survived.
Re: Multiple client stalled due to "database locked"
I see no messages in what you posted mentioning database locked.
I do see that your report is associated with work server 155.247.166.* which has been experiencing network congestion ... and people are actively working on that problem. (See other discussions.) Please do a better job of reporting your actual problem.
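For anyone who wants to pull the same information out of their own log, here is a rough sketch that lists downloads which started but never logged "Download complete", along with the work server each was assigned to. It assumes the line layout seen in the excerpts in this thread (time:WUnn:FSnn:message). The function name is made up for illustration.

```python
import re

# time : work unit : folding slot : message, as in the log lines quoted above
PREFIX = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):(.*)$")


def unfinished_downloads(lines):
    """Map (WU, slot) -> assigned work server for downloads that never completed."""
    server = {}    # last work server assigned to each (WU, slot)
    pending = {}   # downloads that started and have not completed yet
    for line in lines:
        match = PREFIX.match(line.strip())
        if not match:
            continue
        key = (match.group(2), match.group(3))
        message = match.group(4)
        if message.startswith("Assigned to work server "):
            server[key] = message.rsplit(" ", 1)[1]
        elif message.startswith("Downloading "):
            pending[key] = server.get(key)
        elif message == "Download complete":
            pending.pop(key, None)
    return pending
```

Note that a log captured while a download is still in progress will also show up here, so the timestamps still need a look before calling anything stalled.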
Posting FAH's log:
How to provide enough info to get helpful support.