99.99% Bug

Moderators: Site Moderators, FAHC Science Team

Aurum
Posts: 292
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

99.99% Bug

Post by Aurum »

The 99.99% bug is still not being handled.

Code:

03:53:36:WU00:FS00:0x21:*********************** Log Started 2018-06-26T03:53:35Z ***********************
03:53:36:WU00:FS00:0x21:Project: 11713 (Run 18, Clone 226, Gen 91)
03:53:36:WU00:FS00:0x21:Unit: 0x0000007b8ca304e75adf7a96e85d66f4
03:53:36:WU00:FS00:0x21:CPU: 0x00000000000000000000000000000000
03:53:36:WU00:FS00:0x21:Machine: 0
03:53:36:WU00:FS00:0x21:Reading tar file core.xml
03:53:36:WU00:FS00:0x21:Reading tar file integrator.xml
03:53:36:WU00:FS00:0x21:Reading tar file state.xml
03:53:36:WU00:FS00:0x21:Reading tar file system.xml
03:53:36:WU00:FS00:0x21:Digital signatures verified
03:53:36:WU00:FS00:0x21:Folding@home GPU Core21 Folding@home Core
03:53:36:WU00:FS00:0x21:Version 0.0.18
03:53:48:WU00:FS00:0x21:Completed 0 out of 7500000 steps (0%)
03:53:48:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:55:26:WU00:FS00:0x21:Completed 75000 out of 7500000 steps (1%)
03:57:08:WU00:FS00:0x21:Completed 150000 out of 7500000 steps (2%)
03:58:52:WU00:FS00:0x21:Completed 225000 out of 7500000 steps (3%)
04:00:40:WU00:FS00:0x21:Completed 300000 out of 7500000 steps (4%)
04:02:26:WU00:FS00:0x21:Completed 375000 out of 7500000 steps (5%)
04:04:12:WU00:FS00:0x21:Completed 450000 out of 7500000 steps (6%)
04:06:00:WU00:FS00:0x21:Completed 525000 out of 7500000 steps (7%)
04:07:46:WU00:FS00:0x21:Completed 600000 out of 7500000 steps (8%)
04:09:31:WU00:FS00:0x21:Completed 675000 out of 7500000 steps (9%)
04:11:18:WU00:FS00:0x21:Completed 750000 out of 7500000 steps (10%)
04:13:06:WU00:FS00:0x21:Completed 825000 out of 7500000 steps (11%)
04:14:53:WU00:FS00:0x21:Completed 900000 out of 7500000 steps (12%)
04:16:39:WU00:FS00:0x21:Completed 975000 out of 7500000 steps (13%)
04:18:27:WU00:FS00:0x21:Completed 1050000 out of 7500000 steps (14%)
04:20:13:WU00:FS00:0x21:Completed 1125000 out of 7500000 steps (15%)
04:21:59:WU00:FS00:0x21:Completed 1200000 out of 7500000 steps (16%)
04:23:46:WU00:FS00:0x21:Completed 1275000 out of 7500000 steps (17%)
04:25:33:WU00:FS00:0x21:Completed 1350000 out of 7500000 steps (18%)
04:27:19:WU00:FS00:0x21:Completed 1425000 out of 7500000 steps (19%)
04:29:04:WU00:FS00:0x21:Completed 1500000 out of 7500000 steps (20%)
04:30:54:WU00:FS00:0x21:Completed 1575000 out of 7500000 steps (21%)
04:32:39:WU00:FS00:0x21:Completed 1650000 out of 7500000 steps (22%)
04:34:24:WU00:FS00:0x21:Completed 1725000 out of 7500000 steps (23%)
04:36:11:WU00:FS00:0x21:Completed 1800000 out of 7500000 steps (24%)
04:37:56:WU00:FS00:0x21:Completed 1875000 out of 7500000 steps (25%)
04:39:42:WU00:FS00:0x21:Completed 1950000 out of 7500000 steps (26%)
04:41:29:WU00:FS00:0x21:Completed 2025000 out of 7500000 steps (27%)
04:43:14:WU00:FS00:0x21:Completed 2100000 out of 7500000 steps (28%)
04:45:00:WU00:FS00:0x21:Completed 2175000 out of 7500000 steps (29%)
04:46:46:WU00:FS00:0x21:Completed 2250000 out of 7500000 steps (30%)
04:48:34:WU00:FS00:0x21:Completed 2325000 out of 7500000 steps (31%)
******************************* Date: 2018-06-26 *******************************
03:53:48:WU00:FS00:0x21:Completed 0 out of 7500000 steps (0%)
03:53:48:WU00:FS00:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
03:55:26:WU00:FS00:0x21:Completed 75000 out of 7500000 steps (1%)
03:57:08:WU00:FS00:0x21:Completed 150000 out of 7500000 steps (2%)
03:58:52:WU00:FS00:0x21:Completed 225000 out of 7500000 steps (3%)
04:00:40:WU00:FS00:0x21:Completed 300000 out of 7500000 steps (4%)
04:02:26:WU00:FS00:0x21:Completed 375000 out of 7500000 steps (5%)
04:04:12:WU00:FS00:0x21:Completed 450000 out of 7500000 steps (6%)
04:06:00:WU00:FS00:0x21:Completed 525000 out of 7500000 steps (7%)
04:07:46:WU00:FS00:0x21:Completed 600000 out of 7500000 steps (8%)
04:09:31:WU00:FS00:0x21:Completed 675000 out of 7500000 steps (9%)
04:11:18:WU00:FS00:0x21:Completed 750000 out of 7500000 steps (10%)
04:13:06:WU00:FS00:0x21:Completed 825000 out of 7500000 steps (11%)
04:14:53:WU00:FS00:0x21:Completed 900000 out of 7500000 steps (12%)
04:16:39:WU00:FS00:0x21:Completed 975000 out of 7500000 steps (13%)
04:18:27:WU00:FS00:0x21:Completed 1050000 out of 7500000 steps (14%)
04:20:13:WU00:FS00:0x21:Completed 1125000 out of 7500000 steps (15%)
04:21:59:WU00:FS00:0x21:Completed 1200000 out of 7500000 steps (16%)
04:23:46:WU00:FS00:0x21:Completed 1275000 out of 7500000 steps (17%)
04:25:33:WU00:FS00:0x21:Completed 1350000 out of 7500000 steps (18%)
04:27:19:WU00:FS00:0x21:Completed 1425000 out of 7500000 steps (19%)
04:29:04:WU00:FS00:0x21:Completed 1500000 out of 7500000 steps (20%)
04:30:54:WU00:FS00:0x21:Completed 1575000 out of 7500000 steps (21%)
04:32:39:WU00:FS00:0x21:Completed 1650000 out of 7500000 steps (22%)
04:34:24:WU00:FS00:0x21:Completed 1725000 out of 7500000 steps (23%)
04:36:11:WU00:FS00:0x21:Completed 1800000 out of 7500000 steps (24%)
04:37:56:WU00:FS00:0x21:Completed 1875000 out of 7500000 steps (25%)
04:39:42:WU00:FS00:0x21:Completed 1950000 out of 7500000 steps (26%)
04:41:29:WU00:FS00:0x21:Completed 2025000 out of 7500000 steps (27%)
04:43:14:WU00:FS00:0x21:Completed 2100000 out of 7500000 steps (28%)
04:45:00:WU00:FS00:0x21:Completed 2175000 out of 7500000 steps (29%)
04:46:46:WU00:FS00:0x21:Completed 2250000 out of 7500000 steps (30%)
04:48:34:WU00:FS00:0x21:Completed 2325000 out of 7500000 steps (31%)
******************************* Date: 2018-06-26 *******************************
04:04:12:WU00:FS00:0x21:Completed 450000 out of 7500000 steps (6%)
04:06:00:WU00:FS00:0x21:Completed 525000 out of 7500000 steps (7%)
04:07:46:WU00:FS00:0x21:Completed 600000 out of 7500000 steps (8%)
04:09:31:WU00:FS00:0x21:Completed 675000 out of 7500000 steps (9%)
04:11:18:WU00:FS00:0x21:Completed 750000 out of 7500000 steps (10%)
04:13:06:WU00:FS00:0x21:Completed 825000 out of 7500000 steps (11%)
04:14:53:WU00:FS00:0x21:Completed 900000 out of 7500000 steps (12%)
04:16:39:WU00:FS00:0x21:Completed 975000 out of 7500000 steps (13%)
04:18:27:WU00:FS00:0x21:Completed 1050000 out of 7500000 steps (14%)
04:20:13:WU00:FS00:0x21:Completed 1125000 out of 7500000 steps (15%)
04:21:59:WU00:FS00:0x21:Completed 1200000 out of 7500000 steps (16%)
04:23:46:WU00:FS00:0x21:Completed 1275000 out of 7500000 steps (17%)
04:25:33:WU00:FS00:0x21:Completed 1350000 out of 7500000 steps (18%)
04:27:19:WU00:FS00:0x21:Completed 1425000 out of 7500000 steps (19%)
04:29:04:WU00:FS00:0x21:Completed 1500000 out of 7500000 steps (20%)
04:30:54:WU00:FS00:0x21:Completed 1575000 out of 7500000 steps (21%)
04:32:39:WU00:FS00:0x21:Completed 1650000 out of 7500000 steps (22%)
04:34:24:WU00:FS00:0x21:Completed 1725000 out of 7500000 steps (23%)
04:36:11:WU00:FS00:0x21:Completed 1800000 out of 7500000 steps (24%)
04:37:56:WU00:FS00:0x21:Completed 1875000 out of 7500000 steps (25%)
04:39:42:WU00:FS00:0x21:Completed 1950000 out of 7500000 steps (26%)
04:41:29:WU00:FS00:0x21:Completed 2025000 out of 7500000 steps (27%)
04:43:14:WU00:FS00:0x21:Completed 2100000 out of 7500000 steps (28%)
04:45:00:WU00:FS00:0x21:Completed 2175000 out of 7500000 steps (29%)
04:46:46:WU00:FS00:0x21:Completed 2250000 out of 7500000 steps (30%)
04:48:34:WU00:FS00:0x21:Completed 2325000 out of 7500000 steps (31%)
******************************* Date: 2018-06-26 *******************************
04:39:42:WU00:FS00:0x21:Completed 1950000 out of 7500000 steps (26%)
04:41:29:WU00:FS00:0x21:Completed 2025000 out of 7500000 steps (27%)
04:43:14:WU00:FS00:0x21:Completed 2100000 out of 7500000 steps (28%)
04:45:00:WU00:FS00:0x21:Completed 2175000 out of 7500000 steps (29%)
04:46:46:WU00:FS00:0x21:Completed 2250000 out of 7500000 steps (30%)
04:48:34:WU00:FS00:0x21:Completed 2325000 out of 7500000 steps (31%)
******************************* Date: 2018-06-26 *******************************
******************************* Date: 2018-06-26 *******************************
04:39:42:WU00:FS00:0x21:Completed 1950000 out of 7500000 steps (26%)
04:41:29:WU00:FS00:0x21:Completed 2025000 out of 7500000 steps (27%)
04:43:14:WU00:FS00:0x21:Completed 2100000 out of 7500000 steps (28%)
04:45:00:WU00:FS00:0x21:Completed 2175000 out of 7500000 steps (29%)
04:46:46:WU00:FS00:0x21:Completed 2250000 out of 7500000 steps (30%)
04:48:34:WU00:FS00:0x21:Completed 2325000 out of 7500000 steps (31%)
******************************* Date: 2018-06-26 *******************************
******************************* Date: 2018-06-26 *******************************
In Science We Trust
SteveWillis
Posts: 389
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: 99.99% Bug

Post by SteveWillis »

On Linux, my reboot.sh script handles that (along with a good number of other problems, to keep all the folding slots folding) by automatically restarting the client. It can be found at https://drive.google.com/drive/folders/ ... sp=sharing

1080 and 1080TI GPUs on Linux Mint
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: 99.99% Bug

Post by Joe_H »

The most common cause of this problem is the video driver and GPU resetting, which stops the calculations on the GPU. If you see this often, your system is not stable for folding, and you should check for overheating or reduce any overclock set on the GPU. You may only need to reduce the GPU memory clock: it matters less to folding speed, and lowering it cuts the card's power consumption and heat output.
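
A minimal sketch of the overheating check on an NVIDIA card, assuming the driver's nvidia-smi tool is installed. The function name hot_gpus and the 80 C limit are made up for illustration, so pick a limit that suits your card.

```shell
#!/bin/sh
# Hypothetical helper: list GPUs at or above a temperature limit (degrees C).
# Reads CSV lines of the form "index, temperature" on stdin, as printed by
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader
hot_gpus() {
    limit="$1"
    awk -F', *' -v limit="$limit" \
        '$2 + 0 >= limit { print "GPU " $1 " at " $2 " C" }'
}

# Usage on a machine with the NVIDIA driver (80 is an arbitrary example limit):
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader | hot_gpus 80
```

If a card shows up here while folding, that points at cooling or clocks rather than at the client.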

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 99.99% Bug

Post by bruce »

FAH cannot do anything to make an unstable GPU into a stable one. You have to figure that out.

The best FAH can do is to figure out how to dump the WU that's hung (which probably isn't what you'd like to happen, since the work you've done on that WU is probably recoverable). As Joe suggests, you need to reduce the overclocking or do a better job of managing the heat.

I think this is probably the best we can do: https://github.com/FoldingAtHome/fah-issues/issues/1240
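
In the meantime, a hung WU can be spotted from the log by looking at how long ago the last "Completed ... steps" line was written. This is a rough sketch, not something the client does itself. The function name progress_age, the log path in the usage comment and the 1200-second limit are all assumptions. The log timestamps are treated as UTC, which matches the "Log Started ...Z" line in the log posted above.

```shell
#!/bin/sh
# Hypothetical helper: print the seconds elapsed between the last
# "Completed N out of M steps" line on stdin and the time given as HH:MM:SS.
# Returns non-zero if the log contains no progress line at all.
progress_age() {
    now="$1"
    awk -F: -v now="$now" '
        /:Completed [0-9]+ out of [0-9]+ steps/ {
            # Log lines start with HH:MM:SS, so fields 1-3 are the timestamp
            last = $1 * 3600 + $2 * 60 + $3
        }
        END {
            if (last == "") exit 1
            split(now, t, ":")
            # Adding a day before the modulo handles a wrap past midnight
            print (t[1] * 3600 + t[2] * 60 + t[3] - last + 86400) % 86400
        }'
}

# Example (log path and 1200 s limit are assumptions for a typical Linux install):
#   age=$(progress_age "$(date -u +%H:%M:%S)" < /var/lib/fahclient/log.txt)
#   [ "$age" -gt 1200 ] && echo "no progress for ${age}s, client may be hung"
```

A paused or finished slot also stops writing progress lines, so a large age is only a hint that something needs a look, not proof of a hang.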
SteveWillis

Re: 99.99% Bug

Post by SteveWillis »

When this happens, all that is generally required is a client restart (or a reboot, if you don't know how to restart the client).

My Linux client restart script. Must be run as root.

Code:

#!/bin/ksh
# Restart the FAH client, retrying up to five times.
# Requires ksh: sudo apt-get install ksh
if [ "$(id -u)" -ne 0 ]; then
    echo "This script must be run as root (sudo path/restartclient.sh)"
    exit 1
fi
set -x
for i in {1..5}
do
    systemctl stop FAHClient || true
    sleep 5
    # Force-kill any core or client processes that survived the stop
    [[ $(pgrep -c FahCore) -gt 0 ]] && pkill -e -9 FahCore
    [[ $(pgrep -c FAHClient) -gt 0 ]] && pkill -e -9 FAHClient
    sleep 5
    systemctl restart FAHClient || true
    sleep 5
    # Count status lines that report the client as running
    running=$(/etc/init.d/FAHClient status | grep -c "fahclient is running")
    if [[ $running -eq 1 ]]    # success
    then
        break
    fi
    sleep 10
done

exit
Aurum

Re: 99.99% Bug

Post by Aurum »

I never overclock or change the memory clock.
Might be a hot card, summer's here.
I'm going to use Steve's reboot script when and if I start folding again.
CURE + FLDC used to pay my $1,000 a month electric bill but no longer. Retired and can't pay that. Anyone want some sweet folding rigs at a reasonable price???
Joe_H

Re: 99.99% Bug

Post by Joe_H »

Are any of your cards factory overclocked? That can be an issue at times, especially if the cooling is marginal for the ambient conditions. The card makers overclock the chips compared to the reference designs from nVidia or AMD, and they are just looking for them to be stable providing video for games and other such software. They are not usually testing for stability doing GPU number crunching.
SteveWillis

Re: 99.99% Bug

Post by SteveWillis »

Aurum, I wouldn't mind picking up a couple of 1080TIs at the right price. PM me if interested.