FAHCore stops working mid-task

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

Skufer
Posts: 39
Joined: Fri Feb 14, 2014 10:45 am
Location: UK/London

FAHCore stops working mid-task

Post by Skufer »

Hi all,

I've got a strange one here. A couple of times recently, a Core_18 task on one of my GTX 960s has simply stopped writing to the log file. It appears to keep folding in FAHControl, but it gets to 99.99% and stops there too. Pausing and resuming the task doesn't help; only a reboot seems to fix it, and after the reboot the task restarts from the percentage at which it stopped logging:

Code:

11:56:07:WU02:FS00:Connecting to 171.67.108.200:80
11:56:09:WU02:FS00:Assigned to work server 171.64.65.84
11:56:09:WU02:FS00:Requesting new work unit for slot 00: RUNNING gpu:1:GM206 [GeForce GTX 960] from 171.64.65.84
11:56:09:WU02:FS00:Connecting to 171.64.65.84:8080
11:56:10:WU02:FS00:Downloading 3.48MiB
11:56:11:WU01:FS00:0x18:Saving result file logfile_01.txt
11:56:11:WU01:FS00:0x18:Saving result file checkpointState.xml
11:56:12:WU01:FS00:0x18:Saving result file checkpt.crc
11:56:12:WU01:FS00:0x18:Saving result file log.txt
11:56:12:WU01:FS00:0x18:Saving result file positions.xtc
11:56:13:WU02:FS00:Download complete
11:56:13:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:9104 run:56 clone:0 gen:157 core:0x18 unit:0x000000cb0a3b1e78546a56084b25ac28
11:56:14:WU01:FS00:0x18:Folding@home Core Shutdown: FINISHED_UNIT
11:56:14:WU01:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
11:56:14:WU01:FS00:Sending unit results: id:01 state:SEND error:NO_ERROR project:9128 run:31 clone:0 gen:2 core:0x18 unit:0x000000030a3b1e81554ba88cb196ac28
11:56:14:WU01:FS00:Uploading 7.18MiB to 171.64.65.93
11:56:14:WU01:FS00:Connecting to 171.64.65.93:8080
11:56:14:WU02:FS00:Starting
11:56:14:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 02 -suffix 01 -version 704 -lifeline 948 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
11:56:14:WU02:FS00:Started FahCore on PID 26085
11:56:14:WU02:FS00:Core PID:26089
11:56:14:WU02:FS00:FahCore 0x18 started
11:56:15:WU02:FS00:0x18:*********************** Log Started 2015-06-29T11:56:14Z ***********************
11:56:15:WU02:FS00:0x18:Project: 9104 (Run 56, Clone 0, Gen 157)
11:56:15:WU02:FS00:0x18:Unit: 0x000000cb0a3b1e78546a56084b25ac28
11:56:15:WU02:FS00:0x18:CPU: 0x00000000000000000000000000000000
11:56:15:WU02:FS00:0x18:Machine: 0
11:56:15:WU02:FS00:0x18:Reading tar file state.xml
11:56:15:WU02:FS00:0x18:Reading tar file system.xml
11:56:15:WU02:FS00:0x18:Reading tar file integrator.xml
11:56:15:WU02:FS00:0x18:Reading tar file core.xml
11:56:15:WU02:FS00:0x18:Digital signatures verified
11:56:15:WU02:FS00:0x18:Folding@home GPU core18
11:56:15:WU02:FS00:0x18:Version 0.0.4
11:56:24:WU01:FS00:Upload complete
11:56:24:WU01:FS00:Server responded WORK_ACK (400)
11:56:24:WU01:FS00:Final credit estimate, 29536.00 points
11:56:24:WU01:FS00:Cleaning up
11:56:41:WU02:FS00:0x18:Completed 0 out of 2500000 steps (0%)
11:56:41:WU02:FS00:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
11:57:22:WU00:FS01:0x18:Completed 650000 out of 2500000 steps (26%)
11:59:26:WU02:FS00:0x18:Completed 25000 out of 2500000 steps (1%)
12:00:06:WU00:FS01:0x18:Completed 675000 out of 2500000 steps (27%)
12:02:04:WU02:FS00:0x18:Completed 50000 out of 2500000 steps (2%)
12:02:50:WU00:FS01:0x18:Completed 700000 out of 2500000 steps (28%)
12:04:42:WU02:FS00:0x18:Completed 75000 out of 2500000 steps (3%)
12:05:44:WU00:FS01:0x18:Completed 725000 out of 2500000 steps (29%)
12:07:20:WU02:FS00:0x18:Completed 100000 out of 2500000 steps (4%)
12:08:29:WU00:FS01:0x18:Completed 750000 out of 2500000 steps (30%)
12:10:09:WU02:FS00:0x18:Completed 125000 out of 2500000 steps (5%)
12:11:13:WU00:FS01:0x18:Completed 775000 out of 2500000 steps (31%)
12:12:46:WU02:FS00:0x18:Completed 150000 out of 2500000 steps (6%)
12:13:57:WU00:FS01:0x18:Completed 800000 out of 2500000 steps (32%)
12:15:24:WU02:FS00:0x18:Completed 175000 out of 2500000 steps (7%)
12:16:51:WU00:FS01:0x18:Completed 825000 out of 2500000 steps (33%)
12:18:02:WU02:FS00:0x18:Completed 200000 out of 2500000 steps (8%)
12:19:35:WU00:FS01:0x18:Completed 850000 out of 2500000 steps (34%)
12:20:50:WU02:FS00:0x18:Completed 225000 out of 2500000 steps (9%)
12:22:19:WU00:FS01:0x18:Completed 875000 out of 2500000 steps (35%)
12:23:29:WU02:FS00:0x18:Completed 250000 out of 2500000 steps (10%)
12:25:04:WU00:FS01:0x18:Completed 900000 out of 2500000 steps (36%)
12:26:07:WU02:FS00:0x18:Completed 275000 out of 2500000 steps (11%)
12:27:58:WU00:FS01:0x18:Completed 925000 out of 2500000 steps (37%)
12:28:44:WU02:FS00:0x18:Completed 300000 out of 2500000 steps (12%)
12:30:42:WU00:FS01:0x18:Completed 950000 out of 2500000 steps (38%)
12:31:33:WU02:FS00:0x18:Completed 325000 out of 2500000 steps (13%)
12:33:26:WU00:FS01:0x18:Completed 975000 out of 2500000 steps (39%)
12:34:11:WU02:FS00:0x18:Completed 350000 out of 2500000 steps (14%)
12:36:11:WU00:FS01:0x18:Completed 1000000 out of 2500000 steps (40%)
12:36:49:WU02:FS00:0x18:Completed 375000 out of 2500000 steps (15%)
12:39:05:WU00:FS01:0x18:Completed 1025000 out of 2500000 steps (41%)
12:39:27:WU02:FS00:0x18:Completed 400000 out of 2500000 steps (16%)
12:41:49:WU00:FS01:0x18:Completed 1050000 out of 2500000 steps (42%)
12:42:15:WU02:FS00:0x18:Completed 425000 out of 2500000 steps (17%)
12:44:34:WU00:FS01:0x18:Completed 1075000 out of 2500000 steps (43%)
12:44:53:WU02:FS00:0x18:Completed 450000 out of 2500000 steps (18%)
12:47:18:WU00:FS01:0x18:Completed 1100000 out of 2500000 steps (44%)
12:47:31:WU02:FS00:0x18:Completed 475000 out of 2500000 steps (19%)
12:50:09:WU02:FS00:0x18:Completed 500000 out of 2500000 steps (20%)
12:50:12:WU00:FS01:0x18:Completed 1125000 out of 2500000 steps (45%)
12:52:57:WU00:FS01:0x18:Completed 1150000 out of 2500000 steps (46%)
12:52:57:WU02:FS00:0x18:Completed 525000 out of 2500000 steps (21%)
12:55:35:WU02:FS00:0x18:Completed 550000 out of 2500000 steps (22%)
12:55:41:WU00:FS01:0x18:Completed 1175000 out of 2500000 steps (47%)
12:58:13:WU02:FS00:0x18:Completed 575000 out of 2500000 steps (23%)
12:58:25:WU00:FS01:0x18:Completed 1200000 out of 2500000 steps (48%)
13:00:51:WU02:FS00:0x18:Completed 600000 out of 2500000 steps (24%)
13:01:20:WU00:FS01:0x18:Completed 1225000 out of 2500000 steps (49%)
13:03:40:WU02:FS00:0x18:Completed 625000 out of 2500000 steps (25%)
13:04:04:WU00:FS01:0x18:Completed 1250000 out of 2500000 steps (50%)
13:06:18:WU02:FS00:0x18:Completed 650000 out of 2500000 steps (26%)
13:06:48:WU00:FS01:0x18:Completed 1275000 out of 2500000 steps (51%)
13:08:56:WU02:FS00:0x18:Completed 675000 out of 2500000 steps (27%)
13:09:32:WU00:FS01:0x18:Completed 1300000 out of 2500000 steps (52%)
13:11:34:WU02:FS00:0x18:Completed 700000 out of 2500000 steps (28%)
13:12:27:WU00:FS01:0x18:Completed 1325000 out of 2500000 steps (53%)
13:14:22:WU02:FS00:0x18:Completed 725000 out of 2500000 steps (29%)
13:15:11:WU00:FS01:0x18:Completed 1350000 out of 2500000 steps (54%)
13:17:00:WU02:FS00:0x18:Completed 750000 out of 2500000 steps (30%)
13:17:56:WU00:FS01:0x18:Completed 1375000 out of 2500000 steps (55%)
13:19:38:WU02:FS00:0x18:Completed 775000 out of 2500000 steps (31%)
13:20:40:WU00:FS01:0x18:Completed 1400000 out of 2500000 steps (56%)
13:22:16:WU02:FS00:0x18:Completed 800000 out of 2500000 steps (32%)
13:23:34:WU00:FS01:0x18:Completed 1425000 out of 2500000 steps (57%)
13:25:04:WU02:FS00:0x18:Completed 825000 out of 2500000 steps (33%)
13:26:19:WU00:FS01:0x18:Completed 1450000 out of 2500000 steps (58%)
13:27:42:WU02:FS00:0x18:Completed 850000 out of 2500000 steps (34%)
13:29:03:WU00:FS01:0x18:Completed 1475000 out of 2500000 steps (59%)
13:30:20:WU02:FS00:0x18:Completed 875000 out of 2500000 steps (35%)
13:31:47:WU00:FS01:0x18:Completed 1500000 out of 2500000 steps (60%)
13:32:58:WU02:FS00:0x18:Completed 900000 out of 2500000 steps (36%)
13:34:42:WU00:FS01:0x18:Completed 1525000 out of 2500000 steps (61%)
13:35:47:WU02:FS00:0x18:Completed 925000 out of 2500000 steps (37%)
13:37:26:WU00:FS01:0x18:Completed 1550000 out of 2500000 steps (62%)
13:38:25:WU02:FS00:0x18:Completed 950000 out of 2500000 steps (38%)
13:40:10:WU00:FS01:0x18:Completed 1575000 out of 2500000 steps (63%)
13:41:03:WU02:FS00:0x18:Completed 975000 out of 2500000 steps (39%)
13:42:55:WU00:FS01:0x18:Completed 1600000 out of 2500000 steps (64%)
13:43:41:WU02:FS00:0x18:Completed 1000000 out of 2500000 steps (40%)
13:45:50:WU00:FS01:0x18:Completed 1625000 out of 2500000 steps (65%)
13:46:29:WU02:FS00:0x18:Completed 1025000 out of 2500000 steps (41%)
13:48:34:WU00:FS01:0x18:Completed 1650000 out of 2500000 steps (66%)
13:49:07:WU02:FS00:0x18:Completed 1050000 out of 2500000 steps (42%)
13:51:18:WU00:FS01:0x18:Completed 1675000 out of 2500000 steps (67%)
13:51:45:WU02:FS00:0x18:Completed 1075000 out of 2500000 steps (43%)
13:54:02:WU00:FS01:0x18:Completed 1700000 out of 2500000 steps (68%)
13:54:23:WU02:FS00:0x18:Completed 1100000 out of 2500000 steps (44%)
13:56:57:WU00:FS01:0x18:Completed 1725000 out of 2500000 steps (69%)
13:57:11:WU02:FS00:0x18:Completed 1125000 out of 2500000 steps (45%)
13:59:41:WU00:FS01:0x18:Completed 1750000 out of 2500000 steps (70%)
13:59:49:WU02:FS00:0x18:Completed 1150000 out of 2500000 steps (46%)
14:02:26:WU00:FS01:0x18:Completed 1775000 out of 2500000 steps (71%)
14:02:27:WU02:FS00:0x18:Completed 1175000 out of 2500000 steps (47%)
14:05:06:WU02:FS00:0x18:Completed 1200000 out of 2500000 steps (48%)
14:05:11:WU00:FS01:0x18:Completed 1800000 out of 2500000 steps (72%)
14:07:54:WU02:FS00:0x18:Completed 1225000 out of 2500000 steps (49%)
14:08:05:WU00:FS01:0x18:Completed 1825000 out of 2500000 steps (73%)
14:10:32:WU02:FS00:0x18:Completed 1250000 out of 2500000 steps (50%)
14:10:51:WU00:FS01:0x18:Completed 1850000 out of 2500000 steps (74%)
14:13:10:WU02:FS00:0x18:Completed 1275000 out of 2500000 steps (51%)
14:13:36:WU00:FS01:0x18:Completed 1875000 out of 2500000 steps (75%)
14:15:48:WU02:FS00:0x18:Completed 1300000 out of 2500000 steps (52%)
14:16:21:WU00:FS01:0x18:Completed 1900000 out of 2500000 steps (76%)
14:18:36:WU02:FS00:0x18:Completed 1325000 out of 2500000 steps (53%)
14:19:16:WU00:FS01:0x18:Completed 1925000 out of 2500000 steps (77%)
14:21:14:WU02:FS00:0x18:Completed 1350000 out of 2500000 steps (54%)
14:22:01:WU00:FS01:0x18:Completed 1950000 out of 2500000 steps (78%)
14:23:52:WU02:FS00:0x18:Completed 1375000 out of 2500000 steps (55%)
14:24:47:WU00:FS01:0x18:Completed 1975000 out of 2500000 steps (79%)
14:26:30:WU02:FS00:0x18:Completed 1400000 out of 2500000 steps (56%)
14:27:32:WU00:FS01:0x18:Completed 2000000 out of 2500000 steps (80%)
14:29:19:WU02:FS00:0x18:Completed 1425000 out of 2500000 steps (57%)
14:30:27:WU00:FS01:0x18:Completed 2025000 out of 2500000 steps (81%)
14:31:57:WU02:FS00:0x18:Completed 1450000 out of 2500000 steps (58%)
14:33:13:WU00:FS01:0x18:Completed 2050000 out of 2500000 steps (82%)
14:34:35:WU02:FS00:0x18:Completed 1475000 out of 2500000 steps (59%)
14:35:58:WU00:FS01:0x18:Completed 2075000 out of 2500000 steps (83%)
14:37:13:WU02:FS00:0x18:Completed 1500000 out of 2500000 steps (60%)
14:38:43:WU00:FS01:0x18:Completed 2100000 out of 2500000 steps (84%)
14:41:37:WU00:FS01:0x18:Completed 2125000 out of 2500000 steps (85%)
14:44:21:WU00:FS01:0x18:Completed 2150000 out of 2500000 steps (86%)
******************************* Date: 2015-06-29 *******************************
14:47:05:WU00:FS01:0x18:Completed 2175000 out of 2500000 steps (87%)
14:49:49:WU00:FS01:0x18:Completed 2200000 out of 2500000 steps (88%)
14:52:43:WU00:FS01:0x18:Completed 2225000 out of 2500000 steps (89%)
14:55:26:WU00:FS01:0x18:Completed 2250000 out of 2500000 steps (90%)
14:58:10:WU00:FS01:0x18:Completed 2275000 out of 2500000 steps (91%)
15:00:54:WU00:FS01:0x18:Completed 2300000 out of 2500000 steps (92%)
15:03:48:WU00:FS01:0x18:Completed 2325000 out of 2500000 steps (93%)
15:06:32:WU00:FS01:0x18:Completed 2350000 out of 2500000 steps (94%)
15:09:15:WU00:FS01:0x18:Completed 2375000 out of 2500000 steps (95%)
15:11:59:WU00:FS01:0x18:Completed 2400000 out of 2500000 steps (96%)
15:14:53:WU00:FS01:0x18:Completed 2425000 out of 2500000 steps (97%)
15:17:37:WU00:FS01:0x18:Completed 2450000 out of 2500000 steps (98%)
15:20:21:WU00:FS01:0x18:Completed 2475000 out of 2500000 steps (99%)
15:23:05:WU00:FS01:0x18:Completed 2500000 out of 2500000 steps (100%)
15:23:14:WU01:FS01:Connecting to 171.67.108.200:80
15:23:15:WU01:FS01:Assigned to work server 171.64.65.92
15:23:15:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM206 [GeForce GTX 960] from 171.64.65.92
15:23:15:WU01:FS01:Connecting to 171.64.65.92:8080
15:23:15:WU00:FS01:0x18:Saving result file logfile_01.txt
15:23:15:WU00:FS01:0x18:Saving result file checkpointState.xml
15:23:16:WU01:FS01:Downloading 3.59MiB
15:23:17:WU00:FS01:0x18:Saving result file checkpt.crc
15:23:17:WU00:FS01:0x18:Saving result file log.txt
15:23:17:WU00:FS01:0x18:Saving result file positions.xtc
15:23:18:WU00:FS01:0x18:Folding@home Core Shutdown: FINISHED_UNIT
15:23:19:WU01:FS01:Download complete
15:23:19:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9110 run:5 clone:0 gen:139 core:0x18 unit:0x000000bc0a3b1e80546a987725abb2a7
*********************** Log Started 2015-06-30T09:23:10Z ***********************
09:23:10:************************* Folding@home Client *************************
09:23:10:    Website: http://folding.stanford.edu/
09:23:10:  Copyright: (c) 2009-2014 Stanford University
09:23:10:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:23:10:       Args: --child --lifeline 905 /etc/fahclient/config.xml --run-as fahclient
09:23:10:             --pid-file=/var/run/fahclient.pid --daemon
09:23:10:     Config: /etc/fahclient/config.xml
09:23:10:******************************** Build ********************************
09:23:10:    Version: 7.4.4
09:23:10:       Date: Mar 4 2014
09:23:10:       Time: 12:02:38
09:23:10:    SVN Rev: 4130
09:23:10:     Branch: fah/trunk/client
09:23:10:   Compiler: GNU 4.4.7
09:23:10:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
09:23:10:             -fno-unsafe-math-optimizations -msse2
09:23:10:   Platform: linux2 3.2.0-1-amd64
09:23:10:       Bits: 64
09:23:10:       Mode: Release
09:23:10:******************************* System ********************************
09:23:10:        CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
09:23:10:     CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
09:23:10:       CPUs: 4
09:23:10:     Memory: 3.86GiB
09:23:10:Free Memory: 3.65GiB
09:23:10:    Threads: POSIX_THREADS
09:23:10: OS Version: 3.16
09:23:10:Has Battery: false
09:23:10: On Battery: false
09:23:10: UTC Offset: 1
09:23:10:        PID: 907
09:23:10:        CWD: /var/lib/fahclient
09:23:10:         OS: Linux 3.16.0-4-amd64 x86_64
09:23:10:    OS Arch: AMD64
09:23:10:       GPUs: 2
09:23:10:      GPU 0: NVIDIA:5 GM206 [GeForce GTX 960]
09:23:10:      GPU 1: NVIDIA:5 GM206 [GeForce GTX 960]
09:23:10:       CUDA: 5.2
09:23:10:CUDA Driver: 7000
09:23:10:***********************************************************************
09:23:10:<config>
09:23:10:  <!-- Folding Slot Configuration -->
09:23:10:  <client-type v='advanced'/>
09:23:10:
09:23:10:  <!-- Network -->
09:23:10:  <proxy v=':8080'/>
09:23:10:
09:23:10:  <!-- Slot Control -->
09:23:10:  <power v='full'/>
09:23:10:
09:23:10:  <!-- User Information -->
09:23:10:  <passkey v='********************************'/>
09:23:10:  <team v='212997'/>
09:23:10:  <user v='BestPony'/>
09:23:10:
09:23:10:  <!-- Folding Slots -->
09:23:10:  <slot id='1' type='GPU'/>
09:23:10:  <slot id='0' type='GPU'/>
09:23:10:</config>
09:23:10:Switching to user fahclient
09:23:10:Trying to access database...
09:23:11:Successfully acquired database lock
09:23:11:Enabled folding slot 01: READY gpu:0:GM206 [GeForce GTX 960]
09:23:11:Enabled folding slot 00: READY gpu:1:GM206 [GeForce GTX 960]
09:23:11:WU02:FS00:Starting
09:23:11:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 02 -suffix 01 -version 704 -lifeline 907 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
09:23:11:WU02:FS00:Started FahCore on PID 931
09:23:11:WU02:FS00:Core PID:944
09:23:11:WU02:FS00:FahCore 0x18 started
09:23:12:WU00:FS01:Starting
09:23:12:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 00 -suffix 01 -version 704 -lifeline 907 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
09:23:12:WU00:FS01:Started FahCore on PID 965
09:23:12:WU00:FS01:Core PID:969
09:23:12:WU00:FS01:FahCore 0x18 started
09:23:13:WU00:FS01:0x18:*********************** Log Started 2015-06-30T09:23:12Z ***********************
09:23:13:WU00:FS01:0x18:Project: 9122 (Run 29, Clone 7, Gen 11)
09:23:13:WU00:FS01:0x18:Unit: 0x000000110a3b1e805543eabdaa67b161
09:23:13:WU00:FS01:0x18:CPU: 0x00000000000000000000000000000000
09:23:13:WU00:FS01:0x18:Machine: 1
09:23:13:WU02:FS00:0x18:*********************** Log Started 2015-06-30T09:23:12Z ***********************
09:23:13:WU02:FS00:0x18:Project: 9104 (Run 56, Clone 0, Gen 157)
09:23:13:WU02:FS00:0x18:Unit: 0x000000cb0a3b1e78546a56084b25ac28
09:23:13:WU02:FS00:0x18:CPU: 0x00000000000000000000000000000000
09:23:13:WU02:FS00:0x18:Machine: 0
09:23:13:WU02:FS00:0x18:Digital signatures verified
09:23:13:WU02:FS00:0x18:Folding@home GPU core18
09:23:13:WU02:FS00:0x18:Version 0.0.4
09:23:13:WU00:FS01:0x18:Reading tar file state.xml
09:23:14:WU02:FS00:0x18:  Found a checkpoint file
09:23:15:WU00:FS01:0x18:Reading tar file system.xml
09:23:15:WU00:FS01:0x18:Reading tar file integrator.xml
09:23:15:WU00:FS01:0x18:Reading tar file core.xml
09:23:15:WU00:FS01:0x18:Digital signatures verified
09:23:15:WU00:FS01:0x18:Folding@home GPU core18
09:23:15:WU00:FS01:0x18:Version 0.0.4
09:23:58:WU02:FS00:0x18:Completed 1500000 out of 2500000 steps (60%)
09:23:58:WU02:FS00:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:23:59:WU00:FS01:0x18:Completed 0 out of 2500000 steps (0%)
09:23:59:WU00:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:26:46:WU02:FS00:0x18:Completed 1525000 out of 2500000 steps (61%)
09:26:52:WU00:FS01:0x18:Completed 25000 out of 2500000 steps (1%)
09:29:24:WU02:FS00:0x18:Completed 1550000 out of 2500000 steps (62%)
09:29:36:WU00:FS01:0x18:Completed 50000 out of 2500000 steps (2%)
09:32:02:WU02:FS00:0x18:Completed 1575000 out of 2500000 steps (63%)
09:32:20:WU00:FS01:0x18:Completed 75000 out of 2500000 steps (3%)
09:34:40:WU02:FS00:0x18:Completed 1600000 out of 2500000 steps (64%)
09:35:05:WU00:FS01:0x18:Completed 100000 out of 2500000 steps (4%)
09:37:28:WU02:FS00:0x18:Completed 1625000 out of 2500000 steps (65%)
No other errors seem to occur. Both GPUs are very well cooled, though I will admit that the CPU is running hot. Can anyone suggest why this keeps happening?
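The stall shows up in the 2015-06-29 part of the log: WU02 on FS00 last reports 60% at 14:37:13, while WU00 on FS01 carries on to 100%. A minimal sketch of how one might spot that automatically, assuming the progress-line format shown above, a log excerpt that covers a single day without a client restart, and a hypothetical 10-minute threshold (the function names are illustrative, not part of the FAH client):

```python
import re
from datetime import timedelta

# Matches lines such as
# "14:37:13:WU02:FS00:0x18:Completed 1500000 out of 2500000 steps (60%)"
PROGRESS = re.compile(
    r"^(\d\d):(\d\d):(\d\d):(WU\d+):(FS\d+):0x\w+:"
    r"Completed (\d+) out of (\d+) steps"
)


def last_progress(log_text):
    """Return {(wu, slot): (time_of_day, steps_done, steps_total)} for the
    most recent progress line of each work unit in the log text."""
    latest = {}
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m:
            h, mi, s, wu, slot, done, total = m.groups()
            stamp = timedelta(hours=int(h), minutes=int(mi), seconds=int(s))
            latest[(wu, slot)] = (stamp, int(done), int(total))
    return latest


def stalled(log_text, max_gap=timedelta(minutes=10)):
    """List unfinished work units whose last progress line is more than
    max_gap older than the newest progress line in the log."""
    latest = last_progress(log_text)
    if not latest:
        return []
    newest = max(stamp for stamp, _, _ in latest.values())
    return sorted(
        key for key, (stamp, done, total) in latest.items()
        if done < total and newest - stamp > max_gap
    )
```

Because the timestamps carry no date, the comparison only makes sense within one day of log; a log that spans midnight or a restart would need to be split first.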
JimF
Posts: 651
Joined: Thu Jan 21, 2010 2:03 pm

Re: FAHCore stops working mid-task

Post by JimF »

Are you overclocking the 960? I would reduce the clock; running cool is no guarantee that there are no errors.
Also, make sure that you leave a CPU core free to support the GPU.
(I have never seen it on my 960 under Win7 64-bit; 352.86 drivers.)
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: FAHCore stops working mid-task

Post by 7im »

If there was a GPU reset (error), the client stops folding, but no indication is sent to FAHControl. If the work resumes from the last checkpoint when you restart the client, that confirms it. As mentioned above, check the OC and temps. On Windows, the GPU reset would show up in the event log. I'm not sure what the Linux equivalent is.
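On Linux, the closest equivalent is the kernel log, where the NVIDIA driver normally reports GPU faults in lines containing `NVRM: Xid`. A minimal sketch that pulls those lines out of saved kernel-log text, assuming the usual `NVRM: Xid (<device>): <code>, <details>` layout (the exact wording varies between driver versions, and `find_xid_events` is an illustrative name):

```python
import re

# Assumed layout: "... NVRM: Xid (<pci device>): <code>, <details>".
XID = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def find_xid_events(kernel_log_text):
    """Return (device, xid_code, full_line) for every Xid line found."""
    events = []
    for line in kernel_log_text.splitlines():
        m = XID.search(line)
        if m:
            events.append((m.group(1), int(m.group(2)), line.strip()))
    return events
```

The input can be the output of `dmesg` saved to a file, which may need root on some systems. An empty result does not prove the GPU is healthy; it only means the driver logged no Xid event in that text.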
Skufer
Posts: 39
Joined: Fri Feb 14, 2014 10:45 am
Location: UK/London

Re: FAHCore stops working mid-task

Post by Skufer »

I have overclocked both cards, but honestly not by very much at all. I guess I'll have to drop the OC altogether just to be sure. I'll be disappointed if that's what's been causing this, though.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: FAHCore stops working mid-task

Post by bruce »

Looks can be deceiving. When the FAHCore stops writing to the log, it really has stopped processing. That's why when it restarts, it picks up from the last checkpoint that was written.

The software that estimates the progress does assume it's still running and so it improperly reports progress up to 99.99% even though it has stopped getting any progress reports from the FAHCore.
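To see why the display can creep up to 99.99% with a dead core, here is a purely illustrative model of such an estimator. It is not FAHControl's actual code: it simply assumes the estimate is extrapolated from the last reported percentage at the earlier pace and capped at 99.99%, the value seen in this thread.

```python
def estimated_percent(last_reported_percent, seconds_since_report,
                      seconds_per_percent, cap=99.99):
    """Extrapolate progress from the last report, assuming the core is
    still running at its earlier pace; never show more than cap."""
    extra = seconds_since_report / seconds_per_percent
    return min(last_reported_percent + extra, cap)
```

With the figures from the first log (last report 60%, roughly 158 seconds per 1%), the estimate after one silent hour is about 82.8%, and after two hours it sits at the 99.99% cap, even though no step has been computed since the last report.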
Skufer
Posts: 39
Joined: Fri Feb 14, 2014 10:45 am
Location: UK/London

Re: FAHCore stops working mid-task

Post by Skufer »

I've removed the overclock, but it's still happening, now on both cards at once :(

There are two physical CPU cores for each card, so I don't think that's the issue in this case. Any ideas?
calkapokole
Posts: 80
Joined: Sun Nov 18, 2012 11:03 pm
Hardware configuration: Lenovo IdeaPad Y580: Chipset Intel HM76 | Socket G2 | BIOS 2.07
Display: 15.6" | 1920x1080 | LG LP156WF1-TLC1 | TN LED | glossy
CPU: Intel Core i7-3610QM | 2.3-3.3 GHz | 6 MB L3 | 22 nm | TDP 45 W
iGPU: Intel HD Graphics 4000 (GT2) | 22 nm | 16 Unified Shaders: 1100 Mhz
dGPU: NVIDIA GeForce GTX 660M (GK107) | 28 nm:
- 384 Unified Shaders: 835@1215 MHz (45.5% OC)
- 2 GB GDDR5 128 bit: 1000@1250 (5000 effective) MHz (25% OC)
RAM: Patriot | 16 GB | DDR3 1600 MHz | 11-11-11-28-1 | Dual Channel
SSD 1: Crucial MX200 | mSATA | 250 GB | Micron 16 nm 128 Gb MLC NAND
SSD 2: Crucial MX500 | 2.5" | 2 TB | Micron 256 Gb 64L 3D TLC NAND
HDD: WD Scorpio Black WD7500BPKT | 2.5" | 750 GB | 7200 RPM | 16 MB
ODD: Samsung SN-506BB | 4 MB | BD-RE XL
WiFi: Intel Centrino Wireless-N 2200
OS: Windows 10 Pro (x64) | ForceWare 425.31 WHQL
Cooler: Zalman ZM-NC2000
Location: Poland

Re: FAHCore stops working mid-task

Post by calkapokole »

Try lowering the clocks below the reference level. I would start with the memory clock. My card, a GTX 660M, has a default memory clock of 5000 MHz and is unstable in FAH at that speed, so I lowered it to 4000 MHz. The memory clock matters little for FAH: in my case performance dropped by only 1% or 2%.
Skufer
Posts: 39
Joined: Fri Feb 14, 2014 10:45 am
Location: UK/London

Re: FAHCore stops working mid-task

Post by Skufer »

I think something is badly wrong with either the GPUs or this system in general. Sometimes after a reboot the GPUs are not properly recognised, and now I've just lost one (though it still appears to be folding in the logs?):

Code:

:~$ nvidia-smi
Thu Jul  2 12:21:33 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.59     Driver Version: 346.59         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0     N/A |                  N/A |
| 70%   60C    P2    N/A /  N/A |    320MiB /  2047MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  ERR!               ERR!  | ERR!            ERR! |                 ERR! |
|ERR!  ERR! ERR!    ERR! / ERR! |     27MiB /  2047MiB |    ERR!         ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0            C+G   Not Supported                                         |
|    1            C+G   ERROR: GPU is lost                                    |
+-----------------------------------------------------------------------------+
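A lost GPU like this one can also be picked out of saved `nvidia-smi` output by a script. A minimal sketch, assuming the table layout shown above (it differs between driver versions) and using an illustrative function name:

```python
import re


def lost_gpus(nvidia_smi_text):
    """Return sorted GPU indices whose table rows show ERR! or
    'GPU is lost'."""
    bad = set()
    for line in nvidia_smi_text.splitlines():
        # Rows of interest start with "|", spaces, then the GPU index.
        m = re.match(r"\|\s*(\d+)\s", line.strip())
        if m and ("ERR!" in line or "GPU is lost" in line):
            bad.add(int(m.group(1)))
    return sorted(bad)
```

Rows that start with something other than an index, such as the fan/temperature row, are ignored, so the result depends only on the rows that name a GPU.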

Code:

*********************** Log Started 2015-07-02T08:12:41Z ***********************
08:12:41:************************* Folding@home Client *************************
08:12:41:    Website: http://folding.stanford.edu/
08:12:41:  Copyright: (c) 2009-2014 Stanford University
08:12:41:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:12:41:       Args: --child --lifeline 910 /etc/fahclient/config.xml --run-as fahclient
08:12:41:             --pid-file=/var/run/fahclient.pid --daemon
08:12:41:     Config: /etc/fahclient/config.xml
08:12:41:******************************** Build ********************************
08:12:41:    Version: 7.4.4
08:12:41:       Date: Mar 4 2014
08:12:41:       Time: 12:02:38
08:12:41:    SVN Rev: 4130
08:12:41:     Branch: fah/trunk/client
08:12:41:   Compiler: GNU 4.4.7
08:12:41:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
08:12:41:             -fno-unsafe-math-optimizations -msse2
08:12:41:   Platform: linux2 3.2.0-1-amd64
08:12:41:       Bits: 64
08:12:41:       Mode: Release
08:12:41:******************************* System ********************************
08:12:41:        CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
08:12:41:     CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
08:12:41:       CPUs: 4
08:12:41:     Memory: 3.86GiB
08:12:41:Free Memory: 3.65GiB
08:12:41:    Threads: POSIX_THREADS
08:12:41: OS Version: 3.16
08:12:41:Has Battery: false
08:12:41: On Battery: false
08:12:41: UTC Offset: 1
08:12:41:        PID: 916
08:12:41:        CWD: /var/lib/fahclient
08:12:41:         OS: Linux 3.16.0-4-amd64 x86_64
08:12:41:    OS Arch: AMD64
08:12:41:       GPUs: 2
08:12:41:      GPU 0: NVIDIA:5 GM206 [GeForce GTX 960]
08:12:41:      GPU 1: NVIDIA:5 GM206 [GeForce GTX 960]
08:12:41:       CUDA: 5.2
08:12:41:CUDA Driver: 7000
08:12:41:***********************************************************************
08:12:41:<config>
08:12:41:  <!-- Folding Slot Configuration -->
08:12:41:  <client-type v='advanced'/>
08:12:41:
08:12:41:  <!-- Network -->
08:12:41:  <proxy v=':8080'/>
08:12:41:
08:12:41:  <!-- Slot Control -->
08:12:41:  <power v='full'/>
08:12:41:
08:12:41:  <!-- User Information -->
08:12:41:  <passkey v='********************************'/>
08:12:41:  <team v='212997'/>
08:12:41:  <user v='BestPony'/>
08:12:41:
08:12:41:  <!-- Folding Slots -->
08:12:41:  <slot id='1' type='GPU'/>
08:12:41:  <slot id='0' type='GPU'/>
08:12:41:</config>
08:12:41:Switching to user fahclient
08:12:41:Trying to access database...
08:12:42:Successfully acquired database lock
08:12:42:Enabled folding slot 01: READY gpu:0:GM206 [GeForce GTX 960]
08:12:42:Enabled folding slot 00: READY gpu:1:GM206 [GeForce GTX 960]
08:12:42:WU02:FS00:Starting
08:12:42:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 02 -suffix 01 -version 704 -lifeline 916 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
08:12:42:WU02:FS00:Started FahCore on PID 945
08:12:42:WU02:FS00:Core PID:962
08:12:42:WU02:FS00:FahCore 0x18 started
08:12:43:WU00:FS01:Starting
08:12:43:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 00 -suffix 01 -version 704 -lifeline 916 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
08:12:43:WU00:FS01:Started FahCore on PID 1004
08:12:43:WU00:FS01:Core PID:1008
08:12:43:WU00:FS01:FahCore 0x18 started
08:12:46:WU00:FS01:0x18:*********************** Log Started 2015-07-02T08:12:45Z ***********************
08:12:46:WU02:FS00:0x18:*********************** Log Started 2015-07-02T08:12:46Z ***********************
08:12:46:WU02:FS00:0x18:Project: 9116 (Run 63, Clone 0, Gen 79)
08:12:46:WU02:FS00:0x18:Unit: 0x000000610a3b1e7854c17105f057e2d3
08:12:46:WU02:FS00:0x18:CPU: 0x00000000000000000000000000000000
08:12:46:WU02:FS00:0x18:Machine: 0
08:12:46:WU02:FS00:0x18:Digital signatures verified
08:12:46:WU02:FS00:0x18:Folding@home GPU core18
08:12:46:WU02:FS00:0x18:Version 0.0.4
08:12:47:WU00:FS01:0x18:Project: 9137 (Run 60, Clone 0, Gen 66)
08:12:47:WU00:FS01:0x18:Unit: 0x000000440a3b1e61556647d9d24d0f27
08:12:47:WU00:FS01:0x18:CPU: 0x00000000000000000000000000000000
08:12:47:WU00:FS01:0x18:Machine: 1
08:12:47:WU00:FS01:0x18:Reading tar file core.xml
08:12:47:WU00:FS01:0x18:Reading tar file system.xml
08:12:47:WU02:FS00:0x18:  Found a checkpoint file
08:12:48:WU00:FS01:0x18:Reading tar file integrator.xml
08:12:48:WU00:FS01:0x18:Reading tar file state.xml
08:12:49:WU00:FS01:0x18:Digital signatures verified
08:12:49:WU00:FS01:0x18:Folding@home GPU core18
08:12:49:WU00:FS01:0x18:Version 0.0.4
08:13:27:WU02:FS00:0x18:Completed 2100000 out of 2500000 steps (84%)
08:13:27:WU02:FS00:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
08:13:27:WU00:FS01:0x18:Completed 0 out of 2500000 steps (0%)
08:13:27:WU00:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
08:15:49:WU02:FS00:0x18:Completed 2125000 out of 2500000 steps (85%)
08:15:53:WU00:FS01:0x18:Completed 25000 out of 2500000 steps (1%)
08:18:05:WU02:FS00:0x18:Completed 2150000 out of 2500000 steps (86%)
08:18:11:WU00:FS01:0x18:Completed 50000 out of 2500000 steps (2%)
08:20:20:WU02:FS00:0x18:Completed 2175000 out of 2500000 steps (87%)
08:20:29:WU00:FS01:0x18:Completed 75000 out of 2500000 steps (3%)
08:22:36:WU02:FS00:0x18:Completed 2200000 out of 2500000 steps (88%)
08:22:47:WU00:FS01:0x18:Completed 100000 out of 2500000 steps (4%)
08:25:01:WU02:FS00:0x18:Completed 2225000 out of 2500000 steps (89%)
08:25:14:WU00:FS01:0x18:Completed 125000 out of 2500000 steps (5%)
08:27:16:WU02:FS00:0x18:Completed 2250000 out of 2500000 steps (90%)
08:27:32:WU00:FS01:0x18:Completed 150000 out of 2500000 steps (6%)
08:29:32:WU02:FS00:0x18:Completed 2275000 out of 2500000 steps (91%)
08:29:50:WU00:FS01:0x18:Completed 175000 out of 2500000 steps (7%)
08:31:48:WU02:FS00:0x18:Completed 2300000 out of 2500000 steps (92%)
08:32:08:WU00:FS01:0x18:Completed 200000 out of 2500000 steps (8%)
08:34:12:WU02:FS00:0x18:Completed 2325000 out of 2500000 steps (93%)
08:34:34:WU00:FS01:0x18:Completed 225000 out of 2500000 steps (9%)
08:36:28:WU02:FS00:0x18:Completed 2350000 out of 2500000 steps (94%)
08:36:52:WU00:FS01:0x18:Completed 250000 out of 2500000 steps (10%)
08:38:43:WU02:FS00:0x18:Completed 2375000 out of 2500000 steps (95%)
08:39:10:WU00:FS01:0x18:Completed 275000 out of 2500000 steps (11%)
08:40:59:WU02:FS00:0x18:Completed 2400000 out of 2500000 steps (96%)
08:41:28:WU00:FS01:0x18:Completed 300000 out of 2500000 steps (12%)
08:43:23:WU02:FS00:0x18:Completed 2425000 out of 2500000 steps (97%)
08:43:54:WU00:FS01:0x18:Completed 325000 out of 2500000 steps (13%)
08:45:39:WU02:FS00:0x18:Completed 2450000 out of 2500000 steps (98%)
08:46:12:WU00:FS01:0x18:Completed 350000 out of 2500000 steps (14%)
08:47:55:WU02:FS00:0x18:Completed 2475000 out of 2500000 steps (99%)
08:48:30:WU00:FS01:0x18:Completed 375000 out of 2500000 steps (15%)
08:50:11:WU02:FS00:0x18:Completed 2500000 out of 2500000 steps (100%)
08:50:15:WU03:FS00:Connecting to 171.67.108.200:80
08:50:16:WU03:FS00:Assigned to work server 171.64.65.93
08:50:16:WU03:FS00:Requesting new work unit for slot 00: RUNNING gpu:1:GM206 [GeForce GTX 960] from 171.64.65.93
08:50:16:WU03:FS00:Connecting to 171.64.65.93:8080
08:50:17:WU03:FS00:Downloading 3.52MiB
08:50:19:WU03:FS00:Download complete
08:50:19:WU03:FS00:Received Unit: id:03 state:DOWNLOAD error:NO_ERROR project:9106 run:7 clone:3 gen:148 core:0x18 unit:0x000000b90a3b1e81546bd2f00cc4c610
08:50:19:WU02:FS00:0x18:Saving result file logfile_01.txt
08:50:19:WU02:FS00:0x18:Saving result file checkpointState.xml
08:50:20:WU02:FS00:0x18:Saving result file checkpt.crc
08:50:20:WU02:FS00:0x18:Saving result file log.txt
08:50:20:WU02:FS00:0x18:Saving result file positions.xtc
08:50:21:WU02:FS00:0x18:Folding@home Core Shutdown: FINISHED_UNIT
08:50:22:WU02:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:50:22:WU02:FS00:Sending unit results: id:02 state:SEND error:NO_ERROR project:9116 run:63 clone:0 gen:79 core:0x18 unit:0x000000610a3b1e7854c17105f057e2d3
08:50:22:WU02:FS00:Uploading 5.30MiB to 171.64.65.84
08:50:22:WU02:FS00:Connecting to 171.64.65.84:8080
08:50:22:WU03:FS00:Starting
08:50:22:WU03:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 03 -suffix 01 -version 704 -lifeline 916 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
08:50:22:WU03:FS00:Started FahCore on PID 1689
08:50:22:WU03:FS00:Core PID:1693
08:50:22:WU03:FS00:FahCore 0x18 started
08:50:22:WU03:FS00:0x18:*********************** Log Started 2015-07-02T08:50:22Z ***********************
08:50:22:WU03:FS00:0x18:Project: 9106 (Run 7, Clone 3, Gen 148)
08:50:22:WU03:FS00:0x18:Unit: 0x000000b90a3b1e81546bd2f00cc4c610
08:50:22:WU03:FS00:0x18:CPU: 0x00000000000000000000000000000000
08:50:22:WU03:FS00:0x18:Machine: 0
08:50:22:WU03:FS00:0x18:Reading tar file state.xml
08:50:23:WU03:FS00:0x18:Reading tar file system.xml
08:50:23:WU03:FS00:0x18:Reading tar file integrator.xml
08:50:23:WU03:FS00:0x18:Reading tar file core.xml
08:50:23:WU03:FS00:0x18:Digital signatures verified
08:50:23:WU03:FS00:0x18:Folding@home GPU core18
08:50:23:WU03:FS00:0x18:Version 0.0.4
08:50:30:WU02:FS00:Upload complete
08:50:30:WU02:FS00:Server responded WORK_ACK (400)
08:50:30:WU02:FS00:Final credit estimate, 12363.00 points
08:50:30:WU02:FS00:Cleaning up
08:50:48:WU00:FS01:0x18:Completed 400000 out of 2500000 steps (16%)
08:50:48:WU03:FS00:0x18:Completed 0 out of 2500000 steps (0%)
08:50:48:WU03:FS00:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
08:53:14:WU00:FS01:0x18:Completed 425000 out of 2500000 steps (17%)
08:53:47:WU03:FS00:0x18:Completed 25000 out of 2500000 steps (1%)
08:55:32:WU00:FS01:0x18:Completed 450000 out of 2500000 steps (18%)
08:56:38:WU03:FS00:0x18:Completed 50000 out of 2500000 steps (2%)
08:57:50:WU00:FS01:0x18:Completed 475000 out of 2500000 steps (19%)
08:59:29:WU03:FS00:0x18:Completed 75000 out of 2500000 steps (3%)
09:00:08:WU00:FS01:0x18:Completed 500000 out of 2500000 steps (20%)
09:02:19:WU03:FS00:0x18:Completed 100000 out of 2500000 steps (4%)
09:02:34:WU00:FS01:0x18:Completed 525000 out of 2500000 steps (21%)
09:04:52:WU00:FS01:0x18:Completed 550000 out of 2500000 steps (22%)
09:05:20:WU03:FS00:0x18:Completed 125000 out of 2500000 steps (5%)
09:07:10:WU00:FS01:0x18:Completed 575000 out of 2500000 steps (23%)
09:08:11:WU03:FS00:0x18:Completed 150000 out of 2500000 steps (6%)
09:09:28:WU00:FS01:0x18:Completed 600000 out of 2500000 steps (24%)
09:11:02:WU03:FS00:0x18:Completed 175000 out of 2500000 steps (7%)
09:11:55:WU00:FS01:0x18:Completed 625000 out of 2500000 steps (25%)
09:13:53:WU03:FS00:0x18:Completed 200000 out of 2500000 steps (8%)
09:14:12:WU00:FS01:0x18:Completed 650000 out of 2500000 steps (26%)
09:16:30:WU00:FS01:0x18:Completed 675000 out of 2500000 steps (27%)
09:16:53:WU03:FS00:0x18:Completed 225000 out of 2500000 steps (9%)
09:18:48:WU00:FS01:0x18:Completed 700000 out of 2500000 steps (28%)
09:19:44:WU03:FS00:0x18:Completed 250000 out of 2500000 steps (10%)
09:21:15:WU00:FS01:0x18:Completed 725000 out of 2500000 steps (29%)
09:23:32:WU00:FS01:0x18:Completed 750000 out of 2500000 steps (30%)
09:25:50:WU00:FS01:0x18:Completed 775000 out of 2500000 steps (31%)
09:28:08:WU00:FS01:0x18:Completed 800000 out of 2500000 steps (32%)
09:30:34:WU00:FS01:0x18:Completed 825000 out of 2500000 steps (33%)
09:31:50:WARNING:WU03:FS00:FahCore returned: UNKNOWN_ENUM (127 = 0x7f)
09:31:50:WU03:FS00:Starting
09:31:50:WU03:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18 -dir 03 -suffix 01 -version 704 -lifeline 916 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
09:31:50:WU03:FS00:Started FahCore on PID 1747
09:31:50:WU03:FS00:Core PID:1751
09:31:50:WU03:FS00:FahCore 0x18 started
09:31:51:WU03:FS00:0x18:*********************** Log Started 2015-07-02T09:31:50Z ***********************
09:31:51:WU03:FS00:0x18:Project: 9106 (Run 7, Clone 3, Gen 148)
09:31:51:WU03:FS00:0x18:Unit: 0x000000b90a3b1e81546bd2f00cc4c610
09:31:51:WU03:FS00:0x18:CPU: 0x00000000000000000000000000000000
09:31:51:WU03:FS00:0x18:Machine: 0
09:31:51:WU03:FS00:0x18:Digital signatures verified
09:31:51:WU03:FS00:0x18:Folding@home GPU core18
09:31:51:WU03:FS00:0x18:Version 0.0.4
09:31:51:WU03:FS00:0x18:  Found a checkpoint file
09:32:13:WU03:FS00:0x18:Completed 200000 out of 2500000 steps (8%)
09:32:13:WU03:FS00:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:33:26:WU00:FS01:0x18:Completed 850000 out of 2500000 steps (34%)
09:37:35:WU03:FS00:0x18:Completed 225000 out of 2500000 steps (9%)
09:38:13:WU00:FS01:0x18:Completed 875000 out of 2500000 steps (35%)
09:42:49:WU03:FS00:0x18:Completed 250000 out of 2500000 steps (10%)
09:43:01:WU00:FS01:0x18:Completed 900000 out of 2500000 steps (36%)
09:47:55:WU03:FS00:0x18:Completed 275000 out of 2500000 steps (11%)
09:47:58:WU00:FS01:0x18:Completed 925000 out of 2500000 steps (37%)
09:52:46:WU00:FS01:0x18:Completed 950000 out of 2500000 steps (38%)
09:53:09:WU03:FS00:0x18:Completed 300000 out of 2500000 steps (12%)
09:57:23:WU00:FS01:0x18:Completed 975000 out of 2500000 steps (39%)
09:58:32:WU03:FS00:0x18:Completed 325000 out of 2500000 steps (13%)
10:02:11:WU00:FS01:0x18:Completed 1000000 out of 2500000 steps (40%)
10:03:39:WU03:FS00:0x18:Completed 350000 out of 2500000 steps (14%)
10:07:08:WU00:FS01:0x18:Completed 1025000 out of 2500000 steps (41%)
10:08:53:WU03:FS00:0x18:Completed 375000 out of 2500000 steps (15%)
10:11:56:WU00:FS01:0x18:Completed 1050000 out of 2500000 steps (42%)
10:14:06:WU03:FS00:0x18:Completed 400000 out of 2500000 steps (16%)
10:16:33:WU00:FS01:0x18:Completed 1075000 out of 2500000 steps (43%)
10:19:30:WU03:FS00:0x18:Completed 425000 out of 2500000 steps (17%)
10:21:21:WU00:FS01:0x18:Completed 1100000 out of 2500000 steps (44%)
10:24:36:WU03:FS00:0x18:Completed 450000 out of 2500000 steps (18%)
10:26:18:WU00:FS01:0x18:Completed 1125000 out of 2500000 steps (45%)
10:29:50:WU03:FS00:0x18:Completed 475000 out of 2500000 steps (19%)
10:31:06:WU00:FS01:0x18:Completed 1150000 out of 2500000 steps (46%)
10:35:04:WU03:FS00:0x18:Completed 500000 out of 2500000 steps (20%)
10:35:44:WU00:FS01:0x18:Completed 1175000 out of 2500000 steps (47%)
10:40:28:WU03:FS00:0x18:Completed 525000 out of 2500000 steps (21%)
10:40:32:WU00:FS01:0x18:Completed 1200000 out of 2500000 steps (48%)
10:45:28:WU00:FS01:0x18:Completed 1225000 out of 2500000 steps (49%)
10:45:34:WU03:FS00:0x18:Completed 550000 out of 2500000 steps (22%)
10:50:16:WU00:FS01:0x18:Completed 1250000 out of 2500000 steps (50%)
10:50:48:WU03:FS00:0x18:Completed 575000 out of 2500000 steps (23%)
10:55:04:WU00:FS01:0x18:Completed 1275000 out of 2500000 steps (51%)
10:56:02:WU03:FS00:0x18:Completed 600000 out of 2500000 steps (24%)
10:59:42:WU00:FS01:0x18:Completed 1300000 out of 2500000 steps (52%)
11:01:18:WU03:FS00:0x18:Completed 625000 out of 2500000 steps (25%)
11:04:39:WU00:FS01:0x18:Completed 1325000 out of 2500000 steps (53%)
11:06:32:WU03:FS00:0x18:Completed 650000 out of 2500000 steps (26%)
11:09:27:WU00:FS01:0x18:Completed 1350000 out of 2500000 steps (54%)
11:11:46:WU03:FS00:0x18:Completed 675000 out of 2500000 steps (27%)
11:14:15:WU00:FS01:0x18:Completed 1375000 out of 2500000 steps (55%)
11:16:59:WU03:FS00:0x18:Completed 700000 out of 2500000 steps (28%)
11:18:52:WU00:FS01:0x18:Completed 1400000 out of 2500000 steps (56%)
11:22:16:WU03:FS00:0x18:Completed 725000 out of 2500000 steps (29%)
11:23:49:WU00:FS01:0x18:Completed 1425000 out of 2500000 steps (57%)
I'm going to try to find a spare system to put these cards into and see how that goes.
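
For anyone who wants to check their own log for the same pattern: in the log above, slot FS00 was taking roughly 2m50s to 3m per 1% before the UNKNOWN_ENUM restart at 09:31:50 and over 5 minutes afterwards, and FS01 slowed from about 2m18s to about 4m48s at the same point. Below is a rough Python sketch that pulls those per-frame times out of the "Completed ... steps" lines. It is not part of the client; the function name `frame_times` and the regular expression are my own assumptions based on the line format shown here.

```python
import re
from collections import defaultdict

# Matches progress lines such as:
# 09:37:35:WU03:FS00:0x18:Completed 225000 out of 2500000 steps (9%)
FRAME = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):(WU\d+):(FS\d+):0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def frame_times(lines):
    """Return {slot: [(timestamp, seconds_since_previous_frame), ...]}."""
    last = {}  # (slot, wu) -> (seconds_of_day, steps) of the previous frame
    out = defaultdict(list)
    for line in lines:
        m = FRAME.match(line)
        if not m:
            continue  # warnings, downloads, uploads, etc.
        hh, mm, ss, wu, slot, steps, _total = m.groups()
        now = int(hh) * 3600 + int(mm) * 60 + int(ss)
        steps = int(steps)
        prev = last.get((slot, wu))
        # Count an interval only when the step counter advanced. After a
        # restart from a checkpoint the counter repeats or rewinds, and
        # that gap is not a real frame time.
        if prev is not None and steps > prev[1]:
            # Modulo handles a log that runs past midnight.
            out[slot].append((f"{hh}:{mm}:{ss}", (now - prev[0]) % 86400))
        last[(slot, wu)] = (now, steps)
    return dict(out)
```

Pass it the lines of your client log (for example an open file object) and compare the seconds per frame for each slot before and after the restart.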
bollix47
Posts: 2957
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: FAHCore stops working mid-task

Post by bollix47 »

Were both GPUs in the system when the drivers were installed?

If not, reinstall the drivers.

If so, try the following:

Pause all slots.
Exit the client.
Open a terminal and run:

Code: Select all

sudo nvidia-xconfig --enable-all-gpus

If you need fan control, also run:

Code: Select all

sudo nvidia-xconfig --cool-bits=4

Reboot.
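
After the reboot it may be worth confirming that the driver sees both cards before resuming folding. `nvidia-smi -L` lists the detected GPUs. The Python sketch below wraps that call. The helper names `parse_gpu_list` and `list_gpus` are made up for illustration, and the "GPU n: name (UUID: ...)" line format is an assumption that may differ between driver versions.

```python
import re
import shutil
import subprocess

# Assumed line format of `nvidia-smi -L`, e.g.:
# GPU 0: GeForce GTX 960 (UUID: GPU-...)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+?)\)\s*$")


def parse_gpu_list(text):
    """Turn `nvidia-smi -L` output into a list of (index, name) tuples."""
    gpus = []
    for line in text.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(2)))
    return gpus


def list_gpus():
    """Run `nvidia-smi -L`; return [] if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return parse_gpu_list(result.stdout)
```

If `list_gpus()` returns fewer entries than the number of cards installed, the driver is not seeing all of them and the client will not either.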


source: viewtopic.php?p=267165#p267165