Re: Bad work units on NV GPU slot
Posted: Sun Oct 23, 2016 6:51 am
Hi Folding folks,
I want to report that my folding machines are failing on a large number of WUs for some reason. All of the failures are on GPU slots.
The machines run different versions of Windows 10, but all use client 7.4.4 (I tried the later version but had too many problems with my multi-GPU setups) and have the latest NVIDIA driver (375.57). The main problem on one machine relates to a GTX 780, but its companion GTX 980 has had similar problems; it's just not showing them at the moment.
On each machine I see frequent downloads of WUs, with the machine discarding each one and trying another until the GPU slot finally fails.
BTW, I have stopped folding on my CPU slots, as we are heading into summer and I want to reduce the thermal load.
I attach the log file from one of them:
*********************** Log Started 2016-10-23T06:33:49Z ***********************
06:34:23:WU00:FS01:Connecting to 171.67.108.45:80
06:34:26:WU00:FS01:Assigned to work server 140.163.4.244
06:34:26:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM204 [GeForce GTX 980] from 140.163.4.244
06:34:26:WU00:FS01:Connecting to 140.163.4.244:8080
06:34:28:WU00:FS01:Downloading 2.77MiB
06:34:30:WU00:FS01:Download complete
06:34:30:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13500 run:2 clone:419 gen:28 core:0x21 unit:0x000000268ca304f457a359cb20d62cbd
06:34:30:WU00:FS01:Starting
06:34:30:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/nickm/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 704 -lifeline 8256 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
06:34:30:WU00:FS01:Started FahCore on PID 15704
06:34:31:WU00:FS01:Core PID:5532
06:34:31:WU00:FS01:FahCore 0x21 started
06:34:32:WU00:FS01:0x21:*********************** Log Started 2016-10-23T06:34:32Z ***********************
06:34:32:WU00:FS01:0x21:Project: 13500 (Run 2, Clone 419, Gen 28)
06:34:32:WU00:FS01:0x21:Unit: 0x000000268ca304f457a359cb20d62cbd
06:34:32:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
06:34:32:WU00:FS01:0x21:Machine: 1
06:34:32:WU00:FS01:0x21:Reading tar file core.xml
06:34:32:WU00:FS01:0x21:Reading tar file system.xml
06:34:32:WU00:FS01:0x21:Reading tar file integrator.xml
06:34:32:WU00:FS01:0x21:Reading tar file state.xml
06:34:33:WU00:FS01:0x21:Digital signatures verified
06:34:33:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
06:34:33:WU00:FS01:0x21:Version 0.0.17
06:34:40:WU00:FS01:0x21:ERROR:exception: Error downloading array interactionCount: clEnqueueReadBuffer (-5)
06:34:40:WU00:FS01:0x21:Saving result file logfile_01.txt
06:34:40:WU00:FS01:0x21:Saving result file log.txt
06:34:40:WU00:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
06:34:43:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:34:43:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13500 run:2 clone:419 gen:28 core:0x21 unit:0x000000268ca304f457a359cb20d62cbd
06:34:43:WU00:FS01:Uploading 2.54KiB to 140.163.4.244
06:34:43:WU00:FS01:Connecting to 140.163.4.244:8080
06:34:44:WU00:FS01:Upload complete
06:34:44:WU00:FS01:Server responded WORK_ACK (400)
06:34:44:WU00:FS01:Cleaning up
06:35:05:WU00:FS02:Connecting to 171.67.108.45:80
06:35:06:WU00:FS02:Assigned to work server 140.163.4.244
06:35:06:WU00:FS02:Requesting new work unit for slot 02: READY gpu:1:GK110 [GeForce GTX 780] from 140.163.4.244
06:35:06:WU00:FS02:Connecting to 140.163.4.244:8080
06:35:08:WU00:FS02:Downloading 2.54MiB
06:35:11:WU00:FS02:Download complete
06:35:11:WU00:FS02:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:10490 run:232 clone:0 gen:444 core:0x18 unit:0x000001ff8ca304f45537e902f17f7939
06:35:11:WU00:FS02:Starting
06:35:11:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/nickm/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 00 -suffix 01 -version 704 -lifeline 8256 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
06:35:11:WU00:FS02:Started FahCore on PID 15424
06:35:11:WU00:FS02:Core PID:3376
06:35:11:WU00:FS02:FahCore 0x18 started
06:35:13:WU00:FS02:0x18:*********************** Log Started 2016-10-23T06:35:12Z ***********************
06:35:13:WU00:FS02:0x18:Project: 10490 (Run 232, Clone 0, Gen 444)
06:35:13:WU00:FS02:0x18:Unit: 0x000001ff8ca304f45537e902f17f7939
06:35:13:WU00:FS02:0x18:CPU: 0x00000000000000000000000000000000
06:35:13:WU00:FS02:0x18:Machine: 2
06:35:13:WU00:FS02:0x18:Reading tar file core.xml
06:35:13:WU00:FS02:0x18:Reading tar file system.xml
06:35:13:WU00:FS02:0x18:Reading tar file integrator.xml
06:35:13:WU00:FS02:0x18:Reading tar file state.xml
06:35:13:WU00:FS02:0x18:Digital signatures verified
06:35:13:WU00:FS02:0x18:Folding@home GPU core18
06:35:13:WU00:FS02:0x18:Version 0.0.4
06:35:28:WU00:FS02:0x18:Completed 0 out of 5000000 steps (0%)
06:35:28:WU00:FS02:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
06:37:00:WU00:FS02:0x18:ERROR:exception: Error downloading array posq: clEnqueueReadBuffer (-5)
06:37:00:WU00:FS02:0x18:Saving result file logfile_01.txt
06:37:00:WU00:FS02:0x18:Saving result file log.txt
06:37:00:WU00:FS02:0x18:Folding@home Core Shutdown: BAD_WORK_UNIT
06:37:03:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:37:03:WU00:FS02:Sending unit results: id:00 state:SEND error:FAULTY project:10490 run:232 clone:0 gen:444 core:0x18 unit:0x000001ff8ca304f45537e902f17f7939
06:37:03:WU00:FS02:Uploading 2.66KiB to 140.163.4.244
06:37:03:WU00:FS02:Connecting to 140.163.4.244:8080
06:37:04:WU00:FS02:Upload complete
06:37:04:WU00:FS02:Server responded WORK_ACK (400)
06:37:04:WU00:FS02:Cleaning up
06:37:23:WU00:FS02:Connecting to 171.67.108.45:80
06:37:25:WU00:FS02:Assigned to work server 171.67.108.105
06:37:25:WU00:FS02:Requesting new work unit for slot 02: READY gpu:1:GK110 [GeForce GTX 780] from 171.67.108.105
06:37:25:WU00:FS02:Connecting to 171.67.108.105:8080
06:37:26:WU00:FS02:Downloading 20.15MiB
06:37:31:WU00:FS02:Download complete
06:37:31:WU00:FS02:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9177 run:7 clone:8 gen:50 core:0x21 unit:0x0000003dab436c6957b24c29a356c742
06:37:31:WU00:FS02:Starting
06:37:31:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/nickm/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 704 -lifeline 8256 -checkpoint 15 -gpu 1 -gpu-vendor nvidia
06:37:31:WU00:FS02:Started FahCore on PID 5784
06:37:31:WU00:FS02:Core PID:7772
06:37:31:WU00:FS02:FahCore 0x21 started
06:37:33:WU00:FS02:0x21:*********************** Log Started 2016-10-23T06:37:32Z ***********************
06:37:33:WU00:FS02:0x21:Project: 9177 (Run 7, Clone 8, Gen 50)
06:37:33:WU00:FS02:0x21:Unit: 0x0000003dab436c6957b24c29a356c742
06:37:33:WU00:FS02:0x21:CPU: 0x00000000000000000000000000000000
06:37:33:WU00:FS02:0x21:Machine: 2
06:37:33:WU00:FS02:0x21:Reading tar file core.xml
06:37:33:WU00:FS02:0x21:Reading tar file integrator.xml
06:37:33:WU00:FS02:0x21:Reading tar file state.xml
06:37:33:WU00:FS02:0x21:Reading tar file system.xml
06:37:33:WU00:FS02:0x21:Digital signatures verified
06:37:33:WU00:FS02:0x21:Folding@home GPU Core21 Folding@home Core
06:37:33:WU00:FS02:0x21:Version 0.0.17
06:37:38:WU00:FS02:0x21:ERROR:exception: Error downloading array interactionCount: clEnqueueReadBuffer (-5)
06:37:38:WU00:FS02:0x21:Saving result file logfile_01.txt
06:37:38:WU00:FS02:0x21:Saving result file log.txt
06:37:38:WU00:FS02:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
06:37:42:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:37:42:WU00:FS02:Sending unit results: id:00 state:SEND error:FAULTY project:9177 run:7 clone:8 gen:50 core:0x21 unit:0x0000003dab436c6957b24c29a356c742
06:37:42:WU00:FS02:Uploading 7.50KiB to 171.67.108.105
06:37:42:WU00:FS02:Connecting to 171.67.108.105:8080
06:37:47:WU00:FS02:Upload complete
06:37:47:WU00:FS02:Server responded WORK_ACK (400)
06:37:47:WU00:FS02:Cleaning up
06:37:50:WU00:FS01:Connecting to 171.67.108.45:80
06:37:51:WU00:FS01:Assigned to work server 171.67.108.155
06:37:51:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM204 [GeForce GTX 980] from 171.67.108.155
06:37:51:WU00:FS01:Connecting to 171.67.108.155:8080
06:37:54:WU00:FS01:Downloading 902.77KiB
06:37:55:WU00:FS01:Download complete
06:37:55:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9660 run:0 clone:73 gen:87 core:0x18 unit:0x00000069ab436c9b56de69ba9dccc137
06:37:55:WU00:FS01:Starting
06:37:55:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/nickm/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 00 -suffix 01 -version 704 -lifeline 8256 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
06:37:55:WU00:FS01:Started FahCore on PID 1176
06:37:55:WU00:FS01:Core PID:14804
06:37:55:WU00:FS01:FahCore 0x18 started
06:37:57:WU00:FS01:0x18:*********************** Log Started 2016-10-23T06:37:57Z ***********************
06:37:57:WU00:FS01:0x18:Project: 9660 (Run 0, Clone 73, Gen 87)
06:37:57:WU00:FS01:0x18:Unit: 0x00000069ab436c9b56de69ba9dccc137
06:37:57:WU00:FS01:0x18:CPU: 0x00000000000000000000000000000000
06:37:57:WU00:FS01:0x18:Machine: 1
06:37:57:WU00:FS01:0x18:Reading tar file core.xml
06:37:57:WU00:FS01:0x18:Reading tar file integrator.xml
06:37:57:WU00:FS01:0x18:Reading tar file state.xml
06:37:57:WU00:FS01:0x18:Reading tar file system.xml
06:37:57:WU00:FS01:0x18:Digital signatures verified
06:37:57:WU00:FS01:0x18:Folding@home GPU core18
06:37:57:WU00:FS01:0x18:Version 0.0.4
06:38:06:WU00:FS01:0x18:Completed 0 out of 2000000 steps (0%)
06:38:06:WU00:FS01:0x18:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
06:38:37:WU00:FS01:0x18:Completed 20000 out of 2000000 steps (1%)
06:39:12:WU00:FS01:0x18:Completed 40000 out of 2000000 steps (2%)
06:39:40:WU00:FS01:0x18:Completed 60000 out of 2000000 steps (3%)
06:40:11:WU00:FS01:0x18:Completed 80000 out of 2000000 steps (4%)
06:40:39:WU00:FS01:0x18:Completed 100000 out of 2000000 steps (5%)
06:41:11:WU00:FS01:0x18:Completed 120000 out of 2000000 steps (6%)
06:41:39:WU00:FS01:0x18:Completed 140000 out of 2000000 steps (7%)
06:42:07:WU00:FS01:0x18:Completed 160000 out of 2000000 steps (8%)
06:42:35:WU00:FS01:0x18:Completed 180000 out of 2000000 steps (9%)
06:43:03:WU00:FS01:0x18:Completed 200000 out of 2000000 steps (10%)
06:43:34:WU00:FS01:0x18:Completed 220000 out of 2000000 steps (11%)
06:44:02:WU00:FS01:0x18:Completed 240000 out of 2000000 steps (12%)
06:44:30:WU00:FS01:0x18:Completed 260000 out of 2000000 steps (13%)
06:44:58:WU00:FS01:0x18:Completed 280000 out of 2000000 steps (14%)
06:45:26:WU00:FS01:0x18:Completed 300000 out of 2000000 steps (15%)
06:45:57:WU00:FS01:0x18:Completed 320000 out of 2000000 steps (16%)
06:46:25:WU00:FS01:0x18:Completed 340000 out of 2000000 steps (17%)
06:46:53:WU00:FS01:0x18:Completed 360000 out of 2000000 steps (18%)
06:47:21:WU00:FS01:0x18:Completed 380000 out of 2000000 steps (19%)
06:47:49:WU00:FS01:0x18:Completed 400000 out of 2000000 steps (20%)
06:48:20:WU00:FS01:0x18:Completed 420000 out of 2000000 steps (21%)
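In case it helps anyone compare logs, here is a rough Python sketch that counts the failures per slot in a log like the one above. It is not part of FAHClient; the function name `tally_failures` and the two regular expressions are my own assumptions, based only on the line formats visible in this log, so they may need adjusting for other client versions.

```python
import re
from collections import Counter

# Client-level line, e.g.
# 06:34:43:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
RETURNED = re.compile(
    r"^\d\d:\d\d:\d\d:WARNING:WU\d+:(FS\d+):FahCore returned: (\w+)"
)

# Core-level line, e.g.
# 06:34:40:WU00:FS01:0x21:ERROR:exception: Error downloading array ...
CORE_ERROR = re.compile(
    r"^\d\d:\d\d:\d\d:WU\d+:(FS\d+):(0x[0-9a-fA-F]+):ERROR:(.*)$"
)


def tally_failures(lines):
    """Count FahCore return codes per slot and core errors per slot/core.

    Returns two Counters:
      returned[(slot, code)]        e.g. ("FS02", "BAD_WORK_UNIT")
      errors[(slot, core, message)] e.g. ("FS01", "0x21", "exception: ...")
    """
    returned = Counter()
    errors = Counter()
    for line in lines:
        line = line.strip()
        match = RETURNED.match(line)
        if match:
            returned[(match.group(1), match.group(2))] += 1
            continue
        match = CORE_ERROR.match(line)
        if match:
            errors[(match.group(1), match.group(2), match.group(3))] += 1
    return returned, errors


# A few lines copied from the log above, so the sketch runs on its own.
SAMPLE = [
    "06:34:40:WU00:FS01:0x21:ERROR:exception: Error downloading array interactionCount: clEnqueueReadBuffer (-5)",
    "06:34:43:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "06:37:03:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "06:37:42:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
]

if __name__ == "__main__":
    returned, errors = tally_failures(SAMPLE)
    for (slot, code), count in sorted(returned.items()):
        print(slot, code, count)
    for (slot, core, message), count in sorted(errors.items()):
        print(slot, core, count, message)
```

To run it on a full log, pass an open file instead of `SAMPLE`, for example `tally_failures(open(path))` with `path` pointing at wherever your own log is saved.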
Any suggestions?
Cheers
Nick