NVIDIA GPU units now failing on CentOS 7
Posted: Mon Jan 03, 2022 2:36 pm
				
I have been using this hardware for over a year.
Recently the GPU work units have been failing.
If I pause and restart the slot, it will eventually get a work unit that succeeds.
I have installed the latest NVIDIA drivers.
Here is a sample from the log:
14:26:38:WU02:FS02:Sending unit results: id:02 state:SEND error:FAILED project:18432 run:93 clone:4 gen:149 core:0x22 unit:0x0000000400000095000048000000005d
14:26:38:WU02:FS02:Connecting to 129.32.209.202:8080
14:26:38:WU01:FS02:Connecting to assign1.foldingathome.org:80
14:26:38:WU02:FS02:Server responded WORK_ACK (400)
14:26:38:WU02:FS02:Cleaning up
14:26:39:WU01:FS02:Assigned to work server 34.72.228.44
14:26:39:WU01:FS02:Requesting new work unit for slot 02: gpu:1:0 TU106 [Geforce RTX 2060] from 34.72.228.44
14:26:39:WU01:FS02:Connecting to 34.72.228.44:8080
14:26:39:WU01:FS02:Downloading 8.27MiB
14:26:45:WU01:FS02:Download 100.00%
14:26:45:WU01:FS02:Download complete
14:26:45:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:18021 run:22 clone:10 gen:70 core:0x22 unit:0x0000000a000000460000466500000016
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31517
14:26:45:WU01:FS02:Core PID:31521
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31523
14:26:45:WU01:FS02:Core PID:31527
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
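
In case it helps anyone compare against a longer log, here is a rough Python sketch that counts the "FahCore returned" lines per slot and status. The function name `tally_core_returns` and the regular expression are my own and only assume the line layout seen in the sample above; other client versions may format these lines differently.

```python
import re
from collections import Counter

# Matches client log lines shaped like the ones in the sample above, e.g.
# 14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
# The pattern is an assumption based only on that sample.
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):FahCore returned: (?P<status>\w+)"
)


def tally_core_returns(lines):
    """Count FahCore return statuses, keyed by (slot, status)."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if match:
            counts[(match["slot"], match["status"])] += 1
    return counts


# Three lines copied from the log sample; only two report a core return.
sample = [
    "14:26:45:WU01:FS02:FahCore 0x22 started",
    "14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)",
    "14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)",
]

for (slot, status), count in tally_core_returns(sample).items():
    print(slot, status, count)
```

To run it over a whole log, pass an open file object instead of `sample`, since the function accepts any iterable of lines.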