NVIDIA GPU units now failing on CENTOS7

Moderators: Site Moderators, FAHC Science Team

stfarley
Posts: 1
Joined: Fri Nov 19, 2021 1:18 pm

NVIDIA GPU units now failing on CENTOS7

Post by stfarley »

I have been using this hardware for over a year.
Recently the GPU work units have been failing.
If I pause and restart the slot, it eventually gets a work unit that succeeds.
I have installed the latest NVIDIA drivers.

Here is a sample from the log:

14:26:38:WU02:FS02:Sending unit results: id:02 state:SEND error:FAILED project:18432 run:93 clone:4 gen:149 core:0x22 unit:0x0000000400000095000048000000005d
14:26:38:WU02:FS02:Connecting to 129.32.209.202:8080
14:26:38:WU01:FS02:Connecting to assign1.foldingathome.org:80
14:26:38:WU02:FS02:Server responded WORK_ACK (400)
14:26:38:WU02:FS02:Cleaning up
14:26:39:WU01:FS02:Assigned to work server 34.72.228.44
14:26:39:WU01:FS02:Requesting new work unit for slot 02: gpu:1:0 TU106 [Geforce RTX 2060] from 34.72.228.44
14:26:39:WU01:FS02:Connecting to 34.72.228.44:8080
14:26:39:WU01:FS02:Downloading 8.27MiB
14:26:45:WU01:FS02:Download 100.00%
14:26:45:WU01:FS02:Download complete
14:26:45:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:18021 run:22 clone:10 gen:70 core:0x22 unit:0x0000000a000000460000466500000016
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31517
14:26:45:WU01:FS02:Core PID:31521
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
14:26:45:WU01:FS02:Starting
14:26:45:WU01:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /root/folding/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 20495 -checkpoint 15 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
14:26:45:WU01:FS02:Started FahCore on PID 31523
14:26:45:WU01:FS02:Core PID:31527
14:26:45:WU01:FS02:FahCore 0x22 started
14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)
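
To see how often the core fails across a longer log, the excerpt above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the lines shown here, not from any documented log format, and `count_core_returns` is a name made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt above (an assumption, not a documented format):
# HH:MM:SS:[WARNING:]WUxx:FSxx:FahCore returned: STATUS (...)
RETURN_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?(WU\d+):(FS\d+):FahCore returned: (\w+)"
)

def count_core_returns(lines):
    """Count FahCore return statuses per (work unit, status) pair."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if match:
            work_unit, _slot, status = match.groups()
            counts[(work_unit, status)] += 1
    return counts

# Lines copied from the excerpt above.
SAMPLE = [
    "14:26:45:WU01:FS02:FahCore 0x22 started",
    "14:26:45:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)",
    "14:26:45:WU01:FS02:Starting",
    "14:26:46:WARNING:WU01:FS02:FahCore returned: FAILED_2 (1 = 0x1)",
]

if __name__ == "__main__":
    for (work_unit, status), n in count_core_returns(SAMPLE).items():
        print(work_unit, status, n)
```

To run it against a real log, pass the open file object in place of `SAMPLE`.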
Joe_H
Site Admin
Posts: 8082
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: NVIDIA GPU units now failing on CENTOS7

Post by Joe_H »

The recent update to Core_22 uses a newer glibc than is available in CentOS 7. See this topic - viewtopic.php?f=74&t=37598.

They are looking into it. There may be a fixed Core_22 version out in the near future that works with the version of glibc available on CentOS 7.
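
To check which glibc a machine is running, a small Python sketch like the one below may help. CentOS 7 ships glibc 2.17. The version the updated Core_22 needs is not stated in this thread, so `REQUIRED` is a hypothetical placeholder to replace with the real requirement, and the function names are only illustrative.

```python
import os
import re

def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.17'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def installed_glibc_version():
    """Return the running glibc version, or None if it cannot be determined."""
    try:
        # Available on Linux systems that use glibc; other platforms raise.
        raw = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None
    return parse_glibc_version(raw) if raw else None

def is_new_enough(installed, required):
    """True if the installed version is known and at least the required one."""
    return installed is not None and installed >= required

if __name__ == "__main__":
    # Hypothetical placeholder: the thread does not say which glibc Core_22 needs.
    REQUIRED = (2, 27)
    installed = installed_glibc_version()
    print("installed glibc:", installed)
    print("meets assumed requirement:", is_new_enough(installed, REQUIRED))
```

On the command line, `ldd --version` reports the same glibc version.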
toTOW
Site Moderator
Posts: 6416
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: NVIDIA GPU units now failing on CENTOS7

Post by toTOW »

This issue will be fixed with the upcoming v0.0.19 of Core 22 ...

To avoid future issues like this, it would be a good idea to update to a distribution that provides more frequent updates.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.