FahCore returned: FAILED_2 (1 = 0x1) when running a22 WUs?

novosirj
Posts: 6
Joined: Thu Mar 26, 2020 5:19 am
Hardware configuration: Spare cycles on multiple supercomputers.

FahCore returned: FAILED_2 (1 = 0x1) when running a22 WUs?

Post by novosirj »

Hi there,

We often run F@H on systems that need a simulated load; it makes productive use of equipment that is being tested for some other reason. I'm seeing this error on one such system, however, and am not sure what to make of it. Any ideas? These are RTX 2080 Ti cards with driver 470.74, and the F@H client is 7.6.21. I know this setup did work before, and I've seen the occasional work unit succeed in the last couple of days I've been running (CPU WUs are running fine):

Code:

11:44:42:WU07:FS07:Starting
11:44:42:WU07:FS07:Running FahCore: /usr/bin/FAHCoreWrapper /scratch/novosirj/FAH/16880643_14/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 07 -suffix 01 -version 706 -lifeline 35953 -checkpoint 1 -opencl-platform 0 -opencl-device 6 -cuda-device 6 -gpu-vendor nvidia -gpu 6 -gpu-usage 100
11:44:42:WU07:FS07:Started FahCore on PID 36477
11:44:42:WU07:FS07:Core PID:36481
11:44:42:WU07:FS07:FahCore 0x22 started
11:44:42:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)
11:45:42:WU07:FS07:Starting
11:45:42:WU07:FS07:Running FahCore: /usr/bin/FAHCoreWrapper /scratch/novosirj/FAH/16880643_14/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 07 -suffix 01 -version 706 -lifeline 35953 -checkpoint 1 -opencl-platform 0 -opencl-device 6 -cuda-device 6 -gpu-vendor nvidia -gpu 6 -gpu-usage 100
11:45:42:WU07:FS07:Started FahCore on PID 36609
11:45:42:WU07:FS07:Core PID:36613
11:45:42:WU07:FS07:FahCore 0x22 started
11:45:43:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)
11:45:43:WARNING:WU07:FS07:Too many errors, failing
11:45:43:WU07:FS07:Sending unit results: id:07 state:SEND error:FAILED project:18201 run:44260 clone:0 gen:25 core:0x22 unit:0x0000000000000019000047190000ace4
11:45:43:WU07:FS07:Connecting to 128.252.203.11:8080
11:45:43:WU07:FS07:Server responded WORK_ACK (400)
11:45:43:WU07:FS07:Cleaning up

I don't really see any more information anywhere. Just FWIW, the reason I'm running this on this machine is that I suspect a problem with one of the GPUs (falling off the bus), but there's no indication that that is causing this problem.
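
One thing the log does show is how quickly the core gives up: it returns within about a second of starting, which points at a startup failure rather than a problem during the computation. A minimal Python sketch for pulling that timing out of a client log follows. It assumes the line layout seen in the excerpt above (time, optional WARNING, WU, slot, message), and the function name `core_run_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Matches client log lines such as
#   11:44:42:WU07:FS07:FahCore 0x22 started
#   11:44:42:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)
# The layout is inferred from the log excerpt in this thread.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?:WARNING:)?"
    r"(?P<wu>WU\d+):(?P<slot>FS\d+):(?P<msg>.*)$"
)


def core_run_durations(log_lines):
    """Return (wu, slot, seconds, result) for every core start/return pair."""
    started = {}
    runs = []
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        key = (m["wu"], m["slot"])
        stamp = datetime.strptime(m["time"], "%H:%M:%S")
        msg = m["msg"]
        if re.match(r"FahCore 0x[0-9a-fA-F]+ started", msg):
            started[key] = stamp
        elif msg.startswith("FahCore returned:") and key in started:
            delta = stamp - started.pop(key)
            # The log carries no date, so wrap runs that cross midnight.
            seconds = int(delta.total_seconds()) % 86400
            result = msg.split(":", 1)[1].strip()
            runs.append((key[0], key[1], seconds, result))
    return runs
```

Feeding it the lines from the excerpt gives runs of zero and one second, both ending in FAILED_2.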
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by toTOW »

If you try to start the core manually from a terminal, you'll get a more detailed error.

See the global announcement about core 22 v0.0.18 : viewtopic.php?f=24&t=37391
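
If a terminal on the compute node is awkward to get at, the same check can be scripted. The following is only a sketch: `run_and_capture` is a made-up helper name, and the commented-out path is simply the one from the log above. Which working directory and arguments the core expects are not covered here, so treat those as things to try rather than known-good values.

```python
import subprocess


def run_and_capture(argv, timeout=60):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so nothing is lost
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


# Example use, with the core path copied from the log in the first post.
# Running the bare binary should be enough to surface a dynamic-loader
# error, if that is what is going wrong:
#
# code, output = run_and_capture([
#     "/scratch/novosirj/FAH/16880643_14/cores/cores.foldingathome.org/"
#     "lin/64bit/22-0.0.18/Core_22.fah/FahCore_22",
# ])
# print(code)
# print(output)
```

The point is only to see whatever the core prints to the terminal, which the client log does not include.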

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
novosirj

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by novosirj »

Thanks. My guess is I'll need to build a new container with a newer version of the OS with newer GLIBC support. I may have used CentOS 7.x for my current container.
toTOW

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by toTOW »

I can confirm that the glibc version in CentOS 7 is too old ... :(
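
For anyone who wants to check a container before deploying it, here is a small Python sketch that compares the detected glibc version against a minimum. CentOS 7 ships glibc 2.17 and CentOS 8 ships 2.28. The exact minimum that core 22 v0.0.18 needs is not stated in this thread, so the 2.28 used below is only an assumed stand-in taken from CentOS 8, and the helper names are made up.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.17' into a comparable tuple (2, 17)."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(required, detected=None):
    """Return True if the glibc version is at least `required`.

    `detected` can be passed explicitly (handy for checking another
    system's version); otherwise the running interpreter's C library
    is inspected.
    """
    if detected is None:
        lib, detected = platform.libc_ver()
        if lib != "glibc" or not detected:
            raise RuntimeError("could not detect a glibc version")
    return parse_version(detected) >= parse_version(required)


# Assumed threshold: CentOS 8's glibc, not a documented requirement.
ASSUMED_MINIMUM = "2.28"

print(glibc_at_least(ASSUMED_MINIMUM, detected="2.17"))  # CentOS 7
print(glibc_at_least(ASSUMED_MINIMUM, detected="2.28"))  # CentOS 8
```

Calling `glibc_at_least(ASSUMED_MINIMUM)` with no second argument checks the system the script is running on.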
PaulTV
Posts: 211
Joined: Mon Jan 25, 2021 4:53 pm
Location: Netherlands

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by PaulTV »

CentOS 7 (and RHEL 7) doesn't even have Python 3 in the default repo... RH releases are more conservative than carrot-haired 70-yo presidents.

Ryzen 9800X3D / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 22.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by Neil-B »

PaulTV wrote: CentOS 7 (and RHEL 7) doesn't even have Python 3 in the default repo... RH releases are more conservative than carrot-haired 70-yo presidents.

... since FaH currently doesn't need Python 3 they are a perfect match ;)
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

toTOW

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by toTOW »

No, because CentOS 7 has a glibc version that is too old for core 22 v0.0.18 ... :(
novosirj

Re: FahCore returned: FAILED_2 (1 = 0x1) when running a22 WU

Post by novosirj »

Generating a new Singularity container (that's how I currently run FaH on our clusters) that uses CentOS 8 solved the problem with no other changes.

It seems like kind of a shame, but I guess it's true that most of the target audience for this software isn't running legacy-ish enterprise systems (and we have a solution for that anyway).