FahCore returned: FAILED_2 (1 = 0x1) when running a22 WUs?
Posted: Mon Dec 06, 2021 3:55 pm
Hi there,
We often run F@H on systems that need a simulated load; it makes productive use of equipment that's being tested for another reason. I'm seeing the failure logged below on one such system, however, and am not sure what to make of it. Any ideas? These are RTX 2080 Ti cards with driver 470.74, and the F@H client is 7.6.21. I know this setup did work, and I've seen the occasional work unit succeed in the last couple of days I've been running it (CPU WUs are running fine).
I don't see any more information anywhere beyond the log below. FWIW, the reason I'm running F@H on this machine is that I suspect a problem with one of the GPUs (falling off the bus), but there's no indication that this is what's causing the failures.
11:44:42:WU07:FS07:Starting
11:44:42:WU07:FS07:Running FahCore: /usr/bin/FAHCoreWrapper /scratch/novosirj/FAH/16880643_14/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 07 -suffix 01 -version 706 -lifeline 35953 -checkpoint 1 -opencl-platform 0 -opencl-device 6 -cuda-device 6 -gpu-vendor nvidia -gpu 6 -gpu-usage 100
11:44:42:WU07:FS07:Started FahCore on PID 36477
11:44:42:WU07:FS07:Core PID:36481
11:44:42:WU07:FS07:FahCore 0x22 started
11:44:42:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)
11:45:42:WU07:FS07:Starting
11:45:42:WU07:FS07:Running FahCore: /usr/bin/FAHCoreWrapper /scratch/novosirj/FAH/16880643_14/cores/cores.foldingathome.org/lin/64bit/22-0.0.18/Core_22.fah/FahCore_22 -dir 07 -suffix 01 -version 706 -lifeline 35953 -checkpoint 1 -opencl-platform 0 -opencl-device 6 -cuda-device 6 -gpu-vendor nvidia -gpu 6 -gpu-usage 100
11:45:42:WU07:FS07:Started FahCore on PID 36609
11:45:42:WU07:FS07:Core PID:36613
11:45:42:WU07:FS07:FahCore 0x22 started
11:45:43:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)
11:45:43:WARNING:WU07:FS07:Too many errors, failing
11:45:43:WU07:FS07:Sending unit results: id:07 state:SEND error:FAILED project:18201 run:44260 clone:0 gen:25 core:0x22 unit:0x0000000000000019000047190000ace4
11:45:43:WU07:FS07:Connecting to 128.252.203.11:8080
11:45:43:WU07:FS07:Server responded WORK_ACK (400)
11:45:43:WU07:FS07:Cleaning up
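Since I suspect one particular GPU, one way to check whether the failures cluster on a single slot would be to tally the "FahCore returned" lines per slot. The following is only a rough sketch: the function name `tally_core_returns` is made up for illustration, and the regex assumes every log line follows the layout shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   11:44:42:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)
# Assumption: the slot ID (FSnn) always directly precedes "FahCore returned".
RETURN_RE = re.compile(r":WU\d+:(FS\d+):FahCore returned: (\S+)")


def tally_core_returns(lines):
    """Count FahCore return statuses, keyed by (slot, status)."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines copied from the log above.
sample = [
    "11:44:42:WU07:FS07:FahCore 0x22 started",
    "11:44:42:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)",
    "11:45:42:WU07:FS07:FahCore 0x22 started",
    "11:45:43:WARNING:WU07:FS07:FahCore returned: FAILED_2 (1 = 0x1)",
    "11:45:43:WARNING:WU07:FS07:Too many errors, failing",
]

if __name__ == "__main__":
    for (slot, status), n in sorted(tally_core_returns(sample).items()):
        print(slot, status, n)
```

Run against the full client log instead of the sample lines, this would show whether FAILED_2 is confined to FS07 or spread across all the GPU slots.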