Too many Core Dumped on GPU

Moderators: Site Moderators, FAHC Science Team

azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Too many Core Dumped on GPU

Post by azhad »

[RANT]Warning: Maybe a Rant

I am getting too many "Core Dumped" errors from GPU workloads and cannot do any more GPU work. The GPU is ranked 40 of 399 according to LAR Systems. I guess it does not have ECC. The core dumps happen when I sleep, and probably when the home air conditioner is off, though temperatures don't rise much. The Windows AMD driver recovers gracefully without a reboot, but 3 hours of work is lost. That is USD 0.50, precious for me (in this part of the world). I usually work with GIMPS. Their project developed ways of detecting errors (Jacobi error checking) and now even ways to fix them (PRP with Gerbicz error checking), with rollback when an error occurs.

It happens about once or twice a day. Folding at home needs to take into account that these are consumer grade systems and occasional errors such as this should not disrupt the processing.

My main memory is ECC protected, running in ECC mode, but that does not help the GPU. So I am giving up on GPU processing until Folding at home does something to recover dumped GPU work units (what's the point of checkpoints if it doesn't try to recover from them?). Note that this is the second time I am having to give up on Folding at home. I remember a few years ago, using a different system, I had to give up on Folding at home for a similar issue: GPU crashes.

Please fix.
[/RANT]
azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Re: Too many Core Dumped on GPU

Post by azhad »

P.S. I am talking about 1-3 errors per day, each ruining 1-3 hours of a work unit. Also, the errors occur with core 0x24. Is there any setting to avoid core 0x24?
calxalot
Site Moderator
Posts: 1670
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Too many Core Dumped on GPU

Post by calxalot »

What OS? What client version?

Don’t let your system sleep when there is an active GPU WU. The GPU cores cannot recover from sleep.
muziqaz
Posts: 2131
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Re: Too many Core Dumped on GPU

Post by muziqaz »

Can I RANT back?

PLEASE give up folding until you fix your hardware and software issues. If your GPU driver is resetting and recovering, that means something is broken on your system, not in FAH.
Please make sure you have a 100% stable system before attempting to contribute to any distributed project.
And for the love of exaflops, next time provide FULL system information so we can help you mitigate your issues ;)
P.S. ECC RAM is not a sign of a stable system

End of RANT :P
FAH Omega tester
azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Re: Too many Core Dumped on GPU

Post by azhad »

@calxalot Windows 11H2. Client 8.4.9. System works 24/7. GPU is Sapphire 6950XT Nitro+ Pure. Underclocked for efficiency. -- I have raised the voltage a little bit to see if it helps.

@muziqaz I am not giving up yet. My request is whether there is any way for the core to roll back to a checkpoint and resume, instead of dumping and giving up on the work unit entirely.
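
To make the request concrete, here is a minimal sketch of the rollback-to-checkpoint idea. It is a toy illustration only and does not describe how the FAH client or FahCore_24 behaves. `TransientFault`, `run_with_rollback`, and `flaky_step` are hypothetical names, and the sketch assumes the fault is transient and is detected before any bad state reaches a checkpoint.

```python
class TransientFault(Exception):
    """Hypothetical stand-in for a recoverable compute error (e.g. a driver reset)."""


def run_with_rollback(total_steps, checkpoint_interval, step_fn, max_retries=3):
    """Advance a toy simulation, rolling back to the last checkpoint on a fault.

    Returns (final_state, number_of_rollbacks).
    """
    state = 0                  # toy simulation state
    checkpoint = (0, state)    # (step, state) of the last good checkpoint
    step = 0
    rollbacks = 0
    while step < total_steps:
        try:
            state = step_fn(step, state)
            step += 1
            if step % checkpoint_interval == 0:
                checkpoint = (step, state)   # record a new good checkpoint
        except TransientFault:
            rollbacks += 1
            if rollbacks > max_retries:
                raise                        # give up: the point where a WU would be dumped
            step, state = checkpoint         # resume from the last checkpoint
    return state, rollbacks


# Usage: a step function that fails exactly once, at step 7.
failed = set()

def flaky_step(step, state):
    if step == 7 and step not in failed:
        failed.add(step)
        raise TransientFault()
    return state + 1

final_state, rollbacks = run_with_rollback(20, 5, flaky_step)
print(final_state, rollbacks)  # prints "20 1"
```

Only the steps since the last checkpoint (here steps 5 and 6) are redone. The retry cap matters: a fault that keeps recurring should still end the run rather than loop forever.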
arisu
Posts: 586
Joined: Mon Feb 24, 2025 11:11 pm

Re: Too many Core Dumped on GPU

Post by arisu »

Post the logs. That will make it easier to determine if the failure is something that is even recoverable or not. Maybe it's something easy to fix. Maybe you have an unstable machine. Maybe it's a software bug. Without the logs, there's no real way to know. There's a big difference between a "core was killed" error and a "particle position is NaN" error, but both will cause a dump. One is something you can probably easily fix, the other indicates either a problem with the WU or a problem with your hardware.

Arguably the client is too liberal in deciding that a failure is dump-worthy, but it's better to be too liberal than too conservative and risk sending back bad data. But no, it does not roll back on most failures. Even if the core is simply killed or fails for a benign reason, it will dump the WU.
azhad wrote: Sat Mar 15, 2025 9:58 am Windows AMD driver recovers gracefully without reboot.
As far as I'm aware, only the Linux amdgpu driver can fully recover without a reboot, and even then only sometimes. The Windows driver, which uses a different codebase, can only recover into a degraded mode that lacks full functionality.

Your system is probably not unstable, but something about your setup is likely triggering a bug in the graphics driver. If that's what's happening, then the FAH core is having its OpenCL instance ripped out underneath it so it terminates. The client sees it terminate, freaks out, and dumps the WU (even though the problem is in the core, not the WU). The GPU itself is fine, but the driver, which interfaces your user software with the GPU hardware, is infamously buggy (graphics drivers in general are). But you should get this fixed before continuing to fold on your GPU. Each time a WU is dumped, hours of progress are lost.
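
For anyone wanting to pull the relevant lines out of a long client log before posting, a small sketch along these lines may help. It assumes log lines look like `HH:MM:SS:LEVEL:WUnn:message` and that a dump is announced by a message containing `Dumping WU`, as in the log excerpts posted in this thread. The function name `find_dumps` and the `context` parameter are made up for this example; this is not an official tool.

```python
import re
from collections import defaultdict, deque

# Assumed layout: time, two-character level ("I1", "E ", "W "), WU tag, message.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.{2}):(WU\d+):(.*)$")


def find_dumps(log_lines, context=3):
    """Return {wu: [last `context` lines for that WU before its dump message]}."""
    recent = defaultdict(lambda: deque(maxlen=context))
    dumps = {}
    for line in log_lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines without a WU tag
        stamp, level, wu, msg = m.groups()
        if "Dumping WU" in msg:
            dumps[wu] = list(recent[wu])
        recent[wu].append(f"{stamp} {msg}")
    return dumps


sample = """\
14:36:39:I1:WU10:Completed 110000 out of 250000 steps (44%)
14:36:40:I1:WU10:Caught signal SIGABRT(22)
14:36:40:I1:WU10:WARNING:Unexpected exit
14:36:40:E :WU10:Core returned EARLY_UNIT_END (123)
14:36:40:E :WU10:Run did not produce any results. Dumping WU
14:36:40:I1:WU11:WARNING:Console control signal 1 on PID 28240
""".splitlines()

dumps = find_dumps(sample)
for wu, lines in dumps.items():
    print(wu)
    for entry in lines:
        print("  ", entry)
```

The lines just before the dump message are usually the interesting ones, since they show whether the core was killed by a signal or stopped on a simulation error.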
muziqaz
Posts: 2131
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London

Re: Too many Core Dumped on GPU

Post by muziqaz »

Windows can recover without a reboot. The problem is that it recovers to a state that is not suitable for FAH.
Why is the driver resetting? I have the same GPU (sh*t no, I don't, mine is a 7900xtx nitro+) on Win11 with the latest drivers, and I have not seen a driver reset for a very long time. I also do not touch any undervolting stuff; I swear this is the new "stable overclocking" trend. FAH prefers unmolested hardware. Set your GPU to what the AIB intended it to be and see if the problem persists. FAH is not suited to your undervolting tuning :)

FAH cannot recover if its workload is mangled beyond repair by a crashing driver ;)
FAH Omega tester
azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Re: Too many Core Dumped on GPU

Post by azhad »

Thank you arisu. I have observed the metrics closely. Have disabled the underclock (2150MHz, 850mV = 170W avg power) and gone back to Default (2400 avg MHz, 1200mV = 300W power). If it works, I may try a 2150MHz, 887mV to gain back some efficiency.
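
As a rough back-of-envelope comparison of the two settings quoted above, the sketch below estimates the efficiency gain of the underclock. It assumes folding throughput scales linearly with core clock, which is a simplification that real work units may not follow, so treat the result as an upper-level estimate and not a measurement. The function name is made up for this example.

```python
def perf_per_watt_gain(clock_a_mhz, watts_a, clock_b_mhz, watts_b):
    """Ratio of clock-per-watt for setting A over setting B.

    Assumption: throughput is proportional to core clock.
    """
    return (clock_a_mhz / watts_a) / (clock_b_mhz / watts_b)


# Underclock: 2150 MHz at 170 W average. Default: 2400 MHz at 300 W.
gain = perf_per_watt_gain(2150, 170, 2400, 300)
print(f"{gain:.2f}")  # prints "1.58"
```

Under that assumption the underclock does about 58% more work per watt while giving up roughly 10% of the clock speed, which is the efficiency being traded against stability here.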

10 Successful GPU WUs. 3 Dumped.

WU10:

Code:

12:34:23:I1:WU10:Received WU P18251 R201 C1 G214
12:34:24:I3:Running FahCore: D:\ProgramData\FAHClient\cores/openmm-core-24/windows-10-64bit/release/fahcore-24-windows-10-64bit-release-8.1.4/FahCore_24.exe -dir QiHOPHj8ZqANrCb6-Bl9npNlox307L_7S43C5rwfO5Y -suffix 01 -version 8.4.9 -lifeline 29216 -gpu-platform opencl -gpu-vendor amd -opencl-platform 0 -opencl-device 1 -gpu 1
12:34:24:I3:WU10:Started FahCore on PID 2384
12:34:25:I1:WU10:*********************** Log Started 2025-03-13T12:34:25Z ***********************
12:34:25:I1:WU10:*************************** Core24 Folding@home Core ***************************
12:34:25:I1:WU10:       Core: Core24
12:34:25:I1:WU10:       Type: 0x24
12:34:25:I1:WU10:    Version: 8.1.4
12:34:25:I1:WU10:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:34:25:I1:WU10:  Copyright: 2022 foldingathome.org
12:34:25:I1:WU10:   Homepage: https://foldingathome.org/
12:34:25:I1:WU10:       Date: Jul 25 2024
12:34:25:I1:WU10:       Time: 05:42:49
12:34:25:I1:WU10:   Revision: cf9f0139862b8945a2091772770e4631aac37792
12:34:25:I1:WU10:     Branch: HEAD
12:34:25:I1:WU10:   Compiler: Visual C++
12:34:25:I1:WU10:    Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2
12:34:25:I1:WU10:             /Zc:throwingNew /MT -DOPENMM_VERSION="\"8.1.1\"" /Ox /std:c++14
12:34:25:I1:WU10:   Platform: win32 10
12:34:25:I1:WU10:       Bits: 64
12:34:25:I1:WU10:       Mode: Release
12:34:25:I1:WU10:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
12:34:25:I1:WU10:             <peastman@stanford.edu>
12:34:25:I1:WU10:       Args: -dir QiHOPHj8ZqANrCb6-Bl9npNlox307L_7S43C5rwfO5Y -suffix 01
12:34:25:I1:WU10:             -version 8.4.9 -lifeline 29216 -gpu-platform opencl -gpu-vendor amd
12:34:25:I1:WU10:             -opencl-platform 0 -opencl-device 1 -gpu 1
12:34:25:I1:WU10:************************************ libFAH ************************************
12:34:25:I1:WU10:       Date: Jul 25 2024
12:34:25:I1:WU10:       Time: 05:23:50
12:34:25:I1:WU10:   Revision: c7d2824a47eb025fa8cda8968c7a5e971585d90c
12:34:25:I1:WU10:     Branch: HEAD
12:34:25:I1:WU10:   Compiler: Visual C++
12:34:25:I1:WU10:    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
12:34:25:I1:WU10:   Platform: win32 10
12:34:25:I1:WU10:       Bits: 64
12:34:25:I1:WU10:       Mode: Release
12:34:25:I1:WU10:************************************ CBang *************************************
12:34:25:I1:WU10:    Version: 1.7.2
12:34:25:I1:WU10:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:34:25:I1:WU10:        Org: Cauldron Development LLC
12:34:25:I1:WU10:  Copyright: Cauldron Development LLC, 2003-2024
12:34:25:I1:WU10:   Homepage: https://cauldrondevelopment.com/
12:34:25:I1:WU10:    License: LGPL-2.1-or-later
12:34:25:I1:WU10:       Date: Jul 25 2024
12:34:25:I1:WU10:       Time: 05:22:43
12:34:25:I1:WU10:   Revision: f1cd4c791e8c40a35dcfeab3ab85d910949cc0cb
12:34:25:I1:WU10:     Branch: HEAD
12:34:25:I1:WU10:   Compiler: Visual C++
12:34:25:I1:WU10:    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
12:34:25:I1:WU10:   Platform: win32 10
12:34:25:I1:WU10:       Bits: 64
12:34:25:I1:WU10:       Mode: Release
12:34:25:I1:WU10:************************************ System ************************************
12:34:25:I1:WU10:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
12:34:25:I1:WU10:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
12:34:25:I1:WU10:       CPUs: 16
12:34:25:I1:WU10:     Memory: 95.16GiB
12:34:25:I1:WU10:Free Memory: 73.69GiB
12:34:25:I1:WU10: OS Version: 10.0
12:34:25:I1:WU10:Has Battery: false
12:34:25:I1:WU10: On Battery: false
12:34:25:I1:WU10:   Hostname: AZHAD-PC2
12:34:25:I1:WU10: UTC Offset: 5
12:34:25:I1:WU10:        PID: 2384
12:34:25:I1:WU10:        CWD: D:\ProgramData\FAHClient\work
12:34:25:I1:WU10:       Exec: D:\ProgramData\FAHClient\cores\openmm-core-24\windows-10-64bit\release\fahcore-24-windows-10-64bit-release-8.1.4\FahCore_24.exe
12:34:25:I1:WU10:************************************ OpenMM ************************************
12:34:25:I1:WU10:    Version: 8.1.1
12:34:25:I1:WU10:********************************************************************************
12:34:25:I1:WU10:Project: 18251 (Run 201, Clone 1, Gen 214)
12:34:25:I1:WU10:Reading tar file core.xml
12:34:25:I1:WU10:Reading tar file integrator.xml
12:34:25:I1:WU10:Reading tar file state.xml.bz2
12:34:25:I1:WU10:Reading tar file system.xml.bz2
12:34:25:I1:WU10:Digital signatures verified
12:34:25:I1:WU10:Folding@home GPU Core24 Folding@home Core
12:34:25:I1:WU10:Version 8.1.4
12:34:25:I1:WU10:  Checkpoint write interval: 12500 steps (5%) [20 total]
12:34:25:I1:WU10:  JSON viewer frame write interval: 2500 steps (1%) [100 total]
12:34:25:I1:WU10:  XTC frame write interval: 5000 steps (2%) [50 total]
12:34:25:I1:WU10:  TRR frame write interval: disabled
12:34:25:I1:WU10:  Global context and integrator variables write interval: disabled
12:34:26:I1:WU10:There are 3 platforms available.
12:34:26:I1:WU10:Platform 0: Reference
12:34:26:I1:WU10:Platform 1: CPU
12:34:26:I1:WU10:Platform 2: OpenCL
12:34:26:I1:WU10:  opencl-device 1 specified
12:36:42:I1:WU8:Completed 227500 out of 250000 steps (91%)
12:37:19:I1:WU10:Attempting to create OpenCL context:
12:37:19:I1:WU10:  Configuring platform OpenCL
12:38:03:I1:WU10:  Using OpenCL on OpenCL platformId 0 and gpu 1
12:38:03:I1:WU10:  GPU info: Platform: OpenCL: AMD Accelerated Parallel Processing
12:38:03:I1:WU10:  GPU info: PlatformIndex: 0
12:38:03:I1:WU10:  GPU info: Device: gfx1030
12:38:03:I1:WU10:  GPU info: DeviceIndex: 1
12:38:03:I1:WU10:  GPU info: Vendor: 0x1002
12:38:03:I1:WU10:  GPU info: PCI: 03:00:00
12:38:03:I1:WU10:  GPU info: Compute: 2.0
12:38:03:I1:WU10:  GPU info: Driver: 3640.0
12:38:03:I1:WU10:  GPU info: GPU: true
12:38:03:I1:WU10:Completed 0 out of 250000 steps (0%)
12:38:10:I1:WU10:Checkpoint completed at step 0
12:39:37:I1:WU8:Completed 230000 out of 250000 steps (92%)
12:40:58:I1:WU10:Completed 2500 out of 250000 steps (1%)
12:42:20:I1:WU8:Completed 232500 out of 250000 steps (93%)
12:42:32:W :WU8:Visualization frame 93 unchanged, skipping
12:43:43:I1:WU10:Completed 5000 out of 250000 steps (2%)
12:45:06:I1:WU8:Completed 235000 out of 250000 steps (94%)
12:46:25:I1:WU10:Completed 7500 out of 250000 steps (3%)
12:47:49:I1:WU8:Completed 237500 out of 250000 steps (95%)
12:48:01:W :WU8:Visualization frame 95 unchanged, skipping
12:49:05:I1:WU10:Completed 10000 out of 250000 steps (4%)
12:50:35:I1:WU8:Completed 240000 out of 250000 steps (96%)
12:51:45:I1:WU10:Completed 12500 out of 250000 steps (5%)
12:51:53:I1:WU10:Checkpoint completed at step 12500
12:53:21:I1:WU8:Completed 242500 out of 250000 steps (97%)
12:53:34:W :WU8:Visualization frame 97 unchanged, skipping
12:54:34:I1:WU10:Completed 15000 out of 250000 steps (6%)
12:56:07:I1:WU8:Completed 245000 out of 250000 steps (98%)
12:57:16:I1:WU10:Completed 17500 out of 250000 steps (7%)
12:58:51:I1:WU8:Completed 247500 out of 250000 steps (99%)
12:59:03:W :WU8:Visualization frame 99 unchanged, skipping
12:59:57:I1:WU10:Completed 20000 out of 250000 steps (8%)
13:01:36:I1:WU8:Completed 250000 out of 250000 steps (100%)
13:01:44:I1:WU8:Saving result file ..\logfile_01.txt
13:01:44:I1:WU8:Saving result file frame399.gro
13:01:44:I1:WU8:Saving result file frame399.xtc
13:01:44:I1:WU8:Saving result file md.log
13:01:44:I1:WU8:Saving result file science.log
13:01:44:I1:WU8:Saving result file state.cpt
13:01:44:I1:WU8:Folding@home Core Shutdown: FINISHED_UNIT
13:01:44:I1:WU8:Core returned FINISHED_UNIT (100)
13:01:45:I1:Default:Added new work unit: cpus:14 gpus:
13:01:45:I1:WU8:Uploading WU results
13:01:46:I1:WU11:Requesting WU assignment for user azhad team 0
13:01:46:I1:OUT50:> POST https://fahserver1.flatironinstitute.org/api/results HTTP/1.1
13:01:46:I1:OUT51:> POST https://assign3.foldingathome.org/api/assign HTTP/1.1
13:01:47:I1:OUT51:< HTTP/1.1 200 HTTP_OK
13:01:47:I1:WU11:Received WU assignment fsDyVXaSYYLKUaJObQjjA4WrfC5PofR3GaBzjsm7yM4
13:01:47:I1:WU11:Downloading WU
13:01:47:I1:OUT52:> POST https://fahserver1.flatironinstitute.org/api/assign HTTP/1.1
13:01:52:I1:OUT52:< HTTP/1.1 200 HTTP_OK
13:01:53:I1:WU11:Received WU P18806 R71 C18 G374
13:01:53:I3:Running FahCore: D:\ProgramData\FAHClient\cores/gromacs-core-a9/windows-10-64bit/cpu-avx2_256-release/fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12/FahCore_a9.exe -dir fsDyVXaSYYLKUaJObQjjA4WrfC5PofR3GaBzjsm7yM4 -suffix 01 -version 8.4.9 -lifeline 29216 -np 14
13:01:53:I3:WU11:Started FahCore on PID 28240
13:01:53:I1:WU11:*********************** Log Started 2025-03-13T13:01:53Z ***********************
13:01:53:I1:WU11:************************** Gromacs Folding@home Core ***************************
13:01:53:I1:WU11:       Core: Gromacs
13:01:53:I1:WU11:       Type: 0xa9
13:01:53:I1:WU11:    Version: 0.0.12
13:01:53:I1:WU11:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:01:53:I1:WU11:  Copyright: 2022 foldingathome.org
13:01:53:I1:WU11:   Homepage: https://foldingathome.org/
13:01:53:I1:WU11:       Date: Nov 15 2022
13:01:53:I1:WU11:       Time: 13:31:08
13:01:53:I1:WU11:   Compiler: Visual C++
13:01:53:I1:WU11:    Options: /TP /std:c++17 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
13:01:53:I1:WU11:   Platform: win32 10
13:01:53:I1:WU11:       Bits: 64
13:01:53:I1:WU11:       Mode: Release
13:01:53:I1:WU11:       SIMD: avx2_256
13:01:53:I1:WU11:     OpenMP: ON
13:01:53:I1:WU11:       CUDA: OFF
13:01:53:I1:WU11:     OpenCL: OFF
13:01:53:I1:WU11:       Args: -dir fsDyVXaSYYLKUaJObQjjA4WrfC5PofR3GaBzjsm7yM4 -suffix 01
13:01:53:I1:WU11:             -version 8.4.9 -lifeline 29216 -np 14
13:01:53:I1:WU11:************************************ libFAH ************************************
13:01:53:I1:WU11:       Date: Nov 15 2022
13:01:53:I1:WU11:       Time: 13:30:33
13:01:53:I1:WU11:   Compiler: Visual C++
13:01:53:I1:WU11:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
13:01:53:I1:WU11:   Platform: win32 10
13:01:53:I1:WU11:       Bits: 64
13:01:53:I1:WU11:       Mode: Release
13:01:53:I1:WU11:************************************ CBang *************************************
13:01:53:I1:WU11:       Date: Nov 15 2022
13:01:53:I1:WU11:       Time: 13:29:57
13:01:53:I1:WU11:   Compiler: Visual C++
13:01:53:I1:WU11:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
13:01:53:I1:WU11:   Platform: win32 10
13:01:53:I1:WU11:       Bits: 64
13:01:53:I1:WU11:       Mode: Release
13:01:53:I1:WU11:************************************ System ************************************
13:01:53:I1:WU11:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
13:01:53:I1:WU11:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
13:01:53:I1:WU11:       CPUs: 16
13:01:53:I1:WU11:     Memory: 95.16GiB
13:01:53:I1:WU11:Free Memory: 71.41GiB
13:01:53:I1:WU11:    Threads: WINDOWS_THREADS
13:01:53:I1:WU11: OS Version: 6.2
13:01:53:I1:WU11:Has Battery: false
13:01:53:I1:WU11: On Battery: false
13:01:53:I1:WU11: UTC Offset: 5
13:01:53:I1:WU11:        PID: 28240
13:01:53:I1:WU11:        CWD: D:\ProgramData\FAHClient\work
13:01:53:I1:WU11:       Exec: D:\ProgramData\FAHClient\cores\gromacs-core-a9\windows-10-64bit\cpu-avx2_256-release\fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12\FahCore_a9.exe
13:01:53:I1:WU11:********************************************************************************
13:01:53:I1:WU11:Project: 18806 (Run 71, Clone 18, Gen 374)
13:01:53:I1:WU11:Reading tar file core.xml
13:01:53:I1:WU11:Reading tar file frame374.tpr
13:01:53:I1:WU11:Digital signatures verified
13:01:53:I1:WU11:Calling: mdrun -c frame374.gro -s frame374.tpr -x frame374.xtc -cpt 5 -nt 14 -ntmpi 1 -update cpu -nb cpu -bonded cpu -pme cpu -pmefft cpu
13:01:53:I1:WU11:Steps: first=93500000 total=93750000
13:01:57:I1:WU11:Completed 1 out of 250000 steps (0%)
13:02:15:I1:OUT50:< HTTP/1.1 200 HTTP_OK
13:02:15:I1:WU8:Credited
13:02:37:I1:WU10:Completed 22500 out of 250000 steps (9%)
13:04:39:I1:WU11:Completed 2500 out of 250000 steps (1%)
13:04:51:W :WU11:Visualization frame 1 unchanged, skipping
13:05:17:I1:WU10:Completed 25000 out of 250000 steps (10%)
13:05:25:I1:WU10:Checkpoint completed at step 25000
13:07:27:I1:WU11:Completed 5000 out of 250000 steps (2%)
13:08:06:I1:WU10:Completed 27500 out of 250000 steps (11%)
13:10:09:I1:WU11:Completed 7500 out of 250000 steps (3%)
13:10:22:W :WU11:Visualization frame 3 unchanged, skipping
13:10:45:I1:WU10:Completed 30000 out of 250000 steps (12%)
13:12:55:I1:WU11:Completed 10000 out of 250000 steps (4%)
13:13:25:I1:WU10:Completed 32500 out of 250000 steps (13%)
13:15:37:I1:WU11:Completed 12500 out of 250000 steps (5%)
13:15:50:W :WU11:Visualization frame 5 unchanged, skipping
13:16:04:I1:WU10:Completed 35000 out of 250000 steps (14%)
13:18:23:I1:WU11:Completed 15000 out of 250000 steps (6%)
13:18:46:I1:WU10:Completed 37500 out of 250000 steps (15%)
13:18:54:I1:WU10:Checkpoint completed at step 37500
13:21:08:I1:WU11:Completed 17500 out of 250000 steps (7%)
13:21:20:W :WU11:Visualization frame 7 unchanged, skipping
13:21:34:I1:WU10:Completed 40000 out of 250000 steps (16%)
13:23:54:I1:WU11:Completed 20000 out of 250000 steps (8%)
13:24:12:I1:WU10:Completed 42500 out of 250000 steps (17%)
13:26:36:I1:WU11:Completed 22500 out of 250000 steps (9%)
13:26:49:W :WU11:Visualization frame 9 unchanged, skipping
13:26:52:I1:WU10:Completed 45000 out of 250000 steps (18%)
13:29:22:I1:WU11:Completed 25000 out of 250000 steps (10%)
13:29:31:I1:WU10:Completed 47500 out of 250000 steps (19%)
13:32:08:I1:WU11:Completed 27500 out of 250000 steps (11%)
13:32:11:I1:WU10:Completed 50000 out of 250000 steps (20%)
13:32:20:I1:WU10:Checkpoint completed at step 50000
13:34:53:I1:WU11:Completed 30000 out of 250000 steps (12%)
13:34:59:I1:WU10:Completed 52500 out of 250000 steps (21%)
13:35:05:W :WU11:Visualization frame 12 unchanged, skipping
13:37:39:I1:WU11:Completed 32500 out of 250000 steps (13%)
13:37:39:I1:WU10:Completed 55000 out of 250000 steps (22%)
13:40:20:I1:WU10:Completed 57500 out of 250000 steps (23%)
13:40:21:I1:WU11:Completed 35000 out of 250000 steps (14%)
13:40:34:W :WU11:Visualization frame 14 unchanged, skipping
13:43:00:I1:WU10:Completed 60000 out of 250000 steps (24%)
13:43:07:I1:WU11:Completed 37500 out of 250000 steps (15%)
13:45:39:I1:WU10:Completed 62500 out of 250000 steps (25%)
13:45:47:I1:WU10:Checkpoint completed at step 62500
13:45:52:I1:WU11:Completed 40000 out of 250000 steps (16%)
13:46:05:W :WU11:Visualization frame 16 unchanged, skipping
13:48:30:I1:WU10:Completed 65000 out of 250000 steps (26%)
13:48:39:I1:WU11:Completed 42500 out of 250000 steps (17%)
13:51:07:I1:WU10:Completed 67500 out of 250000 steps (27%)
13:51:21:I1:WU11:Completed 45000 out of 250000 steps (18%)
13:51:34:W :WU11:Visualization frame 18 unchanged, skipping
13:53:48:I1:WU10:Completed 70000 out of 250000 steps (28%)
13:54:07:I1:WU11:Completed 47500 out of 250000 steps (19%)
13:56:30:I1:WU10:Completed 72500 out of 250000 steps (29%)
13:56:52:I1:WU11:Completed 50000 out of 250000 steps (20%)
13:59:10:I1:WU10:Completed 75000 out of 250000 steps (30%)
13:59:18:I1:WU10:Checkpoint completed at step 75000
13:59:40:I1:WU11:Completed 52500 out of 250000 steps (21%)
14:01:57:I1:WU10:Completed 77500 out of 250000 steps (31%)
14:02:26:I1:WU11:Completed 55000 out of 250000 steps (22%)
14:04:37:I1:WU10:Completed 80000 out of 250000 steps (32%)
14:05:09:I1:WU11:Completed 57500 out of 250000 steps (23%)
14:05:21:W :WU11:Visualization frame 23 unchanged, skipping
14:07:17:I1:WU10:Completed 82500 out of 250000 steps (33%)
14:07:55:I1:WU11:Completed 60000 out of 250000 steps (24%)
14:09:57:I1:WU10:Completed 85000 out of 250000 steps (34%)
14:10:38:I1:WU11:Completed 62500 out of 250000 steps (25%)
14:10:50:W :WU11:Visualization frame 25 unchanged, skipping
14:12:36:I1:WU10:Completed 87500 out of 250000 steps (35%)
14:12:44:I1:WU10:Checkpoint completed at step 87500
14:13:26:I1:WU11:Completed 65000 out of 250000 steps (26%)
14:15:22:I1:WU10:Completed 90000 out of 250000 steps (36%)
14:16:09:I1:WU11:Completed 67500 out of 250000 steps (27%)
14:16:21:W :WU11:Visualization frame 27 unchanged, skipping
14:18:03:I1:WU10:Completed 92500 out of 250000 steps (37%)
14:18:54:I1:WU11:Completed 70000 out of 250000 steps (28%)
14:20:43:I1:WU10:Completed 95000 out of 250000 steps (38%)
14:21:37:I1:WU11:Completed 72500 out of 250000 steps (29%)
14:21:50:W :WU11:Visualization frame 29 unchanged, skipping
14:23:24:I1:WU10:Completed 97500 out of 250000 steps (39%)
14:24:23:I1:WU11:Completed 75000 out of 250000 steps (30%)
14:26:04:I1:WU10:Completed 100000 out of 250000 steps (40%)
14:26:12:I1:WU10:Checkpoint completed at step 100000
14:27:12:I1:WU11:Completed 77500 out of 250000 steps (31%)
14:28:52:I1:WU10:Completed 102500 out of 250000 steps (41%)
14:29:54:I1:WU11:Completed 80000 out of 250000 steps (32%)
14:30:07:W :WU11:Visualization frame 32 unchanged, skipping
14:31:33:I1:WU10:Completed 105000 out of 250000 steps (42%)
14:32:40:I1:WU11:Completed 82500 out of 250000 steps (33%)
14:34:14:I1:WU10:Completed 107500 out of 250000 steps (43%)
14:35:23:I1:WU11:Completed 85000 out of 250000 steps (34%)
14:35:36:W :WU11:Visualization frame 34 unchanged, skipping
14:36:39:I1:WU10:Completed 110000 out of 250000 steps (44%)
14:36:40:I1:WU10:Caught signal SIGABRT(22)
14:36:40:I1:WU10:WARNING:Unexpected exit
14:36:40:E :WU10:Core returned EARLY_UNIT_END (123)
14:36:40:E :WU10:Run did not produce any results. Dumping WU
14:36:40:I1:Default:Added new work unit: cpus:0 gpus:gpu:03:00:00
14:36:40:I1:WU10:Sending dump report
14:36:40:I1:WU12:Requesting WU assignment for user azhad team 0
14:36:40:I1:WU11:WARNING:Console control signal 1 on PID 28240
14:36:40:I1:WU11:Exiting, please wait. . .
14:36:41:I1:OUT54:> POST https://assign4.foldingathome.org/api/assign HTTP/1.1
14:36:41:I1:OUT53:> POST https://highland3.seas.upenn.edu/api/results HTTP/1.1
14:36:41:I1:OUT54:< HTTP/1.1 200 HTTP_OK
14:36:41:I1:WU12:Received WU assignment 1dbjJQnsMNRQRPbx8buHRp-Xw7neTWPmVTUgznuxT1k
14:36:41:I1:WU12:Downloading WU
14:36:42:I1:OUT53:< HTTP/1.1 200 HTTP_OK
14:36:42:I1:WU10:Dumped
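
For scale, the timestamps in the WU10 log above put numbers on what resuming from a checkpoint would have saved. This is a sketch under two assumptions the log does not prove: that the fault was transient, and that the checkpoint at step 100000 was still valid. All three timestamps fall on the same day, so plain time-of-day subtraction is enough.

```python
from datetime import datetime


def seconds_between(start, end):
    """Seconds from `start` to `end`, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


started = "12:34:24"          # WU10: Started FahCore on PID 2384
last_checkpoint = "14:26:12"  # WU10: Checkpoint completed at step 100000
aborted = "14:36:40"          # WU10: Caught signal SIGABRT(22)

lost_when_dumped = seconds_between(started, aborted)
lost_if_resumed = seconds_between(last_checkpoint, aborted)
print(lost_when_dumped, lost_if_resumed)  # prints "7336 628"
```

That is about 2 hours and 2 minutes discarded by the dump, against roughly 10.5 minutes that would have needed redoing from the last checkpoint.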

WU22 (a CPU one) and WU23:

Code:

21:53:04:I1:WU22:Received WU P18806 R88 C14 G388
21:53:04:I3:Running FahCore: D:\ProgramData\FAHClient\cores/gromacs-core-a9/windows-10-64bit/cpu-avx2_256-release/fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12/FahCore_a9.exe -dir 9zXqJDVx0SXvPKVcGUZREjIQSz5kOZTR6eCFPXWLC1c -suffix 01 -version 8.4.9 -lifeline 29216 -np 14
21:53:04:I3:WU22:Started FahCore on PID 28696
21:53:04:I1:WU22:*********************** Log Started 2025-03-14T21:53:04Z ***********************
21:53:04:I1:WU22:************************** Gromacs Folding@home Core ***************************
21:53:04:I1:WU22:       Core: Gromacs
21:53:04:I1:WU22:       Type: 0xa9
21:53:04:I1:WU22:    Version: 0.0.12
21:53:04:I1:WU22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:53:04:I1:WU22:  Copyright: 2022 foldingathome.org
21:53:04:I1:WU22:   Homepage: https://foldingathome.org/
21:53:04:I1:WU22:       Date: Nov 15 2022
21:53:04:I1:WU22:       Time: 13:31:08
21:53:04:I1:WU22:   Compiler: Visual C++
21:53:04:I1:WU22:    Options: /TP /std:c++17 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
21:53:04:I1:WU22:   Platform: win32 10
21:53:04:I1:WU22:       Bits: 64
21:53:04:I1:WU22:       Mode: Release
21:53:04:I1:WU22:       SIMD: avx2_256
21:53:04:I1:WU22:     OpenMP: ON
21:53:04:I1:WU22:       CUDA: OFF
21:53:04:I1:WU22:     OpenCL: OFF
21:53:04:I1:WU22:       Args: -dir 9zXqJDVx0SXvPKVcGUZREjIQSz5kOZTR6eCFPXWLC1c -suffix 01
21:53:04:I1:WU22:             -version 8.4.9 -lifeline 29216 -np 14
21:53:04:I1:WU22:************************************ libFAH ************************************
21:53:04:I1:WU22:       Date: Nov 15 2022
21:53:04:I1:WU22:       Time: 13:30:33
21:53:04:I1:WU22:   Compiler: Visual C++
21:53:04:I1:WU22:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
21:53:04:I1:WU22:   Platform: win32 10
21:53:04:I1:WU22:       Bits: 64
21:53:04:I1:WU22:       Mode: Release
21:53:04:I1:WU22:************************************ CBang *************************************
21:53:04:I1:WU22:       Date: Nov 15 2022
21:53:04:I1:WU22:       Time: 13:29:57
21:53:04:I1:WU22:   Compiler: Visual C++
21:53:04:I1:WU22:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
21:53:04:I1:WU22:   Platform: win32 10
21:53:04:I1:WU22:       Bits: 64
21:53:04:I1:WU22:       Mode: Release
21:53:04:I1:WU22:************************************ System ************************************
21:53:04:I1:WU22:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
21:53:04:I1:WU22:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
21:53:04:I1:WU22:       CPUs: 16
21:53:04:I1:WU22:     Memory: 95.16GiB
21:53:04:I1:WU22:Free Memory: 68.22GiB
21:53:04:I1:WU22:    Threads: WINDOWS_THREADS
21:53:04:I1:WU22: OS Version: 6.2
21:53:04:I1:WU22:Has Battery: false
21:53:04:I1:WU22: On Battery: false
21:53:04:I1:WU22: UTC Offset: 5
21:53:04:I1:WU22:        PID: 28696
21:53:04:I1:WU22:        CWD: D:\ProgramData\FAHClient\work
21:53:04:I1:WU22:       Exec: D:\ProgramData\FAHClient\cores\gromacs-core-a9\windows-10-64bit\cpu-avx2_256-release\fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12\FahCore_a9.exe
21:53:04:I1:WU22:********************************************************************************
21:53:04:I1:WU22:Project: 18806 (Run 88, Clone 14, Gen 388)
21:53:04:I1:WU22:Reading tar file core.xml
21:53:04:I1:WU22:Reading tar file frame388.tpr
21:53:04:I1:WU22:Digital signatures verified
21:53:04:I1:WU22:Calling: mdrun -c frame388.gro -s frame388.tpr -x frame388.xtc -cpt 5 -nt 14 -ntmpi 1 -update cpu -nb cpu -bonded cpu -pme cpu -pmefft cpu
21:53:04:I1:WU22:Steps: first=97000000 total=97250000
21:53:08:I1:WU22:Completed 1 out of 250000 steps (0%)
21:53:31:I1:OUT97:< HTTP/1.1 200 HTTP_OK
21:53:31:I1:WU20:Credited
21:54:11:I1:WU21:Completed 182500 out of 250000 steps (73%)
21:55:51:I1:WU22:Completed 2500 out of 250000 steps (1%)
21:56:04:W :WU22:Visualization frame 1 unchanged, skipping
21:56:43:I1:WU21:Completed 185000 out of 250000 steps (74%)
21:58:38:I1:WU22:Completed 5000 out of 250000 steps (2%)
21:59:17:I1:WU21:Completed 187500 out of 250000 steps (75%)
21:59:25:I1:WU21:Checkpoint completed at step 187500
22:01:24:I1:WU22:Completed 7500 out of 250000 steps (3%)
22:01:37:W :WU22:Visualization frame 3 unchanged, skipping
22:01:59:I1:WU21:Completed 190000 out of 250000 steps (76%)
22:04:11:I1:WU22:Completed 10000 out of 250000 steps (4%)
22:04:32:I1:WU21:Completed 192500 out of 250000 steps (77%)
22:06:54:I1:WU22:Completed 12500 out of 250000 steps (5%)
22:07:02:I1:WU21:Completed 195000 out of 250000 steps (78%)
22:07:06:W :WU22:Visualization frame 5 unchanged, skipping
22:09:36:I1:WU21:Completed 197500 out of 250000 steps (79%)
22:09:40:I1:WU22:Completed 15000 out of 250000 steps (6%)
22:12:10:I1:WU21:Completed 200000 out of 250000 steps (80%)
22:12:18:I1:WU21:Checkpoint completed at step 200000
22:12:26:I1:WU22:Completed 17500 out of 250000 steps (7%)
22:12:39:W :WU22:Visualization frame 7 unchanged, skipping
22:14:49:I1:WU21:Completed 202500 out of 250000 steps (81%)
22:15:13:I1:WU22:Completed 20000 out of 250000 steps (8%)
22:17:19:I1:WU21:Completed 205000 out of 250000 steps (82%)
22:17:56:I1:WU22:Completed 22500 out of 250000 steps (9%)
22:18:09:W :WU22:Visualization frame 9 unchanged, skipping
22:19:51:I1:WU21:Completed 207500 out of 250000 steps (83%)
22:20:43:I1:WU22:Completed 25000 out of 250000 steps (10%)
22:22:25:I1:WU21:Completed 210000 out of 250000 steps (84%)
22:23:29:I1:WU22:Completed 27500 out of 250000 steps (11%)
22:24:56:I1:WU21:Completed 212500 out of 250000 steps (85%)
22:25:04:I1:WU21:Checkpoint completed at step 212500
22:26:15:I1:WU22:Completed 30000 out of 250000 steps (12%)
22:26:28:W :WU22:Visualization frame 12 unchanged, skipping
22:27:32:I1:WU21:Completed 215000 out of 250000 steps (86%)
22:29:02:I1:WU22:Completed 32500 out of 250000 steps (13%)
22:30:03:I1:WU21:Completed 217500 out of 250000 steps (87%)
22:31:45:I1:WU22:Completed 35000 out of 250000 steps (14%)
22:31:58:W :WU22:Visualization frame 14 unchanged, skipping
22:32:36:I1:WU21:Completed 220000 out of 250000 steps (88%)
22:34:32:I1:WU22:Completed 37500 out of 250000 steps (15%)
22:35:07:I1:WU21:Completed 222500 out of 250000 steps (89%)
22:37:15:I1:WU22:Completed 40000 out of 250000 steps (16%)
22:37:28:W :WU22:Visualization frame 16 unchanged, skipping
22:37:38:I1:WU21:Completed 225000 out of 250000 steps (90%)
22:37:46:I1:WU21:Checkpoint completed at step 225000
22:40:04:I1:WU22:Completed 42500 out of 250000 steps (17%)
22:40:20:I1:WU21:Completed 227500 out of 250000 steps (91%)
22:42:48:I1:WU22:Completed 45000 out of 250000 steps (18%)
22:42:53:I1:WU21:Completed 230000 out of 250000 steps (92%)
22:43:00:W :WU22:Visualization frame 18 unchanged, skipping
22:45:23:I1:WU21:Completed 232500 out of 250000 steps (93%)
22:45:34:I1:WU22:Completed 47500 out of 250000 steps (19%)
22:47:56:I1:WU21:Completed 235000 out of 250000 steps (94%)
22:48:21:I1:WU22:Completed 50000 out of 250000 steps (20%)
22:50:28:I1:WU21:Completed 237500 out of 250000 steps (95%)
22:50:36:I1:WU21:Checkpoint completed at step 237500
22:51:07:I1:WU22:Completed 52500 out of 250000 steps (21%)
22:51:19:W :WU22:Visualization frame 21 unchanged, skipping
22:53:09:I1:WU21:Completed 240000 out of 250000 steps (96%)
22:53:53:I1:WU22:Completed 55000 out of 250000 steps (22%)
22:55:39:I1:WU21:Completed 242500 out of 250000 steps (97%)
22:56:37:I1:WU22:Completed 57500 out of 250000 steps (23%)
22:56:50:W :WU22:Visualization frame 23 unchanged, skipping
22:58:10:I1:WU21:Completed 245000 out of 250000 steps (98%)
22:59:23:I1:WU22:Completed 60000 out of 250000 steps (24%)
23:00:42:I1:WU21:Completed 247500 out of 250000 steps (99%)
23:02:07:I1:WU22:Completed 62500 out of 250000 steps (25%)
23:02:19:W :WU22:Visualization frame 25 unchanged, skipping
23:03:14:I1:WU21:Completed 250000 out of 250000 steps (100%)
23:03:14:I1:WU21:Average performance: 5.69921 ns/day
23:03:22:I1:WU21:Checkpoint completed at step 250000
23:04:34:I1:WU21:Saving result file ..\logfile_01.txt
23:04:34:I1:WU21:Saving result file checkpointIntegrator.xml
23:04:34:I1:WU21:Saving result file checkpointState.xml.bz2
23:04:34:I1:WU21:Saving result file positions.xtc
23:04:34:I1:WU21:Saving result file science.log
23:04:34:I1:WU21:Saving result file xtcAtoms.csv.bz2
23:04:34:I1:WU21:Folding@home Core Shutdown: FINISHED_UNIT
23:04:42:I1:WU21:Core returned FINISHED_UNIT (100)
23:04:43:I1:Default:Added new work unit: cpus:0 gpus:gpu:03:00:00
23:04:43:I1:WU21:Uploading WU results
23:04:44:I1:WU23:Requesting WU assignment for user azhad team 0
23:04:44:I1:WU22:WARNING:Console control signal 1 on PID 28696
23:04:44:I1:WU22:Exiting, please wait. . .
23:04:44:I1:OUT102:> POST https://assign3.foldingathome.org/api/assign HTTP/1.1
23:04:44:I1:OUT101:> POST https://highland3.seas.upenn.edu/api/results HTTP/1.1
23:04:45:I1:OUT102:< HTTP/1.1 200 HTTP_OK
23:04:45:I1:WU23:Received WU assignment IAFKrY66z0ZCz-pzAw2VxJ6rdrGicEwc6m7-MLIBDcg
23:04:45:I1:WU23:Downloading WU
23:04:45:I1:OUT103:> POST https://highland3.seas.upenn.edu/api/assign HTTP/1.1
23:04:48:I1:WU22:Folding@home Core Shutdown: INTERRUPTED
23:04:49:I1:WU22:Core returned INTERRUPTED (102)
23:04:49:I3:Running FahCore: D:\ProgramData\FAHClient\cores/gromacs-core-a9/windows-10-64bit/cpu-avx2_256-release/fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12/FahCore_a9.exe -dir 9zXqJDVx0SXvPKVcGUZREjIQSz5kOZTR6eCFPXWLC1c -suffix 01 -version 8.4.9 -lifeline 29216 -np 14
23:04:49:I3:WU22:Started FahCore on PID 5812
23:04:49:I1:WU22:*********************** Log Started 2025-03-14T23:04:49Z ***********************
23:04:49:I1:WU22:************************** Gromacs Folding@home Core ***************************
23:04:49:I1:WU22:       Core: Gromacs
23:04:49:I1:WU22:       Type: 0xa9
23:04:49:I1:WU22:    Version: 0.0.12
23:04:49:I1:WU22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:04:49:I1:WU22:  Copyright: 2022 foldingathome.org
23:04:49:I1:WU22:   Homepage: https://foldingathome.org/
23:04:49:I1:WU22:       Date: Nov 15 2022
23:04:49:I1:WU22:       Time: 13:31:08
23:04:49:I1:WU22:   Compiler: Visual C++
23:04:49:I1:WU22:    Options: /TP /std:c++17 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:04:49:I1:WU22:   Platform: win32 10
23:04:49:I1:WU22:       Bits: 64
23:04:49:I1:WU22:       Mode: Release
23:04:49:I1:WU22:       SIMD: avx2_256
23:04:49:I1:WU22:     OpenMP: ON
23:04:49:I1:WU22:       CUDA: OFF
23:04:49:I1:WU22:     OpenCL: OFF
23:04:49:I1:WU22:       Args: -dir 9zXqJDVx0SXvPKVcGUZREjIQSz5kOZTR6eCFPXWLC1c -suffix 01
23:04:49:I1:WU22:             -version 8.4.9 -lifeline 29216 -np 14
23:04:49:I1:WU22:************************************ libFAH ************************************
23:04:49:I1:WU22:       Date: Nov 15 2022
23:04:49:I1:WU22:       Time: 13:30:33
23:04:49:I1:WU22:   Compiler: Visual C++
23:04:49:I1:WU22:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:04:49:I1:WU22:   Platform: win32 10
23:04:49:I1:WU22:       Bits: 64
23:04:49:I1:WU22:       Mode: Release
23:04:49:I1:WU22:************************************ CBang *************************************
23:04:49:I1:WU22:       Date: Nov 15 2022
23:04:49:I1:WU22:       Time: 13:29:57
23:04:49:I1:WU22:   Compiler: Visual C++
23:04:49:I1:WU22:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:04:49:I1:WU22:   Platform: win32 10
23:04:49:I1:WU22:       Bits: 64
23:04:49:I1:WU22:       Mode: Release
23:04:49:I1:WU22:************************************ System ************************************
23:04:49:I1:WU22:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
23:04:49:I1:WU22:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
23:04:49:I1:WU22:       CPUs: 16
23:04:49:I1:WU22:     Memory: 95.16GiB
23:04:49:I1:WU22:Free Memory: 78.40GiB
23:04:49:I1:WU22:    Threads: WINDOWS_THREADS
23:04:49:I1:WU22: OS Version: 6.2
23:04:49:I1:WU22:Has Battery: false
23:04:49:I1:WU22: On Battery: false
23:04:49:I1:WU22: UTC Offset: 5
23:04:49:I1:WU22:        PID: 5812
23:04:49:I1:WU22:        CWD: D:\ProgramData\FAHClient\work
23:04:49:I1:WU22:       Exec: D:\ProgramData\FAHClient\cores\gromacs-core-a9\windows-10-64bit\cpu-avx2_256-release\fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12\FahCore_a9.exe
23:04:49:I1:WU22:********************************************************************************
23:04:49:I1:WU22:Project: 18806 (Run 88, Clone 14, Gen 388)
23:04:49:I1:WU22:Digital signatures verified
23:04:49:I1:WU22:Calling: mdrun -c frame388.gro -s frame388.tpr -x frame388.xtc -cpi state.cpt -cpt 5 -nt 14 -ntmpi 1 -update cpu -nb cpu -bonded cpu -pme cpu -pmefft cpu
23:04:49:I1:WU22:Steps: first=97000000 total=97250000
23:04:54:I1:WU22:Completed 64812 out of 250000 steps (25%)
23:05:06:I1:WU22:Completed 65000 out of 250000 steps (26%)
23:05:07:I1:OUT103:< HTTP/1.1 200 HTTP_OK
23:05:10:I1:WU23:Received WU P18251 R117 C2 G162
23:05:10:I3:Running FahCore: D:\ProgramData\FAHClient\cores/openmm-core-24/windows-10-64bit/release/fahcore-24-windows-10-64bit-release-8.1.4/FahCore_24.exe -dir IAFKrY66z0ZCz-pzAw2VxJ6rdrGicEwc6m7-MLIBDcg -suffix 01 -version 8.4.9 -lifeline 29216 -gpu-platform opencl -gpu-vendor amd -opencl-platform 0 -opencl-device 1 -gpu 1
23:05:10:I3:WU23:Started FahCore on PID 1608
23:05:11:I1:WU23:*********************** Log Started 2025-03-14T23:05:11Z ***********************
23:05:11:I1:WU23:*************************** Core24 Folding@home Core ***************************
23:05:11:I1:WU23:       Core: Core24
23:05:11:I1:WU23:       Type: 0x24
23:05:11:I1:WU23:    Version: 8.1.4
23:05:11:I1:WU23:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:05:11:I1:WU23:  Copyright: 2022 foldingathome.org
23:05:11:I1:WU23:   Homepage: https://foldingathome.org/
23:05:11:I1:WU23:       Date: Jul 25 2024
23:05:11:I1:WU23:       Time: 05:42:49
23:05:11:I1:WU23:   Revision: cf9f0139862b8945a2091772770e4631aac37792
23:05:11:I1:WU23:     Branch: HEAD
23:05:11:I1:WU23:   Compiler: Visual C++
23:05:11:I1:WU23:    Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2
23:05:11:I1:WU23:             /Zc:throwingNew /MT -DOPENMM_VERSION="\"8.1.1\"" /Ox /std:c++14
23:05:11:I1:WU23:   Platform: win32 10
23:05:11:I1:WU23:       Bits: 64
23:05:11:I1:WU23:       Mode: Release
23:05:11:I1:WU23:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
23:05:11:I1:WU23:             <peastman@stanford.edu>
23:05:11:I1:WU23:       Args: -dir IAFKrY66z0ZCz-pzAw2VxJ6rdrGicEwc6m7-MLIBDcg -suffix 01
23:05:11:I1:WU23:             -version 8.4.9 -lifeline 29216 -gpu-platform opencl -gpu-vendor amd
23:05:11:I1:WU23:             -opencl-platform 0 -opencl-device 1 -gpu 1
23:05:11:I1:WU23:************************************ libFAH ************************************
23:05:11:I1:WU23:       Date: Jul 25 2024
23:05:11:I1:WU23:       Time: 05:23:50
23:05:11:I1:WU23:   Revision: c7d2824a47eb025fa8cda8968c7a5e971585d90c
23:05:11:I1:WU23:     Branch: HEAD
23:05:11:I1:WU23:   Compiler: Visual C++
23:05:11:I1:WU23:    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:05:11:I1:WU23:   Platform: win32 10
23:05:11:I1:WU23:       Bits: 64
23:05:11:I1:WU23:       Mode: Release
23:05:11:I1:WU23:************************************ CBang *************************************
23:05:11:I1:WU23:    Version: 1.7.2
23:05:11:I1:WU23:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:05:11:I1:WU23:        Org: Cauldron Development LLC
23:05:11:I1:WU23:  Copyright: Cauldron Development LLC, 2003-2024
23:05:11:I1:WU23:   Homepage: https://cauldrondevelopment.com/
23:05:11:I1:WU23:    License: LGPL-2.1-or-later
23:05:11:I1:WU23:       Date: Jul 25 2024
23:05:11:I1:WU23:       Time: 05:22:43
23:05:11:I1:WU23:   Revision: f1cd4c791e8c40a35dcfeab3ab85d910949cc0cb
23:05:11:I1:WU23:     Branch: HEAD
23:05:11:I1:WU23:   Compiler: Visual C++
23:05:11:I1:WU23:    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
23:05:11:I1:WU23:   Platform: win32 10
23:05:11:I1:WU23:       Bits: 64
23:05:11:I1:WU23:       Mode: Release
23:05:11:I1:WU23:************************************ System ************************************
23:05:11:I1:WU23:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
23:05:11:I1:WU23:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
23:05:11:I1:WU23:       CPUs: 16
23:05:11:I1:WU23:     Memory: 95.16GiB
23:05:11:I1:WU23:Free Memory: 76.04GiB
23:05:11:I1:WU23: OS Version: 10.0
23:05:11:I1:WU23:Has Battery: false
23:05:11:I1:WU23: On Battery: false
23:05:11:I1:WU23:   Hostname: AZHAD-PC2
23:05:11:I1:WU23: UTC Offset: 5
23:05:11:I1:WU23:        PID: 1608
23:05:11:I1:WU23:        CWD: D:\ProgramData\FAHClient\work
23:05:11:I1:WU23:       Exec: D:\ProgramData\FAHClient\cores\openmm-core-24\windows-10-64bit\release\fahcore-24-windows-10-64bit-release-8.1.4\FahCore_24.exe
23:05:11:I1:WU23:************************************ OpenMM ************************************
23:05:11:I1:WU23:    Version: 8.1.1
23:05:11:I1:WU23:********************************************************************************
23:05:11:I1:WU23:Project: 18251 (Run 117, Clone 2, Gen 162)
23:05:11:I1:WU23:Reading tar file core.xml
23:05:11:I1:WU23:Reading tar file integrator.xml
23:05:11:I1:WU23:Reading tar file state.xml.bz2
23:05:12:I1:WU23:Reading tar file system.xml.bz2
23:05:12:I1:WU23:Digital signatures verified
23:05:12:I1:WU23:Folding@home GPU Core24 Folding@home Core
23:05:12:I1:WU23:Version 8.1.4
23:05:12:I1:WU23:  Checkpoint write interval: 12500 steps (5%) [20 total]
23:05:12:I1:WU23:  JSON viewer frame write interval: 2500 steps (1%) [100 total]
23:05:12:I1:WU23:  XTC frame write interval: 5000 steps (2%) [50 total]
23:05:12:I1:WU23:  TRR frame write interval: disabled
23:05:12:I1:WU23:  Global context and integrator variables write interval: disabled
23:05:13:I1:WU23:There are 3 platforms available.
23:05:13:I1:WU23:Platform 0: Reference
23:05:13:I1:WU23:Platform 1: CPU
23:05:13:I1:WU23:Platform 2: OpenCL
23:05:13:I1:WU23:  opencl-device 1 specified
23:05:17:I1:OUT101:< HTTP/1.1 200 HTTP_OK
23:05:17:I1:WU21:Credited
23:07:50:I1:WU22:Completed 67500 out of 250000 steps (27%)
23:08:02:W :WU22:Visualization frame 27 unchanged, skipping
23:08:06:I1:WU23:Attempting to create OpenCL context:
23:08:06:I1:WU23:  Configuring platform OpenCL
23:08:52:I1:WU23:  Using OpenCL on OpenCL platformId 0 and gpu 1
23:08:52:I1:WU23:  GPU info: Platform: OpenCL: AMD Accelerated Parallel Processing
23:08:52:I1:WU23:  GPU info: PlatformIndex: 0
23:08:52:I1:WU23:  GPU info: Device: gfx1030
23:08:52:I1:WU23:  GPU info: DeviceIndex: 1
23:08:52:I1:WU23:  GPU info: Vendor: 0x1002
23:08:52:I1:WU23:  GPU info: PCI: 03:00:00
23:08:52:I1:WU23:  GPU info: Compute: 2.0
23:08:52:I1:WU23:  GPU info: Driver: 3640.0
23:08:52:I1:WU23:  GPU info: GPU: true
23:08:52:I1:WU23:Completed 0 out of 250000 steps (0%)
23:08:59:I1:WU23:Checkpoint completed at step 0
23:10:44:I1:WU22:Completed 70000 out of 250000 steps (28%)
23:11:47:I1:WU23:Completed 2500 out of 250000 steps (1%)
23:13:27:I1:WU22:Completed 72500 out of 250000 steps (29%)
23:13:40:W :WU22:Visualization frame 29 unchanged, skipping
23:14:32:I1:WU23:Completed 5000 out of 250000 steps (2%)
23:16:14:I1:WU22:Completed 75000 out of 250000 steps (30%)
23:17:14:I1:WU23:Completed 7500 out of 250000 steps (3%)
23:18:57:I1:WU22:Completed 77500 out of 250000 steps (31%)
23:19:10:W :WU22:Visualization frame 31 unchanged, skipping
23:19:55:I1:WU23:Completed 10000 out of 250000 steps (4%)
23:21:44:I1:WU22:Completed 80000 out of 250000 steps (32%)
23:22:34:I1:WU23:Completed 12500 out of 250000 steps (5%)
23:22:42:I1:WU23:Checkpoint completed at step 12500
23:24:30:I1:WU22:Completed 82500 out of 250000 steps (33%)
23:24:43:W :WU22:Visualization frame 33 unchanged, skipping
23:25:23:I1:WU23:Completed 15000 out of 250000 steps (6%)
23:27:17:I1:WU22:Completed 85000 out of 250000 steps (34%)
23:28:04:I1:WU23:Completed 17500 out of 250000 steps (7%)
23:30:04:I1:WU22:Completed 87500 out of 250000 steps (35%)
23:30:43:I1:WU23:Completed 20000 out of 250000 steps (8%)
23:32:47:I1:WU22:Completed 90000 out of 250000 steps (36%)
23:33:00:W :WU22:Visualization frame 36 unchanged, skipping
23:33:22:I1:WU23:Completed 22500 out of 250000 steps (9%)
23:35:34:I1:WU22:Completed 92500 out of 250000 steps (37%)
23:36:02:I1:WU23:Completed 25000 out of 250000 steps (10%)
23:36:10:I1:WU23:Checkpoint completed at step 25000
23:38:20:I1:WU22:Completed 95000 out of 250000 steps (38%)
23:38:32:W :WU22:Visualization frame 38 unchanged, skipping
23:38:50:I1:WU23:Completed 27500 out of 250000 steps (11%)
23:41:07:I1:WU22:Completed 97500 out of 250000 steps (39%)
23:41:28:I1:WU23:Completed 30000 out of 250000 steps (12%)
23:43:50:I1:WU22:Completed 100000 out of 250000 steps (40%)
23:44:07:I1:WU23:Completed 32500 out of 250000 steps (13%)
23:46:37:I1:WU22:Completed 102500 out of 250000 steps (41%)
23:46:49:I1:WU23:Completed 35000 out of 250000 steps (14%)
23:49:19:I1:WU22:Completed 105000 out of 250000 steps (42%)
23:49:29:I1:WU23:Completed 37500 out of 250000 steps (15%)
23:49:32:W :WU22:Visualization frame 42 unchanged, skipping
23:49:37:I1:WU23:Checkpoint completed at step 37500
23:52:09:I1:WU22:Completed 107500 out of 250000 steps (43%)
23:52:17:I1:WU23:Completed 40000 out of 250000 steps (16%)
23:54:52:I1:WU22:Completed 110000 out of 250000 steps (44%)
23:55:00:I1:WU23:Completed 42500 out of 250000 steps (17%)
23:55:05:W :WU22:Visualization frame 44 unchanged, skipping
23:57:39:I1:WU22:Completed 112500 out of 250000 steps (45%)
23:57:41:I1:WU23:Completed 45000 out of 250000 steps (18%)
*********************** Log Started 2025-03-14T23:59:58Z ***********************
00:00:22:I1:WU23:Completed 47500 out of 250000 steps (19%)
00:00:26:I1:WU22:Completed 115000 out of 250000 steps (46%)
00:03:02:I1:WU23:Completed 50000 out of 250000 steps (20%)
00:03:10:I1:WU23:Checkpoint completed at step 50000
00:03:12:I1:WU22:Completed 117500 out of 250000 steps (47%)
00:03:25:W :WU22:Visualization frame 47 unchanged, skipping
00:05:50:I1:WU23:Completed 52500 out of 250000 steps (21%)
00:05:59:I1:WU22:Completed 120000 out of 250000 steps (48%)
00:08:31:I1:WU23:Completed 55000 out of 250000 steps (22%)
00:08:42:I1:WU22:Completed 122500 out of 250000 steps (49%)
00:08:55:W :WU22:Visualization frame 49 unchanged, skipping
00:11:09:I1:WU23:Completed 57500 out of 250000 steps (23%)
00:11:29:I1:WU22:Completed 125000 out of 250000 steps (50%)
00:13:47:I1:WU23:Completed 60000 out of 250000 steps (24%)
00:14:12:I1:WU22:Completed 127500 out of 250000 steps (51%)
00:14:25:W :WU22:Visualization frame 51 unchanged, skipping
00:16:29:I1:WU23:Completed 62500 out of 250000 steps (25%)
00:16:37:I1:WU23:Checkpoint completed at step 62500
00:17:01:I1:WU22:Completed 130000 out of 250000 steps (52%)
00:19:15:I1:WU23:Completed 65000 out of 250000 steps (26%)
00:19:45:I1:WU22:Completed 132500 out of 250000 steps (53%)
00:19:56:W :WU22:Visualization frame 53 unchanged, skipping
00:21:55:I1:WU23:Completed 67500 out of 250000 steps (27%)
00:22:31:I1:WU22:Completed 135000 out of 250000 steps (54%)
00:24:33:I1:WU23:Completed 70000 out of 250000 steps (28%)
00:25:18:I1:WU22:Completed 137500 out of 250000 steps (55%)
00:27:12:I1:WU23:Completed 72500 out of 250000 steps (29%)
00:28:01:I1:WU22:Completed 140000 out of 250000 steps (56%)
00:28:14:W :WU22:Visualization frame 56 unchanged, skipping
00:29:52:I1:WU23:Completed 75000 out of 250000 steps (30%)
00:29:58:I1:WU23:Checkpoint completed at step 75000
00:30:50:I1:WU22:Completed 142500 out of 250000 steps (57%)
00:32:35:I1:WU23:Completed 77500 out of 250000 steps (31%)
00:33:33:I1:WU22:Completed 145000 out of 250000 steps (58%)
00:33:46:W :WU22:Visualization frame 58 unchanged, skipping
00:35:15:I1:WU23:Completed 80000 out of 250000 steps (32%)
00:36:20:I1:WU22:Completed 147500 out of 250000 steps (59%)
00:37:55:I1:WU23:Completed 82500 out of 250000 steps (33%)
00:39:03:I1:WU22:Completed 150000 out of 250000 steps (60%)
00:40:36:I1:WU23:Completed 85000 out of 250000 steps (34%)
00:41:50:I1:WU22:Completed 152500 out of 250000 steps (61%)
00:43:16:I1:WU23:Completed 87500 out of 250000 steps (35%)
00:43:24:I1:WU23:Checkpoint completed at step 87500
00:44:36:I1:WU22:Completed 155000 out of 250000 steps (62%)
00:44:49:W :WU22:Visualization frame 62 unchanged, skipping
00:46:03:I1:WU23:Completed 90000 out of 250000 steps (36%)
00:47:23:I1:WU22:Completed 157500 out of 250000 steps (63%)
00:48:42:I1:WU23:Completed 92500 out of 250000 steps (37%)
00:50:10:I1:WU22:Completed 160000 out of 250000 steps (64%)
00:51:21:I1:WU23:Completed 95000 out of 250000 steps (38%)
00:52:53:I1:WU22:Completed 162500 out of 250000 steps (65%)
00:53:06:W :WU22:Visualization frame 65 unchanged, skipping
00:53:59:I1:WU23:Completed 97500 out of 250000 steps (39%)
00:55:40:I1:WU22:Completed 165000 out of 250000 steps (66%)
00:56:38:I1:WU23:Completed 100000 out of 250000 steps (40%)
00:56:46:I1:WU23:Checkpoint completed at step 100000
00:58:26:I1:WU22:Completed 167500 out of 250000 steps (67%)
00:58:38:W :WU22:Visualization frame 67 unchanged, skipping
00:59:27:I1:WU23:Completed 102500 out of 250000 steps (41%)
01:01:13:I1:WU22:Completed 170000 out of 250000 steps (68%)
01:02:05:I1:WU23:Completed 105000 out of 250000 steps (42%)
01:03:56:I1:WU22:Completed 172500 out of 250000 steps (69%)
01:04:08:W :WU22:Visualization frame 69 unchanged, skipping
01:04:45:I1:WU23:Completed 107500 out of 250000 steps (43%)
01:06:43:I1:WU22:Completed 175000 out of 250000 steps (70%)
01:07:26:I1:WU23:Completed 110000 out of 250000 steps (44%)
01:09:26:I1:WU22:Completed 177500 out of 250000 steps (71%)
01:09:39:W :WU22:Visualization frame 71 unchanged, skipping
01:10:08:I1:WU23:Completed 112500 out of 250000 steps (45%)
01:10:16:I1:WU23:Checkpoint completed at step 112500
01:12:16:I1:WU22:Completed 180000 out of 250000 steps (72%)
01:12:55:I1:WU23:Completed 115000 out of 250000 steps (46%)
01:15:03:I1:WU22:Completed 182500 out of 250000 steps (73%)
01:15:36:I1:WU23:Completed 117500 out of 250000 steps (47%)
01:17:46:I1:WU22:Completed 185000 out of 250000 steps (74%)
01:17:59:W :WU22:Visualization frame 74 unchanged, skipping
01:18:17:I1:WU23:Completed 120000 out of 250000 steps (48%)
01:20:33:I1:WU22:Completed 187500 out of 250000 steps (75%)
01:20:59:I1:WU23:Completed 122500 out of 250000 steps (49%)
01:23:16:I1:WU22:Completed 190000 out of 250000 steps (76%)
01:23:29:W :WU22:Visualization frame 76 unchanged, skipping
01:23:39:I1:WU23:Completed 125000 out of 250000 steps (50%)
01:23:47:I1:WU23:Checkpoint completed at step 125000
01:26:06:I1:WU22:Completed 192500 out of 250000 steps (77%)
01:26:29:I1:WU23:Completed 127500 out of 250000 steps (51%)
01:28:49:I1:WU22:Completed 195000 out of 250000 steps (78%)
01:29:02:W :WU22:Visualization frame 78 unchanged, skipping
01:29:10:I1:WU23:Completed 130000 out of 250000 steps (52%)
01:31:36:I1:WU22:Completed 197500 out of 250000 steps (79%)
01:31:49:I1:WU23:Completed 132500 out of 250000 steps (53%)
01:34:20:I1:WU22:Completed 200000 out of 250000 steps (80%)
01:34:32:I1:WU23:Completed 135000 out of 250000 steps (54%)
01:37:07:I1:WU22:Completed 202500 out of 250000 steps (81%)
01:37:10:I1:WU23:Completed 137500 out of 250000 steps (55%)
01:37:19:I1:WU23:Checkpoint completed at step 137500
01:39:53:I1:WU22:Completed 205000 out of 250000 steps (82%)
01:40:01:I1:WU23:Completed 140000 out of 250000 steps (56%)
01:42:39:I1:WU22:Completed 207500 out of 250000 steps (83%)
01:42:39:I1:WU23:Completed 142500 out of 250000 steps (57%)
01:42:52:W :WU22:Visualization frame 83 unchanged, skipping
01:45:20:I1:WU23:Completed 145000 out of 250000 steps (58%)
01:45:26:I1:WU22:Completed 210000 out of 250000 steps (84%)
01:48:00:I1:WU23:Completed 147500 out of 250000 steps (59%)
01:48:09:I1:WU22:Completed 212500 out of 250000 steps (85%)
01:48:22:W :WU22:Visualization frame 85 unchanged, skipping
01:50:38:I1:WU23:Completed 150000 out of 250000 steps (60%)
01:50:46:I1:WU23:Checkpoint completed at step 150000
01:50:59:I1:WU22:Completed 215000 out of 250000 steps (86%)
01:53:24:I1:WU23:Completed 152500 out of 250000 steps (61%)
01:53:42:I1:WU22:Completed 217500 out of 250000 steps (87%)
01:53:55:W :WU22:Visualization frame 87 unchanged, skipping
01:56:04:I1:WU23:Completed 155000 out of 250000 steps (62%)
01:56:29:I1:WU22:Completed 220000 out of 250000 steps (88%)
01:58:43:I1:WU23:Completed 157500 out of 250000 steps (63%)
01:59:12:I1:WU22:Completed 222500 out of 250000 steps (89%)
01:59:26:W :WU22:Visualization frame 89 unchanged, skipping
02:01:22:I1:WU23:Completed 160000 out of 250000 steps (64%)
02:01:59:I1:WU22:Completed 225000 out of 250000 steps (90%)
02:04:02:I1:WU23:Completed 162500 out of 250000 steps (65%)
02:04:10:I1:WU23:Checkpoint completed at step 162500
02:04:45:I1:WU22:Completed 227500 out of 250000 steps (91%)
02:04:58:W :WU22:Visualization frame 91 unchanged, skipping
02:06:24:I1:WU23:Completed 165000 out of 250000 steps (66%)
02:06:24:I1:WU23:Caught signal SIGABRT(22)
02:06:24:I1:WU23:WARNING:Unexpected exit
02:06:25:E :WU23:Core returned EARLY_UNIT_END (123)
02:06:25:E :WU23:Run did not produce any results. Dumping WU
02:06:25:I1:Default:Added new work unit: cpus:0 gpus:gpu:03:00:00
02:06:25:I1:WU23:Sending dump report
02:06:25:I1:WU24:Requesting WU assignment for user azhad team 0
02:06:25:I1:WU22:WARNING:Console control signal 1 on PID 5812
02:06:25:I1:WU22:Exiting, please wait. . .
02:06:25:I1:OUT105:> POST https://highland3.seas.upenn.edu/api/results HTTP/1.1
02:06:25:I1:OUT106:> POST https://assign4.foldingathome.org/api/assign HTTP/1.1
02:06:26:I1:OUT105:< HTTP/1.1 200 HTTP_OK
02:06:26:I1:WU23:Dumped
02:06:26:I1:OUT106:< HTTP/1.1 200 HTTP_OK
02:06:26:I1:WU24:Received WU assignment KqUy6IqsFOT5p5bvnsn401-8r7X-ra7Go-cqrGIBGo0
02:06:26:I1:WU24:Downloading WU
02:06:26:I1:OUT107:> POST https://highland1.seas.upenn.edu/api/assign HTTP/1.1
02:06:27:I1:OUT107:< HTTP/1.1 503 HTTP_SERVICE_UNAVAILABLE
02:06:27:E :OUT107:HTTP_SERVICE_UNAVAILABLE: {"error":{"message":"Please wait","code":503}}
02:06:27:I1:WU24:Retry #1 in 2 secs
02:06:29:I1:WU24:Requesting WU assignment for user azhad team 0
02:06:29:I1:OUT108:> POST https://assign5.foldingathome.org/api/assign HTTP/1.1
02:06:30:I1:WU22:Folding@home Core Shutdown: INTERRUPTED
02:06:30:I1:WU22:Core returned INTERRUPTED (102)
02:06:30:I3:Running FahCore: D:\ProgramData\FAHClient\cores/gromacs-core-a9/windows-10-64bit/cpu-avx2_256-release/fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12/FahCore_a9.exe -dir 9zXqJDVx0SXvPKVcGUZREjIQSz5kOZTR6eCFPXWLC1c -suffix 01 -version 8.4.9 -lifeline 29216 -np 15
02:06:30:I3:WU22:Started FahCore on PID 8544
02:06:30:I1:OUT108:< HTTP/1.1 200 HTTP_OK
02:06:30:I1:WU24:Received WU assignment 4s45uG5X-p3RUrw4dUVLgSf1fLmVi89m8gwueU1OTP4
02:06:30:I1:WU24:Downloading WU
02:06:30:I1:OUT109:> POST https://highland3.seas.upenn.edu/api/assign HTTP/1.1
02:06:31:E :WU22:Core exited with Windows unhandled exception code 0xc000013a.  See https://bit.ly/2CXgWkZ for more information.
02:06:31:E :WU22:Core returned FAILED_1 (0)
02:06:31:E :WU22:The folding core did not produce any log output.  This indicates that the core is not functional on your system.  Check for missing libraries or GPU drivers.  Make a post about your issue on https://foldingforum.org/ to get more help.
02:06:31:E :WU22:Run did not produce any results. Dumping WU
02:06:31:I1:WU22:Sending dump report
02:06:31:I1:OUT110:> POST https://fahserver1.flatironinstitute.org/api/results HTTP/1.1
02:06:32:I1:OUT110:< HTTP/1.1 200 HTTP_OK
02:06:32:I1:WU22:Dumped
Instead of dumping the WU, why not roll back and redo the calculation from the checkpoint before the last one up to the last checkpoint? If the results match, that gives the last checkpoint some validity and the run could resume from it.
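
To make the suggestion concrete, here is a rough sketch of the rollback check I have in mind. It is not how FahCore or OpenMM works internally: `step`, `advance`, `build_checkpoints` and `recover` are hypothetical names, and the "simulation" is a toy deterministic update. Only the 12500-step checkpoint interval is taken from the log above. A real GPU run may not be bitwise reproducible, so the comparison uses a tolerance instead of exact equality.

```python
import math


def step(state):
    """Toy deterministic update standing in for one simulation step."""
    x, v = state
    v = v - 0.01 * x
    x = x + 0.01 * v
    return (x, v)


def advance(state, n_steps):
    """Run the toy simulation forward by n_steps."""
    for _ in range(n_steps):
        state = step(state)
    return state


def build_checkpoints(initial, interval, count):
    """Return {step_number: state} for step 0 plus `count` later checkpoints."""
    checkpoints = {0: initial}
    state = initial
    for i in range(1, count + 1):
        state = advance(state, interval)
        checkpoints[i * interval] = state
    return checkpoints


def states_match(a, b, tol=1e-9):
    """Compare two states within a tolerance (assumed; real cores may differ)."""
    return all(math.isclose(p, q, rel_tol=tol, abs_tol=tol) for p, q in zip(a, b))


def recover(checkpoints):
    """Pick the checkpoint to resume from after a crash.

    Recompute from the second-to-last checkpoint up to the last one.
    If the recomputed state agrees with the stored last checkpoint,
    resume from the last checkpoint; otherwise fall back one checkpoint.
    """
    if not checkpoints:
        raise ValueError("no checkpoints to recover from")
    steps = sorted(checkpoints)
    if len(steps) < 2:
        return steps[-1], checkpoints[steps[-1]]
    prev_step, last_step = steps[-2], steps[-1]
    recomputed = advance(checkpoints[prev_step], last_step - prev_step)
    if states_match(recomputed, checkpoints[last_step]):
        return last_step, checkpoints[last_step]
    return prev_step, checkpoints[prev_step]


if __name__ == "__main__":
    cps = build_checkpoints((1.0, 0.0), 12500, 2)
    print("resume from step", recover(cps)[0])

    # Simulate a corrupted last checkpoint (e.g. a memory error on the GPU).
    x, v = cps[25000]
    cps[25000] = (x + 1.0, v)
    print("resume from step", recover(cps)[0])
```

With a good last checkpoint the sketch resumes from it; with a corrupted one it falls back to the previous checkpoint, so at most two checkpoint intervals are lost instead of the whole WU.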
azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Re: Too many Core Dumped on GPU

Post by azhad »

@muziqaz: Noted your points. I am in the process of trying them out.

WU27:

06:34:55:I1:WU27:Received WU P18251 R88 C2 G185
06:34:56:I1:OUT120:< HTTP/1.1 200 HTTP_OK
06:34:56:I1:WU24:Credited
06:34:56:I3:Running FahCore: D:\ProgramData\FAHClient\cores/openmm-core-24/windows-10-64bit/release/fahcore-24-windows-10-64bit-release-8.1.4/FahCore_24.exe -dir e_rhM5Fm7CQ1Eqohun5BIg5ytNVOfbXELoVTy0eT95c -suffix 01 -version 8.4.9 -lifeline 29216 -gpu-platform opencl -gpu-vendor amd -opencl-platform 0 -opencl-device 1 -gpu 1
06:34:56:I3:WU27:Started FahCore on PID 20900
06:34:56:I1:WU27:*********************** Log Started 2025-03-15T06:34:56Z ***********************
06:34:56:I1:WU27:*************************** Core24 Folding@home Core ***************************
06:34:56:I1:WU27:       Core: Core24
06:34:56:I1:WU27:       Type: 0x24
06:34:56:I1:WU27:    Version: 8.1.4
06:34:56:I1:WU27:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
06:34:56:I1:WU27:  Copyright: 2022 foldingathome.org
06:34:56:I1:WU27:   Homepage: https://foldingathome.org/
06:34:56:I1:WU27:       Date: Jul 25 2024
06:34:56:I1:WU27:       Time: 05:42:49
06:34:56:I1:WU27:   Revision: cf9f0139862b8945a2091772770e4631aac37792
06:34:56:I1:WU27:     Branch: HEAD
06:34:56:I1:WU27:   Compiler: Visual C++
06:34:56:I1:WU27:    Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2
06:34:56:I1:WU27:             /Zc:throwingNew /MT -DOPENMM_VERSION="\"8.1.1\"" /Ox /std:c++14
06:34:56:I1:WU27:   Platform: win32 10
06:34:56:I1:WU27:       Bits: 64
06:34:56:I1:WU27:       Mode: Release
06:34:56:I1:WU27:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
06:34:56:I1:WU27:             <peastman@stanford.edu>
06:34:56:I1:WU27:       Args: -dir e_rhM5Fm7CQ1Eqohun5BIg5ytNVOfbXELoVTy0eT95c -suffix 01
06:34:56:I1:WU27:             -version 8.4.9 -lifeline 29216 -gpu-platform opencl -gpu-vendor amd
06:34:56:I1:WU27:             -opencl-platform 0 -opencl-device 1 -gpu 1
06:34:56:I1:WU27:************************************ libFAH ************************************
06:34:56:I1:WU27:       Date: Jul 25 2024
06:34:56:I1:WU27:       Time: 05:23:50
06:34:56:I1:WU27:   Revision: c7d2824a47eb025fa8cda8968c7a5e971585d90c
06:34:56:I1:WU27:     Branch: HEAD
06:34:56:I1:WU27:   Compiler: Visual C++
06:34:56:I1:WU27:    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
06:34:56:I1:WU27:   Platform: win32 10
06:34:56:I1:WU27:       Bits: 64
06:34:56:I1:WU27:       Mode: Release
06:34:56:I1:WU27:************************************ CBang *************************************
06:34:56:I1:WU27:    Version: 1.7.2
06:34:56:I1:WU27:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
06:34:56:I1:WU27:        Org: Cauldron Development LLC
06:34:56:I1:WU27:  Copyright: Cauldron Development LLC, 2003-2024
06:34:56:I1:WU27:   Homepage: https://cauldrondevelopment.com/
06:34:56:I1:WU27:    License: LGPL-2.1-or-later
06:34:56:I1:WU27:       Date: Jul 25 2024
06:34:56:I1:WU27:       Time: 05:22:43
06:34:56:I1:WU27:   Revision: f1cd4c791e8c40a35dcfeab3ab85d910949cc0cb
06:34:56:I1:WU27:     Branch: HEAD
06:34:56:I1:WU27:   Compiler: Visual C++
06:34:56:I1:WU27:    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
06:34:56:I1:WU27:   Platform: win32 10
06:34:56:I1:WU27:       Bits: 64
06:34:56:I1:WU27:       Mode: Release
06:34:56:I1:WU27:************************************ System ************************************
06:34:56:I1:WU27:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
06:34:56:I1:WU27:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
06:34:56:I1:WU27:       CPUs: 16
06:34:56:I1:WU27:     Memory: 95.16GiB
06:34:56:I1:WU27:Free Memory: 73.98GiB
06:34:56:I1:WU27: OS Version: 10.0
06:34:56:I1:WU27:Has Battery: false
06:34:56:I1:WU27: On Battery: false
06:34:56:I1:WU27:   Hostname: AZHAD-PC2
06:34:56:I1:WU27: UTC Offset: 5
06:34:56:I1:WU27:        PID: 20900
06:34:56:I1:WU27:        CWD: D:\ProgramData\FAHClient\work
06:34:56:I1:WU27:       Exec: D:\ProgramData\FAHClient\cores\openmm-core-24\windows-10-64bit\release\fahcore-24-windows-10-64bit-release-8.1.4\FahCore_24.exe
06:34:56:I1:WU27:************************************ OpenMM ************************************
06:34:56:I1:WU27:    Version: 8.1.1
06:34:56:I1:WU27:********************************************************************************
06:34:56:I1:WU27:Project: 18251 (Run 88, Clone 2, Gen 185)
06:34:56:I1:WU27:Reading tar file core.xml
06:34:56:I1:WU27:Reading tar file integrator.xml
06:34:56:I1:WU27:Reading tar file state.xml.bz2
06:34:56:I1:WU27:Reading tar file system.xml.bz2
06:34:56:I1:WU27:Digital signatures verified
06:34:56:I1:WU27:Folding@home GPU Core24 Folding@home Core
06:34:56:I1:WU27:Version 8.1.4
06:34:57:I1:WU27:  Checkpoint write interval: 12500 steps (5%) [20 total]
06:34:57:I1:WU27:  JSON viewer frame write interval: 2500 steps (1%) [100 total]
06:34:57:I1:WU27:  XTC frame write interval: 5000 steps (2%) [50 total]
06:34:57:I1:WU27:  TRR frame write interval: disabled
06:34:57:I1:WU27:  Global context and integrator variables write interval: disabled
06:34:57:I1:WU27:There are 3 platforms available.
06:34:57:I1:WU27:Platform 0: Reference
06:34:57:I1:WU27:Platform 1: CPU
06:34:57:I1:WU27:Platform 2: OpenCL
06:34:57:I1:WU27:  opencl-device 1 specified
06:36:45:I1:WU26:Completed 182500 out of 250000 steps (73%)
06:37:49:I1:WU27:Attempting to create OpenCL context:
06:37:49:I1:WU27:  Configuring platform OpenCL
06:38:33:I1:WU27:  Using OpenCL on OpenCL platformId 0 and gpu 1
06:38:33:I1:WU27:  GPU info: Platform: OpenCL: AMD Accelerated Parallel Processing
06:38:33:I1:WU27:  GPU info: PlatformIndex: 0
06:38:33:I1:WU27:  GPU info: Device: gfx1030
06:38:33:I1:WU27:  GPU info: DeviceIndex: 1
06:38:33:I1:WU27:  GPU info: Vendor: 0x1002
06:38:33:I1:WU27:  GPU info: PCI: 03:00:00
06:38:33:I1:WU27:  GPU info: Compute: 2.0
06:38:33:I1:WU27:  GPU info: Driver: 3640.0
06:38:33:I1:WU27:  GPU info: GPU: true
06:38:33:I1:WU27:Completed 0 out of 250000 steps (0%)
06:38:39:I1:WU27:Checkpoint completed at step 0
06:39:35:I1:WU26:Completed 185000 out of 250000 steps (74%)
06:39:48:W :WU26:Visualization frame 74 unchanged, skipping
06:41:27:I1:WU27:Completed 2500 out of 250000 steps (1%)
06:42:21:I1:WU26:Completed 187500 out of 250000 steps (75%)
06:44:11:I1:WU27:Completed 5000 out of 250000 steps (2%)
06:45:07:I1:WU26:Completed 190000 out of 250000 steps (76%)
06:46:51:I1:WU27:Completed 7500 out of 250000 steps (3%)
06:47:49:I1:WU26:Completed 192500 out of 250000 steps (77%)
06:48:01:W :WU26:Visualization frame 77 unchanged, skipping
06:49:29:I1:WU27:Completed 10000 out of 250000 steps (4%)
06:50:34:I1:WU26:Completed 195000 out of 250000 steps (78%)
06:52:04:I1:WU27:Completed 12500 out of 250000 steps (5%)
06:52:12:I1:WU27:Checkpoint completed at step 12500
06:53:19:I1:WU26:Completed 197500 out of 250000 steps (79%)
06:53:32:W :WU26:Visualization frame 79 unchanged, skipping
06:54:49:I1:WU27:Completed 15000 out of 250000 steps (6%)
06:56:05:I1:WU26:Completed 200000 out of 250000 steps (80%)
06:57:26:I1:WU27:Completed 17500 out of 250000 steps (7%)
06:58:47:I1:WU26:Completed 202500 out of 250000 steps (81%)
06:59:00:W :WU26:Visualization frame 81 unchanged, skipping
07:00:03:I1:WU27:Completed 20000 out of 250000 steps (8%)
07:01:33:I1:WU26:Completed 205000 out of 250000 steps (82%)
07:02:39:I1:WU27:Completed 22500 out of 250000 steps (9%)
07:04:15:I1:WU26:Completed 207500 out of 250000 steps (83%)
07:04:28:W :WU26:Visualization frame 83 unchanged, skipping
07:05:12:I1:WU27:Completed 25000 out of 250000 steps (10%)
07:05:20:I1:WU27:Checkpoint completed at step 25000
07:07:04:I1:WU26:Completed 210000 out of 250000 steps (84%)
07:07:56:I1:WU27:Completed 27500 out of 250000 steps (11%)
07:09:49:I1:WU26:Completed 212500 out of 250000 steps (85%)
07:10:32:I1:WU27:Completed 30000 out of 250000 steps (12%)
07:12:32:I1:WU26:Completed 215000 out of 250000 steps (86%)
07:12:44:W :WU26:Visualization frame 86 unchanged, skipping
07:13:08:I1:WU27:Completed 32500 out of 250000 steps (13%)
07:15:17:I1:WU26:Completed 217500 out of 250000 steps (87%)
07:15:45:I1:WU27:Completed 35000 out of 250000 steps (14%)
07:18:00:I1:WU26:Completed 220000 out of 250000 steps (88%)
07:18:13:W :WU26:Visualization frame 88 unchanged, skipping
07:18:24:I1:WU27:Completed 37500 out of 250000 steps (15%)
07:18:32:I1:WU27:Checkpoint completed at step 37500
07:20:49:I1:WU26:Completed 222500 out of 250000 steps (89%)
07:21:10:I1:WU27:Completed 40000 out of 250000 steps (16%)
07:23:31:I1:WU26:Completed 225000 out of 250000 steps (90%)
07:23:44:W :WU26:Visualization frame 90 unchanged, skipping
07:23:47:I1:WU27:Completed 42500 out of 250000 steps (17%)
07:26:17:I1:WU26:Completed 227500 out of 250000 steps (91%)
07:26:23:I1:WU27:Completed 45000 out of 250000 steps (18%)
07:29:00:I1:WU26:Completed 230000 out of 250000 steps (92%)
07:29:02:I1:WU27:Completed 47500 out of 250000 steps (19%)
07:29:13:W :WU26:Visualization frame 92 unchanged, skipping
07:31:37:I1:WU27:Completed 50000 out of 250000 steps (20%)
07:31:45:I1:WU27:Checkpoint completed at step 50000
07:31:49:I1:WU26:Completed 232500 out of 250000 steps (93%)
07:34:22:I1:WU27:Completed 52500 out of 250000 steps (21%)
07:34:32:I1:WU26:Completed 235000 out of 250000 steps (94%)
07:34:44:W :WU26:Visualization frame 94 unchanged, skipping
07:36:57:I1:WU27:Completed 55000 out of 250000 steps (22%)
07:37:17:I1:WU26:Completed 237500 out of 250000 steps (95%)
07:39:32:I1:WU27:Completed 57500 out of 250000 steps (23%)
07:40:03:I1:WU26:Completed 240000 out of 250000 steps (96%)
07:42:07:I1:WU27:Completed 60000 out of 250000 steps (24%)
07:42:46:I1:WU26:Completed 242500 out of 250000 steps (97%)
07:42:59:W :WU26:Visualization frame 97 unchanged, skipping
07:44:41:I1:WU27:Completed 62500 out of 250000 steps (25%)
07:44:48:I1:WU27:Checkpoint completed at step 62500
07:45:34:I1:WU26:Completed 245000 out of 250000 steps (98%)
07:47:23:I1:WU27:Completed 65000 out of 250000 steps (26%)
07:48:17:I1:WU26:Completed 247500 out of 250000 steps (99%)
07:48:29:W :WU26:Visualization frame 99 unchanged, skipping
07:49:59:I1:WU27:Completed 67500 out of 250000 steps (27%)
07:51:02:I1:WU26:Completed 250000 out of 250000 steps (100%)
07:51:10:I1:WU26:Saving result file ..\logfile_01.txt
07:51:10:I1:WU26:Saving result file frame403.gro
07:51:10:I1:WU26:Saving result file frame403.xtc
07:51:10:I1:WU26:Saving result file md.log
07:51:10:I1:WU26:Saving result file science.log
07:51:10:I1:WU26:Saving result file state.cpt
07:51:10:I1:WU26:Folding@home Core Shutdown: FINISHED_UNIT
07:51:11:I1:WU26:Core returned FINISHED_UNIT (100)
07:51:13:I1:Default:Added new work unit: cpus:14 gpus:
07:51:13:I1:WU26:Uploading WU results
07:51:13:I1:WU28:Requesting WU assignment for user azhad team 0
07:51:13:I1:OUT124:> POST https://fahserver1.flatironinstitute.org/api/results HTTP/1.1
07:51:13:I1:OUT125:> POST https://assign3.foldingathome.org/api/assign HTTP/1.1
07:51:14:I1:OUT125:< HTTP/1.1 200 HTTP_OK
07:51:14:I1:WU28:Received WU assignment es_Urn2HVtfFsQPRAa_7sBoVEC7UX8paRlreTrrYeb4
07:51:14:I1:WU28:Downloading WU
07:51:15:I1:OUT126:> POST https://vav17.fah.temple.edu/api/assign HTTP/1.1
07:51:19:I1:OUT126:< HTTP/1.1 200 HTTP_OK
07:51:19:I1:WU28:Received WU P12485 R12 C11 G37
07:51:19:I3:Running FahCore: D:\ProgramData\FAHClient\cores/fahcore-a8-win-64bit-avx2_256-0.0.12/FahCore_a8.exe -dir es_Urn2HVtfFsQPRAa_7sBoVEC7UX8paRlreTrrYeb4 -suffix 01 -version 8.4.9 -lifeline 29216 -np 14
07:51:19:I3:WU28:Started FahCore on PID 21232
07:51:19:I1:WU28:*********************** Log Started 2025-03-15T07:51:19Z ***********************
07:51:19:I1:WU28:************************** Gromacs Folding@home Core ***************************
07:51:19:I1:WU28:       Core: Gromacs
07:51:19:I1:WU28:       Type: 0xa8
07:51:19:I1:WU28:    Version: 0.0.12
07:51:19:I1:WU28:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:51:19:I1:WU28:  Copyright: 2020 foldingathome.org
07:51:19:I1:WU28:   Homepage: https://foldingathome.org/
07:51:19:I1:WU28:       Date: Jan 16 2021
07:51:19:I1:WU28:       Time: 12:29:40
07:51:19:I1:WU28:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
07:51:19:I1:WU28:     Branch: master
07:51:19:I1:WU28:   Compiler: Visual C++ 2019 16.7
07:51:19:I1:WU28:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
07:51:19:I1:WU28:   Platform: win32 10
07:51:19:I1:WU28:       Bits: 64
07:51:19:I1:WU28:       Mode: Release
07:51:19:I1:WU28:       SIMD: avx2_256
07:51:19:I1:WU28:     OpenMP: ON
07:51:19:I1:WU28:       CUDA: OFF
07:51:19:I1:WU28:       Args: -dir es_Urn2HVtfFsQPRAa_7sBoVEC7UX8paRlreTrrYeb4 -suffix 01
07:51:19:I1:WU28:             -version 8.4.9 -lifeline 29216 -np 14
07:51:19:I1:WU28:************************************ libFAH ************************************
07:51:19:I1:WU28:       Date: Jan 16 2021
07:51:19:I1:WU28:       Time: 11:24:13
07:51:19:I1:WU28:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
07:51:19:I1:WU28:     Branch: master
07:51:19:I1:WU28:   Compiler: Visual C++ 2019 16.7
07:51:19:I1:WU28:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
07:51:19:I1:WU28:   Platform: win32 10
07:51:19:I1:WU28:       Bits: 64
07:51:19:I1:WU28:       Mode: Release
07:51:19:I1:WU28:************************************ CBang *************************************
07:51:19:I1:WU28:       Date: Jan 16 2021
07:51:19:I1:WU28:       Time: 11:23:53
07:51:19:I1:WU28:   Revision: c5816759c404e4b65f9f364c3d1ef554a67c4225
07:51:19:I1:WU28:     Branch: master
07:51:19:I1:WU28:   Compiler: Visual C++ 2019 16.7
07:51:19:I1:WU28:    Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
07:51:19:I1:WU28:   Platform: win32 10
07:51:19:I1:WU28:       Bits: 64
07:51:19:I1:WU28:       Mode: Release
07:51:19:I1:WU28:************************************ System ************************************
07:51:19:I1:WU28:        CPU: AMD Ryzen 7 7800X3D 8-Core Processor
07:51:19:I1:WU28:     CPU ID: AuthenticAMD Family 25 Model 97 Stepping 2
07:51:19:I1:WU28:       CPUs: 16
07:51:19:I1:WU28:     Memory: 95.16GiB
07:51:19:I1:WU28:Free Memory: 68.91GiB
07:51:19:I1:WU28:    Threads: WINDOWS_THREADS
07:51:19:I1:WU28: OS Version: 6.2
07:51:19:I1:WU28:Has Battery: false
07:51:19:I1:WU28: On Battery: false
07:51:19:I1:WU28: UTC Offset: 5
07:51:19:I1:WU28:        PID: 21232
07:51:19:I1:WU28:        CWD: D:\ProgramData\FAHClient\work
07:51:19:I1:WU28:********************************************************************************
07:51:19:I1:WU28:Project: 12485 (Run 12, Clone 11, Gen 37)
07:51:19:I1:WU28:Unit: 0x00000000000000000000000000000000
07:51:19:I1:WU28:Reading tar file core.xml
07:51:19:I1:WU28:Reading tar file frame37.tpr
07:51:19:I1:WU28:Digital signatures verified
07:51:19:I1:WU28:Calling: mdrun -c frame37.gro -s frame37.tpr -x frame37.xtc -cpt 5 -nt 14 -ntmpi 1
07:51:19:I1:WU28:Steps: first=18500000 total=19000000
07:51:21:I1:WU28:Completed 1 out of 500000 steps (0%)
07:51:40:I1:OUT124:< HTTP/1.1 200 HTTP_OK
07:51:40:I1:WU26:Credited
07:52:20:I1:WU28:Completed 5000 out of 500000 steps (1%)
07:52:34:I1:WU27:Completed 70000 out of 250000 steps (28%)
07:53:20:I1:WU28:Completed 10000 out of 500000 steps (2%)
07:54:19:I1:WU28:Completed 15000 out of 500000 steps (3%)
07:55:09:I1:WU27:Completed 72500 out of 250000 steps (29%)
07:55:18:I1:WU28:Completed 20000 out of 500000 steps (4%)
07:56:17:I1:WU28:Completed 25000 out of 500000 steps (5%)
07:57:16:I1:WU28:Completed 30000 out of 500000 steps (6%)
07:57:46:I1:WU27:Completed 75000 out of 250000 steps (30%)
07:57:53:I1:WU27:Checkpoint completed at step 75000
07:58:19:I1:WU28:Completed 35000 out of 500000 steps (7%)
07:59:19:I1:WU28:Completed 40000 out of 500000 steps (8%)
08:00:07:I1:WU27:Completed 77500 out of 250000 steps (31%)
08:00:10:I1:WU27:Completed 80000 out of 250000 steps (32%)
08:00:10:I1:WU27:Caught signal SIGABRT(22)
08:00:10:I1:WU27:WARNING:Unexpected exit
08:00:11:E :WU27:Core returned EARLY_UNIT_END (123)
08:00:11:E :WU27:Run did not produce any results. Dumping WU
08:00:11:I1:Default:Added new work unit: cpus:0 gpus:gpu:03:00:00
08:00:11:I1:WU27:Sending dump report
08:00:11:I1:WU29:Requesting WU assignment for user azhad team 0
08:00:11:I1:WU28:WARNING:Console control signal 1 on PID 21232
08:00:11:I1:WU28:Exiting, please wait. . .
08:00:11:I1:OUT129:> POST https://assign4.foldingathome.org/api/assign HTTP/1.1
08:00:11:I1:OUT128:> POST https://highland3.seas.upenn.edu/api/results HTTP/1.1
08:00:11:I1:WU28:Folding@home Core Shutdown: INTERRUPTED
08:00:12:I1:OUT128:< HTTP/1.1 200 HTTP_OK
08:00:12:I1:WU27:Dumped
muziqaz
Posts: 2131
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: Too many Core Dumped on GPU

Post by muziqaz »

If you are failing CPU WUs, that means your computer is unstable, and the cause might not be your GPU. The GPU WU which is failing for you is one which demands relatively high RAM usage (6GB or so). That might indicate that your RAM is unstable, or the CPU, since the WU also demands a good amount of CPU when sanity checking. I see that it fails at the end of the frame, which might indicate CPU instability, or even the SSD.

Why it is dumping is our concern, not yours (I mean you have bigger problems than worrying about why the client is dumping failed work; let us worry about that) :)
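
To see where in the run an abort lands, the log can be scanned for the last progress and checkpoint lines before the SIGABRT. This is a minimal sketch in Python that assumes the line format of the log quoted above; the function name `steps_lost` and the embedded sample are illustrative only and not part of the FAH client.

```python
import re

# A few WU27 lines copied from the log above.
SAMPLE_LOG = """\
07:57:46:I1:WU27:Completed 75000 out of 250000 steps (30%)
07:57:53:I1:WU27:Checkpoint completed at step 75000
08:00:07:I1:WU27:Completed 77500 out of 250000 steps (31%)
08:00:10:I1:WU27:Completed 80000 out of 250000 steps (32%)
08:00:10:I1:WU27:Caught signal SIGABRT(22)
"""

CHECKPOINT = re.compile(r":(WU\d+):Checkpoint completed at step (\d+)")
PROGRESS = re.compile(r":(WU\d+):Completed (\d+) out of (\d+) steps")
ABORT = re.compile(r":(WU\d+):Caught signal SIGABRT")


def steps_lost(log_text):
    """Return {wu: (last_checkpoint_step, last_progress_step)} for aborted WUs."""
    checkpoint, progress, aborted = {}, {}, {}
    for line in log_text.splitlines():
        if m := CHECKPOINT.search(line):
            checkpoint[m.group(1)] = int(m.group(2))
        elif m := PROGRESS.search(line):
            progress[m.group(1)] = int(m.group(2))
        elif m := ABORT.search(line):
            wu = m.group(1)
            # Record what was known about this WU at the moment it aborted.
            aborted[wu] = (checkpoint.get(wu, 0), progress.get(wu, 0))
    return aborted


for wu, (ckpt, prog) in steps_lost(SAMPLE_LOG).items():
    print(f"{wu}: aborted after step {prog}, last checkpoint at step {ckpt}")
```

For the excerpt above this reports an abort right after the step 80000 progress line, with the last checkpoint at step 75000.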
FAH Omega tester
azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Re: Too many Core Dumped on GPU

Post by azhad »

@muziqaz: I am not failing CPU WUs! Only one CPU WU failed (due to control signal 1 (??) at 02:06:25; it ends with "Exiting, please wait").

What is "control signal 1"?

Did GPU work unit 23 die, leaving the system stuck for a few seconds, so that the program thought the core of WU22 had also died?

That is the only CPU WU that died. It doesn't mean CPU or SSD instability. There is no RAM instability or low RAM either: 48GB x 2 ECC memory with ECC enabled.
muziqaz
Posts: 2131
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: Too many Core Dumped on GPU

Post by muziqaz »

azhad wrote: Sat Mar 15, 2025 2:41 pm @muziqaz: I am not failing CPU WUs! Only one CPU WU failed (due to control signal 1 (??) at 02:06:25; it ends with "Exiting, please wait").

What is "control signal 1"?

Did GPU work unit 23 die, leaving the system stuck for a few seconds, so that the program thought the core of WU22 had also died?

That is the only CPU WU that died. It doesn't mean CPU or SSD instability. There is no RAM instability or low RAM either: 48GB x 2 ECC memory with ECC enabled.

CPU WUs usually don't fail.
GPU failures are more frequent since it is very hard to write stable drivers, and GPUs are more volatile.

Also, note that in Windows you have to pause any folding before restarting your PC.
FAH Omega tester
arisu
Posts: 586
Joined: Mon Feb 24, 2025 11:11 pm

Re: Too many Core Dumped on GPU

Post by arisu »

I don't see any CPU WU failure, it's just being interrupted. I guess control signal 1 is similar to SIGINT on Linux, since it interrupts the CPU WU. It doesn't cause it to crash or dump, it just stops it. If you click pause, you'll probably see control signal 1 on each WU too.

Caught SIGABRT is what concerns me. That means there's an internal error occurring and the core doesn't know how to recover. The core terminates (not because of control signal 1, but because of an emergency abort signal), and the client sees that and dumps it. I don't know what would cause that, and I don't know enough about Windows to diagnose it myself.

The reason it is not restored from a checkpoint is that the client is not coded to determine whether the failure is one that can be safely recovered from, and coding the client to determine that may be harder, or more dangerous, than it seems. What if it is terminating due to memory corruption, and there's a 1 in 1000 chance that the corruption won't cause it to terminate? If the client just retries every time, then eventually you'll send back a WU or two with bad data and that can corrupt the scientific research. If it simply retried like nothing bad happened, that would be called "normalization of deviance" and that phenomenon has caused lives to be lost. It's better to fail fast when an unknown error occurs than to hide the problem and keep retrying.
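
The retry argument can be put into rough numbers. This is only a sketch: the 1-in-1000 figure is the illustrative value from the paragraph above, the retries are assumed to be independent, and `p_bad_result` is a hypothetical helper name.

```python
def p_bad_result(p_silent, retries):
    """Chance that at least one of `retries` corrupted runs finishes without
    crashing, assuming every run is independent of the others."""
    return 1.0 - (1.0 - p_silent) ** retries


# How the chance of returning bad data grows with blind retries.
for n in (1, 100, 1000, 5000):
    print(f"{n:>5} retries -> {p_bad_result(0.001, n):.1%} chance of a bad result")
```

A single retry is almost always harmless under these assumptions, but the retries of many machines add up, which is the case for failing fast.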
muziqaz
Posts: 2131
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3D, 5950x, 5800x3D
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, Intel B580
Location: London
Contact:

Re: Too many Core Dumped on GPU

Post by muziqaz »

arisu wrote: Sun Mar 16, 2025 3:30 am I don't see any CPU WU failure, it's just being interrupted. I guess control signal 1 is similar to SIGINT on Linux, since it interrupts the CPU WU. It doesn't cause it to crash or dump, it just stops it. If you click pause, you'll probably see control signal 1 on each WU too.

Caught SIGABRT is what concerns me. That means there's an internal error occurring and the core doesn't know how to recover. The core terminates (not because of control signal 1, but because of an emergency abort signal), and the client sees that and dumps it. I don't know what would cause that, and I don't know enough about Windows to diagnose it myself.

The reason it is not restored from a checkpoint is that the client is not coded to determine whether the failure is one that can be safely recovered from, and coding the client to determine that may be harder, or more dangerous, than it seems. What if it is terminating due to memory corruption, and there's a 1 in 1000 chance that the corruption won't cause it to terminate? If the client just retries every time, then eventually you'll send back a WU or two with bad data and that can corrupt the scientific research. If it simply retried like nothing bad happened, that would be called "normalization of deviance" and that phenomenon has caused lives to be lost. It's better to fail fast when an unknown error occurs than to hide the problem and keep retrying.

02:06:31:E :WU22:Core exited with Windows unhandled exception code 0xc000013a. See https://bit.ly/2CXgWkZ for more information.
02:06:31:E :WU22:Core returned FAILED_1 (0)

WU22 is core_a9
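
For reference, both numbers in these log lines can be looked up in the documented Windows tables. A small lookup sketch follows; the constant values are the documented Windows ones, but treating the core's "Console control signal 1" as the raw console control event type is an assumption that the log itself does not confirm, and `describe` is a hypothetical helper.

```python
# Documented Windows console control event types (dwCtrlType values).
CONSOLE_CTRL_EVENTS = {
    0: "CTRL_C_EVENT",
    1: "CTRL_BREAK_EVENT",
    2: "CTRL_CLOSE_EVENT",
    5: "CTRL_LOGOFF_EVENT",
    6: "CTRL_SHUTDOWN_EVENT",
}

# Two documented NTSTATUS codes that can appear as process exit codes.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def describe(ctrl_signal, exit_code):
    """Return symbolic names for a control signal number and an exit code."""
    return (
        CONSOLE_CTRL_EVENTS.get(ctrl_signal, "unknown"),
        NTSTATUS_NAMES.get(exit_code, "unknown"),
    )


print(describe(1, 0xC000013A))
```

Under that assumption, 0xc000013a would be consistent with the WU22 core being ended by a console control event rather than by a fault of its own, even though the client records the exit as FAILED_1.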
FAH Omega tester
azhad
Posts: 16
Joined: Tue Jul 27, 2021 9:40 pm

Re: Too many Core Dumped on GPU

Post by azhad »

Issue resolved. FAH GPU workloads are different from my previous project workloads. Bumping the voltage from 850 mV to 887 mV works, with power consumption going from 170 W to 210-220 W. Still underclocked, but it is stable now.

But I still wonder why you can't keep two checkpoint saves (say Ckpoint48 at step 48000 and Ckpoint50 at step 50000). Suppose an error occurs at step 51000. Load Ckpoint48 and resume work (as if it had been paused). Once step 50000 is reached again, check whether the result matches Ckpoint50: if so, continue the work, else dump. My GPU was able to start a new work unit immediately after the crash, so it should be able to resume from the previous checkpoint as well (unless the saves consume exorbitant space on the SSD, etc.).
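
The two-checkpoint idea can be sketched as a toy simulation. Everything here is hypothetical: `step` is a stand-in for real simulation work, `run` and `replay_glitch` are made-up names, and the comparison assumes the computation is bit-for-bit reproducible, which may not hold for real GPU cores. It is not how the FAH client or cores work.

```python
def step(state):
    """Stand-in for one deterministic simulation step (toy 64-bit arithmetic)."""
    return (state * 6364136223846793005 + 1442695040888963407) % 2**64


def run(total_steps, interval, fault_at=None, replay_glitch=False):
    """Run with two rolling checkpoints.

    On a fault, roll back to the OLDER checkpoint, replay up to the newer one,
    and continue only if the replayed state matches the saved one.
    Returns ("FINISHED" or "DUMPED", step reached).
    """
    state, n = 1, 0
    older = newer = (0, state)  # each checkpoint is (step number, saved state)
    while n < total_steps:
        if n == fault_at:
            fault_at = None  # the simulated fault fires only once
            n, state = older
            first = True
            while n < newer[0]:
                state = step(state)
                if replay_glitch and first:
                    state ^= 1  # simulate hardware that is still misbehaving
                first = False
                n += 1
            if state != newer[1]:
                return "DUMPED", n  # replay disagrees with the saved checkpoint
            continue
        state = step(state)
        n += 1
        if n % interval == 0:
            older, newer = newer, (n, state)
    return "FINISHED", n


# Checkpoints every 2000 steps, fault at step 51000 as in the example above.
print(run(60000, 2000, fault_at=51000))
print(run(60000, 2000, fault_at=51000, replay_glitch=True))
```

With these numbers the fault at step 51000 rolls back to the checkpoint at 48000 and is verified against the one at 50000. The sketch costs one extra saved state and one replayed interval per fault; whether a real core could make the same comparison reliably is the open question.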