Frame time increases from 75 seconds to 90+ minutes for a Project 16722 WU after a restart caused by a bad allocation
Posted: Sun Oct 06, 2024 3:04 am
My GPU is running Project: 16722 (Run 262, Clone 2, Gen 458), which will take more than six days to complete. The timeout is three days.
It had completed 83% when the bad allocation occurred. Since the core restarted, it has been taking more than 90 minutes per frame, up from about 75 seconds before the restart!
Rebooting the computer caused the WU to restart again, this time with the original 75-second frame time.
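To put the numbers above side by side, here is a rough back-of-the-envelope sketch. It assumes the WU has 100 frames (the log reports progress in 1% steps) and that every frame runs at the same rate; the function name `days_to_complete` is just for illustration.

```python
FRAMES_PER_WU = 100      # assumption: one frame per 1% of progress, as in the log
TIMEOUT_DAYS = 3         # timeout quoted in this post
SECONDS_PER_DAY = 86400

def days_to_complete(seconds_per_frame, frames=FRAMES_PER_WU):
    """Estimate total run time in days at a constant frame time."""
    return seconds_per_frame * frames / SECONDS_PER_DAY

normal_days = days_to_complete(75)        # frame time before the restart
slow_days = days_to_complete(90 * 60)     # frame time after the restart

print(f"75 s/frame:   {normal_days * 24:.1f} hours")
print(f"90 min/frame: {slow_days:.2f} days (timeout: {TIMEOUT_DAYS} days)")
```

At 90 minutes per frame the estimate comes to 6.25 days, which is consistent with the "more than six days" figure and is about twice the timeout.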
Can someone explain what happened?
15:01:47:WU01:FS02:0x23:Checkpoint completed at step 2000000
15:03:05:WU01:FS02:0x23:Completed 2025000 out of 2500000 steps (81%)
15:04:23:WU01:FS02:0x23:Completed 2050000 out of 2500000 steps (82%)
15:05:38:WU01:FS02:0x23:Completed 2075000 out of 2500000 steps (83%)
15:06:38:WU01:FS02:0x23:An exception occurred at step 2094343: bad allocation
15:06:38:WU01:FS02:0x23:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
15:06:38:WU01:FS02:0x23:Folding@home Core Shutdown: CORE_RESTART
15:06:38:WARNING:WU01:FS02:FahCore returned: CORE_RESTART (98 = 0x62)
15:06:38:WU01:FS02:Starting
15:06:38:WU01:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" A:\FAHClient\cores/cores.foldingathome.org/openmm-core-23/windows-10-64bit/release/0x23-8.0.3/Core_23.fah/FahCore_23.exe -dir 01 -suffix 01 -version 706 -lifeline 13876 -checkpoint 5 -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
15:06:38:WU01:FS02:Started FahCore on PID 491712
15:06:39:WU01:FS02:Core PID:775060
15:06:39:WU01:FS02:FahCore 0x23 started
15:06:39:WU01:FS02:0x23:*********************** Log Started 2024-10-05T15:06:39Z ***********************
15:06:39:WU01:FS02:0x23:*************************** Core23 Folding@home Core ***************************
15:06:39:WU01:FS02:0x23: Core: Core23
15:06:39:WU01:FS02:0x23: Type: 0x23
15:06:39:WU01:FS02:0x23: Version: 8.0.3
15:06:39:WU01:FS02:0x23: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:06:39:WU01:FS02:0x23: Copyright: 2022 foldingathome.org
15:06:39:WU01:FS02:0x23: Homepage: https://foldingathome.org/
15:06:39:WU01:FS02:0x23: Date: Aug 3 2023
15:06:39:WU01:FS02:0x23: Time: 08:39:06
15:06:39:WU01:FS02:0x23: Compiler: Visual C++
15:06:39:WU01:FS02:0x23: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
15:06:39:WU01:FS02:0x23: -DOPENMM_VERSION="\"8.0.0\""
15:06:39:WU01:FS02:0x23: Platform: win32 10
15:06:39:WU01:FS02:0x23: Bits: 64
15:06:39:WU01:FS02:0x23: Mode: Release
15:06:39:WU01:FS02:0x23:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
15:06:39:WU01:FS02:0x23: <peastman@stanford.edu>
15:06:39:WU01:FS02:0x23: Args: -dir 01 -suffix 01 -version 706 -lifeline 491712 -checkpoint 5
15:06:39:WU01:FS02:0x23: -opencl-platform 1 -opencl-device 0 -cuda-device 0 -gpu-vendor
15:06:39:WU01:FS02:0x23: nvidia -gpu 0 -gpu-usage 100
15:06:39:WU01:FS02:0x23:************************************ libFAH ************************************
15:06:39:WU01:FS02:0x23: Date: Aug 3 2023
15:06:39:WU01:FS02:0x23: Time: 08:37:55
15:06:39:WU01:FS02:0x23: Compiler: Visual C++
15:06:39:WU01:FS02:0x23: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
15:06:39:WU01:FS02:0x23: Platform: win32 10
15:06:39:WU01:FS02:0x23: Bits: 64
15:06:39:WU01:FS02:0x23: Mode: Release
15:06:39:WU01:FS02:0x23:************************************ CBang *************************************
15:06:39:WU01:FS02:0x23: Version: 1.7.2
15:06:39:WU01:FS02:0x23: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:06:39:WU01:FS02:0x23: Org: Cauldron Development LLC
15:06:39:WU01:FS02:0x23: Copyright: Cauldron Development LLC, 2003-2023
15:06:39:WU01:FS02:0x23: Homepage: https://cauldrondevelopment.com/
15:06:39:WU01:FS02:0x23: License: GPL 2+
15:06:39:WU01:FS02:0x23: Date: Aug 3 2023
15:06:39:WU01:FS02:0x23: Time: 08:37:14
15:06:39:WU01:FS02:0x23: Compiler: Visual C++
15:06:39:WU01:FS02:0x23: Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
15:06:39:WU01:FS02:0x23: Platform: win32 10
15:06:39:WU01:FS02:0x23: Bits: 64
15:06:39:WU01:FS02:0x23: Mode: Release
15:06:39:WU01:FS02:0x23:************************************ System ************************************
15:06:39:WU01:FS02:0x23: CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
15:06:39:WU01:FS02:0x23: CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
15:06:39:WU01:FS02:0x23: CPUs: 8
15:06:39:WU01:FS02:0x23: Memory: 31.70GiB
15:06:39:WU01:FS02:0x23:Free Memory: 8.64GiB
15:06:39:WU01:FS02:0x23: Threads: WINDOWS_THREADS
15:06:39:WU01:FS02:0x23: OS Version: 6.2
15:06:39:WU01:FS02:0x23:Has Battery: false
15:06:39:WU01:FS02:0x23: On Battery: false
15:06:39:WU01:FS02:0x23: UTC Offset: 10
15:06:39:WU01:FS02:0x23: PID: 775060
15:06:39:WU01:FS02:0x23: CWD: A:\FAHClient\work
15:06:39:WU01:FS02:0x23: Exec: A:\FAHClient\cores\cores.foldingathome.org\openmm-core-23\windows-10-64bit\release\0x23-8.0.3\Core_23.fah\FahCore_23.exe
15:06:39:WU01:FS02:0x23:************************************ OpenMM ************************************
15:06:39:WU01:FS02:0x23: Version: 8.0.0
15:06:39:WU01:FS02:0x23:********************************************************************************
15:06:39:WU01:FS02:0x23:Project: 16722 (Run 262, Clone 2, Gen 458)
15:06:39:WU01:FS02:0x23:Digital signatures verified
15:06:39:WU01:FS02:0x23:Folding@home GPU Core23 Folding@home Core
15:06:39:WU01:FS02:0x23:Version 8.0.3
15:06:39:WU01:FS02:0x23: Checkpoint write interval: 100000 steps (4%) [25 total]
15:06:39:WU01:FS02:0x23: JSON viewer frame write interval: 25000 steps (1%) [100 total]
15:06:39:WU01:FS02:0x23: XTC frame write interval: 10000 steps (0.4%) [250 total]
15:06:39:WU01:FS02:0x23: Global context and integrator variables write interval: disabled
15:06:40:WU01:FS02:0x23:There are 4 platforms available.
15:06:40:WU01:FS02:0x23:Platform 0: Reference
15:06:40:WU01:FS02:0x23:Platform 1: CPU
15:06:40:WU01:FS02:0x23:Platform 2: OpenCL
15:06:40:WU01:FS02:0x23: opencl-device 0 specified
15:06:40:WU01:FS02:0x23:Platform 3: CUDA
15:06:40:WU01:FS02:0x23: cuda-device 0 specified
15:07:00:WU01:FS02:0x23:Attempting to create CUDA context:
15:07:00:WU01:FS02:0x23: Configuring platform CUDA
15:07:01:WU01:FS02:0x23:Failed to create CUDA context:
15:07:01:WU01:FS02:0x23:Error initializing FFT: 5
15:07:01:WU01:FS02:0x23:Attempting to create OpenCL context:
15:07:01:WU01:FS02:0x23: Configuring platform OpenCL
15:07:29:WU01:FS02:0x23: Using OpenCL on OpenCL platformId 1 and gpu 0
15:07:29:WU01:FS02:0x23: GPU info: Platform: OpenCL: NVIDIA CUDA
15:07:29:WU01:FS02:0x23: GPU info: PlatformIndex: 0
15:07:29:WU01:FS02:0x23: GPU info: Device: NVIDIA GeForce RTX 4070
15:07:29:WU01:FS02:0x23: GPU info: DeviceIndex: 0
15:07:29:WU01:FS02:0x23: GPU info: Vendor: 0x10de
15:07:29:WU01:FS02:0x23: GPU info: PCI: 01:00:00
15:07:29:WU01:FS02:0x23: GPU info: Compute: 3.0
15:07:29:WU01:FS02:0x23: GPU info: Driver: 561.9
15:07:29:WU01:FS02:0x23: GPU info: GPU: true
15:07:29:WU01:FS02:0x23:Completed 0 out of 2500000 steps (0%)
16:42:33:WU01:FS02:0x23:Completed 25000 out of 2500000 steps (1%)
******************************* Date: 2024-10-05 *******************************
18:15:30:WU01:FS02:0x23:Completed 50000 out of 2500000 steps (2%)
19:51:07:WU01:FS02:0x23:Completed 75000 out of 2500000 steps (3%)
21:24:02:WU01:FS02:0x23:Completed 100000 out of 2500000 steps (4%)
21:24:07:WU01:FS02:0x23:Checkpoint completed at step 100000
22:55:10:WU01:FS02:0x23:Completed 125000 out of 2500000 steps (5%)
******************************* Date: 2024-10-06 *******************************
00:29:38:WU01:FS02:0x23:Completed 150000 out of 2500000 steps (6%)
02:04:24:WU01:FS02:0x23:Completed 175000 out of 2500000 steps (7%)
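For anyone who wants to check the frame times from a log like this, the following is a minimal sketch that measures the gap between consecutive "Completed ... steps" lines. The regular expression and the function name `frame_times` are my own assumptions based on the line format shown above, not part of FAHClient. It treats log times as time-of-day only, so it handles a single midnight rollover between two lines and skips the point where the step count resets after the core restart.

```python
import re

# Matches progress lines such as:
# 15:03:05:WU01:FS02:0x23:Completed 2025000 out of 2500000 steps (81%)
PROGRESS = re.compile(
    r"^(\d\d):(\d\d):(\d\d):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)

def frame_times(lines):
    """Return (steps_completed, seconds_since_previous_progress_line) pairs."""
    results = []
    prev = None  # (time of day in seconds, steps) of the previous progress line
    for line in lines:
        m = PROGRESS.match(line)
        if not m:
            continue
        h, mi, s, steps, _total = map(int, m.groups())
        t = h * 3600 + mi * 60 + s
        # Skip the comparison when the step count went backwards (core restart).
        if prev is not None and steps > prev[1]:
            # Modulo handles a gap that crosses midnight once.
            results.append((steps, (t - prev[0]) % 86400))
        prev = (t, steps)
    return results

sample = [
    "15:03:05:WU01:FS02:0x23:Completed 2025000 out of 2500000 steps (81%)",
    "15:04:23:WU01:FS02:0x23:Completed 2050000 out of 2500000 steps (82%)",
    "15:05:38:WU01:FS02:0x23:Completed 2075000 out of 2500000 steps (83%)",
    "15:07:29:WU01:FS02:0x23:Completed 0 out of 2500000 steps (0%)",
    "16:42:33:WU01:FS02:0x23:Completed 25000 out of 2500000 steps (1%)",
]
times = frame_times(sample)
for steps, seconds in times:
    print(f"step {steps}: {seconds} s ({seconds / 60:.1f} min)")
```

On the sample lines taken from the log above, this gives 78 s and 75 s per frame before the restart and 5704 s (about 95 minutes) for the first frame after it.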