Problem with WU 18806 (19, 5, 167)

Moderators: Site Moderators, FAHC Science Team

Demmers
Posts: 41
Joined: Sat Mar 07, 2020 12:57 pm

Re: Problem with WU 18806 (19, 5, 167)

Post by Demmers »

This project urgently needs to be looked at. It failed on me again, only for my machine to be given another WU from the same project straight after.
Note that FAHClient.exe uses a LOT of memory when running this project.

Code:

07:09:49:I1:WU23:Project: 18806 (Run 45, Clone 34, Gen 265)
07:09:49:I1:WU23:Reading tar file core.xml
07:09:49:I1:WU23:Reading tar file frame265.tpr
07:09:50:I1:WU23:Digital signatures verified
07:09:50:I1:WU23:Calling: mdrun -c frame265.gro -s frame265.tpr -x frame265.xtc -cpt 5 -nt 6 -ntmpi 1 -update cpu -nb cpu -bonded cpu -pme cpu -pmefft cpu
07:09:50:I1:WU23:Steps: first=66250000 total=66500000
07:09:57:I1:WU23:Completed 1 out of 250000 steps (0%)
07:26:27:I1:WU23:Completed 2500 out of 250000 steps (1%)
07:42:58:I1:WU23:Completed 5000 out of 250000 steps (2%)
07:59:31:I1:WU23:Completed 7500 out of 250000 steps (3%)
08:16:26:I1:WU23:Completed 10000 out of 250000 steps (4%)
08:32:55:I1:WU23:Completed 12500 out of 250000 steps (5%)
08:49:26:I1:WU23:Completed 15000 out of 250000 steps (6%)
09:06:20:I1:WU23:Completed 17500 out of 250000 steps (7%)
09:22:53:I1:WU23:Completed 20000 out of 250000 steps (8%)
09:39:24:I1:WU23:Completed 22500 out of 250000 steps (9%)
09:56:22:I1:WU23:Completed 25000 out of 250000 steps (10%)
10:12:51:I1:WU23:Completed 27500 out of 250000 steps (11%)
10:29:21:I1:WU23:Completed 30000 out of 250000 steps (12%)
10:46:15:I1:WU23:Completed 32500 out of 250000 steps (13%)
11:02:47:I1:WU23:Completed 35000 out of 250000 steps (14%)
11:19:20:I1:WU23:Completed 37500 out of 250000 steps (15%)
11:36:14:I1:WU23:Completed 40000 out of 250000 steps (16%)
11:52:45:I1:WU23:Completed 42500 out of 250000 steps (17%)
12:09:16:I1:WU23:Completed 45000 out of 250000 steps (18%)
12:26:10:I1:WU23:Completed 47500 out of 250000 steps (19%)
12:42:40:I1:WU23:Completed 50000 out of 250000 steps (20%)
12:59:13:I1:WU23:Completed 52500 out of 250000 steps (21%)
13:16:18:I1:WU23:Completed 55000 out of 250000 steps (22%)
13:33:19:I1:WU23:Completed 57500 out of 250000 steps (23%)
13:50:48:I1:WU23:Completed 60000 out of 250000 steps (24%)
13:52:36:W :WU23:Detected clock skew (40 secs), I/O delay, laptop hibernation, other slowdown or clock change noted, adjusting time estimates
14:08:09:I1:WU23:Completed 62500 out of 250000 steps (25%)
14:24:44:I1:WU23:Completed 65000 out of 250000 steps (26%)
14:41:40:I1:WU23:Completed 67500 out of 250000 steps (27%)
14:58:10:I1:WU23:Completed 70000 out of 250000 steps (28%)
15:14:42:I1:WU23:Completed 72500 out of 250000 steps (29%)
15:32:01:I1:WU23:Completed 75000 out of 250000 steps (30%)
15:49:24:I1:WU23:Completed 77500 out of 250000 steps (31%)
16:07:05:I1:WU23:Completed 80000 out of 250000 steps (32%)
16:24:39:I1:WU23:Completed 82500 out of 250000 steps (33%)
16:42:15:I1:WU23:Completed 85000 out of 250000 steps (34%)
16:58:32:I1:WU23:Completed 87500 out of 250000 steps (35%)
17:14:50:I1:WU23:Completed 90000 out of 250000 steps (36%)
17:32:48:I1:WU23:Completed 92500 out of 250000 steps (37%)
17:51:27:I1:WU23:Completed 95000 out of 250000 steps (38%)
18:09:42:I1:WU23:Completed 97500 out of 250000 steps (39%)
18:26:38:I1:WU23:Completed 100000 out of 250000 steps (40%)
18:43:07:I1:WU23:Completed 102500 out of 250000 steps (41%)
18:59:36:I1:WU23:Completed 105000 out of 250000 steps (42%)
19:16:30:I1:WU23:Completed 107500 out of 250000 steps (43%)
19:33:01:I1:WU23:Completed 110000 out of 250000 steps (44%)
19:49:33:I1:WU23:Completed 112500 out of 250000 steps (45%)
20:07:23:I1:WU23:Completed 115000 out of 250000 steps (46%)
20:25:52:I1:WU23:Completed 117500 out of 250000 steps (47%)
20:43:18:I1:WU23:Completed 120000 out of 250000 steps (48%)
21:01:11:I1:WU23:Completed 122500 out of 250000 steps (49%)
21:18:13:I1:WU23:Completed 125000 out of 250000 steps (50%)
21:34:48:I1:WU23:Completed 127500 out of 250000 steps (51%)
21:51:43:I1:WU23:Completed 130000 out of 250000 steps (52%)
22:08:13:I1:WU23:Completed 132500 out of 250000 steps (53%)
22:24:48:I1:WU23:Completed 135000 out of 250000 steps (54%)
22:41:44:I1:WU23:Completed 137500 out of 250000 steps (55%)
22:58:13:I1:WU23:Completed 140000 out of 250000 steps (56%)
23:14:44:I1:WU23:Completed 142500 out of 250000 steps (57%)
23:31:40:I1:WU23:Completed 145000 out of 250000 steps (58%)
23:48:12:I1:WU23:Completed 147500 out of 250000 steps (59%)
00:04:40:I1:WU23:Completed 150000 out of 250000 steps (60%)
00:21:34:I1:WU23:Completed 152500 out of 250000 steps (61%)
00:38:05:I1:WU23:Completed 155000 out of 250000 steps (62%)
00:54:37:I1:WU23:Completed 157500 out of 250000 steps (63%)
01:11:33:I1:WU23:Completed 160000 out of 250000 steps (64%)
01:24:06:I1:Account websocket closed: PROTOCOL msg=
01:24:06:W :WU23:Detected clock skew (2 mins 18 secs), I/O delay, laptop hibernation, other slowdown or clock change noted, adjusting time estimates
01:24:06:I1:OUT87:> GET https://api.foldingathome.org/machine/bmI9bFj5zbsJLJ7qJu77jbLFSXT_4zJscQZBjXsQJbQ HTTP/1.1
01:24:07:I1:OUT87:< HTTP/1.1 200 HTTP_OK
01:24:07:I1:OUT2:> GET wss://node1.foldingathome.org/ws/client HTTP/1.1
01:24:08:I1:OUT2:< HTTP/1.1 101 HTTP_SWITCHING_PROTOCOLS
01:24:08:I1:Logging into node account
01:28:08:I1:WU23:Completed 162500 out of 250000 steps (65%)
01:44:40:I1:WU23:Completed 165000 out of 250000 steps (66%)
02:01:36:I1:WU23:Completed 167500 out of 250000 steps (67%)
02:18:11:I1:WU23:Completed 170000 out of 250000 steps (68%)
02:34:45:I1:WU23:Completed 172500 out of 250000 steps (69%)
02:51:42:I1:WU23:Completed 175000 out of 250000 steps (70%)
03:08:13:I1:WU23:Completed 177500 out of 250000 steps (71%)
03:24:44:I1:WU23:Completed 180000 out of 250000 steps (72%)
03:41:40:I1:WU23:Completed 182500 out of 250000 steps (73%)
03:58:09:I1:WU23:Completed 185000 out of 250000 steps (74%)
04:14:41:I1:WU23:Completed 187500 out of 250000 steps (75%)
04:31:41:I1:WU23:Completed 190000 out of 250000 steps (76%)
04:48:11:I1:WU23:Completed 192500 out of 250000 steps (77%)
05:04:39:I1:WU23:Completed 195000 out of 250000 steps (78%)
05:21:34:I1:WU23:Completed 197500 out of 250000 steps (79%)
05:38:05:I1:WU23:Completed 200000 out of 250000 steps (80%)
05:54:37:I1:WU23:Completed 202500 out of 250000 steps (81%)
06:11:34:I1:WU23:Completed 205000 out of 250000 steps (82%)
06:28:05:I1:WU23:Completed 207500 out of 250000 steps (83%)
06:44:32:I1:WU23:Completed 210000 out of 250000 steps (84%)
07:01:30:I1:WU23:Completed 212500 out of 250000 steps (85%)
07:18:01:I1:WU23:Completed 215000 out of 250000 steps (86%)
07:34:33:I1:WU23:Completed 217500 out of 250000 steps (87%)
07:51:27:I1:WU23:Completed 220000 out of 250000 steps (88%)
08:07:59:I1:WU23:Completed 222500 out of 250000 steps (89%)
08:08:14:E :WU23:std::exception: bad allocation
08:08:14:E :WU23:std::exception: bad allocation
08:08:15:E :WU23:std::exception: bad allocation
08:08:16:E :WU23:std::exception: bad allocation
08:08:17:E :WU23:std::exception: bad allocation
08:08:19:I1:WU23:ERROR:exception: bad allocation
08:08:19:I1:WU23:Saving result file ..\logfile_01.txt
08:08:19:I1:WU23:Saving result file frame265.xtc
08:08:19:I1:WU23:Saving result file md.log
08:08:19:I1:WU23:Saving result file science.log
08:08:19:I1:WU23:Saving result file state.cpt
08:08:19:I1:WU23:Folding@home Core Shutdown: BAD_WORK_UNIT
08:08:21:W :WU23:Core returned UNKNOWN_ENUM (3221226505)
08:08:21:I1:Default:Added new work unit: cpus:6 gpus:
08:08:21:I1:WU23:Uploading WU results
08:08:22:I1:WU24:Requesting WU assignment for user Demmers team 76140
08:08:22:I1:OUT89:> POST https://assign2.foldingathome.org/api/assign HTTP/1.1
08:08:22:I1:OUT88:> POST https://fahserver1.flatironinstitute.org/api/results HTTP/1.1
08:08:22:I1:OUT89:< HTTP/1.1 200 HTTP_OK
08:08:22:I1:WU24:Received WU assignment m4HuzIHm-WR8ve08Dy67g_Ez6nUFpS6QdYe06xjJiXg
08:08:22:I1:WU24:Downloading WU
08:08:23:I1:OUT90:> POST https://fahserver1.flatironinstitute.org/api/assign HTTP/1.1
08:08:25:I1:OUT90:< HTTP/1.1 200 HTTP_OK
08:08:26:I1:WU24:Received WU P18806 R51 C30 G285
08:08:26:I3:WU24:Running FahCore: C:\ProgramData\FAHClient\cores/gromacs-core-a9/windows-10-64bit/cpu-avx2_256-release/fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12/FahCore_a9.exe -dir m4HuzIHm-WR8ve08Dy67g_Ez6nUFpS6QdYe06xjJiXg -suffix 01 -version 8.3.18 -lifeline 9008 -np 6
08:08:26:I3:WU24:Started FahCore on PID 1760
08:08:26:I1:WU24:*********************** Log Started 2024-11-06T08:08:26Z ***********************
08:08:26:I1:WU24:************************** Gromacs Folding@home Core ***************************
08:08:26:I1:WU24: Core: Gromacs
08:08:26:I1:WU24: Type: 0xa9
08:08:26:I1:WU24: Version: 0.0.12
08:08:26:I1:WU24: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
08:08:26:I1:WU24: Copyright: 2022 foldingathome.org
08:08:26:I1:WU24: Homepage: https://foldingathome.org/
08:08:26:I1:WU24: Date: Nov 15 2022
08:08:26:I1:WU24: Time: 13:31:08
08:08:26:I1:WU24: Compiler: Visual C++
08:08:26:I1:WU24: Options: /TP /std:c++17 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
08:08:26:I1:WU24: Platform: win32 10
08:08:26:I1:WU24: Bits: 64
08:08:26:I1:WU24: Mode: Release
08:08:26:I1:WU24: SIMD: avx2_256
08:08:26:I1:WU24: OpenMP: ON
08:08:26:I1:WU24: CUDA: OFF
08:08:26:I1:WU24: OpenCL: OFF
08:08:26:I1:WU24: Args: -dir m4HuzIHm-WR8ve08Dy67g_Ez6nUFpS6QdYe06xjJiXg -suffix 01
08:08:26:I1:WU24: -version 8.3.18 -lifeline 9008 -np 6
08:08:26:I1:WU24:************************************ libFAH ************************************
08:08:26:I1:WU24: Date: Nov 15 2022
08:08:26:I1:WU24: Time: 13:30:33
08:08:26:I1:WU24: Compiler: Visual C++
08:08:26:I1:WU24: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
08:08:26:I1:WU24: Platform: win32 10
08:08:26:I1:WU24: Bits: 64
08:08:26:I1:WU24: Mode: Release
08:08:26:I1:WU24:************************************ CBang *************************************
08:08:26:I1:WU24: Date: Nov 15 2022
08:08:26:I1:WU24: Time: 13:29:57
08:08:26:I1:WU24: Compiler: Visual C++
08:08:26:I1:WU24: Options: /TP /std:c++14 /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
08:08:26:I1:WU24: Platform: win32 10
08:08:26:I1:WU24: Bits: 64
08:08:26:I1:WU24: Mode: Release
08:08:26:I1:WU24:************************************ System ************************************
08:08:26:I1:WU24: CPU: AMD Ryzen 5 3400G with Radeon Vega Graphics
08:08:26:I1:WU24: CPU ID: AuthenticAMD Family 23 Model 24 Stepping 1
08:08:26:I1:WU24: CPUs: 8
08:08:26:I1:WU24: Memory: 13.92GiB
08:08:26:I1:WU24:Free Memory: 3.11GiB
08:08:26:I1:WU24: Threads: WINDOWS_THREADS
08:08:26:I1:WU24: OS Version: 6.2
08:08:26:I1:WU24:Has Battery: false
08:08:26:I1:WU24: On Battery: false
08:08:26:I1:WU24: UTC Offset: 0
08:08:26:I1:WU24: PID: 1760
08:08:26:I1:WU24: CWD: C:\ProgramData\FAHClient\work
08:08:26:I1:WU24: Exec: C:\ProgramData\FAHClient\cores\gromacs-core-a9\windows-10-64bit\cpu-avx2_256-release\fahcore-a9-windows-10-64bit-cpu-avx2_256-release-0.0.12\FahCore_a9.exe
08:08:26:I1:WU24:********************************************************************************
08:08:26:I1:WU24:Project: 18806 (Run 51, Clone 30, Gen 285)
08:08:26:I1:WU24:Reading tar file core.xml
08:08:26:I1:WU24:Reading tar file frame285.tpr
08:08:26:I1:WU24:Digital signatures verified
08:08:26:I1:WU24:Calling: mdrun -c frame285.gro -s frame285.tpr -x frame285.xtc -cpt 5 -nt 6 -ntmpi 1 -update cpu -nb cpu -bonded cpu -pme cpu -pmefft cpu
08:08:27:I1:WU24:Steps: first=71250000 total=71500000
08:08:28:I1:OUT88:< HTTP/1.1 200 HTTP_OK
08:08:28:I1:WU23:Credited
08:09:22:I1:WU24:Completed 1 out of 250000 steps (0%)
08:09:22:W :WU24:Detected clock skew (54 secs), I/O delay, laptop hibernation, other slowdown or clock change noted, adjusting time estimates
08:25:13:I1:WU24:Completed 2500 out of 250000 steps (1%)
08:41:47:I1:WU24:Completed 5000 out of 250000 steps (2%)
08:58:19:I1:WU24:Completed 7500 out of 250000 steps (3%)
09:15:17:I1:WU24:Completed 10000 out of 250000 steps (4%)
09:31:48:I1:WU24:Completed 12500 out of 250000 steps (5%)
09:48:22:I1:WU24:Completed 15000 out of 250000 steps (6%)
10:05:20:I1:WU24:Completed 17500 out of 250000 steps (7%)
10:21:54:I1:WU24:Completed 20000 out of 250000 steps (8%)
10:38:27:I1:WU24:Completed 22500 out of 250000 steps (9%)
10:55:24:I1:WU24:Completed 25000 out of 250000 steps (10%)
11:11:54:I1:WU24:Completed 27500 out of 250000 steps (11%)
11:28:29:I1:WU24:Completed 30000 out of 250000 steps (12%) 
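
For anyone who wants to pull the key facts out of a long log like the one above, here is a minimal Python sketch. It assumes the line layout seen in this post (`HH:MM:SS:LEVEL:WUnn:message`); the function name `summarize` and the report keys are made up for illustration and are not part of the FAH client. Because the log carries no date, a backwards jump in the clock is treated as a midnight rollover.

```python
import re
from datetime import datetime, timedelta

# Progress lines, e.g.
# "08:07:59:I1:WU23:Completed 222500 out of 250000 steps (89%)"
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):I\d:(WU\d+):Completed (\d+) out of (\d+) steps \((\d+)%\)"
)
# Error-level lines, e.g. "08:08:14:E :WU23:std::exception: bad allocation"
ERROR = re.compile(r"^(\d\d:\d\d:\d\d):E\s*:(WU\d+):(.*)$")


def summarize(lines):
    """Return, per WU, the last percent reached, the pace, and any error messages."""
    collected = {}
    for raw in lines:
        line = raw.strip()
        match = PROGRESS.match(line)
        if match:
            stamp, wu, _done, _total, percent = match.groups()
            entry = collected.setdefault(wu, {"points": [], "errors": []})
            when = datetime.strptime(stamp, "%H:%M:%S")
            points = entry["points"]
            # No date in the log: a time earlier than the previous one means
            # the clock passed midnight, so shift it forward by whole days.
            if points:
                while when < points[-1][0]:
                    when += timedelta(days=1)
            points.append((when, int(percent)))
            continue
        match = ERROR.match(line)
        if match:
            _stamp, wu, message = match.groups()
            entry = collected.setdefault(wu, {"points": [], "errors": []})
            entry["errors"].append(message.strip())

    report = {}
    for wu, entry in collected.items():
        points = entry["points"]
        pace = None
        if len(points) >= 2 and points[-1][1] > points[0][1]:
            minutes = (points[-1][0] - points[0][0]).total_seconds() / 60
            pace = minutes / (points[-1][1] - points[0][1])
        report[wu] = {
            "last_percent": points[-1][1] if points else None,
            "minutes_per_percent": pace,
            "errors": entry["errors"],
        }
    return report
```

Passing the lines of a saved log (for example `summarize(open("log.txt"))`, where `log.txt` is a hypothetical file name) gives one entry per WU, which makes it easy to see how far a unit got before the `bad allocation` errors started.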
muziqaz
Posts: 946
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Problem with WU 18806 (19, 5, 167)

Post by muziqaz »

FAHClient.exe memory usage is not related to this project.
But the reason it crashed might be that a memory allocation failed, i.e. the system ran out of memory.
FAH Omega tester
Demmers
Posts: 41
Joined: Sat Mar 07, 2020 12:57 pm

Re: Problem with WU 18806 (19, 5, 167)

Post by Demmers »

muziqaz wrote: Wed Nov 06, 2024 12:05 pm FAHClient.exe memory usage is not related to this project.
But the reason it crashed might be that a memory allocation failed, i.e. the system ran out of memory.
My machine has 16GB of memory, and this WU is the only one where this happens, so surely there must be some link? I see it's the biggest CPU project there is in terms of atom count, so are we saying 16GB isn't enough? Note that because I use integrated graphics, 2GB is allocated to the iGPU, so technically I have 14GB at Folding's disposal (minus Windows, other apps, etc.).
muziqaz
Posts: 946
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Problem with WU 18806 (19, 5, 167)

Post by muziqaz »

The project is simulated by FahCore_a9.exe, not FAHClient.exe.
As I said, there is a memory leak in FAHClient, but it is random, quite rare, and Windows only. A system reboot usually sorts it out.
Usually 16GB is plenty, but depending on the severity of the memory leak, it can consume all of it.
18806 is a CPU project, and a very stable one at that, so if it is failing, there might be system instability at play, or just that memory leak.
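
To see which of the two processes is actually holding the memory, something like the following Python sketch could be used on Windows. It relies on the standard `tasklist /FO CSV /NH` command and assumes its usual five-column CSV output with the memory figure in the last column; the helper name `parse_tasklist_csv` is made up for illustration. Note that `tasklist` reports the working set, which is not the same as committed memory, so treat the numbers as a rough guide only.

```python
import csv
import io
import subprocess
import sys


def parse_tasklist_csv(text):
    """Map image name -> summed "Mem Usage" in KiB from `tasklist /FO CSV /NH` text."""
    usage = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        name, mem = row[0], row[4]
        # "Mem Usage" looks like "1,234,567 K"; keep digits only so that
        # locale-specific thousands separators do not matter.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if digits:
            usage[name] = usage.get(name, 0) + int(digits)
    return usage


if __name__ == "__main__" and sys.platform == "win32":
    output = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, kib in parse_tasklist_csv(output).items():
        if name.lower() in ("fahclient.exe", "fahcore_a9.exe"):
            print(f"{name}: {kib / 1024:.0f} MiB")
```

Running it a few times over a day or two should show whether FAHClient.exe keeps growing (the leak) or whether it is FahCore_a9.exe that needs the memory for the simulation itself.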
FAH Omega tester
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Problem with WU 18806 (19, 5, 167)

Post by Joe_H »

I will have to check this further on my systems, as I was running an older release of the v8 beta client. I also saw a memory leak on macOS: that system was running v8.3.14 (Intel), and fah-client had grown to 3.3 GB over a few weeks. As that system has 48 GB of RAM, it barely noticed the memory usage.

On Windows, were you running with a defined swap space or leaving it to the OS to expand?

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Demmers
Posts: 41
Joined: Sat Mar 07, 2020 12:57 pm

Re: Problem with WU 18806 (19, 5, 167)

Post by Demmers »

Joe_H wrote: Wed Nov 06, 2024 4:52 pm I will have to check this further on my systems, as I was running an older release of the v8 beta client. I also saw a memory leak on macOS: that system was running v8.3.14 (Intel), and fah-client had grown to 3.3 GB over a few weeks. As that system has 48 GB of RAM, it barely noticed the memory usage.

On Windows, were you running with a defined swap space or leaving it to the OS to expand?
Ah, that has reminded me of something. For a completely unrelated reason (which I have forgotten; I think it was just for performance testing), I manually set the paging file size to 0!

https://imgur.com/lKDhluB

I've just changed it to System Managed for the time being. Any advice here then?

https://imgur.com/a/kMDy2VX
muziqaz
Posts: 946
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Problem with WU 18806 (19, 5, 167)

Post by muziqaz »

Yes, don't run without a page file, even if you have infinite RAM. Windows does not like it, and nearly every app assumes there is a page file on your system :)
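
As a rough way to check this on a given machine, the Python sketch below reads the system commit limit through the Win32 `GlobalMemoryStatusEx` call via `ctypes`. The commit limit is roughly physical RAM plus paging file space, so with the paging file set to 0 it cannot rise above installed RAM, and a large allocation fails once that limit is reached. The helper names `describe_commit` and `read_memory_status` are made up for illustration, and the Windows call only runs on Windows.

```python
import ctypes
import sys

GIB = 1024 ** 3


class MEMORYSTATUSEX(ctypes.Structure):
    """Layout of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),   # commit limit, in bytes
        ("ullAvailPageFile", ctypes.c_uint64),   # commit still available, in bytes
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def describe_commit(total_phys, commit_limit, commit_avail):
    """Summarize commit headroom from raw byte counts."""
    return {
        "ram_gib": total_phys / GIB,
        "commit_limit_gib": commit_limit / GIB,
        "commit_avail_gib": commit_avail / GIB,
        # False suggests there is no paging file backing extra commit.
        "limit_exceeds_ram": commit_limit > total_phys,
    }


def read_memory_status():
    """Call GlobalMemoryStatusEx (Windows only)."""
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return status


if __name__ == "__main__" and sys.platform == "win32":
    s = read_memory_status()
    info = describe_commit(s.ullTotalPhys, s.ullTotalPageFile, s.ullAvailPageFile)
    for key, value in info.items():
        print(f"{key}: {value if isinstance(value, bool) else round(value, 2)}")
```

If `limit_exceeds_ram` comes back `False`, the paging file is most likely disabled or tiny, and setting it back to System Managed is the simplest fix.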

OK, mystery solved.
Last edited by muziqaz on Thu Nov 07, 2024 5:25 pm, edited 1 time in total.
FAH Omega tester
Demmers
Posts: 41
Joined: Sat Mar 07, 2020 12:57 pm

Re: Problem with WU 18806 (19, 5, 167)

Post by Demmers »

muziqaz wrote: Wed Nov 06, 2024 8:10 pm Yes, don't run without a page file, even if you have infinite RAM. Windows does not like it, and nearly every app assumes there is a page file on your system :)

OK, mystery solved.
Mystery indeed solved (at least for me). The last WU went by without a hitch!
Sorry for the run-around! :lol: