Weird "exception: bad allocation" behaviour

Moderators: Site Moderators, FAHC Science Team

Schrödinger's cat
Posts: 142
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4080 Super
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Weird "exception: bad allocation" behaviour

Post by Schrödinger's cat »

My GPU has had at least a couple of these problems:

Code:

04:35:08:WU00:FS00:0x27:Completed 0 out of 250000 steps (0%)
04:35:13:WU00:FS00:0x27:Checkpoint completed at step 0
04:35:24:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:35:48:WU00:FS00:0x27:Completed 2500 out of 250000 steps (1%)
04:36:08:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:36:26:WU00:FS00:0x27:Completed 5000 out of 250000 steps (2%)
04:36:46:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:37:01:WU00:FS00:0x27:Completed 7500 out of 250000 steps (3%)
04:37:35:WU00:FS00:0x27:Completed 10000 out of 250000 steps (4%)
04:37:56:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:38:09:WU00:FS00:0x27:Completed 12500 out of 250000 steps (5%)
04:38:16:WU00:FS00:0x27:Checkpoint completed at step 12500
04:38:50:WU00:FS00:0x27:Completed 15000 out of 250000 steps (6%)
04:39:24:WU00:FS00:0x27:Completed 17500 out of 250000 steps (7%)
04:39:40:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:39:58:WU00:FS00:0x27:Completed 20000 out of 250000 steps (8%)
04:40:14:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:40:32:WU00:FS00:0x27:Completed 22500 out of 250000 steps (9%)
04:40:48:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:41:06:WU00:FS00:0x27:Completed 25000 out of 250000 steps (10%)
04:41:14:WU00:FS00:0x27:Checkpoint completed at step 25000
04:41:34:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:41:48:WU00:FS00:0x27:Completed 27500 out of 250000 steps (11%)
04:42:10:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:42:23:WU00:FS00:0x27:Completed 30000 out of 250000 steps (12%)
04:42:57:WU00:FS00:0x27:Completed 32500 out of 250000 steps (13%)
04:43:32:WU00:FS00:0x27:Completed 35000 out of 250000 steps (14%)
04:44:06:WU00:FS00:0x27:Completed 37500 out of 250000 steps (15%)
04:44:13:WU00:FS00:0x27:Checkpoint completed at step 37500
04:44:48:WU00:FS00:0x27:Completed 40000 out of 250000 steps (16%)
04:45:22:WU00:FS00:0x27:Completed 42500 out of 250000 steps (17%)
04:45:39:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:45:57:WU00:FS00:0x27:Completed 45000 out of 250000 steps (18%)
04:46:14:ERROR:std::exception: bad allocation
04:46:21:ERROR:WU00:FS00:std::exception: bad allocation
04:46:23:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:46:30:WU00:FS00:0x27:Completed 47500 out of 250000 steps (19%)
Note that the problem just disappeared after 19%.

What I find confusing is that folding wasn't restarted after the bad allocation exception. Does anyone understand what's going on? Could it be related to RAM, since my Folding@home data is saved to a RAM disk?
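A quick way to check whether the WU really carried on after the error is to compare the step count logged just before each bad allocation line with the one logged just after it. The Python sketch below is a hypothetical helper, not part of the FAH client. It assumes the v7 log format quoted above, and it assumes that a restarted WU would resume from its last checkpoint and so report a lower step count.

```python
import re

# Hypothetical helper; assumes v7-style log lines like the ones quoted above.
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x[0-9a-f]+:Completed (\d+) out of \d+ steps"
)
BAD_ALLOC = re.compile(
    r"^(\d\d:\d\d:\d\d):ERROR:(?:WU\d+:FS\d+:)?std::exception: bad allocation"
)


def steps_around_bad_alloc(lines):
    """For each bad allocation error, return (time, step_before, step_after).

    step_after > step_before suggests the WU carried on; a lower value would
    suggest a restart from a checkpoint (an assumption, see the text above).
    """
    results = []
    last_step = None
    pending = []  # errors still waiting for the next progress line
    for line in lines:
        progress = PROGRESS.match(line)
        if progress:
            step = int(progress.group(2))
            for when, before in pending:
                results.append((when, before, step))
            pending = []
            last_step = step
            continue
        error = BAD_ALLOC.match(line)
        if error:
            pending.append((error.group(1), last_step))
    return results


log = """\
04:45:57:WU00:FS00:0x27:Completed 45000 out of 250000 steps (18%)
04:46:14:ERROR:std::exception: bad allocation
04:46:21:ERROR:WU00:FS00:std::exception: bad allocation
04:46:23:ERROR:Receive error: 10053: An established connection was aborted by the software in your host machine.
04:46:30:WU00:FS00:0x27:Completed 47500 out of 250000 steps (19%)
""".splitlines()

for when, before, after in steps_around_bad_alloc(log):
    print(when, before, "->", after)
```

On the excerpt above, both error lines fall between 45000 and 47500 steps, which fits the WU continuing rather than restarting.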
muziqaz
Posts: 2301
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3d, 7950x3d, 5950x, 5800x3d
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX550, Intel B580
Location: London

Re: Weird "exception: bad allocation" behaviour

Post by muziqaz »

RAM disk?
Please use conventional storage devices for FAH.
FAH Omega tester
Schrödinger's cat
Posts: 142
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4080 Super
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Re: Weird "exception: bad allocation" behaviour

Post by Schrödinger's cat »

muziqaz wrote: Wed Dec 17, 2025 8:33 am RAM disk?
Please use conventional storage devices for FAH.
I've been using a RAM disk for years without any issues.

The only things that have changed are my GPU, which can produce viewerFrame.json files bigger than 32MB (so I increased the size of the RAM disk), and Dynamic Memory Allocation, which makes the RAM disk use only as much RAM as the data on it requires at any time. The RAM disk data is saved to my SSD before rebooting or shutting down.

I have only noticed the error twice, and it appears to be random.
muziqaz
Posts: 2301
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3d, 7950x3d, 5950x, 5800x3d
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX550, Intel B580
Location: London

Re: Weird "exception: bad allocation" behaviour

Post by muziqaz »

Schrödinger's cat wrote: Wed Dec 17, 2025 1:17 pm
muziqaz wrote: Wed Dec 17, 2025 8:33 am RAM disk?
Please use conventional storage devices for FAH.
I've been using a RAM disk for years without any issues.

The only thing that has changed is my GPU, which can produce viewerFrame.json files that are bigger than 32MB, for which I increased the size of the RAM disk, and Dynamic Memory Allocation, which only uses as much RAM as the data on the disk requires, at any time. The RAM disk data is saved to my SSD before rebooting or shutting down.

I have only discovered the error twice, and it appears to be random.
Years or decades, it doesn't matter. You are still using a volatile medium as storage for a scientific application, which requires total stability. The fact that you had to increase the RAM disk size because of the FAH visualisation tells me you are running it at the bleeding edge of "just about OK".
FAH Omega tester
Schrödinger's cat
Posts: 142
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4080 Super
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Re: Weird "exception: bad allocation" behaviour

Post by Schrödinger's cat »

muziqaz wrote: Wed Dec 17, 2025 4:49 pm The fact that you had to increase RAM disk size because of fah visualisation tells me you are running it at bleeding edge, of "just about ok".
Hardly. I've seen viewerFrame.json files of 11 and 32 KB too.
PaulTV
Posts: 238
Joined: Mon Jan 25, 2021 4:53 pm
Location: Netherlands

Re: Weird "exception: bad allocation" behaviour

Post by PaulTV »

I see the bad allocation messages frequently, but only on jobs for project 18261, which may or may not be related. To be sure, I double-checked my SSD health, and it is 100% fine.

Ryzen 9800X3D / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 22.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
muziqaz
Posts: 2301
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3d, 7950x3d, 5950x, 5800x3d
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX550, Intel B580
Location: London

Re: Weird "exception: bad allocation" behaviour

Post by muziqaz »

PaulTV wrote: Thu Dec 18, 2025 9:53 am I see the bad allocation messages frequently, but only on jobs for project 18261. Which may or may not be related. To be sure I double checked my SSD health, and it is 100% fine.
18261 requires 10GB of free RAM for a client to be assigned it. Once it is received, if free RAM dips below that requirement, you might get issues. This also reminds me that page file allocation is important too: anyone who has it restricted to a specific value might run into issues.
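The free-RAM rule of thumb above can be watched with a small script. This is a hypothetical Python sketch, not something the FAH client provides: it reads MemAvailable from /proc/meminfo, so it only works on Linux, and it interprets the 10GB figure from this post as 10 GiB (an assumption, not a client setting).

```python
# Hypothetical helper, Linux only. "10GB" is interpreted as 10 GiB (assumption).
REQUIRED_FREE_BYTES = 10 * 1024**3


def available_ram_bytes(path="/proc/meminfo"):
    """Return MemAvailable in bytes (the file reports it in kB)."""
    with open(path) as meminfo:
        for line in meminfo:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemAvailable not found in " + path)


def headroom_ok(free_bytes, required_bytes=REQUIRED_FREE_BYTES):
    """True if at least required_bytes of RAM is free."""
    return free_bytes >= required_bytes


# Example with fixed numbers: 11 GiB free is enough, 9 GiB is not.
print(headroom_ok(11 * 1024**3), headroom_ok(9 * 1024**3))
```

On a Linux host, `headroom_ok(available_ram_bytes())` checks the live value. On Windows the figure, and the page file settings mentioned above, have to be checked another way; this sketch does not cover that.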
FAH Omega tester
PaulTV
Posts: 238
Joined: Mon Jan 25, 2021 4:53 pm
Location: Netherlands

Re: Weird "exception: bad allocation" behaviour

Post by PaulTV »

muziqaz wrote: Thu Dec 18, 2025 10:59 am
PaulTV wrote: Thu Dec 18, 2025 9:53 am I see the bad allocation messages frequently, but only on jobs for project 18261. Which may or may not be related. To be sure I double checked my SSD health, and it is 100% fine.
18261 requires 10GB of free RAM to get a request for it. Once received, if free RAM dips below required, you might get issues. This also reminds me, page file allocation is also important. Anyone who has it restricted to some specific value might run into issues
I also see those messages at times when I'm peacefully asleep and nothing else is running on the machine. Windows 11, 32 GB RAM, 4 TB hard disk with plenty of free space and no special swap config. As only jobs for this project produce those messages, and I don't see any other strange behaviour, I don't expect it to be a hardware issue (like bad memory). It might be a glitch somewhere, and bad allocation messages are on my "to ignore" list... But given this thread was started because of those bad allocation messages, I chipped in :)

Ryzen 9800X3D / RTX 4090 / Windows 11
Ryzen 5600X / RTX 3070 Ti / Ubuntu 22.04
Ryzen 5600 / RTX 3060 Ti / Windows 11
muziqaz
Posts: 2301
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3d, 7950x3d, 5950x, 5800x3d
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX550, Intel B580
Location: London

Re: Weird "exception: bad allocation" behaviour

Post by muziqaz »

Oh, this is the v7 client. I think you will increasingly see random things stop behaving normally, since the client is no longer actively developed or supported. The infrastructure is being fine-tuned for the v8 client: if things work on v7, they work, but if they don't, no effort is going to be put into fixing them.
FAH Omega tester
Schrödinger's cat
Posts: 142
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4080 Super
20 x Raspberry Pi 5 Model B 2GB RAM
Location: VIC, Australia

Re: Weird "exception: bad allocation" behaviour

Post by Schrödinger's cat »

I can confirm that only 18261 work units cause this, and do so once every work unit:

Code:

ERROR:std::exception: bad allocation
ERROR:WU00:FS00:std::exception: bad allocation
Project: 18261 (Run 155, Clone 2, Gen 22) is currently running and creating 31.7MB viewerFrame.json files, and using 11GB of RAM.

I've also seen a work unit use more than 15GB of RAM when first starting. I'm unsure if that was an 18261 work unit.
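To check which project the errors come from, they can be tallied per project from the log. This is a hypothetical Python sketch. It assumes the v7 client logs a `Project: N (Run ..., Clone ..., Gen ...)` line tagged with the WU slot when a WU starts, and it only attributes the WU-tagged variant of the error (`ERROR:WU00:FS00:std::exception: bad allocation`), because the untagged line does not say which slot it came from. The sample lines are made up in that assumed format.

```python
import re
from collections import Counter

# Assumed v7 log formats (check against your own log before relying on this):
#   hh:mm:ss:WU00:FS00:0x27:Project: 18261 (Run 155, Clone 2, Gen 22)
#   hh:mm:ss:ERROR:WU00:FS00:std::exception: bad allocation
PROJECT = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:0x[0-9a-f]+:Project: (\d+) ")
BAD_ALLOC = re.compile(
    r"^\d\d:\d\d:\d\d:ERROR:(WU\d+):FS\d+:std::exception: bad allocation"
)


def bad_alloc_by_project(lines):
    """Count WU-tagged bad allocation errors per project (None if unknown)."""
    current = {}  # WU slot -> project most recently reported in that slot
    counts = Counter()
    for line in lines:
        start = PROJECT.match(line)
        if start:
            current[start.group(1)] = int(start.group(2))
            continue
        error = BAD_ALLOC.match(line)
        if error:
            counts[current.get(error.group(1))] += 1
    return counts


# Hypothetical sample in the assumed format.
log = [
    "10:00:00:WU00:FS00:0x27:Project: 18261 (Run 155, Clone 2, Gen 22)",
    "10:05:00:ERROR:std::exception: bad allocation",
    "10:05:07:ERROR:WU00:FS00:std::exception: bad allocation",
]
print(bad_alloc_by_project(log))
```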
Last edited by Schrödinger's cat on Sun Dec 21, 2025 12:26 pm, edited 1 time in total.
muziqaz
Posts: 2301
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 9950x3d, 7950x3d, 5950x, 5800x3d
7900xtx, RX9070, Radeon 7, 5700xt, 6900xt, RX550, Intel B580
Location: London

Re: Weird "exception: bad allocation" behaviour

Post by muziqaz »

Schrödinger's cat wrote: Sat Dec 20, 2025 3:48 pm I can confirm that only 18261 work units cause this, and do so every time:

Code:

ERROR:std::exception: bad allocation
ERROR:WU00:FS00:std::exception: bad allocation
Project: 18261 (Run 155, Clone 2, Gen 22) is currently running and creating 31.7MB viewerFrame.json files, and using 11GB of RAM.

I've also seen a work unit use more than 15GB of RAM when first starting. I'm unsure if that was an 18261 work unit.
Yes, RAM usage peaks when a WU starts. Then, when the WU is finishing, it comes close to that peak again.

v8.5.5 also curbs viewerFrame.json file sizes. This is to prevent a memory leak in the client.
FAH Omega tester
toTOW
Site Moderator
Posts: 6520
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Weird "exception: bad allocation" behaviour

Post by toTOW »

Code:

04:46:14:ERROR:std::exception: bad allocation
04:46:21:ERROR:WU00:FS00:std::exception: bad allocation
This usually means there's not enough system RAM and/or virtual memory available.
It could also be caused by memory leaks in the client, which affect the biggest projects such as 18261.

Those leaks are fixed in v8.5 clients.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.