Potential issues with work units for project 124xx

Moderators: Site Moderators, FAHC Science Team

ETA_2025
Posts: 73
Joined: Mon Jan 30, 2023 10:43 am
Hardware configuration: NVIDIA RTX 4070
10 x Raspberry Pi 5 Model B 2GB RAM
10 x Raspberry Pi 4 Model B 2GB RAM
Location: VIC, Australia

Potential issues with work units for project 124xx

Post by ETA_2025 »

I have 10 Pi 4Bs doing F@H work.

All errors are "The total potential energy is nan", resulting in WU_STALLED (127 = 0x7f).

Pi 1:
12409 (Run 83, Clone 3, Gen 22) failed with too many errors

Pi 2:
12417 (Run 113, Clone 7, Gen 26) completed with 2 errors
12411 (Run 120, Clone 5, Gen 24) completed without errors

Pi 3:
completed multiple 124xx projects without any errors

Pi 4:
12403 (Run 29, Clone 0, Gen 28) completed with 1 error
12400 (Run 41, Clone 6, Gen 30) completed with 2 errors

Pi 5:
completed multiple 124xx projects without errors before encountering:
12410 (Run 5, Clone 9, Gen 9) completed with 1 error that appears to have stalled the Pi, requiring a reboot.

Pi 7:
completed multiple 124xx projects without any errors

Pi 8:
12419 (Run 121, Clone 8, Gen 27) failed with too many errors
12416 (Run 154, Clone 3, Gen 20) failed with too many errors
12401 (Run 25, Clone 7, Gen 21) completed despite experiencing 9 errors! A reboot mid-project might have helped.
12419 (Run 15, Clone 7, Gen 22) completed with 1 error
12419 (Run 18, Clone 1, Gen 25) failed with too many errors
12400 (Run 80, Clone 1, Gen 23) failed after 10 errors! Reboots mid-project may have delayed the WU failing.
12400 (Run 25, Clone 5, Gen 20) had 2 errors before reaching 1%

While Pi 8's problem might be due to the machine itself, I don't understand why a machine that keeps failing 124xx projects is only assigned more of them!
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Potential issues with work units for project 124xx

Post by toTOW »

How much RAM do you have on these systems? Are they dedicated to FAH?

Can you check the system logs to see whether the errors happen at the same time as other events?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
ETA_2025

Re: Potential issues with work units for project 124xx

Post by ETA_2025 »

toTOW wrote (Fri Aug 11, 2023 9:29 pm): "How much RAM do you have on these systems? Are they dedicated to FAH?"
2GB, and they are headless Pis doing nothing but FAH.
"Can you check in system logs if the errors happen at the same time as other events?"
I could, if I knew what to check and where to find it.

These Pis all run 24/7 and normally don't all have problems at the same time. Occasionally one does, but so many having problems at once points to the WUs as the cause. And they only have problems whilst processing 124xx WUs.
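One way to check whether the errors line up with other events is to pull their timestamps out of the client log and compare them with the system journal (for example `journalctl --since "2023-08-11 20:00"` or `dmesg -T`). Below is a minimal Python sketch, assuming the client log is plain text and each line starts with an `HH:MM:SS:` timestamp; that format, the script name and the log path are assumptions, so point it at wherever the client log actually lives on your Pi. The two search strings are the error messages quoted in the first post.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Error strings reported in this thread; adjust if your log words them differently.
ERROR_MARKERS = ("The total potential energy is nan", "WU_STALLED")

# Assumption: each log line begins with an "HH:MM:SS:" timestamp.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}):")


def find_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, line) pairs for every line containing an error marker."""
    hits = []
    for line in lines:
        if any(marker in line for marker in ERROR_MARKERS):
            match = TIMESTAMP.match(line)
            # Lines without a recognisable timestamp are still reported.
            stamp = match.group(1) if match else "??:??:??"
            hits.append((stamp, line.rstrip("\n")))
    return hits


def main(argv: List[str]) -> None:
    if len(argv) < 2:
        print("usage: find_nan_errors.py /path/to/client/log.txt")
        return
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(argv[1], encoding="utf-8", errors="replace") as log:
        for stamp, line in find_errors(log):
            print(stamp, line)


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python3 find_nan_errors.py /path/to/log.txt` (hypothetical file name), then look in `journalctl` or `dmesg -T` output around the printed times for anything else that happened on the Pi at that moment.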
John_Richard
Posts: 1
Joined: Fri Jun 09, 2023 9:03 am
Hardware configuration: 1 x Raspberry Pi 4 Model B 4GB RAM

Re: Potential issues with work units for project 124xx

Post by John_Richard »

My 4GB Raspberry Pi 4 has just been given WU 12447 (5,9,10). The client currently expects to finish in 6 days, but the WU has a limit of 3.6 days to complete.
There seems to be little point in wasting this time and energy waiting for the inevitable failure of this task, so I would like to dump this WU and get a new assignment.
However, the "Stop Folding" button in the Web Control interface offers only two options: "Finish up, then stop" and "Stop now". The first doesn't address the problem, and the second only pauses processing of the WU.
I have searched for instructions on how to "dump" a WU, but have only found references to people having done this, without any details.
calxalot
Site Moderator
Posts: 1115
Joined: Sat Dec 08, 2007 1:33 am
Location: San Francisco, CA

Re: Potential issues with work units for project 124xx

Post by calxalot »

The easiest way is to use FAHControl. You may need to run it from a Windows machine.

Your Pi 4 is not fast enough for most work, so there is little point in getting another WU.

You can also dump from the command line. You would need to tell us where your client data directory is.