BAD_WORK_UNIT project:19600

Moderators: Site Moderators, FAHC Science Team

tvdsluis
Posts: 42
Joined: Mon Apr 24, 2017 11:12 am

BAD_WORK_UNIT project:19600

Post by tvdsluis »

9 out of 10 work units from this project end as a bad work unit.
I keep getting this project, which is a total waste of resources.
Other projects run fine on this RX 580 8 GB.

Can I somehow opt out of this project?
Or do I need to stop folding for a few weeks or months until this project is finished?
I don't want to keep wasting electricity on this project.

10:16:37:WU02:FS01:0x22:Completed 2160000 out of 3000000 steps (72%)
10:18:09:WU02:FS01:0x22:Completed 2190000 out of 3000000 steps (73%)
10:19:01:WU02:FS01:0x22:An exception occurred at step 2206289: Particle coordinate is nan
10:19:01:WU02:FS01:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
10:19:01:WU02:FS01:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
10:19:01:WU02:FS01:0x22:Saving result file ..\logfile_01.txt
10:19:01:WU02:FS01:0x22:Saving result file positions.xtc
10:19:01:WU02:FS01:0x22:Saving result file science.log
10:19:01:WU02:FS01:0x22:Saving result file state.xml.bz2
10:19:01:WU02:FS01:0x22:Saving result file xtcAtoms.csv.bz2
10:19:01:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
10:19:01:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
10:19:01:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY project:19600 run:514 clone:1 gen:0 core:0x22 unit:0x000000010000000000004c9000000202
10:19:01:WU02:FS01:Uploading 2.04MiB to 160.45.68.201
10:19:01:WU02:FS01:Connecting to 160.45.68.201:8080
10:19:03:WU01:FS01:Connecting to assign1.foldingathome.org:80
10:19:04:WU01:FS01:Assigned to work server 160.45.68.201
10:19:04:WU01:FS01:Requesting new work unit for slot 01: gpu:38:0 Ellesmere XT [RX 470/480/570/580/590] from 160.45.68.201
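
For anyone who wants to put a number on a claim like "9 out of 10" before reporting it, here is a minimal Python sketch that tallies results per project from client log lines. It assumes the "Sending unit results" line format shown in the log above. The names `RESULT_RE`, `tally_results` and `sample` are made up for this illustration and are not part of the client. Skipping repeated `unit:` ids is a precaution in case the same line is logged more than once for one WU.

```python
import re
from collections import Counter, defaultdict

# Assumption: result lines look like the "Sending unit results" line in the log above.
RESULT_RE = re.compile(
    r"Sending unit results:.*?\berror:(\w+)\s+project:(\d+).*?\bunit:(0x[0-9a-fA-F]+)"
)

def tally_results(lines):
    """Count the value of the error: field per project, once per unit id."""
    tally = defaultdict(Counter)
    seen_units = set()
    for line in lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a result line
        error, project, unit = match.groups()
        if unit in seen_units:
            continue  # same work unit already counted
        seen_units.add(unit)
        tally[project][error] += 1
    return tally

# Two lines copied from the log above; only the second one is a result line.
sample = [
    "10:19:01:WU02:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT",
    "10:19:01:WU02:FS01:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:19600 run:514 clone:1 gen:0 core:0x22 "
    "unit:0x000000010000000000004c9000000202",
]

for project, counts in tally_results(sample).items():
    print(project, dict(counts))
```

To run it on a real log, pass the lines of the log file to `tally_results` instead of `sample`. The per-project counts of each `error:` value then show whether the failures are limited to one project.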
Joe_H
Site Admin
Posts: 7937
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: BAD_WORK_UNIT project:19600

Post by Joe_H »

The WU shown in your log was folded successfully by the next person assigned it. If you are only getting failures on WUs from this project, it is possible there is an incompatibility between the version of the folding core used and the AMD driver and OpenCL support.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
tvdsluis
Posts: 42
Joined: Mon Apr 24, 2017 11:12 am

Re: BAD_WORK_UNIT project:19600

Post by tvdsluis »

Joe_H wrote: Tue Sep 05, 2023 5:18 pm The WU shown in your log was folded successfully by the next person assigned it. If you are only getting failures on WUs from this project, it is possible there is an incompatibility between the version of the folding core used and the AMD driver and OpenCL support.
I think that is probably the case, since other projects run fine.
But what is my best solution?
I keep getting this project, and maybe once a day I get another project that runs fine.
The rest of the day my system spends resources on bad WUs from this project, and I can't opt out.

Is my only option to stop folding until this project is finished, and how can I see when this project is finished?
muziqaz
Posts: 946
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: BAD_WORK_UNIT project:19600

Post by muziqaz »

What driver is this on?
FAH Omega tester
yycyyc
Scientist
Posts: 6
Joined: Sun May 21, 2023 3:12 pm

Re: BAD_WORK_UNIT project:19600

Post by yycyyc »

tvdsluis wrote:
But what is my best solution?
I keep getting this project, and maybe once a day I get another project that runs fine.
The rest of the day my system spends resources on bad WUs from this project, and I can't opt out.

Is my only option to stop folding until this project is finished, and how can I see when this project is finished?
Thanks for your direct feedback! I am really sorry about the constant failures you have experienced with this project. While we look into the matter, I have temporarily opted you out of the project until the problem is solved.

The filter rule might take some time to propagate to the assign server and take effect, and in the meantime jobs already assigned to you might still fail. I ask for your understanding and patience. If the problem is not resolved for you after the current jobs are gone, please don't hesitate to report it here again!

Thank you again for your valuable contribution to the folding community; we will try our best to reduce this type of compatibility issue.
tvdsluis
Posts: 42
Joined: Mon Apr 24, 2017 11:12 am

Re: BAD_WORK_UNIT project:19600

Post by tvdsluis »

yycyyc wrote: Tue Sep 05, 2023 9:02 pm Thanks for your direct feedback! I am really sorry about the constant failures you have experienced with this project. While we look into the matter, I have temporarily opted you out of the project until the problem is solved.

The filter rule might take some time to propagate to the assign server and take effect, and in the meantime jobs already assigned to you might still fail. I ask for your understanding and patience. If the problem is not resolved for you after the current jobs are gone, please don't hesitate to report it here again!

Thank you again for your valuable contribution to the folding community; we will try our best to reduce this type of compatibility issue.
Thank you for your fast response. It helps to see this is taken so seriously.
I hope this gets resolved for you and for me.
I will continue to fold!
muziqaz
Posts: 946
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: BAD_WORK_UNIT project:19600

Post by muziqaz »

This project has been disabled for all AMD architectures except one (RDNA3). AMD drivers for GCN have a bug that blows up the simulations on particular projects. AMD is not willing to spend any time fixing it, as OpenCL is on its last legs and is being replaced by ROCm.
These crashes slipped through testing because of the holiday season for the AMD tester and bad timing in general :)
We hope this won't happen in the future... until next time 😂
FAH Omega tester