Re: Core_22 Problem 14300 Stuck in Core Outdated

Posted: Fri Apr 24, 2020 4:11 am
by bruce
The problem has been corrected.

Actually, it was only present for a few minutes. I'm surprised how many people managed to be assigned one of the defective projects before the error was corrected.

Posted: Fri Apr 24, 2020 6:50 am
by Rel25917
With so many people's cards sitting idle due to the shortage of GPU work units, I don't find it surprising at all. Off to check my systems to see whether they got caught up in this.

Posted: Fri Apr 24, 2020 8:14 am
by PedroDuarte
Good morning,

For your information, I have the same issue:
Project: 13400 (Run 66, Clone 5, Gen 0)

Code:

******************************* Date: 2020-04-24 *******************************
03:56:48:WU00:FS02:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
03:56:48:WU00:FS02:Connecting to cores.foldingathome.org:80
03:56:48:WU00:FS02:FahCore 22: Downloading 3.58MiB
03:56:52:WU00:FS02:FahCore 22: Download complete
03:56:52:WU00:FS02:Valid core signature
03:56:52:WARNING:WU00:FS02:FahCore has not changed since last download, aborting core update
03:56:52:WU00:FS02:Starting
03:56:52:WU00:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 705 -lifeline 1718 -checkpoint 30 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
03:56:52:WU00:FS02:Started FahCore on PID 32685
03:56:52:WU00:FS02:Core PID:32689
03:56:52:WU00:FS02:FahCore 0x22 started
03:56:52:WARNING:WU00:FS02:FahCore returned: CORE_OUTDATED (110 = 0x6e)
I will use the solution provided above.

Posted: Fri Apr 24, 2020 8:17 am
by Neil-B
You may find that simply pausing all your GPU slots (at the same time, if you have several) and then unpausing them allows the client to download the core properly, now that it is on the server. It might be worth a try before deleting anything.

Posted: Fri Apr 24, 2020 8:35 am
by PedroDuarte
Hi Neil-B,

I tried it as you said: I paused the GPU slot and started it again, and it downloaded Core_22.fah, but it still returns the same error.
P.S.: I tried to dump the WU. I found the work folder, but it only contained logs, not the WU itself.

Code:

08:23:04:WU00:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 705 -lifeline 1718 -checkpoint 30 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
08:23:04:WU00:FS02:Started FahCore on PID 15521
08:23:04:WU00:FS02:Core PID:15525
08:23:04:WU00:FS02:FahCore 0x22 started
08:23:04:WARNING:WU00:FS02:FahCore returned: CORE_OUTDATED (110 = 0x6e)
08:30:14:FS02:Paused
08:32:03:FS02:Unpaused
08:32:03:WU00:FS02:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
08:32:03:WU00:FS02:Connecting to cores.foldingathome.org:80
08:32:03:WU00:FS02:FahCore 22: Downloading 3.58MiB
08:32:08:WU00:FS02:FahCore 22: Download complete
08:32:08:WU00:FS02:Valid core signature
08:32:08:WARNING:WU00:FS02:FahCore has not changed since last download, aborting core update
08:32:08:WU00:FS02:Starting
08:32:08:WU00:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 705 -lifeline 1718 -checkpoint 30 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
08:32:08:WU00:FS02:Started FahCore on PID 17409
08:32:08:WU00:FS02:Core PID:17413
08:32:08:WU00:FS02:FahCore 0x22 started
08:32:09:WU00:FS02:0x22:*********************** Log Started 2020-04-24T08:32:08Z ***********************
08:32:09:WU00:FS02:0x22:*************************** Core22 Folding@home Core ***************************
08:32:09:WU00:FS02:0x22:       Type: 0x22
08:32:09:WU00:FS02:0x22:       Core: Core22
08:32:09:WU00:FS02:0x22:    Website: https://foldingathome.org/
08:32:09:WU00:FS02:0x22:  Copyright: (c) 2009-2018 foldingathome.org
08:32:09:WU00:FS02:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
08:32:09:WU00:FS02:0x22:             <rafal.wiewiora@choderalab.org>
08:32:09:WU00:FS02:0x22:       Args: -dir 00 -suffix 01 -version 705 -lifeline 17409 -checkpoint 30
08:32:09:WU00:FS02:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
08:32:09:WU00:FS02:0x22:             0 -gpu 0
08:32:09:WU00:FS02:0x22:     Config: <none>
08:32:09:WU00:FS02:0x22:************************************ Build *************************************
08:32:09:WU00:FS02:0x22:    Version: 0.0.2
08:32:09:WU00:FS02:0x22:       Date: Dec 6 2019
08:32:09:WU00:FS02:0x22:       Time: 21:20:17
08:32:09:WU00:FS02:0x22: Repository: Git
08:32:09:WU00:FS02:0x22:   Revision: f87d92b58abdf7e6bf2e173cfbc4dc3e837c7042
08:32:09:WU00:FS02:0x22:     Branch: core22
08:32:09:WU00:FS02:0x22:   Compiler: GNU 4.8.2 20140120 (Red Hat 4.8.2-15)
08:32:09:WU00:FS02:0x22:    Options: -std=gnu++98 -O3 -funroll-loops
08:32:09:WU00:FS02:0x22:   Platform: linux2 4.9.87-linuxkit-aufs
08:32:09:WU00:FS02:0x22:       Bits: 64
08:32:09:WU00:FS02:0x22:       Mode: Release
08:32:09:WU00:FS02:0x22:************************************ System ************************************
08:32:09:WU00:FS02:0x22:        CPU: Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
08:32:09:WU00:FS02:0x22:     CPU ID: GenuineIntel Family 6 Model 85 Stepping 7
08:32:09:WU00:FS02:0x22:       CPUs: 16
08:32:09:WU00:FS02:0x22:     Memory: 15.30GiB
08:32:09:WU00:FS02:0x22:Free Memory: 11.25GiB
08:32:09:WU00:FS02:0x22:    Threads: POSIX_THREADS
08:32:09:WU00:FS02:0x22: OS Version: 5.3
08:32:09:WU00:FS02:0x22:Has Battery: false
08:32:09:WU00:FS02:0x22: On Battery: false
08:32:09:WU00:FS02:0x22: UTC Offset: 2
08:32:09:WU00:FS02:0x22:        PID: 17413
08:32:09:WU00:FS02:0x22:        CWD: /var/lib/fahclient/work
08:32:09:WU00:FS02:0x22:         OS: Linux 5.3.0-46-generic x86_64
08:32:09:WU00:FS02:0x22:    OS Arch: AMD64
08:32:09:WU00:FS02:0x22:********************************************************************************
08:32:09:WU00:FS02:0x22:Project: 13400 (Run 66, Clone 5, Gen 0)
08:32:09:WU00:FS02:0x22:Unit: 0x0000000012bc7d9a5ea1be659703a57d
08:32:09:WU00:FS02:0x22:ERROR:110: Need version 0.0.5
08:32:09:WU00:FS02:0x22:Folding@home Core Shutdown: CORE_OUTDATED
08:32:09:WARNING:WU00:FS02:FahCore returned: CORE_OUTDATED (110 = 0x6e)
Can someone help me here, please? I'm running Linux. Thanks.
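Both numbers that matter are already in the log above: the installed core reports `Version: 0.0.2`, while the work unit stops with `ERROR:110: Need version 0.0.5`. A minimal POSIX shell sketch that pulls the two out of a saved log for a quick comparison; the function name `core_versions` is hypothetical, and it assumes the log lines are formatted like the excerpts in this thread:

```shell
# Hypothetical helper: print the Core22 build version and the version the
# work unit asks for, taking the first match of each in a FAHClient log.
# Assumes lines shaped like the excerpts in this thread, e.g.
#   ...:0x22:    Version: 0.0.2
#   ...:0x22:ERROR:110: Need version 0.0.5
core_versions() {
  log="$1"
  # Only matches "Version:" directly after the 0x22 prefix, so the
  # "OS Version:" line is skipped.
  have=$(sed -n 's/.*:0x22: *Version: *\([0-9.]*\).*/\1/p' "$log" | head -n 1)
  need=$(sed -n 's/.*ERROR:110: Need version \([0-9.]*\).*/\1/p' "$log" | head -n 1)
  echo "have=$have need=$need"
}
```

For a log containing the excerpt above, `core_versions /var/lib/fahclient/log.txt` should print `have=0.0.2 need=0.0.5`; the log path is an assumption based on a default Linux install. A mismatch like this means the slot is waiting for a newer core, not that the WU itself is corrupt.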

Posted: Fri Apr 24, 2020 9:44 am
by HaloJones
Look in /var/lib/fahclient/work.

You should have at least one directory there called 00 or 01.

So pause the client and run:

Code:

sudo rm -rf /var/lib/fahclient/work
Then restart the client.
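Note that `rm -rf /var/lib/fahclient/work` removes the work folders of every slot, not just the stuck one; Faxon's post further down mentions bricking a client that had a perfectly good WU on another slot. The Windows steps later in the thread delete only the problem WU's folder instead. On Linux, one way to find that folder is from the log: the `WU00`/`WU01` queue ID in each line matches the `-dir 00`/`-dir 01` argument the core is started with, which is the folder name under the work directory. A minimal sketch, assuming the log format shown in this thread (the function name `outdated_wu_dirs` is hypothetical):

```shell
# Hypothetical helper: list the work-queue IDs (00, 01, ...) whose core
# returned CORE_OUTDATED, based on log lines like
#   08:32:09:WARNING:WU00:FS02:FahCore returned: CORE_OUTDATED (110 = 0x6e)
# The two-digit ID is also the folder name under the work directory.
outdated_wu_dirs() {
  grep 'FahCore returned: CORE_OUTDATED' "$1" |
    sed -n 's/.*:WU\([0-9][0-9]\):.*/\1/p' |
    sort -u
}
```

Usage, under the same assumptions: pause the client, run `outdated_wu_dirs /var/lib/fahclient/log.txt` (the log path assumes a default Linux install), check the result against the current work queue in FAHControl, since queue IDs may be reused for later units, then remove only that folder (for example `sudo rm -rf /var/lib/fahclient/work/00`) and restart the client.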

Posted: Fri Apr 24, 2020 12:29 pm
by PedroDuarte
It worked, thanks HaloJones! +1 :D

EDIT: I don't know why I received this WU, because I'm not signed up for beta projects.

Posted: Fri Apr 24, 2020 1:48 pm
by Neil-B
This project and another one actually moved from beta to advanced last night, so I guess you have the advanced flag set. There is always a risk with advanced, albeit not as much as with beta. There was an oversight: the new core wasn't loaded to the server before the WUs were released, which caused these errors. Apologies have been posted to the forum, lessons have been learnt, and processes have been changed to reduce the chance of this happening again.

Posted: Fri Apr 24, 2020 2:04 pm
by JohnChodera
@Neil-B is right! This was an oversight on my part, and we're correcting it with a checklist right now.
Only a few hundred WUs requiring core 0.0.5 got out to ADVANCED before that core had been made available to full FAH. We'll roll 0.0.5 out to full ADVANCED/FAH today to eliminate any other stuck WUs!

Posted: Fri Apr 24, 2020 8:17 pm
by Tuna_Ertemalp
For Windows users:

- Start FAHControl
- PAUSE the entire host
- Under the Status tab, select the GPU slot with the problem WU
- Look at the Work Queue list at the bottom and make a note of the ID of the row that is now automatically selected
- Open the Windows Start menu and launch File Explorer
- In the path bar at the top, enter %AppData%\FAHClient\work
- You should see folders named after the WU IDs
- Before deleting anything, click the FAHClient icon in the taskbar and select QUIT to fully stop folding on all slots
- Now delete the folder under %AppData%\FAHClient\work that corresponds to the problem WU's ID
- Restart your computer
- Start FAHControl
- FOLD the entire host to take it out of the PAUSE state

This is what worked for me on two hosts...

Tuna

Posted: Fri Apr 24, 2020 9:30 pm
by JohnChodera
We've released core22 0.0.5 to full FAH, so this should resolve on its own now!

Posted: Sat Apr 25, 2020 12:16 am
by kc2lrc
John -

I notice I was assigned a project 14300 today on my dual-GTX960 Windows 10 system (client-type advanced). The description of your project suggests this should be Linux-only - is this correct?

Also, this exercises an issue in a multi-GPU system - with the other slot running on core version 0.0.2, the slot with this WU is hung trying to update the core and will not start until about 5 hours from now. With the size of the work unit I doubt it will finish on time.

Finally, unless you are planning to restrict this to higher performance GPUs, I don't think it will complete within the allotted time on some GPUs. It's benchmarked at 256,000 points or so, and my GTX 750Ti Linux machines do about 100,000 per day. It may need more than 1.5 days to complete if this is the case.

Cheers -
Sam
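The deadline concern above is a simple ratio: the benchmarked credit for the WU divided by the card's usual points per day. A minimal sketch of that arithmetic (the helper name `est_days` is hypothetical); it ignores the quick-return bonus, which shrinks credit on slower cards, so the result is only a ballpark figure:

```shell
# Hypothetical helper: rough days-to-finish = WU points / points per day.
# Ballpark only: the quick-return bonus makes credit depend on speed.
est_days() {
  awk -v pts="$1" -v ppd="$2" 'BEGIN { printf "%.2f\n", pts / ppd }'
}
```

With the figures from the post, `est_days 256000 100000` prints `2.56`, i.e. roughly two and a half days, compared with the 1.5 days mentioned.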

Posted: Sat Apr 25, 2020 12:30 am
by JohnChodera
> I notice I was assigned a project 14300 today on my dual-GTX960 Windows 10 system (client-type advanced). The description of your project suggests this should be Linux-only - is this correct?

I briefly restored `win` assignments after thinking we had corrected the PPD issue, but the PPD divergence on Windows seemed too large to bridge, so we restricted it to Linux again.
I'm guessing you were assigned a WU during this period.

> Also, this exercises an issue in a multi-GPU system - with the other slot running on core version 0.0.2, the slot with this WU is hung trying to update the core and will not start until about 5 hours from now. With the size of the work unit I doubt it will finish on time.

Oh no! I think others have reported that pausing and resuming let them pick up the newer core version. There should be no compatibility issues between the two core versions.

> Finally, unless you are planning to restrict this to higher performance GPUs, I don't think it will complete within the allotted time on some GPUs. It's benchmarked at 256,000 points or so, and my GTX 750Ti Linux machines do about 100,000 per day. It may need more than 1.5 days to complete if this is the case.

Thanks for the heads-up here. We'll either reduce the number of steps or extend the deadline for future iterations. The tight deadline was due to the extreme time pressure in validating these calculations so we can use them in the COVID Moonshot now that the chemists are starting to switch from hit-finding to lead-optimization.

We really appreciate all the feedback, and this will have to be an iterative process of refining things over the next few weeks as we try to improve these new workflows. We also want to be sure to deliver better molecules for our chemists to test against COVID-19 targets, so we hope you can give us a little temporary leeway as we work out these issues.

Thanks so much for all the help!

~ John Chodera // MSKCC

Posted: Sat Apr 25, 2020 12:59 am
by kc2lrc
No problem John. I can imagine you're under some serious pressure with the way things are going now. I'm proud to be supporting your work.

Good to know about pausing; I'll try that on my other systems if it happens again. For what it's worth, the unit wound up getting hosed: since I didn't think it would complete before the timeout, I tried deleting the slot to get it sent back to the server, but it got queued for the running slot instead. Then, when I re-created the slot, it downloaded a new unit, so I decided to prioritize that one. The unit that was lost was Project: 13400 (Run 112, Clone 8, Gen 0).

Good luck with everything!

Cheers -
Sam

Posted: Sat Apr 25, 2020 1:44 am
by Faxon
EDIT: Hey, I'm leaving this up. I didn't realize my post had to be approved by a mod, and I figured the issue out by reading the thread more thoroughly while it was waiting for approval. I don't know whether the beta team is aware that this also happened a day or two ago with v0.0.3 on the same WU, so I wanted to let them know as well. Thanks for the instructions on how to properly delete the WU, though; those were mighty useful since, as I mentioned, I bricked a client with a perfectly good WU on another slot because of this and because I didn't know how to delete the WU properly. (I nearly followed these instructions on my own while trying to figure it out, but it didn't quite work, I guess because the client wasn't restarted or closed properly.)

Hey guys, I have the same problem with this project. My slots are all flagged client-type advanced, and this isn't the first time it has happened (the first was when you deployed v0.0.3). Last time, I had to reinstall the client to fix it after tinkering with it for several hours and breaking the client in the process. I've set all my other slots to finish right now in anticipation of having to reinstall, because pausing and unpausing so that it downloads the latest core isn't working. Does someone want to help me save this WU before it gets wiped in a reinstall, so I can get my 2060 (the fastest card in my farm of 8 cards) back to doing the work I'm configured to receive? As you can see, it's been stuck this way for a couple of hours now.

Code:

21:41:57:WU01:FS01:Connecting to 65.254.110.245:80
21:41:57:WU01:FS01:Assigned to work server 18.188.125.154
21:41:57:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:TU106-200A [GeForce RTX 2060] from 18.188.125.154
21:41:57:WU01:FS01:Connecting to 18.188.125.154:8080
21:41:59:WU01:FS01:Downloading 9.85MiB
21:42:02:WU01:FS01:Download complete
21:42:02:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13400 run:14 clone:9 gen:0 core:0x22 unit:0x0000000012bc7d9a5ea12eeb9571a121
21:43:58:WU03:FS01:0x22:Completed 480000 out of 500000 steps (96%)
21:46:18:WU03:FS01:0x22:Completed 485000 out of 500000 steps (97%)
21:48:20:WU03:FS01:0x22:Completed 490000 out of 500000 steps (98%)
21:50:22:WU03:FS01:0x22:Completed 495000 out of 500000 steps (99%)
21:52:24:WU03:FS01:0x22:Completed 500000 out of 500000 steps (100%)
21:52:43:WU03:FS01:0x22:Saving result file ..\logfile_01.txt
21:52:43:WU03:FS01:0x22:Saving result file checkpointState.xml
21:52:56:WU03:FS01:0x22:Saving result file checkpt.crc
21:52:56:WU03:FS01:0x22:Saving result file positions.xtc
21:53:05:WU03:FS01:0x22:Saving result file science.log
21:53:05:WU03:FS01:0x22:Folding@home Core Shutdown: FINISHED_UNIT
21:53:06:WU03:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:53:06:WU03:FS01:Sending unit results: id:03 state:SEND error:NO_ERROR project:14561 run:0 clone:679 gen:3 core:0x22 unit:0x0000000480fccb025e9615c013331069
21:53:06:WU03:FS01:Uploading 66.59MiB to 128.252.203.2
21:53:06:WU03:FS01:Connecting to 128.252.203.2:8080
21:53:07:WU01:FS01:Starting
21:53:07:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "C:\Users\Folding Rig 2\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe" -dir 01 -suffix 01 -version 706 -lifeline 8920 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
21:53:07:WU01:FS01:Started FahCore on PID 8632
21:53:07:WU01:FS01:Core PID:7656
21:53:07:WU01:FS01:FahCore 0x22 started
21:53:07:WU01:FS01:0x22:*********************** Log Started 2020-04-24T21:53:07Z ***********************
21:53:07:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
21:53:07:WU01:FS01:0x22:       Type: 0x22
21:53:07:WU01:FS01:0x22:       Core: Core22
21:53:07:WU01:FS01:0x22:    Website: https://foldingathome.org/
21:53:07:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
21:53:07:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
21:53:07:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
21:53:07:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 8632 -checkpoint 15
21:53:07:WU01:FS01:0x22:             -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device
21:53:07:WU01:FS01:0x22:             0 -gpu 0
21:53:07:WU01:FS01:0x22:     Config: <none>
21:53:07:WU01:FS01:0x22:************************************ Build *************************************
21:53:07:WU01:FS01:0x22:    Version: 0.0.2
21:53:07:WU01:FS01:0x22:       Date: Dec 6 2019
21:53:07:WU01:FS01:0x22:       Time: 21:30:31
21:53:07:WU01:FS01:0x22: Repository: Git
21:53:07:WU01:FS01:0x22:   Revision: abeb39247cc72df5af0f63723edafadb23d5dfbe
21:53:07:WU01:FS01:0x22:     Branch: HEAD
21:53:07:WU01:FS01:0x22:   Compiler: Visual C++ 2008
21:53:07:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
21:53:07:WU01:FS01:0x22:   Platform: win32 10
21:53:07:WU01:FS01:0x22:       Bits: 64
21:53:07:WU01:FS01:0x22:       Mode: Release
21:53:07:WU01:FS01:0x22:************************************ System ************************************
21:53:07:WU01:FS01:0x22:        CPU: AMD Ryzen 5 2600 Six-Core Processor
21:53:07:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
21:53:07:WU01:FS01:0x22:       CPUs: 12
21:53:07:WU01:FS01:0x22:     Memory: 15.95GiB
21:53:07:WU01:FS01:0x22:Free Memory: 8.63GiB
21:53:07:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
21:53:07:WU01:FS01:0x22: OS Version: 6.2
21:53:07:WU01:FS01:0x22:Has Battery: false
21:53:07:WU01:FS01:0x22: On Battery: false
21:53:07:WU01:FS01:0x22: UTC Offset: -7
21:53:07:WU01:FS01:0x22:        PID: 7656
21:53:07:WU01:FS01:0x22:        CWD: C:\Users\Folding Rig 2\AppData\Roaming\FAHClient\work
21:53:07:WU01:FS01:0x22:         OS: Windows 10 Enterprise LTSC 2019
21:53:07:WU01:FS01:0x22:    OS Arch: AMD64
21:53:07:WU01:FS01:0x22:********************************************************************************
21:53:07:WU01:FS01:0x22:Project: 13400 (Run 14, Clone 9, Gen 0)
21:53:07:WU01:FS01:0x22:Unit: 0x0000000012bc7d9a5ea12eeb9571a121
21:53:07:WU01:FS01:0x22:ERROR:110: Need version 0.0.5
21:53:07:WU01:FS01:0x22:Folding@home Core Shutdown: CORE_OUTDATED
21:53:07:WARNING:WU01:FS01:FahCore returned: CORE_OUTDATED (110 = 0x6e)
21:53:12:WU03:FS01:Upload 11.07%
21:53:18:WU03:FS01:Upload 23.93%
21:53:24:WU03:FS01:Upload 36.13%
21:53:30:WU03:FS01:Upload 48.71%
21:53:36:WU03:FS01:Upload 61.57%
21:53:42:WU03:FS01:Upload 74.52%
21:53:48:WU03:FS01:Upload 84.19%
21:53:54:WU03:FS01:Upload 96.48%
21:54:08:WU03:FS01:Upload complete
21:54:08:WU03:FS01:Server responded WORK_ACK (400)
21:54:08:WU03:FS01:Final credit estimate, 167158.00 points
21:54:08:WU03:FS01:Cleaning up
******************************* Date: 2020-04-24 *******************************
01:40:05:FS01:Paused
01:40:10:FS01:Unpaused