Re: WU 13456

Posted: Fri Jul 30, 2021 6:11 pm
by Neil-B
@JohnChodera ... Bit of feedback from Discord in case you haven't seen it ...

"firedfly — Today at 18:22
I have an nvidia 1660 running on Windows 10 that is completing p13456 via cuda without issue. I won't have time to post on the forums until tomorrow night, so feel free to report that if you want"

This supports your RTX30*0 hypothesis

Re: WU 13456

Posted: Fri Jul 30, 2021 8:40 pm
by debs3759
It ran the one I got last night with no problems. I have Visual C++ from Visual Studio 2019 Community installed, but not VC++ 2015 unless one of my other apps installed the files (which does seem very possible, as I have several optimisation and enhancement type apps, although they are all up to date).

Re: WU 13456

Posted: Fri Jul 30, 2021 8:45 pm
by Neil-B
@debs3759 ... what gpu? ... and do your logs show it running opencl or cuda?
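
For anyone wanting to check their own logs the same way: core22 prints a line such as "Using CUDA and gpu 0" or "Using OpenCL on platformId 0 and gpu 0" once it has picked a platform. A minimal sketch for pulling that out of a log follows; the function name and the regular expression are my own assumptions, not part of the client.

```python
import re

# Matches the core's platform line, e.g.
#   "04:51:31:WU01:FS01:0x22:  Using CUDA and gpu 0"
#   "10:11:09:WU00:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0"
USING_RE = re.compile(r":0x22:\s*Using (CUDA|OpenCL)\b")


def platform_used(log_lines):
    """Return 'CUDA', 'OpenCL', or None, based on the last platform line seen."""
    found = None
    for line in log_lines:
        match = USING_RE.search(line)
        if match:
            found = match.group(1)
    return found
```

Feed it the lines of a FAHClient log; a result of None just means no platform line was found in what you passed in.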

Re: WU 13456

Posted: Fri Jul 30, 2021 10:12 pm
by debs3759
GTX 1060 3GB, CUDA, latest drivers.

Re: WU 13456

Posted: Sat Jul 31, 2021 2:11 am
by JohnChodera
@Neil-B: Now, that's odd. core22 0.0.13 was built with CUDA 9.2 and OpenMM 7.4.2.
core22 0.0.14 was built with CUDA 10.1 and OpenMM 7.5.1.

I definitely don't understand why CUDA 9.2 would work but 10.1 wouldn't.
If you have the time, perhaps we could invite you to the testing slack so we could run the next build by you next week?
If so, just private message me an email address we can use to send an invite.

~ John Chodera // MSKCC

Re: WU 13456

Posted: Sat Jul 31, 2021 3:53 am
by WhitehawkEQ
Don't forget, my GTX 1080 cards fail with a popup error missing .dll's on 13456 core22 0.0.13 (all 10 cards) and my RTX 2080 cards run 13456 0.0.14 just fine (2 cards)

Re: WU 13456

Posted: Sat Jul 31, 2021 4:38 am
by JohnChodera
@WhitehawkEQ: The missing DLL error occurs on 0.0.13? Which DLLs does it complain are missing?

~ John Chodera // MSKCC

Re: WU 13456

Posted: Sat Jul 31, 2021 6:11 am
by Tashgan
Same here: with version 22.0.0.13, CUDA is working on my RTX 3070 Ti. The problem with the wrong GPU architecture appears with version 22.0.0.14.

Below is the log for a WU using 22.0.0.13 and CUDA:

Code:

...
04:49:58:WU01:FS01:Connecting to assign1.foldingathome.org:80
04:49:59:WU01:FS01:Assigned to work server 140.163.4.200
04:49:59:WU01:FS01:Requesting new work unit for slot 01: gpu:1:0 GA104 [GeForce RTX 3070 Ti] from 140.163.4.200
04:49:59:WU01:FS01:Connecting to 140.163.4.200:8080
04:50:00:WU01:FS01:Downloading 7.45MiB
04:50:03:WU01:FS01:Download complete
04:50:03:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17601 run:128 clone:0 gen:82 core:0x22 unit:0x0000000000000052000044c100000080
04:51:21:WU01:FS01:Starting
04:51:21:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Markus\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 12852 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
04:51:21:WU01:FS01:Started FahCore on PID 13480
04:51:21:WU01:FS01:Core PID:15972
04:51:21:WU01:FS01:FahCore 0x22 started
04:51:21:WU01:FS01:0x22:*********************** Log Started 2021-07-31T04:51:21Z ***********************
04:51:21:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
04:51:21:WU01:FS01:0x22:       Core: Core22
04:51:21:WU01:FS01:0x22:       Type: 0x22
04:51:21:WU01:FS01:0x22:    Version: 0.0.13
04:51:21:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
04:51:21:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
04:51:21:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
04:51:21:WU01:FS01:0x22:       Date: Sep 19 2020
04:51:21:WU01:FS01:0x22:       Time: 02:35:58
04:51:21:WU01:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
04:51:21:WU01:FS01:0x22:     Branch: core22-0.0.13
04:51:21:WU01:FS01:0x22:   Compiler: Visual C++ 2015
04:51:21:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
04:51:21:WU01:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
04:51:21:WU01:FS01:0x22:   Platform: win32 10
04:51:21:WU01:FS01:0x22:       Bits: 64
04:51:21:WU01:FS01:0x22:       Mode: Release
04:51:21:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
04:51:21:WU01:FS01:0x22:             <peastman@stanford.edu>
04:51:21:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 13480 -checkpoint 15
04:51:21:WU01:FS01:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
04:51:21:WU01:FS01:0x22:             nvidia -gpu 0 -gpu-usage 100
04:51:21:WU01:FS01:0x22:************************************ libFAH ************************************
04:51:21:WU01:FS01:0x22:       Date: Sep 7 2020
04:51:21:WU01:FS01:0x22:       Time: 19:09:56
04:51:21:WU01:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
04:51:21:WU01:FS01:0x22:     Branch: HEAD
04:51:21:WU01:FS01:0x22:   Compiler: Visual C++ 2015
04:51:21:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
04:51:21:WU01:FS01:0x22:   Platform: win32 10
04:51:21:WU01:FS01:0x22:       Bits: 64
04:51:22:WU01:FS01:0x22:       Mode: Release
04:51:22:WU01:FS01:0x22:************************************ CBang *************************************
04:51:22:WU01:FS01:0x22:       Date: Sep 7 2020
04:51:22:WU01:FS01:0x22:       Time: 19:08:30
04:51:22:WU01:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
04:51:22:WU01:FS01:0x22:     Branch: HEAD
04:51:22:WU01:FS01:0x22:   Compiler: Visual C++ 2015
04:51:22:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
04:51:22:WU01:FS01:0x22:   Platform: win32 10
04:51:22:WU01:FS01:0x22:       Bits: 64
04:51:22:WU01:FS01:0x22:       Mode: Release
04:51:22:WU01:FS01:0x22:************************************ System ************************************
04:51:22:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
04:51:22:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 165 Stepping 5
04:51:22:WU01:FS01:0x22:       CPUs: 12
04:51:22:WU01:FS01:0x22:     Memory: 31.90GiB
04:51:22:WU01:FS01:0x22:Free Memory: 25.85GiB
04:51:22:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
04:51:22:WU01:FS01:0x22: OS Version: 6.2
04:51:22:WU01:FS01:0x22:Has Battery: false
04:51:22:WU01:FS01:0x22: On Battery: false
04:51:22:WU01:FS01:0x22: UTC Offset: 2
04:51:22:WU01:FS01:0x22:        PID: 15972
04:51:22:WU01:FS01:0x22:        CWD: C:\Users\Markus\AppData\Roaming\FAHClient\work
04:51:22:WU01:FS01:0x22:************************************ OpenMM ************************************
04:51:22:WU01:FS01:0x22:   Revision: 189320d0
04:51:22:WU01:FS01:0x22:********************************************************************************
04:51:22:WU01:FS01:0x22:Project: 17601 (Run 128, Clone 0, Gen 82)
04:51:22:WU01:FS01:0x22:Unit: 0x00000000000000000000000000000000
04:51:22:WU01:FS01:0x22:Reading tar file core.xml
04:51:22:WU01:FS01:0x22:Reading tar file integrator.xml.bz2
04:51:22:WU01:FS01:0x22:Reading tar file state.xml.bz2
04:51:22:WU01:FS01:0x22:Reading tar file system.xml.bz2
04:51:22:WU01:FS01:0x22:Digital signatures verified
04:51:22:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
04:51:22:WU01:FS01:0x22:Version 0.0.13
04:51:22:WU01:FS01:0x22:  Checkpoint write interval: 50000 steps (2%) [50 total]
04:51:22:WU01:FS01:0x22:  JSON viewer frame write interval: 25000 steps (1%) [100 total]
04:51:22:WU01:FS01:0x22:  XTC frame write interval: 12500 steps (0.5%) [200 total]
04:51:22:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
04:51:22:WU01:FS01:0x22:There are 4 platforms available.
04:51:22:WU01:FS01:0x22:Platform 0: Reference
04:51:22:WU01:FS01:0x22:Platform 1: CPU
04:51:22:WU01:FS01:0x22:Platform 2: OpenCL
04:51:22:WU01:FS01:0x22:  opencl-device 0 specified
04:51:22:WU01:FS01:0x22:Platform 3: CUDA
04:51:22:WU01:FS01:0x22:  cuda-device 0 specified
04:51:26:WU01:FS01:0x22:Attempting to create CUDA context:
04:51:26:WU01:FS01:0x22:  Configuring platform CUDA
04:51:31:WU01:FS01:0x22:  Using CUDA and gpu 0
04:51:31:WU01:FS01:0x22:Completed 0 out of 2500000 steps (0%)
04:51:31:WU01:FS01:0x22:Checkpoint completed at step 0
04:52:34:WU01:FS01:0x22:Completed 25000 out of 2500000 steps (1%)
04:53:36:WU01:FS01:0x22:Completed 50000 out of 2500000 steps (2%)
...

Re: WU 13456

Posted: Sat Jul 31, 2021 2:32 pm
by WhitehawkEQ
JohnChodera wrote:@WhitehawkEQ: The missing DLL error occurs on 0.0.13? Which DLLs does it complain are missing?

~ John Chodera // MSKCC

MSVCP140.dll on both Win 7 and Win 10 PCs with GTX 1080 cards.
The GTX 1080s do fine on any WU using 0.0.13; it's when they get WU 13456 on 0.0.14 that they report the missing DLL. My RTX 2080s don't have any problems with either 0.0.13 or 0.0.14.

F@H 12 GPU's (10 EVGA GTX 1080 SC, 2 EVGA RTX 2080 Super Black)

Re: WU 13456

Posted: Sat Jul 31, 2021 8:38 pm
by Neil-B
WhitehawkEQ wrote:
JohnChodera wrote:@WhitehawkEQ: The missing DLL error occurs on 0.0.13? Which DLLs does it complain are missing?

~ John Chodera // MSKCC
MSVCP140.dll on both Win 7 and Win 10 PCs with GTX 1080 cards.
The GTX 1080s do fine on any WU using 0.0.13; it's when they get WU 13456 on 0.0.14 that they report the missing DLL. My RTX 2080s don't have any problems with either 0.0.13 or 0.0.14.

F@H 12 GPU's (10 EVGA GTX 1080 SC, 2 EVGA RTX 2080 Super Black)

You can either try to install the missing DLL yourself (see below) or wait until the devs get a chance to sort this out and repackage the core with the DLL(s) included ... Given that I believe the 13456/7 projects are on hold at the moment and may not be released wider than beta before this is done, being patient may well be the easiest option, as even once the DLL is installed you may not get a chance to test whether it works.

If you want to try patching one of the machines with the latest supported MS Redistributable see https://support.microsoft.com/en-us/top ... f26a218cc0 ... I believe you may need to install both the x86 and x64 versions of the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019, which should install that DLL for you.

@Severian used this approach and resolved the issue https://foldingforum.org/viewtopic.php?p=352578#p352598

If you look at "Programs" (add/remove) you should see the installed redistributable versions listed ... many people will have these, either because they have patched the MS O/S rigorously or because other software that needs them installed them as part of its own installation ... I believe the listings should read "Microsoft Visual C++ 2015-2019 Redistributable (x86)" followed by a version number and "Microsoft Visual C++ 2015-2019 Redistributable (x64)", also followed by a version number.
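
If you would rather check for the file than read through the Programs list, a rough sketch along these lines should do. It assumes the redistributable puts MSVCP140.dll in the System32 / SysWOW64 folders under %SystemRoot%; the helper names are made up for illustration, and a 32-bit Python on 64-bit Windows may see a redirected System32.

```python
import os


def find_dll(name, search_dirs):
    """Return the first existing path to `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def default_system_dirs(env=None):
    """System32 and SysWOW64 under %SystemRoot% (assumption: the usual install location)."""
    env = os.environ if env is None else env
    root = env.get("SystemRoot", r"C:\Windows")
    return [os.path.join(root, "System32"), os.path.join(root, "SysWOW64")]


if __name__ == "__main__":
    hit = find_dll("MSVCP140.dll", default_system_dirs())
    print(hit or "MSVCP140.dll not found in the system directories")
```

Finding the file only tells you it is present, not that the core will load it, so treat a hit as a hint rather than proof.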

Re: WU 13456

Posted: Wed Aug 04, 2021 10:42 pm
by JohnChodera
If anyone who experienced the "missing DLL" issue with core22 0.0.14 has not installed the Visual Studio libraries manually to work around the issue, we could use your help testing a new core22 release! Please private message me an email address where we can reach you for details.

~ John Chodera // MSKCC

Re: WU 13456

Posted: Sat Aug 07, 2021 12:39 am
by JohnChodera
We have a fix for this in core22 0.0.15 that we're testing internally right now and hope to roll out in the next couple of days. Thanks so much for your patience, everyone!

~ John Chodera // MSKCC

Re: WU 13456

Posted: Sun Aug 08, 2021 8:20 am
by PaulTV
@JohnChodera, @Neil-B: I see the same as Neil.

For a p13456 job on 0.0.14:

Code:

10:10:55:WU00:FS01:0x22:Project: 13456 (Run 303, Clone 20, Gen 0)
10:10:55:WU00:FS01:0x22:Unit: 0x00000000000000000000000000000000
10:10:55:WU00:FS01:0x22:Reading tar file core.xml
10:10:55:WU00:FS01:0x22:Reading tar file integrator.xml.bz2
10:10:55:WU00:FS01:0x22:Reading tar file state.xml.bz2
10:10:55:WU00:FS01:0x22:Reading tar file system.xml.bz2
10:10:55:WU00:FS01:0x22:Digital signatures verified
10:10:55:WU00:FS01:0x22:Folding@home GPU Core22 Folding@home Core
10:10:55:WU00:FS01:0x22:Version 0.0.14
10:10:55:WU00:FS01:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
10:10:55:WU00:FS01:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
10:10:55:WU00:FS01:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
10:10:55:WU00:FS01:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
10:10:55:WU00:FS01:0x22:There are 4 platforms available.
10:10:55:WU00:FS01:0x22:Platform 0: Reference
10:10:55:WU00:FS01:0x22:Platform 1: CPU
10:10:55:WU00:FS01:0x22:Platform 2: OpenCL
10:10:55:WU00:FS01:0x22:  opencl-device 0 specified
10:10:55:WU00:FS01:0x22:Platform 3: CUDA
10:10:55:WU00:FS01:0x22:  cuda-device 0 specified
10:11:03:WU00:FS01:0x22:Attempting to create CUDA context:
10:11:03:WU00:FS01:0x22:  Configuring platform CUDA
10:11:03:WU00:FS01:0x22:Failed to create CUDA context:
10:11:03:WU00:FS01:0x22:Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
10:11:03:WU00:FS01:0x22:Attempting to create OpenCL context:
10:11:03:WU00:FS01:0x22:  Configuring platform OpenCL
10:11:09:WU00:FS01:0x22:  Using OpenCL on platformId 0 and gpu 0

For a p17601 job on 0.0.13:

Code:

06:13:31:WU01:FS01:0x22:Project: 17601 (Run 86, Clone 5, Gen 40)
06:13:31:WU01:FS01:0x22:Unit: 0x00000000000000000000000000000000
06:13:31:WU01:FS01:0x22:Reading tar file core.xml
06:13:31:WU01:FS01:0x22:Reading tar file integrator.xml.bz2
06:13:31:WU01:FS01:0x22:Reading tar file state.xml.bz2
06:13:31:WU01:FS01:0x22:Reading tar file system.xml.bz2
06:13:31:WU01:FS01:0x22:Digital signatures verified
06:13:31:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
06:13:31:WU01:FS01:0x22:Version 0.0.13
06:13:31:WU01:FS01:0x22:  Checkpoint write interval: 50000 steps (2%) [50 total]
06:13:31:WU01:FS01:0x22:  JSON viewer frame write interval: 25000 steps (1%) [100 total]
06:13:31:WU01:FS01:0x22:  XTC frame write interval: 12500 steps (0.5%) [200 total]
06:13:31:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
06:13:31:WU01:FS01:0x22:There are 4 platforms available.
06:13:31:WU01:FS01:0x22:Platform 0: Reference
06:13:31:WU01:FS01:0x22:Platform 1: CPU
06:13:31:WU01:FS01:0x22:Platform 2: OpenCL
06:13:31:WU01:FS01:0x22:  opencl-device 0 specified
06:13:31:WU01:FS01:0x22:Platform 3: CUDA
06:13:31:WU01:FS01:0x22:  cuda-device 0 specified
06:13:34:WU01:FS01:0x22:Attempting to create CUDA context:
06:13:34:WU01:FS01:0x22:  Configuring platform CUDA
06:13:38:WU01:FS01:0x22:  Using CUDA and gpu 0

From my sys info (same for both jobs mentioned above):

Code:

08:59:42:          GPU 0: Bus:6 Slot:0 Func:0 NVIDIA:8 GA104 [GeForce RTX 3070]
08:59:42:  CUDA Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:8.6 Driver:11.4
08:59:42:OpenCL Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:3.0 Driver:471.41
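
The two things worth reading off logs like these are whether the core tried CUDA, failed, and fell back to OpenCL, and what compute capability and driver the client reports for the card. A minimal sketch of extracting both, assuming the line formats shown above (the function names and regular expression are illustrative, not part of the client):

```python
import re

# Matches e.g. "CUDA Device 0: Platform:0 Device:0 Bus:6 Slot:0 Compute:8.6 Driver:11.4"
CUDA_DEV_RE = re.compile(
    r"CUDA Device (\d+):.*?Compute:(\d+)\.(\d+)\s+Driver:(\d+)\.(\d+)"
)


def parse_cuda_device(line):
    """Return device index, compute capability and driver version, or None."""
    match = CUDA_DEV_RE.search(line)
    if not match:
        return None
    index, c_major, c_minor, d_major, d_minor = map(int, match.groups())
    return {
        "device": index,
        "compute": (c_major, c_minor),
        "driver": (d_major, d_minor),
    }


def cuda_fell_back(log_lines):
    """True if a CUDA context failure is later followed by the core using OpenCL."""
    failed = False
    for line in log_lines:
        if "Failed to create CUDA context" in line:
            failed = True
        elif failed and "Using OpenCL" in line:
            return True
    return False
```

On the p13456 log above, cuda_fell_back would report a fallback, while the p17601 log has no failure line and so would not.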

Re: WU 13456

Posted: Sun Aug 08, 2021 9:01 am
by Neil-B
@PaulTV ... They are working on another new version of the core to resolve this "No CUDA on RTX30*0 gpus" issue, and think they have identified both the cause and how it can be easily sorted ... You may see WUs running a newer 0.0.15 core (as John mentions above), with which you will most likely have the same issue, as that one was focused on resolving the "missing DLLs" issue ... The fix should be in the next release after that - I won't guess that the version might be 0.0.16, as someone will deliberately change it to make sure I am wrong ;) - and it shouldn't be very long before it arrives (but please don't hold your breath - these things can take days/weeks rather than minutes/hours).