covid moonshot bad wu setup

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

NineVolt
Posts: 6
Joined: Tue Sep 08, 2020 12:00 am
Hardware configuration: intel i7-6700k
nvidia gtx 970
nvidia gtx 1070 ti
Location: Team #12369 - Folding@Undernet

Re: covid moonshot bad wu setup

Post by NineVolt »

JohnChodera wrote: @NineVolt: Eep, so sorry for that. We exhausted the 13424-5 projects, but had some backlog of 13422-3 on low priority, which is why you're seeing them now.

We're launching Sprint 4 in about an hour. This set includes the workaround for the constraints issue, and future projects will be sure to include it as well.

~ John Chodera // MSKCC
No worries, just adding my data point. Thanks for the quick response!
NineVolt

Re: covid moonshot bad wu setup

Post by NineVolt »

FYI, I now have a WU from project 13426 and the ETA is looking like 2.5 days.
PantherX
Site Moderator
Posts: 6986
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: covid moonshot bad wu setup

Post by PantherX »

NineVolt wrote: FYI, I now have a WU from project 13426 and the ETA is looking like 2.5 days.
Welcome to the F@H Forum NineVolt,

Assuming that the estimate was taken after the WU had successfully folded 5%, the WU will finish before the expiration date of 3 days but after the timeout of 1 day. Thus, you will get base points without the bonus.
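As a rough illustration of that reasoning, here is a minimal Python sketch. The 1-day timeout and 3-day expiration are the values quoted above for this WU. The function name and the returned labels are made up for the example, and it simplifies by treating the ETA as the total time since the WU was assigned.

```python
from datetime import timedelta

# Values quoted in this post for this WU; other projects may use different ones.
TIMEOUT = timedelta(days=1)
EXPIRATION = timedelta(days=3)


def credit_outcome(eta, timeout=TIMEOUT, expiration=EXPIRATION):
    """Classify the expected credit for a WU from its estimated completion time.

    Hypothetical helper: `eta` is treated as total time since assignment.
    """
    if eta <= timeout:
        return "base points with bonus"
    if eta <= expiration:
        return "base points only"
    return "past expiration"


# The 2.5-day ETA reported above falls between the timeout and the expiration.
print(credit_outcome(timedelta(days=2.5)))  # base points only
```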
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
NineVolt

Re: covid moonshot bad wu setup

Post by NineVolt »

Hi PantherX, and thanks for the welcome.

My concern isn't so much about my own points, but more about Folding@Home getting the most out of volunteers' hardware. Currently, my gtx 1070ti is getting an estimated 23k PPD on project 13426 while my gtx 970 is getting an estimated 219k PPD on project 13427. While it's working on these "slow" WUs, my Folding@Home rig is drawing ~40% less power than it usually does. From my recent anecdotal experience, it seems like WUs for some projects (e.g. 13422, 13426) are getting a very small fraction of the PPD of WUs for other projects using the same hardware.

In case this is the same issue that John believes was resolved, I just wanted to note that my experience may call that into question. Or maybe (hopefully?) it's just an issue with the gtx 1070ti on my end. I'll follow up if/when I can confirm a return to typical PPD production on this gpu with WUs from other projects.
muziqaz
Posts: 952
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 9950x, 7950x3D, 5950x, 5800x3D
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: covid moonshot bad wu setup

Post by muziqaz »

23k PPD is too low. Those cards tend to get, what, 700-800k as a minimum.
FAH Omega tester
Joe_H
Site Admin
Posts: 7943
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: covid moonshot bad wu setup

Post by Joe_H »

Check whether your system had a video driver crash while processing an earlier WU. People have reported that after the driver reloads, their GPU runs at a low clock until the system is rebooted.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
NineVolt

Re: covid moonshot bad wu setup

Post by NineVolt »

I don't see any evidence of hardware or driver issues in any of my logs. Furthermore, I just rebooted and performance for this WU from project 13426 remains unexpectedly low.

I hope this is an isolated incident and not reflective of the performance others are seeing for this project's WUs.
JohnChodera
Pande Group Member
Posts: 467
Joined: Fri Feb 22, 2013 9:59 pm

Re: covid moonshot bad wu setup

Post by JohnChodera »

@NineVolt: That's so odd; 13426 should scream on your GTX 1070 Ti. Can you post a PRCG for a completed 13426, or a log snippet (ideally both the client log and the science.log in the work subdirectory)?

~ John Chodera // MSKCC
NineVolt

Re: covid moonshot bad wu setup

Post by NineVolt »

I'm still working on my first WU from 13426, and the ETA is still ~32 hours. I'll post complete logs once it's done, hopefully before the deadline. PRCG is 13426 (766, 9, 1). In the meantime:

From client log:
20:16:31:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13426 run:766 clone:9 gen:1 core:0x22 unit:0x0000000112bc7d9a5f571151201d6721
...
20:53:26:WU01:FS01:Starting
20:53:26:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\ninevolt\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.11/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 5124 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
20:53:26:WU01:FS01:Started FahCore on PID 14856
20:53:26:WU01:FS01:Core PID:5596
20:53:26:WU01:FS01:FahCore 0x22 started
...
20:53:27:WU01:FS01:0x22:Project: 13426 (Run 766, Clone 9, Gen 1)
20:53:27:WU01:FS01:0x22:Unit: 0x0000000112bc7d9a5f571151201d6721
...
20:53:58:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
21:29:15:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
22:05:52:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
...
17:45:35:WU01:FS01:0x22:Completed 400000 out of 1000000 steps (40%)
18:14:03:WU01:FS01:0x22:Completed 410000 out of 1000000 steps (41%)
science.log (so far):
https://bpa.st/P7RSM2V5WWDW6GGV4FBUIWGNRQ

Note, system went down for reboot around the 40% mark, 17:42 UTC
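For anyone who wants to sanity-check an ETA against the client log, the "Completed N out of M steps" lines above are enough for a rough estimate. The Python sketch below is only an illustration. The function names are made up, the regular expression is fitted to the line format shown in this post, and it assumes the two lines are less than 24 hours apart, since the log lines carry no date.

```python
import re
from datetime import timedelta

# Matches client-log progress lines in the format shown in this post.
PROGRESS = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def parse_progress(line):
    """Return (time_of_day, steps_done, steps_total), or None if no match."""
    m = PROGRESS.match(line)
    if m is None:
        return None
    h, mi, s, done, total = (int(g) for g in m.groups())
    return timedelta(hours=h, minutes=mi, seconds=s), done, total


def estimate_total_runtime(first_line, last_line):
    """Extrapolate the full WU runtime from two progress lines."""
    t0, done0, total = parse_progress(first_line)
    t1, done1, _ = parse_progress(last_line)
    # Wrap around midnight; assumes the lines are under 24 hours apart.
    elapsed = (t1 - t0) % timedelta(days=1)
    seconds_per_step = elapsed.total_seconds() / (done1 - done0)
    return timedelta(seconds=seconds_per_step * total)


first = "20:53:58:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)"
last = "22:05:52:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)"
days = estimate_total_runtime(first, last).total_seconds() / 86400
print(round(days, 1))  # 2.5
```

Using the 0% and 2% lines gives about 36 minutes per percent, which lines up with the 2.5-day ETA reported earlier in the thread.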
JohnChodera

Re: covid moonshot bad wu setup

Post by JohnChodera »

Oh my goodness---can you dump that WU, @NineVolt? Something is very wrong here.

If you're willing to engage with us a bit more interactively, we might be able to debug what is going on there in more detail. Send me your email in a DM?

~ John Chodera // MSKCC
PantherX

Re: covid moonshot bad wu setup

Post by PantherX »

Based on the log file you posted, the quickest way to dump the WU would be to:
1. Navigate to C:\Users\ninevolt\AppData\Roaming\FAHClient\work
2. Delete the folder 01, assuming that you're still folding that WU.

The folder 01 needs to be deleted because that's where the WU is stored. You can confirm this by looking at the Work Queue ID in Advanced Control (AKA FAHControl) or at the two digits after WU in this line:
20:53:27:WU01:FS01:0x22:Project: 13426 (Run 766, Clone 9, Gen 1)
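If you'd rather read the folder name and the PRCG off the log programmatically, something like the following Python sketch would do it. The function name and the returned dictionary keys are made up for the example, and the regular expression is fitted only to the line format quoted above.

```python
import re

# Fitted to the "Project:" line format quoted in this thread.
PROJECT_LINE = re.compile(
    r"WU(?P<wu>\d+):FS(?P<fs>\d+):0x[0-9a-fA-F]+:Project: (?P<project>\d+) "
    r"\(Run (?P<run>\d+), Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)


def parse_project_line(line):
    """Return the work folder name and PRCG tuple, or None if no match."""
    m = PROJECT_LINE.search(line)
    if m is None:
        return None
    prcg = tuple(int(m.group(k)) for k in ("project", "run", "clone", "gen"))
    # The digits after "WU" are kept as a string so "01" stays "01".
    return {"work_folder": m.group("wu"), "prcg": prcg}


line = "20:53:27:WU01:FS01:0x22:Project: 13426 (Run 766, Clone 9, Gen 1)"
print(parse_project_line(line))
# {'work_folder': '01', 'prcg': (13426, 766, 9, 1)}
```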
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: covid moonshot bad wu setup

Post by Neil-B »

... you might want to move the folder somewhere else for a bit rather than delete it, as that might allow John or you to check things (and even rerun the WU) for troubleshooting purposes?
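A minimal sketch of moving the folder aside instead of deleting it, in Python. The function name, the holding directory and the "wu-" prefix are all made up for the example. The demo runs in a temporary directory rather than the real FAHClient work directory, and I'm assuming you would pause or stop the client before touching the real folder.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_wu(work_dir, slot, holding_dir):
    """Move work_dir/slot into holding_dir and return the new location.

    Hypothetical helper; refuses to overwrite an earlier copy.
    """
    src = Path(work_dir) / slot
    if not src.is_dir():
        raise FileNotFoundError(f"no work folder at {src}")
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    dest = holding / f"wu-{slot}"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.move(str(src), str(dest))
    return dest


# Demo on a throwaway directory tree, not the real client data.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    (work / "01").mkdir(parents=True)
    (work / "01" / "science.log").write_text("example")
    moved = set_aside_wu(work, "01", Path(tmp) / "held")
    print(moved.name)  # wu-01
```

On the machine in this thread, the work directory would be the C:\Users\ninevolt\AppData\Roaming\FAHClient\work path given above, with "01" as the slot.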
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: covid moonshot bad wu setup

Post by bruce »

The developers are working on a CUDA runtime for the FAHCore, which will likely increase the speed of NVIDIA GPUs. That MIGHT bring really slow GPUs like the 710 and the 1030 back into production from your junk bin. No promises, though, except that they'll still be slow.

Several other new things are in-the-works which will be announced when they're ready for release.
markdotgooley
Posts: 101
Joined: Tue Apr 21, 2020 11:46 am

Re: covid moonshot bad wu setup

Post by markdotgooley »

In the past few days I’ve been getting some Moonshot WUs that take over 4 hours to complete on either of my GPUs (one RTX 2060, one RTX 2060 KO, no overclocking) — an hour or more longer than what was usual earlier. I’ll take that as a sign that WUs can and will be adjusted for the new GPUs due to be available in the next few months.
JohnChodera

Re: covid moonshot bad wu setup

Post by JohnChodera »

> In the past few days I’ve been getting some Moonshot WUs that take over 4 hours to complete on either of my GPUs (one RTX 2060, one RTX 2060 KO, no overclocking) — an hour or more longer than what was usual earlier. I’ll take that as a sign that WUs can and will be adjusted for the new GPUs due to be available in the next few months.

Can you let us know which PRCGs these were? I think they're likely part of the 1343x series, where we are simulating the X-ray structures collected by DiamondMX for the COVID Moonshot: http://postera.ai/covid.

And yes, you may be right about that and some other coming good things. ;)

~ John Chodera // MSKCC