
Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 6:20 pm
by NineVolt
JohnChodera wrote:@NineVolt: Eep, so sorry for that. We exhausted the 13424-5 projects, but had some backlog of 13422-3 on low priority, which is why you're seeing them now.

We're launching Sprint 4 in about an hour. This set includes the workaround for the constraints issue, and future projects will be sure to include it as well.

~ John Chodera // MSKCC
No worries, just adding my data point. Thanks for the quick response!

Re: covid moonshot bad wu setup

Posted: Tue Sep 08, 2020 10:40 pm
by NineVolt
FYI, I now have a WU from project 13426 and the ETA is looking like 2.5 days.

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 8:42 am
by PantherX
NineVolt wrote:FYI, I now have a WU from project 13426 and the ETA is looking like 2.5 days.
Welcome to the F@H Forum NineVolt,

Assuming that the estimate was taken after the WU had successfully folded 5%, it will be folded before the expiration of 3 days but after the timeout of 1 day. Thus, you will be getting base points without the bonus.
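
For reference, the quick return bonus is usually described as scaling with the square root of how quickly the WU comes back relative to its final deadline, and it is forfeited entirely once the timeout is missed. A rough sketch of that calculation in Python (the base credit and k-factor below are made-up illustrative numbers, not project 13426's actual values):

import math

def estimated_credit(base_points, k_factor, deadline_days, elapsed_days, timeout_days):
    # Sketch of the commonly quoted F@H quick-return-bonus formula.
    # The bonus only applies when the WU is returned before the timeout
    # (and the donor folds with a passkey); otherwise base points only.
    if elapsed_days > timeout_days:
        return base_points  # missed the timeout: base credit only
    bonus = math.sqrt(k_factor * deadline_days / elapsed_days)
    return base_points * max(1.0, bonus)

# Illustrative numbers: a 2.5-day turnaround on a WU with a 1-day timeout
# and a 3-day expiration forfeits the bonus entirely.
print(estimated_credit(base_points=10000, k_factor=0.75,
                       deadline_days=3.0, elapsed_days=2.5, timeout_days=1.0))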

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 12:11 pm
by NineVolt
Hi PantherX, and thanks for the welcome.

My concern isn't so much about my own points, but more about Folding@Home getting the most out of volunteers' hardware. Currently, my gtx 1070ti is getting an estimated 23k PPD on project 13426 while my gtx 970 is getting an estimated 219k PPD on project 13427. While it's working on these "slow" WUs, my Folding@Home rig is drawing ~40% less power than it usually does. From my recent anecdotal experience, it seems like WUs for some projects (e.g. 13422, 13426) are getting a very small fraction of the PPD of WUs for other projects using the same hardware.

In case this is the same issue that John believes was resolved, I just wanted to note that my experience may call that into question. Or maybe (hopefully?) it's just an issue with the gtx 1070ti on my end. I'll follow up if/when I can confirm a return to typical PPD production on this gpu with WUs from other projects.

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 12:25 pm
by muziqaz
23k PPD is too low. Those cards tend to get, what, 700-800k as a minimum.

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 3:29 pm
by Joe_H
Check to see if your system had a video driver crash while processing earlier. People have reported seeing this, and after the driver reloads the GPU runs at a low clock until the system is rebooted.
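
One quick way to check for the low-clock symptom while a core is running is to compare the current SM clock against the card's rated maximum, e.g. with nvidia-smi (which ships with the NVIDIA driver). A rough sketch; the 50% threshold is arbitrary, and clocks will of course read low at idle:

import subprocess

# Compare each GPU's current SM clock against its rated maximum.
# Assumes nvidia-smi is on the PATH (it is installed with the NVIDIA driver).
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,clocks.sm,clocks.max.sm",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True).stdout

for line in out.strip().splitlines():
    name, sm, sm_max = [field.strip() for field in line.split(",")]
    ratio = int(sm) / int(sm_max)
    flag = "  <-- suspiciously low for a card under load" if ratio < 0.5 else ""
    print(f"{name}: {sm} MHz of {sm_max} MHz max ({ratio:.0%}){flag}")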

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 5:52 pm
by NineVolt
I don't see any evidence of hardware or driver issues in any of my logs. Furthermore, I just rebooted and performance for this WU from project 13426 remains unexpectedly low.

I hope this is an isolated incident and not reflective of the performance others are seeing for this project's WUs.

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 7:45 pm
by JohnChodera
@NineVolt: That's so odd--13426 should scream on your GTX 1070. Can you post a PRCG for a completed 13426, or a log snippet (ideally both the client log and the science.log in the work subdirectory)?

~ John Chodera // MSKCC

Re: covid moonshot bad wu setup

Posted: Wed Sep 09, 2020 8:41 pm
by NineVolt
I'm still working on my first WU from 13426, ETA still ~32 hours. I'll post complete logs once it's done, hopefully before the deadline. PRCG is 13426 (766, 9, 1). In the meantime:

From client log:
20:16:31:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13426 run:766 clone:9 gen:1 core:0x22 unit:0x0000000112bc7d9a5f571151201d6721
...
20:53:26:WU01:FS01:Starting
20:53:26:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\ninevolt\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.11/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 705 -lifeline 5124 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
20:53:26:WU01:FS01:Started FahCore on PID 14856
20:53:26:WU01:FS01:Core PID:5596
20:53:26:WU01:FS01:FahCore 0x22 started
...
20:53:27:WU01:FS01:0x22:Project: 13426 (Run 766, Clone 9, Gen 1)
20:53:27:WU01:FS01:0x22:Unit: 0x0000000112bc7d9a5f571151201d6721
...
20:53:58:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
21:29:15:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
22:05:52:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
...
17:45:35:WU01:FS01:0x22:Completed 400000 out of 1000000 steps (40%)
18:14:03:WU01:FS01:0x22:Completed 410000 out of 1000000 steps (41%)
science.log (so far):
https://bpa.st/P7RSM2V5WWDW6GGV4FBUIWGNRQ

Note: the system went down for a reboot around the 40% mark, at 17:42 UTC.
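
For what it's worth, averaging the timestamps on the "Completed ... steps" lines above gives roughly 36 minutes per 1% frame, or about 2.5 days for the full million steps, which matches the client's ETA. A quick sketch of that arithmetic (log lines copied from above, everything else assumed; it also assumes consecutive entries fall on the same day):

import re
from datetime import datetime, timedelta

# Rough ETA check from the client log excerpt: parse the
# "Completed N out of M steps" lines and average the time per step.
lines = """\
20:53:58:WU01:FS01:0x22:Completed 0 out of 1000000 steps (0%)
21:29:15:WU01:FS01:0x22:Completed 10000 out of 1000000 steps (1%)
22:05:52:WU01:FS01:0x22:Completed 20000 out of 1000000 steps (2%)
""".splitlines()

pattern = re.compile(r"^(\d\d:\d\d:\d\d):.*Completed (\d+) out of (\d+) steps")
points = []
for line in lines:
    m = pattern.match(line)
    if m:
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        points.append((t, int(m.group(2)), int(m.group(3))))

deltas = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]
secs_per_step = sum(d.total_seconds() for d, _ in deltas) / sum(s for _, s in deltas)
total_steps = points[-1][2]
print(f"~{secs_per_step * 10000 / 60:.0f} min per 1% frame, "
      f"~{timedelta(seconds=secs_per_step * total_steps)} for the whole WU")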

Re: covid moonshot bad wu setup

Posted: Thu Sep 10, 2020 1:51 am
by JohnChodera
Oh my goodness---can you dump that WU, @NineVolt? Something is very wrong here.

If you're willing to engage with us a bit more interactively, we might be able to debug what is going on there in more detail. Send me your email in a DM?

~ John Chodera // MSKCC

Re: covid moonshot bad wu setup

Posted: Thu Sep 10, 2020 7:39 am
by PantherX
Based on the log file you posted, the quickest way to dump would be to:
Navigate to C:\Users\ninevolt\AppData\Roaming\FAHClient\work
Delete the folder 01, assuming that you're still folding that WU.

The reason folder 01 needs to be deleted is that it's where the WU is stored, which you can confirm by looking at the Work Queue ID in Advanced Control (AKA FAHControl) or at the two digits after WU in this line:
20:53:27:WU01:FS01:0x22:Project: 13426 (Run 766, Clone 9, Gen 1)

Re: covid moonshot bad wu setup

Posted: Thu Sep 10, 2020 7:47 am
by Neil-B
... you might want to move the folder somewhere else for a bit rather than delete it, as that might allow John or yourself to check things (and even rerun the WU) for troubleshooting purposes?
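
Something along these lines would do the move on Windows, assuming the client has been stopped or paused first so the core isn't holding the files open (the work path comes from the log above; the backup destination is just an example):

import shutil
from pathlib import Path

# Move the WU's work folder aside instead of deleting it, so it can be
# inspected or re-run later. Stop/pause FAHClient first.
work_dir = Path(r"C:\Users\ninevolt\AppData\Roaming\FAHClient\work\01")
backup_dir = Path(r"C:\Users\ninevolt\Desktop\fah-wu-13426-766-9-1")  # arbitrary choice

if work_dir.exists():
    shutil.move(str(work_dir), str(backup_dir))
    print(f"Moved {work_dir} -> {backup_dir}")
else:
    print(f"{work_dir} not found - has the WU already been cleaned up?")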

Re: covid moonshot bad wu setup

Posted: Sat Sep 12, 2020 6:04 am
by bruce
The developers are working on a CUDA runtime for the FAHCore, which will likely increase the speed of NVIDIA GPUs. That MIGHT bring really slow GPUs like the 710 and the 1030 back into production from your junk bin. No promises, though :!: except that they'll still be slow.

Several other new things are in-the-works which will be announced when they're ready for release.

Re: covid moonshot bad wu setup

Posted: Tue Sep 15, 2020 11:23 am
by markdotgooley
In the past few days I’ve been getting some Moonshot WUs that take over 4 hours to complete on either of my GPUs (one RTX 2060, one RTX 2060 KO, no overclocking) — an hour or more longer than what was usual earlier. I’ll take that as a sign that WUs can and will be adjusted for the new GPUs due to be available in the next few months.

Re: covid moonshot bad wu setup

Posted: Wed Sep 16, 2020 4:00 am
by JohnChodera
> In the past few days I’ve been getting some Moonshot WUs that take over 4 hours to complete on either of my GPUs (one RTX 2060, one RTX 2060 KO, no overclocking) — an hour or more longer than what was usual earlier. I’ll take that as a sign that WUs can and will be adjusted for the new GPUs due to be available in the next few months.

Can you let us know which PRCGs these were? I think they're likely part of the 1343x series, where we are simulating the X-ray structures collected by DiamondMX for the COVID Moonshot: http://postera.ai/covid.

And yes, you may be right about that and some other coming good things. ;)

~ John Chodera // MSKCC