
COVID (GPU, core22 0.0.5) projects 13400-13401 to FAH

Posted: Fri Apr 24, 2020 11:56 pm
by JohnChodera
We've just released two new projects, 13400 and 13401, that validate new features in core22 release 0.0.5. We will use these features to prioritize compounds for our COVID-19 experimental collaborators to make and test!

Project descriptions: https://stats.foldingathome.org/project?p=13400

We've restricted these projects to Linux only because we're testing some new custom integrators that currently seem to perform poorly on Windows. We're working on improving that for the next batch!

Project 13400 : core22 0.0.5 : Linux only [due to inefficiencies on Windows]
Stats Credit = 205000
timeout = 1.5 days
deadline = 2.0 days

Project 13401 : core22 0.0.5 : Linux only [due to inefficiencies on Windows]
Stats Credit = 65392
timeout = 0.8 days
deadline = 1.0 days

Re: COVID (GPU, core22 0.0.5) projects 13400-13401 to FAH

Posted: Sat Apr 25, 2020 3:52 pm
by JohnChodera
We're noticing higher failure rates than normal with these projects, so I am debugging the cause today. We're still getting tons of useful data, though! Thanks for bearing with us!

Re: COVID (GPU, core22 0.0.5) projects 13400-13401 to FAH

Posted: Mon Apr 27, 2020 5:14 am
by JohnChodera
I've done some analysis of the higher failure rates:

Out of 1551 returned WUs:
A. 411 contain ERROR:exception: There is no registered Platform called "OpenCL"
B. 151 contain Following exception occured: Particle coordinate is nan
C. 8 contain ERROR:exception: There is no registered Platform called "CPU"
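For anyone who wants to run a similar triage on their own logs, here is a minimal sketch that buckets work-unit logs by the error strings quoted above and converts the counts to percentages. It assumes each returned WU's log is available as a plain string; the helper names `classify`, `tally`, and `percentages` are hypothetical, not part of any Folding@home tooling. Applied to the numbers above, A, B, and C correspond to roughly 26.5%, 9.7%, and 0.5% of the 1551 returned WUs (570 WUs, about 37%, in total).

```python
from collections import Counter
from typing import Dict, Iterable

# Error signatures copied from the returned work-unit logs listed above.
SIGNATURES = {
    "A": 'There is no registered Platform called "OpenCL"',
    "B": "Particle coordinate is nan",
    "C": 'There is no registered Platform called "CPU"',
}


def classify(log_text: str) -> str:
    """Return the label of the first signature found in a log, else 'other'."""
    for label, signature in SIGNATURES.items():
        if signature in log_text:
            return label
    return "other"


def tally(logs: Iterable[str]) -> Dict[str, int]:
    """Count how many logs fall into each failure category."""
    return dict(Counter(classify(text) for text in logs))


def percentages(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Express each category count as a percentage of all returned WUs."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Note that `classify` assigns each log to a single category (first match wins), so a log containing more than one signature is counted only once.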

We're investigating A and C, which shouldn't happen if the client and the core use the same criteria to determine whether a machine is eligible for core22 projects.
I'm also trying to reproduce the failures in B, which I haven't seen on our local GPU cluster full of GTX 1080, GTX 1080 Ti, and RTX 2080 cards.
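The "no registered Platform" messages in A and C are what OpenMM reports when asked for a platform whose plugin did not load. One way to keep client and core in agreement is to have both derive eligibility from the same check of which platforms actually registered. The sketch below assumes the OpenMM Python API (`Platform.getNumPlatforms`, `Platform.getPlatform`, `getName`); `registered_platform_names` and `missing_platforms` are hypothetical helpers for illustration, not core22's actual logic.

```python
from typing import List, Optional, Sequence


def registered_platform_names() -> List[str]:
    """Names of the OpenMM platforms whose plugins actually loaded."""
    # Imported lazily so missing_platforms() can be exercised without OpenMM.
    try:
        import openmm  # OpenMM >= 7.6
    except ImportError:
        from simtk import openmm  # older OpenMM releases
    return [openmm.Platform.getPlatform(i).getName()
            for i in range(openmm.Platform.getNumPlatforms())]


def missing_platforms(required: Sequence[str],
                      available: Optional[Sequence[str]] = None) -> List[str]:
    """Return the required platform names that are not registered.

    A non-empty result means the machine should decline the work unit up
    front rather than fail later with 'There is no registered Platform ...'.
    """
    if available is None:
        available = registered_platform_names()
    return [name for name in required if name not in available]
```

If the client ran the same check before assigning a core22 WU, a machine whose OpenCL plugin failed to load would simply not receive these projects.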

For now, we've collected a ton of useful data to examine, so I've set 13400/13401 to collect-only.
Thanks for your help, everyone!

~ John Chodera // MSKCC