Hi! I was thinking of writing a blog post about how much faster folding@home could complete projects if some of the big tech companies used their idle computing power to help out. Specifically I was thinking of Amazon and their AWS instances.
I realize this might be unlikely because of the nature of FAH and the simulations, but I thought I would ask just in case. Sorry if the following is naive.
So I would like to figure out how much computation a project needs, or better yet, how much a full protein simulation would need, ideally in x86 FLOPs. If that can't be estimated for an arbitrary protein, what about the amyloid beta protein?
It looks like different work units actually require different amounts of computation? Is there any standard, average, or equation that can be used to summarize or predict the amount of computation needed per work unit, or for a sequence of work units in a project?
With this information about the computation needed for a project, a protein, or the amyloid beta protein specifically, I could relate those estimates to x amount of Amazon instances and so on, which would be the meat of the blog post.
Hopefully a post like this would let more people know about Folding@home, and maybe even encourage some tech companies to run FAH on their employee computers during non-work (idle) hours, or better yet, get Amazon to run it on unused/unallocated AWS instances.
Thanks!
Calculating total folding time
Moderators: Site Moderators, FAHC Science Team
Re: Calculating total folding time
Yes, different work units require hugely different amounts of computation. Generally speaking, the computational requirements increase more than proportionally to the number of atoms but less than the square of the number of atoms, and also with the real-time duration represented by the WU. See the Project Summary for the numbers of atoms and other interesting facts. I would encourage you to read all of the scientific information that can be found at http://www.folding.stanford.edu. Also note the history of the accomplishments.
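To make that scaling concrete, here's a rough sketch in Python. The exponent and the reference system size are made up for illustration; the real per-WU cost depends on the MD algorithm, cutoffs, and simulated duration, and this is not FAH's actual cost model.

```python
# Relative work-unit cost as a function of atom count, assuming the cost
# scales as atoms**k for some exponent between 1 (linear) and 2 (quadratic),
# per the "more than proportional, less than the square" observation above.

def relative_cost(atoms, reference_atoms=10_000, exponent=1.5):
    """Cost of a WU relative to a hypothetical 10,000-atom reference WU."""
    return (atoms / reference_atoms) ** exponent

# The largest current project (277,543 atoms) vs the reference system:
mid = relative_cost(277_543)                  # assumed exponent 1.5
low = relative_cost(277_543, exponent=1.0)    # linear lower bound (~27.8x)
high = relative_cost(277_543, exponent=2.0)   # quadratic upper bound (~770x)
print(low, mid, high)
```

The point is just that the bounds span more than an order of magnitude, so any per-WU estimate needs to pin down the effective exponent for the projects in question.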
One thing about research: when questions are answered, they generate more questions ... so it's not surprising that research is limited as much by the number of Donors as by the hardware they happen to donate.
In the past, a number of corporate donors have contributed to FAH. Most of those contributions occur in short-term high performance bursts -- like when they're in the final stages of testing a soon-to-be-released new piece of hardware. FAH provides an excellent burn-in task!
Idle computing resources really don't exist on Amazon's AWS or other cloud computing services. They're in the business of selling those resources, so they do everything they can to find paying customers who can use them. Higher-priority resources cost more than resources that are lower priority and therefore slower. (BTW: FAH also rates your contribution based both on the total processor cycles you donate and on how quickly you return the results.)
That's not to say that FAH wouldn't love to use donations of AWS time, whether directly from Amazon or from you with a designated Dollar donation, but then Amazon or your business could advertise that they're donating to scientific research. FAH accepts any donations that contribute to solving the intricacies of diseases.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Calculating total folding time
OK, so the goal is to see "how did we get here" / what the possible folding paths for a protein are / create an MSM transition matrix, so we have data on which we can do statistical analysis.
So let's say there are 20,000 proteins of interest in the human body. Do we know the number of atoms in a given protein? As you said, the worst-case computation needed would be proportional to the square of the number of atoms. For now, let's use the worst case from the current list of projects: 277,543 atoms. Assuming we have the initial data needed for a simulation, and know the initial number of Runs and Clones, is there any estimate of the worst-case total number of Runs and Clones there could be? Is there an average number of branches per Run/Clone?
So the computation required for all 20k proteins would be some function of: 20,000 proteins * 277,543^2 atoms * (worst-case number of Runs * number of Clones)?
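Just to put numbers on that product, here's a back-of-envelope sketch. The Run and Clone counts are placeholders I made up (I don't know realistic values — that's part of my question), and the result is dimensionless since the per-atom-pair constant is unknown.

```python
# Worst-case "units of work" for the formula sketched above:
# 20,000 proteins * (largest atom count)^2 * Runs * Clones.
# RUNS and CLONES are hypothetical, not real F@h project parameters.

N_PROTEINS = 20_000
ATOMS = 277_543           # largest system in the current project list
RUNS, CLONES = 100, 100   # placeholder trajectory counts

work = N_PROTEINS * ATOMS**2 * RUNS * CLONES
print(f"{work:.3e}")      # on the order of 10^19 dimensionless work units
```

Even without the unknown constant, the exercise shows the total is dominated by the quadratic atom term and the (unknown) Run/Clone branching factor.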
I would disagree, however, about idle resources not existing on AWS. The whole point of AWS is that you can scale up (or down) the number of instances you need; you can even request limit increases. That means they need to have "inventory" available to sell, which means there is some room for more divisions of resources somewhere among their servers, or they sell reserved instances that users have approved. Maybe any unclaimed resources go to other Amazon services until they are bought by a user, but they definitely have some wiggle room, as is the nature of elastic computing. Of course, as you said, they always want to be selling as much as they can. Regardless, I am more interested in the theoretical question of what 100% of AWS's resources would do for folding simulations.
Re: Calculating total folding time
Amazon EC2 Spot instances allow you to bid on spare Amazon EC2 computing capacity for up to 90% off the On-Demand price.
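To see what the "up to 90% off" figure means for a hypothetical donated fleet, here's a minimal cost sketch. The on-demand rate is a placeholder, not a quoted AWS price, and real Spot discounts vary by instance type and region.

```python
# Cost of running a fleet on EC2 Spot vs On-Demand, using the thread's
# "up to 90% off" figure. ON_DEMAND_PER_HOUR is a made-up rate.

ON_DEMAND_PER_HOUR = 1.00   # hypothetical $/instance-hour
SPOT_DISCOUNT = 0.90        # best case: "up to 90% off"

def fleet_cost(instances, hours, spot=True):
    """Total dollar cost for a fleet of identical instances."""
    discount = SPOT_DISCOUNT if spot else 0.0
    return instances * hours * ON_DEMAND_PER_HOUR * (1 - discount)

# 1,000 instances running FAH for a month (~730 hours):
print(fleet_cost(1_000, 730))              # Spot, best-case discount
print(fleet_cost(1_000, 730, spot=False))  # On-Demand
```

One caveat for FAH specifically: Spot instances can be interrupted when capacity is reclaimed, and since FAH also scores on how quickly results are returned, interrupted WUs would cut into the effective contribution.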