When will we be finished?

Post by **Jesse_V** » Fri Nov 30, 2012 8:51 pm

Nathan_P wrote:Well, it's still better than Duke Nukem Forever time!!

That fell under the category of "not soon"©

Post by **Alan C. Lawhon** » Sun Dec 23, 2012 6:47 am

bruce wrote:
Jesse_V wrote:The protein folding problem continues, so theoretically there will always be more work to do.
Scientists cannot predict that we'll ever run out of work. There are a lot of proteins that are too big to study with today's hardware -- and then there are interactions between proteins to be studied, etc.

That doesn't guarantee that you'll always get an assignment, though. Every research project starts with a proposal which is reviewed by several people before it's approved. Thus it's POSSIBLE that new projects might not always be approved before older ones end, but I've only seen occasional gaps when there was a lull in projects for a particular platform.

Bruce:

I have a question about the bolded part. I read the online transcript of an interview with Dr. Pande where he indicated that FAH is currently limited by a finite amount of computing power. (I'm paraphrasing a bit.) Dr. Pande went on to state, (I believe this was in an interview with a young man whose name might have been "Nicholas Johnson"), that "bigger projects" are envisioned which would require on the order of 500,000 CPUs and even bigger projects than that are envisioned which would require on the order of a million CPUs. (I assume these "bigger projects" involve efforts to model the largest and most complex protein molecules.)

So it seems to me the real problem is the FAH community needs to figure out a way to "scale up" the active donor participation rate by a factor of 5x to 10x. I'm a "new folder" here, (literally in my first month of folding), but I've been giving this problem a great deal of thought. I've already posted one outlandish idea on this, (see my "A Bold Idea For Significantly Increasing FAH Donors" thread), and I've come up with another idea which I expect to post in the next day or so. (This latter idea involves a more practical method for raising awareness - and increasing participation - in FAH. It has the added benefit of being something that everybody can do in fairly short order. I promise that I'll have this new idea posted in the next few days.)

I read somewhere (maybe in the "Protein Folding" article on Wikipedia) that scientists have identified over 50,000 protein molecules which they term as "important to life." With that many molecules to study and model, I doubt if the FAH servers will quit pushing electrons any time soon.

Post by **Jesse_V** » Sun Dec 23, 2012 7:46 am

Alan C. Lawhon wrote:
bruce wrote:
Jesse_V wrote:The protein folding problem continues, so theoretically there will always be more work to do.
Scientists cannot predict that we'll ever run out of work. There are a lot of proteins that are too big to study with today's hardware -- and then there are interactions between proteins to be studied, etc.

That doesn't guarantee that you'll always get an assignment, though. Every research project starts with a proposal which is reviewed by several people before it's approved. Thus it's POSSIBLE that new projects might not always be approved before older ones end, but I've only seen occasional gaps when there was a lull in projects for a particular platform.
Bruce:

I have a question about the bolded part. I read the online transcript of an interview with Dr. Pande where he indicated that FAH is currently limited by a finite amount of computing power. (I'm paraphrasing a bit.) Dr. Pande went on to state, (I believe this was in an interview with a young man whose name might have been "Nicholas Johnson"), that "bigger projects" are envisioned which would require on the order of 500,000 CPUs and even bigger projects than that are envisioned which would require on the order of a million CPUs. (I assume these "bigger projects" involve efforts to model the largest and most complex protein molecules.)

The interview was seven years ago, and the guy's name was "Noah Johnson".

http://fahwiki.net/index.php/Vijay_Pand ... nscription

I've also read (or heard from an interview with Dr. Pande in Futures in Biotech) that some projects aren't started because they'll take an infeasible amount of time to complete. Imagine you're a graduate student, and you need to complete your research within some time period of length X. But maybe you'd like to do some complex, detailed simulation that, with the current level of computing power, would take Y amount of time, say Y = 10*X. Obviously you can't do that research because it would simply take too long. You can't wait years for work to complete, so you don't launch the project. Dr. Pande was essentially explaining that there are projects like that. F@h continues to make big splashes in the relevant fields because its computing power makes it capable of doing work that would be very impractical almost everywhere else. But I'm sure there are plenty of other projects that it's too slow to handle.
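To put rough numbers on that X-versus-Y comparison, here is a minimal sketch. Every figure in it is hypothetical (the 3-year deadline, the 3,000,000 CPU-years of work, the donor counts), and it assumes the work divides perfectly across CPUs, which real simulations do not quite do.

```python
def wall_clock_years(total_cpu_years: float, cpus: int) -> float:
    """Idealized time to finish, assuming the work splits evenly across CPUs."""
    return total_cpu_years / cpus


def is_feasible(total_cpu_years: float, cpus: int, deadline_years: float) -> bool:
    """A project is worth launching only if Y (time needed) fits within X (deadline)."""
    return wall_clock_years(total_cpu_years, cpus) <= deadline_years


# Hypothetical numbers, for illustration only.
deadline_x = 3.0      # X: years the researcher can afford to wait
work = 3_000_000.0    # CPU-years the simulation would need

print(is_feasible(work, 100_000, deadline_x))    # Y = 30 years = 10*X
print(is_feasible(work, 1_000_000, deadline_x))  # Y = 3 years = X
```

With these made-up numbers, a tenfold increase in donors is what moves the project from "don't launch" to "just barely possible".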

Bold ideas which generate discussion deserve their own thread to keep things organized and focused, but you may be interested in this thread as well: viewtopic.php?f=16&t=21367

Post by **bruce** » Sun Dec 23, 2012 10:37 pm

I've been reading your "Bold Idea" topic and watching it develop. I commend you for starting the discussion. You might also check out the topic Discussion: what is holding F@h back?

Yes, the bolded part of my quote implies that FAH needs computers that are a lot faster, a lot more computers, and continuously improving software that incorporates faster analysis methodology (whenever it happens to be invented). Even with the Celeb bump, FAH wouldn't be in danger of "finishing" any time soon. My answer to the original question is "never" but I suppose it would be more realistic to use the phrase "not in our lifetime."

Ten+ years ago, when FAH started, many believed that protein folding could only run on supercomputers since it could not be parallelized. (FAH demonstrated that they were wrong.) The growth since then has been phenomenal and I predict that one way or another, the trend will continue.

Post by **Alan C. Lawhon** » Wed Dec 26, 2012 4:56 am

bruce wrote:I've been reading your "Bold Idea" topic and watching it develop. I commend you for starting the discussion. You might also check out the topic Discussion: what is holding F@h back?

Yes, the bolded part of my quote implies that FAH needs computers that are a lot faster, a lot more computers, and continuously improving software that incorporates faster analysis methodology (whenever it happens to be invented). Even with the Celeb bump, FAH wouldn't be in danger of "finishing" any time soon. My answer to the original question is "never" but I suppose it would be more realistic to use the phrase "not in our lifetime."

Ten+ years ago, when FAH started, many believed that protein folding could only run on supercomputers since it could not be parallelized. (FAH demonstrated that they were wrong.) The growth since then has been phenomenal and I predict that one way or another, the trend will continue.

bruce:

I've read your posts and comments in these threads. It is obvious to me that you are much more knowledgeable than I am about protein folding dynamics and how misfolding triggers these diseases. I'm not a scientist or a medical person, but the limited amount of reading and study I have undertaken on this site (and Wikipedia) leads me to realize that this is a really tough problem. (If curing these diseases was "easy" they would have been beaten long ago.) The more I read about DNA, amino acids, polypeptide chains, protein folding, computational biology and related topics, the more I understand that many of the best minds in the world are trying to crack this nut. Over in the "Science of FAH" subforum, a new poster - stevendking - posted this thread a few days ago:

viewtopic.php?f=17&t=23277

When I see creative thinking like this, I know it's just a matter of time until somebody (like steve) bounces a new idea - a new way of looking at the problem - off somebody else and there's that magical "Eureka!" moment when somebody gains a new insight. One of these days there will be a series of "Eureka!" moments and suddenly the dam will break - the mystery of all this will be understood. There are too many great minds working on this not to beat it. Sooner or later we're going to figure out how to make these proteins quit misbehaving ...

I come at this from more of a "personal" rather than a scientific background. My foster sister has Parkinson's disease. It drives me crazy to see the challenges she is facing and the courageous battle she is fighting. If more computing power is what Dr. Pande and his team need, the least we members of the FAH community can do is everything in our power to secure more CPU and GPU cycles and faster computers. (I wish I had a spare $100 million I could donate just to buy fast multi-core i7 computers to donate to the Pande Group.) I'm enough of a realist to know that you're probably right: Conquering these diseases is not going to happen overnight - maybe not even within our lifetimes. (I'm 57 years old now.) But according to this article:

http://www.bit-tech.net/hardware/graphi ... t-matter/1

Cornell Professor Dr. Harold Scheraga, a microbiologist who has been working on protein folding for over thirty years, believes we are within reach of new drugs that will effectively treat the worst diseases. I think, if we're really this close, let's pour on the gas. If Dr. Pande and his team envision projects which will require 500,000 and 1,000,000 CPUs of processing power - enough power to go after the big proteins - then it is incumbent upon us to get busy. They have their hands full doing the science - it's our job to provide them with the tools they need to get the ball over the goal line.

Post by **bruce** » Wed Dec 26, 2012 6:13 am

Do not mistake me for any kind of a microbiologist. My education and profession are in mathematics. I have read a lot of amateur scientist biology plus FAH's web pages, but a lot comes from FAH's scientific papers. Every time I read one, a little more falls into place. Fortunately, I've done a lot of simulation of mechanical systems so for me, the math parts are easier than the biological parts.

Post by **mmonnin** » Thu Dec 27, 2012 11:21 pm

Didn't Dr Pande say a protein is folded like 2000 times to get the most likely folded states? To appease us, PG could just send out more than 2000 per to keep us busy and we would be none the wiser. In the case of gaps between projects, it would be better to use server power to keep us happy and produce maybe a small bit of science than to let the user base go elsewhere.

Post by **Jesse_V** » Thu Dec 27, 2012 11:34 pm

mmonnin wrote:Didn't Dr Pande say a protein is folded like 2000 times to get the most likely folded states? To appease us, PG could just send out more than 2000 per to keep us busy and we would be none the wiser. In the case of gaps between projects, it would be better to use server power to keep us happy and produce maybe a small bit of science than to let the user base go elsewhere.

I don't think that's entirely accurate. I'd like to see where he said that, because I think you're misunderstanding what he meant. I recall this blog post on multiple simulations, but I don't think they do the entire simulation multiple times. Rather there's a lot of exploration in the individual simulation. See the Simulation FAQ and take a look at this picture: http://en.wikipedia.org/wiki/File:ACBP_ ... @home.tiff

Post by **mmonnin** » Fri Dec 28, 2012 12:03 am

Ah it wasn't Dr Pande but rather bruce here on the forums:
viewtopic.php?p=232073#p232073

And it was states, not complete simulations, that were mentioned.

Either way if we complete the full simulation once or 2000 times, send it back to us to keep us busy. And I've seen mods here post that a certain project R/C/G has been returned by multiple people so I would guess it is happening to an extent. Even if that's just for verification, it'd be better just to keep us busy.

Post by **bruce** » Fri Dec 28, 2012 1:03 am

The "just to keep us busy" concept should never be applied to FAH. Dr. Pande has said we need 100,000 more computers to process the work we have today. Every bit of nonessential work that gets done would mean that much less essential work gets done. Even if Donors were none-the-wiser, intentionally sending out duplicate work would waste precious resources.

Note that there's an important difference between repeating identical work and processing a WU that differs from some other WU but is statistically SIMILAR to it. Maybe I didn't make that clear in the post you referenced. Please read about Brownian Motion.

Realistically, though, stuff happens. Donor computers crash, data gets erased unintentionally, somebody leaves town and shuts down their computer without thinking about the active WU, or a donor machine is excessively overclocked and causes an error. For these or other reasons, results do not get returned, or there is a major delay that might have been avoided. Since Gen (N+1) cannot be created or assigned until Gen (N) is completed, loss of a WU will suspend processing on that trajectory.

To make up for those missing results, whenever a WU passes the timeout, it is duplicated and sent to somebody else, allowing the trajectory to continue. The undesirable side-effect, though, is that sometimes work may be duplicated. If both copies of the same WU happen to be returned, work is duplicated, but there's really no way around that much waste as long as everything has to be completed.
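The timeout-and-reissue behavior described above can be sketched as a toy model. This is only an illustration, not FAH's actual server code: the `Trajectory` class, its method names, and the timing numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    gen: int          # generation index N within one trajectory
    deadline: float   # time after which the server stops waiting for a result


class Trajectory:
    """Toy model of one trajectory: Gen N+1 is created only after Gen N returns."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.current = WorkUnit(gen=0, deadline=timeout)
        self.reissued = 0           # copies sent out because a timeout passed
        self.duplicate_results = 0  # results returned for an already finished gen

    def tick(self, now: float) -> None:
        """Send another copy of the current WU if its timeout has passed."""
        if now > self.current.deadline:
            self.reissued += 1
            self.current.deadline = now + self.timeout

    def receive(self, gen: int, now: float) -> None:
        """Accept a result; the first one for the current gen unlocks the next."""
        if gen == self.current.gen:
            self.current = WorkUnit(gen=gen + 1, deadline=now + self.timeout)
        elif gen < self.current.gen:
            # A late copy of work that was already completed by someone else.
            self.duplicate_results += 1


traj = Trajectory(timeout=10.0)
traj.tick(now=11.0)            # first donor is late, so a second copy goes out
traj.receive(gen=0, now=12.0)  # second donor returns it: Gen 1 can be created
traj.receive(gen=0, now=15.0)  # first donor finally returns: duplicated work
print(traj.current.gen, traj.reissued, traj.duplicate_results)
```

In this sketch the trajectory stalls until some copy of Gen 0 comes back, and the late copy is counted as waste, which mirrors the trade-off described in the post.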

Not all the potential problems are on the Donor side, though. Servers do crash, too, in spite of the PG's efforts to keep everything running constantly. Then, too, servers run out of work -- often because there's no more space to store the results. When the disk-array gets full, data has to be manually moved to off-line storage to make room for new generations of WUs so the project can resume. Neither event happens very often, but sometimes the work they'd really like you to be working on simply cannot be sent to your machine immediately. With multiple servers that are geographically separated, though, chances are that another server can assign you work on another project.

Post by **mmonnin** » Fri Dec 28, 2012 9:31 pm

If there needs to be 100k more computers then we are busy and we would never be at the point of 'just to keep busy'. Never once did I say nonessential work should be done over essential work. Only when there's no more work to be done, which is the topic of this thread, should we repeat work to keep users, and that is the only instance I was suggesting it.

Post by **jimerickson** » Fri Dec 28, 2012 9:38 pm

when? not soon...
