Runs, Clones, Gens
Posted: Mon May 14, 2012 4:49 pm
by iceman1992
I'm beginning to understand. Gens are for serialization, Runs and Clones are for parallelization? What's the difference between Run and Clone?
And if I receive a work unit with a relatively high gen number, does that mean it's closer to completion?
Yes I see. Well I am learning so much now, thanks a lot everyone
still have much more to learn.
Re: Cores a3 and a4
Posted: Mon May 14, 2012 4:59 pm
by 7im
See the fah wiki article on runs clones gens.
Runs, Clones, Gens
Posted: Mon May 14, 2012 5:03 pm
by bruce
iceman1992 wrote:P.S. I still don't know what runs, clones, and gens mean
First, a "trajectory" is a path along which a protein folds, starting in some shape at T=0 and continuing until T="long enough". There are an infinite number of starting shapes, and from each one of them there are lots and lots of possible paths that the atoms can take to get to "finished". Those two concepts are loosely related to Runs and Clones.
Each one of those trajectories might be assigned to a different computer (working in parallel) and you could work on it until you're finished. That's not a good plan, though, because the Pande Group wouldn't know if somebody quits and a trajectory is abandoned or otherwise lost . . . so the total time from T=0 continuing until T="long enough" is broken up into a sequence of processing assignments called Gens, where each one begins where the previous one ended (sort of like checkpoints, but on a much grander scale). If something gets lost, it can be reassigned. No single trajectory is processed by either the slowest computer or the fastest computer but rather by an "average" of many different computers so all trajectories will finish (more-or-less) at the same time.
If you're processing Gen 53, then that trajectory has already been assigned to 53 different computers working like a relay-race on that particular Project, Run, Clone and the simulation clock has successfully completed 53 segments of time each equal to the duration of a single Gen.
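To make the relay-race idea concrete, here is a minimal sketch in Python. It is a toy, not how the F@h servers actually work: the "state" is just a step counter, and the names `run_gen`, `run_trajectory`, and `loss_rate` are made up for illustration. It only shows that each Gen starts where the previous one ended, and that a lost Gen is handed out again from the same starting point.

```python
import random

def run_gen(start_state, steps_per_gen):
    """Advance a toy 'state' (just a step counter) by one Gen's worth of steps."""
    state = start_state
    for _ in range(steps_per_gen):
        state += 1  # stand-in for one simulation time step
    return state

def run_trajectory(num_gens, steps_per_gen, loss_rate, rng):
    """Process Gens in order; a lost Gen is reassigned from the same start state."""
    state = 0        # the shape at T=0
    assignments = 0  # how many times a work unit was handed out
    for _gen in range(num_gens):
        while True:
            assignments += 1
            if rng.random() < loss_rate:
                continue  # hypothetical: this volunteer quit, so reassign the same Gen
            state = run_gen(state, steps_per_gen)  # Gen N+1 starts where Gen N ended
            break
    return state, assignments

final_state, assignments = run_trajectory(
    num_gens=53, steps_per_gen=1000, loss_rate=0.1, rng=random.Random(0)
)
```

However many work units get lost along the way, the trajectory still ends up with exactly 53 Gens' worth of simulated steps; only the number of assignments grows.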
Re: Cores a3 and a4
Posted: Tue May 15, 2012 4:37 am
by iceman1992
Wow thanks bruce for taking the time to write such a detailed explanation, I really appreciate it.
Gens are like a relay-race
So the finished shape of gen N is the start shape of gen N+1?
I think this should be split off to a new topic about PRCG numbers? So other people can find it more easily.
Or even better, be made into a FAQ
Re: Cores a3 and a4
Posted: Tue May 15, 2012 5:17 am
by bruce
iceman1992 wrote:Wow thanks bruce for taking the time to write such a detailed explanation, I really appreciate it.
Gens are like a relay-race
So the finished shape of gen N is the start shape of gen N+1?
I think this should be split off to a new topic about PRCG numbers? So other people can find it more easily.
Or even better, be made into a FAQ
Dan wrote a more colorful explanation and we put it in the Wiki.
http://fahwiki.net/index.php/Runs,_Clones_and_Gens
My "relay race" explanation covers the "normal" use of "Gen" sufficiently for most people. (It does neglect some rather obscure situations that really don't matter much.)
Just don't ask me to explain why with thousands of different blind relay teams, starting at different points and heading in different directions nearly everybody eventually ends up at the same finish line.
Re: Cores a3 and a4
Posted: Tue May 15, 2012 5:33 am
by iceman1992
bruce wrote:Dan wrote a more colorful explanation and we put it in the Wiki.
http://fahwiki.net/index.php/Runs,_Clones_and_Gens
My "relay race" explanation covers the "normal" use of "Gen" sufficiently for most people. (It does neglect some rather obscure situations that really don't matter much.)
Oh yeah I forgot there's a wiki
sorry about that
Still, your explanation is shorter and thus much easier to understand
bruce wrote:Just don't ask me to explain why with thousands of different blind relay teams, starting at different points and heading in different directions nearly everybody eventually ends up at the same finish line.
The magic of science perhaps?
Re: Cores a3 and a4
Posted: Tue May 15, 2012 6:55 am
by Jesse_V
iceman1992 wrote:
bruce wrote:Just don't ask me to explain why with thousands of different blind relay teams, starting at different points and heading in different directions nearly everybody eventually ends up at the same finish line.
The magic of science perhaps?
Pretty much. Note the "nearly everybody" statement. For reasons that are hard to explain fully, sometimes the runners get all turned around and confused, and the baton ends up off the normal track completely. Since they'd never win that way, the crowd doesn't like that and unless some official comes down and sets things straight, everyone ends up booing. There's a statistical chance of that occurring, but the amazing part is that most of the time the proper finish line is found.
Re: Cores a3 and a4
Posted: Tue May 15, 2012 7:00 am
by iceman1992
Jesse_V wrote:Pretty much. Note the "nearly everybody" statement. For reasons that are hard to explain fully, sometimes the runners get all turned around and confused, and the baton ends up off the normal track completely. Since they'd never win that way, the crowd doesn't like that and unless some official comes down and sets things straight, everyone ends up booing. There's a statistical chance of that occurring, but the amazing part is that most of the time the proper finish line is found.
By "the runners get all turned around and confused" do you mean bad WUs?
Re: Cores a3 and a4
Posted: Tue May 15, 2012 7:27 am
by Jesse_V
iceman1992 wrote:Jesse_V wrote:Pretty much. Note the "nearly everybody" statement. For reasons that are hard to explain fully, sometimes the runners get all turned around and confused, and the baton ends up off the normal track completely. Since they'd never win that way, the crowd doesn't like that and unless some official comes down and sets things straight, everyone ends up booing. There's a statistical chance of that occurring, but the amazing part is that most of the time the proper finish line is found.
By "the runners get all turned around and confused" do you mean bad WUs?
No, I meant misfolded proteins that, unless protein chaperones can fix them, may end up causing disease.
Re: Cores a3 and a4
Posted: Tue May 15, 2012 8:07 am
by iceman1992
I thought by "everyone ends up booing" you were talking about bad WUs
Misfolding proteins are the primary objects of research, no? Can we possibly know when a WU is a misfolded one?
Re: Cores a3 and a4
Posted: Tue May 15, 2012 4:29 pm
by Jesse_V
iceman1992 wrote:I thought by "everyone ends up booing" you were talking about bad WUs
Misfolding proteins are the primary objects of research, no? Can we possibly know when a WU is a misfolded one?
I think the extended analogy is failing rapidly...
Well yes, from what I've read it seems that the research focuses on understanding how proteins misfold, although there are many other aspects that F@h can study as well. How does the protein fold? What is the folding influenced by? Why does that folding process matter? If I confine it between a whole lot of other stuff does that cause problems? How long does it stay folded? What other molecules can it attach to, and how tightly can it bind to them? Does it reshape itself during this binding process? If it's in an abnormal configuration, does it cause disease, and if so, how? These are all scientifically valuable questions that F@h can help address. I don't think there's any way for you to know if a WU is "misfolded". The WU represents a protein that's in a particular 3D shape, and it's probably pretty difficult for us regular folk to know if it's the proper shape or not. Even if the WU isn't, it's still very scientifically valuable.
As Bruce implied, the race is very confusing, with people running all over the place. The PG uses things called Markov State Models, which you can think of as trying to look at the big picture of all this confusion, as seen from above. They'll see what percentage of the relay racers follow each direction, where groups of racers cross paths, and where everyone ends up at the end. The model uses a lot of statistics, and I think it's a pretty clever way of looking at things.
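As a rough illustration of the "what percentage of racers follow each direction" idea, here is a small Python sketch. It is only the counting step of a Markov State Model under big assumptions: the state labels and the three example trajectories are invented, and a real analysis would first have to group protein shapes into states and pick a sensible time interval between observations.

```python
from collections import Counter, defaultdict

def transition_probabilities(trajectories):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        # Pair each state with the one that follows it in the same trajectory.
        for current, following in zip(traj, traj[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {state: n / total for state, n in followers.items()}
    return probs

# Invented example data: each list is one trajectory, observed at equal time intervals.
trajectories = [
    ["unfolded", "partial", "folded", "folded"],
    ["unfolded", "unfolded", "partial", "folded"],
    ["unfolded", "partial", "misfolded", "misfolded"],
]
probs = transition_probabilities(trajectories)
```

In this made-up data, two of the three departures from "partial" head toward "folded" and one toward "misfolded", which is the kind of branching ratio the view-from-above picture is meant to capture.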
Also, here's a related thread, where I ended up trying to determine the exact definition of the PRCG numbers, and discovered that there's no single definite definition. Still, it might be helpful to you: viewtopic.php?f=17&t=20095
Re: Cores a3 and a4
Posted: Tue May 15, 2012 5:08 pm
by iceman1992
Jesse_V wrote:I think the extended analogy is failing rapidly...
Well yes, from what I've read it seems that the research focuses on understanding how proteins misfold, although there are many other aspects that F@h can study as well. How does the protein fold? What is the folding influenced by? Why does that folding process matter? If I confine it between a whole lot of other stuff does that cause problems? How long does it stay folded? What other molecules can it attach to, and how tightly can it bind to them? Does it reshape itself during this binding process? If it's in an abnormal configuration, does it cause disease, and if so, how? These are all scientifically valuable questions that F@h can help address. I don't think there's any way for you to know if a WU is "misfolded". The WU represents a protein that's in a particular 3D shape, and it's probably pretty difficult for us regular folk to know if it's the proper shape or not. Even if the WU isn't, it's still very scientifically valuable.
Well, if the WU protein isn't properly folded, that's what PG is trying to simulate, right? So it should be even more valuable than correctly folded ones (which are still very valuable themselves). And do they research other molecules besides proteins?
Jesse_V wrote:Also, here's a related thread, where I ended up trying to determine the exact definition of the PRCG numbers, and discovered that there's no single definite definition. Still, it might be helpful to you: viewtopic.php?f=17&t=20095
Thanks! I like your summary, it very clearly sums up the whole page
Re: Cores a3 and a4
Posted: Tue May 15, 2012 6:26 pm
by Jesse_V
iceman1992 wrote:Well, if the WU protein isn't properly folded, that's what PG is trying to simulate, right? So it should be even more valuable than correctly folded ones (which are still very valuable themselves). And do they research other molecules besides proteins?
I think so. I know Dr. Kasson studies membrane fusion and things like that, so there's probably more there than just proteins. Cellular infection by viruses involves all sorts of different molecular changes. GROMACS, which F@h uses the most, is "a molecular dynamics package primarily designed for biomolecular systems such as proteins and lipids." That seems to be the range of what F@h simulates. As I already mentioned, F@h can tackle problems like protein-protein docking and binding, since that has implications for drug design. I'm not sure if GROMACS does that as well or if they use a different software package for that.
Re: Cores a3 and a4
Posted: Fri May 18, 2012 1:07 pm
by iceman1992
Jesse_V wrote:I think so. I know Dr. Kasson studies membrane fusion and things like that, so there's probably more there than just proteins. Cellular infection by viruses involves all sorts of different molecular changes. GROMACS, which F@h uses the most, is "a molecular dynamics package primarily designed for biomolecular systems such as proteins and lipids." That seems to be the range of what F@h simulates. As I already mentioned, F@h can tackle problems like protein-protein docking and binding, since that has implications for drug design. I'm not sure if GROMACS does that as well or if they use a different software package for that.
Okay thanks, Jesse_V. So not all proteins then
I have a further question. I noticed that the download size of every WU I get from the same project is more or less similar. The same goes for the upload size. How could that be if gen N+1 starts from the finished gen N? Do they do some processing first when generating the next WU? I'm not sure if this is worthy of a new thread
Re: Runs, Clones, Gens
Posted: Fri May 18, 2012 9:55 pm
by bruce
Start with a list of atoms in the protein. List atom type, current location, magnetic angles, velocity in each direction, rotational velocities, temperature, type of bonds, etc. Add anything else that it takes to know the exact state of the protein right now.
If you know all of that information now, you can predict where it will be at time (now+dt) for some small value dt and make a new table of values for some future time. It doesn't really matter whether that data is loaded in RAM and the calculations are currently proceeding or if the data is stored in a checkpoint to be reloaded and processed later, or if the data has been uploaded to a server to be downloaded later. Each snapshot of the current state of the protein is essentially a large table of numbers.
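Here is a minimal Python sketch of "know the state now, predict the state at now+dt". Everything in it is a simplifying assumption for illustration: atoms move in one dimension, the only force is a made-up spring pulling each atom toward zero, and the update is a basic semi-implicit Euler step. Real molecular dynamics packages use proper force fields and more careful integrators.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    position: float  # 1-D for simplicity; a real snapshot stores x, y, z
    velocity: float
    mass: float

def spring_force(atom, k=1.0):
    """Hypothetical stand-in for a force field: pull each atom toward the origin."""
    return -k * atom.position

def step(snapshot, dt):
    """Build a new table of values for time now+dt from the table for now."""
    new_snapshot = []
    for atom in snapshot:
        acceleration = spring_force(atom) / atom.mass
        velocity = atom.velocity + acceleration * dt  # update velocity first
        position = atom.position + velocity * dt      # then move with the new velocity
        new_snapshot.append(Atom(position, velocity, atom.mass))
    return new_snapshot

snapshot = [Atom(position=1.0, velocity=0.0, mass=1.0),
            Atom(position=-0.5, velocity=0.2, mass=2.0)]
next_snapshot = step(snapshot, dt=0.01)
```

The input and output of `step` are both just tables of numbers of the same size, which is why it makes no difference whether a snapshot lives in RAM, in a checkpoint file, or on a server between two calls.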
Collect several hundred thousand of those snapshots and call them a Gen. Collect many hundreds of those Gens and call them a trajectory. Collect several thousand of those trajectories that start out from different locations or with different velocities and call them a Project.
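The nesting can be written down as a small Python record. This is only a sketch: the class name `WorkUnit`, the project/run/clone numbers, and the figure of 250,000 steps per Gen are all invented for the example, and actual projects choose their own sizes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkUnit:
    project: int  # the whole collection of trajectories
    run: int      # together with clone, picks one trajectory
    clone: int
    gen: int      # which segment of that trajectory this is

    def next(self):
        """The work unit that continues this trajectory where this Gen ends."""
        return WorkUnit(self.project, self.run, self.clone, self.gen + 1)

    def steps_already_done(self, steps_per_gen):
        """Time steps completed on this trajectory before this Gen starts."""
        return self.gen * steps_per_gen

# Hypothetical numbers, chosen only to illustrate the arithmetic.
wu = WorkUnit(project=1234, run=5, clone=67, gen=53)
done = wu.steps_already_done(steps_per_gen=250_000)
```

With these made-up sizes, a Gen 53 work unit starts after 53 × 250,000 = 13,250,000 steps have already been simulated on that Project, Run, Clone.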