Hi. I was hoping someone could clarify what exactly Projects, Runs, Clones, and Generations are. I've been around F@h for a while now, I've been researching how it works, and I enjoy sharing what I've learned about the projects with others (such as on the Wikipedia page). Oddly, I have yet to run across accurate information on what exactly the PRCG numbers mean scientifically. I've been searching around with limited success, and I was hoping someone could explain. It seems odd that something as fundamental as those numbers (they basically distinguish one WU from every other WU, so I think the information behind them is notable) would not be explained on some page buried in folding.stanford.edu. To my knowledge, servers treat all WUs basically the same way, so there must be a uniform definition. My guess is that it is a rather technical matter, which may vary depending on the calculation techniques being used.
In my search, I ran across this old topic: viewtopic.php?f=17&t=240 which didn't say much but it did link to this page:
http://fahwiki.net/index.php/Runs%2C_Clones_and_Gens which I had seen before. It states that it's written by Dan Ensign, which I'm assuming is the same Dan Ensign mentioned in Dr. Pande's 2007 blog post:
http://folding.typepad.com/news/2007/09 ... eam-2.html and if that's the case then he knows what he's talking about, although it could be an oversimplification since it doesn't seem scientifically written. The information on that wiki page was last modified in 2008, so regardless of whether it was accurate at the time, I'm not sure I trust it now. There have probably been some technical advances since then, so his description may be obsolete to some degree. It seems to describe the following:
Project - not explained outright, but seems to be the protein under study. To me, this implies a particular amino acid sequence.
Run - the arrangement of the atoms in the protein in three-dimensional space (orientation irrelevant), in other words what configuration they're in.
Clone - describes the set of forces given to each atom
Gen - time steps in the simulation process. They are strictly serial, so the (n+1)th is generated only after the nth finishes
He seems to be talking about explicit solvation, which is relatively easy for me to visualize. In another topic (viewtopic.php?f=44&t=20008&p=198882#p198882) I used the analogy of billiard balls to try to explain this as I understood it. The Project was the set of balls you picked up, the Run was how you arranged them in the air, the Clone described how all your friends would set them in motion, and the Generations were frames in a recording of what happened. Now, I realize that my description is a simplification of a simplification, and that's why I'd like to clear things up here. My analogy would work for a small number of long simulations (like the Anton computer uses), but F@h's approaches are much more complicated and efficient, so the analogy likely fails there.
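To make my current understanding concrete, here is a rough Python sketch of how I picture a PRCG identifier and the "Gens are serial" rule from Dan Ensign's description. Everything in it (WorkUnitId, can_assign, the run and clone numbers) is a name or value I made up for illustration; I have no idea how the F@h servers actually represent or assign WUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnitId:
    """Hypothetical PRCG identifier; not an actual F@h data structure."""
    project: int
    run: int
    clone: int
    gen: int

    def trajectory(self):
        # Everything except the gen: which simulation this WU belongs to.
        return (self.project, self.run, self.clone)

    def next_gen(self):
        # The follow-on WU that continues the same trajectory.
        return WorkUnitId(self.project, self.run, self.clone, self.gen + 1)


def can_assign(wu, completed):
    """Gen 0 can always go out; gen n+1 only once gen n has been returned."""
    if wu.gen == 0:
        return True
    previous = WorkUnitId(wu.project, wu.run, wu.clone, wu.gen - 1)
    return previous in completed


# Illustrative values only: project number from this post, the rest invented.
first = WorkUnitId(project=5749, run=0, clone=3, gen=0)
completed = set()
print(can_assign(first.next_gen(), completed))  # gen 1 before gen 0 is done
completed.add(first)
print(can_assign(first.next_gen(), completed))  # gen 1 after gen 0 is done
```

If this picture is right, the (Project, Run, Clone) triple names one trajectory and the Gen just says how far along it a given WU is. That is exactly the part I'd like confirmed.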
In continuing my search, I tried to access the Reference Links at the bottom of Dan Ensign's explanation. Almost all of them were dead, but I was able to find one using waybackmachine.org:
http://web.archive.org/web/200712151740 ... clone.html which was written by an unknown author but he states that he's confused as well. Whoever wrote it did quote from the prominent F@h paper titled
Atomistic protein folding simulations on the hundreds of microsecond timescale using worldwide distributed computing, which I've read through, and it describes how proteins spend much of their time "waiting" in various states before quickly transitioning to the next configuration. F@h takes advantage of this by simulating only the quick transitions, and then uses algorithms to statistically stitch the entire simulation together. The unknown author indicated that his best guess is:
Run: Different parameters to the protein fold, such as different temperature, different force cutoff distance, etc..
Clone: One of many simulations being performed, each one has randomized initial velocities, the first one to cross the free-energy barrier wins
Gen: When one WU is finished, another is assigned to continue from where it left off. That one would be one generation further..
This seems to agree somewhat with what Dan Ensign was describing, although the unknown author noted that he could have Run and Gen mixed up, and that Gens might start when a protein leaves an energy minimum.
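Here is a toy Python sketch of how I read the unknown author's guess: each Clone starts from the same Run configuration but with its own randomized initial velocities, and each Gen picks up where the previous one left off. This is only my interpretation. The "physics" is a made-up set of free particles on a line, all function names are my own, and the "first one to cross the free-energy barrier wins" part is not modeled at all.

```python
import random


def initial_velocities(n_atoms, clone):
    # Assumption: the clone number acts like a seed for the random velocities.
    rng = random.Random(clone)
    return [rng.gauss(0.0, 1.0) for _ in range(n_atoms)]


def run_gen(positions, velocities, steps=100, dt=0.01):
    # One "WU": advance the toy system a fixed number of time steps.
    for _ in range(steps):
        positions = [x + v * dt for x, v in zip(positions, velocities)]
    return positions


def simulate_clone(run_positions, clone, n_gens):
    # Each gen continues from the end state of the previous gen.
    velocities = initial_velocities(len(run_positions), clone)
    positions = list(run_positions)
    history = []
    for _ in range(n_gens):
        positions = run_gen(positions, velocities)
        history.append(positions)
    return history


run_start = [0.0, 1.0, 2.0]  # the "Run": one starting arrangement
for clone in range(2):
    final_positions = simulate_clone(run_start, clone, n_gens=3)[-1]
    print(clone, final_positions)
```

In this toy version two Clones of the same Run diverge only because of their different starting velocities, and Gen n can't be computed without the result of Gen n-1. Whether that matches what the real cores do is what I'm asking.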
So basically, I'm not sure what they are. Do the same definitions apply to both implicit and explicit solvation models? How does this work when free-energy perturbation or simulated tempering techniques are used? Compounding the issue, "Project" describes not just the set of amino acids but in some cases (see project 5749) also a specific simulation temperature. So I'm a bit confused. Any description would be helpful, and a reference to some reliable material would be really great as well. Thank you.