WU's per project

Posted: Tue May 15, 2012 12:30 pm
by jsanthara
I was just curious: roughly how many WU's need to be completed before a project is finished?

Re: WU's per project

Posted: Tue May 15, 2012 1:14 pm
by 7im
Projects do not have a fixed number of work units. How many depends on the size and complexity of the protein being simulated. And even after starting a project, it can be ended early if the results are distinct, or they can add more work units if they need more details.

1000s...

Re: WU's per project

Posted: Tue May 15, 2012 5:51 pm
by jsanthara
So, they don't publish any statistics after finishing a project? (WU's completed, project start and finish, number of contributors, etc.)

Re: WU's per project

Posted: Tue May 15, 2012 5:56 pm
by 7im
Nothing published.

Feel free to make a request... ;)

Re: WU's per project

Posted: Tue May 15, 2012 6:31 pm
by verlyol
Very pertinent question jsanthara, I also think that this type of information could be interesting!!
But I also think it would be an additional workload for the PD... This will be a large amount of additional data to be processed!

Re: WU's per project

Posted: Tue May 15, 2012 7:10 pm
by 7im
Other than end your curiosity, does publishing this data benefit the project?

You can also make your own estimates. While not exact, but if you are folding RCG 10, 26, 142, that might indicate 36920 work units in that project so far (assuming all the runs/clones are up to that generation) ...
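
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_wus` is made up here, and it treats the R, C and G numbers as plain counts the way the post does, assuming every run/clone pair has reached the same generation.

```python
def estimate_wus(run: int, clone: int, gen: int) -> int:
    """Rough estimate of WUs so far: runs x clones x generations."""
    return run * clone * gen


print(estimate_wus(10, 26, 142))  # 36920
```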

Re: WU's per project

Posted: Tue May 15, 2012 7:20 pm
by Jesse_V
7im wrote:You can also make your own estimates. While not exact, but if you are folding RCG 10, 26, 142, that might indicate 36920 work units in that project so far...
And if each WU takes 5 hours to be completed on the average quad-core CPU, that's 184,600 CPU-hours, which is about 21 CPU-years. And I've seen WUs where the Generation number is over a thousand. :!:

Supplying a progress bar indicating how close each Project is to finishing would be nice, and might do more than just satisfy our curiosity, though IMO the benefits wouldn't outweigh the costs of the additional overhead for the PG. They already do a great deal for us as it is.

Re: WU's per project

Posted: Tue May 15, 2012 9:19 pm
by jsanthara
7im wrote:Other than end your curiosity, does publishing this data benefit the project?
I suppose it doesn't, I was just wondering if there was published project data.
Jesse_V wrote: Supplying a progress bar indicating how close each Project is to finishing would be nice, and might do more than just satisfy our curiosity, though IMO the benefits wouldn't outweigh the costs of the additional overhead for the PG. They already do a great deal for us as it is.
Good point.
7im wrote:You can also make your own estimates. While not exact, but if you are folding RCG 10, 26, 142, that might indicate 36920 work units in that project so far (assuming all the runs/clones are up to that generation) ...
I'm still a little unclear about the whole run, clone, gen thing. If you are working a unit (e.g. RCG 10, 11, 12 for P1314), is that the only unit with that RCG that will be assigned for that project (assuming it is completed)? Sorry for the somewhat unrelated question, I just didn't feel like it was worth opening another thread.

Re: WU's per project

Posted: Tue May 15, 2012 11:38 pm
by bruce
jsanthara wrote:I'm still a little unclear about the whole run, clone, gen thing. If you are working a unit (e.g. RCG 10, 11, 12 for P1314), is that the only unit with that RCG that will be assigned for that project (assuming it is completed)? Sorry for the somewhat unrelated question, I just didn't feel like it was worth opening another thread.
Each PRCG needs to be completed once. Sometimes they get lost or contain an error and then they can be reassigned, so a small percentage are completed twice. The goal is to keep that percentage as small as possible, consistent with completing everything, so that as little processing as possible is wasted on duplicated efforts. (Such as a WU that expires (is assumed to be lost) and then shows up after the timeout.)
7im wrote:You can also make your own estimates. While not exact, but if you are folding RCG 10, 26, 142, that might indicate 36920 work units in that project so far (assuming all the runs/clones are up to that generation) ...
As an estimate, that's all we have to go on, but it's always going to exceed the actual number. Some WUs cannot be processed beyond some Gen, and the Gens that are completed don't progress at the same rate, so it's wrong to assume that the total number (R*C) of trajectories all extend from Gen=0 to Gen=N. The WUs enumerated by R,C,G do not form a "cube", plus some do end up being duplicated. But as a rough estimate, it will do.
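
To see why the simple product overshoots, here is a small Python sketch with made-up numbers. The `gens_done` table is hypothetical (it is not real project data); it just records how many Gens each (Run, Clone) trajectory has completed. Duplicated WUs are left out to keep it short.

```python
# Hypothetical progress table: (run, clone) -> number of Gens completed so far.
gens_done = {
    (0, 0): 142, (0, 1): 97, (0, 2): 130,
    (1, 0): 142, (1, 1): 12, (1, 2): 141,
}

runs = len({r for r, _ in gens_done})
clones = len({c for _, c in gens_done})
max_gen = max(gens_done.values())

cube_estimate = runs * clones * max_gen  # assumes every trajectory reached max_gen
actual = sum(gens_done.values())         # trajectories really advance unevenly

print(cube_estimate, actual)
```

With these invented values the "cube" estimate is noticeably higher than the real total, because a few trajectories lag far behind the leading Gen.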

Re: WU's per project

Posted: Tue May 15, 2012 11:45 pm
by Jesse_V
jsanthara wrote:I'm still a little unclear about the whole run, clone, gen thing. If you are working a unit (e.g. RCG 10, 11, 12 for P1314), is that the only unit with that RCG that will be assigned for that project (assuming it is completed)? Sorry for the somewhat unrelated question, I just didn't feel like it was worth opening another thread.
People have asked this before, so if you're after some explanations and/or definitions, please see viewtopic.php?f=16&t=21616 and viewtopic.php?f=17&t=20095

Re: WU's per project

Posted: Tue Feb 05, 2013 4:39 am
by Jesse_V
Finally able to answer this question.
Proteneer informed me on IRC that the number is in the range of 100k-200k WUs for large deployed projects. This translates into somewhere around 300GB of WUs per project.
It's a good thing that projects can be divided up, distributed to everyone, and (aside from the serial Generation component) computed in parallel. :)
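
As a quick sanity check on those figures, dividing the total by the WU count gives the implied average size per WU. This Python sketch assumes decimal units (1 GB = 1000 MB) and takes the 300GB and 100k-200k figures at face value; the helper name `avg_mb_per_wu` is just for illustration.

```python
def avg_mb_per_wu(total_gb: float, wu_count: int) -> float:
    """Average data per WU in MB, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / wu_count


for wu_count in (100_000, 200_000):
    print(f"{wu_count} WUs -> about {avg_mb_per_wu(300, wu_count):.1f} MB per WU")
```

So the quoted range works out to very roughly 1.5 to 3 MB per WU.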

Re: WU's per project

Posted: Tue Feb 05, 2013 5:15 am
by 7im
Was that just for his GPU projects? Or does that include larger SMP projects as well? Sorry, but one answer is rarely THE answer in regards to FAH. FAH simply changes and advances too quickly for any definitive answers to stay definitive.

Re: WU's per project

Posted: Tue Feb 05, 2013 5:40 am
by Jesse_V
7im wrote:Was that just for his GPU projects? Or does that include larger SMP projects as well? Sorry, but one answer is rarely THE answer in regards to FAH. FAH simply changes and advances too quickly for any definitive answers to stay definitive.
Good point. I asked, and the answer was:
<proteneer> probably in general
The figure may change over time, but now we have something, which is better than nothing.

Re: WU's per project

Posted: Tue Feb 05, 2013 7:02 am
by GreyWhiskers
This series of posts reminded me of the post on Anatomy of a series of GPU Work Units from the trenches that I did almost two years ago - Fri May 13, 2011.

This described a GPU project that issued the work units in sequence starting with Gen 0, then Gen 1, then Gen 2, etc. Within a Gen, they cycled through Runs and Clones in chronological order.

So far as I remember, this is the only project that was laid out this way. But it did give an excellent illustration of what seemed to be the serial nature of the process, where Gen 0 was the initial starting point for a wide set of trajectories. After all the trajectories completed Gen 0, they went to Gen 1 and ran those trajectories. In my sequence of 370 WUs in time order, there were a few out-of-order ones; it looked like PG needed to catch up on a few unfinished trajectories.
GreyWhiskers wrote:I had been wondering about how my contributions are actually supporting the science at FAH when I wrote the post below last week. I'm a retired "quant" who likes to understand the numbers and relationships of the things I work with.

Re: publication of new paper on FAH (bigadv) results
PantherX wrote:
GreyWhiskers wrote:... Are big molecules broken into many smaller segments that are issued as WUs? What are the "runs" "clones" "generations" exactly? How does one set of runs set up a subsequent set?...
This is an explanation of PRCGs that is easy to understand (http://fahwiki.net/index.php/Runs%2C_Clones_and_Gens).
I have processed 370+ Project 6801 Core 15 GPU WUs, at a rock steady 1:21 TPF, interspersed with four 1xxxx WUs that must have been part of the advmethods projects.

I hadn't really looked at the series of WUs very closely until this morning. Using Datadmin 3.4.8 personal and Excel, I pulled most of my work history since 1 April out of the HFM Work History viewer .db3 database and exported it as a .prn text file. ...

Reading Ensign's terrific paper on "Runs, Clones and Generations" in the FAH Wiki (see PantherX's link in the quote above), I see that the Gens are timesteps along the trajectory of particular clones for a particular run.

The interesting thing about this sequence is that I've consistently since 1 April been getting successive Gens - starting with 0 progressing to 12 in the last day. This shows that for the runs they are working on, all the WUs start at the beginning timestep - Gen 0.

EDIT: There are a couple of what may be "catch up" out of sequence WUs on May 5, 6, but the rest of the series seems consistent.

For Gen0, 1-4 April, I got a set of WUs for Clone 0, 1, 2, 3, 4 successively for various Runs.
Gen 1, 3-6 April
Gen 2, 7-9 April,
up to today
Gen 12, 11-13 May (and that's what I'm folding at the moment). It's on Clone 3, Edit: next two WUs after table below were Clone 4s.

I can just visualize in my mind's eye the trajectory tree growing time step by time step, gen by gen.

This makes clearer this quote from Ensign's paper
Okay, here it is: The CLONE numbers are labels for each trajectory that we run. Each GENeration is another chunk of time along that trajectory. So, say that I benchmark CLONE0, GEN0 (the first 4 ns). That WU is then done, and the FAH software builds a new WU with starting coordinates (and velocities and stuff) where mine left off. Then the new WU -- GEN1 of CLONE0 -- gets sent to you, and you simulate the next 4 ns. And so on. So CLONE is a label for an individual trajectory, and GENerations are time steps along that trajectory.
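
Ensign's description can be turned into a toy Python sketch. None of this is actual FAH server code: `WorkUnit`, `next_generation` and the 4 ns chunk length (taken from the example in the quote) are assumptions made purely for illustration.

```python
from dataclasses import dataclass

NS_PER_GEN = 4  # chunk length from the quoted example; assumed fixed here


@dataclass(frozen=True)
class WorkUnit:
    run: int
    clone: int
    gen: int

    @property
    def start_ns(self) -> int:
        # A Gen picks up where the previous Gen of the same Clone left off.
        return self.gen * NS_PER_GEN

    @property
    def end_ns(self) -> int:
        return self.start_ns + NS_PER_GEN


def next_generation(finished: WorkUnit) -> WorkUnit:
    """Build the follow-on WU for the same trajectory (same Run and Clone)."""
    return WorkUnit(finished.run, finished.clone, finished.gen + 1)


wu = WorkUnit(run=0, clone=0, gen=0)
for _ in range(3):
    print(f"R{wu.run} C{wu.clone} G{wu.gen}: {wu.start_ns}-{wu.end_ns} ns")
    wu = next_generation(wu)
```

The point of the sketch is the serial dependency: Gen N+1 of a Clone cannot be built until Gen N has come back, while different Clones and Runs can proceed in parallel.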

Re: WU's per project

Posted: Tue Feb 05, 2013 8:04 am
by Jesse_V
GreyWhiskers, thanks for the really interesting observations.

We also have the Simulation FAQ (http://folding.stanford.edu/English/FAQ-Simulation) and a separate thread, both devoted to describing what the PRCG numbers mean.