What is a WU?

Post by **Mr.Nosmo** » Mon Nov 24, 2008 4:46 pm

I know a WU is a work unit, and that a work unit is a very small part of a large project: a few nanoseconds of a protein's folding/development in real time. But how do the WUs we folders calculate fit into the project? Are they pieces of a cube (3D), pieces of a puzzle (2D), or segments of a string/timeline (1D)?

Post by **pluto7777** » Mon Nov 24, 2008 9:36 pm

Most work units have a file named current.xyz in the work folder that you can open up with a 3d viewer.

Post by **Ivoshiee** » Mon Nov 24, 2008 10:02 pm

A WU is a set of initial configurations of atoms/models at a given temperature. Each WU is also tied to a WU processing method (broadly speaking, the Core).

Post by **7im** » Mon Nov 24, 2008 10:52 pm

http://fahwiki.net/index.php/WorkUnits and http://fahwiki.net/index.php/Runs,_Clones_and_Gens

I would also point you at the Project FAQs, but it seems that a work unit is not defined well there.

Post by **Mr.Nosmo** » Mon Nov 24, 2008 11:03 pm

I was thinking in wider terms, trying to understand the work we do. If it's part of a "3D" structure, you can still get a good picture with one WU missing. If it's a "2D" structure, you miss a part of the puzzle if a WU is missing. But if it's a "1D" structure, a missing frame might mean you miss a vital part of the "movie"! Please help me to understand....

Post by **anandhanju** » Mon Nov 24, 2008 11:53 pm

I'd say a WU is a 3-D structure with respect to space (x, y, z co-ordinates), but it holds minimal significance at any given instant or, to an extent, as a standalone unit. What FAH is interested in is how the state of the 3-D structure, i.e., the protein, changes over time. These timeslices are called Gens, or generations. Now, you may ask, can't FAH have one single WU that tracks the transition of a protein from State 0 (initial) to the final state? It can, but the WU would probably run for several months, depending on the complexity of the protein. Also, as I understand it, the final state of a protein may not be defined, or may not be something that the researchers want to see. Having these WUs grouped by Gens gives the researchers the flexibility of stopping a simulation sequence if it's not what they want, or if it has reached what they want to see.

Do note, all of the above is from someone whose knowledge about proteins/MD is obtained from being here or looking at the Wiki. I wouldn't be surprised if I've got it mixed up.

Post by **MtM** » Tue Nov 25, 2008 1:06 am

I think the above needs an amendment, as there is a distinction to be made between a WU from the uniprocessor clients and those from the HPC clients (SMP/GPU/PS3).

A uniprocessor WU represented between 50 and 100 nanoseconds of simulated time. (This might have changed with newer cores; I'm passing on the information as it was given to me, in a historical context, so while it's correct, the actual time frame may since have increased a lot, though still nowhere near the level of the HPC clients.) The villin protein used in the example presented to me (which I'll therefore use here) takes around 50 microseconds to fold. Put into a scientific formula, it would be represented like this:

$$p_{\text{folding}} = 1 - \exp\left[-\frac{50\ \text{ns}}{50{,}000\ \text{ns}}\right] \approx 0.1\,\%$$

That is, the chance of a full fold occurring within the timespan of a single work unit is about 0.1 percent.
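As a quick check of the arithmetic, here is a minimal sketch that evaluates the same expression. It assumes simple single-exponential folding kinetics, as the formula does; the function name `folding_probability` is made up for illustration and is not part of any FAH tool.

```python
import math

def folding_probability(wu_ns: float, folding_time_ns: float) -> float:
    """Chance of at least one folding event within wu_ns of simulated time,
    assuming single-exponential kinetics with mean folding time folding_time_ns."""
    return 1.0 - math.exp(-wu_ns / folding_time_ns)

# Numbers taken from the formula: a 50 ns work unit, 50,000 ns folding time.
p_single_wu = folding_probability(50.0, 50_000.0)
print(f"chance of a fold in one WU: {p_single_wu:.2%}")
```

Doubling the window to 100 ns roughly doubles the chance, which is still only about 0.2 percent.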

A work unit is not standalone, as you can see from the project - run - clone - gen in its description. Project we can leave out here, as it only indicates which protein the unit represents. A Run, however, is important: it's an initial state, the 'start position' of a protein, with the placement of the molecules and the forces influencing their behaviour. Each Run consists of a number of Clones, where a Clone is, as the name says, an exact copy of the Run in terms of atom locations and what they describe as 'temperature', an indication of the forces influencing the folding sequence. I can't say I fully understand it, but it seems that even when work units have exactly the same atom placement and initial forces, the distribution of those forces can differ without changing the initial structure. Don't ask me, ask Dan.

Then you have the Gen, which refers not to genetics but to generation. The first Gen is the work unit as they cooked it up. When they get it back, they can formulate a new work unit that starts from the end position of the returned one, extending the simulated timespan beyond what an individual work unit represents on its own.
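The project - run - clone - gen naming can be sketched as a small data structure. This is only an illustration of the idea described above, not how the FAH servers actually store work units: the class `WorkUnitId`, the helper `trajectory_length_ns`, the project number, and the assumption that Gens are counted from 0 are all hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class WorkUnitId:
    project: int  # which protein/study the unit belongs to
    run: int      # initial state (starting structure)
    clone: int    # copy of the run with its own 'temperature' assignment
    gen: int      # position along the trajectory (assumed to start at 0)

    def next_gen(self) -> "WorkUnitId":
        """The follow-up unit, built from the returned result of this one."""
        return replace(self, gen=self.gen + 1)

def trajectory_length_ns(last_returned: WorkUnitId, ns_per_wu: float) -> float:
    """Simulated time covered by one run/clone once `last_returned` is back."""
    return (last_returned.gen + 1) * ns_per_wu

first = WorkUnitId(project=1234, run=0, clone=5, gen=0)  # hypothetical numbers
second = first.next_gen()
print(second, trajectory_length_ns(second, ns_per_wu=50.0))
```

Each Gen can only be created after the previous one has been returned, which is why a single run/clone grows serially while different runs and clones proceed in parallel.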

The reason I think there is a distinction to be made is that the HPC clients enable longer total trajectories. Because of this, PG no longer has to rely on thousands of new Runs/Clones to 'get lucky' and find a work unit that lets them pinpoint a transition state for that protein between its folded and unfolded states. Instead, they can increase the length of each single Run/Clone with more Generations (each of which already represents a longer time period), and thus greatly increase the chance of finding what they're looking for: the transition state/moment.

It also enables them to look for more complicated behaviour: a protein that folds not once but several times to reach its final state. With the old approach this is not feasible, given how small the chance is of witnessing it when they have to rely on many Runs/Clones rather than on the increased length that comes from more Generations of the same Run/Clone.
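To put rough numbers on this argument, here is a sketch comparing many short trajectories against one long trajectory with the same total simulated time. It rests on a strong simplifying assumption that is mine, not the project's: the protein needs two sequential transitions, each an independent exponential step with the same 50,000 ns mean time. The function names are made up for illustration.

```python
import math

def p_two_steps(t_ns: float, tau_ns: float) -> float:
    """Chance that two sequential exponential steps (each with mean tau_ns)
    both complete within t_ns. This is the Erlang-2 cumulative distribution."""
    x = t_ns / tau_ns
    return 1.0 - math.exp(-x) * (1.0 + x)

def p_any(p_single: float, n_trajectories: int) -> float:
    """Chance that at least one of n independent trajectories shows the event."""
    return 1.0 - (1.0 - p_single) ** n_trajectories

TAU_NS = 50_000.0  # assumed mean time per transition

# Same total simulated time (5,000 ns), split two different ways.
many_short = p_any(p_two_steps(50.0, TAU_NS), 100)  # 100 trajectories of 50 ns
one_long = p_two_steps(5_000.0, TAU_NS)             # 1 trajectory of 5,000 ns

print(f"100 x 50 ns : {many_short:.6f}")
print(f"1 x 5000 ns : {one_long:.6f}")
```

Under these assumptions the single long trajectory is far more likely to contain both transitions, because for short windows the two-step probability shrinks with the square of the window length.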

I might be wrong in my explanation, if so I would enjoy being corrected and pointed to what I misinterpret.

Post by **bruce** » Tue Nov 25, 2008 8:26 am

MtM wrote:I might be wrong in my explanation, if so I would enjoy being corrected and pointed to what I misinterpret.

That's pretty much the way I understand it, although I'll add a couple of comments.

The High Performance clients are able to do their processing more rapidly. That may mean that a single WU represents a longer slice of simulated time, but it may also mean that it is able to simulate a larger protein. As the number of atoms increases, the computer time increases, and that becomes a trade-off vs. simulated time.

In general, there is a "reasonable" range of computer time for a WU. It makes no sense to create WUs that will upload results and download new WUs too frequently but it also makes no sense to create WUs that will run for too long a time without uploading the result and getting a new assignment.

Post by **MtM** » Tue Nov 25, 2008 12:31 pm

bruce wrote:In general, there is a "reasonable" range of computer time for a WU. It makes no sense to create WUs that will upload results and download new WUs too frequently but it also makes no sense to create WUs that will run for too long a time without uploading the result and getting a new assignment.

What is the 'reasonable' range based on, though? Is it based on an assumption of how many exponentials are involved in the fold of a certain protein (as I understand it, longer simulations are the only way to witness several of those occurring), or is it based solely on computational power? In short, is the deciding factor the technological limit on computation, or the assumed number of transitional states attributed to that protein?

I'm not sure how they arrive at an expectation of how many transitional states are possible within a certain protein, so my question comes down to this: are they able to predict them before creating a Run/Clone, based on folding events witnessed in lab experiments, or is it guesswork?

Post by **codysluder** » Sat Nov 29, 2008 9:39 pm

MtM wrote:
bruce wrote:In general, there is a "reasonable" range of computer time for a WU. It makes no sense to create WUs that will upload results and download new WUs too frequently but it also makes no sense to create WUs that will run for too long a time without uploading the result and getting a new assignment.
What is the 'reasonable' range based on, though? Is it based on an assumption of how many exponentials are involved in the fold of a certain protein (as I understand it, longer simulations are the only way to witness several of those occurring), or is it based solely on computational power? In short, is the deciding factor the technological limit on computation, or the assumed number of transitional states attributed to that protein?

I think the "reasonable" WU is based on an assumed average computer speed for that client, with the idea of a certain frequency range for uploading/downloading. The probability of transitions is related to the total number of WUs, which depends on both the number of Runs/Clones chosen for the project and how many months or years the project can run.

Post by **MtM** » Sun Nov 30, 2008 11:04 am

codysluder wrote:I think the "reasonable" WU is based on an assumed average computer speed for that client, with the idea of a certain frequency range for uploading/downloading. The probability of transitions is related to the total number of WU which looks at both the number of run/clones chosen for the project vs. how many months/years a project can run.

If you read my previous posts (did you?) you would know the probability of multiple transitions is not based on the total number of WUs.

Post by **codysluder** » Mon Dec 01, 2008 8:54 pm

MtM wrote:If you read my previous posts (did you?) you would know the probability of multiple transitions is not based on the total number of WUs.

The number of WUs is related to the number of ns simulated in a single WU. One might simulate 50 ns of a simple protein in the same "sized" WU as 10 ns of a complex protein.
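The 50 ns versus 10 ns trade-off can be sketched with a toy cost model. The big assumption here, which is mine and only a rough approximation of real molecular dynamics, is that the computing cost of one time step grows linearly with the number of atoms. The atom counts, the 2 fs time step, and the function name are hypothetical.

```python
def simulated_ns(budget_atom_steps: float, n_atoms: int,
                 timestep_fs: float = 2.0) -> float:
    """Simulated time that fits in a fixed compute budget, assuming the cost
    of one time step is proportional to the number of atoms."""
    steps = budget_atom_steps / n_atoms
    return steps * timestep_fs * 1e-6  # 1 ns = 1,000,000 fs

# One fixed "WU-sized" compute budget, expressed in atom-steps (hypothetical).
BUDGET = 2.5e11

simple_ns = simulated_ns(BUDGET, n_atoms=10_000)   # small, simple protein
complex_ns = simulated_ns(BUDGET, n_atoms=50_000)  # larger, complex protein

print(f"simple protein : {simple_ns:.0f} ns per WU")
print(f"complex protein: {complex_ns:.0f} ns per WU")
```

With these made-up numbers, five times as many atoms leaves one fifth of the simulated time in the same amount of computing.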

Post by **MtM** » Mon Dec 01, 2008 11:48 pm

That's very true, but my understanding was that the HPC WUs are not that much 'bigger' but do have much longer simulation times.

Post by **spazzychalk** » Wed Dec 03, 2008 5:29 am

So if we need to run a project through its course, why are we jumping around from project to project so much? If we all focused on one project until completion (obviously that's up to the servers, not us), wouldn't we get a lot more done?

Also, I'm sure this is going to sound very stupid, but I just can't understand how you can write the code for the WU, write the algorithm that runs the code in the client, and not know the outcome when you're writing it?

Post by **bruce** » Wed Dec 03, 2008 9:38 am

spazzychalk wrote:So if we need to run a project through its course, why are we jumping around from project to project so much? If we all focused on one project until completion (obviously that's up to the servers, not us), wouldn't we get a lot more done?

Each project comes from a specific researcher and resides on a single server.

Each researcher has particular areas of interest so there are various types of studies going on simultaneously.

Each server has some limitations and can only manage work to be distributed to a certain number of clients simultaneously and other servers can distribute other projects to other clients.

spazzychalk wrote:Also, I'm sure this is going to sound very stupid, but I just can't understand how you can write the code for the WU, write the algorithm that runs the code in the client, and not know the outcome when you're writing it?

The algorithm describes only the laws of physics, not the protein or its shape. The input data describes the initial position and velocity of each of the atoms in the protein. As time passes, each atom moves in accordance with the laws of physics, but there are so many interactions that the positions at future times are not known until they are computed.

Suppose four people play a game of golf. The characteristics of the clubs and the course and the balls and the people are all known, but you still don't know who will win that day. (And that's only considering four golf balls. What if there were thousands or hundreds of thousands of golf balls all in flight at the same time?)
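The same point can be shown with a toy simulation. The sketch below is not FAH's code or force field; it assumes three particles in 2D joined by simple springs, and steps them forward with the standard velocity Verlet scheme. The code contains only the rule for forces and motion; the final positions come out of the stepping, not out of anything written into the program.

```python
import math

def forces(pos, k=1.0, rest=1.0):
    """Toy 'laws of physics': a spring between every pair of particles.
    Assumes no two particles ever sit at exactly the same point."""
    f = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r = math.hypot(dx, dy)
            mag = k * (r - rest)  # stretched spring pulls the pair together
            fx, fy = mag * dx / r, mag * dy / r
            f[i][0] += fx
            f[i][1] += fy
            f[j][0] -= fx
            f[j][1] -= fy
    return f

def simulate(pos, vel, steps, dt=0.01, mass=1.0):
    """Velocity Verlet: half kick, drift, recompute forces, half kick."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    f = forces(pos)
    for _ in range(steps):
        for i in range(len(pos)):
            for d in range(2):
                vel[i][d] += 0.5 * dt * f[i][d] / mass
                pos[i][d] += dt * vel[i][d]
        f = forces(pos)
        for i in range(len(pos)):
            for d in range(2):
                vel[i][d] += 0.5 * dt * f[i][d] / mass
    return pos, vel

# The "input data": initial position and velocity of each particle.
start_pos = [(0.0, 0.0), (1.5, 0.0), (0.0, 1.2)]
start_vel = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

final_pos, final_vel = simulate(start_pos, start_vel, steps=1000)
print(final_pos)
```

With three particles the motion is already awkward to predict by hand; with many thousands of atoms, computing it step by step is the only practical way to find out where they end up.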
