WU: Subsetting from the Superset?

Post by **stevedking** » Sat Dec 29, 2012 12:33 pm

Can a WU be divided up into smaller WU subsets, such that the subsets are sent off to other CPUs for parallelization? Has anyone investigated this option?

Post by **mmonnin** » Sat Dec 29, 2012 1:28 pm

No. Folding is pretty serial. You need to know where the atoms in the protein move next before you can calculate where they will move after that.

Stanford essentially breaks it down for us to the WU level and even that is a very short time frame.

Post by **Joe_H** » Sat Dec 29, 2012 5:56 pm

If you are asking whether a WU could be processed on a cluster, in theory it can be done, as the Gromacs code supports use of MPI. In practice the PG does not use that method with the current SMP cores, because the amount of data that needs to be communicated between subsets is so high. The delays in communicating between CPUs over relatively slow links such as gigabit ethernet would slow down processing greatly. The current SMP processing parallels the WU processing over inter-thread communication using the fast core to core paths available within a CPU chip or between them on a multi-processor logic board.

Post by **bruce** » Sat Dec 29, 2012 9:15 pm

mmonnin wrote:No. Folding is pretty serial. You need to know where the atoms in the protein move next before you can calculate where they will move after that.

Stanford essentially breaks it down for us to the WU level and even that is a very short time frame.

This is both true and not quite true.

Folding a specific protein from a specific starting shape is divided into Runs, Clones, and Gens. The work is divided up into Runs and Clones which ARE run in parallel. Each Run-Clone is divided up into Gens which are strictly serial.

Gen (N+1) cannot be started until Gen (N) has been returned. The QRB is specifically designed to encourage everyone to minimize the total time from Gen 0 through whatever the final Gen number is.

Various people will be working on one Run,Clone or a different Run,Clone in parallel, since the current Gen of one has no dependencies on the current Gen of another.
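As a rough illustration of that structure (a sketch only, not how the FAH servers actually assign work), each Run,Clone pair can be treated as an independent job while the Gens inside it form a chain. The names `compute_gen`, `run_trajectory`, and `run_project` are made up for this example, and the "state" is just a list standing in for real simulation data.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_gen(state, gen):
    """Hypothetical stand-in for one WU: needs the previous Gen's result."""
    return state + [gen]


def run_trajectory(run, clone, n_gens):
    """Process one Run,Clone. The Gens are strictly serial."""
    state = []
    for gen in range(n_gens):
        # Gen N+1 cannot start until Gen N has produced its result.
        state = compute_gen(state, gen)
    return (run, clone), state


def run_project(n_runs, n_clones, n_gens):
    """Process every Run,Clone pair. The pairs are independent of each other."""
    pairs = [(r, c) for r in range(n_runs) for c in range(n_clones)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda rc: run_trajectory(rc[0], rc[1], n_gens), pairs)
        return dict(results)


if __name__ == "__main__":
    finished = run_project(n_runs=2, n_clones=3, n_gens=4)
    print(len(finished), "independent trajectories completed")
```

The parallelism lives in the outer loop over Run,Clone pairs; nothing in this sketch tries to speed up the inner loop over Gens, which matches the dependency described above.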

A single R,C,G can be broken up but, as Joe has said, it would be totally unsuitable for a cluster. It is broken up into parallel threads when running on SMP or on a GPU. Those threads are restricted to a single device because they have to be constantly resynchronized. Even using memory-to-memory data exchanges, any processor asymmetry (including something as simple as an interruption by another process) is "expensive", since the other threads have to wait until the next synchronization takes place.
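To put a number on why one delayed thread is "expensive", here is a toy model, assuming every thread waits at a barrier after each step. The function names and the timings are invented for illustration and are not measurements from any FAH core.

```python
def wall_time(per_step_times):
    """Elapsed time when all threads synchronize after every step.

    per_step_times[s][t] is the time thread t needs for step s.
    Each step takes as long as its slowest thread.
    """
    return sum(max(step) for step in per_step_times)


def idle_time(per_step_times):
    """Total time threads spend waiting at the barrier for the slowest one."""
    return sum(max(step) * len(step) - sum(step) for step in per_step_times)


if __name__ == "__main__":
    # Four threads, three steps. In step 2 one thread is interrupted
    # and needs 1.5 time units instead of 1.0 (hypothetical values).
    steps = [
        [1.0, 1.0, 1.0, 1.0],
        [1.0, 1.5, 1.0, 1.0],
        [1.0, 1.0, 1.0, 1.0],
    ]
    print("wall time:", wall_time(steps))
    print("idle time across threads:", idle_time(steps))
```

In this model a 0.5 unit interruption on one thread adds 0.5 units to the elapsed time of the whole job and leaves the other three threads idle for 0.5 units each.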

Post by **Jesse_V** » Sat Dec 29, 2012 9:28 pm

What Bruce said is confirmed by the Simulation FAQ: http://folding.stanford.edu/English/FAQ-Simulation and the first couple sections of the Wikipedia article in my signature.

Post by **stevedking** » Sun Dec 30, 2012 7:34 am

mmonnin wrote:No. Folding is pretty serial. You need to know where the atoms in the protein move next before you can calculate where they will move after that.

Stanford essentially breaks it down for us to the WU level and even that is a very short time frame.

What exactly do you mean by "You need to know where the atoms in the protein move next before you can calculate where they will move after that"? Do you mean the (X,Y,Z) coordinates of said atoms? Please clarify.

Post by **bruce** » Sun Dec 30, 2012 7:53 am

Yes, XYZ coordinates.

If you know the current XYZ coordinates and the velocity vector and the forces, you can predict where all the atoms will be a short time later. This process can be repeated for 500 000 small steps (or some other number) to define a trajectory for every atom.
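A minimal sketch of that loop, assuming a velocity Verlet integrator and a toy spring force pulling each atom toward the origin. The real cores use Gromacs with a proper force field; `forces`, `step`, and `run` here are illustrative names only.

```python
def forces(positions, k=1.0):
    """Toy force: a spring pulls each atom toward the origin (not a real force field)."""
    return [[-k * c for c in p] for p in positions]


def step(positions, velocities, dt, mass=1.0):
    """Advance every atom by one small time step (velocity Verlet)."""
    f_old = forces(positions)
    new_pos = [
        [x + v * dt + 0.5 * (f / mass) * dt * dt for x, v, f in zip(p, vel, fo)]
        for p, vel, fo in zip(positions, velocities, f_old)
    ]
    # The new forces can only be computed once the new positions are known.
    f_new = forces(new_pos)
    new_vel = [
        [v + 0.5 * (a + b) / mass * dt for v, a, b in zip(vel, fo, fn)]
        for vel, fo, fn in zip(velocities, f_old, f_new)
    ]
    return new_pos, new_vel


def run(positions, velocities, n_steps, dt=0.01):
    """Repeat the step many times to build a trajectory for every atom."""
    trajectory = [positions]
    for _ in range(n_steps):
        positions, velocities = step(positions, velocities, dt)
        trajectory.append(positions)
    return trajectory


if __name__ == "__main__":
    traj = run([[1.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]], n_steps=100)
    print("x after 100 steps:", traj[-1][0][0])
```

Each pass through the loop consumes the output of the previous pass, which is why the steps cannot be handed out to different machines and computed at the same time.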

If you divide the atoms up into two groups and run each half on a different node, then after each half has moved by one step, you have to retrieve the new positions of the other half before you can calculate the revised forces acting between an atom in this half and an atom in the other half. (The distance each atom has moved changes the forces.)
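Here is a toy 1-D version of that split, assuming a made-up spring force between every pair of atoms and a simple integrator. Each "node" keeps a copy of the other half's positions, and `exchange_every` (a hypothetical knob, not a real setting) controls how often that copy is refreshed.

```python
def pair_force(xi, xj, k=1.0):
    """Toy spring force on atom i due to atom j (not a real force field)."""
    return k * (xj - xi)


def step_node(local, remote_copy, velocities, dt):
    """Move one node's atoms by one step using its copy of the other half."""
    forces = []
    for i, xi in enumerate(local):
        f = sum(pair_force(xi, xj) for j, xj in enumerate(local) if j != i)
        f += sum(pair_force(xi, xj) for xj in remote_copy)
        forces.append(f)
    new_v = [v + f * dt for v, f in zip(velocities, forces)]
    new_x = [x + v * dt for x, v in zip(local, new_v)]
    return new_x, new_v


def run_two_nodes(xa, xb, n_steps, dt=0.01, exchange_every=1):
    """Simulate two nodes; return final positions and the number of messages sent."""
    va, vb = [0.0] * len(xa), [0.0] * len(xb)
    copy_of_a, copy_of_b = list(xa), list(xb)
    messages = 0
    for s in range(n_steps):
        if s % exchange_every == 0:
            # Each node sends its current positions to the other node.
            copy_of_a, copy_of_b = list(xa), list(xb)
            messages += 2
        xa, va = step_node(xa, copy_of_b, va, dt)
        xb, vb = step_node(xb, copy_of_a, vb, dt)
    return xa + xb, messages


if __name__ == "__main__":
    exact, sent = run_two_nodes([0.0, 1.0], [2.0, 3.0], n_steps=100)
    stale, sent_stale = run_two_nodes([0.0, 1.0], [2.0, 3.0], n_steps=100, exchange_every=50)
    print(sent, "messages, positions:", exact)
    print(sent_stale, "messages, positions:", stale)
```

Exchanging every step costs two messages per step but keeps the forces consistent; skipping exchanges cuts the traffic and gives a different, wrong trajectory, so the communication cannot simply be dropped.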

Post by **Napoleon** » Sun Dec 30, 2012 12:08 pm

Joe_H wrote:The current SMP processing parallels the WU processing over inter-thread communication using the fast core to core paths available within a CPU chip or between them on a multi-processor logic board.

I suppose even interprocess communication over MPICH within a single computer (essentially TCP/IP transfers through localhost interface, unless I'm mistaken?) was too slow compared to interthread communication inside a single process where all the threads share the same virtual memory space and have full access to it. After all, FAH did have a working MPICH SMP client (now defunct).

It boggles my mind: localhost interface is "kinda slow" as an interconnect...

Post by **Joe_H** » Sun Dec 30, 2012 6:55 pm

I don't recall the performance improvement that was seen going from the MPI based A1 and A2 cores to the inter-thread based A3 core, there might be some postings here or in the blog. I remember from running those older SMP cores that the CPU utilization on each core tended to be about 90% on average on my system. My understanding was that the overhead was worse for higher core counts. The localhost communication was fast enough to be usable, inter-thread is even faster.

Post by **Jesse_V** » Sun Dec 30, 2012 7:35 pm

Joe_H wrote:I don't recall the performance improvement that was seen going from the MPI based A1 and A2 cores to the inter-thread based A3 core, there might be some postings here or in the blog. I remember from running those older SMP cores that the CPU utilization on each core tended to be about 90% on average on my system. My understanding was that the overhead was worse for higher core counts. The localhost communication was fast enough to be usable, inter-thread is even faster.

Another big reason was that MPI-based GROMACS was a nightmare to run in Windows.... I'm sort of glad I joined late enough in the game to jump right into threads.

Post by **Joe_H** » Sun Dec 30, 2012 7:56 pm

Jesse_V wrote:Another big reason was that MPI-based GROMACS was a nightmare to run in Windows.... I'm sort of glad I joined late enough in the game to jump right into threads.

Well, there is that too. But mostly folding on OS X machines myself, I and the Linux folders did not have that problem.

Post by **Jesse_V** » Sun Dec 30, 2012 8:21 pm

Joe_H wrote:
Jesse_V wrote:Another big reason was that MPI-based GROMACS was a nightmare to run in Windows.... I'm sort of glad I joined late enough in the game to jump right into threads.
Well, there is that too. But mostly folding on OS X machines myself, I and the Linux folders did not have that problem.

Of course not: it's Linux and OS-X.
