WU: Subsetting from the Superset?
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 19
- Joined: Thu Dec 20, 2012 3:01 am
WU: Subsetting from the Superset?
Can a WU be divided up into smaller WU_subsets that are sent off to other CPUs for parallelization? Has anyone investigated this option?
Re: WU: Subsetting from the Superset?
No. Folding is pretty serial. You need to know where the atoms in the protein move next before you can calculate where they will move after that.
Stanford essentially breaks it down for us to the WU level and even that is a very short time frame.
-
- Site Admin
- Posts: 7927
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: WU: Subsetting from the Superset?
If you are asking whether a WU could be processed on a cluster: in theory it can be done, as the Gromacs code supports the use of MPI. In practice the PG does not use that method with the current SMP cores, because the amount of data that needs to be communicated between subsets is so high. The delays in communicating between CPUs over relatively slow links such as gigabit ethernet would slow down processing greatly. The current SMP processing parallelizes the WU processing over inter-thread communication, using the fast core-to-core paths available within a CPU chip or between chips on a multi-processor logic board.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Re: WU: Subsetting from the Superset?
mmonnin wrote: "No. Folding is pretty serial. You need to know where the atoms in the protein move next before you can calculate where they will move after that. Stanford essentially breaks it down for us to the WU level and even that is a very short time frame."
This is both true and not quite true.
Folding a specific protein from a specific starting shape is divided into Runs, Clones, and Gens. The work is divided up into Runs and Clones which ARE run in parallel. Each Run-Clone is divided up into Gens which are strictly serial.
Gen (N+1) cannot be started until Gen (N) has been returned. The QRB is specifically designed to encourage everyone to minimize the total time from Gen 0 through whatever the final Gen number is.
Various people will be working on one Run,Clone or a different Run,Clone in parallel, since the current Gen of one has no dependencies on the current Gen of another.
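To make the dependency rule concrete, here is a minimal Python sketch. It is only an illustration: the function name `assignable`, the `completed` table, and the single `final_gen` limit are assumptions invented for this example, not how the FAH assignment servers actually work.

```python
def assignable(completed, runs, clones, final_gen):
    """Return the (run, clone, gen) work units that could be handed out now.

    completed maps (run, clone) -> highest Gen already returned.
    A Run,Clone pair with no entry has not returned anything yet.
    """
    ready = []
    for r in range(runs):
        for c in range(clones):
            # Gens are strictly serial: only the next one may start.
            next_gen = completed.get((r, c), -1) + 1
            if next_gen <= final_gen:
                ready.append((r, c, next_gen))
    return ready


# Hypothetical project: 1 Run, 3 Clones, Gens 0..2.
returned = {(0, 0): 2, (0, 1): 0}
print(assignable(returned, runs=1, clones=3, final_gen=2))
```

Clone 0 has already returned its final Gen, so nothing is left for it. Clones 1 and 2 each have exactly one Gen available, and those two can be worked on by different people at the same time.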
A single R,C,G can be broken up but, as Joe has said, it would be totally unsuitable for a cluster. It is broken up into parallel threads when running on SMP or when running on a GPU. Those threads are restricted to a single device because they have to be constantly resynchronized. Even using memory-to-memory data exchanges, any processor asymmetry (including something simple like an interruption by another process) is "expensive" since the other threads immediately have to wait for the next synchronization to take place.
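A rough way to see why an interruption is "expensive": if all threads have to meet at a synchronization point after every step, each step takes as long as its slowest thread. The sketch below is a toy cost model with made-up numbers (4 threads, 1000 steps, times in microseconds); the function names are hypothetical and nothing here is measured from a real FAH core.

```python
def wall_time(per_step_times):
    """Total elapsed time when every step ends with a synchronization point.

    per_step_times[s][t] is the time thread t needs for step s.
    """
    return sum(max(step) for step in per_step_times)


def idle_time(per_step_times):
    """Thread-time spent waiting for the slowest thread of each step."""
    return sum(max(step) - t for step in per_step_times for t in step)


# Assumed workload: 4 threads, 1000 steps, 1000 microseconds per step.
times = [[1000] * 4 for _ in range(1000)]
baseline = wall_time(times)

# Another process interrupts thread 2 for 5000 microseconds on 10 steps.
for s in range(0, 1000, 100):
    times[s][2] += 5000

print(baseline, wall_time(times), idle_time(times))
```

In this model only one thread is ever interrupted, but the other three sit idle for the full length of every interruption, so the lost thread-time is three times the interruption itself.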
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Site Moderator
- Posts: 2850
- Joined: Mon Jul 18, 2011 4:44 am
- Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
- Location: Western Washington
Re: WU: Subsetting from the Superset?
What Bruce said is confirmed by the Simulation FAQ: http://folding.stanford.edu/English/FAQ-Simulation and the first couple sections of the Wikipedia article in my signature.
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
-
- Posts: 19
- Joined: Thu Dec 20, 2012 3:01 am
Re: WU: Subsetting from the Superset?
mmonnin wrote: "No. Folding is pretty serial. You need to know where the atoms in the protein move next before you can calculate where they will move after that. Stanford essentially breaks it down for us to the WU level and even that is a very short time frame."
What exactly do you mean by "You need to know where the atoms in the protein move next before you can calculate where they will move after that"? Do you mean the (X,Y,Z) coordinates of said atoms? Please clarify.
Re: WU: Subsetting from the Superset?
Yes, XYZ coordinates.
If you know the current XYZ coordinates and the velocity vector and the forces, you can predict where all the atoms will be a short time later. This process can be repeated for 500 000 small steps (or some other number) to define a trajectory for every atom.
If you divide the atoms up into two groups and run each half on a different node, then after each half moves by one step, you have to retrieve the other half's new positions before you can calculate the revised forces acting between an atom in this half and an atom in the other half. (Because the distance that each has moved changes the forces.)
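Here is a small Python sketch of that argument. It is a toy, not FAH or Gromacs code: the atoms live on a line, every pair is joined by a spring, and the step size, spring constant, and function names are all assumptions chosen for illustration. One version recomputes forces from everybody's current positions each step; the other splits the atoms into two halves that never hear about each other's moves.

```python
# Toy 1-D model: every atom is tied to every other atom by a spring.
# Hypothetical illustration only; this is not FAH or Gromacs code.
DT = 0.01  # time step (arbitrary units, unit mass assumed)


def forces(pos, k=1.0):
    """Force on each atom from all the others, given current positions."""
    n = len(pos)
    return [sum(-k * (pos[i] - pos[j]) for j in range(n) if j != i)
            for i in range(n)]


def step(pos, vel, f):
    """Advance one small step: forces update velocities, then positions."""
    new_vel = [v + DT * fi for v, fi in zip(vel, f)]
    new_pos = [p + DT * v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


def run_exchanging(pos, vel, n_steps):
    """Forces are recomputed from all current positions at every step."""
    for _ in range(n_steps):
        pos, vel = step(pos, vel, forces(pos))
    return pos, vel


def run_without_exchange(pos, vel, n_steps):
    """Two 'nodes' each move their own half and only ever see the
    other half's starting positions."""
    half = len(pos) // 2
    stale = list(pos)  # the only copy of the other half a node ever gets
    for _ in range(n_steps):
        view_a = pos[:half] + stale[half:]  # what node A believes
        view_b = stale[:half] + pos[half:]  # what node B believes
        f = forces(view_a)[:half] + forces(view_b)[half:]
        pos, vel = step(pos, vel, f)
    return pos, vel


start = [0.0, 1.0, 2.5, 4.0]
at_rest = [0.0] * len(start)
good, _ = run_exchanging(start, at_rest, 100)
bad, _ = run_without_exchange(start, at_rest, 100)
print(max(abs(g - b) for g, b in zip(good, bad)))
```

The printed number is the largest disagreement between the two trajectories after 100 steps. Keeping the halves in agreement means exchanging positions at every single step, which is why the speed of the link between the nodes matters so much.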
-
- Posts: 887
- Joined: Wed May 26, 2010 2:31 pm
- Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
- Location: Finland
Re: WU: Subsetting from the Superset?
Joe_H wrote: "The current SMP processing parallelizes the WU processing over inter-thread communication using the fast core to core paths available within a CPU chip or between them on a multi-processor logic board."
I suppose even interprocess communication over MPICH within a single computer (essentially TCP/IP transfers through the localhost interface, unless I'm mistaken?) was too slow compared to inter-thread communication inside a single process, where all the threads share the same virtual memory space and have full access to it. After all, FAH did have a working MPICH SMP client (now defunct).
It boggles my mind: localhost interface is "kinda slow" as an interconnect...
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
-
- Site Admin
- Posts: 7927
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: WU: Subsetting from the Superset?
I don't recall the performance improvement that was seen going from the MPI-based A1 and A2 cores to the inter-thread based A3 core; there might be some postings here or in the blog. I remember from running those older SMP cores that the CPU utilization on each core tended to be about 90% on average on my system. My understanding was that the overhead was worse for higher core counts. The localhost communication was fast enough to be usable; inter-thread is even faster.
-
- Site Moderator
- Posts: 2850
- Joined: Mon Jul 18, 2011 4:44 am
- Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
- Location: Western Washington
Re: WU: Subsetting from the Superset?
Joe_H wrote: "I don't recall the performance improvement that was seen going from the MPI-based A1 and A2 cores to the inter-thread based A3 core; there might be some postings here or in the blog. I remember from running those older SMP cores that the CPU utilization on each core tended to be about 90% on average on my system. My understanding was that the overhead was worse for higher core counts. The localhost communication was fast enough to be usable; inter-thread is even faster."
Another big reason was that MPI-based GROMACS was a nightmare to run in Windows.... I'm sort of glad I joined late enough in the game to jump right into threads.
-
- Site Admin
- Posts: 7927
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
- Location: W. MA
Re: WU: Subsetting from the Superset?
Jesse_V wrote: "Another big reason was that MPI-based GROMACS was a nightmare to run in Windows.... I'm sort of glad I joined late enough in the game to jump right into threads."
Well, there is that too. But since I mostly fold on OS X machines, the Linux folders and I did not have that problem.
-
- Site Moderator
- Posts: 2850
- Joined: Mon Jul 18, 2011 4:44 am
- Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
- Location: Western Washington
Re: WU: Subsetting from the Superset?
Jesse_V wrote: "Another big reason was that MPI-based GROMACS was a nightmare to run in Windows.... I'm sort of glad I joined late enough in the game to jump right into threads."
Joe_H wrote: "Well, there is that too. But since I mostly fold on OS X machines, the Linux folders and I did not have that problem."
Of course not: it's Linux and OS X.