...from Quanta magazine...
A Digital Locksmith Has Decoded Biology’s Molecular Keys
Neural networks have been taught to quickly read the surfaces of proteins, molecules critical to many biological processes. The advance is already being used to create defenses against the virus responsible for COVID-19.
Geometric deep learning can read complex surfaces of proteins, allowing researchers to predict how the biological molecules interact with one another.
John Pavlus
Contributing Correspondent
June 3, 2020
The computational biologist Bruno Correia used to have a rule in his lab: No machine learning allowed. He didn’t consider it real science. Now Correia has used it to detect potential interactions between proteins — the complex folded molecules responsible for many biological processes — 40,000 times faster than conventional methods. The journal Nature Methods featured his system on its cover in February 2020. Correia said of his early reluctance to embrace machine learning, “I was wrong, and I’m glad I was wrong.”
What changed his mind? Geometric deep learning: an emerging subfield of artificial intelligence that can learn patterns on curved surfaces.
Proteins interact by fitting their bumpy, irregular shapes together like three-dimensional puzzle pieces. Researchers have spent decades trying to figure out how they do so. The well-known protein folding problem, which has challenged scientists since the mid-20th century, attempts to understand protein interaction by decoding the link between a protein’s constituent amino acids and its final 3D shape. In 1999, IBM began developing its line of Blue Gene supercomputers to tackle the folding problem; 20 years later, DeepMind applied state-of-the-art deep learning algorithms to it.
Correia’s system, called MaSIF (short for molecular surface interaction fingerprinting), avoids the inherent complexity of a protein’s 3D shape by ignoring the molecules’ internal structure. Instead, the system scans the protein’s 2D surface for what the researchers call interaction fingerprints: features learned by a neural network that indicate that another protein could bind there. “The idea [is that when] any two molecules come together, what they’re essentially presenting to one another is that surface. So that’s all you need,” said Mohammed AlQuraishi, a protein researcher at Harvard Medical School who also uses deep learning. “It’s very, very innovative.”
MaSIF’s surface-focused framework for predicting protein interactions could help accelerate so-called de novo protein design, which tries to synthesize useful proteins from scratch rather than relying on the naturally occurring variety. But it could also be used for basic biology, said Michael Bronstein, a geometric deep learning expert at Imperial College London who helped develop the system. “How does cancer affect protein properties?” he said. “You can ask whether mutations as a result of cancer destroy something in the protein that makes them work in a different way, by not binding to what they are supposed to. [MaSIF] could answer fundamental questions.”
Skin Deep
If you want to understand how deep learning can create protein fingerprints, Bronstein suggests looking at digital cameras from the early 2000s. Those models had face detection algorithms that did a relatively simple job. “You just need to detect that there is a face” — eyes, a nose, a mouth — “regardless of whether it has a long nose or a short nose, fat lips or thin lips,” he explained.
Modern cameras are more versatile. They can identify a particular person, allowing you to quickly search through your photo library to find all the photos they’re in.
This advance was made possible by deep neural networks, which gave computers a way to learn an individual’s subtle features from training data. The process involves feeding many instances of a particular face to the network and labeling them all as the same person. You don’t have to tell the computer in advance which exact mixture of attributes — green eyes, wide-set eyebrows, black hair — somehow adds up to your own face rather than another person’s. Instead, with enough properly labeled examples, the network learns the distinction itself.
MaSIF does the same thing for proteins. Previous approaches to interaction fingerprinting were like the basic face detection algorithms. They required researchers to define certain geometric patterns in advance — say, a bumpy patch on the surface of a protein with a specific shape and size — and then search for matches. MaSIF, by contrast, starts with a handful of basic surface features known to be associated with protein interactions: for instance, the surface’s physical curvature (into a knob or pocket), its electrical charge, and whether it repels or attracts water. Then, during training, the network learns how to combine these features into fingerprints that detect different higher-level patterns.
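To make the idea of combining basic surface features into a fingerprint a little more concrete, here is a toy sketch in Python. It is not MaSIF's actual network: the `SurfacePoint` class, the `patch_fingerprint` function, the sign conventions, and the weights are all made up for illustration. In a real system the weights would be learned during training rather than written by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class SurfacePoint:
    """One sampled point on a protein surface (hypothetical representation)."""
    curvature: float   # assumed convention: > 0 for a knob, < 0 for a pocket
    charge: float      # local electrical charge
    hydropathy: float  # assumed convention: > 0 repels water, < 0 attracts it


def patch_fingerprint(points, weights, biases):
    """Average the three features over a patch, then mix them with one tanh layer.

    Each row of `weights` yields one fingerprint component, so the output
    has len(weights) entries.
    """
    n = len(points)
    mean = [
        sum(p.curvature for p in points) / n,
        sum(p.charge for p in points) / n,
        sum(p.hydropathy for p in points) / n,
    ]
    return [
        math.tanh(sum(w * x for w, x in zip(row, mean)) + b)
        for row, b in zip(weights, biases)
    ]


# Made-up patch and hand-picked weights, purely for illustration.
patch = [SurfacePoint(0.8, -0.2, 0.5), SurfacePoint(0.6, -0.4, 0.3)]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
biases = [0.0, 0.0]
print(patch_fingerprint(patch, weights, biases))
```

The point of the sketch is only that the inputs are a few simple measurements and the output is a short vector. Which combinations matter is left to training instead of being defined in advance.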
Until recently, this kind of machine learning couldn’t be used on the curved, irregular surfaces of proteins. The rise of geometric deep learning opened up the possibility. Correia credits Bronstein with bringing the method to his attention during a two-week collaboration at Bronstein’s home in February 2018. “It was totally him,” said Correia, who’s based at the École Polytechnique Fédérale de Lausanne. “Our handcrafted descriptors were going nowhere.”
One version of the system, called MaSIF-site, can examine the whole surface of a protein and predict where another protein is most likely to bind, an approach similar to painting a target on a curved canvas. “It’s what we like to call the one-body problem,” Correia said. “You can think about this as a way to understand where the functional sites on a particular protein are.” MaSIF-site performed roughly 25% better at this task than two leading site-interaction predictors.
Another version of the system, called MaSIF-search, tackles what Correia calls the many-to-many problem: Instead of predicting how one protein will fit together with one target molecule (as typically happens in docking simulations), the system compares the interaction fingerprints of many proteins to many others, looking for fits. (“In a cell you have 10,000 proteins, and many of them are bumping into each other all the time,” explained Correia.) On this task, MaSIF didn’t outperform a leading molecular-docking predictor; it found roughly half as many potential fits within a random set of 100 proteins. But the docking predictor needed nearly 100 days’ worth of computing time to perform its search. MaSIF took four minutes.
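As a rough illustration of the many-to-many comparison and of the timing figures quoted above, here is a small Python sketch. It assumes each protein is summarized by one fixed-length fingerprint vector and that a "candidate fit" is any pair whose vectors lie closer together than some threshold. Both are simplifying assumptions for this example, not a description of how MaSIF-search is actually implemented, and the names `candidate_fits` and `speedup` and the toy numbers are hypothetical.

```python
import math
from itertools import combinations


def speedup(slow_days, fast_minutes):
    """Ratio of two run times, after converting the slow one from days to minutes."""
    return slow_days * 24 * 60 / fast_minutes


def candidate_fits(fingerprints, threshold):
    """Compare every pair of fingerprints and keep the pairs closer than `threshold`.

    `fingerprints` maps a protein name to a tuple of floats.
    Returns (name_a, name_b, distance) tuples sorted by distance.
    """
    fits = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(fingerprints.items(), 2):
        distance = math.dist(vec_a, vec_b)  # Euclidean distance (Python 3.8+)
        if distance < threshold:
            fits.append((name_a, name_b, distance))
    return sorted(fits, key=lambda fit: fit[2])


# Made-up fingerprints, purely for illustration.
toy = {
    "protein_A": (0.9, 0.1, 0.4),
    "protein_B": (0.8, 0.2, 0.5),
    "protein_C": (0.1, 0.9, 0.9),
}
print(candidate_fits(toy, threshold=0.5))

# Taking "nearly 100 days" as 100 days against four minutes:
print(round(speedup(100, 4)))  # 36000
```

Comparing short vectors is cheap, which is why an all-against-all search over 100 proteins can finish in minutes. The ratio of roughly 36,000 from these rounded figures is in line with the "40,000 times faster" quoted at the top of the article.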
That massive speedup “opens interesting possibilities” for basic research, said Bronstein. After all, in the human body, proteins form functional networks comprising tens of thousands of interactions. “Constructing these graphs takes a lot of time,” Bronstein said. “With methods [like MaSIF], it may only be an approximation, but it allows you to at least build some rough version of these protein-to-protein networks for any organism.”
AlQuraishi noted that while MaSIF’s skin-deep approach to predicting protein interactions made sense, it wasn’t able to capture a phenomenon called induced fit: the way molecular surfaces change shape (and chemistry) when they get close to each other. In other words, the surfaces of two proteins may not exhibit complementary fingerprints until they’re already almost touching — a factor MaSIF will miss, since induced fit depends on the structure beneath a protein’s surface. “What evolution is probably optimizing for is precisely this induced fit,” said AlQuraishi. “What’s surprising about [MaSIF] is that even with this caveat, it still works pretty well.”
Incorporating induced fit and other surface dynamics into MaSIF is something Correia plans to explore. “To me it’s the last frontier of understanding [protein] function,” he said. “That’s probably how I’m going to be spending my next 10 years.” But at the moment he has other pressing business: using MaSIF to scan the spike-shaped proteins that stud the surface of SARS-CoV-2, the virus that causes COVID-19. “We are trying to see what fingerprints are in that virus,” he said. “It does seem like the virus has some places where we could try to attack it, besides the ones that we already knew.” Correia is already using this information about SARS-CoV-2 to synthesize antiviral proteins from scratch; he hopes to publish results this year. “If we could design new proteins based on the surface fingerprints of the viral protein in order to inhibit the way the virus invades host cells, that would be pretty exciting,” he said. “That’s what gets me out of bed.”
What might this mean to F@H...?
-
- Posts: 44
- Joined: Thu Feb 14, 2008 11:54 pm
- Hardware configuration: [img]http://folding.extremeoverclocking.com/sigs/sigimage.php?u=296154[/img]
- Location: Romeo, Michigan
- Contact:
What might this mean to F@H...?
1 x i5 - stock GTX 1070 ti + RTX 3070
1 x e3550 - stock 1 x GTX 750 + 1 x GTX 950
...cogito ergo complicare
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
- Contact:
Re: What might this mean to F@H...?
That's really interesting research. My take is that F@H studies how a protein reaches its final shape (normal or otherwise), and the journey is what's important and takes time (simulation via CPU/GPU). However, the ML research focuses on how two proteins fit together to function, which is different from but complementary to what F@H does.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: What might this mean to F@H...?
Apparently it's OK with him to assume the external surface shape is a static fingerprint. I'm afraid that's a pretty weak assumption, but he does admit that it's only an approximation, and an approximation that can be generated very quickly can certainly give science a place to start looking.
It sounds a lot like what's happening with several of the studies described on the COVID-19 News page. My hunch is that the FAH research is more thorough and more accurate.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: What might this mean to F@H...?
Indeed, to Bruce's point...
"...OK with him to assume the external surface shape is a static fingerprint..."
"...it wasn’t able to capture a phenomenon called induced fit: the way molecular surfaces change shape (and chemistry) when they get close to each other. In other words, the surfaces of two proteins may not exhibit complementary fingerprints until they’re already almost touching — a factor MaSIF will miss, since induced fit depends on the structure beneath a protein’s surface. “What evolution is probably optimizing for is precisely this induced fit,” said AlQuraishi. “What’s surprising about [MaSIF] is that even with this caveat, it still works pretty well..."
...more to come, certainly something to keep an eye on...