‘AlphaFold was a huge advance in protein structure prediction…which led to a whole new wave of using deep learning,’ says computational biologist David Baker of the University of Washington
By Steven Rosenbush
March 22, 2023 10:30 am ET
Meta Platforms Inc.’s new tool predicting the structure of hundreds of millions of proteins is the latest example of a breakthrough in computational biology that began several years ago at an Alphabet Inc. subsidiary.
Some scientists expect the new class of artificial-intelligence systems to accelerate work in the life sciences, particularly drug development.
DeepMind Technologies, the London-based subsidiary of Google parent Alphabet, first solved a problem that had vexed scientists for 50 years, using artificial intelligence as an alternative to much slower and more expensive laboratory techniques for determining the three-dimensional structure of proteins. Those structures are crucial to drug and vaccine development, climate change research and more.
DeepMind said in July 2022 that its AlphaFold2 AI system, first released in July 2021, had been used to predict the structure of nearly all proteins known to science. Meta said on March 16 that its ESMFold system had been used to reveal the structures of an even larger group of proteins, including the least understood ones: those found in microbes in the soil, deep in the ocean, and some inside human bodies.
The image released by the European Molecular Biology Laboratory’s European Bioinformatics Institute in July 2021 shows the structure of a human protein modeled by the AlphaFold artificial-intelligence system. Photo: handout/Agence France-Presse/Getty Images
Facebook-parent Meta’s ESMFold employs a large language model that can predict text from a few letters or words, based on the same technology underlying OpenAI’s ChatGPT. DeepMind devised a different approach employing a pair of neural networks. Meta said its approach is 60 times faster than DeepMind’s, but is less accurate.
“These proteins are incredibly diverse and very little is known about them. To reach this scale and go beyond it to potentially billions more sequences, it was critical to make a breakthrough in the speed of prediction,” said Meta AI Research Scientist Alexander Rives. Employing a large language model, Meta was able to make predictions for more than 600 million proteins in two weeks, he said.
“With AI it is now becoming possible to see deep into the structures of proteins and the incredible complexity of the natural world at the molecular scale,” he said.
Since DeepMind’s breakthrough, there has been an explosion of interest in the application of AI to biology.
“AlphaFold was a huge advance in protein structure prediction. We were inspired by the advances they made, which led to a whole new wave of using deep learning,” said Professor David Baker, a biochemist and computational biologist at the University of Washington.
“The advantage of ESMFold is that it is very fast, and so can be used to predict the structures of a larger set of proteins than AlphaFold, albeit with slightly lower accuracy, similar to that of RoseTTAFold,” Dr. Baker said, referring to a tool that emerged from his lab in 2021.
DeepMind open-sourced the code for AlphaFold2, making it freely available to the community. Nearly all proteins known to science—about 214 million—can be looked up in the public AlphaFold Protein Structure Database. Meta’s ESM Metagenomic Atlas includes 617 million proteins.
In the past, researchers spent months or years getting to the point where they were confident that they understood the structure of a protein, said Jennifer Lum, co-founder of Biospring Partners, a growth-equity firm that invests in life sciences technology. “That process has been cut short by AlphaFold, and allowed these teams to shift their time to research and product development further downstream, into other value-added areas,” she said.
The AlphaFold system came together in two distinct stages, and reflects DeepMind’s unusual approach of marrying the rigors of academic research with the culture of a tech startup to handle some of the world’s largest scientific problems.
The turning point occurred in 2018, when DeepMind co-founder and Chief Executive Demis Hassabis asked at an AlphaFold meeting if the team could solve the problem of finding better ways to predict the structure of a protein or if they should tackle something else, said John Jumper, lead scientist on DeepMind’s AlphaFold team.
“It was one of the most uncomfortable meetings I have been in at DeepMind,” said Dr. Jumper, 38, who joined the lab in 2017 after earning his doctorate in theoretical chemistry at the University of Chicago.
In 2018, AlphaFold1 had scored the best results in a biennial experiment known as CASP, where scientists test various methods for predicting protein structure. But that wasn’t good enough for DeepMind.
The AlphaFold team spent the time following CASP in 2018 trying different approaches to improve AlphaFold1 and testing them to see whether they could match the accuracy of protein structures determined using laboratory methods.
The majority of people on the 15-to-18-member interdisciplinary team came from machine-learning backgrounds. Others had a background in biology. “But they all…became effectively biologists over the course of the project,” Dr. Jumper said. AlphaFold was trained on public data resources, including those managed by the European Molecular Biology Laboratory’s European Bioinformatics Institute.
Dr. John Jumper, a senior staff research scientist at DeepMind, where he leads development of new methods to apply machine learning to protein biology. He is a key member of the AlphaFold team. Photo: DeepMind
Dr. Jumper said the group worked into 2019 before he was truly confident that the team could fulfill its mission.
Traditionally, biologists used laboratory techniques based on X-rays and other technologies to understand the structure of a single protein, a process that to this day can take multiple years and cost $100,000, according to Dr. Jumper.
While computational methods had made advances toward understanding protein structure, they weren’t accurate enough to replace laboratory methods.
The original AlphaFold model used AI to predict the distance between pairs of amino acids, and these distance distributions were used in a second step to arrive at the protein’s predicted structure. The second step involved AlphaFold using this information to come up with a consensus model of what the protein should look like, and didn’t call upon artificial intelligence.
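The two-step idea described above, pairwise distances first and three-dimensional coordinates second, can be illustrated with a toy sketch. This is not DeepMind's code and makes no claim about how AlphaFold1 carried out its second step. It assumes exact target distances rather than predicted distance distributions, and it uses plain gradient descent on a squared-error mismatch as a generic stand-in. The function names, the helix-like test points, the step size and the number of steps are all illustrative assumptions.

```python
import math
import random


def distance_matrix(coords):
    """Return the full matrix of Euclidean distances between 3-D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mismatch(coords, target):
    """Sum of squared differences between current and target pair distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def fit_coordinates(target, steps=2000, lr=0.01, seed=0):
    """Move random starting points downhill until their distances fit `target`.

    Returns (coords, starting_mismatch, final_mismatch).
    """
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    start = mismatch(coords, target)
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [coords[i][k] - coords[j][k] for k in range(3)]
                dist = max(math.sqrt(sum(d * d for d in diff)), 1e-9)
                # Gradient of (dist - target)^2 with respect to point i;
                # point j receives the opposite contribution.
                scale = 2.0 * (dist - target[i][j]) / dist
                for k in range(3):
                    grads[i][k] += scale * diff[k]
                    grads[j][k] -= scale * diff[k]
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords, start, mismatch(coords, target)


# Toy input: six points on a helix-like curve (illustrative values only).
true_points = [
    (2.3 * math.cos(math.radians(100 * i)),
     2.3 * math.sin(math.radians(100 * i)),
     1.5 * i)
    for i in range(6)
]
target_distances = distance_matrix(true_points)

fitted, before, after = fit_coordinates(target_distances)
print(f"mismatch before fitting: {before:.3f}")
print(f"mismatch after fitting:  {after:.3f}")
```

Distances alone cannot distinguish a shape from its mirror image or from a rotated or shifted copy, so the fitted points are compared with the targets by their pairwise distances rather than by their raw coordinates.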
In AlphaFold2, a protein’s structure is predicted by the neural network itself, according to Dr. Jumper. The neural network is paired up with a so-called attention-based neural network that works on various pieces of the structure simultaneously to connect them, much as a person might try to solve a jigsaw puzzle, according to the scientist. “It had nothing to do with time…It was all about the accuracy,” Dr. Jumper said.
Professor John Moult of the University of Maryland has helped lead a multidecade effort to apply computational methods to the understanding of protein structure, which is critical to drug discovery and other problems in biology. Photo: University of Maryland
“In some cases, AlphaFold can predict a protein structure with great accuracy in less than 20 seconds,” DeepMind said. Before AlphaFold, there was no computational method of any kind that equaled experimental accuracy, DeepMind said. While the approach has some limitations, it solved a big problem, said Professor John Moult of the Institute for Bioscience and Biotechnology Research at the University of Maryland, who co-founded the CASP experiments in 1994.
“The team is now turning its attention to new challenges in protein innovation,” Dr. Jumper said.
It is seeking to understand the connection between mutations and changes in the function of a protein that can help treat diseases. A malaria vaccine is currently being developed using AlphaFold, after a team at the University of Oxford used it to identify the structure of a vital protein after years of trying with other methods. “When we combined our model with AlphaFold’s predicted structure, we could suddenly see how the whole system worked,” said Matthew Higgins, professor of molecular parasitology.
Write to Steven Rosenbush at steven.rosenbush@wsj.com
WSJ - Scientists at DeepMind and Meta Press Fusion of AI, Biology