AI HELPING SCIENCE: THE ‘SHAPE’ OF THINGS TO COME

When we started working with artificial intelligence (AI) more than a decade ago, people were skeptical about whether this technology would develop enough in the foreseeable future to do anything useful. But we held on to our faith in AI’s potential to benefit humanity. We used games like chess, Go and Atari to train and test our AI systems to become smarter and more capable. In 2016, we decided to use our smart systems to try to solve a 50-year-old fundamental problem in biology, called the protein-folding problem. This was the birth of AlphaFold, our AI system that predicts the three-dimensional structures of proteins based on their amino acid sequence. In this article, you will learn about AlphaFold’s achievements, which demonstrate the power of AI to dramatically accelerate scientific discovery and benefit society.

AlphaFold is considered to be an AI-based solution to the 50-year grand challenge of predicting the structures of proteins based on their amino acid sequence. AlphaFold has been used to create the most accurate and complete picture of the human proteome (the set of all the proteins in the human body), with enormous potential to accelerate biological and medical research.
Graphical Abstract: (1) We started our journey in by training our AI systems to play and win classic computer games. (2) We then moved to playing more complicated games against real people and, in 2016, our system won a challenge match of Go against the reigning world champion. (3) Shortly after, we began to tackle the protein-folding problem and trained our system on known protein structures. (4) To further train our system, we taught it to use additional databases containing information about how proteins evolved between species. (5) In , our system achieved .% average accuracy in the prediction of the three-dimensional structures of proteins. (6) We hope that our system will contribute to the development of new drugs and new tools for addressing climate change, and will help scientists understand these tiny molecular machines that are the building blocks of life.

Proteins are made of small building blocks called amino acids (to learn more about proteins and their composition, see this video). You can think of a protein like a string of beads, where the amino acids are the beads. There are 20 different amino acids, and they can be arranged in various combinations to make up a protein string. Proteins are made in a "factory" inside cells called the ribosome (to learn more about the ribosome, read this Nobel Collection article). In the ribosome, instructions from our genetic code (our DNA) get translated into chains of amino acids. Then, something amazing happens: these strings of amino acids fold up into complex, three-dimensional structures that in turn determine the functions proteins can perform.

AMINO ACIDS
The building blocks of proteins.

A 50-YEAR-OLD PROBLEM
Since the early s, scientists have been trying to understand exactly how the particular sequence of an amino acid chain results in the particular three-dimensional structure of a protein. This is known as the protein-folding problem [ ]. Because proteins are so important for living things, the protein-folding problem was considered one of the most important problems in biochemistry. When scientists study any protein, they can easily determine which amino acids that protein contains, and even the exact order of amino acids in the protein string. But it has been much more difficult over the years to figure out the final three-dimensional shape that the string of amino acids folds into to create the working protein machine. After all, proteins are much too small to simply examine under the microscope to see their shapes.

PROTEIN-FOLDING PROBLEM
A scientific question posed in the s asking how proteins fold to their three-dimensional structure based on their amino acid sequence.

To figure out the three-dimensional structure of proteins, scientists have traditionally used a technique called X-ray crystallography (Figure ). This involves crystallizing the protein, which means "freezing" many copies of it in a repeating 3D pattern. The crystallized protein is then examined using a huge machine that bounces high-energy X-rays off the protein (Figure A). Finally, the researcher must look at the patterns produced by those X-rays and perform very complex math to interpret the results and determine the actual structure of the protein. This process can take up to a few years for each protein! In the past years, the structures of about , proteins have been determined by methods like X-ray crystallography, cryo-electron microscopy (to read more about cryo-electron microscopy, see here), and nuclear magnetic resonance analysis, and those structures have been made openly available in the Protein Data Bank.

X-RAY CRYSTALLOGRAPHY
An experimental method for determining the three-dimensional structure of a protein using X-rays.

Hassabis and Jumper AlphaFold

While this process has been successful, it is clearly too slow and expensive, especially if we want to find all the structures of the more than million proteins that we know of. This is over , times more proteins than the number of structures we have determined so far! Why is it so challenging to figure out the final three-dimensional shape of a protein? Well, just like a shoestring, a chain of amino acids could potentially fold in an enormous number of ways. Even a small protein, composed of just amino acids, could be in as many as possible configurations ( is followed by zeroes, which is more than the number of stars in the universe!). With so many possible ways to fold a protein, how could scientists ever know which one is correct without doing time-consuming and expensive experiments like X-ray crystallography? This is why, at Google DeepMind, we decided to use the power of artificial intelligence (the ability of computers to learn from examples and gain insights to solve complex problems) to tackle the protein-folding problem. This approach has proven very useful and saves a lot of time, money, and human effort while also giving us new insights into how proteins work (Figure B).

ARTIFICIAL INTELLIGENCE
The ability of computers to learn like the human brain does and mimic human intelligence.

(A) Traditionally, the structure of proteins has been determined by experiments that use very large, expensive machines to bounce X-rays off a crystallized protein (X-ray crystallography), followed by complex math to interpret the results. (B) Our approach at Google DeepMind is to use sophisticated AI systems that can use known protein structures and protein databases to learn to predict the structures of proteins that have not been experimentally tested yet. This approach saves a great deal of time and resources.
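The counting argument above, that a chain of amino acids can fold in an enormous number of ways, can be made concrete with a few lines of Python. This is only a rough sketch under a simplifying assumption that is not from the article: each amino acid is imagined to choose independently among a fixed number of local shapes, so the number of whole-chain configurations is that number raised to the power of the chain length. The function names and the example values (3 shapes per amino acid, chains of 10, 50, and 100 amino acids) are hypothetical and chosen purely for illustration.

```python
def count_configurations(num_amino_acids: int, shapes_per_amino_acid: int) -> int:
    """Count whole-chain configurations, ASSUMING every amino acid picks
    its local shape independently of all the others (a simplification)."""
    if num_amino_acids < 0 or shapes_per_amino_acid < 1:
        raise ValueError("need a non-negative chain length and at least one shape")
    return shapes_per_amino_acid ** num_amino_acids


def order_of_magnitude(count: int) -> int:
    """Return the power of ten of a positive integer (its digit count minus one)."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    return len(str(count)) - 1


if __name__ == "__main__":
    # Hypothetical illustration values, not figures taken from the article.
    for chain_length in (10, 50, 100):
        total = count_configurations(chain_length, 3)
        print(f"{chain_length} amino acids: roughly 10^{order_of_magnitude(total)} configurations")
```

Even with these modest assumed values, the count grows exponentially with the length of the chain, which is why checking every configuration one by one is not a practical strategy.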

FROM WINNING GAMES TO SOLVING SCIENTIFIC PROBLEMS
Our approach at Google DeepMind is to combine our passion for AI and our passion for science to find ways for AI to help humanity. At first, we taught our systems how to play simple computer games by teaching them the rules of the games and letting them improve through experience. Our next goal was to make these systems win more complex games, as a stepping stone to tackling difficult real-world problems. This included training an AI model to play a board game called Go, which is a very complex game with more than possible board configurations (more than the number of atoms in the known universe!). For a few years, we developed and tested AI systems in game situations, to see how well they were doing and to keep training them to get better. In 2016, one of our systems, called AlphaGo, defeated a world champion Go player named Lee Sedol, an achievement that was previously considered unimaginable. This was a huge stepping stone, and it proved that our AI systems were smart enough to deal with complex problems.
Google DeepMind has proud roots in scientific research, and so the protein-folding problem was a natural next step for us (Figure ). Shortly after AlphaGo's achievement in 2016, we assembled a team that started working on predicting the structures of proteins from their amino acid sequences. This new AI system was called AlphaFold (Figure A). AlphaFold was designed to learn from existing information about protein structures that had been published in open databases like the Protein Data Bank. Overall, we had access to about , known protein structures, which we used to train our AI system. We designed AlphaFold to process information somewhat similarly to the way the human brain does, using a computer science idea called artificial neural networks (to learn more about artificial neural networks and machine learning, read this Frontiers for Young Minds article). Like the human brain, AlphaFold can learn from experience and improve its performance. The more examples of protein structures we gave it, the better it got at predicting the structures of new proteins.

evolutionarily related to the protein AlphaFold is making a prediction for, and together those sequences contain clues about the structure. The shapes of proteins determine the functions they can perform, and many organisms must perform the same biological function, such as carrying oxygen in the blood. This means that the three-dimensional structures of all oxygen-carrying proteins from different organisms probably stayed similar over the course of evolution, even if their underlying amino acid sequences changed. For that to happen, whenever one amino acid changed in one place in the protein, another amino acid in the protein (the one closest to it in the three-dimensional structure) also had to change accordingly, to preserve the original shape. We call this co-evolution of amino acids, and by feeding this information into AlphaFold, we allowed the system to detect hidden relationships between amino acids.
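To give a feel for what co-evolution looks like in data, here is a minimal Python sketch that measures how strongly two positions change together across a set of aligned, related sequences. It uses mutual information, a classical statistic that we are swapping in purely for illustration; it is not a description of how AlphaFold itself detects these relationships, which it learns with neural networks. The toy sequences and the names `TOY_ALIGNMENT` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy alignment: four related sequences of equal length.
# Position 0 and position 2 never change; positions 1 and 3 always change together.
TOY_ALIGNMENT = ["AKLE", "AKLE", "ARLD", "ARLD"]


def mutual_information(alignment: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j.

    A high value means that knowing the amino acid at position i tells you
    a lot about the amino acid at position j, a hint that they co-evolved.
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)

    total = 0.0
    for (a, b), pair_count in counts_ij.items():
        p_ab = pair_count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        total += p_ab * log2(p_ab / (p_a * p_b))
    return total


if __name__ == "__main__":
    print("positions 1 and 3:", mutual_information(TOY_ALIGNMENT, 1, 3))
    print("positions 0 and 1:", mutual_information(TOY_ALIGNMENT, 0, 1))
```

In this toy alignment, positions 1 and 3 always switch together, so they share one full bit of information, while the unchanging position 0 shares none with any other position. Real alignments are far noisier, which is one reason a learning system is useful here.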
Once we entered enough information into AlphaFold, the system could predict basic information about the shape of a protein, including the distances (Figure D) and angles between every pair of amino acids in the protein and the certainty of the prediction (how reliable it is). This information was "recycled" a few times within the system, and in each round AlphaFold improved its prediction. Finally, it used its basic idea of the protein shape to predict the 3D position of every atom in the protein structure (Figure E). When we started, we tested AlphaFold's predictions on proteins whose structures were already known and let AlphaFold improve by learning from its errors and repeatedly correcting itself until its predictions became much better. After it was trained, we ran the same network on unsolved structures to provide predictions for them.
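The idea of "distances between every pair of amino acids" can be pictured as a table with one row and one column per amino acid. The Python sketch below builds such a table from known 3D positions, which is the reverse of what AlphaFold does (it predicts the distances first and the positions afterwards), so treat it only as an illustration of what a distance table is. The coordinates, the 8.0 cutoff, and the names `distance_map` and `close_pairs` are all hypothetical.

```python
from math import dist

# Hypothetical 3D positions (x, y, z) for three amino acids in a chain.
TOY_COORDS = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]


def distance_map(coords: list[tuple[float, float, float]]) -> list[list[float]]:
    """Return a square table whose entry [i][j] is the straight-line
    distance between amino acid i and amino acid j."""
    return [[dist(a, b) for b in coords] for a in coords]


def close_pairs(dmap: list[list[float]], threshold: float) -> list[tuple[int, int]]:
    """List every pair (i, j) with i < j whose distance is at most `threshold`."""
    n = len(dmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dmap[i][j] <= threshold]


if __name__ == "__main__":
    table = distance_map(TOY_COORDS)
    for row in table:
        print([round(value, 1) for value in row])
    # The cutoff of 8.0 is an assumed illustration value, not one from the article.
    print("pairs that sit close together:", close_pairs(table, 8.0))
```

The table is symmetric, with zeros on its diagonal, because the distance from one amino acid to another is the same in both directions and every amino acid is at zero distance from itself.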

THE EVOLUTION OF ALPHAFOLD
One exciting milestone in our journey with AlphaFold occurred in , when AlphaFold came first in a biennial protein structure prediction challenge called CASP. AlphaFold received an average accuracy score of around out of on the hardest proteins [ ], which was a great leap from the previous best score (which was about ). This made us even more confident in AlphaFold's capabilities, and we decided to improve the system even further for the next assessment. In our next version, called AlphaFold 2, we incorporated more of our scientific knowledge about the physics and geometry of amino acid chains into the system's learning process and aligned it with everything we understood about the protein-folding problem. Essentially, we taught AlphaFold how to perform multiple sequence alignment (MSA) analysis, and then used that improved MSA analysis to gain a better understanding of protein folding (and therefore the physics and geometry of amino acid chains). This back-and-forth flow of information improved AlphaFold 2's performance.
In the CASP structure prediction challenge, AlphaFold won with an astounding accuracy score of .out of [ ]. This is approaching the accuracy of determining protein structures using experiments such as X-ray crystallography, but without the high time

Did you know that almost all the processes happening in your body are performed by tiny biological machines called proteins? Proteins help us to see, to move, to digest food, to fight diseases, and to perform many other essential actions needed to keep organisms like us alive and healthy (to learn more about proteins, check out this video). There