AlphaFold 3 is Beyond Protein Folding
Collins Patrick O.
Research Analyst @ Dole | Data Science & AI in Agriculture | Precision Farming & Sustainability
In the early 1960s, Christian Anfinsen demonstrated that an unfolded protein can spontaneously refold in a test tube with no outside help, showing that folding is guided by the protein's own sequence of amino acids. Since then, predicting the three-dimensional atomic structure of a protein from its sequence has been one of the biggest open problems in biology, known as the “protein folding problem.”
Before jumping into the astounding work of Google DeepMind’s AlphaFold, let’s first set out some context.
From Genes to Proteins
We already understand the cell as the basic unit of life. In eukaryotic cells, such as ours, there’s a structure called the nucleus, and inside the nucleus is the genome. In humans, the genome is organized into 23 pairs of chromosomes. Each chromosome is a long DNA molecule that contains many genes: DNA segments that carry instructions for making proteins, the building blocks of life. Proteins, which make up about 20% of the human body and are present in every single cell, are the second most abundant component of the body after water (about 60%).
The process of protein synthesis begins when a gene is activated. The DNA sequence of a gene is transcribed (copied) into a molecule called messenger RNA (mRNA) in a process called, you guessed it, transcription. This mRNA then leaves the nucleus and enters the cytoplasm, where it is translated (read) by a ribosome (a molecular machine built from RNA and proteins) into a polypeptide chain (a string of amino acids connected by peptide bonds) according to the sequence. And, of course, this process is called translation.
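To make the two steps concrete, here is a minimal Python sketch of transcription and translation. It assumes the input is the coding strand of the gene (so transcription is just swapping T for U) and uses the standard genetic code; the function names and the toy sequence are mine, for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


toy_gene = "ATGGCCTTTAAATGA"  # made-up sequence, not a real gene
mrna = transcribe(toy_gene)
print(mrna)             # AUGGCCUUUAAAUGA
print(translate(mrna))  # MAFK
```

The output string uses the one-letter amino acid codes (M for methionine, A for alanine, and so on), which is the same kind of sequence you paste into a structure prediction tool.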
Why Should We Care About Protein Folding?
Once the ribosome has linked amino acids into a polypeptide chain, the chain spontaneously folds into a complex three-dimensional structure: a functional protein. How a given sequence arrives at its final shape is incredibly hard to predict, and this is the longstanding “protein folding problem.”
Now, why is protein folding such a big deal? Well, there’s an age-old adage in biology: structure determines function. Take ATP synthase, for example. It’s a protein complex that rotates when protons flow through it, allowing ADP and phosphate ions to combine in the mitochondrial matrix. This process builds most of the ATP that powers our cellular processes. The light gray strip in the image below represents the membrane in which the enzyme is embedded (in mitochondria, the inner mitochondrial membrane). As hydrogen ions flow through the enzyme down their concentration gradient, a process known as chemiosmosis (see figure below), the stalk subunit of ATP synthase rotates, changing the shape of the enzyme’s active sites. This ultimately results in the phosphorylation of ADP, converting it into ATP, as the final step of oxidative phosphorylation. It goes to show that to understand how the 100,000+ proteins in the human body function, we first need to figure out their molecular structures.
The Protein Folding Problem
Figuring out the structure of proteins is particularly challenging because protein molecules are long, contorted chains of hundreds or even thousands of amino acids. These amino acids can interact in countless ways, resulting in a vast number of possible three-dimensional configurations. To put it in context, consider that there are estimated to be about 10^80 atoms in the observable universe. The number of potential arrangements for even a relatively small protein can easily surpass this number because of the many options available at each point along the chain. Experimentally, determining the structure of a single protein can take months, if not years, of tedious work. Determining the configuration of ATP synthase, a complex structure, required physics-based methods like X-ray crystallography and NMR spectroscopy. The process is both time-consuming and costly.
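A back-of-the-envelope calculation shows how quickly the numbers get out of hand. The sketch below assumes, purely for illustration, that each amino acid can adopt about 9 local backbone conformations (3 for each of its two main backbone angles, in the spirit of Levinthal's classic estimate); the real number is not a fixed constant.

```python
import math

STATES_PER_RESIDUE = 9        # illustrative assumption, not a measured value
LOG10_ATOMS_IN_UNIVERSE = 80  # the ~10^80 estimate quoted in the text


def log10_conformations(n_residues: int, states: int = STATES_PER_RESIDUE) -> float:
    """log10 of states**n_residues, computed without building huge integers."""
    return n_residues * math.log10(states)


def min_residues_to_exceed(log10_target: float, states: int = STATES_PER_RESIDUE) -> int:
    """Smallest chain length whose conformation count exceeds 10**log10_target."""
    return math.floor(log10_target / math.log10(states)) + 1


# A 100-residue protein, which is small by protein standards:
print(round(log10_conformations(100), 1))               # about 95.4, i.e. ~10^95
print(min_residues_to_exceed(LOG10_ATOMS_IN_UNIVERSE))  # 84
```

Under this assumption, a chain of only 84 amino acids already has more possible conformations than there are atoms in the observable universe, so checking them one by one is out of the question.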
Critical Assessment of Structure Prediction (CASP)
In 1994, CASP, often called the Olympics of protein folding, was founded to accelerate progress on the “protein folding problem.” Typically, more than 100 research groups from all over the world participate in CASP. Teams must predict protein structures that either have yet to be solved by methods like X-ray crystallography and NMR spectroscopy or have just been solved and are being held back by the Protein Data Bank. To score submissions, CASP uses the global distance test (GDT), a metric that measures the similarity between a predicted structure and the experimental one. On a scale of 0–100, a GDT score above roughly 90 is considered a successful solution, comparable to experimental methods.
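To give a feel for what the score measures, here is a simplified Python sketch of GDT_TS, the variant that averages the percentage of alpha-carbon atoms lying within 1, 2, 4, and 8 Å of their experimental positions. It assumes the two structures are already superimposed; the real GDT procedure also searches over many superpositions to find the best one, which this sketch leaves out. The coordinates are made up.

```python
import math

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, cutoffs=CUTOFFS_ANGSTROM):
    """Simplified GDT_TS for two already-superimposed lists of (x, y, z) points.

    For each cutoff, take the fraction of predicted atoms within that distance
    of the matching reference atom, then average the fractions (0-100 scale).
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and the same length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four alpha carbons, displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, reference))  # 56.25
print(gdt_ts(reference, reference))  # 100.0
```

Because the score averages over several distance cutoffs, one badly placed atom (like the last one here) lowers the score without wiping out the credit for the parts that are right.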
As shown in the chart below, before AlphaFold, the results weren’t just poor; they seemed to worsen over time. However, because the competition becomes more challenging with each round, maintaining the same level of performance implies at least a little progress over time.
The Scientific Breakthrough
After years of stagnation, DeepMind entered the competition for the first time in 2018 with its AI-based model, AlphaFold, and won with a significant lead over the other contenders. Its GDT improvement, despite the increasing difficulty of the competition, means that it beat not only the other entries that year but also the best methods from previous rounds. Two years later, in 2020, the prediction results of their updated system, AlphaFold 2.0, were so impressive that some within the scientific community proclaimed the “protein folding problem” solved.
AlphaFold 3.0: A Gift to Humanity
Biology operates as a dynamic system in which different molecules and structures constantly interact. On May 8, 2024, in collaboration with Isomorphic Labs, DeepMind unveiled AlphaFold 3.0. This latest version predicts not only the structures of proteins but also those of other biomolecules, including DNA, RNA, and ions, and how they interact, such as the structure of protein-ligand complexes and their binding sites. This capability is a holy grail for drug discovery in the biomedical field and for crop disease resistance in agriculture.
DeepMind has made AlphaFold 3.0 available for free through the AlphaFold Server for non-commercial research, opening a world of possibilities for researchers worldwide. Before AlphaFold, researchers would dedicate entire doctoral degrees to solving a single protein structure. Today, we can get a prediction in minutes. This computational approach has permanently altered the terrain of structural biology careers, equipping researchers with an additional tool to advance the field.
Here is my attempt to model the alpha subunit of the mitochondrial ATP synthase using the AlphaFold 3 webserver. I grabbed the protein sequence from the UniProt Knowledgebase (UniProtKB).
The job took 6 minutes, but since I ran it on the webserver, I don’t know how many GPUs were behind it. You can also see the color-coded confidence score for the predictions, which tends to be high for most structures. I understand that the model tends to be conservative, which is a positive feature; after all, it’s better to be safe than sorry. If anyone wants to check the structure and give me feedback, the model is here: [drive].
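For anyone reading the colors, here is a small Python sketch of how per-residue confidence values (pLDDT, on a 0 to 100 scale) map onto the four bands AlphaFold visualizations usually show. I am assuming the webserver follows the same cutoffs as the AlphaFold database (90, 70, and 50), and the scores below are invented for illustration; they are not from my ATP synthase run.

```python
from collections import Counter

# (lower bound, label), checked from highest to lowest.
# Cutoffs are assumed to match the usual AlphaFold pLDDT bands.
BANDS = ((90.0, "very high"), (70.0, "confident"), (50.0, "low"))
LABELS = ("very high", "confident", "low", "very low")


def confidence_band(plddt: float) -> str:
    """Return the confidence band for one pLDDT value."""
    for lower_bound, label in BANDS:
        if plddt > lower_bound:
            return label
    return "very low"


def summarize(scores):
    """Fraction of residues falling into each confidence band."""
    counts = Counter(confidence_band(s) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


example_scores = [95.0, 88.0, 72.0, 65.0, 40.0]  # hypothetical per-residue values
print(summarize(example_scores))
```

A summary like this is a quick way to see how much of a model falls in the low-confidence bands, which often correspond to flexible or disordered regions.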