Protein Folding Problem
- Overview
If you could unravel a protein, you would find that it resembles a string of beads made up of a series of different chemicals called amino acids. These sequences are assembled based on the genetic instructions from the organism's DNA.
The attractive and repulsive forces between the 20 different types of amino acids cause the chain to fold in a "spontaneous origami" manner. This creates the complex coils, loops and folds of the protein's 3D structure.
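The "string of beads" picture maps naturally onto an ordinary string in code. The sketch below is illustrative only: it assumes the conventional one-letter codes for the 20 standard amino acids, and the function name and the example peptide are hypothetical, not taken from any particular library or protein.

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def amino_acid_composition(sequence: str) -> Counter:
    """Count each residue in a sequence, rejecting non-standard letters."""
    sequence = sequence.upper()
    unknown = set(sequence) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return Counter(sequence)


# Hypothetical short peptide, for illustration only.
peptide = "MKTAYIAKQR"
composition = amino_acid_composition(peptide)
print(len(peptide), dict(composition))
```

The sequence alone says nothing about shape; it is only the input from which the 3D structure has to be worked out.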
Experimental methods for determining protein structure include nuclear magnetic resonance and X-ray crystallography. These methods rely on extensive trial and error, years of hard work, and millions of dollars' worth of specialized equipment.
So for decades, scientists tried to find a way to reliably determine the structure of a protein based solely on its amino acid sequence.
This grand scientific challenge is known as the protein folding problem.
- The Protein Folding Problem
The protein folding problem is the question of how a one-dimensional chain of amino acids, or beads, folds into a specific three-dimensional structure to perform its function.
Solving this problem has been a goal for decades, made harder by the enormous number of conformations a protein chain could in principle adopt. Traditionally, determining the structure of a single protein has required months or years of experiments using techniques like X-ray crystallography and NMR spectroscopy.
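To get a feel for how many arrangements are possible, a back-of-the-envelope count in the spirit of Levinthal's paradox helps. The numbers below are assumptions chosen for illustration and do not come from this article: three possible states per residue, a 100-residue chain, and a sampling rate of 10^13 conformations per second. The function names are hypothetical.

```python
import math


def count_conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Conformations available if every residue independently picks one state."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_conformations: int,
                            samples_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once at the assumed rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return n_conformations / samples_per_second / seconds_per_year


total = count_conformations(100)       # assumed 100-residue chain
years = exhaustive_search_years(total)
print(f"roughly 10^{math.log10(total):.1f} conformations")
print(f"roughly 10^{math.log10(years):.1f} years to try them all")
```

Under these assumptions an exhaustive search would take vastly longer than the age of the universe, which is why neither a protein nor a computer can find the folded structure by trying every arrangement.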
More precisely, the protein folding problem comprises three related questions about how a protein's amino acid sequence determines its three-dimensional atomic structure:
- What is the folding code?
- What is the folding mechanism?
- Can we predict a protein's final folded structure from its amino acid sequence?
The term "protein folding problem" first appeared around 1960, when the first atomic-resolution protein structures were determined. These structures, of the globins, had helices packed together in ways that had not been anticipated.
The problem has been considered a major mystery and a grand challenge in biology for decades. However, there has been significant progress in recent years, including:
- Designing foldable proteins and nonbiological polymers
- Using computer methods to predict the structures of small proteins
- Using artificial intelligence to predict protein structure from sequence
Some researchers say that AlphaFold's protein structure prediction results are "transformational" and "astounding". However, others note that the accuracy is not high enough for a third of its predictions, and that it doesn't reveal the physical mechanism of protein folding.
Understanding protein folding is important because it helps us understand how protein molecules encode the functions of living organisms. For example, misfolded proteins can cause disease, including amyloid diseases like Alzheimer's.
- The AlphaFold 3 Solution
Isomorphic Labs and Google DeepMind have unveiled AlphaFold 3, a powerful AI system that draws on a novel diffusion-based architecture to accurately model the structures of complexes containing proteins, nucleic acids (DNA and RNA), small molecules, ions and modified residues. Diffusion models are a type of generative model that has gained significant popularity in recent years given their performance in tasks like image and video generation.
The adoption of a diffusion-based approach allows AlphaFold 3 to generate accurate 3D models of biomolecular complexes by predicting the raw coordinates of individual atoms, a significant departure from its predecessor’s protein-centric architecture.
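The idea of generating raw atom coordinates by diffusion can be illustrated with a toy sampling loop: start from random noise and repeatedly nudge every atom toward a denoiser's estimate while the noise level falls. This is only a sketch and not AlphaFold 3's actual network or sampler. The denoiser here is a stand-in that simply returns a known target, and the noise schedule, coordinates, and function names are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def toy_denoiser(noisy: List[Coord], target: List[Coord]) -> List[Coord]:
    """Stand-in for a trained network (assumption): returns the known target.

    A real model would estimate clean coordinates from the noisy ones and the
    input sequence; it would not have access to the answer.
    """
    return target


def sample_coordinates(target: List[Coord],
                       noise_levels: Sequence[float],
                       seed: int = 0) -> List[Coord]:
    """Run a deterministic reverse-diffusion loop over decreasing noise levels."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise at the largest noise level.
    coords = [tuple(rng.gauss(0.0, noise_levels[0]) for _ in range(3))
              for _ in target]
    for sigma, sigma_next in zip(noise_levels, noise_levels[1:]):
        denoised = toy_denoiser(coords, target)
        # Fraction of the remaining gap to close when moving to sigma_next.
        step = 1.0 - sigma_next / sigma
        coords = [tuple(c + step * (d - c) for c, d in zip(atom, clean))
                  for atom, clean in zip(coords, denoised)]
    return coords


# Three hypothetical atom positions (in angstroms) used as the "true" structure.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
result = sample_coordinates(target, noise_levels=[10.0, 5.0, 2.0, 1.0, 0.0])
print(result)
```

Because the output is a list of per-atom coordinates rather than protein-specific geometry, the same loop applies equally to atoms of a protein, a nucleic acid, or a small molecule, which is the property the paragraph above highlights.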
AlphaFold 3 can predict protein shapes with nearly 80% accuracy in seconds:
- Training: AlphaFold 3 was trained using around 100,000 protein sequences and their structures.
- Evaluation: AlphaFold 3 was evaluated through the Critical Assessment of Structure Prediction (CASP), where research groups compare their predicted protein structures with actual data.
- Results: AlphaFold 3 can predict the structure of protein complexes with DNA, RNA, ligands, and ions.
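The comparison of predicted structures with experimental data mentioned under evaluation can be made concrete with two common scores: root-mean-square deviation (RMSD) and a simplified GDT_TS, the average percentage of atoms lying within 1, 2, 4 and 8 angstroms of their experimental positions. The sketch below assumes the two structures are already superimposed and list the same atoms in the same order; real assessments first search for the best superposition. The function names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def atom_distances(predicted: List[Coord], experimental: List[Coord]) -> List[float]:
    """Per-atom distances between two pre-aligned structures."""
    if len(predicted) != len(experimental):
        raise ValueError("structures must contain the same number of atoms")
    return [math.dist(p, e) for p, e in zip(predicted, experimental)]


def rmsd(predicted: List[Coord], experimental: List[Coord]) -> float:
    """Root-mean-square deviation, in the same units as the coordinates."""
    dists = atom_distances(predicted, experimental)
    return math.sqrt(sum(d * d for d in dists) / len(dists))


def simplified_gdt_ts(predicted: List[Coord], experimental: List[Coord],
                      cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of atoms within each cutoff (no superposition search)."""
    dists = atom_distances(predicted, experimental)
    fractions = [sum(d <= cutoff for d in dists) / len(dists) for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical four-atom structures, coordinates in angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]

print(f"RMSD: {rmsd(predicted, experimental):.2f}")
print(f"simplified GDT_TS: {simplified_gdt_ts(predicted, experimental):.2f}")
```

RMSD is dominated by the worst-placed atoms, while the GDT-style score rewards the fraction of the structure that is close to correct, so the two can rank the same prediction quite differently.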
For example, AlphaFold 3 accurately predicted the structure of a spike protein from a common cold virus interacting with antibodies and sugars. This could help improve understanding of coronaviruses and lead to better treatments.
However, some researchers have noted that AlphaFold 3's accuracy isn't high enough for a third of its predictions, and that it doesn't reveal the rules of protein folding. Others have also criticized the limited accessibility of AlphaFold 3, with the AlphaFold Server initially limiting users to 10 requests per day.