AI: The Final Frontier in Protein Folding Computation

Since the end of 2020, AlphaFold2 has demonstrated exceptional predictive capabilities, rapidly predicting the structures of the proteins encoded by all 214 million gene sequences that humans have sequenced to date. This mass production is not a mindless assembly-line operation; the tool keeps improving as it solves problems.

One of AlphaFold2's most astonishing achievements is the prediction of a very large assembly called the "nuclear pore complex." Nuclear pores are openings in the membrane that surrounds the cell nucleus. Messenger RNA made in the nucleus has to exit through these pores, and proteins made in the cytoplasm have to enter through them, so the cell needs a channel for transport. The nuclear pore is that channel, and it is a highly complex structure built from over 1,000 protein molecules.

Before AlphaFold2, knowledge of the nuclear pore's exact structure was limited to the arrangement of certain major proteins, covering at most 30% of the whole. While publishing the spatial configurations of 214 million proteins, the AlphaFold team also showcased the complete spatial configuration of the nuclear pore complex, which is built from over 30 different kinds of protein. It is a result that many biology laboratories had pursued for years.

With data on 214 million proteins, categorizing and storing the structures in the scientifically recognized Protein Data Bank is a massive task. DeepMind expects to finish entering about 100 million protein structures this year, with the remainder scheduled for 2023. This first batch alone is several hundred times the size of the current database.

The tool's rapidly improving precision and astounding prediction speed mean that its output over 18 months amounts to several hundred times the combined output of all scientists in this field worldwide over the past 30 years. Laboratories adept at using it have found ways to accelerate their research significantly; teams that are less proficient or less fortunate, however, may struggle with the results AlphaFold2 gives them.

The Shortcomings of AlphaFold2

Why might teams struggle with AlphaFold2's results? This leads us to its limitations.

Despite its speed and its comprehensive coverage, issues with accuracy persist. Even among the few hundred thousand protein structures that humans have determined with confidence, AlphaFold2's sequence-based predictions do not always match. Large deviations are uncommon, but they do occur.

Therefore, even scientists with trust in AlphaFold2 cannot rely solely on its computational results for subsequent research and development work.

For instance, in new drug development, scientists use AlphaFold2 to generate preliminary results when screening new protein structures. However, these results are merely helpful hints, indicating potential key areas. Scientists then refine these findings using traditional methods. This doesn't diminish AlphaFold2's value, as providing an initial three-dimensional structural framework can inspire unexpected ideas and save time in early research stages.

AlphaFold2 also assesses its own output, grading the predictions and marking them with colors to indicate the reliability of different protein regions.
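The grading described above can be sketched in a few lines. AlphaFold2 reports a per-residue confidence score called pLDDT on a scale of 0 to 100, and the AlphaFold database colours structures in four bands (above 90, 70 to 90, 50 to 70, below 50). The function names below are hypothetical, and putting a score that sits exactly on a boundary into the lower band is an assumption made for this sketch.

```python
from collections import Counter

def confidence_band(plddt):
    """Map one per-residue pLDDT score (0-100) to a (label, colour) pair.

    Band edges follow the four-colour scheme of the AlphaFold database.
    Assumption: a score exactly on an edge (e.g. 90.0) goes to the lower band.
    """
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    if plddt > 90.0:
        return ("very high", "dark blue")
    if plddt > 70.0:
        return ("confident", "light blue")
    if plddt > 50.0:
        return ("low", "yellow")
    return ("very low", "orange")

def summarize(scores):
    """Count how many residues of a prediction fall into each band."""
    return Counter(confidence_band(s)[0] for s in scores)

if __name__ == "__main__":
    # Made-up scores for a short stretch of residues, for illustration only.
    example_scores = [96.1, 93.4, 88.0, 74.5, 61.2, 48.9, 35.0]
    for score in example_scores:
        label, colour = confidence_band(score)
        print(f"{score:5.1f}  {label:10s} {colour}")
    print(dict(summarize(example_scores)))
```

A researcher would typically trust the "very high" and "confident" regions as a structural starting point and treat the "low" and "very low" regions as candidates for refinement with traditional methods.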

The Impact of AlphaFold2

AlphaFold2 is increasingly used in the scientific community. In the face of competition from other AI teams, notably the RoseTTAFold team, DeepMind made a bold move by releasing AlphaFold2's source code, inviting experts to contribute to its enhancement. What impact does this have?

Firstly, our understanding of life's evolution deepens. Previously, evolutionary relationships were inferred mainly from genetic mutations: the more similar two species' genes, the closer their relationship. This approach works well for closely related species but becomes less clear over longer time spans. For instance, if your genes are AAAAA and mine are AAAAB, we are closely related. But if yours are AAAAA, mine are ABABA, and another person's are AABBA, each pair differs at exactly two positions, so the sequences alone cannot say which two are closer. Differences in protein structure can answer such questions, because protein structures change more slowly than base sequences. A single base mutation may alter one amino acid, but it does not typically cause a major change in the protein's structure or function. Many more genetic mutations are needed before the protein changes significantly. By comparing the spatial configurations of proteins, tools like AlphaFold2 can therefore identify more distant evolutionary relationships.
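The toy gene example above can be checked by counting the positions at which two sequences differ (the Hamming distance). This is only an illustration of why sequence comparison becomes ambiguous; the function names and the labels "you", "me", and "other" are made up for this sketch, and real phylogenetics uses far richer models.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(sequences):
    """Hamming distance for every pair in a {name: sequence} mapping."""
    return {
        (m, n): hamming(sequences[m], sequences[n])
        for m, n in combinations(sequences, 2)
    }

if __name__ == "__main__":
    # The close pair from the text: one difference.
    print(pairwise_distances({"you": "AAAAA", "me": "AAAAB"}))
    # The tricky trio from the text: every pair differs at two positions.
    print(pairwise_distances({"you": "AAAAA", "me": "ABABA", "other": "AABBA"}))
```

In the second case all three distances come out equal, so no pair stands out as the closer relatives. That tie is the kind of ambiguity that structural comparison is meant to break.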

Moreover, the impact extends beyond scientific theories to drug development. As mentioned, AlphaFold2 provides numerous results in a short time. Despite its current imperfect accuracy, its preliminary insights significantly speed up the interpretation of mechanisms and drug development. Many pharmaceutical companies have developed new modules for AlphaFold2, integrating it with their chemical simulation software to discover new drug molecules.

Furthermore, some laboratories are venturing beyond the 214 million gene sequences known on Earth, attempting to create proteins never before seen on our planet. They do this by feeding AlphaFold2 randomly generated amino acid sequences, for example 300 amino acids long, and predicting the structures those random sequences would form. If a stable three-dimensional structure seems likely, the sequence is retained as a candidate. Because the sequences are random, they are unlikely to match any of the 214 million sequences in the database.
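The generate-and-filter loop described above might look roughly like the following. This is a minimal sketch, not any laboratory's actual pipeline: `predict_confidence` is a hypothetical callable standing in for a real structure predictor, the threshold of 70 is an arbitrary illustrative cut-off, and `toy_confidence` is a placeholder that knows nothing about folding.

```python
import random

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_sequence(length, rng):
    """Draw a uniformly random amino acid sequence of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def screen_random_sequences(predict_confidence, n_candidates,
                            length=300, threshold=70.0, seed=0):
    """Keep the random sequences whose predicted confidence meets the threshold.

    predict_confidence: hypothetical callable, sequence -> score in 0-100.
    In practice this would wrap a structure predictor; here it is supplied
    by the caller so the sketch stays self-contained.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    kept = []
    for _ in range(n_candidates):
        seq = random_sequence(length, rng)
        if predict_confidence(seq) >= threshold:
            kept.append(seq)
    return kept

def toy_confidence(seq):
    """Placeholder scorer, NOT a structure predictor: percentage of alanine."""
    return 100.0 * seq.count("A") / len(seq)

if __name__ == "__main__":
    candidates = screen_random_sequences(toy_confidence, n_candidates=1000,
                                         threshold=7.0)
    print(f"kept {len(candidates)} of 1000 random sequences")
    # The space of possible 300-residue sequences is 20**300, which is why
    # a random draw is unlikely to coincide with a known sequence.
    print(f"possible sequences: about 10^{len(str(20 ** 300)) - 1}")
```

Passing the predictor in as a parameter keeps the screening logic separate from whichever prediction tool a laboratory actually uses.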

The Institute for Protein Design at the University of Washington has already used this method, combined with gene-editing technology, to express 129 types of proteins that do not exist in nature. Their work is close to the mythical tales of ancient gods creating various beings.

The rapid evolution of AlphaFold2 has even surprised its creators. The breakthroughs in this field over the next century could rival those in the semiconductor industry over the past century.

That's all for today. See you tomorrow.
