The Golden Age of Structural Biology
Javier Tordable
CEO at Pauling | AI for drug discovery | Ex-Google, Ex-Microsoft
In 1953, Francis Crick and James Watson deciphered the double helix structure of DNA. That moment marked the birth of structural biology, based on the idea that the shape of molecules determines their properties and how they behave inside living organisms.
Back then, no computers could model molecular shapes or calculate the movement and physical properties of these molecules. Several decades later, the first realistic simulations of materials and biomolecules could be run on mainframes, using tools like molecular dynamics (MD) simulations. In MD, quantum particles are approximated as classical particles subject to force fields.
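To make the idea of classical particles in a force field concrete, here is a minimal sketch, not taken from any production MD package. It assumes a toy system of two particles on a line joined by a single harmonic bond, in arbitrary units, and integrates Newton's equations with the velocity Verlet scheme. The function names, spring constant, rest length and time step are all illustrative choices.

```python
def bond_forces(x1, x2, k=100.0, r0=1.0):
    """Forces on both particles from a harmonic bond U = 0.5 * k * (r - r0)**2, r = x2 - x1."""
    stretch = (x2 - x1) - r0
    # A stretched bond pulls particle 1 forward and particle 2 backward.
    return k * stretch, -k * stretch


def total_energy(x, v, m=1.0, k=100.0, r0=1.0):
    """Kinetic plus potential energy of the two-particle system."""
    kinetic = 0.5 * m * (v[0] ** 2 + v[1] ** 2)
    potential = 0.5 * k * ((x[1] - x[0]) - r0) ** 2
    return kinetic + potential


def run_md(steps=2000, dt=0.001, m=1.0):
    """Velocity Verlet integration; all values are in arbitrary (assumed) units."""
    x = [0.0, 1.2]  # start with the bond stretched by 0.2
    v = [0.0, 0.0]
    f = bond_forces(x[0], x[1])
    energies = [total_energy(x, v, m)]
    for _ in range(steps):
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / m  # half-step velocity update
            x[i] += dt * v[i]            # full-step position update
        f = bond_forces(x[0], x[1])      # forces at the new positions
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / m  # second half-step velocity update
        energies.append(total_energy(x, v, m))
    return x, v, energies


if __name__ == "__main__":
    x, v, energies = run_md()
    print("final positions:", x)
    print("largest energy drift:", max(abs(e - energies[0]) for e in energies))
```

A real force field adds many more terms (angles, torsions, electrostatics, van der Waals interactions) and a protein has tens of thousands of atoms, which is where the computational cost comes from.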
The calculations necessary to derive molecular behavior from fundamental physical laws, like the Schrödinger equation, are complex and impractical for very large numbers of particles. Because of this, methods that rely on physical laws (also called ab initio) are hard to scale and constrained by the availability of computing capacity. This is the case for biological systems, which have very large and complex structures interacting with each other. Proteins, for example, the fundamental structural and functional building blocks of cells, can have thousands of amino acids and tens of thousands of atoms.
In 2020, AlphaFold marked a drastic advance in molecular simulation. AlphaFold is a deep learning system that approximates protein shapes. It leverages a large library of structures discovered experimentally over the course of several decades, which are available in the Protein Data Bank. The AlphaFold team computed the structure of essentially every protein in humans and multiple other species.
Previously, protein structures were discovered experimentally using X-ray crystallography, nuclear magnetic resonance or electron microscopy, a process that requires months or years of work. This new method effectively helped the scientific community save decades of effort.
The application of AI to protein folding was lauded as the 2021 scientific breakthrough of the year by Science magazine. However, the most significant contribution of AlphaFold in the long term may not be the protein structures themselves, but bringing the ideas of approximate physics to the forefront of the scientific process. At its core, AlphaFold has no knowledge of the Schrödinger equation, classical mechanics, physics or biology. It simply extracts insights from a library of geometric shapes and applies those insights to new shapes. It only understands statistical regularity, just like other machine learning systems.
The methods of approximate physics are such a radical departure from traditional scientific thought that it’s worth looking at an example in more detail. Imagine an experiment in which we drop a solid ball from a particular height, and we want to predict when it will touch the ground. One approach to answer this question would be to use the law of gravity to predict the acceleration on the ball, and given the height from which it’s dropped, calculate the time it would take to reach the ground.
Another approach would be to throw several thousand balls of similar shapes, weights and material composition, and then build a statistical model which takes the features of the original ball into account and gives an approximate answer. This solution could be even more precise than the first one, because it takes into account air friction and other physical realities that our simplistic ab initio model may have overlooked.
On a practical basis the second model could be strictly better. It could be faster to calculate, and is probably more accurate for the use cases that we care about. But here is the key difference: it gives no insight at all into why the phenomenon is the way it is. It has no knowledge of gravity, friction, pressure, air turbulence, or any other physical feature of the experiment. In a way, the second method finds a shortcut through physical knowledge and computational models to give us an answer, which can be very precise, but is based on statistical data rather than theoretical knowledge.
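The two approaches to the ball experiment can be sketched side by side. This is a toy illustration under stated assumptions, not a real experiment: the "measurements" are generated by a hypothetical stand-in, simulated_drop_time, which uses the closed-form fall time under quadratic air drag plus a little timing noise. The drag coefficient, the noise level, the height range and the choice of a nearest-neighbor average as the statistical model are all arbitrary. For simplicity the only feature of each drop is its height.

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2


def ab_initio_fall_time(height_m):
    """First-principles answer in a vacuum: h = g * t**2 / 2, so t = sqrt(2h / g)."""
    return math.sqrt(2.0 * height_m / G)


def simulated_drop_time(height_m, drag_per_mass=0.02, noise_s=0.0, rng=None):
    """Hypothetical stand-in for a measurement: fall from rest with quadratic air drag.

    drag_per_mass (1/m) and noise_s (seconds) are assumed values, not real data.
    """
    t = math.acosh(math.exp(drag_per_mass * height_m)) / math.sqrt(G * drag_per_mass)
    if rng is not None and noise_s > 0.0:
        t += rng.gauss(0.0, noise_s)  # imperfect stopwatch
    return t


def record_drops(n=2000, seed=0):
    """Drop n balls from random heights and record (height, time) pairs."""
    rng = random.Random(seed)
    heights = [rng.uniform(1.0, 50.0) for _ in range(n)]
    return [(h, simulated_drop_time(h, noise_s=0.01, rng=rng)) for h in heights]


def statistical_fall_time(height_m, drops, k=20):
    """Average the times of the k recorded drops closest in height. No physics involved."""
    nearest = sorted(drops, key=lambda d: abs(d[0] - height_m))[:k]
    return sum(t for _, t in nearest) / len(nearest)


if __name__ == "__main__":
    drops = record_drops()
    height = 20.0
    truth = simulated_drop_time(height)  # noise-free reference value
    print("ab initio error (s):  ", abs(ab_initio_fall_time(height) - truth))
    print("statistical error (s):", abs(statistical_fall_time(height, drops) - truth))
```

In this setup the statistical model ends up closer to the measured time because the data already contain the effect of air drag, yet nothing in statistical_fall_time mentions gravity or friction. It also has nothing useful to say about heights outside the range it was trained on.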
Of course, everything in the natural sciences is approximate. Physical laws are approximations to material reality. The equations that we use to describe these laws often include approximations in their derivation. Then, these equations are typically approximated numerically, and the computers that we use for those numerical calculations make approximations as well, given their limited precision. But there is an overarching attempt to stay as true to reality as possible (often with provable error bounds). Not here. Here we are throwing that out the window and saying that, if it ends up close enough to the end result that we want, we don’t care where it came from.
A few important questions come up: are there cases in which the statistical approach breaks down? When can we be reasonably certain that our answer is close to the reality indicated by experiments? When is it too far to be of practical use? When can we use statistical methods to derive theoretical truths? We don’t know yet.
Despite the lack of guarantees, the power of this method is so significant that there has been a Cambrian explosion of research using it, to the point that tasks which seemed impossible within structural biology just a couple of years ago can now be achieved easily. This has spawned a tremendous amount of work extending these methods across structural biology. A golden age.
A few highlights:
Approximate physics doesn’t give us any insight about the underlying laws of nature. But it enables us to do things that were very hard before. Because of this, it’s easy to underestimate how transformational it can be. It reminds me of the radical reduction in the cost of gene sequencing, from hundreds of millions to hundreds of dollars.
To conclude, a bit of speculation. How will this research evolve in the next few years? Here are my predictions: