Simulating 500 million years of evolution with a language model
"Biology is the most advanced technology that has ever been created, far beyond anything that people have engineered. The ribosome is programmable—it takes the codes of proteins in the form of RNA and builds them up from scratch"
EvolutionaryScale, a company founded by former members of Meta's Fundamental AI Research lab, has just announced ESM3, the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins.
ESM3 is a game-changer. It allows scientists to not just better understand proteins, but to create new ones.
The model was trained with an astonishing 1 trillion teraFLOPs of compute (roughly 10^24 floating-point operations), more than any other biological model to date. The dataset? A massive 2.78 billion proteins drawn from across the Earth’s natural diversity.
What makes ESM3 truly revolutionary is that it’s the first generative model in biology that can handle the sequence, structure, and function of proteins all at once. This opens up a whole new frontier for scientific innovation.
How does it work?
ESM3 has a straightforward goal. For each protein, it looks at the sequence, structure, and function. Each of these is broken down into tokens, and some of the tokens are hidden. ESM3’s job is to guess the hidden parts, much like language models guess missing words. To do this well, ESM3 must deeply understand how sequence, structure, and function are connected.
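To make the "hide and guess" idea concrete, here is a minimal Python sketch of the masking step only. It is not EvolutionaryScale's code: the three tracks, the toy tokens, the `<mask>` symbol, and the function names `mask_tokens` and `mask_protein` are all made up for illustration, and the real model uses its own token vocabularies.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for this sketch


def mask_tokens(tokens, mask_rate, rng):
    """Hide a random subset of tokens.

    Returns the masked list plus a dict {position: original token}
    holding the answers a model would be trained to guess.
    """
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def mask_protein(protein, mask_rate=0.5, seed=0):
    """Apply masking independently to every track of one protein."""
    rng = random.Random(seed)
    masked, targets = {}, {}
    for track, tokens in protein.items():
        masked[track], targets[track] = mask_tokens(tokens, mask_rate, rng)
    return masked, targets


# Toy protein with three aligned tracks (all values are illustrative).
protein = {
    "sequence": list("MSKGEELFTG"),
    "structure": ["H", "H", "H", "C", "C", "E", "E", "E", "C", "C"],
    "function": ["fluorescence"] * 10,
}

masked, targets = mask_protein(protein)
for track in protein:
    print(track, masked[track])
```

Training would then consist of showing the model `masked` and scoring its guesses against `targets`. Because a hidden sequence token can sit next to a visible structure or function token, the model has to learn how the three tracks relate to one another.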
By working with billions of proteins and parameters, ESM3 learns to mimic evolution.
Once trained, ESM3 can generate new proteins based on prompts. Scientists can guide ESM3 to create proteins for various uses, like medicine, research, and clean energy.
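One common way to generate with a masked model is to start from a partly filled-in prompt and fill the blanks over several rounds, keeping the most confident guesses each time. The sketch below shows that loop under heavy assumptions: `toy_predict` is a random stand-in rather than ESM3, and the prompt string, the `_` mask symbol, and the step count are invented for the example.

```python
import random

MASK = "_"  # hypothetical symbol for a position the model should fill in
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_predict(tokens, position, rng):
    """Stand-in for a trained model: returns (guess, confidence).

    A real model would look at the surrounding tokens; this one ignores
    them and guesses at random, purely to make the loop runnable.
    """
    return rng.choice(AMINO_ACIDS), rng.random()


def generate(prompt, steps=4, seed=0):
    """Fill masked positions over several rounds, most confident first."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Spread the remaining blanks over the remaining rounds
        # (ceiling division, so the last round fills everything left).
        remaining_steps = steps - step
        n_fill = -(-len(masked) // remaining_steps)
        guesses = [(i, *toy_predict(tokens, i, rng)) for i in masked]
        guesses.sort(key=lambda g: g[2], reverse=True)
        for i, amino_acid, _confidence in guesses[:n_fill]:
            tokens[i] = amino_acid
    return "".join(tokens)


prompt = "MSK____FTG___"  # fixed residues act as the prompt, "_" is free
design = generate(prompt)
print(design)
```

The fixed letters in the prompt are never overwritten, which is what gives the scientist control: the same loop could be prompted with structure or function tokens instead of, or alongside, sequence.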
ESM3’s ability to understand and combine sequence, structure and function allows scientists to create new proteins with great control.
Say Hi To A New Green Fluorescent Protein
In their scientific preprint, EvolutionaryScale announce that they have synthesised a new green fluorescent protein (GFP), dubbed esmGFP, that is about as bright as the GFPs found in nature (am I alone in wanting protein-based lighting for my home?).
The power of the generative model shows in how far this new protein sits from any GFP occurring in nature, where new GFPs take a very long time to emerge. To quote their press release:
"The process of evolution that gives rise to new fluorescent proteins takes epochs of time—the story of this protein family reaches back into depths of natural history and geologic time where somewhere in the distant past nature invented the first fluorescent protein. Natural fluorescent proteins have diverged over 100s of millions of years from ancestral sequences in deep history to become the proteins they are today."
In effect, ESM3 is simulating hundreds of millions of years of natural evolution!
Why does this matter?
In the 1970s, the start of the recombinant DNA era changed molecular biology dramatically: scientists invented genetic engineering. This led to a revolution in our understanding of genetics, the decoding of the human genome, and groundbreaking new medicines.
Today, making biology programmable and exploring the possible sequences, structures, and functions of molecules signals the start of a similar revolution. This will lead to numerous medical advances and significant scientific progress.
That's it for now, I'm all out of coffee! Talk soon!