By John Castle, Ph.D., Chief Data and Information Officer, Monte Rosa Therapeutics
Targeted protein degradation (TPD) is an innovative approach in medicine that eliminates disease-causing proteins by degrading them. Monte Rosa Therapeutics specializes in a specific class of targeted degraders called molecular glue degraders (MGDs). We’re pioneering the use of artificial intelligence (AI) and generating data at scale to discover MGDs that target proteins once thought to be undruggable, allowing us to pursue cancers, autoimmune diseases, and other conditions.
Targeted protein degraders bring together a target protein of interest (POI) and an E3 ligase, which tags the POI with ubiquitin so that the cell’s proteasome degrades it. An MGD redirects this cellular process in a specific way: it binds the E3 ligase, and the resulting MGD-E3 ligase complex presents a remodeled ligase surface (a neosurface) that lets the ligase and POI fit together like two puzzle pieces. Within treated cells, this MGD-E3 ligase complex acts over and over again, repeating the tagging process to eliminate multiple copies of the disease-causing protein.
The Role of AI and Computational Chemistry
While MGDs can selectively eliminate proteins that are undruggable by other approaches, discovering them has historically been difficult because it is immensely challenging to predict how a small molecule will induce an E3 ligase and a target protein to interact. We’ve overcome this challenge by generating data at scale in our labs and feeding it into our breakthrough AI engines, which lets us model molecular interactions at high throughput, discover promising compounds, and rationally engineer them into effective drug candidates.
Introducing the QuEEN Discovery Engine
We’ve built the QuEEN Discovery Engine at the interface of multiple technologies and disciplines, including chemistry, biology, software, machine learning (ML), and automation, to rewrite the discovery of small-molecule MGDs. QuEEN’s highly optimized proteomics, structural biology, next-generation sequencing (NGS), and screening platforms generate data at scale, which our algorithms use to methodically identify and engineer drug-like MGDs that degrade disease-causing proteins.
Our Discovery Process Explained
Leveraging our data, we apply our suite of proprietary algorithms across six distinct steps to accelerate the process of MGD discovery and continually improve the accuracy of our predictions.
- Building a hot-list of target opportunities: Mining our growing wealth of data, from proteomics to protein structures, we’ve built the first AI algorithms that specifically identify MGD target opportunities. Our fAIceit algorithm employs ultra-fast, three-dimensional deep learning to sift through thousands of proteins and find those with structural features, called degrons, that allow an E3 ligase to bind them and tag them for degradation. At the same time, we’ve built highly optimized ML pipelines that ingest and process genetic and phenotypic disease associations to identify critical disease-causing proteins. We intersect these lists of degradable proteins and disease-causing proteins to create a prioritized list of target opportunities.
- Analyzing E3 ligases: Each of the more than 600 human E3 ligases has a unique surface. Using fAIceit, we’ve analyzed the surface of each ligase, integrating features such as small-molecule binding pockets and protein-protein interaction (PPI) hotspots, to predict how readily each ligase can be reprogrammed by an MGD. We then train fAIceit to identify the best “jigsaw puzzle” matches between target opportunities and E3 ligases.
- Identifying starting points: Next, we identify small-molecule fragments that bind to the E3 ligase and serve as a foundation for developing MGDs. We’ve developed a new algorithm, Headlong, that integrates protein surfaces, docking, conformational strain, and structural modeling to virtually screen nearly a billion fragments. We’ve validated its ability to identify binders, even nanomolar (nM) binders, as starting points from which our algorithms build virtual libraries of novel, full-size small molecules (called FLASH libraries).
- Keeping it real: Too often, computational small-molecule design leads into chemistry neverland, predicting molecules that have poor ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties or are difficult for medicinal chemists to synthesize in the lab. By combining external data with our rapidly growing internal MGD datasets, we’ve created ML models (called GlueAID) that accurately predict the molecules’ ADMET properties and identify which ones can be made in the lab. Applying these predictions from library design through lead optimization keeps our MGDs drug-like and can even guide their ability to reach specific tissues, such as the central nervous system (CNS).
- Perfecting the fit: Our AI-powered Rhapsody algorithm generates models of the ternary complex formed by the MGD, the E3 ligase, and a target protein. These models give us the first insights into how the MGD creates the neosurface that the target protein binds. This has been critical: we’ve discovered more than ten novel PPI binding modes for the E3 ligase cereblon that are induced by our diverse MGDs. As we proceed from hit expansion to lead identification and optimization, the ternary complex models, and later experimentally determined X-ray or cryoEM structures, power increasingly informative virtual screening and rapid MGD optimization.
- Active learning: Tight teamwork between our computational chemists and our screening team, structural biologists, and medicinal chemists not only turbocharges project-centric work but also generates a wealth of discoveries that feed back into our knowledge base, steadily improving our predictive power for new targets and MGD discovery. With more than 100 E3 ligase structures in our cryoEM database, we have critical knowledge of ligase surfaces for training fAIceit and Headlong to expand our target space and identify MGDs for any E3 ligase. Our large proteomics and screening databases allow us to computationally connect MGD features to protein surfaces and predict on- and off-target activity for MGDs. We continually retrain GlueAID on our growing repository of ADMET data. Together, these advances have sped up our work to the point that we can go from screening hits to validated series of degraders in less than two months.
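The intersection described in the hot-list step can be illustrated with a short Python sketch. This is not fAIceit or any Monte Rosa code: the two scores, the product-based ranking, the placeholder protein names, and the names `TargetOpportunity` and `prioritize_targets` are all hypothetical. The sketch only shows one simple way two scored protein lists might be intersected and ranked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetOpportunity:
    protein: str
    degradability: float  # hypothetical score: how likely the protein carries a degron (0-1)
    disease_link: float   # hypothetical score: strength of disease association (0-1)

    @property
    def priority(self) -> float:
        # Assumption: a simple product, so a protein must score well on both lists.
        return self.degradability * self.disease_link


def prioritize_targets(
    degradable: dict[str, float], disease_linked: dict[str, float]
) -> list[TargetOpportunity]:
    """Intersect two scored protein lists and rank the overlap, best first."""
    shared = degradable.keys() & disease_linked.keys()
    opportunities = [
        TargetOpportunity(p, degradable[p], disease_linked[p]) for p in shared
    ]
    # Sort by descending priority; break ties by name so the order is reproducible.
    return sorted(opportunities, key=lambda t: (-t.priority, t.protein))


if __name__ == "__main__":
    # Made-up placeholder proteins and scores, for illustration only.
    degradable = {"PROT_A": 0.9, "PROT_B": 0.4, "PROT_C": 0.75}
    disease_linked = {"PROT_A": 0.8, "PROT_B": 0.7, "PROT_C": 0.6, "PROT_D": 0.95}
    for t in prioritize_targets(degradable, disease_linked):
        print(f"{t.protein}: {t.priority:.2f}")
```

Here PROT_D is dropped because it appears only on the disease list, and the remaining proteins are ranked PROT_A, PROT_C, PROT_B. Using a product rather than a sum means a protein that is strongly disease-linked but shows no degradability signal cannot rise to the top.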
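For the Keeping it real step, a minimal rule-based filter shows the kind of triage that a learned model such as GlueAID automates. This is a stand-in, not GlueAID: it uses Lipinski’s rule of five, a well-known public drug-likeness heuristic (molecular weight at most 500 Da, cLogP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors), together with a hypothetical synthesizability score and cut-off. The names `Candidate`, `lipinski_violations`, and `triage_candidates` are illustrative, and the property values are typed in by hand rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mol_weight: float   # molecular weight in Da
    clogp: float        # calculated octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    synth_score: float  # hypothetical predicted ease of synthesis, 0 (hard) to 1 (easy)


def lipinski_violations(c: Candidate) -> int:
    """Count how many of Lipinski's rule-of-five limits the candidate exceeds."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def triage_candidates(
    candidates: list[Candidate], max_violations: int = 1, min_synth: float = 0.5
) -> list[Candidate]:
    """Keep candidates that look drug-like and are (hypothetically) predicted to be makeable."""
    return [
        c for c in candidates
        if lipinski_violations(c) <= max_violations and c.synth_score >= min_synth
    ]


if __name__ == "__main__":
    # Made-up compounds for illustration only.
    library = [
        Candidate("cmpd_1", 420.5, 2.8, 2, 6, 0.8),
        Candidate("cmpd_2", 610.0, 6.2, 3, 9, 0.9),  # too heavy and too lipophilic
        Candidate("cmpd_3", 380.0, 3.1, 1, 5, 0.2),  # drug-like but hard to make
    ]
    print([c.name for c in triage_candidates(library)])
```

Allowing one violation by default follows the common reading of the rule of five; a learned model replaces these fixed cut-offs with predictions trained on measured data, which is what allows it to steer properties such as tissue access rather than only screen them out.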
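The feedback loop described under Active learning resembles textbook active learning: a model chooses which experiments to run next, and the results are used to improve the model. The sketch below is a generic, deliberately tiny version of that idea, not Monte Rosa’s system. It swaps in a 1-nearest-neighbour surrogate, uses the distance to the nearest measured point as a crude uncertainty estimate, and simulates the assay with an ordinary Python function. All names (`predict`, `active_learning`, `beta`) are hypothetical.

```python
import math
from typing import Callable, Iterable

Point = tuple[float, ...]


def predict(x: Point, labeled: dict[Point, float]) -> tuple[float, float]:
    """1-nearest-neighbour surrogate: return (predicted value, distance to that neighbour).

    The distance doubles as an uncertainty proxy: points far from anything
    already measured are the ones the model knows least about.
    """
    nearest = min(labeled, key=lambda p: math.dist(x, p))
    return labeled[nearest], math.dist(x, nearest)


def active_learning(
    pool: Iterable[Point],
    assay: Callable[[Point], float],
    seed: Iterable[Point],
    rounds: int = 3,
    batch: int = 2,
    beta: float = 1.0,
) -> dict[Point, float]:
    """Alternate between model-guided selection and (simulated) measurement."""
    labeled = {tuple(x): assay(x) for x in seed}
    remaining = [tuple(x) for x in pool if tuple(x) not in labeled]

    def acquisition(x: Point) -> float:
        # Optimistic score: predicted value plus a bonus for being unexplored.
        mean, uncertainty = predict(x, labeled)
        return mean + beta * uncertainty

    for _ in range(rounds):
        if not remaining:
            break
        remaining.sort(key=acquisition, reverse=True)
        picked, remaining = remaining[:batch], remaining[batch:]
        for x in picked:
            labeled[x] = assay(x)  # each new measurement informs the next round
    return labeled


if __name__ == "__main__":
    # Toy 1-D "chemical space" whose hidden optimum at x = 7 stands in for an assay.
    pool = [(float(i),) for i in range(11)]
    results = active_learning(pool, lambda x: -(x[0] - 7.0) ** 2, seed=[(0.0,), (10.0,)])
    best = max(results, key=results.get)
    print(f"measured {len(results)} points; best so far: {best}")
```

The `beta` parameter trades exploration against exploitation: setting it to zero makes the loop always test whatever the surrogate currently predicts is best, while larger values push it toward unexplored regions.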
A Highly Optimized, Continually Improving Drug Discovery Engine
At Monte Rosa, we’ve always focused on pushing the boundaries of AI with our data streams. By using our powerful algorithms to rewrite MGD discovery, we’re identifying new drug candidates that eliminate disease-causing proteins and, we hope, will make a difference in patients’ lives.