PIKASO & Artificial Intelligence: Pushing the boundaries of traditional drug discovery
Jordan Berg, Peter McCaffrey, Riccardo Guidi, Jesus Barajas, and Orly Levitan
At Pragma Bio, artificial intelligence (AI) is central to our PIKASO discovery engine. The intelligent infusion of AI into our Find-Make-Test engine transforms small molecule discovery by streamlining and enhancing the process of identifying promising drug candidates. By integrating AI at each step, PIKASO is able to provide unique drug candidates (DCs) in a faster, more efficient, and more targeted manner in what is otherwise a serendipitous, inefficient, risk-laden, and lengthy process. Furthermore, we developed PIKASO Co-pilot, a large language model that is constantly updated with new Pragma data, and places our data in the context of historical internal data and external knowledge. In this article, we will highlight how AI enables PIKASO to identify and create transformative medicines.
1. Find - A comprehensive small molecule blueprint training database
At Pragma Bio, we start our discovery process with computational de novo discovery of clinically- and phenotypically-associated Biosynthetic Gene Clusters (BGCs), which encode the genetic instructions for producing a wide array of secondary metabolites. By focusing on these secondary metabolites, our approach delivers novel insights into drug discovery through a precise understanding of the specific molecules, produced by microbial strains, that drive therapeutic outcomes. This approach concomitantly allows us to optimize microbial-based therapies and supplements by identifying and leveraging the key chemical, and thus causative, outputs. The majority of microbiome-derived chemicals are presently uncharacterized, with limited understanding of how these small molecules interact with their host and environment and how they influence health and disease. This gap provides fertile opportunity for the development of new therapeutics.
Key to any quality artificial intelligence system is a high-quality training set. The microbiome is, at its core, a database. This database captures a rich evolutionary history of how microbes and the chemicals they produce and metabolize have been optimized to interact with their hosts. The patterns that emerge from this database reflect linkages between genomics, disease risk factors, environmental exposures, microbial ecology, and biochemistry. Through the curation of hundreds of terabytes of high-quality metagenomics, metatranscriptomics, and metaproteomics data, we have built the world’s most comprehensive health-associated BGC database. This unique database currently represents over 12,000 genetically and biochemically distinct BGCs, over 500,000 microbial BGC genes, and tens of thousands of human and animal samples across multiple health areas. From this foundational database, we can better identify BGCs that can produce molecules with the best potential to be bioactive in different patients and populations.
2. Find - Enhanced BGC discovery and annotation for disease-relevant insights
Leveraging the immense scale of the Pragma BGC database, we developed intuitive and innovative methods for traversing this database to discover the most relevant health-related signatures and identify the BGCs most likely to create products that can improve health. Our proprietary BGC annotation pipeline (patent pending) combines the best of traditional sequence homology with deep learning protein structure prediction embeddings to comprehensively annotate all BGC features. As a result, we can confidently annotate 90% of BGC genes functionally, compared to 30% with the leading open-source tools.
Once these annotations are incorporated into PIKASO, our multi-modal data (encompassing experimental and computational data, literature, DNA and protein sequence embeddings, and biochemical pathways) are integrated into a complex, highly interconnected graph database. This allows PIKASO Co-pilot to surface connections to our BGCs that would otherwise be difficult to extract. It does so by capturing both explicit and implicit associations between diseases, microbes, molecular biology, and scientific concepts, and by linking recent Pragma experimental data with historical patterns to reveal unexpected relationships.
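To make the idea of surfacing connections in a graph concrete, here is a minimal sketch in Python. It is not PIKASO's implementation, and it does not reflect the actual schema of the Pragma graph database. Every node name, edge, and function below is invented for illustration, and a plain breadth-first search stands in for whatever traversal or inference methods are really used.

```python
from collections import deque

# Toy undirected graph. All node names and edges are hypothetical
# placeholders, not real Pragma data or a real database schema.
EDGES = [
    ("BGC:example_1", "Microbe:example_strain"),
    ("BGC:example_1", "Metabolite:example_class"),
    ("Metabolite:example_class", "Pathway:example_pathway"),
    ("Pathway:example_pathway", "Disease:example_condition"),
    ("Microbe:example_strain", "Sample:cohort_A"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "BGC:example_1", "Disease:example_condition")
print(" -> ".join(path))
```

In this toy graph, the BGC and the disease are never linked directly; the association only appears by walking through the intermediate metabolite and pathway nodes, which is the kind of implicit connection a graph representation makes easy to query.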
3. Make - Automated synthetic biology ranking & optimization
After creating the Pragma BGC-based database, our expert synthetic biologists and computational team developed a proprietary AI-based scoring system for ranking BGCs and downstream screening using an optimal microbial chassis. This system is based on the initial manual evaluation of BGC sequences done at Pragma Bio. These evaluations include a scoring system that ranks BGCs based on A) completeness, B) metabolite novelty, C) in-lab synthesis feasibility, and D) health outcomes associated with the BGC. Using the comprehensive know-how generated by our team, we have created a self-reinforcing AI that is able to improve the assignment of BGCs by as much as 85%, is continuously improving, and automates future BGC annotations and predictions. This turns what was at first a days-long process for a set of new BGCs into an hours-to-minutes-long exercise.
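As a rough illustration of ranking on the four criteria above, the following Python sketch combines them into a single weighted score. This is a simplified stand-in, not the proprietary scoring system: the weights, the 0-to-1 scale of each criterion, the class and function names, and the example candidates are all assumptions made for the example. A learned model would replace the fixed weights.

```python
from dataclasses import dataclass

# Hypothetical weights. The article does not say how the four
# criteria are combined, so a simple weighted sum is assumed here.
WEIGHTS = {
    "completeness": 0.3,
    "novelty": 0.3,
    "feasibility": 0.2,
    "health_association": 0.2,
}


@dataclass(frozen=True)
class BGCCandidate:
    """One BGC with each criterion scored on an assumed 0-to-1 scale."""
    name: str
    completeness: float        # A) completeness of the cluster
    novelty: float             # B) metabolite novelty
    feasibility: float         # C) in-lab synthesis feasibility
    health_association: float  # D) associated health outcomes


def score(candidate):
    """Weighted sum of the four criteria."""
    return sum(w * getattr(candidate, key) for key, w in WEIGHTS.items())


def rank(candidates):
    """Return candidates sorted from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Made-up candidates for demonstration only.
candidates = [
    BGCCandidate("bgc_a", 0.9, 0.2, 0.8, 0.5),
    BGCCandidate("bgc_b", 0.7, 0.9, 0.6, 0.8),
]
for c in rank(candidates):
    print(f"{c.name}: {score(c):.2f}")
```

Keeping the criteria as separate fields rather than a single number makes it possible to inspect why a BGC ranked where it did, and to adjust the weighting as expert evaluations accumulate.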
4. Make - Analytical chemical characterization of BGC-derived small molecules
With PIKASO, Pragma Bio reinvents natural products discovery using multi-faceted technological advancements, from BGC evaluation to synthetic biology and small molecule characterization, each infused thoughtfully with AI. Due to the rich chemical diversity found in nature, the chemical structures of the BGC products are often unknown. Even if known, it is likely that an analytical chemical standard does not exist for direct structure comparison, further complicating mass spectrometry validation. Characterizing and identifying new BGC-derived products is a challenge, and the question is often: how do we know we've made a new chemical product?
To mitigate this challenge, our analytical chemistry platform uniquely combines high-resolution mass spectrometry, traditional statistical analysis, and artificial intelligence methods for coarse-grained and fine-tuned spectral matching across disparate datasets to provide the best possible characterization of novel BGC products. We apply a wide filter to highlight potentially interesting molecules that are produced by our BGC-containing samples and not in our background control samples. We combine these outputs with our deep expertise in biochemical pathways to construct the most plausible synthesis routes of our BGC products. This comprehensive approach allows us to generate hypotheses about the chemical moieties present and plausible in our unknown metabolite(s). For more in-depth analysis, we then leverage deep learning models to compare our experimental spectra against vast databases of known compounds, and enable transfer learning between compound databases or mass spectrometry platforms.
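Two of the steps described above, the background-control filter and spectral matching, can be sketched in a few lines of Python. This is an illustrative simplification rather than the platform's actual method: the fold-change threshold, the intensity floor, the bin width, the feature identifiers, and the function names are all assumptions, and a plain binned cosine similarity stands in for the statistical and deep learning matching the text refers to.

```python
import math


def enriched_features(sample, control, min_fold=10.0, floor=1.0):
    """Keep features far more intense in the BGC sample than in control.

    `sample` and `control` map feature IDs to intensities and are assumed
    to be aligned already. `floor` avoids division by zero for features
    absent from the control. Both thresholds are illustrative.
    """
    return {
        feature: intensity
        for feature, intensity in sample.items()
        if intensity / max(control.get(feature, 0.0), floor) >= min_fold
    }


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity of two spectra given as (m/z, intensity) pairs.

    Peaks are grouped into fixed-width m/z bins before comparison.
    """
    def binned(spectrum):
        bins = {}
        for mz, intensity in spectrum:
            key = round(mz / bin_width)
            bins[key] = bins.get(key, 0.0) + intensity
        return bins

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


# Made-up intensities for demonstration only.
sample = {"feat_1": 1000.0, "feat_2": 50.0, "feat_3": 500.0}
control = {"feat_1": 5.0, "feat_2": 40.0}
print(sorted(enriched_features(sample, control)))
```

The filter is deliberately wide, in keeping with the text: it only asks whether a feature is strongly enriched over background, and leaves the finer question of what the molecule is to the spectral comparison step.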
5. Test - Preclinical model selection and drug modality prediction
Testing small molecules in mammalian models is expensive and time-consuming. Choosing the wrong disease model for your small molecule can be even more devastating and further contributes to the “great pharma wasteland”. Using our proprietary database, we are able to determine the most likely diseases where our BGC is relevant. Based on our initial BGC annotations, we analyze the BGC product class to identify the appropriate experimental screens that are most likely to contain the BGC product’s target.
Once plausible targets have been identified from experimental screening, we need to understand how our small molecule acts, and importantly, whether or not it provides a new drug modality in cases where a drug for the host target already exists. Using our ensemble docking pipeline, we combine the best of generalizable ML-based blind docking with traditional targeted docking methods to assess docking plausibility at known receptor and allosteric sites in addition to predicting all other possible interaction surfaces. To give us the best chance of defining this space and to aid in designing more bioactive forms of our small molecule, we generate hundreds of rational chemical derivatives to dock in parallel with our known BGC product. Once this comprehensive search space is defined, we perform targeted docking simulations to confirm that a small molecule pose is physically possible and identify key residues regulating the interaction. We then computationally test specific residue-level models using methods such as molecular dynamics simulations and can use these as a filter to determine how to allocate time and resources for further experimental validation.
6. Tying it all together - Automation, continuous integration, retrieval, and learning through PIKASO Co-pilot
Pragma aims to tighten and close the small molecule discovery loop by placing model predictions and experimental results in direct dialog with one another through PIKASO Co-pilot. Co-pilot’s ability to retrieve, integrate, and infer across various data types (including genomic sequences, chemical compound structures, and clinical trial results) ensures that researchers have a comprehensive view of a BGC, its small molecule product, and its biologically- and clinically-relevant contexts. These capabilities allow Pragma’s interdisciplinary teams to stay ahead of emerging trends, reducing the time it takes to move from discovery to actionable insights and chemistry, and ultimately improving the speed and efficiency of drug development. Search and retrieval, as implemented through AI, remains a core pillar of Pragma Bio’s innovative approach to modern drug discovery. These tools allow more efficient and standardized design-build-test cycles, which enable the usage of frameworks common in AI model optimization to similarly optimize wet lab experimental designs and outcomes (see Pragma’s case study).
Conclusions
Pragma’s PIKASO is an AI-enabled platform designed to revolutionize small molecule discovery. By combining the strengths of both wet lab and dry lab approaches, we strategically use AI where it makes the most impact, allowing us to maximize efficiency in terms of time, resources, and costs. This approach enables us to extract more value from every experiment, identifying top drug candidates with fewer resources, while ensuring that our methodologies are always precisely tailored to specific biological challenges. Our mission remains focused on leveraging our rich microbiome-derived small molecule database to translate microbiology into actionable chemistry, unlock new therapeutic avenues, and drive innovation in drug discovery. PIKASO’s intelligent integration of AI ensures that we stay at the forefront of this evolving field, pushing the boundaries of what’s possible in modern medicine.
If you would like to know more about how PIKASO can accelerate your own small molecule discovery efforts, please reach out to us on LinkedIn or through our website - https://www.pragmabio.com/.