LxMs in Biological Intelligence

by Vic Singh and Archit Gadhok

Introduction

In our previous post, we wrote about the rise of LxMs and ‘foundation models’. We discussed how popular foundation models, based on transformer and diffusion architectures, are being deployed for content generation. These models are like ‘super-brains’ that can understand and create language almost like we do. Through conversations with multiple experts (investors, operators, researchers and so on) and our own assessment of this space, we identified a framework, the Verticals (x in LxMs) X Capabilities framework, and three interesting verticals: Scientific intelligence, Industrial intelligence and Gaming.


Fig: Verticals (x in LxMs) X Capabilities framework


This post delves deep into one core area of Scientific intelligence, Biology. We explore the new foundation models, which we dub LxMs, that are increasingly deployed to help scientists study living things such as plants, animals, and humans. The impact they’re beginning to have on life sciences is both surprising and promising.

Scientists are busy unraveling the mysteries of life, from mapping the genome, to decoding DNA, to understanding how diseases spread, and foundation models are helping them in this complex work. Biological systems are vast and intricate, and foundation models are exceptionally good at analyzing large datasets, which makes it easier and faster for researchers to draw meaningful conclusions.

In this post we will start by painting the current lay of the land, explore recent breakthroughs, introduce you to some innovative companies leading the charge, and finally end with some of our perspectives on lessons for investors in this space.

The implications of these advancements extend far beyond academic interest. They hold the potential to revolutionize healthcare, agriculture, and environmental science, among other areas. So, whether you’re a science enthusiast or just curious about the future of technology, this exploration will shed light on how computational power is unlocking new possibilities in the study of life.


“Biology is becoming engineering and not just science”

The pace of development in BioAI has been extremely rapid, with new models released seemingly every other week. Multiple players are throwing their hats in the ring: established tech giants such as Alphabet (with DeepMind’s flagship AlphaFold 2) and Nvidia (BioNeMo), research labs (Evo by the Arc Institute), and numerous startups that have been raising funds as frequently as new models get released.

In a recent interview, Nvidia CEO Jensen Huang noted how Nvidia’s AI processing chips could transform the science of life, famously quipping that ‘Biology is becoming engineering and not just science’. Backing that vision, Nvidia has over the past year launched a set of biology-focused generative-AI services, known as BioNeMo Cloud: a set of pre-trained foundation models focused on the creation of new proteins and therapeutics. BioNeMo not only provides pre-trained models but also allows customization of models with proprietary datasets, helping automate various stages of the drug-discovery pipeline.

Another exciting new model released recently is Evo, developed by the Arc Institute, which represents a significant advancement in the use of AI for biological research.

Evo stands out for its ability to handle complex biological data across multiple modalities (DNA, RNA, and proteins). It can perform zero-shot predictions, meaning it can make accurate predictions without having been directly trained on those specific tasks. This is path-breaking because it demonstrates that a general model like Evo could outperform specific models at specialized functions such as protein design. Evo also has a very long context window (131K tokens), which allows the model to recognize patterns and make predictions about much larger DNA or genomic sequences than was previously possible.

But before we get lost in these amazing developments, let’s take a step back and survey the different use cases that these foundation models are targeting. Broadly speaking, there are three:
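To make the idea of zero-shot prediction concrete, here is a minimal sketch of likelihood-based scoring: a sequence model that has only ever been trained to predict the next base can still rank a mutation by how much it changes the likelihood of the sequence. Everything in it is an illustrative assumption on our part. The order-2 Markov model is a toy stand-in for a large neural model such as Evo, and the function names (`train_markov`, `log_likelihood`, `zero_shot_score`) and the training sequences are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: one token per nucleotide


def train_markov(sequences, k=2):
    """Count (previous k bases -> next base) transitions.

    Toy stand-in for pre-training; a real model would be a neural network.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def log_likelihood(seq, counts, k=2):
    """Sum of log P(base | previous k bases), with add-one smoothing."""
    total = 0.0
    for i in range(k, len(seq)):
        context = counts.get(seq[i - k:i], {})
        numerator = context.get(seq[i], 0) + 1
        denominator = sum(context.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def zero_shot_score(wild_type, mutant, counts, k=2):
    """Log-likelihood ratio of mutant vs. wild type.

    Negative values mean the model finds the mutant less plausible,
    even though it was never trained on any mutation-effect labels.
    """
    return log_likelihood(mutant, counts, k) - log_likelihood(wild_type, counts, k)


if __name__ == "__main__":
    # Hypothetical "pre-training" data and a single-base substitution.
    model = train_markov(["ATGATGATGATG", "ATGATGATG"])
    print(zero_shot_score("ATGATGATG", "ATGCTGATG", model))
```

The design point is that no task-specific labels are involved: the score comes entirely from what the model learned about sequence statistics during pre-training, which is what makes the prediction zero-shot.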

Proteomics and Genomics

  • Proteomics involves the study of proteins produced by an organism or system, and how these proteins interact. Models such as AlphaFold and AlphaFold 2 have revolutionized this field by predicting protein structures from amino acid sequences with high accuracy, long considered a challenging task due to the complex nature of protein folding. This breakthrough is significant for drug design and understanding disease mechanisms. Nvidia’s BioNeMo service specializes in providing pre-trained models for protein development, which can be further fine-tuned using specific proprietary datasets.
  • Genomics, on the other hand, deals with the function, structure, evolution, and mapping of genomes. Here, foundation models help in sequencing genomes more efficiently and analyzing vast amounts of genetic data. This enables researchers to identify genetic mutations associated with diseases and traits, and also assists in the development of personalized medicine where treatments can be tailored based on an individual’s genetic profile. Toronto-based Deep Genomics and the Arc Institute (with Evo) are developing advanced foundation models that can be applied to a range of different therapeutic RNA tasks.

Drug discovery and development

  • In the realm of drug discovery, AI models are used to predict the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of potential drug candidates. This application reduces the need for early-stage, high-throughput screening in wet labs, significantly cutting down both the time and cost of drug development. Moreover, AI-driven models can simulate how drugs interact with the body at the molecular level, improving the prediction of drug efficacy and safety. For example, BioStrand, a subsidiary of ImmunoPrecise Antibodies, has developed a foundation model to analyze universal fingerprint patterns across the biological world. Its current knowledge graph maps 25 billion relationships across ~650 million data objects, offering detailed insight into the relationships between genes, proteins, and other biological pathways. This is crucial for developing antibody drugs and precision medicine.

Disease prediction and diagnosis

  • AI models are extensively used in disease prediction and diagnosis by analyzing complex medical data such as images from MRIs, CT scans, or data from genomic sequences. These models help in identifying patterns that are indicative of specific diseases, which can sometimes be too subtle for human detection. For instance, machine learning models in oncology can predict cancer progression and patient outcomes based on tumor DNA. Moreover, foundation models are also being used in epidemiology to predict disease outbreaks and their spread.

Each of these fields demonstrates the power of foundation machine learning models to not only advance our understanding of complex biological systems but also to enhance our ability to intervene in them in medically and scientifically beneficial ways. These advancements suggest a future where medicine and biology are increasingly data-driven, personalized, and predictive.

Exciting startups/companies in this space

Proteomics and Genomics - Phytoform

  • Phytoform Labs, also known as Phytoform, is a biotechnology company that was founded in 2017. Unlike other companies discussed here which focus on drug discovery, therapeutics and understanding human biology, Phytoform focuses on creating sustainable agricultural practices by developing new crop traits using cutting-edge AI and genome editing technologies.
  • The company's main goal is to minimize the negative environmental impact of agriculture and speed up the process of breeding more resilient and diverse crops. To achieve this, Phytoform has created its own LxM (a bio foundation model) called CRE.AI.TIVE, which targets small changes in DNA sequences that will create maximum impact on crops. The company developed its foundation model as early as 2017, making it one of the pioneers in this space.
  • Phytoform has raised a total of $5.7 million in funding to support its research and development efforts aimed at revolutionizing agricultural biotechnology to better meet future global food demands.

Disclaimer: Vic led the seed round in this company while he was at Eniac Ventures.

Drug Discovery - Recursion

  • Recursion Pharmaceuticals is a TechBio company that stands out in the BioAI space for its innovative approach to drug discovery. Founded in 2013, the company was an early pioneer in the field and has been particularly active in clinical trials and drug development. For instance, Recursion is involved in developing treatments for conditions like Clostridioides difficile colitis and HR-proficient ovarian cancer, which are in various phases of clinical trials.
  • Recursion differentiates itself on technology through multiple strategic partnerships: with data providers, to get access to proprietary data, and with tech firms like Nvidia, to develop foundation models for drug discovery on Nvidia’s BioNeMo platform.
  • Recursion recently launched a new series of foundation models, starting with Phenom-Beta, which is available on the Nvidia BioNeMo platform. The model uses cell imaging to study how cells react to various chemical and genetic changes, a field known as phenomics. Phenom-Beta is designed to convert large volumes of cellular images into data that can be easily analyzed, supporting the development of new drugs. The release marks a significant step in making Recursion's advanced research tools more accessible to the broader scientific community, and Recursion will continue to develop models that further drug discovery.

Drug Discovery - Bioptimus

  • Bioptimus is a Paris-based generative AI startup that is planning to use the power of AI foundation models to capture the laws of biology. In this process, the startup plans to accelerate scientific breakthroughs and innovations in biomedical and environmental science.
  • While not much has been written publicly about the exact use cases that Bioptimus is targeting, its co-founder and non-executive chairman is Jean-Philippe Vert, the Chief R&D Officer at Owkin, the French biotech unicorn that operates in AI-based drug discovery and diagnostics.
  • Bioptimus claims to differentiate itself through its world-class team and its access to high-quality, multi-modal patient data licensed through partnerships.
  • Bioptimus has raised a $35M seed round led by Sofinnova Partners, with Hummingbird Ventures, Owkin, Top Harvest Capital and NJF Capital also participating in the round.

Our perspective and thesis

As early stage technically oriented investors, we think about Biological Intelligence as its own version of the technical stack with a more nuanced understanding of how value accrues.

A simplified version of the Biological Intelligence tech stack can be abstracted as a typical foundation model stack. The bottom layer is the compute layer consisting of specialized hardware and high-performance servers, optimized for efficient processing of the vast amounts of multi-modal data. On top of that is the foundation model layer i.e. the LxM layer. Most companies developing in this space that we came across, including the ones mentioned above, are developing their custom bio-focused LxMs for the specific use case that they are targeting.

The interesting layer in this stack is the data generation layer, wherein companies can really differentiate themselves. Here, companies are focusing on generating the real world data that can be used to train the models - which really forms the secret sauce for these models. And on top of that is what can be considered analogous to the application layer - solving the different use cases we have discussed.

Fig: A simplified version of the BioAI stack

Within this framework, the three archetypes of companies that we observed were:

  • Full stack companies: Companies that span the entire stack. They own their own model, data generation layer and product outputs (i.e. the applications).
  • Data generation companies: Companies building novel tooling to generate real-world multi-modal data for training foundation models can be very valuable on their own, serving the larger Bio LxM ecosystem. For example, Basecamp Research, described under Point 2 below, is generating data for training its models.
  • Applications: Startups exploiting the base layers with novel techniques to produce new products or solve new use cases.
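One way to read the stack and the three archetypes together is as a simple mapping from the layers a company owns to the archetype it falls into. The sketch below is only our illustration of that reading: the `Layer` enum, the `archetype` function and the precedence rule (data generation is checked before applications) are assumptions made for the example, not a formal taxonomy.

```python
from enum import Enum


class Layer(Enum):
    """Layers of the simplified BioAI stack, bottom to top."""
    COMPUTE = "compute"
    MODEL = "foundation model (LxM)"
    DATA_GENERATION = "data generation"
    APPLICATION = "application"


# Layers a company must own to count as full stack in this sketch.
FULL_STACK = {Layer.MODEL, Layer.DATA_GENERATION, Layer.APPLICATION}


def archetype(owned_layers):
    """Map the set of layers a company owns to one of the three archetypes.

    Assumption: when a company owns several layers but not the full set,
    the data generation layer takes precedence over the application layer.
    """
    owned = set(owned_layers)
    if FULL_STACK <= owned:
        return "full stack"
    if Layer.DATA_GENERATION in owned:
        return "data generation"
    if Layer.APPLICATION in owned:
        return "application"
    return "other"


if __name__ == "__main__":
    print(archetype({Layer.MODEL, Layer.DATA_GENERATION, Layer.APPLICATION}))
    print(archetype({Layer.DATA_GENERATION}))
    print(archetype({Layer.APPLICATION}))
```

Note that the compute layer is deliberately left out of the full stack test, since the archetype definition above only mentions the model, the data generation layer and the product outputs.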

Point 1: Our framework suggests that while a full stack solution is the most complex to build and will take the most time to market, the potential for value creation is the highest for such a solution

As investors in the Bio LxM space, we want to prioritize investing in full stack solutions and solutions for data generation, as these areas present the best competitive moats for Bio LxM startups. This is followed by solutions at the application layer. Building solutions in any of these three areas is complex and could entail a long time to market, but this is where we see the highest value creation potential. Foundation models in themselves are likely to be commoditized.

Investing in a full stack solution will require patient capital from the VC, and investors need to decide whether they want to play in this field based on their appetite for risk and fund strategy.

Value Accrual in the Bio LxM Stack


Point 2: Generating or getting access to real-world multimodal datasets will be the most critical competitive moat for startups developing in the BioAI space

  • Foundation models in biology, like those in other fields, depend heavily on the quality and diversity of the data they're trained on. In biology, data variability can encompass several dimensions (genetic, environmental, phenotypic), making the breadth and depth of training data crucial. Current foundation models in biology rely on public datasets, which were generally compiled before the current AI era. These datasets tend to lack size, diversity, quality, and often the contextual details necessary for creating robust AI models.
  • In such a context, the biggest competitive moat for startups will come from their ability to generate or get access to real-world multi-modal data on which they can train their models. A startup's unique approach to generating training data is what defines its ability to create value in the BioAI world, and what will excite investors too.

  • For example, startups like Basecamp Research are investing heavily in creating the first foundational datasets designed expressly for BioAI. This involves collecting vast amounts of labeled, curated, metagenomic data from diverse regions worldwide - think rainforests, ocean trenches. Basecamp’s efforts include pioneering data collection and curation processes that enhance the contextual richness and ensure a high signal-to-noise ratio, which is critical for training effective BioAI models.

  • Other than creating their own data, startups in the BioAI space frequently enter into licensing agreements with data providers to access distinctive datasets that are not publicly available. These datasets are crucial for training accurate and reliable models.

Point 3: Specialization in LxMs is necessary. We are far from ‘one foundation model to rule them all’ in biological sciences

  • Biological systems are incredibly complex, with intricate interactions that vary not just between species, but also between individuals of the same species. Each type of biological data, whether DNA, RNA, proteins, or other cellular components, carries its own unique set of information, challenges, and requirements. The complexity and diversity of this data, along with the multiple use cases that it can be applied to (drug discovery, agriculture), mean that a single foundation model (LxM) proficient in all areas is currently beyond our reach.
  • While models like Evo from the Arc Institute have made significant advances, they are still specialized to specific tasks or types of data. For instance, DNA sequence models might not be adept at protein structure prediction, which is a wholly different domain with its own specialized models like AlphaFold. This specialization is necessary due to the distinct nature of the data and the different computational challenges each type presents.

Point 4: This will not be a ‘winner takes all’ market - many category-defining companies can emerge that focus on vastly different use cases

  • Given the variety, complexity and nascence of use cases in this space, there is potential for many category-defining companies to emerge, each focused on a different niche of the use cases described above. While it is still early to say, we doubt that this will be a ‘winner takes all’ market.

Point 5: Founding teams need to demonstrate superior technical chops, while also demonstrating capabilities to commercialize the startup

  • As with all kinds of tech startups, founding teams that want to build a successful company in the rapidly evolving field of BioAI must exhibit a robust blend of technical proficiency and commercial acumen. Technical expertise ensures that the foundation technologies, whether they pertain to machine learning, genomics, or other biological disciplines, are not only innovative but also scientifically valid and capable of pushing the boundaries of current knowledge. This technical skill set is crucial for developing products that are both effective and revolutionary.
  • Equally important is the team's ability to navigate the commercial landscape. This includes skills in business development, understanding market needs, securing funding, and navigating the regulatory pathways that are critical in fields such as biotechnology and pharmaceuticals. A founding team that combines these strengths can more effectively translate scientific advancements into marketable products, secure partnerships, and drive adoption of their technology across the industry.

Point 6: Look for Business Models that bridge the gap between research and commercialization

  • VCs looking to invest in LxMs focused on biological intelligence need a long-term mindset. Unlike pure biotech startups focused on finding a miracle cure, LxM-focused startups can get creative with their business models, essentially creating a platform rather than relying on one blockbuster hit like most pharmaceutical companies.

Acknowledgements

We would like to deeply thank the many people we had the opportunity to speak with during the creation of this work: Dylan Reid (Zetta Venture Partners), Nikolas Kral (Phytoform Labs), Nan Li (Dimension), Harsh Patel (Wireframe), Alex Iskold (2048), Wayne Hu (Signalfire). Conversations with these experts greatly influenced our perspectives on the space.
