Arc Institute and NVIDIA Launch Evo 2: An Open-Source Foundation Model for Genomic Research

Arc Institute, in collaboration with NVIDIA, has unveiled Evo 2, a biological foundation model designed to advance AI-driven genomic research. Evo 2 was trained on 9.3 trillion nucleotides drawn from more than 128,000 genomes spanning the tree of life, including humans, plants, bacteria, and other single-celled and multicellular organisms. This is a significant expansion in scope compared with its predecessor, Evo 1, which was trained only on genomes of single-celled (prokaryotic) organisms.

Technical Advancements

Evo 2 is built on the StripedHyena 2 architecture, which combines convolutional operators of several lengths (with long convolutions computed efficiently via the fast Fourier transform) with attention layers. According to its developers, this design trained nearly three times faster than comparable transformer models and lets Evo 2 process sequences of up to 1 million nucleotides in a single pass. This long context window allows the model to capture relationships between distant regions of a genome, at scales ranging from individual molecules such as tRNAs to small bacterial genomes and long stretches of eukaryotic chromosomes.
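The reason FFT-based convolution matters at million-nucleotide scale is cost: a causal convolution whose filter is as long as the input takes O(L²) operations when computed directly, but only O(L log L) via the convolution theorem. The sketch below illustrates that idea only. It assumes a single real-valued channel and one fixed filter, and all function names are hypothetical; StripedHyena 2's actual operators are more elaborate, and this code does not reproduce the reported training speedup.

```python
import cmath
import random


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT. len(values) must be a power of two.
    With invert=True it returns the *unnormalised* inverse transform."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def causal_conv_direct(signal, kernel):
    """Reference O(L^2) causal convolution: y[t] = sum_{s<=t} kernel[t-s] * signal[s]."""
    return [sum(kernel[t - s] * signal[s] for s in range(t + 1))
            for t in range(len(signal))]


def causal_conv_fft(signal, kernel):
    """Same result in O(L log L) via the convolution theorem."""
    length = len(signal)
    if len(kernel) != length:
        raise ValueError("this sketch assumes a filter as long as the input")
    size = 1
    while size < 2 * length:  # zero-pad so circular conv equals linear conv
        size *= 2
    padding = [0j] * (size - length)
    sig_f = fft([complex(x) for x in signal] + padding)
    ker_f = fft([complex(k) for k in kernel] + padding)
    full = fft([a * b for a, b in zip(sig_f, ker_f)], invert=True)
    # Normalise the inverse FFT and keep only the causal (first L) outputs.
    return [(v / size).real for v in full[:length]]


# Usage: a random 1,000-step signal and a slowly decaying 1,000-tap filter.
rng = random.Random(0)
L = 1000
signal = [rng.uniform(-1.0, 1.0) for _ in range(L)]
kernel = [0.99 ** t for t in range(L)]
max_error = max(abs(a - b) for a, b in zip(causal_conv_fft(signal, kernel),
                                            causal_conv_direct(signal, kernel)))
print(f"max |fft - direct| = {max_error:.2e}")  # floating-point rounding only
```

Both functions return the same outputs up to rounding; only the FFT version scales gracefully as the filter and sequence grow toward genome length.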

With 40 billion parameters, Evo 2 is comparable in scale to many modern large language models, although it is trained on DNA sequences rather than human language. On a key benchmark, it achieved over 90% accuracy in distinguishing benign from pathogenic variants in the breast cancer gene BRCA1, a result with direct relevance to clinical genomics.
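A common zero-shot way for a genomic language model to rate a variant is to compare how likely the model finds the mutated sequence versus the reference: a strongly negative change in log-likelihood suggests the mutation disrupts a learned pattern. The sketch below shows that scoring logic under loud assumptions. `ToyMarkovModel` is a hypothetical order-2 Markov chain standing in for Evo 2 purely so the code runs, and the `classify` threshold of 0.0 is an arbitrary illustrative choice, not a calibrated clinical cutoff.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ToyMarkovModel:
    """Hypothetical stand-in for a DNA language model: an order-k Markov chain
    with add-one smoothing. It only exists to make the scoring logic runnable."""

    def __init__(self, training_seq, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))
        for i in range(order, len(training_seq)):
            context = training_seq[i - order:i]
            self.counts[context][training_seq[i]] += 1

    def log_prob_next(self, context, base):
        ctx_counts = self.counts.get(context, {})
        total = sum(ctx_counts.values())
        return math.log((ctx_counts.get(base, 0) + 1) / (total + len(ALPHABET)))

    def sequence_log_likelihood(self, seq):
        """Sum of log P(x_i | previous `order` bases), skipping the first `order` bases."""
        return sum(self.log_prob_next(seq[i - self.order:i], seq[i])
                   for i in range(self.order, len(seq)))


def variant_delta_score(model, ref_seq, pos, alt_base):
    """log P(variant sequence) - log P(reference sequence).
    Negative values mean the model finds the variant less likely."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return model.sequence_log_likelihood(alt_seq) - model.sequence_log_likelihood(ref_seq)


def classify(score, threshold=0.0):
    """Hypothetical decision rule; a real threshold would be fit on labelled variants."""
    return "likely pathogenic" if score < threshold else "likely benign"


# Usage: train the toy model, then score a C->T substitution at position 4.
model = ToyMarkovModel("ATGGCGGCG" * 60, order=2)
reference = "ATGGCGGCGATGGCG"
score = variant_delta_score(model, reference, pos=4, alt_base="T")
print(round(score, 3), classify(score))  # a negative score (about -5.5)
```

Benchmark accuracy figures come from comparing such scores against expert-labelled benign and pathogenic variants; the toy model here has no clinical meaning.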

Applications and Accessibility

Evo 2 demonstrates potential in multiple domains:

- Mutation Impact Prediction: High-precision classification of genetic variants linked to diseases.

- Genome Design: Generation of synthetic DNA sequences at the scale of yeast chromosomes, enabling novel synthetic biology applications.

- Epigenomic Design: Generation of DNA sequences with specified chromatin accessibility profiles, a step toward controlling gene regulation.
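Sequence generation in an autoregressive model such as Evo 2 works by repeatedly predicting a distribution over the next nucleotide and sampling from it. The minimal sketch below shows that loop with temperature-scaled softmax sampling. `toy_next_base_logits` is a hypothetical placeholder for the model's output (it simply favours repeating the previous base), and the function names and parameters are illustrative assumptions, not Evo 2's actual generation API.

```python
import math
import random

ALPHABET = "ACGT"


def toy_next_base_logits(context):
    """Hypothetical stand-in for a genomic language model's output layer:
    mildly favours repeating the previous base. Not a real model."""
    if not context:
        return [0.0] * len(ALPHABET)
    last = context[-1]
    return [1.0 if base == last else 0.0 for base in ALPHABET]


def sample_sequence(prompt, n_new, temperature=1.0, seed=0):
    """Autoregressively extend `prompt` by n_new bases, one nucleotide at a time."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = random.Random(seed)
    seq = list(prompt)
    for _ in range(n_new):
        scaled = [logit / temperature for logit in toy_next_base_logits(seq)]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]  # stable softmax numerators
        seq.append(rng.choices(ALPHABET, weights=weights, k=1)[0])
    return "".join(seq)


# Usage: extend a short prompt by 60 bases; lower temperature sharpens the distribution.
generated = sample_sequence("ATGC", 60, temperature=0.7, seed=1)
print(generated)
```

Chromosome-scale design repeats this loop for hundreds of thousands of steps, and guided design additionally steers sampling with an external scoring model; neither is shown here.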

To promote collaborative innovation, the model is fully open source: its code, weights, and training data are publicly available, and it can be accessed through NVIDIA’s BioNeMo platform and the Evo Designer interface. The release aims to give academic and industry researchers a common foundation for AI-driven genomic research, lowering barriers to entry.

Broader Implications

By integrating Evo 2 into existing workflows, researchers can accelerate work in variant interpretation, functional genomics, and synthetic biology. Its release underscores the growing role of AI in decoding biological complexity and the importance of open-source tools in democratizing scientific progress.

More articles by Rodrigo Macias MD, MBA