New Generative AI Tool Predicts Gene Expression In A Single Cell

This week an important paper was published about scGPT, a new generative AI model that predicts gene expression in single cells. The tool was developed by Bo Wang and his colleagues Haotian Cui, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, and Nan Duan at the University of Toronto. The model, which was pretrained on data from over 33 million human cells, has been validated by numerous benchmark studies as a leading foundation model in single-cell analysis. Since its preprint was published in May 2023, scGPT has had a significant impact, as shown by over 13,000 installations, over 600 GitHub stars, and over 40 citations! The paper, entitled "scGPT: toward building a foundation model for single-cell multi-omics using generative AI", was published in Nature Methods. (Link to the 20-page paper: rdcu.be/dzDGd)

(d) Diagram illustrating the size of the training data and the organs of origin. (e) UMAP visualization of the pretrained scGPT cell embeddings, colored by major cell type.

The paper demonstrates that scGPT effectively captures meaningful biological insights into genes and cells. It can identify specific cell types, predict the effects of disrupting genes, and determine which genes interact with each other. The tool can be fine-tuned to achieve state-of-the-art performance across a variety of downstream tasks, including multi-batch integration, multi-omic integration, cell-type annotation, genetic perturbation prediction, and gene network inference. Its pretrained embeddings extend its utility beyond single-cell studies, improving several tasks including protein enrichment and genetic perturbation prediction.
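Gene network inference from a foundation model is often approached by comparing gene embeddings: genes whose embedding vectors point in similar directions are treated as candidate members of the same network or gene program. The sketch below is a generic, plain-Python illustration of that idea using cosine similarity with a fixed cutoff. It is not scGPT's own code; the `similarity_network` helper and the 0.8 `threshold` are hypothetical, and the three-dimensional toy embeddings are made up rather than taken from the model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_network(gene_embeddings, threshold=0.8):
    """Return edges (gene_a, gene_b, score) whose cosine similarity >= threshold.

    gene_embeddings: dict mapping gene name -> embedding vector (list of floats).
    threshold: hypothetical cutoff chosen only for illustration.
    """
    edges = []
    # Compare every unordered pair of genes exactly once.
    for a, b in combinations(sorted(gene_embeddings), 2):
        score = cosine(gene_embeddings[a], gene_embeddings[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


# Toy 3-dimensional embeddings (made up for illustration; real gene
# embeddings from a foundation model are far higher-dimensional).
toy = {
    "GENE_A": [1.0, 0.9, 0.0],
    "GENE_B": [0.9, 1.0, 0.1],
    "GENE_C": [0.0, 0.1, 1.0],
}
print(similarity_network(toy))
```

With these toy vectors only GENE_A and GENE_B clear the cutoff. On real data the threshold, or a clustering step on the similarity graph, would need to be tuned.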

UMAP of 3 million cancer cells using the cell embeddings from the pretrained pan-cancer model. From left to right, the colors indicate cancer types, tissue types, and cell types.

scGPT Highlights

  • Expanded zero-shot applications for efficient reference mapping and integration, now with CellXGene census integration.
  • Advanced perturbation analysis capabilities, including genome-scale perturb-seq data analysis and bulk sequencing data generalization.
  • Upgraded scGPT package, offering versatile model loading compatible with PyTorch and flash-attn, for both GPU and CPU.
  • Cloud-based scGPT applications for reference mapping, cell annotation, and gene regulatory network inference are available on Hugging Face for easier model training.
  • scGPT is an early foray into foundation models for single-cell omics, facing challenges like limited zero-shot learning in some tasks, pretraining constraints, data quality issues, and evaluation limitations.
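Zero-shot reference mapping, the first highlight above, amounts to placing unlabeled query cells and a labeled reference atlas in the same embedding space and then transferring labels from the most similar reference cells. The following is a minimal sketch of that idea as k-nearest-neighbor voting on cosine similarity, assuming the cell embeddings have already been computed. The `transfer_labels` function, `k=3`, and the two-dimensional toy vectors are hypothetical; scGPT's own reference-mapping tooling is not reproduced here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def transfer_labels(query, reference, k=3):
    """Assign each query cell the majority label of its k most similar reference cells.

    query: list of embedding vectors for unlabeled cells.
    reference: list of (embedding, cell_type) pairs for annotated cells.
    """
    predictions = []
    for q in query:
        # Rank reference cells from most to least similar to this query cell.
        ranked = sorted(reference, key=lambda ref: cosine(q, ref[0]), reverse=True)
        # Majority vote among the top-k neighbors.
        votes = Counter(label for _, label in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy 2-dimensional embeddings (made up for illustration).
reference = [
    ([1.0, 0.1], "T cell"),
    ([0.9, 0.2], "T cell"),
    ([0.95, 0.0], "T cell"),
    ([0.1, 1.0], "B cell"),
    ([0.0, 0.9], "B cell"),
    ([0.2, 0.95], "B cell"),
]
query = [[0.8, 0.15], [0.05, 0.7]]
print(transfer_labels(query, reference))
```

Here the first query cell sits among the T-cell references and the second among the B-cell references, so the call returns `["T cell", "B cell"]`. A brute-force sort is fine for a toy example; atlas-scale data would call for an indexed similarity search.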

Seven UMAP plots of cells from individual organs, using the cell embeddings from the corresponding organ-specific models.

Short-Term Goals

  • Releasing a mouse model for broader analysis.
  • Developing a comprehensive evaluation suite for foundation models in single-cell analysis.
  • Creating a foundation model for single-cell spatial omics.
  • Enhancing zero-shot capacity by integrating scGPT with retrieval-augmented generation (RAG), e.g., using knowledge graphs.

Long-Term Goals

  • Expanding scGPT for comprehensive single-cell multi-omics analysis.
  • Developing an in-silico perturbation model for predicting genetic perturbation effects.
  • Merging scGPT with multi-modal genomic sequence models for a deeper understanding of cell biology.

(j) Evaluation of the cell-annotation performance of scGPT through n = 5 random train–validation splits on the myeloid, MS and human pancreas datasets. Performance metrics from the test sets are presented as mean values ± s.e.m.
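The "±" values in panel (j) are standard errors of the mean over the n = 5 splits, i.e. the sample standard deviation divided by √n. A minimal sketch of that calculation follows; the five scores are placeholders, not numbers reported in the paper, and `mean_and_sem` is a hypothetical helper name.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 (sample) denominator.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical accuracy scores from five train-validation splits
# (placeholders, not results from the paper).
scores = [0.80, 0.82, 0.84, 0.81, 0.83]
m, s = mean_and_sem(scores)
print(f"{m:.3f} ± {s:.3f}")
```

Using the sample standard deviation (denominator n − 1) rather than the population one matters at n = 5, where the two differ noticeably.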

Code Availability

All code, data, and weights are open source. The scGPT codebase is publicly available at https://github.com/bowang-lab/scGPT.

Bo Wang, PhD

Authors

scGPT was developed by Bo Wang and his colleagues Haotian Cui, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, and Nan Duan at the University of Toronto. Dr. Wang is the Chief AI Scientist at the University Health Network, the largest research hospital in Canada. He is also a tenure-track Assistant Professor in the Departments of Computer Science and Laboratory Medicine and Pathobiology at the University of Toronto, and the inaugural Temerty Professor in AI Research and Education in Medicine. He also holds a CIFAR AI Chair at the Vector Institute. Dr. Wang obtained his PhD from the Department of Computer Science at Stanford University in 2017. His research focuses on machine learning, computational biology, and computer vision, with a particular emphasis on their applications in biomedicine. His contributions to these fields have been recognized with numerous esteemed awards, including the Gairdner Early Career Researcher Award and the Canada Research Chair Award.

Subscribe, Comment, Join Group

I'm interested in your feedback - please leave your comments.

To subscribe to the AI in Healthcare Milestones newsletter click here.

To join the AI in Healthcare Milestones Group click here.

Copyright © 2024 Margaretta Colangelo. All Rights Reserved.

This article was written by Margaretta Colangelo. Margaretta is a leading AI analyst who tracks significant milestones in AI in healthcare. She consults with AI healthcare companies and writes about some of the companies she consults with. Margaretta serves on the advisory board of the AI Precision Health Institute at the University of Hawaiʻi Cancer Center. @realmargaretta

Thanks for sharing Margaretta. At 57 pages, not a quick read, but I'm guessing it may have a major impact in bioinformatics and other areas. Looking forward to follow-up research based on this innovation

Cristobal Thompson

Executive Coach and Mentor for Leaders

1 year

Thanks for sharing Margaretta !!!

Andrew Gumbiner

Technical Support Scientist at Shimadzu Scientific Instruments

1 year

Does this tool use a predictive model or generative model (Also, what model)? Honestly, I'm not sure which would be more exciting!

Anthony Alcaraz

Senior AI/ML Strategist Startups & VC @AWS - Writing on AI/ML, analysis are my own

1 year

Congratz

Great work! Pleased to see model and whole data in open source.
