Introduction to BioTuring Cell Search

Should we build a machine learning model for cell type recognition? 

When analyzing single-cell transcriptome data, scientists often annotate cell types by checking individual marker genes. However, marker genes are not even consistent across literature sources. One natural solution would be to use the large body of published data sets to build a machine learning model that predicts cell types.

We argue that this is not a suitable solution, for the following reasons:


  1. Annotations in published studies are not consistent. Given the same cell, different groups with different research goals may choose a more general cell type label or a very specific subtype.
  2. Besides cell types, researchers may be interested in other labels. For instance, it may be more interesting to learn that a group of microglia cells appears only in Parkinson's patients and not in normal samples. This requires building a machine learning model for EVERY label, not only for “cell type”. However, because each study has its own specific annotations, it is difficult to train a machine learning model on combinations of multiple annotation types.
  3. Some rare cell types will be ignored by common machine learning models.


We think “searching” is indeed a more suitable solution. We imagine that when a scientist selects a group of cells, a cell search engine can find all cells across all published studies that have “similar” expression signatures. Scientists can then download the matched cells and see all of their other labels, which may include cell type, age, and tumor/normal condition. Importantly, this cell search engine should bypass technical variation (for example, cells from different biological conditions that look alike only because they were sequenced with similar technologies) and return only the cells that match biologically.
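To make the idea concrete, here is a minimal sketch of such a search in Python. It is only an illustration, not a description of how BioTuring's engine works: the function name `search_similar_cells`, the toy expression matrix, the label fields, and the choice of cosine similarity are all assumptions made for this example. It also does nothing about technical variation, which a real engine would have to handle.

```python
import numpy as np

def search_similar_cells(query_cells, reference_cells, reference_labels, top_k=3):
    """Rank reference cells by cosine similarity to the mean query signature.

    Hypothetical helper for illustration only. Rows are cells, columns are genes.
    """
    # Collapse the selected cells into a single expression signature.
    signature = query_cells.mean(axis=0)
    # Cosine similarity between the signature and every reference cell.
    ref_norms = np.linalg.norm(reference_cells, axis=1)
    sig_norm = np.linalg.norm(signature)
    scores = reference_cells @ signature / (ref_norms * sig_norm + 1e-12)
    # Keep the top_k best matches, highest similarity first.
    order = np.argsort(-scores)[:top_k]
    return [(int(i), reference_labels[i], float(scores[i])) for i in order]

# Toy reference atlas: 4 cells x 3 genes, with made-up labels from made-up studies.
reference = np.array([
    [10.0, 0.0, 1.0],
    [9.0, 1.0, 0.0],
    [0.0, 8.0, 2.0],
    [1.0, 0.0, 9.0],
])
labels = [
    {"study": "A", "cell_type": "microglia", "condition": "Parkinson"},
    {"study": "B", "cell_type": "microglia", "condition": "normal"},
    {"study": "B", "cell_type": "T cell", "condition": "tumor"},
    {"study": "C", "cell_type": "neuron", "condition": "normal"},
]

# The group of cells a scientist has selected.
query = np.array([
    [8.0, 1.0, 1.0],
    [10.0, 0.0, 0.0],
])

hits = search_similar_cells(query, reference, labels, top_k=2)
for index, label, score in hits:
    print(index, label, round(score, 3))
```

Note that each match comes back with every label attached to it (study, cell type, condition), not with a single predicted class. This is the main difference from a classifier: no model has to be trained per label, and rare cell types remain findable as long as they exist somewhere in the reference collection.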


We will soon release a cell search function on our largest collection of published single-cell data.
