insitro at ASHG
insitro leverages both machine learning and statistical genetics at scale to power drug discovery. This week, at the American Society of Human Genetics (ASHG) Annual Meeting, insitro is presenting 5 posters, including one with a preprint now available online on bioRxiv.
Genetic support in target discovery provides a powerful ground truth by increasing the probability of success. In addition, rare genetic variants can play a strong role in disease risk, which helps direct target discovery research, but identifying these variants requires large sample sizes. This is where machine learning helps maximize the opportunity that genetics represents.
Machine learning provides several important advantages: it helps scientists uncover relationships between complex datasets at scale, closes data gaps by increasing the sample size available to power an analysis, and helps predict the phenotypes that will be relevant to disease research.
A phenotype is an observable trait or characteristic that results from the complex interaction between genotype and the environment. For example, liver fat, which has both genetic and environmental causes, is a phenotype that can be derived from images. When trained appropriately, machine learning algorithms can identify complex or hidden patterns that improve our understanding of liver fat and, in turn, help identify new phenotype-genotype associations.
Our 5 posters span a range of themes, from machine learning-derived and biological insights related to ALS and MASLD, to methods that leverage machine learning-derived traits, to transcriptome-wide association studies. Here is a quick overview, and we hope to see you at ASHG in Denver!
Mukherjee et al., “Genes, Machines, and Missteps: The Unseen Risks of Machine Learning Derived Phenotypes”
Machine learning-derived predictions are a powerful tool for enhancing genome-wide association studies (GWAS). However, certain biases and challenges must be accounted for to make these predictions useful for drug discovery. This poster demonstrates, using both simulated and real-world datasets, that both upstream and downstream biomarkers can be leveraged to detect valid associations.
Somineni et al., “Mechanistic Insights from Liver Fat GWAS Loci”
A previously presented GWAS leveraging machine learning identified 321 novel liver fat loci that may be involved in the biomechanics of metabolic dysfunction-associated steatotic liver disease (MASLD). Here, we used a statistical framework called truncated singular value decomposition to classify these 321 liver fat loci across 37 relevant phenotypes. This approach identified patterns that may underlie dysregulated lipid metabolism, providing potential targets for drug discovery.
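For readers less familiar with truncated singular value decomposition, here is a minimal sketch of the general idea in Python. It is not the poster's actual pipeline: the input matrix is random synthetic data standing in for a 321 x 37 table of locus-by-phenotype association scores, and the number of components (`k=5`), the function name `truncated_svd`, and the "strongest component" assignment rule are all illustrative assumptions.

```python
import numpy as np


def truncated_svd(matrix, k):
    """Return the top-k singular factors of a (loci x phenotypes) matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


# Synthetic stand-in for locus-by-phenotype association scores
# (hypothetical data; real inputs would come from GWAS results).
rng = np.random.default_rng(0)
z_scores = rng.normal(size=(321, 37))

k = 5  # number of components kept; an arbitrary choice for illustration
u, s, vt = truncated_svd(z_scores, k)

# Score of each locus on each retained component.
locus_scores = u * s

# One simple way to group loci: assign each locus to the component
# on which it has the largest absolute score.
assignment = np.argmax(np.abs(locus_scores), axis=1)

# Rows of vt show how strongly each phenotype contributes to a component.
phenotype_loadings = vt
```

Each retained component groups phenotypes that tend to move together across loci, so loci that score highly on the same component may share an underlying mechanism.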
Amar et al., “Leveraging Mendelian Randomization for Target Discovery from Predicted Phenotypes”
Following Somineni et al., here we use causal inference techniques to triage GWAS hits and reveal the biological pathways by which they likely affect liver fat accumulation.
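As background, one widely used Mendelian randomization estimator is the fixed-effect inverse-variance weighted (IVW) estimate, sketched below in Python. The poster does not specify which estimator it uses, so this is only an illustration of the general technique; the function name `ivw_estimate` and all effect sizes are made-up values, not results.

```python
import numpy as np


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: per-variant effects on the exposure
    beta_outcome:  per-variant effects on the outcome
    se_outcome:    standard errors of the outcome effects
    Returns (causal effect estimate, standard error).
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    weights = 1.0 / np.asarray(se_outcome, dtype=float) ** 2

    denom = np.sum(weights * bx ** 2)
    estimate = np.sum(weights * bx * by) / denom
    std_err = np.sqrt(1.0 / denom)
    return estimate, std_err


# Hypothetical summary statistics for three genetic instruments.
bx = np.array([0.10, 0.20, 0.15])
by = 0.5 * bx  # outcome effects constructed with a true slope of 0.5
se = np.array([0.010, 0.020, 0.015])

estimate, std_err = ivw_estimate(bx, by, se)
```

The estimate is a weighted regression of outcome effects on exposure effects through the origin, so variants measured more precisely count for more.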
McCaw et al., “Unveiling the Power of Allelic Series: Enhancements and Applications of COAST”
This work builds on last year’s published rare-variant association test, the Coding-Variant Allelic-Series Test (COAST). We now extend COAST to leverage GWAS summary statistics (COAST-SS), which helps increase the sample size available for analysis. We demonstrate the use of COAST-SS to identify allelic series for blood lipid levels among 350K UK Biobank samples. The software is publicly available and the work is described here.
Zhou et al., “A Transcriptome-wide Association Study of Amyotrophic Lateral Sclerosis”
We integrated multi-omics data with GWAS to probe for associations between gene expression and known ALS traits, and identified six genes associated with ALS as potential drug discovery candidates.