The Future of Genes is Algorithmic: 5 Real-Case Examples in Machine Learning for Genomics to Spark Your Curiosity

Introduction: Where Genomics Meets Machine Learning

The human genome holds immense potential for understanding health and disease, but the information it contains is complex and ever-growing. Traditional analysis methods are often slow and limited, and they struggle with the sheer volume of information our genes contain. This is where machine learning (ML) steps in as a game-changer.

In this blog post, we'll explore five exciting real-world applications of machine learning in genomics. We'll see how ML is helping us understand gene regulation, visualize the 3D organization of DNA, identify genetic variations, reconstruct evolutionary history, and even improve the precision of genome editing tools like CRISPR.


1) Beyond the Letters – Understanding How Genes Turn On and Off

Genes have switches that control whether they're "on" (working) or "off" (not working). ML can analyze chemical marks on and around genes, such as DNA methylation and histone modifications, to predict which genes are likely to be switched on, helping us understand how cells work.

In 2016, a team of researchers developed TargetFinder, a machine learning model that helps explain how genes turn on and off by predicting enhancer-promoter interactions. These interactions are like long-distance calls between DNA segments: an enhancer activates a specific gene by physically looping close to that gene's promoter.

TargetFinder uses genomic features such as DNA methylation, histone marks, and protein binding to estimate whether a given enhancer is likely to interact with a given promoter. The algorithm learns from known interactions and then applies that knowledge to make predictions for new enhancer-promoter pairs.
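To make the idea concrete, here is a minimal Python sketch of the general approach: describe each enhancer-promoter pair by the signal of a few genomic marks, then train a classifier on labelled pairs. Everything here is an assumption for illustration: the two mark names, the three regions (enhancer, promoter, and the stretch of DNA between them), the signal values, and the labels are invented, and plain logistic regression stands in for the model used in the TargetFinder paper.

```python
import math

# Hypothetical marks and regions; real models use many more genomic datasets.
MARKS = ["H3K27ac", "CTCF"]
REGIONS = ["enhancer", "promoter", "window"]


def pair_features(signals):
    """Flatten {region: {mark: signal}} into a fixed-order feature vector."""
    return [signals[region][mark] for region in REGIONS for mark in MARKS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent on cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Predicted probability that the pair interacts."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def toy_pair(enhancer, promoter, window):
    """Toy pair in which every mark within a region shares one made-up value."""
    return pair_features({"enhancer": dict.fromkeys(MARKS, enhancer),
                          "promoter": dict.fromkeys(MARKS, promoter),
                          "window": dict.fromkeys(MARKS, window)})


# Invented training data: interacting pairs (label 1) have a high "window" signal.
X = [toy_pair(0.8, 0.7, 0.9), toy_pair(0.6, 0.9, 0.8),
     toy_pair(0.9, 0.5, 0.7), toy_pair(0.7, 0.8, 0.95),
     toy_pair(0.8, 0.7, 0.1), toy_pair(0.6, 0.9, 0.2),
     toy_pair(0.9, 0.5, 0.05), toy_pair(0.7, 0.8, 0.15)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_logistic(X, y)
print(round(predict_proba(model, toy_pair(0.75, 0.75, 0.85)), 2))  # unseen pair, high window signal
print(round(predict_proba(model, toy_pair(0.75, 0.75, 0.10)), 2))  # unseen pair, low window signal
```

On this toy data the classifier leans almost entirely on the window signal, because that is the only feature that differs between the invented positive and negative pairs; a real model has to discover which of many features actually matter.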

2) Seeing the Big Picture – Mapping the 3D Structure of Genes

Our DNA isn't laid out flat like a sheet of paper; it twists and folds into 3D shapes. This folding helps genes and the DNA elements that control them work together. ML can help us see these 3D shapes by analyzing data that measure how often different regions of DNA come into contact.

For example, in 2022 a research group developed pTADbit, a method that uses machine learning to reconstruct 3D genome structure. It builds on Hi-C, a technique that captures how often different parts of the genome interact with each other in space. This interaction frequency provides clues about how close these regions are in 3D.
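Before any 3D modelling, Hi-C reads are usually summarized as a contact matrix. The sketch below assumes each read pair has already been mapped to two positions on the same chromosome; it splits the region into fixed-size bins and counts how often each pair of bins is linked. The function name, bin size, and read positions are hypothetical, and real pipelines also filter, deduplicate, and normalize the data.

```python
def contact_matrix(read_pairs, bin_size, n_bins):
    """Count Hi-C contacts between genomic bins.

    read_pairs: iterable of (pos1, pos2) positions in bp on one chromosome.
    Returns a symmetric n_bins x n_bins list of lists of contact counts.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        if not (0 <= i < n_bins and 0 <= j < n_bins):
            continue  # skip reads falling outside the region of interest
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # contacts are undirected, so mirror them
    return matrix


# Made-up read pairs on a 40 kb region split into 10 kb bins.
pairs = [(1_200, 8_500), (2_000, 15_000), (3_500, 14_000),
         (12_000, 35_000), (25_000, 26_000)]
for row in contact_matrix(pairs, bin_size=10_000, n_bins=4):
    print(row)
```

Each cell counts how often two bins were captured together, which is the interaction frequency that methods like pTADbit take as input.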

The researchers use a deep learning neural network, which they train on thousands of microscopy-based measurements of the distance between pairs of genomic loci (positions). Essentially, the network learns how Hi-C interaction frequencies translate into actual 3D distances between these loci.

Once trained, pTADbit can predict the probability distribution of spatial distances between any two regions in the genome solely based on their Hi-C interaction data. This allows for the generation of more accurate models of the 3D genome structure compared to traditional methods.
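pTADbit itself trains a deep neural network, which is beyond a short example. As a much simpler stand-in that shows the same idea (learning a mapping from Hi-C contact frequency to measured 3D distance, and returning a distribution rather than a single number), the sketch below fits a straight line in log-log space and treats the leftover scatter as log-normal noise. The calibration values, function names, and the log-normal assumption are all illustrative, not taken from the paper.

```python
import math
import statistics


def fit_freq_to_distance(freqs, distances):
    """Fit log(distance) = a + b * log(freq) by ordinary least squares.

    Returns (a, b, sd), where sd is the standard deviation of the residuals
    in log space, used to turn a point estimate into a distribution.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(d) for d in distances]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sd = statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])
    return a, b, sd


def predict_distance(model, freq, z=1.96):
    """Median distance and an approximate 95% range, assuming log-normal noise."""
    a, b, sd = model
    mu = a + b * math.log(freq)
    return math.exp(mu), (math.exp(mu - z * sd), math.exp(mu + z * sd))


# Invented calibration pairs: Hi-C contact frequency vs. imaged distance (µm).
freqs = [1, 2, 4, 8, 16, 32]
distances = [2.1, 1.35, 1.05, 0.68, 0.52, 0.36]
model = fit_freq_to_distance(freqs, distances)
median, (low, high) = predict_distance(model, freq=10)
print(f"slope={model[1]:.2f}, median={median:.2f} µm, ~95% range={low:.2f}-{high:.2f} µm")
```

The fitted slope comes out negative, matching the intuition behind Hi-C: the more often two regions are captured together, the closer they tend to be in space.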

3) Spotting Missing/Additional Pieces – Copy Number Variation Detection

We inherit two copies of most genes, one from each parent. Sometimes, though, a stretch of DNA is present in extra copies or is missing entirely. Copy number variations (CNVs) are such differences in the number of copies of a DNA segment, which can include whole genes, in an individual's genome. These variations can affect our health, and ML can analyze genomic data to find these missing or extra pieces.

Traditional methods for CNV detection rely on features such as read depth (how many sequencing reads cover a DNA segment). These methods struggle with uneven sequencing coverage (some regions sequenced more deeply than others) and with pooled data (DNA combined from multiple samples).
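A minimal sketch of the traditional read-depth idea, assuming a diploid genome split into equal-sized windows: divide each window's depth by the genome-wide median, multiply by two, and flag windows whose estimated copy number is clearly below or above two. The function name, cutoffs, and depths are invented for illustration.

```python
import statistics


def call_cnvs(depths, ploidy=2, del_cutoff=1.5, dup_cutoff=2.5):
    """Naive read-depth CNV calls for fixed-size genomic windows.

    depths: mean read depth per window. Each window's copy number is estimated
    as ploidy * depth / median depth, assuming most windows are copy-neutral.
    """
    median = statistics.median(depths)
    calls = []
    for depth in depths:
        copies = ploidy * depth / median
        if copies < del_cutoff:
            calls.append("deletion")
        elif copies > dup_cutoff:
            calls.append("duplication")
        else:
            calls.append("normal")
    return calls


# Invented depths: window 2 looks like a one-copy loss, window 5 a one-copy gain.
depths = [30, 31, 14, 29, 30, 46, 32, 28]
print(call_cnvs(depths))
```

Because the estimate is simply depth relative to the median, a window that is over- or under-sequenced for technical reasons (for example GC bias or poor mappability) looks just like a copy-number change, which is why uneven coverage trips up this approach.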

In 2019, researchers developed dudeML (duplication and deletion classifier using machine learning), a deep learning model trained on simulated and real sequencing data. It outperformed traditional methods at identifying CNVs and, notably, maintained high accuracy under conditions that challenge traditional approaches, such as uneven coverage and pooled samples.
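The key idea of training a classifier on data where the true copy number is known can be sketched with a toy example. Here we simulate labelled windows, describe each one by its depth relative to the genome median together with the depths of its two flanking windows, and classify new windows with k-nearest neighbours. The feature set, noise level, function names, and choice of classifier are assumptions for illustration, not dudeML's actual design.

```python
import random
from collections import Counter

# Expected window depth relative to the genome median in a diploid genome:
# one lost copy -> 0.5, one extra copy -> 1.5.
RATIOS = {"deletion": 0.5, "normal": 1.0, "duplication": 1.5}


def simulate_windows(n_per_class, noise=0.1, seed=0):
    """Simulate labelled (left flank, window, right flank) depth-ratio features."""
    rng = random.Random(seed)
    data = []
    for label, ratio in RATIOS.items():
        for _ in range(n_per_class):
            features = (1.0 + rng.gauss(0, noise),
                        ratio + rng.gauss(0, noise),
                        1.0 + rng.gauss(0, noise))
            data.append((features, label))
    return data


def knn_classify(train, features, k=5):
    """Majority vote among the k training windows closest in Euclidean distance."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], features)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train = simulate_windows(n_per_class=50)
print(knn_classify(train, (1.02, 0.55, 0.97)))
print(knn_classify(train, (0.98, 1.45, 1.01)))
```

Because the training labels come from simulation, the model can be taught about conditions (for example, noisier coverage) simply by simulating them, which is the appeal of this strategy.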

4) Mapping The Evolutionary History – Phylogenetic Analysis

Phylogenetic analysis compares the similarities and differences in the genetic material of different organisms, such as their DNA or protein sequences, to construct a phylogenetic tree: a diagram that shows the evolutionary history and relatedness of those organisms.
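To see how sequence differences become a tree, here is a sketch of UPGMA, one of the simplest distance-based tree-building methods (it is not the approach of the ML study discussed next). It computes the fraction of differing positions between pre-aligned sequences (the p-distance), then repeatedly merges the two closest clusters, averaging their distances to everything else. The sequences and taxon names are invented, and UPGMA assumes a roughly constant rate of evolution across lineages, which real data often do not satisfy.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a rooted tree with UPGMA; returns Newick text without branch lengths."""
    sizes = {name: 1 for name in seqs}  # cluster (as Newick text) -> number of leaves
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del dist[frozenset((a, b))]
        for c in sizes:
            d_a = dist.pop(frozenset((a, c)))
            d_b = dist.pop(frozenset((b, c)))
            # UPGMA: leaf-count-weighted average of the merged clusters' distances
            dist[frozenset((merged, c))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Invented, pre-aligned toy sequences.
seqs = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTACGTAA", "taxonC": "ACGAACTTAA"}
print(upgma(seqs))
```

Here taxonA and taxonB differ at only one of ten positions, so they are grouped first, and taxonC joins as the outgroup.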

A 2023 study published in Scientific Reports demonstrates how ML can classify genomic and evolutionary traits. The researchers trained ML models on the codon usage patterns of nearly 13,000 organisms. Codon usage describes how frequently each triplet of nucleotides (codon) appears in an organism's coding DNA. From these patterns alone, the models successfully predicted the genomic origin of genes (for example, mitochondrial or nuclear) and even the taxonomic identity of organisms.
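Codon usage turns each organism into a fixed-length list of numbers that a classifier can work with. A minimal sketch of computing that 64-value profile from an in-frame coding sequence is below; the function name is hypothetical, and the study worked with codon usage for whole organisms rather than a single short sequence like this one.

```python
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # all 64 codons


def codon_usage(cds):
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in CODONS)  # ignores codons containing N, etc.
    if total == 0:
        return dict.fromkeys(CODONS, 0.0)
    return {c: counts[c] / total for c in CODONS}


usage = codon_usage("ATGAAAAAGATGTAA")  # toy sequence: ATG AAA AAG ATG TAA
print(len(usage), usage["ATG"], usage["AAA"])
```

Stacking one such vector per organism gives a table with 64 feature columns, to which labels such as kingdom or genome type can be attached for training.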

5) Rewriting the Script – Genome Editing

Genome editing is a technique that allows scientists to modify an organism's DNA. One of the most widely used genome editing tools is CRISPR, which relies on a guide RNA to direct it to the desired location in the genome. However, CRISPR has limitations: editing efficiency can be unpredictable, and unintended edits at other sites (off-target effects) can occur.

Machine learning can predict which guide RNA sequences will lead to the most efficient and precise editing.

In 2018, researchers developed DeepCRISPR, a deep learning model that predicts how effective a CRISPR guide RNA (sgRNA) will be. It takes as input the sgRNA sequence along with other information about the DNA region being targeted.
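Because DeepCRISPR's input combines the sgRNA sequence with information about the target region, both have to be turned into numbers a neural network can read. One common encoding, sketched below, is a one-hot matrix with one row per base (A, C, G, T) and one column per position, with optional extra rows for per-position epigenetic tracks. The 23-nt length (a 20-nt guide plus its NGG PAM), the function name, the example guide, and the accessibility track are assumptions for illustration, not DeepCRISPR's exact input format.

```python
BASES = "ACGT"


def encode_sgrna(seq, epigenetic=None):
    """One-hot encode a guide (plus PAM) as a channels x positions matrix.

    The first 4 rows are A, C, G, T, each with one 0/1 entry per position;
    any per-position epigenetic tracks are appended as extra rows.
    """
    seq = seq.upper()
    matrix = [[1 if base == b else 0 for base in seq] for b in BASES]
    for track in epigenetic or []:
        if len(track) != len(seq):
            raise ValueError("each epigenetic track needs one value per position")
        matrix.append(list(track))
    return matrix


guide = "GAGTCCGAGCAGAAGAAGAAGGG"  # example 20-nt spacer followed by an NGG PAM
open_chromatin = [1] * len(guide)  # hypothetical accessibility track (1 = open)
x = encode_sgrna(guide, [open_chromatin])
print(len(x), len(x[0]))  # channels, positions
```

Every position has exactly one 1 among the four base rows, so the network sees each base as a distinct input channel rather than as an arbitrary number.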

First, DeepCRISPR uses deep learning to extract the important features of sgRNAs from a large dataset that includes sgRNAs with known efficacies, creating a representation of each sgRNA that captures its key characteristics.

Then, DeepCRISPR uses this representation to predict how efficiently the sgRNA will target and edit the desired gene. It also predicts off-target effects, which are unintended edits to other parts of the genome.
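Predicting off-target effects starts with finding the genomic sites a guide might hit by mistake. The sketch below covers only that candidate-search step: it scans one strand of a sequence for sites next to an NGG PAM (the motif recognized by the commonly used SpCas9 enzyme) that differ from the guide's 20-nt spacer at no more than a few positions. The function names, mismatch cutoff, and toy sequences are assumptions; real tools also scan the reverse strand, some allow small insertions or deletions (bulges), and DeepCRISPR's contribution is scoring such candidates with a learned model, which is not reproduced here.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(spacer, genome, max_mismatches=3):
    """Scan one strand for spacer-like sites that sit next to an NGG PAM.

    Returns (start, site, mismatches) tuples, where start is the 0-based
    position of the protospacer in `genome`.
    """
    spacer, genome = spacer.upper(), genome.upper()
    n = len(spacer)
    hits = []
    for start in range(len(genome) - n - 2):
        pam = genome[start + n:start + n + 3]
        if pam[1:] != "GG":  # NGG: any base followed by two Gs
            continue
        site = genome[start:start + n]
        mm = count_mismatches(spacer, site)
        if mm <= max_mismatches:
            hits.append((start, site, mm))
    return hits


# Invented 20-nt spacer and a toy "genome" with a perfect site, a
# 2-mismatch site, and a 7-mismatch site, each followed by an NGG PAM.
spacer = "GACGTTACCTGATCAAGCTA"
near = "CACGTAACCTGATCAAGCTA"
far = "TTTTTTTTTTGATCAAGCTA"
genome = "TTTT" + spacer + "AGG" + "CCCC" + near + "TGG" + "AAAA" + far + "CGG"
hits = find_candidate_sites(spacer, genome)
for start, site, mm in hits:
    print(start, site, mm)
```

The intended site and the 2-mismatch site are reported, while the 7-mismatch site is filtered out; a model like DeepCRISPR would then estimate how likely each reported candidate is to actually be cut.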


Conclusion & Upcoming Workshop

These are just a few examples of how machine learning is transforming the field of genomics. As machine learning algorithms continue to evolve and more data becomes available, we can expect even more groundbreaking discoveries in the years to come.

With the growing importance of machine learning, now is the perfect time to dive into this dynamic and rapidly evolving field. Join us for our upcoming hands-on coding workshop, "OmicsLogic Introduction to Machine Learning Using Python."

Date: May 08 - May 10, 2024

Time: 7:00 PM IST | 8:30 AM CDT

Location: Online

For more information about the workshop curriculum and session details, and to register, visit: https://forms.gle/L5fpMtyjVPfGUCzDA

References

  1. Whalen, S., Truty, R. & Pollard, K. Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet 48, 488–496 (2016). https://doi.org/10.1038/ng.3539
  2. Andreo, D., Mendieta-Esteban, J. & Marti-Renom, M. Probabilistic 3D-modelling of genomes and genomic domains by integrating high-throughput imaging and Hi-C using machine learning. bioRxiv (2022). https://doi.org/10.1101/2022.09.19.508575
  3. Belton, J. M., McCord, R. P., Gibcus, J. H., Naumova, N., Zhan, Y. & Dekker, J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58(3), 268–276 (2012). https://doi.org/10.1016/j.ymeth.2012.05.001
  4. Pös, O. et al. DNA copy number variation: main characteristics, evolutionary significance, and pathological aspects. Biomed J 44(5), 548–559 (2021). https://doi.org/10.1016/j.bj.2021.02.003
  5. Hill, T. & Unckless, R. L. A deep learning approach for detecting copy number variation in next-generation sequencing data. G3 (Bethesda) 9(11), 3575–3582 (2019). https://doi.org/10.1534/g3.119.400596
  6. Hallee, L. & Khomtchouk, B. B. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 13, 2088 (2023). https://doi.org/10.1038/s41598-023-28965-7
  7. Gostimskaya, I. CRISPR-Cas9: a history of its discovery and ethical considerations of its use in genome editing. Biochemistry (Mosc) 87(8), 777–788 (2022). https://doi.org/10.1134/S0006297922080090
  8. Chuai, G., Ma, H., Yan, J. et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol 19, 80 (2018). https://doi.org/10.1186/s13059-018-1459-4
