Exploring the Human Microbiome through Sequencing & Clustering

Supper & Supper GmbH

Brains as a service - Geo AI, Computational Life Science and Mechanical Engineering Data Science solutions

Published: May 12, 2023

The human microbiome is a vast collection of microorganisms that live in and on our bodies. For example, the human gut microbiome plays a critical role in the host’s:

physiology
metabolism
immune system
behavior
neurology

Through microbiome sequencing, you can gain insight into changes in the composition of the gut microbiota. This in turn helps to understand the interaction between our bodies and the microbiome, leading to new options for therapeutic intervention.

Project Objective: Microbiome Sequencing & Clustering Pipeline

The process of microbiome sequencing and clustering can be efficiently performed using data science. The goal of this project was to develop a pipeline for clustering microbiome sequencing data based on nucleotide order and then matching it with known sequences using the BLAST algorithm.

Highly similar sequences can be grouped into the same cluster, which may correspond to the same taxon. In the next step, representative sequences from each cluster were matched to known bacterial genera or species in the NCBI database. The results of this project can then be used to support well-informed decisions about clinical treatment.

Gathering Data to Understand the Gut Microbiome

The gut microbiome data was collected from the colon during a colonoscopy. The colonoscope was advanced to the caecum, and air was injected on the way back to unfold the intestine, allowing a scan of the entire mucosa.

16S rRNA sequencing was then applied to the isolated gut RNA to identify the bacteria. The raw sequencing data were provided in the FASTQ file format.
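To make the input format concrete, here is a minimal sketch of reading FASTQ records with the Python standard library. It assumes the common four-lines-per-record layout (header, sequence, separator, quality); the function name `read_fastq` and the example reads are illustrative and not taken from the project.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read ID is the first whitespace-separated token after '@'.
        read_id = (header[1:].split() or [""])[0]
        yield read_id, sequence, quality


# Made-up example data, only to show the record structure.
example = io.StringIO("@read1 sample\nACGTN\n+\nIIII#\n@read2\nGGCA\n+\nIIII\n")
records = list(read_fastq(example))
```

Reading fixed groups of four lines, rather than searching for "@", matters because a quality string may itself begin with "@".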

[Figure: The sample was taken from the location indicated by the arrow.]

Challenges of Microbiome Sequencing and Clustering

The mixture of the human genome and bacterial genome increases the difficulty of identifying the microbiota species. Since there was no prior information provided for clustering, an unsupervised method was used.
Technical errors of the sequencing machine may occur, producing uninformative output (such as nucleotides labeled "N") or even false information about the bacterial genome.
Due to the limitations of the technique, not all samples can be analyzed simultaneously. The resulting batch effect therefore has to be normalized.
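The article does not say how reads with ambiguous base calls were handled. One common option is to measure the share of "N" calls per read and set aside reads above a threshold. The sketch below illustrates that idea; the function names and the 5% threshold are assumptions chosen for illustration, not values from the project.

```python
from typing import Iterable, List, Tuple


def ambiguous_fraction(sequence: str) -> float:
    """Return the fraction of bases labeled 'N'; an empty read counts as fully ambiguous."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def filter_reads(
    reads: Iterable[Tuple[str, str]], max_ambiguous: float = 0.05
) -> List[Tuple[str, str]]:
    """Keep (read_id, sequence) pairs whose share of 'N' calls is at most max_ambiguous."""
    return [
        (read_id, sequence)
        for read_id, sequence in reads
        if ambiguous_fraction(sequence) <= max_ambiguous
    ]


# Made-up reads: r2 has 4 of 10 bases uncalled, r3 is empty.
reads = [("r1", "ACGTACGTAC"), ("r2", "ACNNNNGTAC"), ("r3", "")]
kept = filter_reads(reads)
```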

Metagenomics: Microbiome Sequencing Methods

Self-Organizing Map (SOM)

A SOM is an artificial neural network (ANN) trained by unsupervised learning to create a low-dimensional, discretized representation of the input. It can take the batch effect into account and adjust for erroneous data caused by technical errors. To work with the sample data, all sequences had to be truncated to 250 nucleotides, counted from the start of the primer.
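The truncation step could look roughly like the following sketch. It assumes an exact primer match and drops reads that are too short or lack the primer; the article does not describe how such reads were treated, and the primer string used here is a placeholder, not the primer used in the project.

```python
from typing import Optional


def truncate_from_primer(sequence: str, primer: str, length: int = 250) -> Optional[str]:
    """Return `length` nucleotides starting at the first primer match, or None.

    None is returned when the primer is absent or when fewer than `length`
    nucleotides remain after the primer start.
    """
    start = sequence.find(primer)
    if start == -1:
        return None
    fragment = sequence[start:start + length]
    return fragment if len(fragment) == length else None


# Placeholder primer and a made-up read, for illustration only.
primer = "ACGTACGT"
read = "TT" + primer + "A" * 300
fragment = truncate_from_primer(read, primer)
```

Giving every sequence the same length is what later allows each read to be treated as a fixed-size input vector.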

Since a SOM only works with numerical data, the nucleotides first had to be translated into numbers, with A, C, T, G and N being translated into 1, 2, 3, 4 and 5 respectively. During the training cycle, all 10,000 sequences were distributed over the network and the batch effect adjustments were applied.
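The encoding and the training loop can be sketched with the standard library alone. This is a generic, minimal SOM (random initialization from the data, Gaussian neighborhood, linear decay) and not the project's implementation: the grid size, learning rate and epoch count are assumptions, the mapping of "N" to 5 is assumed from the five codes listed above, and no batch effect adjustment is included.

```python
import math
import random
from typing import List, Sequence

# Integer codes as described in the text; "N" marks an uncalled base.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "T": 3, "G": 4, "N": 5}


def encode(sequence: str) -> List[float]:
    """Translate a nucleotide string into a numeric vector."""
    return [float(NUCLEOTIDE_CODES[base]) for base in sequence.upper()]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights: List[List[float]], vector: Sequence[float]) -> int:
    """Index of the map unit whose weight vector is closest to `vector`."""
    return min(range(len(weights)), key=lambda i: squared_distance(weights[i], vector))


def train_som(data: List[List[float]], rows: int, cols: int,
              epochs: int = 20, learning_rate: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a rows x cols SOM and return one weight vector per unit."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]  # start from random samples
    initial_radius = max(rows, cols) / 2
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # linear decay of rate and radius
        rate = learning_rate * decay
        radius = max(initial_radius * decay, 0.5)
        for vector in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu_row, bmu_col = grid[best_matching_unit(weights, vector)]
            for index, (row, col) in enumerate(grid):
                grid_dist_sq = (row - bmu_row) ** 2 + (col - bmu_col) ** 2
                influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
                weights[index] = [w + rate * influence * (x - w)
                                  for w, x in zip(weights[index], vector)]
    return weights


# Made-up toy sequences; real input would be the 250-nucleotide reads.
sequences = ["AAAAAA", "AAAAAC", "GGGGGG", "GGGGGT"]
data = [encode(s) for s in sequences]
som_weights = train_som(data, rows=1, cols=2)
clusters = [best_matching_unit(som_weights, v) for v in data]
```

Each sequence is assigned to the unit it is closest to in Euclidean distance, so the units act as clusters. Note that an integer code implies an ordering between nucleotides (A is numerically closer to C than to G), which is a property of this encoding rather than of the biology.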

For each cluster, a corresponding heat map was saved, with each nucleotide assigned a color to visualize cluster homogeneity. The mean cluster correlation was below 0.95, which was an important value for the second SOM run, carried out to increase the quality of subcluster identification. After two SOM runs, each sequence was grouped into a fitting cluster with an average cluster homogeneity of over 90%.
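The article does not define how cluster homogeneity was calculated. As one plausible reading, the sketch below scores a cluster by the average share of sequences that carry the most common nucleotide at each position; this definition and the function name are assumptions for illustration only.

```python
from collections import Counter
from typing import Sequence


def cluster_homogeneity(sequences: Sequence[str]) -> float:
    """Mean, over positions, of the fraction of sequences sharing the majority nucleotide."""
    if not sequences:
        raise ValueError("cluster is empty")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    per_position = []
    for column in zip(*sequences):  # one column per nucleotide position
        majority_count = Counter(column).most_common(1)[0][1]
        per_position.append(majority_count / len(sequences))
    return sum(per_position) / length


# Made-up cluster: positions 1-2 agree fully, positions 3-4 agree in 3 of 4 reads.
toy_cluster = ["ACGT", "ACGT", "ACGA", "ACTT"]
score = cluster_homogeneity(toy_cluster)  # (1 + 1 + 0.75 + 0.75) / 4 = 0.875
```

The per-position values are the same quantities a nucleotide heat map shows visually: a column of a single color corresponds to a value of 1.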

Basic Local Alignment Search Tool (BLAST)

The BLAST algorithm was used to identify the known sequences of microorganisms in the samples. The tool can reliably separate the human genome from bacterial genomes, addressing one of the major challenges. To minimize the analysis time, 50 representatives of each of the largest clusters were randomly selected and saved as FASTA files. The next step was to identify the most frequent BLAST results for each cluster.
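The steps around the BLAST search can be sketched as follows: draw up to 50 random representatives per cluster, serialize them as FASTA, and tally the most frequent top hit once results are available. The BLAST search itself is left out, and the function names, the fixed random seed and the genus names in the example are illustrative assumptions rather than project details.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def sample_representatives(cluster: Sequence[Read], k: int = 50, seed: int = 0) -> List[Read]:
    """Randomly pick up to k reads from a cluster without replacement."""
    if len(cluster) <= k:
        return list(cluster)
    return random.Random(seed).sample(list(cluster), k)


def to_fasta(records: Sequence[Read], width: int = 60) -> str:
    """Serialize reads as FASTA text with sequence lines wrapped at `width`."""
    lines = []
    for read_id, sequence in records:
        lines.append(f">{read_id}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "".join(line + "\n" for line in lines)


def most_frequent_hit(top_hits: Sequence[str]) -> Tuple[str, float]:
    """Return the most common hit name and its share among all hits."""
    if not top_hits:
        raise ValueError("no BLAST hits to summarize")
    name, count = Counter(top_hits).most_common(1)[0]
    return name, count / len(top_hits)


# Made-up cluster of 120 reads and made-up hit names, for illustration only.
cluster = [(f"read{i}", "ACGT" * 5) for i in range(120)]
representatives = sample_representatives(cluster)
fasta_text = to_fasta(representatives)
winner = most_frequent_hit(["Bacteroides", "Bacteroides", "Prevotella"])
```

Reporting the share alongside the name shows how strongly the representatives agree on a single genus or species.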

Results of the Microbiome Analysis

The sample results are promising:

The SOMs provided clustering with high homogeneity.
Neural networks were able to identify similarities across large numbers of DNA sequences based on Euclidean distances.
Heatmaps allowed users to identify dominant patterns within a cluster.

The identified microorganisms allow for a deeper understanding of the microbiota composition within patients, including changes in gene expression patterns. The obtained results have the potential to be utilized in further investigations of:

host-microbiota interactions
drug effects
environmental influences

Conclusion

Microbiome sequencing and clustering are powerful techniques for exploring the human microbiome and potential therapeutic applications. Through DNA clustering and sequence matching, we can analyze microbiome sequencing data and gain valuable insights. The use of SOMs and BLAST enables accurate identification and analysis of microbiome DNA, leading to more targeted and effective treatment options.

Curious about other ways data science can help to automate and optimize complex processes? Then explore our other use cases!
