Exploring the Human Microbiome through Sequencing & Clustering
Accurate identification and analysis of microbiome DNA leads to more targeted and effective treatment options.

The human microbiome is a vast collection of microorganisms that live in and on our bodies. For example, the human gut microbiome plays a critical role in the host’s:

  • physiology
  • metabolism
  • immune system
  • behavior
  • neurology

Microbiome sequencing gives insight into changes in the composition of the gut microbiota. This in turn helps us understand the interaction between our bodies and the microbiome, which can open up new options for therapeutic intervention.


Project Objective: Microbiome Sequencing & Clustering Pipeline

The process of microbiome sequencing and clustering can be efficiently performed using data science. The goal of this project was to develop a pipeline for clustering microbiome sequencing data based on nucleotide order and then matching it with known sequences using the BLAST algorithm.

Homogeneous sequences can be grouped into the same cluster, which may correspond to the same taxon. In the next step, representative sequences from each cluster were matched to known bacterial genera or species in the NCBI database. The results of this project can then be used to help make well-informed decisions about clinical treatment.


Gathering Data to Understand the Gut Microbiome

The gut microbiome data were collected from the colon during a colonoscopy. The colonoscope was advanced to the caecum, and air was injected on the way back so that the intestine unfolded, allowing a scan of the entire mucosa.

16S rRNA sequencing was then applied to the isolated gut RNA to identify the bacteria. The raw sequencing data were provided in the FASTQ file format.

The sample was taken from the location indicated by the arrow.
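
As a minimal sketch of how raw reads in the FASTQ format mentioned above could be loaded, the following reads four-line records from a text stream. It assumes unwrapped records (one line each for header, sequence, separator and quality); the function name `read_fastq` and the two example reads are made up for illustration and are not taken from the project.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream.

    Assumes the common four-line layout: @header, sequence, +, quality.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


# Made-up example reads, not data from the project.
example = io.StringIO(
    "@read_1\nACGTNACG\n+\nIIIIIIII\n"
    "@read_2\nTTGACCGA\n+\nIIII#III\n"
)
records = list(read_fastq(example))
for read_id, sequence, quality in records:
    print(read_id, sequence, len(quality))
```

For real files, an established parser (for example from a bioinformatics library) is likely the safer choice, since it also handles edge cases this sketch ignores.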


Challenges of Microbiome Sequencing and Clustering

  1. The mixture of human and bacterial genomes makes it harder to identify the microbiota species. Since no prior information was provided for clustering, an unsupervised method was used.
  2. Technical errors of the sequencing machine may occur, producing useless information (such as nucleotides labeled as “N”) or even false information about the bacterial genome.
  3. Due to the limitations of the technique, not all samples can be analyzed simultaneously. The resulting batch effect has to be normalized.
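
One simple way to deal with the second challenge is to discard reads that contain too many nucleotides labeled “N”. The sketch below is only an illustration of that idea: the cutoff `max_n_fraction` and both function names are assumptions, and the article does not state whether the project filtered reads this way.

```python
def n_fraction(sequence):
    """Return the share of positions labeled N (uncalled bases)."""
    if not sequence:
        return 1.0  # treat an empty read as entirely uninformative
    return sequence.upper().count("N") / len(sequence)


def drop_ambiguous_reads(sequences, max_n_fraction=0.05):
    """Keep reads whose N share does not exceed max_n_fraction (an assumed cutoff)."""
    return [s for s in sequences if n_fraction(s) <= max_n_fraction]


# Made-up reads for illustration.
reads = ["ACGTACGTAC", "ACGTNNNNAC", "NNNNNNNNNN", "acgtacgtan"]
kept = drop_ambiguous_reads(reads, max_n_fraction=0.10)
print(kept)
```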


Metagenomics: Microbiome Sequencing Methods

Self-Organizing Map (SOM)

SOM is an artificial neural network (ANN) trained by unsupervised learning to create a low-dimensional discretized representation of the input. It can handle the batch effect and adjust for erroneous data caused by technical errors. To work with the sample data, all sequences had to be truncated to 250 nucleotides, counted from the primer start.
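
The truncation step could look roughly like the sketch below. It assumes that the 250 nucleotides include the primer itself and that reads without a primer match, or with too few nucleotides after it, are dropped; neither detail is stated in the article. The primer string and the function name are hypothetical.

```python
def truncate_from_primer(sequence, primer, length=250):
    """Return `length` nucleotides starting at the primer, or None if impossible."""
    start = sequence.find(primer)
    if start == -1:
        return None  # primer not found in this read
    window = sequence[start:start + length]
    if len(window) < length:
        return None  # read is too short after the primer start
    return window


# Made-up primer and read; a short length keeps the example readable.
primer = "GGATTC"
read = "TTTT" + primer + "ACGTACGTAC"
print(truncate_from_primer(read, primer, length=10))
```

Fixed-length windows matter here because every input vector of a SOM must have the same number of dimensions.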

Since SOM only works with numerical data, the nucleotides first had to be translated into numbers, with A, C, T, G and N (the label for an uncalled base) being translated into 1, 2, 3, 4, and 5 respectively. During the training cycle, all 10,000 sequences were distributed over the network and the batch effect adjustments were applied.
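
To make the encoding and training idea concrete, here is a deliberately small, pure-Python sketch of a SOM. It is not the project's implementation: the grid size, learning rate, neighborhood radius and decay schedule are all assumptions, the fifth symbol is assumed to be “N”, and the batch effect adjustment is left out because the article does not describe it.

```python
import math
import random

# Numeric codes as given in the text; "N" as the fifth symbol is an assumption.
CODES = {"A": 1, "C": 2, "T": 3, "G": 4, "N": 5}


def encode(sequence):
    """Translate a nucleotide string into a list of numbers."""
    return [float(CODES[base]) for base in sequence.upper()]


def squared_distance(a, b):
    """Squared Euclidean distance; it has the same minimizer as the distance itself."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vector):
    """Return the grid position of the node closest to the input vector."""
    return min(weights, key=lambda node: squared_distance(weights[node], vector))


def train_som(vectors, rows=2, cols=2, epochs=20,
              learning_rate=0.5, radius=1.0, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.uniform(1, 5) for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # linear decay (assumed schedule)
        rate = learning_rate * decay
        sigma = max(radius * decay, 0.1)      # keep the neighborhood width positive
        for vector in rng.sample(vectors, len(vectors)):  # shuffled pass
            bmu = best_matching_unit(weights, vector)
            for node, node_weights in weights.items():
                grid_dist_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2 * sigma ** 2))
                for i in range(dim):
                    # Pull the node towards the input, scaled by its neighborhood influence.
                    node_weights[i] += rate * influence * (vector[i] - node_weights[i])
    return weights


# Made-up sequences for illustration.
sequences = ["AAAAAA", "AAAAAC", "GGGGGG", "GGGGGT", "AAAAAA"]
vectors = [encode(s) for s in sequences]
som = train_som(vectors)
assignments = [best_matching_unit(som, v) for v in vectors]
print(assignments)
```

Each sequence is assigned to the node whose weight vector is nearest, and sequences sharing a node form one cluster. Note that an ordinal encoding makes some nucleotide pairs appear closer than others, which is a known limitation of this kind of mapping.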

For each cluster, a corresponding heat map was saved, with each nucleotide assigned a color to visualize cluster homogeneity. The mean cluster correlation was below 0.95, an important value for the second SOM run, which was used to increase the quality of subcluster identification. After two SOM runs, each sequence was grouped into a fitting cluster, with an average cluster homogeneity of over 90%.
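
The article does not define how cluster homogeneity was scored. One plausible definition, shown here purely as an assumption, is the average agreement with the per-position consensus: for every position, take the share of sequences carrying the most common nucleotide, then average over all positions.

```python
from collections import Counter


def cluster_homogeneity(sequences):
    """Mean share of sequences agreeing with the most common base per position."""
    if not sequences or not sequences[0]:
        raise ValueError("need at least one non-empty sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    agreement = []
    for column in zip(*sequences):  # iterate position by position
        top_count = Counter(column).most_common(1)[0][1]
        agreement.append(top_count / len(sequences))
    return sum(agreement) / length


# Made-up cluster for illustration.
cluster = ["ACGT", "ACGT", "ACGA", "TCGT"]
print(cluster_homogeneity(cluster))  # 0.875
```

A score of 1.0 means all sequences in the cluster are identical. Under this definition, a cluster scoring low would be a natural candidate for further splitting in a second run.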


Basic Local Alignment Search Tool (BLAST)

The BLAST algorithm was used to identify the known sequences of microorganisms in the samples. The tool can reliably separate the human genome from bacterial genomes, addressing one of the major challenges. To minimize the analysis time, 50 representatives of each of the largest clusters were randomly selected and saved as FASTA files. The next step was to identify the most frequent BLAST results for each cluster.
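
The sampling and tallying around the BLAST step can be sketched as follows. The BLAST search itself is not shown; the code only illustrates drawing up to 50 random representatives, formatting them as FASTA text, and counting the most frequent hit. All function names, the record naming scheme and the example hit labels are hypothetical.

```python
import random
from collections import Counter


def pick_representatives(cluster, k=50, seed=None):
    """Randomly select up to k sequences from a cluster."""
    if len(cluster) <= k:
        return list(cluster)
    return random.Random(seed).sample(cluster, k)


def to_fasta(sequences, prefix):
    """Format sequences as FASTA text with headers like >prefix_1."""
    lines = []
    for index, sequence in enumerate(sequences, start=1):
        lines.append(f">{prefix}_{index}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


def most_frequent_hit(hit_names):
    """Return (name, count) of the most common BLAST hit label."""
    return Counter(hit_names).most_common(1)[0]


# Made-up cluster of 120 sequences and made-up hit labels.
cluster = ["ACGT" * 3 + "ACGT"[i % 4] for i in range(120)]
representatives = pick_representatives(cluster, k=50, seed=1)
fasta_text = to_fasta(representatives[:2], "cluster1")
print(fasta_text)
print(most_frequent_hit(["Bacteroides", "Bacteroides", "Prevotella"]))
```

Sampling a fixed number of representatives keeps the number of BLAST queries bounded regardless of cluster size, which is what reduces the analysis time.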


Results of the Microbiome Analysis

The sample results are promising:

  • The SOMs provided clustering with high homogeneity.
  • Neural networks were able to identify similarities in large amounts of DNA sequences based on Euclidean distances.
  • Heatmaps allowed users to identify dominant patterns within a cluster.

The heatmaps of the clusters contained 389 sequences with a mean score of 0.9511...
... and 57 sequences with a mean score of 0.7504053.

The identified microorganisms allow for a deeper understanding of the microbiota composition within patients, including changes in gene expression patterns. The obtained results have the potential to be utilized in further investigations of:

  • host-microbiota interactions
  • drug effects
  • environmental influences


Conclusion

Microbiome sequencing and clustering are powerful techniques for exploring the human microbiome and potential therapeutic applications. Through DNA clustering and sequence matching, we can analyze microbiome sequencing data and gain valuable insights. The use of SOMs and BLAST enables accurate identification and analysis of microbiome DNA, leading to more targeted and effective treatment options.

Curious about other ways data science can help to automate and optimize complex processes? Then explore our other use cases!

