Exploring the Human Microbiome through Sequencing & Clustering
Supper & Supper GmbH
Brains as a service - Geo AI, Computational Life Science and Mechanical Engineering Data Science Solutions
The human microbiome is a vast collection of microorganisms that live in and on our bodies. For example, the human gut microbiome plays a critical role in the host’s:
Through microbiome sequencing, you can gain insight into changes in the composition of the gut microbiota. This in turn helps to understand the interaction between our bodies and the microbiome, leading to new options for therapeutic intervention.
Project Objective: Microbiome Sequencing & Clustering Pipeline
The process of microbiome sequencing and clustering can be efficiently performed using data science. The goal of this project was to develop a pipeline for clustering microbiome sequencing data based on nucleotide order and then matching it with known sequences using the BLAST algorithm.
Homogeneous sequences can be grouped into the same cluster, which may correspond to the same taxon. In the next step, representative sequences from each cluster were matched to known bacterial genera or species in the NCBI database. The results of this project can then be used to support well-informed decisions on clinical treatment.
Gathering Data to Understand the Gut Microbiome
The gut microbiome data was collected from the colon during a colonoscopy. For this, the colonoscope was advanced to the caecum. On the way back, air was injected to unfold the intestine, allowing a scan of the entire mucosa.
16S rRNA sequencing was then applied to the isolated gut RNA to identify the bacteria. The raw sequencing data were provided in the FASTQ file format.
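As a rough illustration of the input format: a FASTQ record normally spans four lines (identifier, sequence, separator, quality string). The sketch below reads such records with plain Python. The function name `read_fastq` and the sample record are hypothetical and not taken from the project, and the code assumes unwrapped, four-line records.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from a FASTQ text stream.

    Assumes the common four-line-per-record layout (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        yield header[1:], sequence, quality

# Hypothetical single-record input, for illustration only.
example = StringIO("@read1\nACGTTGCA\n+\nIIIIHHHH\n")
records = list(read_fastq(example))
```

In practice an established parser would likely be preferable; the point here is only to show what one record of raw data looks like.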
Challenges of Microbiome Sequencing and Clustering
Metagenomics: Microbiome Sequencing Methods
Self Organizing Map (SOM)
A SOM is an artificial neural network (ANN) trained by unsupervised learning to create a low-dimensional, discretized representation of the input. It can also adjust for batch effects, that is, for distortions in the data caused by technical errors. To work with the sample data, all sequences had to be truncated to 250 nucleotides, counted from the primer start.
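The truncation step could look roughly like the following sketch. It assumes that the 250-nucleotide window begins at the first base of the primer and that reads without the primer, or with fewer than 250 bases after it, are dropped; neither detail is stated in the project description. The function name and the primer string are placeholders.

```python
def truncate_from_primer(sequence, primer, length=250):
    """Return `length` nucleotides starting at the primer, or None if unusable."""
    start = sequence.find(primer)
    if start == -1:
        return None  # primer not found in this read
    fragment = sequence[start:start + length]
    # Discard reads that are too short to fill the whole window.
    return fragment if len(fragment) == length else None

# Placeholder primer and read; a short window keeps the example readable.
read = "TT" + "AAGG" + "ACGTACGT"
fragment = truncate_from_primer(read, primer="AAGG", length=10)
```

Giving every sequence the same length matters because the SOM expects input vectors of identical dimension.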
Since a SOM only works with numerical data, the nucleotides first had to be translated into numbers: A, C, T and G were translated into 1, 2, 3 and 4 respectively, with a fifth code, 5, presumably reserved for ambiguous bases such as N. During the training cycle, all 10,000 sequences were distributed over the network and the batch effect adjustments were applied.
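To make the encoding and training idea concrete, here is a minimal NumPy sketch of a SOM. It is not the project's implementation: the grid size, learning rate, neighbourhood width and all function names are assumptions, the use of code 5 for N is a guess, and the batch effect adjustment is left out.

```python
import numpy as np

# Assumed mapping; using 5 for ambiguous bases (N) is a guess.
CODES = {"A": 1, "C": 2, "T": 3, "G": 4, "N": 5}

def encode(sequences):
    """Turn equal-length nucleotide strings into a 2-D float array."""
    return np.array([[CODES[base] for base in seq] for seq in sequences], dtype=float)

def train_som(data, rows=2, cols=2, epochs=50, lr=0.5, sigma=1.0, seed=0):
    """Train a small rows x cols SOM and return one weight vector per node."""
    rng = np.random.default_rng(seed)
    # Initialise each node from a randomly chosen input sequence.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    # Grid coordinates of every node, used for neighbourhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrink step size and neighbourhood over time
        for x in data[rng.permutation(len(data))]:
            # Best matching unit: the node whose weights are closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            influence = np.exp(-grid_dist2 / (2 * (sigma * decay + 1e-3) ** 2))
            # Pull the BMU and its grid neighbours towards x.
            weights += (lr * decay) * influence[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Return, for each sequence, the index of its closest SOM node."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Toy input, for illustration only.
data = encode(["AAAAAA", "AAAAAA", "GGGGGG", "GGGGGG", "ACTGNA"])
labels = assign_clusters(data, train_som(data))
```

Each node of the trained map acts as one cluster, and a second run on the members of a single cluster would split it into subclusters in the same way.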
For each cluster, a corresponding heat map was saved, with each nucleotide assigned a color to visualize cluster homogeneity. The mean cluster correlation was below 0.95, an important value for the second SOM run, which was used to increase the quality of subcluster identification. After two SOM runs, each sequence was grouped into a fitting cluster, with an average cluster homogeneity of over 90%.
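The exact homogeneity measure is not defined here. One plausible way to quantify it, shown purely as an assumption, is the average share of the most common nucleotide at each position of a cluster's sequences:

```python
from collections import Counter

def cluster_homogeneity(sequences):
    """Mean per-position share of the most common nucleotide (1.0 = identical)."""
    if not sequences:
        raise ValueError("cluster is empty")
    shares = []
    # zip(*sequences) walks through the cluster column by column.
    for column in zip(*sequences):
        most_common_count = Counter(column).most_common(1)[0][1]
        shares.append(most_common_count / len(column))
    return sum(shares) / len(shares)

# Toy cluster: three positions agree fully, the last one in 3 of 4 sequences.
score = cluster_homogeneity(["ACGT", "ACGT", "ACGA", "ACGT"])
```

Under this definition a fully uniform cluster scores 1.0, and the per-position shares are the same quantity a color-coded heat map makes visible.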
Basic Local Alignment Search Tool (BLAST)
The BLAST algorithm was used to identify the known sequences of microorganisms in the samples. The tool can reliably separate the human genome from bacterial genomes, addressing one of the major challenges. To minimize the analysis time, 50 representatives of each of the largest clusters were randomly selected and saved as FASTA files. The next step was to identify the most frequent BLAST results for each cluster.
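The selection and tallying steps around BLAST can be sketched as follows. The BLAST search itself is not shown, since the project description does not say how it was run; the helper names, the FASTA identifiers and the hit labels are all hypothetical placeholders.

```python
import random
from collections import Counter

def pick_representatives(sequences, n=50, seed=0):
    """Randomly pick up to n sequences from a cluster."""
    sequences = list(sequences)
    if len(sequences) <= n:
        return sequences
    return random.Random(seed).sample(sequences, n)

def to_fasta(sequences, prefix="cluster0"):
    """Format sequences as FASTA text with generated identifiers."""
    lines = []
    for i, seq in enumerate(sequences, start=1):
        lines.append(f">{prefix}_rep{i}")
        lines.append(seq)
    return "\n".join(lines) + "\n"

def most_frequent_hit(hit_names):
    """Return (name, count) of the most common BLAST hit for one cluster."""
    return Counter(hit_names).most_common(1)[0]

# Placeholder cluster of 100 identical toy sequences.
representatives = pick_representatives(["ACGT"] * 100, n=50)
fasta_text = to_fasta(["ACGT", "TTGA"], prefix="c1")
# Placeholder hit labels, not results from the project.
top_hit = most_frequent_hit(["genus_A", "genus_B", "genus_A"])
```

Taking the most frequent hit per cluster gives a single label for each cluster even when individual representatives match different reference sequences.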
Results of the Microbiome Analysis
The sample results are promising:
The identified microorganisms allow for a deeper understanding of the microbiota composition within patients, including changes in gene expression patterns. The obtained results have the potential to be utilized in further investigations of:
Conclusion
Microbiome sequencing and clustering are powerful techniques for exploring the human microbiome and potential therapeutic applications. Through DNA clustering and sequence matching, we can analyze microbiome sequencing data and gain valuable insights. The use of SOMs and BLAST enables accurate identification and analysis of microbiome DNA, leading to more targeted and effective treatment options.
Curious about other ways data science can help to automate and optimize complex processes? Then explore our other use cases!