Application of Machine Learning algorithms in modeling the role of the Microbiome in the Colorectal Cancer diagnosis and therapy - Part 3

Miodrag Cekikj, PhD CSE

Transforming Businesses with Applied AI | R&D Lead Technical Consultant @ IWConnect | Microsoft MVP | Technical Trainer | Web3 & Blockchain Practitioner

Published: December 27, 2022

Bioinformatics Framework design and Methodology - Machine Learning Modelling Results for understanding the colorectal cancer carcinogenesis

In one of the previous articles, I gave an overview of the design and development of a comprehensive bioinformatics framework and machine learning pipeline for deep microbiome data analysis and interpretation. So far, I have applied the methodology and elaborated on the technical results and the interpretation of the key biomarkers that can play a significant role in understanding the therapy-resistance mechanism in patients diagnosed with colorectal cancer (CRC). This article follows the identical approach for the second case study, CRC carcinogenesis, covering samples that share the same Tubular Adenoma histology. As described in the data demographics overview, this group consisted of 23 samples from patients with pre-operative Tubular Adenoma (Adenoma) and 21 samples from patients diagnosed with a post-operative Newly Developed Adenoma (NDA).

* Note: Because this case study follows the same design and implementation, I will elaborate only on the main modelling phase, the highly contributing features, and the statistical analysis results.

ML Modeling Results

As mentioned before, after applying the data normalization and scaling techniques, I calculated Cronbach's alpha and Cohen's kappa coefficients. As defined previously, the threshold for Cronbach's alpha depends on the stage of the work: early stage of research (0.5 or 0.6/0.7); applied research (0.8); when making an important decision (0.9). Usually, a Cronbach's alpha value > 0.75 is considered acceptable for microbiome-related studies. Cohen's kappa coefficient, on the other hand, is interpreted in the following bands: < 0.4 is considered poor; 0.4 - 0.75 is considered moderate to good; > 0.75 represents excellent data agreement. The results from these calculations are presented in the table below:

[Image: table of Cronbach's alpha and Cohen's kappa coefficients for the pre-operative Adenoma and post-operative Newly Developed Adenoma groups]
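To make the two coefficients concrete, here is a minimal sketch of how they can be computed with the Python standard library only. The function names, the toy inputs, and the `kappa_band` helper are hypothetical illustrations; the article does not state how the coefficients were implemented or which inputs they were applied to.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of observations, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of every item (column) and of the per-observation totals.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Map a kappa value to the interpretation bands quoted in the text."""
    if kappa < 0.4:
        return "poor"
    if kappa <= 0.75:
        return "moderate to good"
    return "excellent"


# Toy data, purely illustrative (not taken from the study).
toy_scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
toy_a = [1, 1, 0, 0]
toy_b = [1, 1, 0, 1]
print(cronbach_alpha(toy_scores), cohen_kappa(toy_a, toy_b), kappa_band(cohen_kappa(toy_a, toy_b)))
```

In the toy example the three items move in lockstep, so alpha reaches its maximum, while the two label sequences agree on three of four entries and land in the "moderate to good" band.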

The general ML modelling performance metrics for the pre-operative Adenoma and post-operative NDA individuals’ group are presented in the following table.

Additionally, I calculated the Precision, Recall and F1-Score metrics for each of the two subgroups. The results are displayed in the following table:
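As a reminder of how these three per-class metrics are derived, the sketch below computes them from true and predicted labels. The function name and the five toy labels are hypothetical and are not the study's data or results.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, purely illustrative.
y_true = ["Adenoma", "Adenoma", "Adenoma", "NDA", "NDA"]
y_pred = ["Adenoma", "Adenoma", "NDA", "NDA", "Adenoma"]

for label in ("Adenoma", "NDA"):
    print(label, precision_recall_f1(y_true, y_pred, positive=label))
```

Reporting the metrics per subgroup, as done here, shows whether a classifier favours one class, which a single accuracy figure can hide when the groups are small.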

As in the previous immunotherapy-effect case study, I also tried the XGBoost and AdaBoost algorithms, which brought no significant improvement over the forest-based approach described above. Therefore, I identified the second-phase Python-based random forest classifier as the most performant and selected its most important features as the reference set for further statistical analysis.
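One way to obtain a stable set of "most important features" from a random forest is to average the importances over several random seeds. The sketch below assumes scikit-learn and NumPy, and uses synthetic data with the same overall shape as this case study (44 samples, 86 features). The library choice, the hyperparameters, and the helper name `mean_feature_importance` are assumptions for illustration, not the configuration used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def mean_feature_importance(X, y, seeds=range(5), n_estimators=100):
    """Average impurity-based feature importances over several seeded forests."""
    importances = []
    for seed in seeds:
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        model.fit(X, y)
        importances.append(model.feature_importances_)
    return np.mean(importances, axis=0)


# Synthetic stand-in for a genus abundance matrix (NOT the study data).
X, y = make_classification(n_samples=44, n_features=86, n_informative=10, random_state=0)

mean_imp = mean_feature_importance(X, y)
# Indices of the 28 highest-ranked features, most important first.
top_features = np.argsort(mean_imp)[::-1][:28]
print(top_features)
```

Averaging over seeds reduces the run-to-run variation that comes from bootstrap sampling and random feature selection, which matters with as few samples as here.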

Statistical Analysis and Highly Contributing Features Results

The comparison of the Adenoma and NDA sample groups yielded a total of 86 unique genera. Of these, the ML algorithm singled out 28 (32.6%) as the most important features, with Benjamini-Hochberg p-values between the groups ranging from 0.002 to 0.048. In the pre-operative Adenoma group, I found the Oscillospiraceae-UCG-002**, Anaerovoracaceae group, Ruminococcus, Prevotella, Lachnospiraceae, FCS020 group and Blautia as genera biologically interesting for further analysis and interpretation. The most significant genera among the post-operative NDA samples were Tyzzerella, Bifidobacterium and Lachnoclostridium.

** Note: The designed bioinformatics framework and pipelines identified some unclassified genome sequences (UCG) that need to be investigated further. This could potentially result from the applied taxonomic analysis and reannotation of the raw reads against updated bacterial references, using the SILVA 138.1 16S reference database (latest reference database update on 27 August 2020).
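The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines of plain Python. The article does not say which statistical test produced the raw p-values, so the input values below are hypothetical, as is the function name.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, purely illustrative.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)

# Share of genera selected as important features, from the counts in the text.
share = round(100 * 28 / 86, 1)
print(share)
```

The adjustment controls the expected proportion of false discoveries among the features declared significant, which is the usual choice when many genera are tested at once.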

To complete the general picture, the following diagram visualizes the statistical analysis results for the genus abundances in the two groups:

Biological analysis and interpretation

The most compelling genus detected as an important feature distinguishing patients with newly developed adenoma from patients diagnosed with tubular adenoma before clinical treatment was Prevotella. Prevotella is primarily reported in the oral microbiome, but it has also been found in relatively high abundance in proximal colon cancer, where, according to research, it appears to be associated with elevated numbers of IL17-producing cells in the mucosa of patients with CRC. In addition, as mentioned in the original publication, one study in transgenic mice showed that Prevotella promotes the differentiation of Th17 cells that primarily colonize the gut and migrate to the bone marrow, where they support the progression of multiple myeloma.

Conclusion

The study documented in this series of articles introduced a multidisciplinary, systematic approach and methodology for observing the CRC drug-resistance mechanism and carcinogenesis using the microbial composition specified at the genus level. Building on concepts from bioinformatics studies, I developed several highly performant machine learning models to help clinicians efficiently analyze the microbiome diversity of resistant patients and address tumor proliferation, newly developed adenoma, inflammation promotion, and potential DNA damage. In this context, I identified the Random Forest Classifier as the most suitable algorithm for supporting follow-up techniques for interpreting feature significance. The relevance of the significant features obtained from the models was examined further by exploiting the stochastic nature of the algorithm, which yielded additional data insights and variable importance ranks. Additionally, I incorporated a symbiotic bacteria analysis to investigate feature correlation and interaction (the joint contribution of features to a specific resistance or adenoma class).

Thus far, many studies have pointed out the importance of individual genera present in the microbiome and tend to treat each one separately. This work contributes to the field of predictive modeling in healthcare and offers a different perspective on treatment: our aggregate analysis gives clear results for the genera that are often found together in a resistant group of patients, indicating that resistance is not due to the presence of a single pathogenic genus in the patient's microbiome but to several bacterial genera that live in symbiosis. Our findings are also complementary to other microbiome-related studies published in the literature, which supports the potential and justification of the applied approach.

The established methodology can also be applied to unseen microbiome data to help oncologists decide on treatment and post-treatment strategies and to better understand immunotherapy and drug resistance. As a further action point, I would like to emphasize the potential for improving the designed symbiotic bacterial analysis so that it provides a combined overview of the model's predictiveness and uncovers additional deep data correlations and knowledge.

Thank you for reading this article and the whole series in general. I believe it is clear and comprehensive in covering the core concepts of the proposed methodology and technical pipeline.

Thank you for being so supportive as well, and I would be grateful if you take the time to comment, share the article and connect for further discussions and collaboration. Feel free to share your thoughts and experience in this regard.

Part 1 - Introductory article - Bioinformatics Framework design and Methodology Overview

Part 2 - Bioinformatics Framework design and Methodology - Machine Learning Modelling Results for the colorectal cancer drug-resistance mechanism

Towards Data Science publication available on the following URL: https://towardsdatascience.com/application-of-machine-learning-algorithms-in-modeling-the-role-of-the-microbiome-in-the-colorectal-2c222ea6ba0.
