Uncovering How Obesity Affects Metabolic Health: Using Data Analytics to Evaluate Meal Challenges and Optimal Measurement Times

GitHub repository

Thesis title: "The impact of overweight and obesity on metabolic reprogramming of the innate immune system: An evaluation of fasting and postprandial measurements for discrimination in cardiometabolic disease risk"

During my master's program in Bioinformatics & Computational Biology, I had the chance to dive into the fascinating world of metabolic health. One moment that stands out for me was when I first looked at the massive gene expression and clinical datasets I was working with: over 700,000 rows of gene expression data! I remember feeling both overwhelmed and excited. It was like opening a treasure chest filled with potential insights about human health. This project wasn't just another academic requirement; it was a unique opportunity to explore how our bodies react to different meals and how we can better understand metabolic health.

Why THIS Project?

I chose this project for my thesis because I wanted to enhance my skills in Python and apply them to real-world data analysis. My interest in understanding metabolic processes, especially in relation to gene expression, was a driving force. I also knew that tackling a problem as significant as metabolic health could contribute to understanding chronic diseases, which makes this work particularly meaningful.

What Readers Will Gain

In this article, you will gain insights into how metabolic modeling can inform research in nutritional science and health. I’ll share key findings from my analysis, the dataset I used, and the steps I took in my analysis, along with some visuals that illustrate the results. Additionally, I’ll reflect on the surprises I encountered along the way.

Key Takeaways

  • Postprandial measurements, especially at 120 minutes after meals, provide better insights into metabolic responses than fasting measurements.
  • Context-specific metabolic models revealed significant differences in key metabolic pathways related to inflammation and lipid metabolism.
  • Researchers are encouraged to use mixed meal test challenges over fasting or glucose-only tests for metabolic health assessments when time and money are an issue.
  • Future studies should broaden the diversity of participants to better understand chronic inflammation and cardiometabolic diseases.

Dataset Details

For this project, I used two primary datasets. The first, GSE88794_RAW.tar, from the Gene Expression Omnibus (GEO), contains more than 700,000 rows of gene expression data from 72 healthy, overweight participants.

The second dataset, nutritech_otherdata.xlsx, provided clinical data measurements, including BMI and other phenotypic characteristics. This combination of data made it an ideal resource for my analysis since it allowed me to explore the connections between clinical metrics and gene expression.

Analysis Process

My analysis journey began with data cleaning and preparation, which I did in Excel and R. Because the dataset was extensive, I used Python for data transformation and statistical analysis. I applied techniques like Principal Component Analysis (PCA) to visualise patterns in gene expression over time, particularly at fasting (T0) and 120 minutes after the meal (T120). I also performed metabolic modelling, k-means clustering, significance testing with multiple-testing correction, and reaction pathway analysis. One of the surprises was how clear the differences became when looking at post-meal responses; they truly stood out compared to fasting data.

Visuals and Insights

I analysed and visualised the datasets using key Python libraries such as numpy, pandas, sklearn, matplotlib and seaborn.

Let me walk you through some key visuals from my paper:

Table 1. Clinical Measurements Before and After Diet Intervention. Week 1 (before intervention) mean ± SD (standard deviation) vs Week 13 (after intervention) mean ± SD. Data from Week 13 are split by intervention group: Diet = 20% energy restriction; Control = weight maintenance. Significance at p < 0.05.

Initial data exploration of the clinical dataset: after cleaning the data and calculating derived columns in Excel, I performed a statistical analysis in Python. Overall, minimal differences were observed between Week 1 and Week 13 across both intervention groups.

Body weight and fasting glucose did not change substantially, so these key metabolic health features showed no significant improvement after 12 weeks of energy restriction. In the diet group (20% energy restriction), however, significant reductions were seen in BMI (p reported as 0.0, i.e. below the rounding precision) and HOMA-IR (p = 0.04), indicating a reduction in body mass and improved insulin sensitivity. 2-hour glucose (p = 0.08) and the Matsuda index (p = 0.06) trended toward improvement but did not reach statistical significance, so they offer only tentative support for enhanced glycemic control.
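For readers unfamiliar with the two insulin indices mentioned above, here is a minimal sketch of the standard formulas. The function names and example values are made up for illustration, and the article does not show how these columns were actually calculated in Excel, so treat this as an assumption about the usual definitions rather than the exact calculation used.

```python
import math

def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uU_ml):
    """HOMA-IR: fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5

def matsuda_index(glucose0_mg_dl, insulin0_uU_ml, mean_glucose_mg_dl, mean_insulin_uU_ml):
    """Matsuda index: 10000 / sqrt(G0 * I0 * mean G * mean I).

    Glucose in mg/dL and insulin in uU/mL; the means are taken over the
    time points of the challenge test.
    """
    product = glucose0_mg_dl * insulin0_uU_ml * mean_glucose_mg_dl * mean_insulin_uU_ml
    return 10000.0 / math.sqrt(product)

# Hypothetical participant (values are illustrative only)
print(f"HOMA-IR: {homa_ir(5.0, 9.0):.2f}")
print(f"Matsuda index: {matsuda_index(90.0, 10.0, 120.0, 50.0):.2f}")
```

A lower HOMA-IR and a higher Matsuda index both point toward better insulin sensitivity, which is why the two are read in opposite directions.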

In contrast, the control group (weight maintenance) showed no significant changes in health markers, with none of the p-values below 0.05. This reinforces that any differences in the diet group were likely due to the intervention.
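The Week 1 versus Week 13 comparison can be sketched as a paired test on per-participant values. The article does not name the test that was used, so the paired t-test here is an assumption, as are the helper name `paired_change` and the made-up HOMA-IR values.

```python
import numpy as np
from scipy import stats

def paired_change(before, after):
    """Paired t-test on per-participant values; returns (mean change, p-value)."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    # Drop participants with a missing value at either time point
    keep = ~(np.isnan(before) | np.isnan(after))
    result = stats.ttest_rel(after[keep], before[keep])
    mean_change = float(np.mean(after[keep] - before[keep]))
    return mean_change, float(result.pvalue)

# Made-up HOMA-IR values for six hypothetical participants
week1 = [2.9, 3.4, 2.1, 4.0, 3.2, 2.7]
week13 = [2.5, 3.0, 2.0, 3.3, 2.9, 2.6]

mean_change, p_value = paired_change(week1, week13)
print(f"mean change = {mean_change:.2f}, p = {p_value:.4f}")
```

Printing the p-value with several decimals (or in scientific notation) avoids a very small value being displayed as 0.0.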


Figure S1.3: Boxplots of Week 1 vs Week 13 (split by start group) for all NutriTech data columns. Week 1 (before intervention) is made up of both groups; Week 13 (after intervention) is split into the intervention groups: weight loss (orange) and weight maintenance (blue).

The box plots in Appendix Figure S1.3 further support the findings of Table 1, showing similar distributions and overlapping spreads between groups, with large outliers in inflammation markers. While the diet group showed improvements in BMI and HOMA-IR, changes across the other variables were limited, so subsequent analyses focused on uncovering metabolic differences.


Figure 1


The gene expression dataset was cleaned and transformed in RStudio so that it could then be analysed in Python. PCA was conducted on the processed peripheral blood mononuclear cell (PBMC) gene expression data to explore patterns and variations associated with different time points and conditions. In Figure 1, the oral glucose tolerance test (OGTT) gene expression data is shown coloured by two different parameters.

Here you can see a distinct trend: the T120 data points tend to group together on the left-hand side of the figure, whereas the T0 data points show much greater variation. The separation by time point is also more pronounced than the separation between before and after the intervention.

This indicates measurable changes in PBMC gene expression profiles between the fasting state and the postprandial state within just two hours (T120).
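As a rough illustration of this kind of PCA, the sketch below runs sklearn's `PCA` on a synthetic samples-by-genes matrix. The data, the shift applied to the T120 samples and the variable names are all invented for the example; the standardisation step is also an assumption, since the article does not describe the preprocessing applied before PCA.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a samples x genes expression matrix:
# 20 fasting (T0) and 20 postprandial (T120) samples, 50 genes.
n_per_group, n_genes = 20, 50
t0 = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
t120 = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
t120[:, :10] += 3.0  # simulate a postprandial shift in the first 10 genes

expression = np.vstack([t0, t120])
timepoint = np.array(["T0"] * n_per_group + ["T120"] * n_per_group)

# Standardise each gene, then project the samples onto the first two components
scaled = StandardScaler().fit_transform(expression)
pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)

for label in ("T0", "T120"):
    pc1_mean = scores[timepoint == label, 0].mean()
    print(f"{label}: mean PC1 = {pc1_mean:.2f}")
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` and colouring by `timepoint` gives the kind of scatter plot described for Figure 1.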


Figure S5.2: PCA, principal components 1 vs 2, of OGTT before intervention at T0 vs T120, with NutriTech participants labelled.

In Figure S5.2 no distinct clustering of participants was observed at any individual time point. The scatter plots revealed a homogeneous distribution of data points, indicating that individual participants did not cluster consistently across different time points.

This suggests that the metabolic reprogramming of PBMCs does not exhibit distinct clustering patterns when comparing the fasting and postprandial states.

In the first plot (Figure 6A), the same PC1 vs. PC2 scores were colour-coded based on the participants' average ICAM-1 levels, using a logarithmic colour gradient because of the wide range of ICAM-1 values. Notably, lighter colours (indicating higher ICAM-1 levels) were observed on the left side of the plot, while darker colours (lower ICAM-1 levels) appeared on the right. This gradient suggests a potential association between metabolic flux patterns and levels of ICAM-1, an inflammatory marker. Although no distinct clusters were found, the gradient in ICAM-1 levels observed for the mixed meal test (MMT) at T120 motivated the decision to define two clusters from these PCA scores, in order to investigate whether the genome-scale metabolic model (GSMM) and its predicted flux distributions can discriminate between healthy and unhealthy PBMC responses.

In the second plot (Figure 6B), the PC1 vs. PC2 scores were coloured based on clusters identified through k-means clustering (k=2). Participants were assigned to Cluster 1 or Cluster 2, represented by different colours. This clustering aimed to determine whether participants could be grouped based on similarities in their metabolic flux patterns. When comparing Cluster 1 and Cluster 2 defined from the T120 MMT data (Figure 6A and B), significant differences in plasma ICAM-1 levels at T120 for the MMT were found between the two k-means clusters.
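The clustering step can be sketched as follows: k-means with k=2 on PC1/PC2 scores, followed by a comparison of ICAM-1 between the two clusters. Everything here is synthetic, and the Mann-Whitney U test is an assumption on my part, since the article does not say which test was used to compare the clusters.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic PC1/PC2 scores for 40 hypothetical participants
pc_scores = np.vstack([
    rng.normal([-2.0, 0.0], 0.7, size=(20, 2)),
    rng.normal([2.0, 0.0], 0.7, size=(20, 2)),
])
# Synthetic ICAM-1 values, higher towards the left of PC1 (illustrative only)
icam1 = np.exp(5.5 - 0.2 * pc_scores[:, 0] + rng.normal(0.0, 0.1, size=40))

# Assign each participant to one of two clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(pc_scores)

# Compare ICAM-1 between the clusters with a non-parametric test
group_a = icam1[labels == 0]
group_b = icam1[labels == 1]
result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"cluster sizes: {len(group_a)} and {len(group_b)}")
print(f"Mann-Whitney U p-value: {result.pvalue:.2e}")
```

Note that the cluster numbering returned by k-means is arbitrary, so which group ends up as label 0 or 1 carries no meaning by itself.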


Figure 9: MMT T240: Number of significantly different reactions per pathway following Benjamini-Hochberg correction in a comparison of the kmeans clusters 1 versus 2 in PBMC context-specific model

In the MMT group at T120, there were significant reactions not only in extracellular transport, which is expected due to nutrient uptake after a meal, but also in several other pathways related to both lipid metabolism and amino acid metabolism.

Notably, significant differences were observed in glycerophospholipid metabolism, mitochondrial transport, and tyrosine metabolism. Two reactions in the tyrosine metabolism pathway were found to be significantly different between Cluster 1 and Cluster 2 at MMT T120. Tyrosine metabolism plays a crucial role in neurotransmitter synthesis and immune function (Jongkees et al., 2015; Bekhbat et al., 2020).
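Counting significant reactions per pathway after Benjamini-Hochberg correction can be sketched like this. The reaction IDs, pathway assignments and p-values are invented for illustration, and `benjamini_hochberg` is a small helper written for this example rather than the code used in the thesis.

```python
import numpy as np
import pandas as pd

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Hypothetical per-reaction p-values from a cluster 1 vs cluster 2 comparison
reactions = pd.DataFrame({
    "reaction": ["R1", "R2", "R3", "R4", "R5", "R6"],
    "pathway": [
        "Transport, extracellular", "Transport, extracellular",
        "Tyrosine metabolism", "Tyrosine metabolism",
        "Glycerophospholipid metabolism", "Other pathway",
    ],
    "p_value": [0.001, 0.008, 0.012, 0.030, 0.041, 0.600],
})

reactions["p_adjusted"] = benjamini_hochberg(reactions["p_value"])
significant = reactions[reactions["p_adjusted"] < 0.05]
counts = significant.groupby("pathway").size().sort_values(ascending=False)
print(counts)
```

The resulting `counts` series is the kind of per-pathway tally that a bar chart of significant reactions would be drawn from.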

The observations made in this analysis reflect the physiological metabolic adjustments expected from the nutritional composition of the MMT at T120. They provide confidence in the use of GSMM predictions in metabolic studies and highlight the usefulness of the MMT over the OGTT.

These visuals not only make the data more digestible but also provide a clear picture of the relationships in the data, allowing for deeper insights.

Main Takeaways

Reflecting on my findings, it’s clear that postprandial measurements (T120) offer substantial advantages for studying metabolic health. The metabolic pathways I identified—like glycolysis and glycerophospholipid metabolism—would likely be overlooked if we relied solely on fasting measurements. This suggests a promising shift in how researchers might approach studies on nutrition and metabolism.

The mixed meal test (MMT) elicited more pronounced metabolic alterations than the OGTT, suggesting that mixed meals are more effective than glucose-only challenges in revealing metabolic reprogramming. Therefore, researchers should consider using mixed meal challenges over fasting measurements or OGTTs when designing studies to assess metabolic health.

Moreover, the use of context-specific metabolic models, such as genome-scale metabolic models, offers a new avenue for understanding metabolic variations across different populations. This can significantly contribute to personalized nutrition and health strategies.

Conclusion and Personal Reflections

This project taught me the value of rigorous data analysis while also honing my programming skills. One of the biggest challenges was navigating the large datasets and ensuring accuracy in my analysis. However, with persistence and support from my mentors, I overcame these hurdles. This experience not only deepened my understanding of metabolic processes but also shaped my future goals in research and data analysis.

Call To Action

I invite you to connect with me on LinkedIn. I’d love to hear your thoughts on this project or any ideas you have related to metabolic health and data analysis. Let’s keep the conversation going!
