
The Science of Cholesterol Prediction- An MLR Analysis

Prashansa Gupta

Biotechnologist|| PGDM HC 23-25 || Alumni Committee member || Welingkar Institute of Management || Ex- WNS

Published: April 11, 2024

Cardiovascular diseases are the leading cause of death worldwide. They lead to poor quality of life, disability, and death, yet they can often be prevented by controlling risk factors such as high blood pressure (BP) and high cholesterol.

Cholesterol is a type of lipid found in all higher animals. It is distributed throughout body tissues, especially the brain and spinal cord. It helps the body perform many important functions; however, too much cholesterol in the blood is harmful, because it can enter the artery walls and damage their integrity, leading to the formation of hardened deposits (atherosclerotic plaque). It is a silent killer: an individual can take years to notice it, and by then the plaque can cause serious problems such as coronary artery disease (CAD) and stroke. To keep cholesterol levels in check, preventive measures must be taken in time, before such problems take root. This is where statistics and ML modeling come into play. In this article, I explain how multiple linear regression (MLR) can be used to predict total cholesterol levels using a sample heart disease dataset.

Machine learning has a large impact on health through effective analysis of chronic diseases, supporting accurate diagnosis and proper treatment. In healthcare, this kind of prediction plays a major role in assessing a patient's risk of disease. One of the most effective ways to reduce mortality from chronic diseases is to predict them early so that preventive measures can be taken.

What is multiple linear regression?

Multiple linear regression is a technique for understanding the association between several independent variables and one continuous dependent (or outcome) variable. For example, when cooking, the flavor of a dish depends on several factors, such as the type of ingredients used, their quantity, and the method of preparation. Here, those factors are the independent variables, and the flavor of the dish is the dependent variable to be predicted. Similarly, total cholesterol levels should be predicted by taking several factors into account rather than a single variable. For my dataset, I used the available independent variables and ran two iterations of the regression, which can be used to predict total cholesterol and to assess the significance level of the results.

$$Y = a + m_1X_1 + m_2X_2 + \dots + m_nX_n$$

In the multiple linear regression equation above, a is the intercept and the coefficients m1, m2, … act as weights indicating how much each independent variable X1, X2, … contributes to the predicted cholesterol level Y.
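As a minimal sketch of what fitting this equation means, the Python snippet below estimates a and the m coefficients by ordinary least squares with NumPy. The data are synthetic (generated from known weights a = 100, m1 = 0.5, m2 = 2.0), and treating X1 and X2 as age and BMI is a hypothetical choice; this is not the author's code or dataset.

```python
import numpy as np

def fit_mlr(X, y):
    """Estimate a and m1..mn in Y = a + m1*X1 + ... + mn*Xn by least squares."""
    design = np.column_stack([np.ones(len(X)), X])  # column of ones estimates the intercept a
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

# Synthetic, noise-free data built from known weights a = 100, m1 = 0.5, m2 = 2.0.
# X1 and X2 stand in for hypothetical predictors such as age and BMI.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(30, 70, size=50), rng.uniform(18, 40, size=50)])
y = 100 + 0.5 * X[:, 0] + 2.0 * X[:, 1]

a, m = fit_mlr(X, y)
print(f"a = {a:.3f}, m1 = {m[0]:.3f}, m2 = {m[1]:.3f}")
```

Because the synthetic outcome contains no noise, the fit recovers the generating weights; on real patient data the estimates would carry sampling error.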

The Prediction Model

I used two platforms to perform MLR, to take a broader approach to predictive modeling:

Using Excel
Using Python

Below is a brief introduction to the variables in the dataset used in this analysis:

Gender - Sex of the patient
Age - Age of the patient
Education - Education level of the patient (1 - Primary, 2 - Lower Secondary, 3 - Upper Secondary, 4 - Post-secondary non-tertiary education)
CurrentSmoker - Whether the patient currently smokes
Cigsperday - For patients who smoke, the average number of cigarettes smoked per day
Bpmeds - Whether the patient is on BP medication
Prevalentstroke - Whether the patient has previously had a stroke
Prevalenthyp - Whether the patient has hypertension
Diabetes - Whether the patient has diabetes
totChol - Total cholesterol level of the patient
sysBP - Systolic BP of the patient
diaBP - Diastolic BP of the patient
BMI - Body Mass Index of the patient
Heartrate - Number of times the patient's heart beats per minute
Glucose - Blood glucose (sugar) level of the patient
TenYearCHD - Whether the patient developed coronary heart disease (CHD) within ten years
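Preparing these variables for regression (removing incomplete rows and encoding categorical columns) can be sketched with pandas. This is an illustrative sketch, not the author's pipeline: the tiny DataFrame is made up, and it assumes Gender is stored as the strings "Male"/"Female" and Education as the codes 1-4 described above.

```python
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and encode categorical columns (column names/codings assumed)."""
    df = df.dropna().copy()  # data cleaning: remove rows with any missing value
    df["Gender"] = df["Gender"].map({"Female": 0, "Male": 1})  # binary category -> 0/1
    # Education is an ordinal code 1-4; one-hot encoding avoids treating the codes
    # as evenly spaced numbers. drop_first keeps one level as the reference category.
    df = pd.get_dummies(df, columns=["Education"], prefix="Edu", drop_first=True, dtype=int)
    return df

# Hypothetical rows for illustration only
raw = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [52, 46, None, 61],
    "Education": [1, 2, 3, 2],
    "BMI": [26.1, 24.3, 28.0, 30.2],
    "totChol": [240, 210, 250, 265],
})
clean = prepare(raw)
print(clean)
```

The row with a missing Age is dropped, so its Education level (3) never appears and only an Edu_2 indicator is created; on a full dataset each non-reference level would get its own column.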


The process involves the following steps:

Data cleaning: removing missing values, inconsistencies, etc.
Data preparation: encoding the categorical variables
Feature selection: retaining independent variables based on their significance levels
Model evaluation: measuring performance with metrics such as R-squared and Mean Squared Error (MSE)

When these metrics are computed on held-out data that the model was not trained on, they show how well the model predicts unseen values. A high R-squared indicates a good fit between predicted and actual cholesterol values, and a low MSE means the model's predictions are close to the actual values. Once a well-performing model is obtained, one can interpret the β coefficients, using their sign (positive or negative) and magnitude to understand how factors such as age, sex, or BMI influence cholesterol levels.
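A minimal sketch of the evaluation step, assuming NumPy and synthetic data (three hypothetical predictors, one of which has no real effect), holds out a quarter of the rows, fits the model on the rest, and computes R² and MSE only on the held-out rows. The helper names are illustrative, not from the article.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimates [a, m1, ..., mn]."""
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params

def predict(params, X):
    return params[0] + X @ params[1:]

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between actual and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

# Synthetic data: the third predictor has a true coefficient of zero
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 200 + X @ np.array([5.0, -3.0, 0.0]) + rng.normal(scale=10.0, size=n)

# Hold out 25% of rows so the metrics reflect unseen data
idx = rng.permutation(n)
test_idx, train_idx = idx[: n // 4], idx[n // 4 :]
params = fit_mlr(X[train_idx], y[train_idx])
y_hat = predict(params, X[test_idx])
print(f"test R^2 = {r_squared(y[test_idx], y_hat):.3f}, test MSE = {mse(y[test_idx], y_hat):.1f}")
```

On held-out data R² can even be negative, which signals a model that predicts worse than simply using the mean of the test values.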

Outputs

The Excel output summarizes the multiple linear regression model, giving the coefficient and significance level of each independent variable with respect to the dependent variable, total cholesterol (Y). The intercept of 115.7674 is the predicted total cholesterol when all independent variables equal zero; it serves as the model's baseline and has no direct clinical meaning, since no patient has an age or BMI of zero. R-squared (R²) is the proportion of variance in the dependent variable that the independent variables in the model explain. For this model R² is 0.096, so the predictors account for only 9.6% of the variance in total cholesterol, which indicates that on their own they are not a good fit for predicting it. Examining the individual coefficients (β), we found that age, diastolic BP, BMI, and heart rate had positive and statistically significant associations with total cholesterol (β > 0, p-value < 0.05): higher values of these variables are associated with higher total cholesterol.
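The quantities reported by Excel's Regression tool (coefficients, standard errors, t statistics, p-values, R²) can be reproduced from an ordinary least squares fit. The sketch below uses synthetic data and hypothetical predictor names, and also refits after dropping the least significant predictor until every remaining p-value is below 0.05 (backward elimination). Backward elimination is one common form of significance-based feature selection; it is an assumption that the two regression iterations described earlier followed this exact procedure. The p-values use a normal approximation to the t distribution, which is reasonable for large samples but will differ slightly from Excel's t-based values.

```python
import math
import numpy as np

def ols_summary(X, y):
    """OLS coefficients, standard errors, t statistics, approximate p-values and R^2."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # first column estimates the intercept
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ params
    sigma2 = residuals @ residuals / (n - k - 1)  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t = params / se
    # Two-sided p-values via a normal approximation to the t distribution (large n)
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    r2 = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return params, se, t, p, float(r2)

def backward_eliminate(X, y, names, alpha=0.05):
    """Refit, dropping the least significant predictor, until all p-values < alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        _, _, _, p, _ = ols_summary(X[:, cols], y)
        worst = int(np.argmax(p[1:]))  # index 0 is the intercept, so skip it
        if p[1:][worst] < alpha:
            break
        cols.pop(worst)
    return [names[c] for c in cols]

# Synthetic example: "noise_var" has no real effect on the outcome
rng = np.random.default_rng(42)
n = 500
names = ["age", "BMI", "noise_var"]
X = rng.normal(size=(n, 3))
y = 230 + 5.0 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(scale=10.0, size=n)

params, se, t, p, r2 = ols_summary(X, y)
print("R^2:", round(r2, 3))
for name, coef, pval in zip(["intercept"] + names, params, p):
    print(f"{name:>10}: beta = {coef:7.3f}, p = {pval:.4f}")
print("kept after elimination:", backward_eliminate(X, y, names))
```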

Both the Python and Excel predictions reflected the overall pattern of the data, though with varying precision. However, for some observations the differences between actual and predicted values are large, indicating that the models do not fit the data optimally.
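One way to locate the observations where predictions and actual values diverge most is to rank them by absolute residual (actual minus predicted). A minimal NumPy sketch, using made-up actual and predicted cholesterol values rather than the article's results:

```python
import numpy as np

def largest_residuals(y_true, y_pred, top=5):
    """Return (index, actual, predicted, residual) for the rows the model misses most."""
    residuals = y_true - y_pred
    order = np.argsort(-np.abs(residuals))[:top]  # largest absolute residuals first
    return [(int(i), float(y_true[i]), float(y_pred[i]), float(residuals[i])) for i in order]

# Hypothetical actual vs predicted total cholesterol values
actual = np.array([210.0, 245.0, 198.0, 320.0, 232.0])
predicted = np.array([225.0, 238.0, 230.0, 241.0, 236.0])
for row in largest_residuals(actual, predicted, top=2):
    print(row)
```

Inspecting these rows (for example, unusually high cholesterol values that the predictors cannot account for) is a practical starting point for deciding whether the model needs additional variables.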

Significance of the study

This study showed how ML modeling tools can be used to identify significant relationships and apply them to predicting cholesterol levels. ML models can also help distinguish individuals who are unlikely to develop very high cholesterol levels from those who are, and so inform screening schedules. Such strategies could also support personalized care plans for managing cholesterol. However, the quality of the data used to train the model deserves heavy emphasis: MLR cannot provide perfect predictions, but it can offer insights that support optimal heart health.
