Harnessing Data Insights: Revolutionizing Stroke Prediction for a Healthier Future.

Temilola Balogun

Operations & Strategy | Clinical Research | Drug Development

Published: October 5, 2023

Introduction

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths.

The tool I will use for this case study is MS Excel 2021.

The data analytics process will follow the PMAVD (Prepare, Model, Analyze, Visualize and Dashboard) process.

Preparation

In the preparation stage, I defined the objective, downloaded the data from the web, cleaned and transformed it, and loaded it into Excel for modeling and analysis.

2.1 Objectives

Using this dataset, I will create a dashboard that can help predict whether a patient is likely to have a stroke based on input parameters such as gender, age, existing diseases, and smoking status. Each row in the data provides relevant information about one patient.

2.2 Measures

The dataset for the project has the following columns:

1) id: unique identifier

2) gender: "Male", "Female" or "Other"

3) age: age of the patient

4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

5) heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease

6) ever_married: "No" or "Yes"

7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"

8) Residence_type: "Rural" or "Urban"

9) avg_glucose_level: average glucose level in blood

10) bmi: body mass index

11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*

12) stroke: 1 if the patient had a stroke or 0 if not

*Note: "Unknown" in smoking_status means that the information is unavailable for this patient

2.2.1 Dictionary

2.3 Get the Dataset:

This is the link to download the dataset from Kaggle:

Stroke Dataset from Kaggle

2.4 EDA (Exploratory Data Analysis):

EDA is a set of steps used to explore and understand the dataset before cleaning and transformation.

Rows: 5,110

Columns: 12

In the gender column, one record (ID 56156) has the value "Other". I will replace this value with the mode of the column.
Age ranges from 0.08 to 82 years. I will group the ages into ranges.
Heart disease: change 0 to “None” and 1 to “Heart Disease”.
Marital status: change No to “Single” and Yes to “Married”.
BMI ranges from 10.3 to 97.6 and contains N/A values. First check whether the missing values (N/A) make up more than 30% of the column. If they do, delete the column and reject the data. Otherwise, replace the N/A values with the column mean (average). The BMI values will also be grouped into ranges.

Stroke: 1 means “Has Stroke” and 0 means “No Stroke”.
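The missing-value rule and the mode replacement described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Power Query steps used in this article. The function names and the toy values are hypothetical, and rounding the fill value to one decimal place is an assumption.

```python
from collections import Counter
from statistics import mean


def missing_share(values, missing_token="N/A"):
    """Return the fraction of entries equal to the missing-value token."""
    return sum(1 for v in values if v == missing_token) / len(values)


def impute_with_mean(values, missing_token="N/A", max_missing=0.30):
    """Replace missing entries with the mean of the observed entries.

    Raises ValueError when more than max_missing of the column is missing,
    mirroring the rule of rejecting a column that is over 30% empty.
    """
    share = missing_share(values, missing_token)
    if share > max_missing:
        raise ValueError(f"{share:.0%} of values missing; drop the column")
    observed = [float(v) for v in values if v != missing_token]
    fill = round(mean(observed), 1)  # one decimal place is an assumption
    return [fill if v == missing_token else float(v) for v in values]


def replace_with_mode(values, rare_value):
    """Replace rare_value with the most common of the remaining values."""
    counts = Counter(v for v in values if v != rare_value)
    mode = counts.most_common(1)[0][0]
    return [mode if v == rare_value else v for v in values]


# Toy columns for illustration only, not the real dataset
bmi = ["36.6", "N/A", "32.5", "34.4", "24.0"]
gender = ["Male", "Female", "Female", "Other", "Female"]

print(impute_with_mean(bmi))
print(replace_with_mode(gender, "Other"))
```

Checking the missing share before imputing keeps the two decisions (reject the column or fill it) in one place, so the 30% threshold cannot be skipped by accident.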

2.5 Cleaning and Transformation using Power Query

i. Load the data into Power Query. Click a cell in the data, go to the Data tab, and click From Table/Range. This opens the dataset in Power Query.

ii. Check the number of rows. Go to the Transform tab and click Count Rows. The count will be 5,111. To remove the row with the error, click on the page filter and select Remove Errors. This returns the count to 5,110.

iii. Remove unwanted columns. Remove the following: Hypertension, Work_Type, Residence_Type, Average_Glucose_Level, Smoking_Status.

iv. Change data type of the ID to text.

v. Gender. There are 2,994 Female and 2,115 Male records, so the mode is Female. Replace Other with Female: right-click the column, select Replace Values, type Other as the value to find and Female as the replacement.

vi. Create the age range using: babies (0-2), children (3-12), teens (13-19), young adults (20-29), adult (30-45), mid age (46-60), elderly (61-120).

vii. Heart disease. Replace 1 with Heart Disease and 0 with None. Right-click the column, select Replace Values, and enter each replacement.

viii. Ever-married, replace No with Single, and Yes with Married.

ix. BMI, change the data type to decimal number. This will set the N/A to errors. Then use replace errors to change the errors to 28.9.

Then set the BMI range using a conditional column.

x. In stroke column, change 1 to Stroke, and 0 to No Stroke.
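The grouping and relabeling steps above can be expressed as plain Python for readers who want to check the logic outside Power Query. This is a minimal sketch, not the conditional columns actually used. The age groups follow step vi, but treating fractional ages such as 2.5 as belonging to the next group is an assumption. The BMI cut-offs are the commonly used adult categories (18.5, 25, 30) and are also an assumption, since the article does not state its thresholds. All function names are hypothetical.

```python
def age_group(age):
    """Map an age in years to the groups listed in step vi.

    Upper bounds are inclusive; a fractional age above a bound (e.g. 2.5)
    falls into the next group, which is an assumption of this sketch.
    """
    bounds = [(2, "Babies"), (12, "Children"), (19, "Teens"),
              (29, "Young Adults"), (45, "Adult"), (60, "Mid Age")]
    for upper, label in bounds:
        if age <= upper:
            return label
    return "Elderly"


def bmi_group(bmi):
    """Map a BMI value to a weight category (assumed cut-offs)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obesity"


# Value replacements from steps vii, viii and x
LABELS = {
    "heart_disease": {0: "None", 1: "Heart Disease"},
    "ever_married": {"No": "Single", "Yes": "Married"},
    "stroke": {0: "No Stroke", 1: "Stroke"},
}


def transform(row):
    """Return a copy of row with relabeled values and the two range columns."""
    out = dict(row)
    for column, mapping in LABELS.items():
        out[column] = mapping[row[column]]
    out["age_range"] = age_group(row["age"])
    out["bmi_range"] = bmi_group(row["bmi"])
    return out


# Toy record for illustration only
example = {"id": "1", "age": 67, "heart_disease": 1,
           "ever_married": "Yes", "bmi": 36.6, "stroke": 1}
print(transform(example))
```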

Model, Analyze, Visualize

Go to the Home tab and click Close & Load to load the data back into MS Excel.

Create a Pivot Table. Click on a data point, go to insert, click pivot table, click ok.

i. Stroke %

Row: Stroke

Values: Count of ID

ii. Based on gender

Row: Gender

Values: Count of ID

Findings: In the dataset, we have more females, 58.6% (2,995), than males 41.4% (2,115).

FINDINGS: When stroke cases are expressed as a percentage of each gender's own total, so that the unequal group sizes do not distort the comparison, males are more likely to have a stroke (5.11%) than females (4.71%).

iii. Based on Age range

In the dataset, elderly people are the largest group (1,304), followed by middle-aged people (1,188), adults (1,103), young adults (549), children (413), teens (378) and babies (175).

FINDINGS:

Elderly people are the most likely to have a stroke (13.57%), followed by middle-aged people (4.97%), adults (1.00%), babies (0.57%) and teens (0.26%).

Based on the dataset, I don't have any record of young adults and children affected by stroke.

iv. Marital Status

In the dataset, we have more married people (3,353) than single people (1,757)

Findings

From this dataset, married people (6.6%) are more likely to have a stroke than single people (1.7%).

v. Based on BMI Range

In the dataset, the largest group is obesity (1,920), followed by overweight (1,610), healthy weight (1,231) and underweight (349).

FINDINGS:

Overweight people are the most likely to have a stroke (7.14%), followed by people with obesity (5.10%) and people at a healthy weight (2.84%). Underweight people are the least likely to have a stroke (0.29%).

vi. Based on Heart Disease

From the dataset, we have more people with no heart disease (4,834), than people with heart disease (276).

FINDINGS:

People with heart disease are most at risk (17.03%) for developing stroke, compared to people with no heart disease (4.18%).
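Each percentage in the findings above is a within-group rate: the number of stroke cases in a group divided by the total number of records in that same group. The sketch below shows that calculation in Python on a handful of made-up rows. The function name and the sample rows are hypothetical, and the code is not the pivot table configuration used in Excel.

```python
from collections import defaultdict


def stroke_rate_by(rows, column):
    """Return the stroke percentage within each value of the given column."""
    totals = defaultdict(int)
    strokes = defaultdict(int)
    for row in rows:
        group = row[column]
        totals[group] += 1
        if row["stroke"] == "Stroke":
            strokes[group] += 1
    # Divide by the group's own total so unequal group sizes do not matter
    return {group: round(100 * strokes[group] / totals[group], 2)
            for group in totals}


# Made-up rows for illustration only, not the real dataset
rows = [
    {"gender": "Male", "stroke": "Stroke"},
    {"gender": "Male", "stroke": "No Stroke"},
    {"gender": "Female", "stroke": "Stroke"},
    {"gender": "Female", "stroke": "No Stroke"},
    {"gender": "Female", "stroke": "No Stroke"},
    {"gender": "Female", "stroke": "No Stroke"},
]

print(stroke_rate_by(rows, "gender"))
```

In this toy sample both genders have one stroke case, but the rates differ because the group sizes differ, which is why the findings compare percentages rather than raw counts.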

Dashboard

Distribution

You can find the link to my Excel worksheet and data here.

Stroke Prediction Data Analysis- Temilola Balogun

Israel Victor

1 yr

Thank you for sharing. Beautiful thought process!

Adekola Olagunju, COREN, MNSE

COREN, MNSE | Project Manager | Civil Engineer | Software Engineer | Cybersecurity

1 yr

Great article on data analysis in healthcare, Temilola! Your insights are spot-on and provide valuable knowledge for the healthcare industry. Keep up the fantastic work!

oluwapelumi Balogun

1 yr

Great work… this is really helpful !

Oluwatosin Omolade Solaru

Public Health Professional

1 yr

Welldone Temi … these analysis is very insightful… shows key risk factors of stroke … exposures and outcomes are equally visible… good job

Omolade Bolaji

Pharmacologist | Business Development | Sales Growth | Health Tech

1 yr

This is very insightful. Well done Temi!
