Harnessing Data Insights: Revolutionizing Stroke Prediction for a Healthier Future.

  1. Introduction

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths.

The tool I will use for this case study is MS Excel 2021.

The analysis follows the PMAVD (Prepare, Model, Analyze, Visualize, and Dashboard) process.

  2. Preparation

In preparation, I defined the objective, downloaded the data from the web, cleaned and transformed it, and loaded it into the software for modeling and analysis.

2.1 Objectives

Using this dataset, I will create a dashboard that can be used to predict whether a patient is likely to have a stroke, based on input parameters such as gender, age, existing diseases, and smoking status. Each row in the data provides the relevant information about one patient.

2.2 Measures

The dataset for the project has the following columns:

1) id: unique identifier

2) gender: "Male", "Female" or "Other"

3) age: age of the patient

4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

5) heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease

6) ever_married: "No" or "Yes"

7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"

8) Residence_type: "Rural" or "Urban"

9) avg_glucose_level: average glucose level in blood

10) bmi: body mass index

11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*

12) stroke: 1 if the patient had a stroke or 0 if not

*Note: "Unknown" in smoking_status means that the information is unavailable for this patient
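For readers who want to inspect the same columns in code rather than in Excel, the column list above can be mirrored as a typed record. This is only a sketch in Python (the case study itself uses MS Excel). The class name `PatientRecord`, the helper `parse_row`, and the sample values are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    id: str
    gender: str                # "Male", "Female" or "Other"
    age: float
    hypertension: int          # 0 or 1
    heart_disease: int         # 0 or 1
    ever_married: str          # "No" or "Yes"
    work_type: str
    residence_type: str        # "Rural" or "Urban"
    avg_glucose_level: float
    bmi: Optional[float]       # None when the source value is "N/A"
    smoking_status: str        # may be "Unknown"
    stroke: int                # 1 = had a stroke, 0 = did not


def parse_row(row: dict) -> PatientRecord:
    """Convert one row of text values (as read from a CSV) into a PatientRecord."""
    bmi_text = row["bmi"].strip()
    return PatientRecord(
        id=row["id"],
        gender=row["gender"],
        age=float(row["age"]),
        hypertension=int(row["hypertension"]),
        heart_disease=int(row["heart_disease"]),
        ever_married=row["ever_married"],
        work_type=row["work_type"],
        residence_type=row["Residence_type"],
        avg_glucose_level=float(row["avg_glucose_level"]),
        bmi=None if bmi_text == "N/A" else float(bmi_text),
        smoking_status=row["smoking_status"],
        stroke=int(row["stroke"]),
    )


# Illustrative values only; this is not a real row from the dataset.
sample = {
    "id": "1001", "gender": "Female", "age": "61", "hypertension": "0",
    "heart_disease": "0", "ever_married": "Yes", "work_type": "Private",
    "Residence_type": "Urban", "avg_glucose_level": "105.5",
    "bmi": "N/A", "smoking_status": "never smoked", "stroke": "0",
}
record = parse_row(sample)
print(record.age)  # 61.0
print(record.bmi)  # None
```

Keeping `bmi` optional makes the missing values explicit, so they can be counted before deciding how to fill them.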

2.2.1 Dictionary

[Figure: data dictionary]

2.3 Get the Dataset:

This is the link to download the dataset from Kaggle:

Stroke Dataset from Kaggle

2.4 EDA (Exploratory Data Analysis):

EDA is a set of steps used to explore and understand the dataset before cleaning and transformation.

Rows: 5,110

Columns: 12

  • In the gender column, the record with ID 56156 has the gender “Other”. I will replace this value with the mode of the column.
  • Age ranges from 0.08 to 82 years. I will group the ages into ranges.
  • Heart disease: change 0 to “None” and 1 to “Heart Disease”.
  • Marital status: change “No” to “Single” and “Yes” to “Married”.
  • BMI ranges from 10.3 to 97.6 and contains N/A values. First check whether the missing values (N/A) make up more than 30% of the values in the column. If they do, delete the column and reject the data; otherwise, replace the N/A values with the mean. The BMI values will also be grouped into ranges.

  • Stroke: 1 means “Has Stroke” and 0 means “No Stroke”.
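The missing-value rule in the BMI bullet above can be sketched as a small function. This is a Python illustration, not the Excel workflow used in this case study. The names `missing_share` and `impute_or_reject` are hypothetical, the 30% threshold comes from the text, and the five sample values are made up to show the arithmetic.

```python
from statistics import mean

MISSING_MARKER = "N/A"
MAX_MISSING_SHARE = 0.30  # threshold stated in the BMI bullet


def missing_share(values):
    """Fraction of entries equal to the missing-value marker."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == MISSING_MARKER) / len(values)


def impute_or_reject(values):
    """Return the column with N/A replaced by the mean of the known values,
    or None when more than 30% of the column is missing (column rejected)."""
    if missing_share(values) > MAX_MISSING_SHARE:
        return None
    known = [float(v) for v in values if v != MISSING_MARKER]
    fill = round(mean(known), 1)
    return [fill if v == MISSING_MARKER else float(v) for v in values]


# Made-up values for illustration; one of five is missing (20%).
bmi_column = ["30.0", "N/A", "20.0", "25.0", "27.0"]
print(missing_share(bmi_column))     # 0.2
print(impute_or_reject(bmi_column))  # [30.0, 25.5, 20.0, 25.0, 27.0]
```

Checking the missing share before filling keeps a heavily incomplete column from being turned into mostly invented values.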

2.5 Cleaning and Transformation using Power Query

i. Load the data into Power Query. Click a data point, go to the Data tab, and click From Table/Range. This opens the dataset in Power Query.

ii. Check the number of rows. Go to the Transform tab and click Count Rows. The count will be 5,111, one more than the 5,110 rows in the dataset. To remove the error, click on the page filter and select Remove Errors. This returns the count to 5,110.

iii. Remove unwanted columns. Remove the following: Hypertension, Work_Type, Residence_Type, Average_Glucose_Level, Smoking_Status.

iv. Change the data type of the ID column to text.

v. Gender: there are 2,994 Female and 2,115 Male records, so the mode is Female. Replace Other with Female: right-click the column, select Replace Values, enter Other as the value to find and Female as the replacement.

vi. Create the age ranges as follows: babies (0-2), children (3-12), teens (13-19), young adults (20-29), adult (30-45), mid age (46-60), elderly (61-120).

vii. Heart disease: replace 1 with Heart Disease and 0 with None. Right-click the column, select Replace Values, and replace each value in turn.

viii. Ever_married: replace No with Single and Yes with Married.

ix. BMI: change the data type to Decimal Number. This turns the N/A values into errors. Then use Replace Errors to change the errors to 28.9.

Then set the BMI range using a conditional column.


x. In the stroke column, change 1 to Stroke and 0 to No Stroke.
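Steps v to x can also be sketched in Python, for readers who want to reproduce the cleaning outside Power Query. This is a parallel illustration, not the method used in this case study, and all function names are hypothetical. Two details are assumptions: the BMI cut-offs (below 18.5, 18.5 to under 25, 25 to under 30, 30 and above) are commonly used adult categories, because the text names the BMI groups but not their thresholds; and fractional ages stay in the lower group until the next whole-year boundary.

```python
from collections import Counter

BMI_FILL = 28.9  # replacement value used in step ix


def replace_with_mode(values, rare_value):
    """Replace rare_value with the most common other value (step v)."""
    counts = Counter(v for v in values if v != rare_value)
    mode = counts.most_common(1)[0][0]
    return [mode if v == rare_value else v for v in values]


def age_group(age):
    """Map an age in years to the ranges listed in step vi.

    Assumption: a fractional age such as 2.5 stays in the lower group.
    """
    bounds = [(3, "babies"), (13, "children"), (20, "teens"),
              (30, "young adults"), (46, "adult"), (61, "mid age")]
    for upper, label in bounds:
        if age < upper:
            return label
    return "elderly"


def clean_bmi(text):
    """Turn a BMI cell into a number, using BMI_FILL for N/A (step ix)."""
    try:
        return float(text)
    except ValueError:
        return BMI_FILL


def bmi_group(bmi):
    """Assumed cut-offs; the thresholds are not given in the text."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obesity"


# Label maps for steps vii, viii and x
HEART_LABELS = {0: "None", 1: "Heart Disease"}
MARRIED_LABELS = {"No": "Single", "Yes": "Married"}
STROKE_LABELS = {0: "No Stroke", 1: "Stroke"}

# Gender counts from step v, plus the single "Other" record
genders = ["Female"] * 2994 + ["Male"] * 2115 + ["Other"]
cleaned = replace_with_mode(genders, "Other")
print(Counter(cleaned))               # Counter({'Female': 2995, 'Male': 2115})
print(age_group(0.08), age_group(82)) # babies elderly
print(bmi_group(clean_bmi("N/A")))    # Overweight
```

Under the assumed cut-offs, every record filled with 28.9 lands in the Overweight group, which is worth keeping in mind when reading the BMI counts later on.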

  3. Model, Analyze, Visualize

Go to the Home tab and click Close & Load to load the data back into MS Excel.


Create a pivot table: click a data point, go to the Insert tab, click PivotTable, and click OK.

i. Stroke %

Row: Stroke

Values: Count of ID

ii. Based on gender

Row: Gender

Values: Count of ID


Findings: The dataset contains more females, 58.6% (2,995), than males, 41.4% (2,115).

FINDINGS: When stroke cases are expressed as a percentage of each gender's total, so that the larger number of females does not skew the comparison, males are more likely to have a stroke (5.11%) than females (4.71%).
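Two different percentages appear in the gender findings above: each gender's share of the whole dataset, and the stroke rate within each gender. A minimal Python sketch of both calculations follows. The function names are hypothetical, and the six sample rows are invented purely to show the arithmetic; only the gender counts come from the findings.

```python
from collections import Counter


def share_of_total(counts):
    """Percentage of the overall total for each category."""
    total = sum(counts.values())
    return {k: round(100 * n / total, 1) for k, n in counts.items()}


def stroke_rate_by_group(rows, group_key):
    """Percentage of rows labelled 'Stroke' within each group."""
    totals = Counter(r[group_key] for r in rows)
    strokes = Counter(r[group_key] for r in rows if r["stroke"] == "Stroke")
    return {g: round(100 * strokes[g] / totals[g], 2) for g in totals}


# Gender counts reported in the findings
print(share_of_total({"Female": 2995, "Male": 2115}))
# {'Female': 58.6, 'Male': 41.4}

# Invented rows, only to show the within-group calculation
rows = [
    {"gender": "Female", "stroke": "No Stroke"},
    {"gender": "Female", "stroke": "No Stroke"},
    {"gender": "Female", "stroke": "Stroke"},
    {"gender": "Female", "stroke": "No Stroke"},
    {"gender": "Male", "stroke": "Stroke"},
    {"gender": "Male", "stroke": "No Stroke"},
]
print(stroke_rate_by_group(rows, "gender"))
# {'Female': 25.0, 'Male': 50.0}
```

Dividing by each group's own total is what makes groups of different sizes comparable, which is the same idea as showing pivot table values as a percentage of the row total.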

iii. Based on Age range

In the dataset, elderly people are the largest group (1,304), followed by middle-aged people (1,188), adults (1,103), young adults (549), children (413), teens (378), and babies (175).

FINDINGS:

Elderly people are the most likely to have a stroke (13.57%), followed by the middle-aged (4.97%), adults (1.00%), babies (0.57%), and teens (0.26%).

The dataset contains no records of young adults or children who had a stroke.

iv. Marital Status

The dataset contains more married people (3,353) than single people (1,757).

Findings

In this dataset, married people (6.6%) are more likely to have a stroke than single people (1.7%).

v. Based on BMI Range

In the dataset, obesity is the largest group (1,920), followed by overweight (1,610), healthy weight (1,231), and underweight (349).

FINDINGS:

Overweight people are the most likely to have a stroke (7.14%), followed by people with obesity (5.10%) and people at a healthy weight (2.84%). Underweight people are the least likely to have a stroke (0.29%).


vi. Based on Heart Disease

The dataset contains more people with no heart disease (4,834) than people with heart disease (276).

FINDINGS:

People with heart disease are at much higher risk of stroke (17.03%) than people with no heart disease (4.18%).
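One simple way to compare two groups is to divide one stroke rate by the other. The Python sketch below uses only the two percentages reported above; the helper name `rate_ratio` is hypothetical and is not part of the Excel workbook.

```python
def rate_ratio(rate_group, rate_reference):
    """How many times higher one group's stroke rate is than another's."""
    if rate_reference == 0:
        raise ValueError("reference rate must be non-zero")
    return rate_group / rate_reference


# Percentages taken from the heart disease findings
heart_ratio = rate_ratio(17.03, 4.18)
print(round(heart_ratio, 1))  # 4.1
```

On these figures, the stroke rate in the heart disease group is roughly four times the rate in the group without heart disease.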


  4. Dashboard

[Figure: dashboard]


  5. Distribution

You can find the link to my Excel worksheet and data here:

Stroke Prediction Data Analysis- Temilola Balogun









