Data Analytics Capstone Project: Part 1, Studying Heart Disease Symptoms to derive insights from a public medical data.
Above Photo Credit: Ronit, UCI Data on Kaggle, Satyaki Ray analysed the public medical dataset (see content below)

Data Analytics Capstone Project: Part 1, Studying Heart Disease Symptoms to derive insights from a public medical data.

This article is part of a Capstone project by Satyaki Ray, that was conducted while he enrolled in the Texas A&M, Kingsville, Data Analytics Program recently. This is part 1 where the author investigates a heart disease symptoms public dataset to understand the factors that impact the chances of the disease.

Introduction The Capstone Project in TAMK Data Analytics course “Working in organizations and industry” required the download from Kaggle site and study of several datasets. Before starting with the hands-on exercise, we downloaded the data sets that are used for analysis and creating visualizations in Excel and Power BI from the given link: https://www.kaggle.com/datasets

Public Datasets Used:

The following two datasets were downloaded as csv/ xls files from the Kaggle portal.

1. https://www.kaggle.com/ronitf/heart-disease-uci

2. https://www.kaggle.com/gustavomartinez92/student-performance

Data Analytics Steps

Example 1: Heart Disease symptoms Imported csv data into excel and performed data analysis.

Step-1: Launched Excel and opened the heart.csv file. This is a data set for heart disease related symptoms or attributes. Once it opens, checked columns and data. The dataset has 303 records (rows) and 14 attributes (columns) per row. Created headers with meaningful names (for example trestbps is “resting blood pressure mm Hg”, Gender 1 is male, Gender is female, fbs is “fasting blood sugar” for which 1 is true and 2 is false and so on. Then created histograms on all columns after selecting all data per column and Insert > charts > All Charts > Histogram. We then resized the histogram and made chart label ( Figure 1). We would like to see if there are any relationship of gender to the symptoms such as blood pressure and cholesterol.

No alt text provided for this image

Figure 1: Creating histograms per column

Step-2: Create histograms for all 14 attributes (columns) by selecting the data (drag-select). Visualize the data at top of each column to create insights on heart disease symptom variability by Age, Gender, Chest pain, Blood Pressure level, cholesterol level, fasting blood sugar, resting ECG, max heart rate, exercise induced angina, old peak ST depression by exercise, slope of ST depression, Major vessel color, defect and target.

No alt text provided for this image

Figure 2: Get Data into Excel, compute histograms on all columns to see the variability of attributes or heart disease symptoms.

Step-3: Analyze the data closely. The histograms below show age variability of studied 303 data points range between 33 to 72 years. Two genders (male and female) have chest pain variabilities of 3 types. Blood pressure range between 94 – 167 with a peak of 112 in 70 cases. Serum cholesterol has peak value of 200 in 75 cases. Very high cholesterol cases are low in number 2-5 cases.

No alt text provided for this image

Figure 3: Data analysis by age, gender, chest pain, blood pressure, cholesterol.

Step-4: Analyze the data closely. The histograms below show heart disease symptom variability of 303 data points. The blood sugar true cases are high (250) and false cases are low. The maximum heart rates range between 143-150. There are 200 exercise induced angina cases.

No alt text provided for this image

Figure 4: Data analysis of blood sugar, ECG, max heart rate and angina cases

Step-5: Analysis of data by ST depression due to exercise, slope of ST depression, fluoroscopy major vessel color change (0-30), Normal and abnormal defects and target. There are nearly 140 cases of ST depression by exercise, 3 types of fluoroscopy, dominated by type 0, and 160 cases of dominant defects.

No alt text provided for this image

Figure 5: Data analysis of ST depression, slope of ST, major vessel color (fluoroscopy), defect and target.

Step-6: Now we examine the gender counts and percentages of the candidates from the 303 cases. We use the =COUNTIF(B14:B316,"0") and =COUNTIF(B14:B316,"1"), to identify number of Females (96 cases) and Males (207 cases). We use the =SUM(Q14:Q15) to verify that the total cases add to 303.Next we calculate the percentages by dividing Female cases / Total cases and Male cases/ Total cases. There are 31.68% females and 68.32% males in this study. We use these calculated tables to generate a bar graph and a pie chart for visualization of gender count and percentages (See Figure 6)

No alt text provided for this image

Figure 6: Bar Graph and Pie chart of gender count and percentages.

Step-7: In this step we examined the average “at rest blood pressure” by gender (Female average and Male average. To do that we used the =AVERAGE(D14:D316), =AVERAGEIFS(D14:D316,B14:B316,"0") and =AVERAGEIFS(D14:D316,B14:B316,"1") to extract the 3 averages. The first is the Aveg trestbps (overall average blood pressure, 131.62), the second is Female average blood pressure (133.08) and the third is male average blood pressure 130.95. It appears that females suffer from higher blood pressure in this study ( Figure 7). Blood pressure analysis based on gender.

No alt text provided for this image

Figure 7: At rest blood pressure analysis by gender (male, female).

Step-8: In this step we study the average cholesterol variability by gender. We use AVERAGE and AVERAGEIFS functions to achieve the results. It appears that female cases show higher average (261.30) versus male cases (239.29). The overall average cholesterol in this dataset is 246.25.?

No alt text provided for this image

Figure 8: Average cholesterol by gender

No alt text provided for this image

This completes the part 1 of my Capstone Project. It shows a simple analysis. Next article will be on part 2....please stay tuned. Any suggestions are welcome.

Author: Satyaki Ray, Houston, Texas, 2021, raysatyaki305@gmail.com

要查看或添加评论,请登录

Satyaki Ray的更多文章

社区洞察

其他会员也浏览了