The Machine Learning Model in R is used for identifying Heart Disease.

Hello, Connections,

This is my first blog post. I'm going to talk about building some data models in R using the heart disease data I found on the UCI website.

The following data is about the global level perspective.

Data Dictionary

I got the data for my project from a big online library called the UCI Machine Learning Repository.

Statlog (Heart) - UCI Machine Learning Repository

Steps I Followed to Clean the Data

Step-1:

I read the file in RStudio (I used the online version). I had received the data as an Excel file, so I used the readxl library to read it.

library(readxl)

dfe=read_excel("VIT Internal Assignment Data.xlsx",sheet="#1")

Step-2:

The base data on the website is clean, but as it was an assignment we were provided data with some missing values. Hence the next step was to impute missing values and standardize the categorical values.
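Before imputing anything, it helps to see how many values are actually missing in each column. Here is a minimal sketch of that check; the toy data frame and its values are made up for illustration and are not the assignment data.

```r
# Toy data frame standing in for the assignment data (values are made up).
toy <- data.frame(
  BP          = c(130, NA, 120, 140),
  Cholesterol = c(250, 210, NA, NA),
  Gender      = c("Male", "Female", "Male", "Female")
)

# is.na() marks each missing cell; colSums() counts them per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

On the real data, the same call would be colSums(is.na(dfe)).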

Code to impute missing values for three specific columns (BP, Cholesterol and Max HR). Each missing value was replaced with the mean of its column.

dfe$BP[is.na(dfe$BP)] = mean(dfe$BP,na.rm=T)

dfe$Cholesterol[is.na(dfe$Cholesterol)] = mean(dfe$Cholesterol,na.rm=T)

dfe$`Max HR`[is.na(dfe$`Max HR`)] = mean(dfe$`Max HR`,na.rm=T)
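The three imputation lines above all follow the same pattern, so they could be wrapped in a small helper. This is only a sketch: impute_mean is a hypothetical function name of my own, and the toy data frame is made up.

```r
# Hypothetical helper: replace NA in the named numeric columns
# with the mean of the non-missing values in that column.
impute_mean <- function(df, cols) {
  for (col in cols) {
    col_mean <- mean(df[[col]], na.rm = TRUE)
    df[[col]][is.na(df[[col]])] <- col_mean
  }
  df
}

# Toy data (made up); check.names = FALSE keeps the space in "Max HR".
toy <- data.frame(
  BP       = c(130, NA, 120),
  `Max HR` = c(150, 160, NA),
  check.names = FALSE
)

toy_imputed <- impute_mean(toy, c("BP", "Max HR"))
print(toy_imputed)
```

With the real data the call would look like impute_mean(dfe, c("BP", "Cholesterol", "Max HR")).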

Code to standardize the Gender column. The values were recoded as 1 for "Male" and 0 for everything else using ifelse().

dfe$Gender=ifelse(dfe$Gender=="Male",1,0)
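One thing to watch with this recode is that any value that is not exactly "Male" (for example "male" or " MALE") becomes 0. Below is a small sketch of normalizing the text first; the raw values are made up, and I am assuming that inconsistent spelling is the kind of problem being standardized.

```r
# Made-up raw values with inconsistent case, stray spaces and a missing entry.
gender_raw <- c("Male", "female", " MALE", "Female", NA)

# Trim whitespace and lower-case before comparing.
gender_clean <- tolower(trimws(gender_raw))

# 1 for male, 0 otherwise; a missing input stays NA.
gender_code <- ifelse(gender_clean == "male", 1, 0)
print(gender_code)
```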

Graphs Created

1. Start by preparing a dataset so that it is in the right format.

2. Install and load the ggplot2 package, then create a plot object using the function ggplot().

3. Define so-called “aesthetic mappings”, i.e. determine which variables should be displayed on the X and Y axes and which variables are used to group the data. The function we use for this is called aes().

4. Add one or more “layers” to the plot. These layers define how something should be displayed, e.g. as a line or histogram. These functions begin with the prefix geom_, e.g. geom_line().

5. Add further specifications, such as the color scheme that should be used, and possibly faceting by levels of a grouping variable.
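The five steps above can be sketched in one short example. The toy data frame is made up, and the color palette and faceting variable are just choices I picked for illustration.

```r
library(ggplot2)

# Step 1: a small data frame in the right format (values are made up).
toy <- data.frame(
  BP          = c(120, 130, 140, 150, 125, 135),
  Cholesterol = c(200, 240, 260, 300, 210, 250),
  Gender      = factor(c("Male", "Female", "Male", "Female", "Male", "Female"))
)

# Steps 2 and 3: create the plot object and define the aesthetic mappings.
p <- ggplot(data = toy, aes(x = BP, y = Cholesterol, color = Gender)) +
  # Step 4: add a layer that draws the data as points.
  geom_point() +
  # Step 5: pick a color scheme and facet by the grouping variable.
  scale_color_brewer(palette = "Set1") +
  facet_wrap(~ Gender)

# Printing the object draws the plot.
print(p)
```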

Graph No:- 1

ggplot(data =dfe,aes(BP,Cholesterol))+

geom_point()

Fig.1. Cholesterol VS BP

Graph No:- 2

ggplot(data =dfe,aes(BP))+

  geom_histogram(bins = 10)

Fig.2. BP

Graph No:- 3

ggplot(data =dfe,aes(BP,Cholesterol))+

  geom_boxplot()+

  geom_jitter(alpha=0.4,color="tomato")


Fig.3. Cholesterol VS BP
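A boxplot is usually easier to read when the X axis is a category rather than a continuous number like BP. As a rough sketch, BP can be binned into ranges first with cut(). The break points below are arbitrary values I picked for illustration, and the data is randomly generated, not the assignment data.

```r
library(ggplot2)

# Randomly generated toy data (not the real dataset).
set.seed(1)
toy <- data.frame(
  BP          = round(runif(60, 100, 180)),
  Cholesterol = round(runif(60, 150, 350))
)

# Bin BP into three ranges so the boxplot has a discrete X variable.
# The break points are assumptions chosen only for this example.
toy$BP_group <- cut(toy$BP, breaks = c(100, 130, 160, 180), include.lowest = TRUE)

# One box per BP range, with the raw points jittered on top.
p <- ggplot(data = toy, aes(BP_group, Cholesterol)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.4, color = "tomato", width = 0.2)

print(p)
```

Hiding the boxplot outliers with outlier.shape = NA avoids drawing those points twice, since geom_jitter() already shows every observation.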

Graph No:- 4

ggplot(data =dfe,aes(BP))+

  geom_bar()

Fig.4. BP

Graph No:- 5

ggplot(data =dfe,aes(1:nrow(dfe),BP))+

  geom_line()

Fig.5. BP

Model Building: -

Linear Regression and Logistic Regression

Model No: - 1

Predict BP from all the other columns using linear regression.

regressor=lm(data = dfe,BP~.)

summary(regressor)

ypred=predict(regressor,newdata=dfe)

The Multiple R-squared value is 0.1825, which means the model explains only about 18% of the variation in BP. That is far from 1, so this is not a good model.
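To make the R-squared number less of a black box, here is a sketch that computes it by hand as 1 minus the ratio of residual to total sum of squares. It uses R's built-in mtcars data purely as a stand-in, since the assignment file is not public; the model formula is just an example.

```r
# Stand-in example on the built-in mtcars data (not the heart data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual sum of squares: what the model fails to explain.
ss_res <- sum(residuals(fit)^2)

# Total sum of squares: variation of the outcome around its mean.
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared is the share of variation explained by the model.
r_squared <- 1 - ss_res / ss_tot

# This matches the "Multiple R-squared" reported by summary().
print(r_squared)
print(summary(fit)$r.squared)
```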

Fig.6. BP Prediction Model Summary

Model No: - 2

Again, predict BP using the given data.

regressor=lm(data = dfe,BP~.)

summary(regressor)

ypred=predict(regressor,newdata=dfe)

ypred2=predict(regressor,newdata=dfe)

The Multiple R-squared value is again 0.1825, which is far from 1, so this is not a good model either.

Fig.7. BP Prediction Model Summary

Model No: - 3

Predict Heart Disease using the given data.

classifier=glm(data=dfe,`Heart Disease`~.,family = binomial)

summary(classifier)

ypred3=predict(classifier,newdata=dfe)
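One detail worth knowing: for a glm, predict() returns values on the log-odds scale by default, and type = "response" is needed to get probabilities. Here is a small sketch on the built-in mtcars data as a stand-in for the heart data. The 0.5 cutoff is an assumption of mine, not something fixed by the model.

```r
# Stand-in logistic regression on built-in data (not the heart data).
clf <- glm(am ~ wt, data = mtcars, family = binomial)

# Default output is on the link (log-odds) scale.
log_odds <- predict(clf, newdata = mtcars)

# type = "response" returns probabilities between 0 and 1.
prob <- predict(clf, newdata = mtcars, type = "response")

# Turn probabilities into class labels with an assumed 0.5 cutoff.
pred_class <- ifelse(prob > 0.5, 1, 0)

head(data.frame(log_odds, prob, pred_class))
```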

Fig.8. Heart Disease Prediction Model Summary

From the summary, I can interpret that Number of Vessels Fluro, Thallium, BP and Chest Pain Type have a significant impact on the model.

Below is the Confusion Matrix,
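For readers who want to build a confusion matrix themselves, here is a minimal sketch with base R's table(). It uses the built-in mtcars data as a stand-in and an assumed 0.5 cutoff, so the numbers are not the ones from my heart disease model.

```r
# Stand-in logistic regression on built-in data (not the heart data).
clf <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(clf, newdata = mtcars, type = "response")

# Fix the factor levels so the table is always 2 x 2.
predicted <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
actual <- factor(mtcars$am, levels = c(0, 1))

# Rows are the true classes, columns are the predicted classes.
conf_mat <- table(Actual = actual, Predicted = predicted)
print(conf_mat)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```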


Below is the ROC curve; the AUC value is 1.
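As a rough illustration of what the AUC measures, it can be computed in base R from the ranks of the predicted scores: it is the share of (positive, negative) pairs in which the positive case gets the higher score. The function name auc_rank is a hypothetical helper of my own and the scores are made up.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) formula.
# actual is a 0/1 vector, score is the predicted score or probability.
auc_rank <- function(actual, score) {
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  r <- rank(score)  # tied scores receive average ranks
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

actual <- c(0, 0, 1, 1)

# Every positive scores above every negative: AUC is 1.
auc_perfect <- auc_rank(actual, c(0.1, 0.2, 0.8, 0.9))

# One negative outscores one positive: 3 of 4 pairs are ordered correctly.
auc_partial <- auc_rank(actual, c(0.1, 0.8, 0.2, 0.9))

print(c(auc_perfect, auc_partial))
```

An AUC of exactly 1 means the model ranked every positive case above every negative one. When that happens on the same data the model was trained on, it is worth double-checking on held-out data.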

That’s all folks.


