Machine Learning Models in R for Identifying Heart Disease
Hello, Connections,
This is my first blog post. I'm going to talk about building some predictive models in R using the heart disease data I found on the UCI website.
The following data gives a global-level perspective.
Data Dictionary
I got the data for my project from a big online library called the UCI Machine Learning Repository.
Steps I Followed to Clean the Data
Step-1:
I read the file in RStudio (I used the online version). I had received the data as an Excel file, so I used the readxl library to read it:
library(readxl)  # provides read_excel()
dfe=read_excel("VIT Internal Assignment Data.xlsx",sheet="#1")
Step-2:
The base data on the website is clean, but because this was an assignment we were given a copy with some missing values. The next step was therefore to impute the missing values and recode the categorical values as numbers.
Code to impute missing values in specific numeric columns. Each missing value was replaced with the mean of its column; the three columns shown here are BP, Cholesterol and Max HR.
dfe$BP[is.na(dfe$BP)] = mean(dfe$BP,na.rm=TRUE)
dfe$Cholesterol[is.na(dfe$Cholesterol)] = mean(dfe$Cholesterol,na.rm=TRUE)
dfe$`Max HR`[is.na(dfe$`Max HR`)] = mean(dfe$`Max HR`,na.rm=TRUE)
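The same mean imputation can be written once and applied to several columns. This is only a minimal sketch on an invented data frame: `toy` and the helper `impute_mean` are hypothetical names, not part of the assignment data.

```r
# Toy data standing in for the assignment file (all values are invented).
toy <- data.frame(
  BP = c(120, NA, 140, 130),
  Cholesterol = c(200, 250, NA, 300)
)
toy$`Max HR` <- c(150, 160, 170, NA)

# Hypothetical helper: replace NA entries with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper to every column that needs imputing.
cols <- c("BP", "Cholesterol", "Max HR")
toy[cols] <- lapply(toy[cols], impute_mean)

print(toy)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is a simple choice rather than the only reasonable one.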
Code to standardize a categorical column. Here Gender is recoded as 1 for "Male" and 0 for any other value:
dfe$Gender=ifelse(dfe$Gender=="Male",1,0)
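One thing to watch with this kind of recoding is that `ifelse()` sends every value other than the exact string "Male" to 0, including misspellings, while missing values stay `NA`. The sketch below shows this on invented values; `gender_raw` is a hypothetical vector, not the assignment column.

```r
# Invented labels, including a lower-case variant and a missing value.
gender_raw <- c("Male", "Female", "male", NA)

# Inspect the raw labels first so unexpected spellings are visible.
print(table(gender_raw, useNA = "ifany"))

# Same recoding rule as in the post: 1 for "Male", 0 otherwise.
gender_num <- ifelse(gender_raw == "Male", 1, 0)
print(gender_num)
```

Checking the raw labels with `table()` before recoding makes it easier to spot values that would silently fall into the 0 group.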
Graphs Created
1. Start by preparing a dataset so that it is in the right format.
2. Install and load the ggplot2 package, then create a plot object using the function ggplot().
3. Define so-called “aesthetic mappings”, i.e. determine which variables should be displayed on the X and Y axes and which variables are used to group the data. The function we use for this is called aes().
4. Add one or more “layers” to the plot. These layers define how something should be displayed, e.g. as a line or histogram. These functions begin with the prefix geom_, e.g. geom_line().
5. Add further specifications, such as the color scheme that should be used, and possibly faceting by levels of a grouping variable.
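The five steps above can be sketched in one short script. This example uses R's built-in `mtcars` data rather than the assignment file, so the variable names (`wt`, `mpg`, `cyl`) and the "Dark2" palette are assumptions chosen only for illustration.

```r
library(ggplot2)  # run install.packages("ggplot2") first if it is not installed

# 1. Prepare the data: turn the grouping variable into a factor.
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# 2 and 3. Create the plot object and map variables to aesthetics with aes().
p <- ggplot(data = cars, aes(x = wt, y = mpg, colour = cyl)) +
  # 4. Add a layer that draws each row as a point.
  geom_point() +
  # 5. Further specifications: a colour scheme and one facet per group.
  scale_colour_brewer(palette = "Dark2") +
  facet_wrap(~ cyl)

print(p)
```

Storing the plot in `p` before printing makes it easy to add or swap layers later without repeating the data and aesthetic setup.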
Graph No:- 1
ggplot(data =dfe,aes(BP,Cholesterol))+
geom_point()
Graph No:- 2
ggplot(data =dfe,aes(BP))+
  geom_histogram(bins = 10)
Graph No:- 3
ggplot(data =dfe,aes(BP,Cholesterol))+
  geom_boxplot()+
  geom_jitter(alpha=0.4,color="tomato")
Graph No:- 4
ggplot(data =dfe,aes(BP))+
  geom_bar()
Graph No:- 5
ggplot(data =dfe,aes(1:nrow(dfe),BP))+
  geom_line()
Model Building: -
Linear Regression and Logistic Regression
Model No: - 1
The formula below uses BP as the response, so this linear regression predicts BP from all the other columns in the given data.
regressor=lm(data = dfe,BP~.)
summary(regressor)
ypred=predict(regressor,newdata=dfe)
From the summary, the Multiple R-squared value is 0.1825, so the model explains only about 18% of the variance in BP. That is a poor fit, so this is not a good model.
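For context, Multiple R-squared is the share of the variance in the response that the fitted model explains, computed as 1 - SS_res / SS_tot. The sketch below checks this by hand against `summary()`. It uses the built-in `mtcars` data and an arbitrary formula (`mpg ~ wt + hp`), both of which are assumptions for illustration and unrelated to the assignment data.

```r
# Fit a small linear model on built-in data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual sum of squares: variation the model leaves unexplained.
ss_res <- sum(residuals(fit)^2)

# Total sum of squares: variation of the response around its mean.
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared by hand, next to the value reported by summary().
r2_manual <- 1 - ss_res / ss_tot
print(r2_manual)
print(summary(fit)$r.squared)
```

A value near 0 means the predictors explain little of the variation in the response, and a value near 1 means they explain most of it.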
Model No: - 2
This model also predicts BP from all the other columns in the given data.
regressor=lm(data = dfe,BP~.)
summary(regressor)
ypred=predict(regressor,newdata=dfe)
ypred2=predict(regressor,newdata=dfe)
This is the same formula as Model 1, so the Multiple R-squared is again 0.1825 (about 18% of the variance in BP explained), and this is not a good model either.
Model No: - 3
Predict Heart Disease using given data.
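# glm() with family = binomial expects a factor or a 0/1 response column
classifier=glm(data=dfe,`Heart Disease`~.,family = binomial)
summary(classifier)
# type="response" returns predicted probabilities instead of log-odds
ypred3=predict(classifier,newdata=dfe,type="response")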
From the summary, I can interpret that Number of Vessels Fluro, Thallium, BP and Chest Pain Type have a significant impact on the model.
Below is the confusion matrix.
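A confusion matrix of this kind can be built in base R by turning predicted probabilities into classes and cross-tabulating them against the actual labels. This is a minimal sketch on the built-in `mtcars` data, with `am ~ wt` as a stand-in model and 0.5 as an assumed cut-off. None of these come from the assignment.

```r
# Stand-in logistic regression on built-in data (not the heart disease data).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities of the positive class.
prob <- predict(fit, newdata = mtcars, type = "response")

# Assumed cut-off of 0.5; fixed levels keep the table 2 x 2 in all cases.
pred   <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))
actual <- factor(mtcars$am, levels = c(0, 1))

# Rows are predicted classes, columns are actual classes.
cm <- table(Predicted = pred, Actual = actual)
print(cm)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

The diagonal cells are the correct predictions, and the off-diagonal cells are the false positives and false negatives.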
Below is the ROC curve. The AUC value is 1, computed on the same data that was used to fit the model.
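The AUC can be read as the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case, so a value of 1 means every positive case is ranked above every negative one. The sketch below computes it from ranks in base R. The helper `auc_rank` is a hypothetical name, and the `mtcars` model is a stand-in rather than the heart disease classifier.

```r
# Hypothetical helper: AUC from the Mann-Whitney rank-sum statistic.
# `score` is a numeric vector of predictions, `label` is coded 0/1.
auc_rank <- function(score, label) {
  r <- rank(score)                 # ties receive average ranks
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Stand-in logistic regression on built-in data.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(fit, newdata = mtcars, type = "response")

auc <- auc_rank(prob, mtcars$am)
print(auc)
```

Scoring the same rows that were used for fitting tends to give an optimistic AUC, so a held-out test set usually gives a more realistic number.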
That’s all folks.