Diabetes Prediction

Machine learning (ML) models can be used to predict the likelihood that a person will develop diabetes. These models use information about a person's medical history, lifestyle choices, blood sugar levels, and genetic makeup.

We will use a Support Vector Machine (SVM), a supervised learning algorithm that can be used for both classification and regression tasks.

In this case, it will classify whether a person has diabetes or not. Conceptually, an SVM treats each record as a point in feature space and tries to find a hyperplane that separates the two classes. The medical information used includes BMI, blood glucose level, insulin level, and so on.
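To make the hyperplane idea concrete, here is a minimal sketch on a made-up two-feature dataset. The six points and the names `X_toy`, `y_toy`, and `toy_clf` are illustrative assumptions and do not come from the diabetes data. With a linear kernel, scikit-learn exposes the fitted hyperplane through `coef_` and `intercept_`, and the sign of `w . x + b` decides the predicted class.

```python
import numpy as np
from sklearn import svm

# Two small, linearly separable clusters (made-up numbers, not from the dataset)
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_clf = svm.SVC(kernel='linear')
toy_clf.fit(X_toy, y_toy)

# For a linear kernel the separating hyperplane is w . x + b = 0
w = toy_clf.coef_[0]
b = toy_clf.intercept_[0]

# Points with a positive score fall on the class-1 side of the hyperplane
scores = X_toy @ w + b
toy_pred = (scores > 0).astype(int)

print('w =', w, 'b =', b)
print('predicted classes:', toy_pred)
```

The same mechanism applies to the eight diabetes features; the hyperplane just lives in eight dimensions instead of two.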

WORKFLOW

  1. Diabetes Data
  2. Data Preprocessing
  3. Train Test Split
  4. SVM Classifier
  5. Trained SVM Classifier
  6. Whether a Person is Diabetic or Not
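The workflow above can be sketched end to end in a few lines. This is a rough outline under stated assumptions, not the notebook's actual code: it uses a synthetic stand-in from `make_classification` instead of the real CSV, and the names `X_syn`, `y_syn`, and `workflow` are hypothetical. It also chains the scaler and the classifier with `make_pipeline`, so the scaler is fitted on the training rows only, which is a small departure from the step order listed above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: data (synthetic stand-in with 768 rows and 8 features)
X_syn, y_syn = make_classification(n_samples=768, n_features=8, random_state=0)

# Step 3: train/test split, keeping the class ratio equal in both parts
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=2)

# Steps 2 and 4: preprocessing and the SVM classifier chained together
workflow = make_pipeline(StandardScaler(), SVC(kernel='linear'))

# Step 5: train the classifier
workflow.fit(Xs_train, ys_train)

# Step 6: predict the class of unseen rows and measure accuracy
syn_accuracy = accuracy_score(ys_test, workflow.predict(Xs_test))
print(Xs_train.shape, Xs_test.shape, syn_accuracy)
```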


GitHub link: https://github.com/dvvansia1805/Diabetes-Prediction-Machine-Learning-Model

Diabetes data: https://www.kaggle.com/datasets/akshaydattatraykhare/diabetes-dataset

Google Colab: https://colab.research.google.com/drive/19ciTTvzk6HdeKAOCwptwKxbkn7pHXQfU#scrollTo=i-epX2eqsA6v&uniqifier=1


import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Data Collection and Analysis
PIMA Diabetes Dataset

# load the diabetes data into a pandas DataFrame
diabetes_dataset = pd.read_csv('/content/diabetes.csv')

# print the first 5 rows of the dataset
diabetes_dataset.head()

	Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age	Outcome
0	6	148	72	35	0	33.6	0.627	50	1
1	1	85	66	29	0	26.6	0.351	31	0
2	8	183	64	0	0	23.3	0.672	32	1
3	1	89	66	23	94	28.1	0.167	21	0
4	0	137	40	35	168	43.1	2.288	33	1

# number of rows and columns in the dataset
diabetes_dataset.shape

# get the statistical measures of the data
diabetes_dataset.describe()


	Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age	Outcome
count	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000
mean	3.845052	120.894531	69.105469	20.536458	79.799479	31.992578	0.471876	33.240885	0.348958
std	3.369578	31.972618	19.355807	15.952218	115.244002	7.884160	0.331329	11.760232	0.476951
min	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.078000	21.000000	0.000000
25%	1.000000	99.000000	62.000000	0.000000	0.000000	27.300000	0.243750	24.000000	0.000000
50%	3.000000	117.000000	72.000000	23.000000	30.500000	32.000000	0.372500	29.000000	0.000000
75%	6.000000	140.250000	80.000000	32.000000	127.250000	36.600000	0.626250	41.000000	1.000000
max	17.000000	199.000000	122.000000	99.000000	846.000000	67.100000	2.420000	81.000000	1.000000

diabetes_dataset['Outcome'].value_counts()

Outcome
0    500
1    268
Name: count, dtype: int64

0 --> Non-diabetic
1 --> Diabetic

diabetes_dataset.groupby('Outcome').mean()

Outcome	Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age
0	3.298000	109.980000	68.184000	19.664000	68.792000	30.304200	0.429734	31.190000
1	4.865672	141.257463	70.824627	22.164179	100.335821	35.142537	0.550500	37.067164

# separate the features (X) from the labels (Y)
X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']

print(X)

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0              6      148             72             35        0  33.6   
1              1       85             66             29        0  26.6   
2              8      183             64              0        0  23.3   
3              1       89             66             23       94  28.1   
4              0      137             40             35      168  43.1   
..           ...      ...            ...            ...      ...   ...   
763           10      101             76             48      180  32.9   
764            2      122             70             27        0  36.8   
765            5      121             72             23      112  26.2   
766            1      126             60              0        0  30.1   
767            1       93             70             31        0  30.4   

     DiabetesPedigreeFunction  Age  
0                       0.627   50  
1                       0.351   31  
2                       0.672   32  
3                       0.167   21  
4                       2.288   33  
..                        ...  ...  
763                     0.171   63  
764                     0.340   27  
765                     0.245   30  
766                     0.349   47  
767                     0.315   23  

[768 rows x 8 columns]

print(Y)

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

Data Preprocessing

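# fit the scaler on the features (it learns each column's mean and standard deviation)
scalar = StandardScaler()
scalar.fit(X)

# rescale every column to zero mean and unit variance
standardized_data = scalar.transform(X)
print(standardized_data)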

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]
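As a quick check of what the scaler computes, the sketch below standardizes a small array by hand and compares the result with `StandardScaler`. The four rows are the Glucose and BMI values of the first four records shown earlier; the names `sample`, `scaler_demo`, and `manual` are illustrative assumptions. `StandardScaler` uses the population standard deviation, which matches NumPy's default `std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Glucose and BMI of the first four records in the dataset
sample = np.array([[148.0, 33.6],
                   [85.0, 26.6],
                   [183.0, 23.3],
                   [89.0, 28.1]])

scaler_demo = StandardScaler().fit(sample)
scaled = scaler_demo.transform(sample)

# The same result by hand: z = (x - column mean) / column standard deviation
manual = (sample - sample.mean(axis=0)) / sample.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, each column has mean 0 and standard deviation 1, so a feature with a large numeric range such as Insulin does not dominate the SVM's distance calculations.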
  
X = standardized_data
Y = diabetes_dataset['Outcome']

print(X)
print(Y)

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]
0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

Train Test Split

# hold out 20% of the rows for testing; stratify=Y keeps the class ratio the same in both splits
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
print(X.shape, X_train.shape, X_test.shape)

(768, 8) (614, 8) (154, 8)
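The effect of `stratify` can be seen with labels alone. The sketch below builds a label vector with the same class counts as the dataset (500 non-diabetic, 268 diabetic) and a placeholder feature column; `X_demo`, `y_demo`, `y_tr`, and `y_te` are hypothetical names. It then checks that the share of positive labels is nearly identical in the full data, the training part, and the test part.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same class counts as the dataset: 500 zeros and 268 ones
y_demo = np.array([0] * 500 + [1] * 268)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2)

# Share of diabetic labels overall, in the training part, and in the test part
print(round(y_demo.mean(), 3), round(y_tr.mean(), 3), round(y_te.mean(), 3))
print(len(y_tr), len(y_te))
```

Without `stratify`, a random split could by chance put noticeably more or fewer diabetic cases in the test set, which would make the test accuracy harder to interpret.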

Training The Model

# create a support vector machine classifier with a linear kernel
classifier = svm.SVC(kernel='linear')

# train the classifier on the training data
classifier.fit(X_train, Y_train)

# accuracy score on the training data
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

print('Accuracy score of training data is : ', training_data_accuracy)

Accuracy score of training data is :  0.7866449511400652

# accuracy score on the testing data
X_test_prediction = classifier.predict(X_test)
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)

print('Accuracy score of testing data is : ', testing_data_accuracy)

Accuracy score of testing data is :  0.7727272727272727
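Because the dataset has more non-diabetic than diabetic records (500 versus 268), accuracy alone can hide how many diabetic cases are missed. The sketch below is an optional extra, not part of the original notebook: it uses ten hand-written labels (`y_true_demo` and `y_pred_demo` are made up for illustration) to show how a confusion matrix and recall complement the accuracy score.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Ten made-up labels: 6 non-diabetic (0) and 4 diabetic (1)
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Fraction of all predictions that are correct
demo_accuracy = accuracy_score(y_true_demo, y_pred_demo)

# Rows are true classes, columns are predicted classes
demo_cm = confusion_matrix(y_true_demo, y_pred_demo)

# Fraction of truly diabetic cases that the predictions caught
demo_recall = recall_score(y_true_demo, y_pred_demo)

print(demo_accuracy)
print(demo_cm)
print(demo_recall)
```

In the notebook, the same calls could be applied to `Y_test` and `X_test_prediction`.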

Making a Predictive System

input_data = (3,169,74,19,125,29.9,0.268,31)

# convert the input data to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the array, since we are predicting for a single instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

# standardize the input with the scaler fitted earlier
std_data = scalar.transform(input_data_reshaped)

prediction = classifier.predict(std_data)

if prediction[0] == 0:
  print('The person is non diabetic')
else:
  print('The person is diabetic')
  
The person is diabetic
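The prediction steps can be wrapped in a small reusable function. The helper below, `predict_diabetes`, is a hypothetical name and not part of the original notebook. To keep the sketch self-contained, it is exercised on a tiny made-up training set with two features instead of the real eight-feature data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def predict_diabetes(values, scaler, classifier):
    """Return a readable label for one record of feature values."""
    # One record becomes a 2-D array with a single row, as scikit-learn expects
    row = np.asarray(values, dtype=float).reshape(1, -1)
    label = classifier.predict(scaler.transform(row))[0]
    return 'The person is diabetic' if label == 1 else 'The person is non diabetic'


# Tiny made-up training set (two features), only to exercise the helper
X_small = np.array([[90.0, 22.0], [95.0, 24.0], [100.0, 23.0],
                    [170.0, 35.0], [180.0, 38.0], [175.0, 36.0]])
y_small = np.array([0, 0, 0, 1, 1, 1])

small_scaler = StandardScaler().fit(X_small)
small_clf = SVC(kernel='linear').fit(small_scaler.transform(X_small), y_small)

high_result = predict_diabetes((172, 37.0), small_scaler, small_clf)
low_result = predict_diabetes((92, 23.0), small_scaler, small_clf)
print(high_result)
print(low_result)
```

In the notebook itself, the call would presumably take the fitted `scalar` and `classifier` objects along with all eight feature values in the same column order as the training data.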
        

