
Diabetes Prediction

Dhruvrajsinh Vansia

Instrumentation and Control Engineer | Intern @PRL | Ex-President, PRAKALPA LDCE | IBM’24 Intern | Robotics Honours Degree | LDCE25 | OpenCV | MediaPipe | YOLOv8 | Matplot

Published: June 2, 2024

Machine learning (ML) models can be used to predict the likelihood of developing diabetes. These models use information about a person's medical history, lifestyle choices, blood sugar levels, and genetic makeup.

We will use a Support Vector Machine (SVM), which is a supervised learning algorithm.

SVMs can be used for both classification and regression tasks.

In this case, the SVM classifies whether a person has diabetes or not. Conceptually, it places the data points in feature space and looks for a hyperplane that separates the two classes. The medical features used include BMI, blood glucose level, insulin level, and others.
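
To make the hyperplane idea concrete, here is a minimal sketch of the decision rule a linear SVM applies once it is trained. The weights and bias are hypothetical values chosen for illustration, not the ones learned in this article, and only two standardized features (glucose and BMI) are used.

```python
# Hypothetical weights and bias for two standardized features
# (glucose, BMI); a trained SVM would learn these from the data.
weights = [1.2, 0.6]
bias = -0.3


def decision_value(features):
    """Signed score w.x + b; its sign tells which side of the hyperplane a point is on."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(features):
    """Return 1 (diabetic) on the positive side of the hyperplane, else 0."""
    return 1 if decision_value(features) >= 0 else 0


high_risk = [0.85, 0.20]    # above-average glucose and BMI
low_risk = [-1.12, -0.68]   # below-average glucose and BMI
print(classify(high_risk), classify(low_risk))
```

Training the SVM amounts to choosing the weights and bias so that this boundary separates the two classes with the widest possible margin.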

WORKFLOW

Diabetes Data
Data Preprocessing
Train Test Split
SVM Classifier
Trained SVM Classifier
Whether a Person is Diabetic or Not
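
The workflow above can be sketched end to end with a scikit-learn pipeline. This is an illustration only: it uses synthetic data from `make_classification` as a stand-in for the diabetes table, and it wraps the scaler and classifier in `make_pipeline`, which is a design choice of this sketch rather than the approach used in the notebook. The benefit is that the scaler is fitted on the training rows only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes table: 768 rows, 8 numeric features.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Hold out 20% of the rows, keeping class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Scaling and the linear SVM are fitted together on the training split only.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```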


GitHub link: https://github.com/dvvansia1805/Diabetes-Prediction-Machine-Learning-Model

Diabetes data: https://www.kaggle.com/datasets/akshaydattatraykhare/diabetes-dataset

Google Colab: https://colab.research.google.com/drive/19ciTTvzk6HdeKAOCwptwKxbkn7pHXQfU#scrollTo=i-epX2eqsA6v&uniqifier=1

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Data Collection and Analysis
PIMA Diabetes Dataset

#loading the diabetes data to pandas Dataframe
diabetes_dataset = pd.read_csv('/content/diabetes.csv')

#printing the first 5 rows of the dataset
diabetes_dataset.head()

	Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age	Outcome
0	6	148	72	35	0	33.6	0.627	50	1
1	1	85	66	29	0	26.6	0.351	31	0
2	8	183	64	0	0	23.3	0.672	32	1
3	1	89	66	23	94	28.1	0.167	21	0
4	0	137	40	35	168	43.1	2.288	33	1

#number of rows and columns in the dataset
diabetes_dataset.shape

#getting the statistical measures of the data
diabetes_dataset.describe()


Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age	Outcome
count	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000	768.000000
mean	3.845052	120.894531	69.105469	20.536458	79.799479	31.992578	0.471876	33.240885	0.348958
std	3.369578	31.972618	19.355807	15.952218	115.244002	7.884160	0.331329	11.760232	0.476951
min	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.078000	21.000000	0.000000
25%	1.000000	99.000000	62.000000	0.000000	0.000000	27.300000	0.243750	24.000000	0.000000
50%	3.000000	117.000000	72.000000	23.000000	30.500000	32.000000	0.372500	29.000000	0.000000
75%	6.000000	140.250000	80.000000	32.000000	127.250000	36.600000	0.626250	41.000000	1.000000
max	17.000000	199.000000	122.000000	99.000000	846.000000	67.100000	2.420000	81.000000	1.000000
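
The `min` row above shows zeros for Glucose, BloodPressure, SkinThickness, Insulin, and BMI. Such values are not physiologically plausible and are commonly read as missing measurements in this dataset. The notebook does not handle them; the sketch below shows one possible treatment on the five `head()` rows, assuming a zero means "not recorded" and choosing median imputation purely for illustration.

```python
import numpy as np
import pandas as pd

# First five rows of two columns, copied from the head() output.
sample = pd.DataFrame({
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
})

# Assumption: a zero in these columns means the value was not recorded.
with_gaps = sample.replace(0, np.nan)
missing_per_column = with_gaps.isna().sum()

# Median imputation is one simple option among several.
imputed = with_gaps.fillna(with_gaps.median())

print(missing_per_column.to_dict())
print(imputed)
```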

diabetes_dataset['Outcome'].value_counts()

Outcome
0    500
1    268
Name: count, dtype: int64

0 --> Non-diabetic
1 --> Diabetic
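
The class counts above give a useful reference point: a model that always predicts the majority class would already be right about 65% of the time. A short sketch of that arithmetic, using only the counts printed by `value_counts()`:

```python
# Class counts taken from the value_counts() output.
class_counts = {0: 500, 1: 268}
total = sum(class_counts.values())

# Accuracy of a trivial model that always predicts the most common class.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

# Share of diabetic rows; matches the mean of the Outcome column.
positive_rate = class_counts[1] / total

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"positive rate: {positive_rate:.3f}")
```

Any accuracy reported later is best judged against this baseline rather than against zero.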

diabetes_dataset.groupby('Outcome').mean()

Pregnancies	Glucose	BloodPressure	SkinThickness	Insulin	BMI	DiabetesPedigreeFunction	Age
Outcome								
0	3.298000	109.980000	68.184000	19.664000	68.792000	30.304200	0.429734	31.190000
1	4.865672	141.257463	70.824627	22.164179	100.335821	35.142537	0.550500	37.067164

#separating data and labels
X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']

print(X)

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0              6      148             72             35        0  33.6   
1              1       85             66             29        0  26.6   
2              8      183             64              0        0  23.3   
3              1       89             66             23       94  28.1   
4              0      137             40             35      168  43.1   
..           ...      ...            ...            ...      ...   ...   
763           10      101             76             48      180  32.9   
764            2      122             70             27        0  36.8   
765            5      121             72             23      112  26.2   
766            1      126             60              0        0  30.1   
767            1       93             70             31        0  30.4   

     DiabetesPedigreeFunction  Age  
0                       0.627   50  
1                       0.351   31  
2                       0.672   32  
3                       0.167   21  
4                       2.288   33  
..                        ...  ...  
763                     0.171   63  
764                     0.340   27  
765                     0.245   30  
766                     0.349   47  
767                     0.315   23  

[768 rows x 8 columns]

print(Y)

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

Data Preprocessing

scalar = StandardScaler()
scalar.fit(X)

standardized_data = scalar.transform(X)
print(standardized_data)

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]
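
The first standardized Glucose value can be reproduced by hand from the `describe()` statistics. The sketch below assumes only what the outputs show, plus one known detail: `describe()` reports the sample standard deviation (ddof=1), while `StandardScaler` divides by the population standard deviation (ddof=0).

```python
import math

n_rows = 768
glucose_mean = 120.894531
glucose_sample_std = 31.972618  # from describe(), computed with ddof=1

# Convert to the population standard deviation that StandardScaler uses.
glucose_population_std = glucose_sample_std * math.sqrt((n_rows - 1) / n_rows)

# Glucose value of the first row in the dataset.
glucose_first_row = 148
z_score = (glucose_first_row - glucose_mean) / glucose_population_std

print(round(z_score, 4))
```

The result agrees with the 0.84832379 shown in the first row of the standardized array, up to the rounding of the printed statistics.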
  
X = standardized_data
Y = diabetes_dataset['Outcome']

print(X)
print(Y)

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]
0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

Train Test Split

X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size= 0.2, stratify= Y, random_state=2)
print(X.shape, X_train.shape, X_test.shape)

(768, 8) (614, 8) (154, 8)
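
The shapes above follow directly from `test_size=0.2`: scikit-learn rounds the test size up and gives the rest to training. The sketch below reproduces that arithmetic and estimates how many diabetic rows a stratified test split should contain; the stratified count is an approximation from proportions, not a value read from the notebook.

```python
import math

n_rows = 768
test_fraction = 0.2
class_counts = {0: 500, 1: 268}

# The test size is rounded up; the training split gets the remaining rows.
n_test = math.ceil(test_fraction * n_rows)
n_train = n_rows - n_test

# With stratify=Y, each split keeps roughly the overall class proportions.
expected_test_positives = round(class_counts[1] * n_test / n_rows)

print(n_train, n_test, expected_test_positives)
```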

Training The Model

classifier = svm.SVC(kernel='linear')
#training the support vector machine classifier
classifier.fit(X_train, Y_train)

#Accuracy score of the training data
X_train_prediction = classifier.predict(X_train)
#accuracy_score expects the true labels first, then the predictions
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

print('Accuracy score of training data is : ', training_data_accuracy)

Accuracy score of training data is :  0.7866449511400652

#Accuracy score of the testing data
X_test_prediction = classifier.predict(X_test)
#accuracy_score expects the true labels first, then the predictions
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)

print('Accuracy score of testing data is : ', testing_data_accuracy)

Accuracy score of testing data is :  0.7727272727272727
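
Because accuracy is just correct predictions divided by rows, the two scores can be converted back into counts. The sketch below uses only the split sizes and accuracy values printed above.

```python
from fractions import Fraction

n_train, n_test = 614, 154
train_accuracy = 0.7866449511400652
test_accuracy = 0.7727272727272727

# Recover the number of correctly classified rows in each split.
train_correct = round(train_accuracy * n_train)
test_correct = round(test_accuracy * n_test)
test_errors = n_test - test_correct

print(train_correct, test_correct, test_errors)
print(Fraction(test_correct, n_test))
```

So the model gets 119 of the 154 held-out rows right and misclassifies 35. The small gap between the training and testing scores suggests little overfitting, although accuracy alone does not show how the errors split between the two classes.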

Making a Predictive System

input_data = (3,169,74,19,125,29.9,0.268,31)

#changing the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

#reshape the array as we are predicting for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

#standardize the input with the scaler fitted on the dataset
std_data = scalar.transform(input_data_reshaped)

prediction = classifier.predict(std_data)

if (prediction[0] == 0):
  print('The person is non diabetic')
else:
  print('The person is diabetic')
  
 The person is diabetic
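
The `UserWarning` filter near the imports most likely hides scikit-learn's "X does not have valid feature names" warning, which appears when a scaler fitted on a DataFrame is later given a plain NumPy array. One way to avoid the warning, rather than suppress it, is to wrap the new record in a DataFrame with the same column names. The sketch below is an assumption-laden illustration: it fits a fresh scaler on the five `head()` rows only, standing in for the full table.

```python
import warnings

import pandas as pd
from sklearn.preprocessing import StandardScaler

feature_names = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

# First five rows from head(), used here in place of the full dataset.
train_rows = [
    (6, 148, 72, 35, 0, 33.6, 0.627, 50),
    (1, 85, 66, 29, 0, 26.6, 0.351, 31),
    (8, 183, 64, 0, 0, 23.3, 0.672, 32),
    (1, 89, 66, 23, 94, 28.1, 0.167, 21),
    (0, 137, 40, 35, 168, 43.1, 2.288, 33),
]
train_frame = pd.DataFrame(train_rows, columns=feature_names)
scaler = StandardScaler().fit(train_frame)

# Wrap the single record in a DataFrame so the column names match.
input_data = (3, 169, 74, 19, 125, 29.9, 0.268, 31)
input_frame = pd.DataFrame([input_data], columns=feature_names)

with warnings.catch_warnings():
    # Turn UserWarning into an error to show that none is raised.
    warnings.simplefilter("error", category=UserWarning)
    std_input = scaler.transform(input_frame)

print(std_input.shape)
```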
