7 Types of Classification Using Python

A full guide to logistic regression, KNN, support vector machines, kernel SVM, naive Bayes, decision tree classification, and random forest classification.

Hi, how are you doing? I hope it's great.

Today let's understand and perform all of these types of classification, and we will also compare how accurately each one predicts.

Let's get started: we will use client demographics to understand and predict whether the client will subscribe to a term deposit.

Introducing 7 types of classification with Python
This dataset is publicly available for research. The details are described in [Moro et al., 2011]:
[Moro et al., 2011] S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference (ESM'2011), pp. 117–121, Guimarães, Portugal, October 2011. EUROSIS.
The data is related to direct marketing campaigns of a Portuguese banking institution.
The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required
in order to assess whether the product (a bank term deposit) would be subscribed or not.
The original owners of the dataset:
Created by: Paulo Cortez (Univ. Minho) and Sérgio Moro (ISCTE-IUL) @ 2012

And it contains the following attributes:

Client ID, Income, Age, Loan, Default

Bank Data set

Let's get started with our commonly used classification methods:

1.) Logistic Regression, then we will use

2.) Knn

3.) Support Vector Machine

4.) Kernel SVM

5.) Naive Bayes

6.) Decision Tree Classification

7.) Random Forest Classification

Any other classification methods? Let me know in the comments below.

#Logistic Regression
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values        

Well, up to here it's the same as before: load the data, then split it into X and y, where y is the dependent/target variable, the 4th column (the default column), and columns 1 to 3 (income, age, loan) are the independent variables X (column 0, the client ID, is dropped).

Note: in Python, column index positions start from 0, not from 1.

Then we will split the data into train and test sets. After that, we will scale all the columns to one standard range, which reduces the spread and magnitude of the data points without losing the original meaning of the data. COOL!

This helps the algorithm compute faster and more efficiently.
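Under the hood, standard scaling just centres each column and divides it by its standard deviation. A tiny sketch with made-up income values (for illustration only, not taken from our dataset):

import numpy as np
col = np.array([66952.7, 34000.0, 48000.0, 25000.0])   #hypothetical income column
scaled = (col - col.mean()) / col.std()                #exactly what StandardScaler computes per column
print(scaled)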

#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)        

Now it's time to fit logistic regression to the data and predict on the test set.

#Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
lr_model= LogisticRegression(random_state = 0)
lr_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = lr_model.predict(X_test)        
DONE! Super easy, isn't it?

Let’s compare the predicted results with our original dataset

#We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df        

flatten() helps represent the data as a 1-dimensional array, like a list.
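A quick illustration of what flatten does:

import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(arr.flatten())   #[1 2 3 4], the 2-D array laid out as a flat 1-D list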

Actual Vs Predicted

OK. We can see it identifies the classes quite accurately. I understand it's difficult to examine the whole output like this, so instead we will use an aggregated summary: the confusion matrix.

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
        
Confusion Matrix: Logistic Regression

Our model correctly identifies 424 of the 0's, i.e. actual non-defaulters predicted as non-defaulters, out of a total of 500 test observations (500 out of the total 1,997 records, because we split the dataset into train and test sets, roughly 75%/25%).

And 18 of the 500 are false negatives (FN), which means 18 clients were defaulters but were predicted as non-defaulters. Similarly, 8 of the 500 are non-defaulters predicted as defaulters (false positives), and 50 of the 500 are defaulters correctly predicted as defaulters.

This is a really good model with high accuracy, and no model is 100% accurate; if it were, there might be some bias issue.

Alright, another way to evaluate model performance is with scikit-learn's metrics module, for example metrics.accuracy_score:

#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
        

GREAT! We have a model accuracy score of 0.948, i.e. 95%.

And a balanced accuracy score of 0.858, i.e. 86%.

And we have one more metric: average precision.

Precision-Recall is a useful measure of prediction success when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned. Precision is more important than recall when you would like to have fewer false positives in trade-off for more false negatives.
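If you want to see these quantities explicitly, here is a minimal sketch that computes precision, recall and F1 from the same y_test and y_pred arrays used above:

#Precision, recall and F1 for the positive (defaulter) class
from sklearn import metrics
print('Precision:', metrics.precision_score(y_test, y_pred))   #TP / (TP + FP)
print('Recall:', metrics.recall_score(y_test, y_pred))         #TP / (TP + FN)
print('F1 Score:', metrics.f1_score(y_test, y_pred))           #harmonic mean of precision and recall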

Finally, it's time to predict on new input.
#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
lr_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
lr_model.predict(sc.transform([[66952.7, 28, 8770.1]]))

        

We have an output of array([0], dtype=int64), that is, class '0'. Done: a client with income, age, loan = 66952.7, 28, 8770.1 is predicted to be a non-defaulter (class '0').

BONUS

Save and load the model.

#save the model in the disk
import pickle
# save the model to disk
filename = 'class_model.sav'
pickle.dump(lr_model, open(filename, 'wb'))
# load the model from disk
filename1 = 'class_model.sav'
loaded_model = pickle.load(open(filename1, 'rb'))
#another method using joblib
'''Pickled model as a file using joblib: joblib is a replacement for pickle
that is more efficient on objects that carry large numpy arrays.
'''
import joblib  #in older scikit-learn versions: from sklearn.externals import joblib
  
# Save the model as a pickle in a file 
joblib.dump(lr_model, 'classification.pkl') 
  
# Load the model from the file 
loaded_model2 = joblib.load('classification.pkl')  
  
# Use the loaded model to make predictions 
loaded_model2.predict(X_test)        

Let’s put all the pieces together

# Logistic Regression
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#---------------------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
lr_model= LogisticRegression(random_state = 0)
lr_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = lr_model.predict(X_test)
#We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
lr_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
lr_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
#---------------------------------------
#save the model in the disk
import pickle
# save the model to disk
filename = 'class_model.sav'
pickle.dump(lr_model, open(filename, 'wb'))
# load the model from disk
filename1 = 'class_model.sav'
loaded_model = pickle.load(open(filename1, 'rb'))
#another method using joblib
'''Pickled model as a file using joblib: joblib is a replacement for pickle
that is more efficient on objects that carry large numpy arrays.
'''
import joblib  #in older scikit-learn versions: from sklearn.externals import joblib
  
# Save the model as a pickle in a file 
joblib.dump(lr_model, 'classification.pkl') 
  
# Load the model from the file 
loaded_model2 = joblib.load('classification.pkl')  
  
# Use the loaded model to make predictions 
loaded_model2.predict(X_test)
        

Congratulations! We have successfully completed our first Classification model.

Next is KNN
Introducing KNN for classification

What is KNN?

K-Nearest Neighbors (KNN) is one of the simplest algorithms used in Machine Learning for regression and classification. KNN algorithms classify new data points based on similarity measures (e.g. Euclidean distance function).

Classification is done by a majority vote to its neighbors (K).

K Nearest Neighbors
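To make the voting idea concrete, here is a tiny, self-contained sketch with made-up 2-D points (not from our dataset):

import numpy as np

#Toy training points (two features) and their class labels
points = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
labels = np.array([0, 0, 1, 1])
query = np.array([1.2, 1.5])   #hypothetical new point to classify

#Euclidean distance from the query to every training point
distances = np.linalg.norm(points - query, axis=1)

#Take the K = 3 nearest neighbours and let them vote
k = 3
nearest = labels[np.argsort(distances)[:k]]
print(np.bincount(nearest).argmax())   #majority vote of the 3 nearest neighbours -> class 0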

Let’s get started on how to apply KNN for classification problems.

#K-Nearest Neighbors (K-NN)
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()

X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#-------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
        

Up to here it's the same as before: load the data, define X and y, split the data, and then scale the independent variables.

NOW we will fit KNN to our training dataset with K = 5 nearest neighbors, metric = 'minkowski' (a generalized distance measure), and p = 2, the power parameter for the Minkowski metric. When p = 1 this is equivalent to using manhattan_distance (l1), and when p = 2 to euclidean_distance (l2).
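If you'd like to see what the p parameter changes, here is a small sketch (with two made-up clients) showing that the Minkowski distance with p = 1 is the Manhattan distance and with p = 2 the Euclidean distance:

import numpy as np

a = np.array([66952.7, 28.0, 8770.1])   #hypothetical client (income, age, loan)
b = np.array([34000.0, 45.0, 1200.0])   #another hypothetical client

def minkowski(x, y, p):
    #Minkowski distance: (sum |x_i - y_i|^p)^(1/p)
    return np.sum(np.abs(x - y) ** p) ** (1 / p)

print(minkowski(a, b, 1))      #p = 1 -> Manhattan distance
print(minkowski(a, b, 2))      #p = 2 -> Euclidean distance
print(np.linalg.norm(a - b))   #numpy's Euclidean norm matches the p = 2 case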

# Fitting K-NN to the Training set
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn_model.fit(X_train, y_train)
        

That’s it….!

It's time to predict with the test dataset.

# Predicting the Test set results
y_pred = knn_model.predict(X_test)
        

Alright let's compare our predicted results with our original results.

#actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df        
KNN Actual Vs Predicted Results

Well, it seems our model is predicting very well.

Now let’s try to assess our model with evaluation metrics.

First is the confusion matrix.
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
        
Confusion Matrix (KNN)

Wow, it classifies true positives and true negatives more accurately than logistic regression did.

#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
        

WELL WELL WELL

This time the accuracy score is 98% and the balanced accuracy score is 0.957, i.e. 96%.

And we have one more metric: average precision.

Precision-Recall is a useful measure of the success of prediction when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned.

Precision is more important than recall when you would like to have fewer False Positives in trade-off to have more False Negatives.

Now let's predict with totally unseen random data, if income, age, loan = 66952.7,28,8770.1

#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
knn_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
knn_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

We have an output of array([0], dtype=int64), that is, class '0'. Done: a client with income, age, loan = 66952.7, 28, 8770.1 is predicted to be a non-defaulter (class '0') with the KNN model.

Let's put all of the code together.

#K-Nearest Neighbors (K-NN)
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#------------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting K-NN to the Training set
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = knn_model.predict(X_test)
#Model Evaluation------------------------------------
#We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#---------------------------------------------------
#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
knn_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
knn_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
#the same input entered directly in its scaled form
knn_model.predict([[0.382027, -0.979416, 1.45499]])
        

Congratulations! We have successfully completed our KNN model to classify defaulters

Next is SVM, another powerful classifier.

SUPPORT VECTOR MACHINE

What is SVM?

SVM is a supervised machine learning algorithm that can be used for classification or regression problems



In brief, SVM works by finding the hyperplane that separates the classes while maximizing the distance to the nearest data points of either class. This distance is called the margin.

SVM is highly preferred by many as it produces significant accuracy with less computation power.
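To make the idea of support vectors and the margin concrete, here is a tiny, self-contained sketch on made-up 2-D points (not our credit data):

import numpy as np
from sklearn.svm import SVC

#Two small, linearly separable clusters
X_toy = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_svm = SVC(kernel='linear')
toy_svm.fit(X_toy, y_toy)

#The support vectors are the points closest to the separating hyperplane
print(toy_svm.support_vectors_)
#For a linear SVM the margin width is 2 / ||w||
w = toy_svm.coef_[0]
print('Margin width:', 2 / np.linalg.norm(w))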

Let's understand this with the help of an example.
#SVM
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#--------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
        

Well, up to here it's the same as the others: first we import the data, define X and y, split the data into train and test sets, and scale the independent variables to reduce the magnitude and spread of the data points without losing their original meaning.

It's time to fit the SVM into the training set.

#Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
        

Now our model is ready to make predictions on the test set.

#Predicting the Test set results
y_pred = svm_model.predict(X_test)
        
Let's assess the performance of our model.

First, we will compare the predicted values with the actual output

#We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
        
Actual Vs Predicted

The second performance metric is the confusion matrix.

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
        
Confusion Matrix (SVM)

It classifies true positives and true negatives far more often than it produces false positives and false negatives.

Further, we can use scikit-learn's evaluation metrics to assess the model's accuracy score.

#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
        

The accuracy score is 95% and the balanced accuracy score is 0.87, i.e. 87%.

And we have one more metric: average precision.

Precision-Recall is a useful measure of the success of prediction when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned.

Precision is more important than recall when you would like to have fewer False Positives in trade-off to have more False Negatives.

Now let's predict with totally unseen random data, if income, age, loan = 66952.7,28,8770.1

#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
svm_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
svm_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

We have an output of array([0], dtype=int64), that is, class '0'. Done: a client with income, age, loan = 66952.7, 28, 8770.1 is predicted to be a non-defaulter (class '0') with the SVM model.

Let's put all of the code together.

#SVM
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#-------------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = svm_model.predict(X_test)
#We can also compare the actual versus predicted
df = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
df
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
svm_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
svm_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

Congratulations! We have successfully completed our SVM model to classify defaulters

Next is kernel SVM, another powerful variant of SVM.

Introducing Kernel SVM

What is kernel SVM?

The complexity of linear SVM grows with the size of the dataset. In simple words, kernel SVM with the 'rbf' kernel maps complex non-linear data into a higher-dimensional space in order to separate the data classes.

Kernel SVM helps to transform non-linear data into a higher-dimensional space.

Usually, linear and polynomial kernels are less time-consuming but provide lower accuracy than the RBF (Gaussian) kernel.

So, the rule of thumb is: use linear SVMs (or logistic regression) for linear problems, and nonlinear kernels such as the Radial Basis Function kernel for non-linear problems.
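For intuition, the RBF (Gaussian) kernel simply turns the distance between two points into a similarity score between 0 and 1. A minimal sketch, with made-up points and an arbitrary gamma:

import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    #k(x, y) = exp(-gamma * ||x - y||^2): nearby points score close to 1, distant points close to 0
    return np.exp(-gamma * np.sum((x - y) ** 2))

a = np.array([1.0, 2.0])
b = np.array([1.5, 2.5])
c = np.array([8.0, 9.0])
print(rbf_kernel(a, b))   #nearby points: similarity close to 1
print(rbf_kernel(a, c))   #far-apart points: similarity close to 0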

Let's compare the linear SVM with the radial basis function (RBF) kernel SVM.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#-----------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
        

Well, up to here it's the same as everywhere else: load the data, define X and y, split the data, and scale it to a standard range to reduce the magnitude of the data without losing its original meaning.

Now we will fit the data with both the linear SVM and the kernel 'rbf' SVM so we can compare the two.

#Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = svm_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#-------------------------------------------------------
#Fitting Kernel SVM to the Training set
from sklearn.svm import SVC
Ksvm_model = SVC(kernel = 'rbf', random_state = 0)
Ksvm_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = Ksvm_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#Hence we notice kernel SVM performs better than linear SVM
        

So, what did we get?

Confusion Matrix: Right | cm | Linear SVM, Left | cm1 | Kernel 'rbf' SVM

The confusion matrix shows that the kernel SVM identifies true positives and true negatives better than the linear SVM.

Evaluation Metrics | First | Linear SVM, Second | Kernel 'rbf' SVM

The accuracy score of our kernel SVM model is better than that of the linear SVM.

Hence kernel SVM performs better than linear SVM.

Finally, with the model, we can predict any new input.

#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
Ksvm_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
Ksvm_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

We have an output of array([0], dtype=int64), that is, class '0'. Done: a client with income, age, loan = 66952.7, 28, 8770.1 is predicted to be a non-defaulter (class '0') even with the kernel SVM model.

Let's put all of the code together.

#Kernel SVM
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#-----------------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#------------------------------------------------------
#Fitting SVM to the Training set
from sklearn.svm import SVC
svm_model = SVC(kernel = 'linear', random_state = 0)
svm_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = svm_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#---------------------------------------------------------
# Fitting Kernel SVM to the Training set
from sklearn.svm import SVC
Ksvm_model = SVC(kernel = 'rbf', random_state = 0)
Ksvm_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = Ksvm_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#Hence we notice kernel SVM performs better than linear SVM
#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
Ksvm_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
Ksvm_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

Congratulations! We have successfully completed our Kernel SVM model to classify defaulters.

Next is Naive Bayes Classifier.

Introducing Naive Bayes Classifier

What is Naive Bayes in short?

Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem.

Naive Bayes

P(c|x) is the posterior probability of class (target) given predictor (attribute).

  • P(c) is the prior probability of class.
  • P(x|c) is the likelihood which is the probability of predictor given class.
  • P(x) is the prior probability of predictor.

Likelihood: How probable is the evidence given that our hypothesis is true.

Prior: How probable was our hypothesis before observing the evidence?

Posterior: How probable is our hypothesis given the observed evidence?

Marginal: How probable is the new evidence under all possible hypotheses?
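As a toy illustration of the formula (with made-up numbers, not taken from our dataset): suppose 20% of clients default, 70% of defaulters have a high loan, and only 10% of non-defaulters do. Bayes' theorem then gives the probability of default given a high loan:

#Toy Bayes' theorem example with made-up probabilities
p_default = 0.2                       #prior P(c)
p_high_loan_given_default = 0.7       #likelihood P(x|c)
p_high_loan_given_non_default = 0.1

#Marginal P(x): total probability of seeing a high loan
p_high_loan = (p_high_loan_given_default * p_default
               + p_high_loan_given_non_default * (1 - p_default))

#Posterior P(c|x) = P(x|c) * P(c) / P(x)
print(p_high_loan_given_default * p_default / p_high_loan)   #about 0.64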

Naive Bayes deserves a long chapter of its own; if you are interested in going deeper, you can visit my other site, Data2Dimensions.

In short, Naive Bayes uses class probabilities to classify observations.

Let's see how we can apply Naive Bayes to classify the bank defaulters.

#Naive Bayes
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#-----------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling/normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
        

Alright, up to here it's the same as above: load the data, define X and y, split the data into train and test sets, then scale the data to reduce the magnitude and spread of the data points without losing their original meaning.

Let's fit Naive Bayes to our data.

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
NB_model = GaussianNB()
NB_model.fit(X_train, y_train)
        
Done, in just 3 lines of code. Easy, isn't it?

Time to predict on unseen data.

# Predicting the Test set results
y_pred2 = NB_model.predict(X_test)
        

Done, we have our predicted values saved in y_pred2.

Now let's assess the model performance with a confusion matrix.

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
        
Confusion Matrix (Naive Bayes)

Well, we can see it classifies true positives and true negatives far more often than it produces false positives and false negatives.

Further, we can use scikit-learn's evaluation metrics to assess the model's accuracy score.

#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
print('Average Precision:', metrics.average_precision_score(y_test, y_pred2))
        
Evaluation Metrics

Well, we got an accuracy score of 98% for our model; that's a good model.

Finally, we can use this model to predict any new data

#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
NB_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
NB_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

We have an output of array([0], dtype=int64), that is, class '0'. Done: a client with income, age, loan = 66952.7, 28, 8770.1 is predicted to be a non-defaulter (class '0') with the Naive Bayes model.

Let's put all of this code together.

#Naive Bayes
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#----------------------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling/normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#-------------------------------------------------------------
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
NB_model = GaussianNB()
NB_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = NB_model.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
print('Average Precision:', metrics.average_precision_score(y_test, y_pred2))
#if income, age, loan = 66952.7,28,8770.1
import numpy as np 
# Create a numpy array 
new_data = np.array([66952.7,28,8770.1]) 
new_data.dtype
new_data.shape
#We need to reshape to match the dimensions
new_data = new_data.reshape(-1,3)
new_data.shape
#------------------------------------
from sklearn.preprocessing import StandardScaler
sc= StandardScaler()
#scale the data
new_data = sc.fit_transform(new_data)
#We might see the scaled data as 0, 0, 0 but its not its 0.000000e+ and can be view by changing the format
#else inverse transform will give back the original value
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
NB_model.predict(new_data)
#if we wish to enter manually
NB_model.predict([[66952.7,28,8770.1]])
        

Congratulations! We have successfully completed our Naive Bayes model to classify defaulters.

Next is the Decision Tree / Rule-based Classifier.

Introducing Decision Tree for Classification

What are Decision Trees?

Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules derived from the data features.

The decision rules are generally in the form of if-then-else statements. The deeper the tree, the more complex the rules and the more closely the model fits the training data.
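To give a feel for what those rules look like, here is a hypothetical hand-written version of the kind of rules a tree might learn on our columns (the thresholds are invented for illustration only):

#Hypothetical if-then-else rules of the kind a decision tree learns (made-up thresholds)
def toy_tree_rule(income, age, loan):
    if loan > 5500:
        return 1 if age < 35 else 0   #high loan and young age -> likely defaulter
    return 0                          #otherwise -> likely non-defaulter

print(toy_tree_rule(40000, 50, 3000))   #-> 0 under these invented rules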

A decision tree gives output in a tree-like graph with nodes. Take this graph as an example, beautifully explained.

Decision Trees | Graph Credit ~ TDS

Let's get hands-on experience with decision trees.

#Decision Tree Classification
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#--------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling/normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
        

Up to this point, we have done the basic pre-processing tasks: import the data, define X and y, split the data into train and test sets, and finally normalize, that is, scale the data to narrow down the magnitude and spread of the data points.

Now what we will do here is fit both Naive Bayes and a decision tree, and compare which one performs better at classification.

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
NB_model = GaussianNB()
NB_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred2 = NB_model.predict(X_test)
#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred2))
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
#Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = dt_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred))
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
        

So, what did we get?

Confusion Matrix | Left | Decision Tree, Right | Naive Bayes

The confusion matrix shows that the decision tree identifies true positives and true negatives better than Naive Bayes.

cm ~ Decision Tree | cm2 ~ Naive Bayes

The accuracy score of our decision tree model is better than that of Naive Bayes.

Hence the decision tree performs better than Naive Bayes.

Finally, with the model, we can predict any new input.

#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
dt_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
dt_model.predict(sc.transform([[66952.7, 28, 8770.1]]))
        

We have an output of array([0], dtype=int64), that is, class '0'. Done: a client with income, age, loan = 66952.7, 28, 8770.1 is predicted to be a non-defaulter (class '0') with the decision tree model.

It's time to visualize the decision tree.

#import export_graphviz
from sklearn.tree import export_graphviz
#export the decision tree to a tree.dot file
#for visualizing the plot easily anywhere
export_graphviz(dt_model, out_file ='e:/tree.dot', feature_names =['income','age','loan'])
        

The tree is finally exported and we can visualize it using https://www.webgraphviz.com/ by copying the data from the 'tree.dot' file.

Decision Tree Classification with https://www.webgraphviz.com/
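If you prefer to stay inside Python, newer scikit-learn versions also provide export_text, which prints the same if-then-else rules directly to the console (a minimal sketch, assuming the dt_model fitted above):

#Alternative: print the learned rules as plain text, no external viewer needed
from sklearn.tree import export_text
print(export_text(dt_model, feature_names=['income', 'age', 'loan']))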

Putting all these together, the whole code looks something like this.

#Decision Tree Classification 
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#----------------------------------------------------
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling/normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#---------------------------------------------------------
#Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
NB_model = GaussianNB()
NB_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred2 = NB_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred2))
#sanity-check predictions: manual inputs must be scaled the same way as the training data
NB_model.predict(sc.transform([[66952.7,18,8770.1]]))
#or passed directly in their already-scaled form
NB_model.predict([[0.382027,-0.979416,1.45499]])
#-----------------------------------------------------
# Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
# Predicting the Test set results
y_pred = dt_model.predict(X_test)
# Making the Confusion Matrix 
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
from sklearn.tree import export_graphviz
#export the decision tree to a tree.dot file
# for visualizing the plot easily anywhere 
export_graphviz(dt_model, out_file ='e:/tree.dot', 
               feature_names =['income','age','loan'])
"""
The tree is finally exported and we can visualized using  
https://www.webgraphviz.com/ by copying the data from the ‘tree.dot’ file."""
#if income, age, loan = 66952.7, 28, 8770.1
import numpy as np
#Create a numpy array
new_data = np.array([66952.7, 28, 8770.1])
new_data.dtype
new_data.shape
#We need to reshape to (1, 3) to match the (n_samples, n_features) dimensions
new_data = new_data.reshape(-1, 3)
new_data.shape
#------------------------------------
#Scale the new observation with the scaler already fitted on the training set;
#fitting a fresh StandardScaler on a single row would just turn it into zeros.
new_data = sc.transform(new_data)
#inverse transform gives back the original values
inversed = sc.inverse_transform(new_data)
print(inversed)
#-------------------------------------
dt_model.predict(new_data)
#if we wish to enter the values manually, they must be scaled the same way
dt_model.predict(sc.transform([[66952.7, 28, 8770.1]]))

        

Here we are, we have finished learning how to apply decision trees to non-linear data.

NEXT: RANDOM FOREST


Introducing Random Forest for Classification

What is a random forest?

Random forest is an upgraded version of decision trees. As the name suggests, it consists of a large number of individual decision trees that operate as an ensemble. Thus we combine the predictive power of several decision trees to get more accuracy.
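To see the "many trees voting" idea in action, here is a small, self-contained sketch on made-up data (not our credit set): a few trees are trained on bootstrap samples and their votes are combined, which is essentially what RandomForestClassifier automates for us.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X_toy = rng.rand(100, 3)                             #made-up feature matrix
y_toy = (X_toy[:, 0] + X_toy[:, 2] > 1).astype(int)  #made-up labels

votes = []
for seed in range(5):
    idx = rng.randint(0, len(X_toy), len(X_toy))     #bootstrap sample of the rows
    tree = DecisionTreeClassifier(random_state=seed).fit(X_toy[idx], y_toy[idx])
    votes.append(tree.predict(X_toy[:1])[0])         #each tree votes on the first point
print(votes, '-> majority vote:', round(np.mean(votes)))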

Random Forest Graphical Representation

Let’s get started with the help of an example

#Random Forest Classification
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('credit_data.csv', sep=",")
#drop the missing values
dataset = dataset.dropna()
X = dataset.iloc[:,1:4].values
y = dataset.iloc[:, 4].values
#---------------------------------------------------
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
#feature scaling/normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
        

Up to here it's the same basic data pre-processing: loading the data, defining X and y, splitting the data into train and test sets, and normalizing/scaling the data to reduce the magnitude and spread of the data points.

Now we will fit the random forest to the dataset. We will also fit a decision tree so that we can compare their performance later.

#Fitting Decision Tree Classification to the Training set
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred = dt_model.predict(X_test)
#Making the Confusion Matrix 
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
 
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred))
#------------------------------------------------------------
#Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators = 500, criterion = 'entropy', random_state = 0)
rf_model.fit(X_train, y_train)
#Predicting the Test set results
y_pred2 = rf_model.predict(X_test)
#Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(y_test, y_pred2)
#evaluation Metrics 
from sklearn import metrics
print('Accuracy Score:', metrics.accuracy_score(y_test, y_pred2))  
print('Balanced Accuracy Score:', metrics.balanced_accuracy_score(y_test, y_pred2)) 
print('Average Precision:',metrics.average_precision_score(y_test, y_pred2))
        
Confusion Matrix: Left | cm | Decision Tree, Right | cm2 | Random Forest

Accuracy Score: Decision Trees 98% | Random Forest 99%

Wow! We have a 99% model accuracy score. How about yours?
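A nice bonus of the ensemble: we can ask the fitted forest which inputs mattered most. A quick sketch, assuming the rf_model fitted above:

#Which features drive the predictions? Impurity-based importance averaged over all 500 trees
import pandas as pd
importances = pd.Series(rf_model.feature_importances_, index=['income', 'age', 'loan'])
print(importances.sort_values(ascending=False))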

Let me know if you need anything, or even the dataset, as this blog doesn't support file hosting. Ping me at data2dimensions.com.

Congratulations! We have covered all of the most commonly used classification techniques available today.

It's a long blog; I tried to keep it as short as possible. I hope you have enjoyed it.

I will also be making another version in R. Have a good day. Keep in touch!
