Supervised Machine Learning With Python: Classification. Gaussian Naïve Bayes
Shivek Maharaj
Data Analyst | Automation Architect | Business success doesn’t follow a blueprint, It follows me | AI Engineer
Implementing supervised learning for classification problems will be the main topic of the next few posts.
A classifier seeks to draw an inference from observed values. In classification problems the output is categorical, such as "Black" or "White", or "Teaching" or "Non-Teaching". To develop a classification model, we need a training dataset containing data points and their corresponding labels. For instance, if we want to determine whether an image contains a car, we would build a training dataset with the two classes "car" and "no car", and then train the model on that data. Classification methods are primarily used in applications such as spam detection and facial recognition.
For convenience purposes, the examples that we work with will utilize SciKit Learn’s built-in datasets.
The General Framework for Building a Classification Model in Python
The phases below make up the general framework that a Data Scientist or Machine Learning Engineer follows when building a classification model in Python with the SciKit Learn package.
PHASE 1: IMPORT THE NECESSARY PACKAGES
This would be the very first step in creating a Python classifier. One of the most popular machine learning modules for Python is the powerhouse package, Scikit-learn. We may import the package using the following Python command:
import sklearn
PHASE 2: SELECT AND LOAD THE DATASET INTO SYSTEM MEMORY
We may start utilizing the dataset for our machine learning model in this stage. For our first classifier, we will utilize the Breast Cancer Wisconsin Diagnostic Database, available to us via the sklearn.datasets package. The collection contains numerous details about breast cancer tumors, together with labels designating whether they are malignant or benign. The dataset contains information on 30 variables, or features, such as the radius of the tumor, its texture, smoothness, and area, and comprises 569 instances, i.e. records on 569 tumors. We may import and load the Scikit-learn breast cancer dataset by using the following Python commands:

from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
Now, before we move forward, I would like to explain a few concepts about our dataset variable. First, check the type of this object in Python:
print(type(dataset))
The output shows that the object is of type Bunch, e.g. <class 'sklearn.utils.Bunch'> (the exact module path may differ between scikit-learn versions).

A Bunch is essentially a dictionary in Python: it is scikit-learn's sklearn.utils.Bunch class, a dictionary subclass whose keys can also be accessed as attributes. It is unrelated to the third-party bunch package on PyPI, so nothing extra needs to be installed.
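To see this dictionary-like behaviour for yourself, here is a small sketch showing that dictionary-style and attribute-style access return the same data:

```python
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# A Bunch supports both dictionary-style and attribute-style access.
print(dataset["target_names"])   # dictionary-style
print(dataset.target_names)      # attribute-style, same result

# The available keys mirror the attributes we will use below.
print(sorted(dataset.keys()))
```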
The dataset variable which is storing the contents of our Breast Cancer dataset is a very interesting object. This is because it is not a dataframe, nor is it an array. It is a Bunch, as we have mentioned before. However, this Bunch is essentially a dictionary and is storing very interesting information for us. Next, I would like you to see the information that is being stored by the dataset variable. Head to your Python IDE and print the contents of the dataset variable to the screen.
print(dataset)
The output is a lengthy, dictionary-like printout of the entire dataset, truncated here for brevity. As we can see, this object stores all our data in a dictionary-like structure.
The Bunch object, dataset, exposes a few important attributes. These attributes will enable us to effectively analyze and understand the dataset that we are working with. For example, to obtain insight into the target values of our dataset, you may access the .target_names attribute on the Bunch object (in our case, dataset). For example:
print(dataset.target_names)
The output shows the two class labels: ['malignant' 'benign'].
If you would like to view the target vector, you may access the .target attribute on the Bunch object as follows:
print(dataset.target)
If you would like to view the header row of the features matrix, you may access the .feature_names attribute on the Bunch object as follows:
print(dataset.feature_names)
The output is a NumPy array of the 30 feature names, beginning with 'mean radius', 'mean texture', and 'mean perimeter' (truncated in the printout).
If you would like to make the column names more readable, you may iterate through the list using a for loop:
for i in dataset.feature_names:
    print(i)
The output prints each of the 30 feature names on its own line.
Finally, if you would like to view the data itself, i.e., all the values contained in the features matrix, you may access the .data attribute on the Bunch object as follows:
print(dataset.data)
The output is a 569 × 30 NumPy array of feature values; note that the printed output is truncated.
Now that we understand how the data is structured inside the Bunch object, we can assign each significant collection of data inside the Bunch to its own variable. In other words, the following lines of code can be used to organize the data:
target_labels = dataset["target_names"]
feature_names = dataset["feature_names"]
target_vector = dataset["target"]
features_matrix = dataset["data"]
Thereafter, we may print the contents of each of the above variables:
print(target_labels)
print(feature_names)
print(target_vector)
print(features_matrix)
The output prints the contents of each of the four variables in turn.
Thus, based on the insights we have obtained by structuring our dataset, we can see that the first observation in the dataset is a record belonging to a malignant tumor, and its mean radius is 1.799e+01 (i.e. 17.99).
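We can confirm this reading programmatically. A small sketch, reusing the variable names defined above:

```python
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
target_labels = dataset["target_names"]
target_vector = dataset["target"]
features_matrix = dataset["data"]

# Map the first observation's encoded label (0 or 1) back to its class name.
first_label = target_labels[target_vector[0]]
first_mean_radius = features_matrix[0][0]  # column 0 is 'mean radius'

print(first_label)         # the class of the first tumor record
print(first_mean_radius)   # its mean radius
```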
PHASE 3: ORGANIZING THE DATA INTO TESTING AND TRAINING SETS
A training set and a testing set will be created from our data in this stage. We must test our model on unseen data, hence it is crucial to divide the data into these sets. Sklearn provides a function named train_test_split() that divides the data into these sets. We begin with the following import:
from sklearn.model_selection import train_test_split
The command above imports the train_test_split() function from the sklearn.model_selection module. The line below will then divide the data into training and test data, using 40% of the data for testing and the remainder for training. Note that with random_state=None a different random split is produced on every run; pass a fixed integer (e.g. random_state=42) if you need reproducible results.

train, test, train_labels, test_labels = train_test_split(features_matrix, target_vector, test_size = 0.40, random_state=None)
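As a quick sanity check, here is a sketch that prints the sizes of the resulting splits (with a fixed random_state so the numbers are reproducible):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

dataset = load_breast_cancer()
features_matrix = dataset["data"]
target_vector = dataset["target"]

# 40% of the 569 records go to the test set, the rest to the training set.
train, test, train_labels, test_labels = train_test_split(
    features_matrix, target_vector, test_size=0.40, random_state=42
)

print(train.shape)  # (341, 30)
print(test.shape)   # (228, 30)
```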
PHASE 4: BUILDING THE CLASSIFICATION MODEL
We will construct our model in this stage. The Naïve Bayes technique will be used to generate the model. We begin by importing the Gaussian Naïve Bayes class with the line of code below:
from sklearn.naive_bayes import GaussianNB
Thereafter, we are required to instantiate an object of the GaussianNB class as follows:
algorithm = GaussianNB()
Next, we will proceed to train the model as follows:
model = algorithm.fit(train, train_labels)
And with that, our Gaussian Naïve Bayes classification model has been trained and now resides in our system memory. (Note that .fit() returns the estimator itself, so model and algorithm refer to the same fitted object.)
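Once fitted, the Gaussian Naïve Bayes estimator exposes the per-class statistics it learned; a small sketch (theta_ and class_count_ are fitted attributes of scikit-learn's GaussianNB):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

dataset = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    dataset["data"], dataset["target"], test_size=0.40, random_state=42
)

model = GaussianNB().fit(train, train_labels)

# theta_ holds the per-class mean of each feature: one row per class,
# one column per feature (2 x 30 for this dataset).
print(model.theta_.shape)

# class_count_ shows how many training samples fell into each class.
print(model.class_count_)
```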
PHASE 5: EVALUATING MODEL PERFORMANCE
By making predictions on our test data, we will assess the model in this phase, and then ascertain its accuracy. We'll utilize the .predict() method to make predictions. In Python this is simple and would be done as follows:
predictions = model.predict(test)
print(predictions)
The output is an array of 0s and 1s: the predicted class for each record in the test set.
To evaluate the performance of our model, we will utilize the accuracy_score() function available to us from the sklearn.metrics module. The series of 0s and 1s above represent the predicted values for the malignant (0) and benign (1) tumor classes, respectively. We can now determine the accuracy of our model by comparing the two arrays, test_labels and predictions.
from sklearn.metrics import accuracy_score
Finally, we proceed to calculate the accuracy of our Gaussian Na?ve Bayes model as follows:
print(accuracy_score(test_labels, predictions))
The output of the above line of code shows our model's accuracy, which in this run lies at approximately 92.105%. Note that because we did not fix random_state in train_test_split(), the exact figure will vary from run to run.
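Accuracy is only one view of performance. As an optional extension to the walkthrough above, scikit-learn's confusion_matrix and classification_report give a per-class breakdown:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report

dataset = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    dataset["data"], dataset["target"], test_size=0.40, random_state=42
)

model = GaussianNB().fit(train, train_labels)
predictions = model.predict(test)

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(test_labels, predictions))

# Per-class precision, recall, and F1, labelled with the class names.
print(classification_report(test_labels, predictions,
                            target_names=dataset["target_names"]))
```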
Thus, we have learned the general framework for developing a machine learning classification model in the Python programming language.

This concludes the tutorial, and I hope you have taken away something new about machine learning classification with Python.
I thank you for your time.