
Image Classification Using Deep Learning with Python

Muhammad Talha

3D Point Clouds Compression | Gaussian Splatting | Research Assistant @ UMKC | Pakistani @ Google Code Jam'23 | Google Code Jam'22 Qualifier | Top 6th Pakistani Advent Of Code | Python Instructor @ Udemy

Published: July 16, 2022


Co-Author:

Saira Gillani

Graduate Research Assistant - Computer Vision

COLUMBUS STATE UNIVERSITY, COLUMBUS, GEORGIA

-------------------------------------------------------------------------------------------------------

What is Image Classification?

Image classification is the computer vision task of assigning one or more labels to an image (a collection of pixels) based on its content, that is, recognizing which objects the image contains.

Today, we will see how to recognize input images and predict single-label or multi-label outputs using deep learning with Python and PyTorch. By the end of this article, we will have a model that takes an input image and returns the predicted single label or multiple labels. We will use the ResNet and AlexNet models for image classification, since both are widely used in research.

Resources

The following resources will be used in this project.

Python 3
PyTorch
Google Colab
Pre-trained Models (ResNet or AlexNet)

Single Label Classification

In single-label classification, the model predicts exactly one class for the image, even if other objects are also present.

In other words, single-label classification decides which object the image shows and assigns it a single class in the output. In the example image, "Cat" is the class into which the data (pixels) has been classified.

Multi-Label Classification

Unlike single-label classification, multi-label classification can assign more than one class to an image. Simply put, when you feed an input image to the network, the model predicts a label for each of the different objects present in it. For example, in the image below, the animals have been classified into different labels, including cat, dog, guinea pig, tortoise, rabbit and reptile.
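To make the difference concrete, here is a minimal sketch in plain Python. It assumes the model has already produced one score between 0 and 1 per class; the class names, the scores and the 0.5 threshold are made up for illustration. A single-label classifier keeps only the highest-scoring class, while a multi-label classifier keeps every class whose score clears the threshold.

```python
# Hypothetical per-class scores for one image (illustrative values only)
scores = {"cat": 0.92, "dog": 0.81, "rabbit": 0.07, "tortoise": 0.64}

def single_label(scores):
    """Return the one class with the highest score (argmax)."""
    return max(scores, key=scores.get)

def multi_label(scores, threshold=0.5):
    """Return every class whose score reaches the threshold, best first."""
    hits = [name for name, s in scores.items() if s >= threshold]
    return sorted(hits, key=scores.get, reverse=True)

print(single_label(scores))  # cat
print(multi_label(scores))   # ['cat', 'dog', 'tortoise']
```

The threshold is a design choice: raising it trades missed labels for fewer false positives.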

Classification with Pre-trained Models

What are pre-trained Models?

Pre-trained models are open-source deep learning models that have already been trained on large benchmark datasets. They are one of the reasons behind the rapid advances in computer vision, because students and researchers can use these state-of-the-art models instead of re-inventing everything from scratch.

These models have already been trained on a dataset containing certain object classes. If our image belongs to one of those classes, we can feed it to the model and get a prediction without training the model again.

Keep in mind that when using a pre-trained model, the input image should be related to the dataset on which the model was trained. For example:

In the image above, the input image is fed to the pre-trained model and the object is classified as a pizza. For the pre-trained model to predict this correctly, it must have been trained on relevant data: to recognize a pizza, it has to be trained on a dataset that contains food classes such as pizza, burgers and shawarma.

If the model has not been trained on a relevant dataset, we first have to train (fine-tune) it on a related dataset before feeding it our image, so that it can predict the object accurately.

Why do we learn Pre-trained Models?

One of the biggest reasons to learn about pre-trained models is transfer learning. As researchers in computer vision, whenever we face a new task, the best approach is usually to use transfer learning instead of building a new model from scratch for each dataset.

Transfer Learning

Suppose you have a pre-trained model that has been trained on junk-food images such as pizza, zinger burgers and pasta, but you want to classify images of regular cuisines and home-cooked meals. This is where transfer learning comes into play. You cannot simply feed your images to the pre-trained model, because the classes it was trained on are not the classes you want in the output. Instead, we reuse the knowledge the model has already learned (its weights) and train it further on the more relevant dataset. After that, we can use the model for our classification task without writing a new model from scratch.

In simple words, transfer learning lets you train a model on your own dataset while reusing the knowledge learned by a pre-trained model. In this article, we will use two widely used pre-trained models, ResNet and AlexNet. The torchvision versions of these models were trained on the ImageNet ILSVRC 2012 subset, which has about 1.2 million training images in 1000 classes; the full ImageNet dataset contains over 14 million images.
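One common form of transfer learning keeps the pre-trained layers frozen and trains only a new final layer for the new classes; in PyTorch that usually means freezing the backbone's parameters and replacing the classification layer (for torchvision's ResNet, the `fc` attribute). The toy sketch below mimics that idea in plain Python so every step is visible. The function `frozen_features` is a hypothetical stand-in for a pre-trained backbone, and the data is invented.

```python
import math

# Hypothetical stand-in for a frozen pre-trained backbone: it turns a raw
# input into a feature vector and is never updated during training.
def frozen_features(x):
    return [x[0] + x[1], x[0] - x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def head_score(w, b, features):
    """New, trainable head: a single logistic-regression unit."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features)) + b)

def train_head(data, epochs=200, lr=0.5):
    """Train only the head's weights w and bias b; the backbone stays frozen."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)          # reused "knowledge", not trained
            err = head_score(w, b, f) - y   # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if head_score(w, b, frozen_features(x)) >= 0.5 else 0

# Made-up data for the "new task" (label 1 or 0)
data = [([1.0, 2.0], 1), ([2.0, 0.5], 1), ([-1.0, -1.5], 0), ([-2.0, 0.5], 0)]
w, b = train_head(data)
print([predict(w, b, x) for x, _ in data])  # [1, 1, 0, 0]
```

Only two weights and a bias are learned here; the real benefit in practice is that a deep backbone's features are reused instead of being learned again from a small dataset.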

Data Processing

Before feeding the image (data) to the model, we have to preprocess it so that it has the right size and the right statistics (per-channel mean and standard deviation). These should match the preprocessing used when the model was trained; otherwise the predicted labels will be unreliable.

We can do this preprocessing with the transforms module of the torchvision library, applying the transforms to the input image.

Step 1: Read the Image from Google Drive in Google Colab

We will mount Google Drive in Google Colab, read the image from it and display it using "matplotlib.pyplot".

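from google.colab import drive
drive.mount("/content/drive")  # makes "/content/drive/My Drive/..." readable

from PIL import Image
import matplotlib.pyplot as plt

# convert("RGB") guards against grayscale or RGBA files; the models expect 3 channels
InputImg = Image.open("/content/drive/My Drive/Data/Sample.jpg").convert("RGB")

plt.imshow(InputImg)
plt.show()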

Step 2: Transforms According to ResNet and AlexNet.

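from torchvision import transforms

# Preprocessing expected by the ImageNet-trained ResNet and AlexNet models
transform = transforms.Compose([
    transforms.Resize(256),        # shorter side -> 256 px, aspect ratio kept
    transforms.CenterCrop(224),    # 224 x 224 crop around the center
    transforms.ToTensor(),         # PIL image -> float tensor (C, H, W) in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet per-channel mean
                         std=[0.229, 0.224, 0.225]),   # ImageNet per-channel std
])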

Both ResNet and AlexNet expect the same preprocessing, so the input image has to go through these transforms before it is fed to either model.
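
"transforms.Resize(256)" resizes the image so that its shorter side is 256 pixels, keeping the aspect ratio (the result is 256 x 256 only if the image is already square).
"transforms.CenterCrop(224)" crops a 224 x 224 region around the center of the image.
"transforms.ToTensor()" converts the PIL image into a float tensor of shape (channels, height, width) with pixel values scaled to the range [0, 1].
"transforms.Normalize()" subtracts the per-channel mean and divides by the per-channel standard deviation of the ImageNet training images, the same statistics used when the models were trained.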

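To see what these transforms do to a concrete image, here is a plain-Python sketch of the arithmetic for a 640 x 480 photo. The helper functions are hypothetical illustrations, not torchvision APIs, and their rounding is an assumption that may differ by a pixel from torchvision's own implementation.

```python
# ImageNet per-channel statistics used by transforms.Normalize above
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def resize_shorter_side(width, height, size=256):
    """(width, height) after scaling the shorter side to `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width, height, crop=224):
    """(left, top, right, bottom) of a crop x crop square around the image center."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop

def normalize_pixel(rgb):
    """Scale 0-255 values to [0, 1] (as ToTensor does), then apply (x - mean) / std per channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))

# Example: a 640 x 480 landscape photo
w, h = resize_shorter_side(640, 480)
print((w, h))                  # (341, 256)
print(center_crop_box(w, h))   # (58, 16, 282, 240)
print(normalize_pixel((255, 255, 255)))
```

Because Resize keeps the aspect ratio, the center crop discards more pixels along the longer side (here the left and right edges).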
Step 3: Now, we will apply these transforms to the input image.

Transformed_inputImage = transform(InputImg)

Afterwards, we can check the shape of the transformed image tensor.

print(Transformed_inputImage.shape)

The output of the above line of code should be something like this.

torch.Size([3, 224, 224])

"3" is the number of color channels (RGB), and "224 x 224" is the height and width, in pixels, of the crop taken around the center of the image.

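ToTensor turns a PIL image, whose pixels are laid out as height x width x channels with values from 0 to 255, into a channels-first tensor with values in [0, 1]; this is why 3 comes first in torch.Size([3, 224, 224]). A minimal pure-Python sketch of that rearrangement on a made-up 2 x 3 image follows; the helpers to_chw and shape are hypothetical, not torchvision functions.

```python
def to_chw(image_hwc):
    """Rearrange an H x W x C image of 0-255 ints into C x H x W floats in [0, 1]."""
    height, width, channels = len(image_hwc), len(image_hwc[0]), len(image_hwc[0][0])
    return [[[image_hwc[y][x][c] / 255.0 for x in range(width)]
             for y in range(height)]
            for c in range(channels)]

def shape(nested):
    """Shape of a regular nested list, e.g. (3, 2, 3)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Made-up 2 x 3 RGB image: height 2, width 3, 3 channels
img = [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
       [[0, 0, 0], [128, 128, 128], [255, 255, 255]]]

chw = to_chw(img)
print(shape(img))    # (2, 3, 3) -> height, width, channels
print(shape(chw))    # (3, 2, 3) -> channels, height, width
print(shape([chw]))  # (1, 3, 2, 3) -> a batch dimension added in front
print(chw[0][0])     # [1.0, 0.0, 0.0] -> red channel, first row
```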
Step 4: Converting the data into the Batch format

Both ResNet and AlexNet expect a batch of images as input, that is, a 4-D tensor of shape (batch size, channels, height, width), so we will add a batch dimension to our single image.

import torch
InputImg_batched = torch.unsqueeze(Transformed_inputImage, 0)

"torch.unsqueeze(Transformed_inputImage, 0)" inserts a new dimension of size 1 at position 0, that is, at the start of the shape.

We can again check the shape of the input image after batching it to see if it has been converted into the batch format or not.

print(InputImg_batched.shape)

The output of the above line of code should be something like this.

torch.Size([1, 3, 224, 224])

A new leading dimension of size 1 has been added to the shape, which means our transformed input image is now a batch containing one image.

Now, our input is all ready to be fed to the ResNet or AlexNet Deep Learning Models.

Single-Label Classification using ResNet and AlexNet

1. ResNet Model

Step 1: Importing the Pre-Trained Models

First, we will import the pre-trained models from the torchvision library.

from torchvision import models

If you want to see all the models available in torchvision, you can run the following line:

dir(models)

ResNet comes in several depths: ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152. Deeper variants generally reach higher accuracy on ImageNet at the cost of more computation. We will use ResNet101.

Loading the ResNet101 Model

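import torch
from torchvision import models

# ResNet101 with ImageNet weights; IMAGENET1K_V1 matches the 256/224 preprocessing above.
# On torchvision older than 0.13, use models.resnet101(pretrained=True) instead.
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)

# Evaluation mode: layers such as batch normalization and dropout behave as at inference time
resnet.eval()

# Feed the batched input image to the network; no_grad skips gradient tracking
with torch.no_grad():
    output = resnet(InputImg_batched)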

ResNet outputs one score (logit) for each of the 1000 ImageNet classes, so the output tensor has shape [1, 1000]. The model does not return a class name: we take the index of the highest score and look up the corresponding label.

You can find a text file listing all 1000 classes against their index numbers here: ImageNet1000Classes

Step 2: Read the ImageNet Classes Text file from Google Drive (When using Google Colab)

# Each line of the file holds one class, in index order (0 to 999)
with open('/content/drive/My Drive/Data/imagenet1000Classes.txt') as classesfile:
    ImageNetClasses = [line.strip() for line in classesfile.readlines()]
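Each entry of ImageNetClasses is a raw line of the file, which is why the printed label in Step 3 includes the index and quotes. If you want only the readable name, a small parser such as the sketch below could be used. It assumes every line looks like the "963: 'pizza, pizza pie'," line in the sample output; parse_class_lines is a hypothetical helper, not part of PyTorch.

```python
import re

# Assumed line format, e.g. "963: 'pizza, pizza pie',". A leading "{" or a
# trailing "}" (a file written as one Python dict literal) and double-quoted
# names are also accepted.
LINE_RE = re.compile(r"""^\{?\s*(\d+)\s*:\s*(['"])(.*)\2\s*,?\s*\}?\s*$""")

def parse_class_lines(lines):
    """Map class index -> readable class name; lines that don't match are skipped."""
    classes = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            classes[int(match.group(1))] = match.group(3)
    return classes

# With the real file (path from Step 2):
#   with open('/content/drive/My Drive/Data/imagenet1000Classes.txt') as f:
#       classes = parse_class_lines(f)
sample = ["{0: 'tench, Tinca tinca',", "963: 'pizza, pizza pie',", '607: "jack-o\'-lantern",']
classes = parse_class_lines(sample)
print(classes[963])  # pizza, pizza pie
```

With this dictionary, the label for a prediction would be classes[predicted[0].item()].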

Step 3: Predicting the Output Index of the Class

Next, we take the index of the class with the highest score as the prediction.

_, predicted = torch.max(output, 1)
percentage = torch.softmax(output, dim = 1)[0] * 100
print(ImageNetClasses[predicted[0]], percentage[predicted[0]].item())

Here we apply the softmax function, which converts the 1000 scores into probabilities that sum to 1, and multiply by 100 to express them as percentages.
"predicted[0]" is the index of the class with the maximum score, so we use it to fetch the class name with "ImageNetClasses[predicted[0]]".
We also print the percentage of the predicted class using "percentage[predicted[0]].item()".
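The softmax-and-argmax step can be sketched in plain Python on a handful of made-up logits, to show what torch.max and torch.softmax compute; the four logit values are invented for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (subtracting the max avoids overflow)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a 4-class model; the real ResNet output has 1000 of them
logits = [2.0, 1.0, 0.1, 5.0]
probs = softmax(logits)

# Index of the largest score, playing the role of torch.max(output, 1)
predicted = max(range(len(logits)), key=logits.__getitem__)
percentage = probs[predicted] * 100

print(predicted, round(percentage, 2))  # 3, roughly 93 percent
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits or of the probabilities gives the same class.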

The result should look like this:

963: 'pizza, pizza pie', 99.99574279785156

2. AlexNet Model

Because ResNet and AlexNet use the same transforms, all we have to do is replace resnet101 with alexnet where the model is loaded. The rest of the code stays the same.

Loading the AlexNet Model

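import torch
from torchvision import models

# AlexNet with ImageNet weights (models.alexnet(pretrained=True) on torchvision older than 0.13)
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()

with torch.no_grad():
    output = alexnet(InputImg_batched)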

Steps 2 and 3 from the ResNet section can then be reused without changes.

Multi-Label Classification Using ResNet and AlexNet

We will use the ResNet and AlexNet models to perform multi-label classification.

Step 1: Read the Image from Google Drive in Google Colab

As before, we read the image from Google Drive in Google Colab and display it using "matplotlib.pyplot".

from PIL import Image

InputImg = Image.open("/content/drive/My Drive/Data/Sample.jpg")
import matplotlib.pyplot as plt

plt.imshow(InputImg)

Step 2: Transforming the Image

We transform the image with the same code we used above for single-label classification (please refer to the code above).

Step 3: Converting the Image to the Batch

We convert the image to the batch format exactly as we did for single-label classification.

import torch
InputImg_batched = torch.unsqueeze(Transformed_inputImage, 0)

1. ResNet Model

Step 4: Loading the ResNet Model

This time we will use ResNet152 for multi-label classification, to show that the code stays the same no matter which ResNet version we use.

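from torchvision import models

# ResNet152 with ImageNet weights (models.resnet152(pretrained=True) on torchvision older than 0.13)
resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
resnet.eval()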

After that, we apply the ResNet model to the input image.

# no_grad skips gradient tracking during inference
with torch.no_grad():
    output = resnet(InputImg_batched)

Step 5: Reading the Classes text file from Google Drive in Google Colab

We again read the ImageNet classes text file, since we need to map the predicted class indices to class names.

# Each line of the file holds one class, in index order (0 to 999)
with open('/content/drive/My Drive/Data/imagenet1000Classes.txt') as classesfile:
    ImageNetClasses = [line.strip() for line in classesfile.readlines()]

Step 6: Predicting the output index of the Classes

Here the code changes, because we now want to find several classes in the image instead of just one.

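# Sort the 1000 class scores from highest to lowest and keep the class indices
_, predictedLabels = torch.sort(output, descending=True)

# Sigmoid maps each score independently to (0, 1); multiply by 100 for a percentage
Percentage = torch.sigmoid(output)[0] * 100

# Top 5 classes with their scores
top5 = [(ImageNetClasses[index], Percentage[index].item()) for index in predictedLabels[0][:5]]
print(top5)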

This time we do not use "torch.max()", because we are not looking for a single class but for several labels in the image. Instead, we sort all the classes in descending order of score, so the highest-scoring class comes first.
We replace the "softmax" function with "sigmoid". Softmax turns the scores into a probability distribution over all classes that sums to 1, so the classes compete with each other; sigmoid maps each score to the range 0 to 1 independently of the others. Keep in mind that these ImageNet models were trained with softmax for single-label prediction, so their sigmoid scores are only a rough indication; a true multi-label classifier is usually fine-tuned with a sigmoid output and a binary cross-entropy loss.
The last line also changes: instead of fetching only the top class, we fetch the top 5. Because the indices are sorted in descending order of score, "[:5]" selects the five highest-scoring classes, and for each one we fetch its ImageNet class name and its score.
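A small plain-Python sketch of the difference between the two functions, using made-up logits for five classes; top_k is a hypothetical helper that plays the role of torch.sort followed by [:k].

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

logits = [4.0, 3.5, -1.0, 3.0, 0.5]   # made-up logits for 5 classes
sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print(top_k(logits, 3))            # [0, 1, 3]
print(round(sum(soft), 6))         # 1.0: softmax scores compete and sum to 1
print(sum(s > 0.5 for s in sig))   # 4: sigmoid scores are independent of each other
```

Since both functions are monotonic, the top-5 ranking is the same as sorting the raw logits; only the scores attached to each class differ.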

If the input image contains multiple animals, the predicted output should look like this:

[("281: 'tabby, tabby cat',", 99.99746373223),
 ("245: 'French bulldog',", 99.88746373223),
 ("285: 'Egyptian Cat',", 99.44746373223),
 ("195: 'Boston bull, Boston terrier',", 98.99746373223),
 ("254: 'pug, pug-dog',", 98.55746373223)]

2. AlexNet Model

Just as in the single-label case, to use AlexNet we only replace resnet152 with alexnet; all the other code stays the same.

Step 4: Loading the AlexNet Model

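import torch
from torchvision import models

# AlexNet with ImageNet weights (models.alexnet(pretrained=True) on torchvision older than 0.13)
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()

with torch.no_grad():
    output = alexnet(InputImg_batched)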

The remaining code (Steps 5 and 6) is the same as in the ResNet multi-label example.

Conclusion

Image classification is the computer vision task of identifying which objects are present in an image. In this article, we used several versions of the pre-trained ResNet model and the AlexNet model to predict the classes in an image, performing both single-label and multi-label classification. Pre-trained models let us reuse the knowledge they have already learned on ImageNet, and the same models are the usual starting point for transfer learning on new datasets.
