Image Classification Using Deep Learning with Python
Muhammad Talha
3D Point Clouds Compression | Gaussian Splatting | Research Assistant @ UMKC | Pakistani @ Google Code Jam'23 | Google Code Jam'22 Qualifier | Top 6th Pakistani Advent Of Code | Python Instructor @ Udemy
Co-Author:
Graduate Research Assistant - Computer Vision
COLUMBUS STATE UNIVERSITY, COLUMBUS, GEORGIA
-------------------------------------------------------------------------------------------------------
What is Image Classification?
Image classification is the computer vision task of categorizing and labelling an image, based on its pixels and a set of learned rules, according to the objects it contains.
Today, we will see how to take an input image and predict single-label or multi-label outputs using machine learning. We will use Deep Learning techniques with Python and PyTorch. By the end of this article, we will have a model to which we can feed an input image and retrieve the predicted single-label or multi-label output. We will use the ResNet and AlexNet models for image classification, as both are used extensively in research.
Resources
The following resources will be used in this project.
- Python 3
- PyTorch
- Google Colab
- Pre-trained Models (ResNet or AlexNet)
Single Label Classification
In single-label classification, the model is trained to predict only one class for the image, and it ignores all the other objects.
In other words, single-label classification determines which single object best describes the image and assigns it one class in the output. For example, "Cat" is the class name into which the data (pixels) of a cat image is classified.
Multi-Label Classification
Unlike single-label classification, multi-label classification assigns more than one class to the image. Simply put, when you feed the input image to the network, the model predicts a class/label for each of the different objects present in the image. For example, an image of several animals could be classified with labels including cat, dog, guinea pig, tortoise, rabbit and reptile.
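As a rough sketch of the difference, assume a model has already produced one score per class. The class names, the scores and the 0.5 threshold below are made up for illustration and do not come from a real model.

```python
# Hypothetical per-class scores in [0, 1] for one image (illustrative values only).
scores = {"cat": 0.92, "dog": 0.81, "rabbit": 0.07, "tortoise": 0.64}

# Single-label: keep only the highest-scoring class.
single_label = max(scores, key=scores.get)

# Multi-label: keep every class whose score clears a chosen threshold.
THRESHOLD = 0.5  # assumed cut-off; tune it for your own data
multi_labels = sorted(name for name, score in scores.items() if score >= THRESHOLD)

print(single_label)  # cat
print(multi_labels)  # ['cat', 'dog', 'tortoise']
```

The same scores lead to one answer or several, depending only on how the output is read.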
Classification with Pre-trained Models
What are pre-trained Models?
Pre-trained models are open-source Deep Learning models trained on large benchmark datasets. These models are one of the reasons behind the rapid advancements in the Computer Vision domain, as students and researchers can use these pre-trained state-of-the-art models instead of rewriting and reinventing everything from scratch.
These models have already been trained on a dataset containing certain objects. As long as our image belongs to that dataset's classes, we can feed it to the model and retrieve the output without training the model again.
Keep in mind that in the case of a pre-trained model, the input image should be related to the dataset on which the model has been trained. For example:
In this example, the input image is fed to the pre-trained model and, as a result, the object is classified as Pizza. For the pre-trained model to predict the output accurately, it has to have been trained on relevant data. To predict Pizza, the model has to be trained on a dataset that contains data related to Pizza, Burgers, Shawarma etc.
If the model has not been trained on a relevant dataset, then before feeding the image to the model we will first train it on a related dataset so it can predict the object accurately.
Why do we learn Pre-trained Models?
One of the biggest reasons for learning about pre-trained models is the concept of Transfer Learning. As researchers in the Computer Vision domain, whenever we meet a new task, the best approach is usually to utilize transfer learning instead of inventing a new model for each dataset.
Transfer Learning
Let's suppose you have a pre-trained model that has been trained on data related to junk food like Pizza, Zinger Burger, Pasta etc. However, you want to classify an image which contains regular cuisines and daily home meals. That is when transfer learning comes into play. You cannot simply feed your desired image to the pre-trained model, as the data it has been trained on is irrelevant to the classes you want in the output. In these situations, we reuse the knowledge learned by the model, i.e. its weights, as the starting point for training it on the more relevant dataset. After that, we can use this model for our classification purposes without writing a new model.
In simple words, using transfer learning you can train the model on your own dataset using the learned knowledge of the pre-trained model. In this article, we will use two extensively used pre-trained models, ResNet and AlexNet. These models have been trained on the ImageNet dataset. The full ImageNet database has over 14 million images; the pre-trained models are trained on its widely used 1000-class subset.
Data Processing
Before feeding the image (data) to the model, we have to transform the image so that it has the right shape and the right statistics, namely the Mean and Standard Deviation. These values should match the ones used during the training of the model; otherwise the predicted labels will be unreliable.
We can process the data using Python's Torchvision library, whose transforms will be applied to the input image.
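To make the role of the mean and standard deviation concrete, the arithmetic can be sketched in plain Python. The helper name "normalize_pixel" is hypothetical; the mean and standard deviation values are the ones used in the transforms later in this article.

```python
# Per-channel statistics (red, green, blue) used by the article's transforms.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]

# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))  # [0.0, 0.0, 0.0]

# A pure white pixel ends up a bit more than two standard deviations above the mean.
print(normalize_pixel([1.0, 1.0, 1.0]))
```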
Step 1: Read the Image from Google Drive in Google Colab
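We will mount Google Drive in Google Colab, read the image, and display it using "matplotlib.pyplot".
from google.colab import drive
from PIL import Image
import matplotlib.pyplot as plt
# Mount Google Drive so that the /content/drive path below exists
drive.mount("/content/drive")
InputImg = Image.open("/content/drive/My Drive/Data/Sample.jpg")
plt.imshow(InputImg)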
Step 2: Transforms According to ResNet and AlexNet.
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean = [0.485, 0.456, 0.406],
                         std = [0.229, 0.224, 0.225])
])
Both ResNet and AlexNet use the same transforms, so the image has to be given these characteristics before it is fed into the model.
- "transforms.Resize(256)" resizes the image so that its shorter side is 256 pixels, keeping the aspect ratio.
- "transforms.CenterCrop(224)" crops the image to 224 x 224 pixels about the center.
- "transforms.ToTensor()" converts the image to a tensor with values between 0 and 1.
- "transforms.Normalize()" normalizes each channel using the mean and standard deviation of the model's training data.
Step 3: Now, we will apply these transforms to the input image.
Transformed_inputImage = transform(InputImg)
Afterwards, we can analyze the shape of the image.
print(Transformed_inputImage.shape)
The output of the above line of code should be something like this.
torch.Size([3, 224, 224])
Here "3" is the number of channels and "224 x 224" is the size in pixels (height and width) to which the image has been cropped about the centre.
Step 4: Converting the data into the Batch format
Both ResNet and AlexNet accept input in batch format, so we will convert our input data into a batch containing one image.
import torch
InputImg_batched = torch.unsqueeze(Transformed_inputImage, 0)
"torch.unsqueeze" with the argument 0 adds a new dimension of size 1 at the start of the tensor's shape.
We can again check the shape of the input image after batching it to see if it has been converted into the batch format or not.
print(InputImg_batched.shape)
The output of the above line of code should be something like this.
torch.Size([1, 3, 224, 224])
It can be seen that one more dimension has been added to the shape of the image data. It means that our input image has been converted into the batch format after being properly transformed.
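The same batch dimension is what lets a model process several images at once. As a small sketch, with random tensors standing in for images that have already been transformed, "torch.stack" builds a batch from a list of images in the same way that "torch.unsqueeze" builds a batch of one.

```python
import torch

# Stand-ins for three images that have already been transformed to [3, 224, 224].
images = [torch.rand(3, 224, 224) for _ in range(3)]

single = torch.unsqueeze(images[0], 0)  # one image    -> [1, 3, 224, 224]
batch = torch.stack(images, dim=0)      # three images -> [3, 3, 224, 224]

print(single.shape)  # torch.Size([1, 3, 224, 224])
print(batch.shape)   # torch.Size([3, 3, 224, 224])
```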
Now, our input is all ready to be fed to the ResNet or AlexNet Deep Learning Models.
Single-Label Classification using ResNet and AlexNet
1. ResNet Model
Step 1: Importing the Pre-Trained Models
At first, we will import the pre-trained models from the Python torchvision library.
from torchvision import models
If you want to take a look at all the pre-trained models in the torchvision then you can write the following line:
dir(models)
There are multiple versions of ResNet including ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152. The number indicates the depth of the network in layers; deeper versions generally achieve higher accuracy at the cost of more computation. We will use the ResNet101 version.
Loading the ResNet101 Model
from torchvision import models
resnet = models.resnet101(pretrained = True)
# Switch to evaluation mode before feeding the input
resnet.eval()
# Feed the batched input image to the ResNet model
output = resnet(InputImg_batched)
The ResNet model outputs a score for each of the 1000 ImageNet classes rather than a readable label, so the prediction is the index of a class and we have to look up its name.
The text file "ImageNet1000Classes" lists all the class names against their index numbers.
Step 2: Read the ImageNet Classes Text file from Google Drive (When using Google Colab)
with open('/content/drive/My Drive/Data/imagenet1000Classes.txt') as classesfile:
    ImageNetClasses = [line.strip() for line in classesfile.readlines()]
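Each entry in the list is a raw line of the file, which is why the printed results in this article include the index and the quotes. If a clean label is preferred, a small helper can split a line into its index and name. This is a sketch that assumes the line format seen in this article's outputs; "parse_class_line" is a hypothetical name.

```python
def parse_class_line(line):
    """Split a line such as "963: 'pizza, pizza pie'," into (963, 'pizza, pizza pie')."""
    index_text, label_text = line.split(":", 1)
    # Drop the trailing comma or closing brace, then the surrounding quotes.
    label = label_text.strip().rstrip(",}").strip().strip("'\"")
    return int(index_text.strip().lstrip("{")), label

# Sample lines in the format shown in this article's outputs.
sample_lines = ["963: 'pizza, pizza pie',", "281: 'tabby, tabby cat',"]
for line in sample_lines:
    print(parse_class_line(line))
# (963, 'pizza, pizza pie')
# (281, 'tabby, tabby cat')
```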
Step 3: Predicting the Output Index of the Class
After that, we will predict the index number for the output based on the maximum score.
_, predicted = torch.max(output, 1)
percentage = torch.softmax(output, dim = 1)[0] * 100
print(ImageNetClasses[predicted[0]], percentage[predicted[0]].item())
- Here we have used the "softmax" activation function, which converts the raw scores into probabilities; multiplying by 100 gives the percentage for each class.
- "predicted[0]" is the index of the class with the maximum score, so we use it to fetch the class name: "ImageNetClasses[predicted[0]]".
- We also print the percentage of the predicted class using "percentage[predicted[0]].item()".
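The same steps can be tried on a toy example without loading a model. The logits and class names below are made up for illustration; the sketch also uses "torch.topk", which returns the k largest values together with their indices, as an alternative to "torch.max" when the runners-up are of interest.

```python
import torch

# Hypothetical raw scores (logits) for a 5-class toy problem, batch of one image.
output = torch.tensor([[1.0, 4.0, 0.5, 2.0, -1.0]])
class_names = ["salad", "pizza", "soup", "burger", "rice"]  # illustrative labels

# Softmax turns the scores into percentages that add up to 100.
percentage = torch.softmax(output, dim=1)[0] * 100

# The three most likely classes, best first.
top_values, top_indices = torch.topk(percentage, k=3)
for value, index in zip(top_values, top_indices):
    print(f"{class_names[index.item()]}: {value.item():.2f}%")
```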
As a result, you should have the output like this:
963: 'pizza, pizza pie', 99.99574279785156
2. AlexNet Model
As the ResNet and AlexNet models use the same transforms and input format, all we have to do is replace resnet101 with alexnet in the section where we load the model. The rest of the code stays the same.
Loading the AlexNet Model
from torchvision import models
alexnet = models.alexnet(pretrained=True)
alexnet.eval()
output = alexnet(InputImg_batched)
The rest of the code (reading the class file and predicting the class) is exactly the same as for ResNet.
Multi-Label Classification Using ResNet and AlexNet
We will use the ResNet and AlexNet models to perform multi-label classification.
Step 1: Read the Image from Google Drive in Google Colab
We will read the image from Google Drive in Google Colab and display it using "matplotlib.pyplot".
from PIL import Image
InputImg = Image.open("/content/drive/My Drive/Data/Sample.jpg")
import matplotlib.pyplot as plt
plt.imshow(InputImg)
Step 2: Transforming the Image
We will use the same code to transform the image that we used above for single-label classification. (Please refer to the code above.)
Step 3: Converting the Image to the Batch
We will convert the image to the batch format exactly the same way as we did for single-label classification.
import torch
InputImg_batched = torch.unsqueeze(Transformed_inputImage, 0)
1. ResNet Model
Step 4: Loading the ResNet Model
This time we will use the ResNet152 version for multi-label classification, just to demonstrate that no matter which version we use, the code will be the same.
from torchvision import models
resnet = models.resnet152(pretrained=True)
resnet.eval()
After that, we will apply the ResNet model to the input Image
output = resnet(InputImg_batched)
Step 5: Reading the Classes text file from Google Drive in Google Colab
We will again read the ImageNet Classes Text file as the output of the ResNet model will give us the index of the predicted class.
with open('/content/drive/My Drive/Data/imagenet1000Classes.txt') as classesfile:
    ImageNetClasses = [line.strip() for line in classesfile.readlines()]
Step 6: Predicting the output index of the Classes
Here we will change the code, as we are now interested in predicting all the classes in the image instead of just one class.
_, predictedLabels = torch.sort(output, descending = True)
Percentage = torch.sigmoid(output)[0] * 100
[(ImageNetClasses[index], Percentage[index].item()) for index in predictedLabels[0][:5]]
- This time we will not use "torch.max()" because we are not looking for a single class or label but for multiple labels in the image. That is why we sort all the classes in descending order of score, which puts the highest-scoring class at the top of the list.
- We will replace the "softmax" activation function with "sigmoid". Softmax produces probabilities between 0 and 1 that sum to 1 across all classes, so the classes compete with each other, whereas sigmoid gives an independent score between 0 and 1 for each class.
- We will also change the third line because this time we are fetching the top 5 classes in the image instead of only the topmost one. The third line fetches the ImageNet class name and the percentage for each index. As the indices are sorted in descending order of score, "[:5]" fetches the top 5 classes present in the image.
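The difference between the two activation functions is easiest to see on a toy example. The logits and class names below are made up for illustration: two classes score equally highly, so softmax has to split the probability between them, while sigmoid scores each one on its own.

```python
import torch

# Hypothetical logits for four classes; the first two score equally highly.
output = torch.tensor([[5.0, 5.0, -2.0, -1.0]])
class_names = ["cat", "dog", "rabbit", "tortoise"]  # illustrative labels

softmax_pct = torch.softmax(output, dim=1)[0] * 100  # classes compete, total is 100
sigmoid_pct = torch.sigmoid(output)[0] * 100         # each class scored independently

for name, s, g in zip(class_names, softmax_pct, sigmoid_pct):
    print(f"{name:9s} softmax {s.item():6.2f}%   sigmoid {g.item():6.2f}%")
```

With softmax, "cat" and "dog" each stay just under 50%, while with sigmoid both are above 99%, which is the behaviour wanted when several objects can be present at once.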
If the input image had multiple animals in it, the predicted output should look like this:
[("281: 'tabby, tabby cat',", 99.99746373223),
("245: 'French bulldog',", 99.88746373223),
("285: 'Egyptian Cat',", 99.44746373223),
("195: 'Boston bull, Boston terrier',", 98.99746373223),
("254: 'pug, pug-dog',", 98.55746373223)]
2. AlexNet Model
Just like in single-label classification, when using AlexNet we only replace resnet152 with alexnet, and all the other code remains the same.
Step 1: Loading the AlexNet Model
from torchvision import models
alexnet = models.alexnet(pretrained = True)
alexnet.eval()
output = alexnet(InputImg_batched)
All the remaining code is the same as for the ResNet model in multi-label classification.
Conclusion
Image classification is the computer vision task that helps us classify the objects present in an image. In this article, we used multiple versions of the ResNet and AlexNet pre-trained models to predict the classes in an image, performing both single-label and multi-label classification. The purpose of using a pre-trained model is to reuse the knowledge it has already learned, and with Transfer Learning that knowledge can be carried over to predict different types of objects.