Applied Machine Learning: CNNs for Image Recognition
Aditya Vivek Thota
Technical Content Creator for Developer Blogs | React, Next.js, Python, Applied AI
Before we get started, check out the previous article of this series here, Applied Machine Learning: Linear Regression, LassoCV, ElasticNet, RidgeCV, and xgboost, if you haven’t read it yet. As in the previous article, the model in this article is built using Python in the Spyder IDE. Refer to that article if you are not familiar with setting up your development environment.
Now let’s move on to the second application of machine learning. This time we explore the domain of image recognition. For this purpose, I have chosen the ‘Sign Language MNIST’ dataset. The idea is to build a model that recognizes which letter of the alphabet a hand gesture represents in sign language. The Sign Language MNIST dataset has images of hand gestures, each representing one of 24 letters (J and Z are excluded because signing them involves motion). You are encouraged to choose your own dataset and create your own problem statement; the method of implementation will remain similar.
You can find a variety of datasets on Kaggle or by using the Google dataset search tool.
(Reference for Sign Language)
A note to the reader: if you wish to understand the technical basis of how a CNN works, you can refer to this article, Performance Analysis of Deep Learning Algorithms: Part 1.
This article primarily focuses on the practical implementation of a CNN on a non-standard dataset with a unique application.
Without any further delay, let’s get started with our project:
- Given a hand gesture image, I want my model to recognize the corresponding sign language alphabet.
- For most image recognition problems, a Convolutional Neural Network (CNN for short) works well. Keep in mind that a CNN can be applied to almost any image recognition problem, just as we use linear models like regression for prediction problems. Here I will implement one. First, let’s import all the necessary libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
#Libraries for implementing a CNN
from keras.layers import Conv2D, Flatten, MaxPooling2D, Dense, Dropout
from keras.models import Sequential
from keras.utils import to_categorical
- The next step is loading the dataset. Make sure the dataset files are copied into the main project folder. In my case, I have the training and testing data as separate CSV files.
train = pd.read_csv('sign_mnist_train.csv')
test = pd.read_csv('sign_mnist_test.csv')
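Before preprocessing, a quick optional sanity check that the data loaded as expected (a minimal sketch; the exact shapes depend on your dataset):
print(train.shape) #Expect (number of images, 785): a 'label' column + 784 pixel columns (28x28)
print(test.shape)
print(train['label'].nunique(), "distinct labels")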
- Now, we need to reshape the data so that each sample actually represents an image. Note that the raw data stores all pixel values as a flat array; each row needs to be converted into a 28×28 matrix in this case. The ‘label’ column in the dataset tells us which letter the image represents. Some sample input images are visualized below.
labels = train.pop('label') #Pops the label column and stores it in 'labels'
labels = to_categorical(labels) #One-hot encodes the labels
train = train.values
train = np.array([np.reshape(i, (28,28)) for i in train]) #Flat 784-pixel rows -> 28x28 images
train = train / 255 #Normalizes pixel values to the [0, 1] range
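To actually see a few of the hand gestures, you can plot them with matplotlib. A minimal sketch (the choice of four images is arbitrary):
fig, axes = plt.subplots(1, 4, figsize=(10, 3))
for i, ax in enumerate(axes):
    ax.imshow(train[i], cmap='gray')
    ax.set_title("label: {}".format(np.argmax(labels[i]))) #Class index, since labels are one-hot
    ax.axis('off')
plt.show()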
- Next, we create the training and validation sets in the usual 70:30 ratio. Try experimenting with the ‘random_state’ parameter.
X_train, X_val, y_train, y_val = train_test_split(train, labels, test_size=0.3, random_state=41)
#Reshaping the training and validation sets
X_train = X_train.reshape(X_train.shape[0], 28,28,1)
X_val = X_val.reshape(X_val.shape[0], 28,28,1)
- We are now ready to build our CNN. But how exactly do we do that? Follow the code below. Note that for the ‘input_shape’ parameter, you will have to use the dimensions of your own input images. The basic architecture stays the same; you can experiment with the numbers to tweak your CNN. Take note of ‘relu’ and ‘softmax’, both of which are activation functions.
- Notice the Dropout layer set to 0.4 at the end. You can tweak that value as well. It reduces overfitting, i.e. the phenomenon where the model performs really well on the training data but fails miserably on the test data. In the final line, the ‘25’ passed to the Dense layer is the number of outcomes (the number of classes) for my dataset. It may vary depending on the shape of your dataset.
- This is what a typical CNN architecture looks like in code. Remember that the same architecture can be reused on any similar dataset by changing only the input shape and the number of output classes.
#Building Our CNN
model = Sequential()
model.add(Conv2D(8, (3,3), input_shape=(28,28,1), activation='relu')) #8 filters of size 3x3
model.add(MaxPooling2D(pool_size=(2,2), strides=2)) #Downsamples each feature map by 2
model.add(Conv2D(16, (3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2), strides=2))
model.add(Flatten()) #3D feature maps -> 1D vector
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.4)) #Randomly drops 40% of units during training to reduce overfitting
model.add(Dense(25, activation='softmax')) #One probability per class
model.summary()
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 26, 26, 8) 80
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 8) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 11, 11, 16) 1168
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 5, 5, 16) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 400) 0
_________________________________________________________________
dense_3 (Dense) (None, 128) 51328
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_4 (Dense) (None, 25) 3225
=================================================================
Total params: 55,801
Trainable params: 55,801
Non-trainable params: 0
_________________________________________________________________
(A typical CNN architecture. Image Credits: Wikipedia)
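If you are wondering where the ‘Param #’ numbers come from, here is a rough hand calculation (a sketch based on the shapes above, not part of the training code):
#Each conv filter has kernel_height * kernel_width * input_channels weights, plus one bias
conv1 = (3*3*1 + 1) * 8      #80
conv2 = (3*3*8 + 1) * 16     #1168
#Each dense unit has one weight per input, plus one bias; 5*5*16 = 400 flattened inputs
dense1 = (400 + 1) * 128     #51328
dense2 = (128 + 1) * 25      #3225
print(conv1 + conv2 + dense1 + dense2)  #55801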
- Once you have your CNN ready, you can train the model by feeding in the training set we made earlier. Several optimizers are available; ‘adam’ is the preferred one here. The number of epochs and batch size are up to you. Try tweaking those values: a larger number of epochs means more training time but can produce better accuracy. (An alternative that stops training automatically is sketched after the training code below.)
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
#Code for Training our Model
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=512)
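Instead of hand-picking the epoch count, you can let training stop automatically once the validation loss stops improving. A sketch using Keras’s EarlyStopping callback (the patience value of 5 is an assumption to tune; restore_best_weights requires a recent Keras version):
from keras.callbacks import EarlyStopping
#Stops training when validation loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=512,
                    callbacks=[early_stop])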
- Wait for the model to train. Once it’s done, we can plot accuracy against epochs to visualize how our model improves with each epoch.
#Note: newer Keras versions store these under 'accuracy'/'val_accuracy'
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title("Accuracy")
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend(['train','validation'])
plt.show()
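It is just as informative to plot the loss curves; a widening gap between the training and validation loss is a classic sign of overfitting. A small optional addition (the 'loss' and 'val_loss' keys are standard in Keras):
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title("Loss")
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend(['train','validation'])
plt.show()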
- Now it’s time to test how well our model performs on the test dataset. Observe all the lines of code carefully instead of blindly copying them; you will be able to make good sense of the flow of the program.
y_test = test.pop('label') #Pops the label column from the test data
y_test = to_categorical(y_test) #One-hot encodes, just like the training labels
print(y_test.shape)
X_test = test.values
X_test = np.array([np.reshape(i, (28,28)) for i in X_test])
X_test = X_test / 255 #Applies the same normalization as for training
X_test = X_test.reshape(X_test.shape[0], 28,28,1)
print(X_test.shape)
#Recognizing images on the test dataset
predictions = model.predict(X_test)
test_accuracy = accuracy_score(np.argmax(y_test, axis=1), np.argmax(predictions, axis=1))
print("The test accuracy is: ", test_accuracy)
#Result
The test accuracy is: 0.9223368655883993
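To see which letters the model confuses with each other, you can also compute a confusion matrix. An optional sketch using scikit-learn (rows are true classes, columns are predicted classes):
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(predictions, axis=1))
print(cm)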
The trained model achieved an accuracy of about 92.23%. That is pretty good. It is also possible to reach much higher accuracy by fine-tuning the parameters we used.
What does this signify in practice? Our model can correctly recognize which letter a hand gesture refers to in sign language roughly 92 times out of every 100 images! Given a 28×28 image of a hand gesture as input, it is highly likely that our model will identify it correctly.
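For instance, here is a minimal sketch of predicting a single test image (index 0 is an arbitrary choice; labels 0–24 map directly to A–Y, since J and Z involve motion and never occur in this dataset):
import string
pred = model.predict(X_test[0:1])
print("Predicted letter:", string.ascii_uppercase[np.argmax(pred)])
print("Actual letter:", string.ascii_uppercase[np.argmax(y_test[0])])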
A more advanced version of this approach could be used to build a system that converts sign language into text in real time, and that text into speech, giving people who cannot speak an effective voice. Such is the potential of machine learning.
With that, we can conclude this project. If you have been following this series, your second application is now ready. Using the same approach, you could build a flower recognition model, an animal recognition model, and so on. The possibilities are limited only by the dataset you choose.
If you have any doubts about applied machine learning, or if you get stuck somewhere while implementing your model, feel free to ask in the comments below.
Stay tuned for the next article, where we will explore more diverse models and their applications in real-life scenarios.
This article was first published by me on The Research Nest's blog here.
#IndiaStudents #StudentVoices #ImageRecognition #CNN #MachineLearning