Hand-Written Digit Classification

In this article, I will show you how to classify handwritten digits from the MNIST database using Python, Keras (an open-source neural-network library written in Python), and deep neural networks built from dense (fully connected) layers.

Let's begin...

First, I need to install the dependencies/packages. If you don't already have these packages installed, run the following command in your terminal, command prompt, or Google Colab (depending on where you have Python installed).

!pip install tensorflow keras

Keras is a neural-network library, while TensorFlow is an open-source library for a wide range of machine learning tasks. TensorFlow provides both high-level and low-level APIs, whereas Keras provides only high-level APIs.

Importing Libraries

Now that I’m done installing all of the necessary packages, I want to import the packages into my program.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

These are some basic libraries. NumPy is an open-source Python module that offers fast mathematical computation on arrays and matrices. Pandas is one of the most widely used Python libraries in data science. The Matplotlib library is used to create high-quality graphs, charts, and figures.

from keras.datasets import mnist
from tensorflow.keras.layers import Dense,Flatten, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import normalize, to_categorical
from sklearn.preprocessing import OneHotEncoder
  • We are importing the MNIST digits classification dataset from keras.
  • Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True). A small numeric sketch of this computation follows the list.
  • Flatten reshapes each input into a single vector (here each 28 x 28 image becomes 784 values). If inputs are shaped (batch,) without a feature axis, flattening adds an extra channel dimension and the output shape is (batch, 1).
  • Dropout randomly sets input units to 0 with a frequency of rate at each step during training, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) so that the sum over all inputs is unchanged.
  • Sequential groups a linear stack of layers into a model and provides training and inference features on that model.
  • to_categorical converts a class vector (integers) to a binary class matrix, and normalize normalizes a NumPy array.
  • OneHotEncoder encodes categorical features as a one-hot numeric array. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka "one-of-K" or "dummy") scheme, which creates a binary column for each category and returns a sparse matrix or dense array (depending on the sparse parameter). By default, the encoder derives the categories from the unique values in each feature, although you can also specify them manually. This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels. (In this article we use to_categorical instead, so this import is not strictly needed.)
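
To make the Dense operation concrete, here is a minimal NumPy sketch of the same computation (the weights, bias, and input values are made up purely for illustration; Keras creates and learns these for you):

import numpy as np

# A made-up 3-feature input through a Dense layer with 2 units and a ReLU activation:
x = np.array([0.2, 0.8, 0.5])           # one input sample
kernel = np.full((3, 2), 0.1)           # weights matrix created by the layer (illustrative values)
bias = np.array([0.05, -0.05])          # bias vector created by the layer
z = np.dot(x, kernel) + bias            # dot(input, kernel) + bias
output = np.maximum(0, z)               # element-wise ReLU activation
print(output)                           # [0.2 0.1]
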
 (x_train, y_train), (x_test, y_test) = mnist.load_data()

Next, load the data set into the variables x_train (the images to train on), y_train (the labels of the training images), x_test (the images to test on), and y_test (the labels of the test images).

#Get the image shape
print(x_train.shape)
print(x_test.shape)

Output:
(60000, 28, 28)
(10000, 28, 28)

Get the shapes of the feature data sets. Notice that x_train contains 60,000 images of 28 x 28 pixels, and x_test contains 10,000 images of 28 x 28 pixels.

x_train[0]

We can see that each image is stored as a 2-D array. Take a look at the first image in the training data set as a NumPy array: it shows the image as a series of pixel values.

Now, let's look at one digit in our data.

some_digit = x_train[0]
plt.imshow(some_digit, cmap=plt.cm.binary)
plt.axis("off")
plt.show()

Show the image not as a series of pixel values, but as an actual image.

Processing our Data

X_train = normalize(x_train)
X_test = normalize(x_test)
Y_train = to_categorical(y_train)
Y_test = to_categorical(y_test)
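
A quick sanity check of the processed arrays (optional, just to confirm the shapes and scaling):

print(X_train.shape, X_train.min(), X_train.max())   # (60000, 28, 28); values are no longer in the 0-255 range
print(Y_train.shape)                                  # (60000, 10): one column per digit class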

Let's see what is in Y_train:

print(Y_train[0])

Output:
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

As we know, y_train[0] = 5, which is why the output has a 1 at index 5 and 0 everywhere else.
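
As a quick, self-contained check, the same encoding can be tried on a few made-up labels, and np.argmax recovers the original integers:

import numpy as np
from tensorflow.keras.utils import to_categorical

labels = np.array([5, 0, 3])              # a few made-up class labels
one_hot = to_categorical(labels, num_classes=10)
print(one_hot[0])                         # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(np.argmax(one_hot, axis=1))         # [5 0 3] -- argmax recovers the original labels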

Deep Learning Without Dropout

model = Sequential()
model.add(Flatten(input_shape = (28,28)))
model.add(Dense(784, activation="relu"))
model.add(Dense(10, activation="softmax"))

Check model summary:

model.summary()
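
The summary lists each layer's output shape and parameter count. As a rough check (a small sketch, assuming the architecture defined above), the totals can be reproduced by hand:

flatten_outputs = 28 * 28                     # Flatten turns each image into 784 values
dense1_params = flatten_outputs * 784 + 784   # weights + biases of Dense(784): 615,440
dense2_params = 784 * 10 + 10                 # weights + biases of Dense(10): 7,850
print(dense1_params + dense2_params)          # 623,290 trainable parameters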

Compile the model

Compiling configures the model for training: it attaches the loss function and optimizer, so you can't train without it. You can compile a model as many times as you want and even change the parameters each time; in the end, the model simply takes an input and calculates an output, and compiling defines how that mapping gets trained.

model.compile(loss = "categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

Categorical crossentropy is a loss function used for single-label categorization, i.e. when only one category applies to each data point (an example can belong to one class only). Adam is a stochastic gradient descent method based on adaptive estimates of first- and second-order moments. We compile the model with the adam optimizer, which controls the learning rate; the categorical_crossentropy loss, which is used when there are more than two classes (like the 10 different labels in the target data set); and the accuracy metric, so we can see the score on the validation set while we train the model.
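
To see what this loss actually measures, here is a small hand-computed example with a made-up softmax output (not from our model). For a one-hot target, categorical crossentropy reduces to -log of the probability assigned to the true class:

import numpy as np

y_true = np.array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])                      # true label is 5
y_pred = np.array([0.01, 0.01, 0.02, 0.02, 0.04, 0.80, 0.03, 0.03, 0.02, 0.02])  # made-up softmax output
loss = -np.sum(y_true * np.log(y_pred))
print(loss)                                                                       # ~0.223, i.e. -log(0.80)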

Training the model

Train the model on the training data set (X_train and Y_train). I will iterate 10 times over the entire data set, using 50 samples per gradient update. Then store the training history in the variable history.

history = model.fit(X_train, Y_train, epochs=10, batch_size=50, validation_split=0.2)

Batch size: the number of training samples used per gradient update.

Epoch: one pass of the ENTIRE dataset forward and backward through the neural network.

Fit: another word for train.
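
As a quick back-of-the-envelope check of these terms for our fit call (assuming the 60,000-image training set and validation_split=0.2):

train_samples = int(60000 * 0.8)      # validation_split=0.2 holds out 12,000 images for validation
batch_size = 50
print(train_samples // batch_size)    # 960 gradient updates (batches) per epoch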


Looks like the model was 99.71% accurate on the training data and 97.56% accurate on the test data.
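
A minimal way to check the test-set figure yourself (the exact numbers will vary slightly from run to run):

test_loss, test_accuracy = model.evaluate(X_test, Y_test, verbose=0)
print("Test accuracy:", test_accuracy)

Let's visualize the model's accuracy.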

loss = history.history["loss"]
val_loss = history.history["val_loss"]
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
n = np.arange(1, 11)

Visualize Training and Testing Loss:

plt.plot(n, loss, label = "Training Loss")
plt.plot(n, val_loss, label = "Testing Loss")
plt.legend()
plt.show()

Visualize the training and testing accuracy:
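
The accuracy curves can be plotted the same way as the loss curves (this mirrors the snippet used for the dropout model later in the article):

plt.plot(n, accuracy, label = "Training Accuracy")
plt.plot(n, val_accuracy, label = "Testing Accuracy")
plt.legend()
plt.show()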


For the first 4 test images, let's print the predicted and actual labels to see how they match up.

predictions = model.predict(X_test[:4])
#Print our predictions as number labels for the first 4 images
print( np.argmax(predictions, axis=1))
#Print the actual labels
print(y_test[:4])

Output:
[7 2 1 0]
[7 2 1 0]

Top: The predicted labels, Bottom: The actual labels

Let's show the first four images as pictures!

for i in range(0,4):   
   image = X_test[i]   
   image = np.array(image, dtype='float')   
   pixels = image.reshape((28,28))  
   plt.imshow(pixels, cmap='gray')   
   plt.show()
Deep Learning with Dropout

Dropout is a technique where randomly selected neurons are ignored during training. They are "dropped out" randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass.

Dropout is easily implemented by randomly selecting nodes to be dropped-out with a given probability (e.g. 20%) each weight update cycle. This is how Dropout is implemented in Keras. Dropout is only used during the training of a model and is not used when evaluating the skill of the model.
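
Here is a minimal NumPy sketch of the idea (inverted dropout applied to a made-up activation vector; Keras handles all of this internally when you add a Dropout layer):

import numpy as np

rng = np.random.default_rng(0)
rate = 0.5                                               # probability of dropping a unit
activations = np.array([0.4, 1.2, 0.7, 0.9, 0.3, 1.1])   # made-up layer outputs
mask = rng.random(activations.shape) >= rate             # True = keep this unit
print(activations * mask / (1 - rate))                   # dropped units become 0, kept units are scaled by 1/(1 - rate)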

model_d = Sequential()
model_d.add(Flatten(input_shape = (28,28)))
model_d.add(Dense(784, activation="relu"))
model_d.add(Dropout(0.5))
model_d.add(Dense(130, activation="relu"))
model_d.add(Dropout(0.5))
model_d.add(Dense(10, activation="softmax"))

Again, compile and train the model:

model_d.compile(loss = "categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model_d.fit(X_train, Y_train, epochs=10, batch_size=50, validation_split=0.2)

The model was 97.80% accurate on the training data and 97.83% accurate on the test data. Let's visualize the model's accuracy.

loss = history.history["loss"]
val_loss = history.history["val_loss"]
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
n = np.arange(1, 11)

Visualizing Loss:

plt.plot(n, loss, label = "Training Loss")
plt.plot(n, val_loss, label = "Testing Loss")
plt.legend()
plt.show()

Visualizing Accuracy:

plt.plot(n, accuracy, label = "Training Accuracy")
plt.plot(n, val_accuracy, label = "Testing Accuracy")
plt.legend()
plt.show()

We are done creating the program!

