Hand-Written Digit Classification
In this article, I will show you how to classify handwritten digits from the MNIST database using Python, Keras (an open-source neural-network library written in Python), and a deep learning technique based on artificial neural networks!
Let's begin....
First, I need to install the dependencies/packages. If you don't already have these packages installed, run the following command in your terminal, command prompt, or Google Colab (depending on where you run your Python code).
!pip install tensorflow keras
Keras is a neural-network library, while TensorFlow is an open-source library for a wide range of machine learning tasks. TensorFlow provides both high-level and low-level APIs, whereas Keras provides only high-level APIs.
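To make the high-level/low-level distinction concrete, here is a minimal sketch (the numbers are made up purely for illustration) that computes the same dense operation once with raw TensorFlow ops and once with a Keras layer:

import tensorflow as tf

# Low-level TensorFlow: the matrix multiply and bias add behind a dense layer
x = tf.constant([[1.0, 2.0]])
w = tf.constant([[0.5], [0.5]])
b = tf.constant([0.1])
print(tf.matmul(x, w) + b)         # [[1.6]]

# High-level Keras: the same idea wrapped in a layer (weights are initialized randomly)
layer = tf.keras.layers.Dense(1)
print(layer(x))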
Importing Libraries
Now that the necessary packages are installed, I want to import them into my program.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
These are some basic libraries. NumPy is an open-source Python module that offers fast mathematical computation on arrays and matrices. Pandas is one of the most widely used Python libraries in data science. The matplotlib library is used to create high-quality graphs, charts, and figures.
from keras.datasets import mnist
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import normalize, to_categorical
from sklearn.preprocessing import OneHotEncoder
- We are importing the MNIST digits classification dataset from keras.
- Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True). Flatten collapses each multi-dimensional input into a single vector without affecting the batch size; here it turns every 28 x 28 image into a vector of 784 values. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training, which helps prevent overfitting; inputs not set to 0 are scaled up by 1/(1 - rate) so that the sum over all inputs is unchanged.
- Sequential groups a linear stack of layers into a model and provides training and inference features on that model.
- From tensorflow.keras.utils, to_categorical converts a class vector (integers) into a binary class matrix, and normalize normalizes a NumPy array.
- from sklearn.preprocessing import OneHotEncoder: encodes categorical features as a one-hot numeric array. The input to this transformer should be an array-like of integers or strings denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme, which creates a binary column for each category and returns a sparse matrix or dense array (depending on the sparse parameter). By default, the encoder derives the categories from the unique values in each feature, although you can also specify the categories manually. This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels. A small sketch of both encoders follows this list.
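To see what these encoders actually produce, here is a minimal sketch on a made-up label vector (the labels are hypothetical, not MNIST data):

import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import OneHotEncoder

labels = np.array([0, 2, 1])

# to_categorical: class vector -> binary class matrix
print(to_categorical(labels, num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]

# OneHotEncoder produces the same encoding for scikit-learn pipelines
enc = OneHotEncoder()
print(enc.fit_transform(labels.reshape(-1, 1)).toarray())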
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Next, load the data set into the variables x_train (the images to train on), y_train (the labels of the training images), x_test (the images to test on), and y_test (the labels of the test images).
#Get the image shape
print(x_train.shape)
print(x_test.shape)

Output:
(60000, 28, 28)
(10000, 28, 28)
Get the image shape of the feature data sets. Notice the x_train shape contains 60,000 rows of 28 x 28 pixel images. The x_test shape contains 10,000 rows of 28 x 28 pixel images.
x_train[0]
We observe that our data is in 2-D format. Take a look at the first image in the training data set as a NumPy array. This shows the image as a series of pixel values.
Now, let's look at one digit in our data:
some_digit = x_train[0]
plt.imshow(some_digit, cmap=plt.cm.binary)
plt.axis("off")
plt.show()
Show the image not as a series of pixel values, but as an actual image.
Processing our Data
X_train = normalize(x_train)
X_test = normalize(x_test)
Y_train = to_categorical(y_train)
Y_test = to_categorical(y_test)
Let's see what is in Y_train:
print(Y_train[0])

Output:
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
As we know, y_train[0] = 5, which is why the output has a 1 at index 5 and 0 everywhere else.
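Going the other way, np.argmax recovers the original label from the one-hot vector (we will use the same trick later to read off the model's predictions):

# np.argmax returns the index of the largest value, undoing the one-hot encoding
print(np.argmax(Y_train[0]))   # 5
print(y_train[0])              # 5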
Deep Learning Without Dropout
model = Sequential()
model.add(Flatten(input_shape = (28,28)))
model.add(Dense(784, activation="relu"))
model.add(Dense(10, activation="softmax"))
Check model summary:
model.summary()
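As a quick sanity check, the parameter counts that model.summary() should report can be worked out by hand from the layer sizes above:

# Flatten has no weights; each Dense layer has (inputs x units) weights plus one bias per unit
flatten_params = 0
dense1_params = 784 * 784 + 784    # 615,440
dense2_params = 784 * 10 + 10      # 7,850
print(flatten_params + dense1_params + dense2_params)   # 623,290 trainable parameters in total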
Compile the model
Compiling configures the model for training. The loss is needed for training, so you cannot train without compiling first. You can compile a model as many times as you want, and even change the parameters each time. Once compiled, you feed the model an input and it calculates an output; in the end, that is all that matters.
model.compile(loss = "categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
Categorical crossentropy is a loss function used for single-label categorization, i.e. when only one category applies to each data point (each example can belong to one class only). Adam is a stochastic gradient descent method based on adaptive estimation of first-order and second-order moments. Here we use the adam optimizer, which controls the learning rate, the categorical_crossentropy loss, which is used when there are more than 2 classes (like the 10 different labels in the target data set), and the accuracy metric to see the accuracy score on the validation set while we train the model.
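To make the loss concrete, here is a tiny worked example with made-up numbers (not produced by the model): for a one-hot target, categorical crossentropy reduces to the negative log of the probability the model assigns to the correct class.

import numpy as np

# Hypothetical prediction: the true class is 5 and the model assigns it probability 0.9
y_true = np.zeros(10); y_true[5] = 1.0
y_pred = np.full(10, 0.1 / 9); y_pred[5] = 0.9

loss = -np.sum(y_true * np.log(y_pred))
print(loss)   # ~0.105 -- a confident correct prediction gives a small loss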
Training the model
Train the model on the training data set (X_train and Y_train). I will iterate 10 times over the entire data set, using 50 samples per gradient update. Then I store the training history in the variable history.
history = model.fit(X_train, Y_train, epochs=10, batch_size=50, validation_split=0.2)
Batch size: the number of training samples used per gradient update.
Epoch: one complete pass of the ENTIRE dataset forward and backward through the neural network.
Fit: another word for train.
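With these settings and validation_split=0.2, we can work out how many gradient updates actually happen (a small back-of-the-envelope sketch):

# 20% of the 60,000 training images are held out for validation
train_samples = int(60000 * 0.8)          # 48,000 images used for training
steps_per_epoch = train_samples // 50     # 960 gradient updates per epoch
print(train_samples, steps_per_epoch, steps_per_epoch * 10)   # 48000 960 9600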
It looks like the model was 99.71% accurate on the training data and 97.56% accurate on the validation data (the 20% split held out by fit). Let's visualize the model's accuracy.
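The numbers above come from the training history. To also measure accuracy on the untouched test set, a minimal sketch using model.evaluate (assuming the normalized X_test and one-hot Y_test from earlier):

test_loss, test_accuracy = model.evaluate(X_test, Y_test, verbose=0)
print(test_loss, test_accuracy)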
loss = history.history["loss"]
val_loss = history.history["val_loss"]
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
n = np.arange(1, 11)
Visualize Training and Validation Loss:
plt.plot(n, loss, label = "Training Loss")
plt.plot(n, val_loss, label = "Validation Loss")
plt.legend()
plt.show()
Visualize Training and Validation Accuracy:
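The accuracy plot mirrors the loss plot above; a short sketch using the accuracy and val_accuracy arrays collected from history:

plt.plot(n, accuracy, label = "Training Accuracy")
plt.plot(n, val_accuracy, label = "Validation Accuracy")
plt.legend()
plt.show()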
For the first 4 images in the test set, let's print the predicted labels and the actual labels to see how they match up.
predictions = model.predict(X_test[:4])

#Print our predictions as number labels for the first 4 images
print(np.argmax(predictions, axis=1))

#Print the actual labels
print(y_test[:4])

Output:
[7 2 1 0]
[7 2 1 0]
Top: The predicted labels, Bottom: The actual labels
Let's show the first four images as pictures!
for i in range(0,4):
    image = X_test[i]
    image = np.array(image, dtype='float')
    pixels = image.reshape((28,28))
    plt.imshow(pixels, cmap='gray')
    plt.show()
Deep Learning with Dropout
Dropout is a technique where randomly selected neurons are ignored during training. They are "dropped out" randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass.
Dropout is easily implemented by randomly selecting nodes to be dropped-out with a given probability (e.g. 20%) each weight update cycle. This is how Dropout is implemented in Keras. Dropout is only used during the training of a model and is not used when evaluating the skill of the model.
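To see this behaviour directly, here is a small sketch on a toy input (the input is made up purely for illustration): during training roughly half the values are zeroed and the survivors are scaled by 1/(1 - 0.5) = 2, while at inference time the layer leaves the input unchanged.

import numpy as np
import tensorflow as tf

x = np.ones((1, 10), dtype="float32")
drop = tf.keras.layers.Dropout(0.5)

print(drop(x, training=True))    # roughly half zeros, the rest scaled up to 2.0
print(drop(x, training=False))   # unchanged at inference time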
model_d = Sequential()
model_d.add(Flatten(input_shape = (28,28)))
model_d.add(Dense(784, activation="relu"))
model_d.add(Dropout(0.5))
model_d.add(Dense(130, activation="relu"))
model_d.add(Dropout(0.5))
model_d.add(Dense(10, activation="softmax"))
Again, compile and train the model:
model_d.compile(loss = "categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model_d.fit(X_train, Y_train, epochs=10, batch_size=50, validation_split=0.2)
The model was 97.80% accurate on the training data and 97.83% accurate on the validation data. Let's visualize the model's accuracy.
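As before, the held-out test set gives an extra point of comparison between the two models; a minimal sketch assuming model_d and the normalized X_test/Y_test from above:

test_loss, test_accuracy = model_d.evaluate(X_test, Y_test, verbose=0)
print(test_loss, test_accuracy)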
loss = history.history["loss"]
val_loss = history.history["val_loss"]
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
n = np.arange(1, 11)
Visualizing Loss:
plt.plot(n, loss, label = "Training Loss")
plt.plot(n, val_loss, label = "Validation Loss")
plt.legend()
plt.show()
Visualizing Accuracy:
plt.plot(n, accuracy, label = "Training Accuracy")
plt.plot(n, val_accuracy, label = "Validation Accuracy")
plt.legend()
plt.show()
We are done creating the program!