Creating Your First Neural Network

Intro

In this article, we will build a simple neural network to outline the end-to-end process of creating one. We will use Keras/TensorFlow for the implementation. Additionally, we will leverage the popular MNIST dataset from Yann LeCun and Corinna Cortes, which is conveniently available through keras.datasets.

Load the dataset

The MNIST dataset has 60,000 28x28 grayscale training images of the 10 digits, along with a test set of 10,000 images.

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data(path="mnist.npz")

print(x_train.shape) #Outputs (60000, 28, 28)
print(y_train.shape) #Outputs (60000,)
print(x_test.shape)  #Outputs (10000, 28, 28)
print(y_test.shape)  #Outputs (10000,)

You can view any of the loaded images by calling imshow on pyplot.

import matplotlib.pyplot as plt
plt.imshow(x_train[20000], cmap='binary_r')        

This function can visualize a 2D array as a grayscale or color image. Here we have a grayscale image of the digit 5.

Fig 1: Number 5 at index 20,000
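
As a quick sanity check, the labels in y_train are plain integers 0-9 and should agree with the figure:

print(y_train[20000]) #Outputs 5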


Normalize the data

Normalize the train and test data to the 0 - 1 range. Normalizing the data speeds up learning and leads to faster convergence.

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)        
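
Note that tf.keras.utils.normalize applies L2 normalization along the given axis; since pixel values are non-negative, the results land in the 0 - 1 range. A simpler and more common alternative is plain min-max scaling of the raw 0-255 intensities; a minimal sketch, to be used instead of the normalize call above:

# Alternative: scale the raw 0-255 pixel intensities directly into [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0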

Define the Neural Network Model

First we Flatten the image from 2D (28, 28) to 1D (28*28 = 784).

After that we have one hidden layer with 128 neurons, using the 'relu' activation.

The final output layer has 10 neurons, one for each of the 10 digits. The 'softmax' activation converts these outputs into a probability distribution over the classes.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

model = Sequential(
    [Flatten(input_shape=(28, 28)),
     Dense(units=128, activation="relu"),
     Dense(units=10, activation="softmax")]
)
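
You can verify the layer shapes and parameter counts with model.summary(). The hidden layer holds 28*28*128 + 128 = 100,480 parameters and the output layer 128*10 + 10 = 1,290.

model.summary()
#Flatten -> (None, 784), 0 params
#Dense   -> (None, 128), 100,480 params
#Dense   -> (None, 10),  1,290 params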

Compile the model

Below we use 'Adam' as the optimizer, which is a stochastic gradient descent method. The default learning rate of 0.001 is retained.

model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])        
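
'sparse_categorical_crossentropy' is the right loss here because y_train holds plain integer labels (0-9). If you one-hot encoded the labels instead, the equivalent setup would use 'categorical_crossentropy'; a minimal sketch:

# Alternative: one-hot encode the labels and switch the loss
y_train_onehot = tf.keras.utils.to_categorical(y_train, num_classes=10)
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit would then be called with y=y_train_onehot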

Train the model

We train for 10 epochs, so the network makes 10 full passes over the training data.

Within each epoch, the loss is computed and the weights are updated once per batch of 10 samples, which works out to 6,000 updates per epoch for the 60,000 training images.

model.fit(x=x_train, y=y_train, epochs=10, batch_size=10, verbose=2)        
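
Before predicting, it is worth checking how the trained network generalizes to the held-out test set; model.evaluate returns the test loss and accuracy (the exact numbers will vary from run to run):

test_loss, test_acc = model.evaluate(x=x_test, y=y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")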

Predict the Output for Test Data

predictions = model.predict(x=x_test, batch_size=10, verbose=0)        

Check the Output

Let us look at the test data at index 5000.

plt.imshow(x_test[5000], cmap='binary_r')        

It is the digit 3.

Fig 2: Test data at index 5000


Now let us check our prediction at index 5000. It displays the output array with the values of the 10 output neurons. We can see the highest probability out of all 10 is for digit 3, at index 3, with a value of 9.9999994e-01.

predictions[5000]

array([5.7262279e-13, 4.6293011e-11, 1.5742700e-09, 9.9999994e-01,
       2.1653295e-13, 2.8630273e-08, 1.4112982e-13, 2.4477271e-09,
       3.2406788e-08, 1.2772884e-15], dtype=float32)        
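
Since the output layer uses softmax, each row of predictions is a probability distribution, so its 10 values sum to 1 (up to float32 rounding):

print(predictions[5000].sum()) #Outputs ~1.0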

The predictions output currently has a shape of (10000, 10).

Reshape Output to Max Probability

Let us select only the index of the maximum of the 10 probabilities, reducing the array to shape (10000,), with each entry holding the predicted digit.

import numpy as np
rounded_predictions = np.argmax(predictions, axis=-1)        

You can now print the value at index 5000 and you will see the output as the number 3.

rounded_predictions[5000]

#Output is 3        
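
With the predictions collapsed to class labels, the overall test accuracy is a one-line comparison against y_test (the exact figure depends on your training run):

accuracy = np.mean(rounded_predictions == y_test)
print(f"Accuracy: {accuracy:.4f}")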

Performance of the Network

We can display the confusion matrix to see the performance of the classification.

from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(y_test, rounded_predictions, cmap="Blues")

Here is the output

Fig 3: Confusion Matrix

The main diagonal of the matrix shows the number of correct predictions for each class, while off-diagonal elements represent instances that were misclassified.

Looking at the diagonal, the neural network did quite well.
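
If the raw counts are hard to compare across classes, from_predictions in scikit-learn also accepts a normalize argument, so the diagonal shows per-class recall instead of counts:

ConfusionMatrixDisplay.from_predictions(y_test, rounded_predictions, cmap="Blues", normalize="true")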

Outro

In conclusion, we've taken a hands-on approach to building a simple neural network using Keras/TensorFlow and exploring its functionality. We delved into the process of creating a neural network and applied it to the well-known MNIST dataset. As you've seen, creating a neural network doesn't have to be overly complex, and Keras provides a user-friendly interface for such tasks. This article serves as a starting point, and there's much more to explore in the vast field of deep learning. Whether it's experimenting with different architectures, exploring advanced features, or applying neural networks to other datasets, the journey into the world of neural networks is both exciting and limitless.

I look forward to your inputs, suggestions or any corrections.
