Machine Learning: How Does a Neural Network Recognize Handwritten Digits?
The human brain can easily recognize differences between images after being given just a few examples. Neural networks have more difficulty recognizing the same images and often require thousands of training examples to determine what an image represents. To understand how neural networks recognize images, it helps to break down their structure.
Neural networks are inspired by the brain. They are built from neurons, each holding a number between 0 and 1, and from connections that pass information between them. For this example, picture an image of a handwritten 9 on a 28x28 grid consisting of 784 pixels. The greyscale brightness of each pixel is the corresponding neuron's "activation," with 0 representing black and 1 representing white.
Figure 1: Each neuron lights up when its activation is a higher number
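To make the input layer concrete, here is a minimal sketch of how a 28x28 greyscale image becomes 784 activations between 0 and 1. The image here is a random stand-in for a real scan, just to show the shape of the data.

```python
import numpy as np

# A hypothetical 28x28 greyscale image whose pixel values already
# lie between 0 (black) and 1 (white); random values stand in for a real scan.
image = np.random.rand(28, 28)

# Flatten the grid into 784 activations, one per input neuron.
activations = image.reshape(784)

print(activations.shape)  # (784,)
```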
Together, these 784 neurons make up the first layer of the neural network. The last layer consists of 10 neurons, one for each digit. As the network operates, the activation pattern in one layer determines the activation pattern of the next. Each neuron in the last layer represents how strongly the system thinks the given image corresponds to that digit, and the network decides which digit an image represents based on which neuron has the highest activation.
Figure 2: The last layer of the neural network determines which digit is represented by the given image, which in this case is a 9.
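As a small illustration of that final step, the sketch below picks the digit whose output neuron is most active. The activation values are made up for illustration.

```python
import numpy as np

# A hypothetical activation pattern for the 10 output neurons,
# one per digit 0-9; the values are invented for illustration.
output_layer = np.array([0.02, 0.01, 0.05, 0.03, 0.04,
                         0.02, 0.01, 0.10, 0.06, 0.92])

# The network's answer is the digit whose neuron is most active.
predicted_digit = np.argmax(output_layer)
print(predicted_digit)  # 9
```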
There are hidden layers between the first and last that are key to recognizing digits. A nine, for example, is made up of one loop and one line. Each layer of the network must recognize different components of digits in order to determine that the digit in our example is a nine.
Recognizing the various edges of a digit is one way to do this. Each layer in the network builds an increasingly high-level view of a number's edges, which should ultimately pin down the digit.
The goal is for the neural network to move from pixels to edges, edges to patterns, and patterns to digits.
Parameters help a neural network recognize edges and patterns. A weight is assigned to each connection between a neuron and the neurons in the previous layer. Multiplying each activation from the previous layer by its weight and adding the results gives a weighted sum. The weighted sum is largest when the pixels in a region are bright but the surrounding pixels are dark, which helps the neural network pick out the edges of a digit.
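A minimal sketch of that weighted sum for a single neuron might look like the following. The weights and pixel activations are invented for illustration: positive weights on the pixels we care about, negative weights on the surrounding ones.

```python
import numpy as np

# Invented weights: reward a bright strip of pixels, penalize its dark border.
weights     = np.array([-1.0,  2.0,  2.0,  2.0, -1.0])
# Invented pixel brightnesses: bright in the middle, dark at the edges.
activations = np.array([ 0.1,  0.9,  0.8,  0.9,  0.1])

# Multiply each activation by its weight and add the results.
weighted_sum = np.dot(weights, activations)
print(weighted_sum)  # large because the middle pixels are bright and the border is dark
```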
The weighted sum can be any number, but for this example the activations should all stay between 0 and 1. The sigmoid function squishes the real number line into the range between 0 and 1, and a "bias" added to the weighted sum sets how high that sum needs to be before the neuron becomes meaningfully active. Each connection to the 784 input neurons has its own weight, and each neuron has its own bias. The whole computation can be organized as a matrix-vector product of the weights and activations, added to a vector of biases, with the sigmoid applied to the result.
Figure 3: The Sigmoid Function
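Put together, computing one layer from the previous one looks like the sketch below. It assumes a hidden layer of 16 neurons (a common choice in introductory examples, not something fixed by the article), with random weights and biases standing in for values the network would actually learn.

```python
import numpy as np

def sigmoid(x):
    # Squishes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Assumption: a hidden layer of 16 neurons fed by the 784 input activations.
W = np.random.randn(16, 784)   # one row of weights per hidden neuron
b = np.random.randn(16)        # one bias per hidden neuron
a = np.random.rand(784)        # input activations between 0 and 1

# Matrix-vector product of weights and activations, plus biases, through the sigmoid.
next_layer = sigmoid(W @ a + b)
print(next_layer.shape)        # (16,) -- every activation lies between 0 and 1
```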
This introductory explanation of how a neural network can recognize a handwritten digit can be summarized as a series of functions. Each neuron, represented by a number between 0 and 1, is just a function that takes the outputs of the previous layer and spits out a new number to be used by the next layer. The network itself is a function with 13,002 parameters that takes in 784 numbers as input and spits out 10 numbers as output. Getting the computer to find the right weights and biases to solve the problem at hand is the goal of machine learning. Understanding the weights and biases helps avoid the black-box problem of not knowing how a machine makes its decisions, and it makes it easier to change the network's structure to improve it later on.
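For reference, the 13,002 figure falls out of the layer sizes. The short sketch below counts the parameters, assuming two hidden layers of 16 neurons each; the hidden-layer sizes are an assumption, since the article only fixes the 784 inputs and 10 outputs.

```python
# Counting parameters, assuming layer sizes 784 -> 16 -> 16 -> 10.
weights = 784 * 16 + 16 * 16 + 16 * 10   # 12,544 + 256 + 160 = 12,960 weights
biases  = 16 + 16 + 10                   # 42 biases, one per non-input neuron
print(weights + biases)                  # 13,002
```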