Neural Network
In this article, I am going back to the basics: Neural Networks!
Most readers must have seen the picture above and heard of neural networks, perceptrons, neurons, hidden layers, etc. I would like to take this opportunity to explain each of those terms and how they connect back to the network.
Some Background -
The neural network concept has been around for a long time; it was first proposed in 1944, but not until the arrival of graphics chips with greatly increased processing power did neural networks really pick up steam.
Neural networks are, in loose terms, a model architecture intended for machine learning, in which the model learns to perform a task by analyzing training data. For example, suppose you have pictures of the handwritten digit '4' and have labelled them. Using these as training data, the model identifies patterns in the handwritten pictures during training and associates them with the digit '4'. If you next show it a picture of '5', it will be able to tell that it is not a '4' - simple classification, just like any other model such as decision trees or logistic regression.
Then what makes them powerful?
What are Neural Networks?
In simple terms, a NN is a grouping of densely interconnected processing nodes. Nodes are organized into layers so that data can move through them to the next layer. An individual node adds little value on its own; the strength is in numbers. A node assigns a weight to each of its incoming connections. When the network is active, the node receives a data value over each connection and multiplies it by that connection's weight. The weighted inputs are summed into a single number, and if that number exceeds the node's threshold value, the node sends the result along all its outgoing connections.
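To make this concrete, here is a minimal sketch of what a single node computes - a weighted sum of inputs compared against a threshold. This is my own illustration in plain Python with NumPy; the values and function name are made up for the example.

```python
import numpy as np

def node_output(inputs, weights, threshold):
    """A single node: sum the weighted inputs, and pass the result
    along only if it exceeds the threshold (a step-style activation)."""
    weighted_sum = np.dot(inputs, weights)
    return weighted_sum if weighted_sum > threshold else 0.0

# Example: three incoming connections with arbitrary values and weights
x = np.array([0.5, -1.2, 3.0])   # incoming data values
w = np.array([0.8, 0.1, 0.4])    # weights assigned to each connection
print(node_output(x, w, threshold=1.0))  # 1.48 exceeds 1.0, so the node fires
```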
At the beginning, a NN has random values assigned to the weights and thresholds of its nodes; these are optimized during the training process as the network learns. Training data is fed through the input layer, and the data passes through the succeeding layers until it reaches the output layer. During this process, weights and thresholds are adjusted until the training labels are consistently predicted for the corresponding feature combinations.
You might ask: what are these nodes, weights, thresholds, layers, etc.? I've got you covered.
Nodes - This unit goes by multiple names - neuron, perceptron, node - but they all refer to the basic unit of a neural network. Each node receives a set of inputs and a bias value; when an input arrives, it is multiplied by a weight value.
Connection - Each node may have connections to nodes in the preceding and following layers. The link that carries the output of a node in one layer as input to a node in the receiving layer is called a connection.
Input (x) - Can be the output of a node in the previous layer or a value from the training dataset received by the node. Each input is associated with a corresponding weight.
Weight (w) - It represents the magnitude of influence of an input on the node.
Activation Function f(z) - A nonlinear transformation of a node's input value. There are various types of functions; the most used are sigmoid, tanh, ReLU, softmax, etc. These functions keep the output within a well-defined range - for example, sigmoid maps values to between 0 and 1, while tanh maps them to between -1 and 1.
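As a quick sketch (my own NumPy versions, for illustration only), the common activation functions mentioned above look like this:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Maps any real number into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive (unbounded above)
    return np.maximum(0.0, z)

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```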
Bias - It ensures a node can still activate even when all its input values are zero. It is an extra input to the neuron that is always 1 and carries its own weight.
Layers - Layers are a logical grouping of nodes based on their input and output connections. There are broadly three major layer types - Input, Hidden & Output.
Forward Propagation - The process by which samples from the training dataset move from the input layer through the nodes of each hidden layer, transformed by the weights, biases, and activation functions, until they reach the final output layer to predict the label in the training set.
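Here is a minimal forward-pass sketch through a tiny, hypothetical network (3 inputs, 4 hidden nodes, 1 output); the weights are random, just as they would be before training, and all names and shapes are my own assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 3 inputs -> 4 hidden nodes -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden-layer weights & biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output-layer weights & biases

def forward(x):
    # Hidden layer: weighted sum + bias, then the activation function
    h = sigmoid(x @ W1 + b1)
    # Output layer: same pattern again
    return sigmoid(h @ W2 + b2)

x = np.array([0.2, 0.7, -0.1])  # one training sample
print(forward(x))               # the network's (untrained) prediction
```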
Loss/Cost Function - A function that estimates the deviation of the predicted values from the actual values. A neural network's effectiveness depends on minimizing the loss function's total value (the error). The choice of loss function depends on the use case we are trying to solve: for regression we can think of MSE; for classification, the cross-entropy family, etc.
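A minimal sketch of the two loss functions just mentioned (my own NumPy implementations, with made-up example values):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error - common for regression
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy - common for classification; eps avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```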
Backward Propagation - The process of tracing steps back from the output layer to the input layer to adjust the weights in a way that reduces the error from the loss function. Starting from the error after forward propagation, we can calculate the derivatives of the loss with respect to the weights of the last layer; these derivatives are called gradients. Gradients from one layer can be used to derive the gradients of the previous layer, and so on until we reach the input layer. This gives us a gradient for every weight; to reduce the error, we subtract the gradient (scaled by a learning rate) from each weight and rerun forward propagation. This allows the model to descend toward a local minimum. Gradient descent is one simple optimization algorithm; there are other types, such as stochastic gradient descent, chosen based on the cost function, learning rate & regularization used.
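Full multi-layer backpropagation is too long to show here, but the core gradient-descent step is the same in a single-layer model. Below is a sketch that trains a logistic-regression-style single node with gradient descent; the toy data, learning rate, and step count are all my own assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features, with an AND-style binary target
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])

rng = np.random.default_rng(1)
w, b = rng.normal(size=2), 0.0  # random starting weights, as in a fresh NN
lr = 0.5                        # learning rate

for step in range(2000):
    y_pred = sigmoid(X @ w + b)     # forward pass
    error = y_pred - y              # derivative of cross-entropy loss w.r.t. z
    grad_w = X.T @ error / len(y)   # gradient for the weights
    grad_b = error.mean()           # gradient for the bias
    w -= lr * grad_w                # descend: subtract the scaled gradient
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))  # predictions move toward [0, 0, 0, 1]
```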
Batch Size - If you use Keras/TensorFlow, etc., you must have seen "batch" being used; it is the number of training examples in one forward/backward pass. The higher the batch size, the more memory is required.
Epoch - One forward pass and one backward pass of all the training examples. The number of training epochs is the number of times the model is exposed to the full training dataset; the sketch below shows how batches and epochs fit together.
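A minimal training-loop skeleton, with dummy data and arbitrary batch/epoch values of my own choosing, to show how the two terms relate:

```python
import numpy as np

X = np.random.rand(100, 3)   # 100 samples, 3 features (dummy data)
y = np.random.rand(100)

batch_size, epochs = 32, 5
n = len(X)

for epoch in range(epochs):              # one epoch = one full pass over the data
    idx = np.random.permutation(n)       # shuffle samples between epochs
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        X_batch, y_batch = X[batch], y[batch]
        # one forward pass + one backward pass would run here, per batch
    print(f"epoch {epoch + 1}: {int(np.ceil(n / batch_size))} batches processed")
```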
I have generalized and simplified the language for ease of understanding, so please take it with a grain of salt. There are multiple variants of neural networks depending on the use case. I hope this serves as a one-stop reference for most neural network terminology; I have also added references to the picture and GIFs, which can be a useful resource if you are interested in a deeper dive.