A Friendly Guide to Neural Networks!
Deep Learning has been a rapidly growing field over the last 10 years, because it can solve increasingly complex problems in the real world. In simple words, Deep Learning works somewhat like a human brain: it can learn about music, images, patterns, and more, and that is what lets it solve complex real-world problems.
How does it learn? Internally, it uses neural networks (NNs). Neural networks make it possible to learn from complex data, and even simple neural networks can solve many kinds of problems.
In this post, we will explore the ins and outs of a simple neural network. And by the end, hopefully, you (and I) will have gained a deeper and more intuitive understanding of how neural networks do what they do.
We will learn what a neural network is and how it learns. Before that, we will refresh some of the math behind neural networks.
Basics of a Neural Network:
Vector
Feature Vector - an n-dimensional vector that holds the information (features) describing an example and its label
Dot Product
Matrix Multiplication
Exponent (and its graph)
Logarithm (and its graph)
Linear functions
Non-Linear functions
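To make this refresher concrete, here is a minimal NumPy sketch (all the array values below are made up purely for illustration) covering a feature vector, the dot product, matrix multiplication, exponents and logarithms, and the difference between a linear and a non-linear function:

```python
import numpy as np

# A feature vector: an n-dimensional vector describing one example
x = np.array([1.0, 2.0, 3.0])

# Dot product of two vectors: multiply element-wise, then sum
w = np.array([0.5, -1.0, 0.25])
print(np.dot(w, x))                      # 0.5*1 + (-1)*2 + 0.25*3 = -0.75

# Matrix multiplication: a 2x3 matrix times a 3-vector gives a 2-vector
W = np.array([[0.5, -1.0, 0.25],
              [0.1,  0.2, 0.30]])
print(W @ x)                             # [-0.75, 1.4]

# Exponent and logarithm are inverses of each other
print(np.exp(1.0), np.log(np.exp(1.0)))  # e and 1.0

# Linear vs. non-linear: a straight line vs. a curve (sigmoid)
linear = 2.0 * x + 1.0
non_linear = 1.0 / (1.0 + np.exp(-x))    # sigmoid squashes values into (0, 1)
print(linear, non_linear)
```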
Now you are ready to understand a simple neural network. Let's break it down!
Let’s start with a really high-level overview so we know what we are working with. Neural networks are multi-layer networks of neurons (the blue and magenta nodes in the chart below) that we use to classify things, make predictions, etc. Below is the diagram of a simple neural network with five inputs, five outputs, and two hidden layers of neurons.
The arrows that connect the dots show how all the neurons are interconnected and how data travels from the input layer all the way through to the output layer.
The lines connect each neuron to the neurons around it: every node is connected to every node in the next layer, and each node shares its output with all of its neighbor nodes in that next layer.
At the same time, the neurons themselves are independent of each other. They have no direct relationship with the other nodes in their own layer; each node simply receives the outputs of the previous layer's nodes as its inputs and does nothing else with the other nodes.
Each node is individual: it just takes the inputs coming from its neighbor nodes. This is very important, because it means each node learns individually from the inputs it receives from those neighbor nodes.
Let me break down the entire architecture.
I will take a single node, and we will learn what each node does and how it learns.
A single node is referred to as a perceptron. The first image shows a human neuron. Human neurons contain dendrites, a nucleus, and an axon, and this structure is what helps us learn new things. In the same way, each perceptron learns through its weights, bias, and loss function. Don't worry, we will look at each of these clearly!
A single perceptron contains weights, a bias, a summation, and a non-linear function.
σ (sigma) - activation function, b - bias
X - input, W - weights
A single perceptron performs two operations: a summation of its weighted inputs, and the application of a non-linearity (sigma) to that sum.
The output of this node is then sent to the next node, and the same process continues until the last node.
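Here is a minimal sketch of a single perceptron in Python with NumPy (the input values are made up just for illustration). It does exactly the two operations described above: a weighted summation plus a bias, followed by the sigmoid non-linearity.

```python
import numpy as np

def sigmoid(z):
    # The non-linear function (sigma): squashes any value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # Operation 1: weighted summation of the inputs plus the bias
    z = np.dot(w, x) + b
    # Operation 2: apply the non-linearity to the summed value
    return sigmoid(z)

x = np.array([0.5, 1.0, -2.0])   # inputs coming from the neighbor nodes
w = np.random.randn(3)           # weights, initialized randomly
b = 0.0                          # bias, starting at 0
print(perceptron(x, w, b))       # this output is passed on to the next node
```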
I hope you understand this; this is forward propagation, which means values flow from the input toward the output. Backpropagation is the reverse: values flow from the output back toward the input, and the whole process is repeated over many loops (iterations).
In forward propagation, the weights are initialized randomly for each node, the bias starts at 0 for each node, each node performs its operations, and the last layer gives the output.
Let’s Add a Bit of Complexity Now
Now that we have our basic framework, let’s go back to our slightly more complicated neural network and see how it goes from input to output. Here it is again for reference:
The first hidden layer comprises two neurons. So, connecting all five inputs to the neurons in Hidden Layer 1, we need ten connections. The next image (below) shows just the connections between Input 1 and Hidden Layer 1.
Note our notation for the weights that live in the connections: W1,1 denotes the weight that lives in the connection between Input 1 and Neuron 1, and W1,2 denotes the weight in the connection between Input 1 and Neuron 2. So the general notation that I will follow is that Wa,b denotes the weight on the connection between Input a (or Neuron a) and Neuron b.
Now let’s calculate the outputs of each neuron in Hidden Layer 1 (known as the activations). We do so using the following formulas (W denotes weight, In denotes input).
Z1 = W1,1*In1 + W2,1*In2 + W3,1*In3 + W4,1*In4 + W5,1*In5 + Bias_Neuron1
Neuron 1 Activation = Sigmoid(Z1)
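As a quick worked example with made-up numbers: if the inputs are [1, 0, 2, 1, 3], the weights feeding Neuron 1 are [0.5, -0.2, 0.1, 0.3, -0.4], and the bias is 0.1, then Z1 = 0.5*1 - 0.2*0 + 0.1*2 + 0.3*1 - 0.4*3 + 0.1 = -0.1, and Neuron 1's activation is Sigmoid(-0.1) ≈ 0.475.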
We can use matrix math to summarize this calculation (remember our notation rules — for example, W4,2 denotes the weight that lives in the connection between Input 4 and Neuron 2):
For any layer of a neural network where the prior layer is m elements deep and the current layer is n elements deep, this generalizes to:
[W] @ [X] + [Bias] = [Z]
Where [W] is your n by m matrix of weights (the connections between the prior layer and the current layer), [X] is your m by 1 matrix of either starting inputs or activations from the prior layer, [Bias] is your n by 1 matrix of neuron biases, and [Z] is your n by 1 matrix of intermediate outputs. In the previous equation, I follow Python notation and use @ to denote matrix multiplication. Once we have [Z], we can apply the activation function (sigmoid in our case) to each element of [Z] and that gives us our neuron outputs (activations) for the current layer.
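As a minimal sketch of that equation in NumPy (the values are random, and the layer sizes match Hidden Layer 1 of our example network: m = 5 inputs, n = 2 neurons):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n = 5, 2                 # prior layer is m deep, current layer is n deep
X = np.random.rand(m, 1)    # m by 1: starting inputs (or prior-layer activations)
W = np.random.randn(n, m)   # n by m: W[b, a] is the weight from Input a to Neuron b (0-indexed)
Bias = np.zeros((n, 1))     # n by 1: one bias per neuron in the current layer

Z = W @ X + Bias            # @ is matrix multiplication, exactly as in the equation above
A = sigmoid(Z)              # apply the activation element-wise to get the neuron outputs
print(Z.shape, A.shape)     # both (2, 1)
```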
Finally, before we move on, let’s visually map each of these elements back onto our neural network chart to tie it all up ([Bias] is embedded in the blue neurons).
By repeatedly calculating [Z] and applying the activation function to it for each successive layer, we can move from input to output. This process is known as forward propagation. Now that we know how the outputs are calculated, it’s time to evaluate the quality of the outputs and train our neural network.
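Repeating that single-layer calculation layer by layer gives the whole forward pass. Here is a minimal sketch for the example network above (I am assuming two neurons in each hidden layer, since only the first hidden layer's size is stated, and the weights are random just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

layer_sizes = [5, 2, 2, 5]   # inputs, hidden layer 1, hidden layer 2, outputs

# One (W, Bias) pair per layer after the input layer: W is n by m, Bias is n by 1
params = [(np.random.randn(n, m), np.zeros((n, 1)))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    A = X
    for W, Bias in params:
        Z = W @ A + Bias     # intermediate output of the current layer
        A = sigmoid(Z)       # its activation becomes the next layer's input
    return A

X = np.random.rand(5, 1)     # the five starting inputs
print(forward(X, params))    # the five outputs of the final layer
```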
Back Propagation:
Neural networks learn during backpropagation using optimizers. Optimizers help to reduce the cost function (the difference between the actual value and the predicted output value). If that difference is very high, the optimizer works to bring the cost function value down.
Optimizers are a very vast topic, but here we will just build some intuition for them. Understand that an optimizer helps to reduce the value of the cost function, and it also helps to make the learning faster or slower based on the cost value.
Actually, backpropagation is essentially gradient descent, but with many functions composed together in a multidimensional space, so we use the chain rule to compute the gradients and update the weights.
Attention! Read carefully!
Step 1: Forward pass (compute the output and find the cost value)
The formula for the forward pass is the same one we saw earlier: Z = W*X + Bias, followed by the activation, ŷ = Sigmoid(Z).
Here, X is a constant (the input data), while W is the learnable parameter.
Step 2: Backward pass (update the weights using the chain rule)
The weight update formula is: W_new = W_old - η * (∂L/∂W)
Here η (eta) is the learning rate, and ∂L/∂W is the derivative of the loss with respect to the weight.
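For example (made-up numbers): if the current weight is 0.5, the learning rate η is 0.1, and the derivative of the loss with respect to that weight is 0.8, then the updated weight is 0.5 - 0.1 * 0.8 = 0.42.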
Let's see how to calculate the derivative of the loss (cost) value. If you look carefully, the loss is a nested (composite) function: the loss L depends on the prediction ŷ, the prediction depends on Z, and Z depends on the weights W. So we take partial derivatives and apply the chain rule:
∂L/∂W = (∂L/∂ŷ) * (∂ŷ/∂Z) * (∂Z/∂W)
Here ∂L/∂W is the derivative of the loss with respect to the weights, and the loss itself is built from (ŷ - y), the difference between the predicted and actual values. That is the formula for the derivative of the loss.
After calculating the derivatives, the weight update occurs for every neuron in the network, and then the forward and backward passes are repeated for however many iterations you have set.
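To tie Step 1 and Step 2 together, here is a minimal training-loop sketch for a single sigmoid neuron with a squared-error loss (the data, learning rate, and number of iterations are made-up assumptions for illustration); the gradient lines are exactly the chain rule described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up training example: one input vector x with target label y
x = np.array([0.5, -1.0, 2.0])
y = 1.0

W = np.random.randn(3)   # weights initialized randomly
b = 0.0                  # bias starts at 0
eta = 0.1                # learning rate (eta)

for iteration in range(100):             # however many iterations you have set
    # Step 1: forward pass - compute the output and the cost value
    z = np.dot(W, x) + b
    y_hat = sigmoid(z)
    loss = 0.5 * (y_hat - y) ** 2        # cost built from (y_hat - y)

    # Step 2: backward pass - chain rule: dL/dW = dL/dy_hat * dy_hat/dz * dz/dW
    dL_dyhat = y_hat - y
    dyhat_dz = y_hat * (1.0 - y_hat)     # derivative of the sigmoid
    grad_W = dL_dyhat * dyhat_dz * x
    grad_b = dL_dyhat * dyhat_dz

    # Weight update: W_new = W_old - eta * dL/dW
    W = W - eta * grad_W
    b = b - eta * grad_b

print(sigmoid(np.dot(W, x) + b))         # prediction after training, close to y = 1
```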
I hope you understood!
Did you like this article? Don't forget to share:
Look at our latest articles:
Activation Functions
Gentle Introduction to Inferential Statistics!
Name: R.Aravindan
Company: Artificial Neurons.AI
Position: Content Writer