Neural Learning with TensorFlow 2.0, Part 2 (Overview of Gradient Descent and Building a Simple Model with TensorFlow)
In Part 1 we covered the basics of neural networks: how the perceptron and multi-layer perceptron models can be represented, and the use of some common activation functions.
In this post I would like to explain the cost function, gradient descent and backpropagation, and then walk through an example by building a model with TensorFlow.
Let's say we have built a neural network for a classification problem and predicted an output. Two questions naturally come up:
How do we evaluate the output? And how do we update the weights and biases?
The answer to the first question is to use a cost function.
We take the estimated/predicted output and compare it to the true values of the labels. The cost function, also called the loss function, outputs a single value, often an average over the training examples. We can track this value to monitor model performance.
For example, let 'y' be the true value and 'a' represent the model prediction. One very common cost function is the quadratic cost function, which can be written as C(w, b) = (1/2n) Σ_x ||y(x) − a^L(x)||².
For an input x, y(x) is the true value and we subtract our predicted value from it (here L denotes the last layer, so a^L is the activation output of the Lth layer). We square the difference so that the error is always a positive measurement. Remember that the activation a^L depends on all the weights and biases, so we need to choose weights that minimize the cost function C(W).
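To make this concrete, here is a tiny worked example (a minimal sketch with made-up numbers) of the quadratic cost for a batch of three predictions:

import numpy as np

y = np.array([1.0, 0.0, 1.0])   # true values y(x)
a = np.array([0.9, 0.2, 0.6])   # predicted values a^L(x)

# quadratic cost: half the average of the squared differences
cost = np.mean((y - a) ** 2) / 2.0
print(cost)   # 0.035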
So we start with a random value of the weights and compute the cost function, then keep updating the weights until we reach a point like 'B', the minimum. If we take very small steps, model training takes longer. If we take steps that are too large, we may overshoot the minimum. So we need to be careful when choosing this step size, which is known as the learning rate. We can also be clever about it: start with a larger step size and shrink it as the slope gets closer to zero. This is the idea behind adaptive gradient descent, and 'Adam', a method for stochastic optimization, is a much more efficient way of searching for these minima.
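In Keras the step size is exposed as the learning_rate argument of the optimizer. The values below are only illustrative defaults, not tuned for any particular problem:

from tensorflow import keras

# plain gradient descent with a fixed step size
sgd = keras.optimizers.SGD(learning_rate=0.01)

# Adam adapts the effective step size per parameter during training
adam = keras.optimizers.Adam(learning_rate=0.001)

# either object can then be passed to model.compile(optimizer=...)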
For classification problems, instead of the quadratic cost function we use the cross-entropy loss function. The model is assumed to predict a probability distribution p(y = i) for each class i = 1, 2, ..., C. For binary classification this reduces to -(y log(p) + (1 - y) log(1 - p)), which measures how close the model's predicted probabilities are to the true labels.
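For instance, a minimal sketch of the binary cross-entropy for a single prediction (the numbers are made up):

import numpy as np

y = 1.0   # true label
p = 0.8   # predicted probability of class 1

loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)   # ~0.223 : small, because the prediction agrees with the label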
Coming to our second question, how do we update the weights and biases? We already discussed that we can achieve this with gradient descent. But in a complex network there are many layers, each with a different number of neurons. Based on the cost computed at the last layer, we propagate backwards and update the weights of the neurons in all the layers, and this is where the complexity comes in. I believe understanding backpropagation is the hardest part of deep learning. Note that gradient descent and backpropagation are not the same thing: backpropagation is the algorithm for computing the gradient, which gradient descent then uses to update the weights. The algorithm can be represented as follows (a minimal code sketch comes after the list):
- Initialize random weights
- Loop Until convergence
- Compute gradient : differentiate our loss function, J(θ), with respect to the weights of our model (θ)
- Update weights and biases : ∂C/∂w and ∂C/∂b (partial derivatives)
- Return weights
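Here is a minimal NumPy sketch of that loop, fitting a single weight and bias to the toy relationship y = 2x + 1 with a squared-error cost (the data and hyperparameters are made up for illustration):

import numpy as np

# toy data following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

w, b = np.random.randn(), np.random.randn()   # initialize random weights
lr = 0.05                                     # learning rate (step size)

for step in range(500):                       # loop until (approximate) convergence
    pred = w * x + b
    error = pred - y
    grad_w = np.mean(2 * error * x)           # dC/dw
    grad_b = np.mean(2 * error)               # dC/db
    w -= lr * grad_w                          # update weight
    b -= lr * grad_b                          # update bias

print(w, b)   # should be close to 2 and 1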
The following graph visualizes this process:
The algorithm takes steps down the loss surface until the model converges to some optimal parameters.
Let's quickly jump into writing some code in TensorFlow 2.0 to build a simple model.
Building Models in TensorFlow
To understand the syntax, let's build a very simple neural network model that calculates the sum of 2 input values.
Step 1 : Imports
import tensorflow as tf
import numpy as np
from tensorflow import keras
Step 2 : Creating train and test data
train_data = np.array([[1.0, 1.0]])     # two inputs: 1.0, 1.0
train_targets = np.array([2.0])         # result: 1.0 + 1.0 = 2.0
for i in range(3, 10000, 2):
    train_data = np.append(train_data, [[i, i]], axis=0)
    train_targets = np.append(train_targets, [i + i])

# Similarly create test data
test_data = np.array([[2.0, 2.0]])
test_targets = np.array([4.0])
for i in range(4, 8000, 4):
    test_data = np.append(test_data, [[i, i]], axis=0)
    test_targets = np.append(test_targets, [i + i])
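Repeatedly calling np.append works, but it copies the array on every iteration; if you prefer, the same training set can be built in one shot (an equivalent sketch, not the only way to do it):

odds = np.arange(3, 10000, 2, dtype=float)
train_data = np.vstack([np.array([[1.0, 1.0]]), np.column_stack([odds, odds])])
train_targets = np.concatenate([[2.0], odds + odds])
# the test set can be built the same way from np.arange(4, 8000, 4)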
Step 3 : Creating the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(5, activation='relu'),
    keras.layers.Dense(1, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(train_data, train_targets, epochs=5, batch_size=1)
The first layer is the input layer, which expects input arrays of shape (2,). Then we added a dense layer of 5 neurons with the 'relu' activation function, another dense layer with a single 'relu' neuron, and finally a single output neuron with no activation function.
The model we built looks like
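If you would rather see this as text, model.summary() prints each layer with its output shape and parameter count:

model.summary()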
After training, let's predict with some sample numbers:
# Input 10, 2 ; the output should be 12
model.predict([[10, 2]])
# Result : array([[21.646023]], dtype=float32)
We can see the result is 21.64, which is certainly not the right answer (12).
Let's increase the number of neurons and train the model again.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(20, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(train_data, train_targets, epochs=5, batch_size=1)
So we now have 50 and 20 neurons in the 2nd and 3rd layers.
After training, let's predict with the same input:
# Input 10, 2 ; the output should be 12
model.predict([[10, 2]])
# Result : array([[12.811337]], dtype=float32)
We get 12.81, so the model is performing far better than before.
You can play with the number of neurons and the activation functions and see how the model performance changes.
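Since we already created test_data and test_targets in Step 2, one straightforward check is to evaluate the trained model on them (a minimal sketch; the exact numbers will vary from run to run):

test_loss, test_mae = model.evaluate(test_data, test_targets)
print(test_mae)   # mean absolute error on the held-out pairs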
Step 4 : Saving the trained model
You can save the trained model and reload it later without re-running the model-building code.
model.save('first_test_model.h5')

# Loading the model
from keras.models import load_model
savedModel = load_model('first_test_model.h5')
savedModel.predict([[10, 2]])
In the next part, I will show what a TensorFlow model looks like in Neo4j and Linkurious.
Thanks for reading !!