Neural Learning with Tensorflow2.0 Part-2 (Overview of Gradient Descent and building simple model with Tensorflow)

In Part 1 we covered the basics of neural networks, how the perceptron and multi-layer perceptron models can be represented, and the use of some common activation functions.

In this post I would like to explain cost functions, gradient descent and backpropagation, and then work through an example by building a simple model with TensorFlow.

Let's say we have built a neural network for a classification problem and predicted an output. Two questions naturally come up:

How do we evaluate the output? And how do we update the weights and biases?

The answer to the first question is to use a cost function.

We take the estimated/predicted output and compare it to the true value of the label. The cost function, also called the loss function, outputs a single value, often an average over the training examples. We can keep track of this value to monitor model performance.

For example, let 'y' be the true value and 'a' the model prediction. One very common cost function is the quadratic cost function, represented as

C(W) = (1/2n) Σ_x ||y(x) - a^L(x)||^2

For some input x, y(x) is the true value and we subtract our predicted value a^L(x) (here L denotes the last layer, so a^L is the activation output of the Lth layer). Squaring the difference gives an absolute measure of the error. Remember that the activation a^L depends on all the weights and biases, so we need to choose weights that minimize the cost function C(W).
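
As a quick numerical illustration (a minimal NumPy sketch; the sample values are made up for this example), the quadratic cost over a small batch of predictions can be computed like this:

import numpy as np

# True values y(x) and last-layer predictions a^L(x) for a small batch
y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([2.1, 3.8, 6.3])

n = len(y_true)
# Quadratic cost: C = 1/(2n) * sum over x of (y(x) - a^L(x))^2
cost = np.sum((y_true - y_pred) ** 2) / (2 * n)
print(cost)  # 0.0233...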

[Figure: cost C(W) plotted against a weight, with the minimum marked as point 'B']

We start with a random value of the weights and compute the cost function, then update the weights until we reach point 'B', which is the minimum. If we take smaller steps, model training takes longer; if we take larger steps, we may overshoot the minimum. So we need to be careful when choosing this step size, known as the learning rate. We can also be clever about it: start with a larger step size and shrink it as the slope gets closer to zero. This is known as adaptive gradient descent. ADAM, a method for stochastic optimization, is a much more efficient way of searching for these minima.
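
In Keras the step size is exposed as the optimizer's learning_rate argument. A minimal sketch with Adam (the value 0.001 is simply Adam's usual default, not something prescribed in this article):

from tensorflow import keras

# The step size is the optimizer's learning rate; Adam adapts its steps per parameter.
optimizer = keras.optimizers.Adam(learning_rate=0.001)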

For classification problems, instead of the quadratic cost function we use the cross-entropy loss function. Basically we assume the model predicts a probability p(y=i) for each class i = 1, 2, ..., C. For binary classification this reduces to -(y log(p) + (1-y) log(1-p)), which measures how close the model's predicted probabilities are to the true labels.
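
As an illustration, here is a minimal NumPy sketch of the binary cross-entropy for a single prediction (the function name and sample probabilities are just for this example):

import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)     # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(binary_cross_entropy(1, 0.9))  # small loss: confident and correct
print(binary_cross_entropy(1, 0.1))  # large loss: confident but wrong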

Coming to our second question, how do we update the weights and biases? We already discussed that we can achieve this with gradient descent. But in a complex network there are different numbers of neurons in each layer, and based on the cost computed at the last layer we back propagate and update the weights of the neurons in all the layers. This is where the complexity lies; I believe understanding backpropagation is the hardest part of deep learning. Note that gradient descent and backpropagation are not the same thing: backpropagation is the algorithm for computing the gradient. The procedure can be summarized as follows (a minimal code sketch follows the list):

  1. Initialize random weights
  2. Loop until convergence
  3. Compute the gradient: differentiate the loss function J(θ) with respect to the weights of the model (θ)
  4. Update the weights and biases using the partial derivatives ∂C/∂w and ∂C/∂b
  5. Return the weights
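
Here is a minimal NumPy sketch of that loop for a simple linear model trained with the quadratic cost (the toy data, learning rate and number of steps are assumptions made for this example, not the exact setup used later):

import numpy as np

# Toy data for y = x1 + x2, the same task as the TensorFlow example below
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = X.sum(axis=1)          # targets: 3, 3, 7, 7

# 1. Initialize random weights and bias
rng = np.random.default_rng(0)
w = rng.normal(size=2)
b = 0.0
learning_rate = 0.05       # step size

# 2. Loop until (approximately) converged
for step in range(2000):
    pred = X @ w + b                      # model prediction
    error = pred - y

    # 3. Compute the gradients dC/dw and dC/db of the quadratic cost
    grad_w = X.T @ error / len(y)
    grad_b = error.mean()

    # 4. Update the weights and bias: take a step down the slope
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# 5. Return / inspect the learned parameters
print(w, b)   # w ends up close to [1, 1] and b close to 0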

The following graph visualizes this:

[Figure: gradient descent taking steps down the loss surface toward the minimum]

The algorithm takes steps down the loss surface until the model converges to some optimal parameters.


Let's quickly jump into writing some code in TensorFlow 2.0 to build a simple model.

Building Models in TensorFlow

To understand the syntax, let's build a very simple neural network model that computes the sum of 2 input values.

Step 1 : Imports

import tensorflow as tf
import numpy as np
from tensorflow import keras

Step 2 : Creating train and test data

# Create training data: pairs (i, i) with target i + i

train_data = np.array([[1.0,1.0]])  # two inputs: 1.0, 1.0
train_targets = np.array([2.0])     # target: 1.0 + 1.0 = 2.0


for i in range(3,10000,2):
    train_data= np.append(train_data,[[i,i]],axis=0)
    train_targets= np.append(train_targets,[i+i])

# Similarly create test data

test_data = np.array([[2.0,2.0]])
test_targets = np.array([4.0])


for i in range(4,8000,4):

    test_data = np.append(test_data,[[i,i]],axis=0)
    test_targets = np.append(test_targets,[i+i] )
    


Step 3. Creating Model

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(5,activation='relu'),
    keras.layers.Dense(1,activation='relu'),
    keras.layers.Dense(1)
])

model.compile(optimizer='adam', 
              loss='mse',
              metrics=['mae'])


model.fit(train_data, train_targets, epochs=5, batch_size=1)

The first layer is the input layer, which expects an input array of shape (2,). We then added a dense layer of 5 neurons with 'relu' activation, followed by another dense layer with a single 'relu' neuron, and finally a single output neuron with no activation function.

The model we built looks like this:

[Figure: network diagram with 2 inputs, a 5-neuron hidden layer, a 1-neuron layer, and a single output neuron]
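
You can also inspect the same architecture programmatically; model.summary() prints each layer with its output shape and parameter count:

model.summary()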

After training, let's predict with some sample numbers:

# Input 10,2, Output should be 12

model.predict([[10,2]])

Result : array([[21.646023]], dtype=float32)

We can see the result is 21.64, which is certainly not the right answer (12).

Let's increase the number of neurons and train the model again.

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(50,activation='relu'),
    keras.layers.Dense(20,activation='relu'),
    keras.layers.Dense(1)
])


model.compile(optimizer='adam', 
              loss='mse',
              metrics=['mae'])


model.fit(train_data, train_targets, epochs=5, batch_size=1)

So we have used 50 and 20 neurons in the 2nd and 3rd layers.

After training, let's predict with the same input:

# Input 10,2, Output should be 12

model.predict([[10,2]])



Result : array([[12.811337]], dtype=float32)

The result is 12.81, so the model is performing far better than before.

You can play with the number of neurons and the activation functions and see how the model performance changes.
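
Since we already created test_data and test_targets in Step 2, we can also check the trained model against them; model.evaluate returns the 'mse' loss and the 'mae' metric we passed to compile:

# Evaluate on the held-out test data created in Step 2
test_loss, test_mae = model.evaluate(test_data, test_targets, verbose=0)
print(test_loss, test_mae)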

Step 4 : Saving the trained model

You can save the trained model to a file and reload it later without having to rebuild or retrain it.

model.save('first_test_model.h5')

# Loading model

from tensorflow.keras.models import load_model

savedModel = load_model("first_test_model.h5")

savedModel.predict([[10, 2]])


In the next part, I will show what a TensorFlow model graph looks like in Neo4j and Linkurious.

Thanks for reading !!
