Neural Learning with TensorFlow 2.0, Part 2 (Overview of Gradient Descent and Building a Simple Model with TensorFlow)
In Part 1 we covered the basics of neural networks: how the perceptron and multi-layer perceptron models can be represented, and the use of some common activation functions.
In this post I would like to explain the cost function, gradient descent and backpropagation, and then walk through an example by building a model with TensorFlow.
Let's say we have built a neural network for a classification problem and predicted an output. Two questions naturally come up:
How do we evaluate the output? And how do we update the weights and biases?
The answer to the first question is to use a cost function.
We take the estimated/predicted output and compare it to the true values of the labels. The cost function, also called the loss function, outputs a single value, often an average over the training examples. We can track this value to monitor model performance.
For example, let 'y' be the true value and 'a' represent the model prediction. One very common cost function is the quadratic cost function, which can be written as C(w, b) = (1/2n) Σ_x ||y(x) − a^L(x)||².
For an input x, y(x) is the true value and we subtract our predicted value from it (here L denotes the last layer, so a^L is the activation output of the Lth layer). We square the difference so that the error is always a positive measurement. Remember that the activation a^L depends on all the weights and biases, so we need to choose weights that minimize the cost function C(W).
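To make this concrete, here is a tiny worked example (a minimal sketch with made-up numbers) of the quadratic cost for a batch of three predictions:

import numpy as np

y = np.array([1.0, 0.0, 1.0])   # true values y(x)
a = np.array([0.9, 0.2, 0.6])   # predicted values a^L(x)

# quadratic cost: half the average of the squared differences
cost = np.mean((y - a) ** 2) / 2.0
print(cost)   # 0.035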
So we start with a random value of the weights and compute the cost function, then keep updating the weights until we reach a point like 'B', the minimum. If we take very small steps, model training takes longer. If we take steps that are too large, we may overshoot the minimum. So we need to be careful when choosing this step size, which is known as the learning rate. We can also be clever about it: start with a larger step size and shrink it as the slope gets closer to zero. This is the idea behind adaptive gradient descent, and 'Adam', a method for stochastic optimization, is a much more efficient way of searching for these minima.
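In Keras the step size is exposed as the learning_rate argument of the optimizer. The values below are only illustrative defaults, not tuned for any particular problem:

from tensorflow import keras

# plain gradient descent with a fixed step size
sgd = keras.optimizers.SGD(learning_rate=0.01)

# Adam adapts the effective step size per parameter during training
adam = keras.optimizers.Adam(learning_rate=0.001)

# either object can then be passed to model.compile(optimizer=...)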
For classification problems, instead of the quadratic cost function we use the cross-entropy loss function. The model is assumed to predict a probability distribution p(y = i) for each class i = 1, 2, ..., C. For binary classification this reduces to -(y log(p) + (1 - y) log(1 - p)), which measures how close the model's predicted probabilities are to the true labels.
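For instance, a minimal sketch of the binary cross-entropy for a single prediction (the numbers are made up):

import numpy as np

y = 1.0   # true label
p = 0.8   # predicted probability of class 1

loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)   # ~0.223 : small, because the prediction agrees with the label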
Coming to our second question, how do we update the weights and biases? We already discussed that we can achieve this with gradient descent. But in a complex network there are many layers, each with a different number of neurons. Based on the cost computed at the last layer, we propagate backwards and update the weights of the neurons in all the layers, and this is where the complexity comes in. I believe understanding backpropagation is the hardest part of deep learning. Note that gradient descent and backpropagation are not the same thing: backpropagation is the algorithm for computing the gradient, which gradient descent then uses to update the weights. The algorithm can be represented as follows (a minimal code sketch comes after the list):
- Initialize random weights
- Loop Until convergence
- Compute gradient : differentiate our loss function, J(θ), with respect to the weights of our model (θ)
- Update weights and biases : ∂C/∂w and ∂C/∂b (partial derivatives)
- Return weights
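Here is a minimal NumPy sketch of that loop, fitting a single weight and bias to the toy relationship y = 2x + 1 with a squared-error cost (the data and hyperparameters are made up for illustration):

import numpy as np

# toy data following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

w, b = np.random.randn(), np.random.randn()   # initialize random weights
lr = 0.05                                     # learning rate (step size)

for step in range(500):                       # loop until (approximate) convergence
    pred = w * x + b
    error = pred - y
    grad_w = np.mean(2 * error * x)           # dC/dw
    grad_b = np.mean(2 * error)               # dC/db
    w -= lr * grad_w                          # update weight
    b -= lr * grad_b                          # update bias

print(w, b)   # should be close to 2 and 1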
The following graph visualizes this process:
The algorithm takes steps down the loss surface until the model converges to some optimal parameters.
Let's quickly jump into writing some code in TensorFlow 2.0 to build a simple model.
Building Models in TensorFlow
To understand the syntax, let's build a very simple neural network model that calculates the sum of 2 input values.
Step 1 : Imports
import tensorflow as tf
import numpy as np
from tensorflow import keras
Step 2 : Creating train and test data
train_data = np.array([[1.0, 1.0]])     # two inputs: 1.0, 1.0
train_targets = np.array([2.0])         # result: 1.0 + 1.0 = 2.0
for i in range(3, 10000, 2):
    train_data = np.append(train_data, [[i, i]], axis=0)
    train_targets = np.append(train_targets, [i + i])

# Similarly create test data
test_data = np.array([[2.0, 2.0]])
test_targets = np.array([4.0])
for i in range(4, 8000, 4):
    test_data = np.append(test_data, [[i, i]], axis=0)
    test_targets = np.append(test_targets, [i + i])
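Repeatedly calling np.append works, but it copies the array on every iteration; if you prefer, the same training set can be built in one shot (an equivalent sketch, not the only way to do it):

odds = np.arange(3, 10000, 2, dtype=float)
train_data = np.vstack([np.array([[1.0, 1.0]]), np.column_stack([odds, odds])])
train_targets = np.concatenate([[2.0], odds + odds])
# the test set can be built the same way from np.arange(4, 8000, 4)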
Step 3 : Creating the model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(5, activation='relu'),
    keras.layers.Dense(1, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(train_data, train_targets, epochs=5, batch_size=1)
The first layer is the input layer, which expects input arrays of shape (2,). Then we added a dense layer of 5 neurons with the 'relu' activation function, another dense layer with a single 'relu' neuron, and finally a single output neuron with no activation function.
The model we built looks like
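If you would rather see this as text, model.summary() prints each layer with its output shape and parameter count:

model.summary()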
After training, let's predict with some sample numbers:
# Input 10, 2 ; the output should be 12
model.predict([[10, 2]])
# Result : array([[21.646023]], dtype=float32)
We can see the result is 21.64, which is certainly not the right answer (12).
Let's increase the number of neurons and train the model again.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(50, activation='relu'),
    keras.layers.Dense(20, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(train_data, train_targets, epochs=5, batch_size=1)
So we now have 50 and 20 neurons in the 2nd and 3rd layers.
After training, let's predict with the same input:
# Input 10, 2 ; the output should be 12
model.predict([[10, 2]])
# Result : array([[12.811337]], dtype=float32)
We get 12.81, so the model is performing far better than before.
You can play with the number of neurons and the activation functions and see how the model performance changes.
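Since we already created test_data and test_targets in Step 2, one straightforward check is to evaluate the trained model on them (a minimal sketch; the exact numbers will vary from run to run):

test_loss, test_mae = model.evaluate(test_data, test_targets)
print(test_mae)   # mean absolute error on the held-out pairs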
Step 4 : Saving the trained model
You can save the trained model and reload it later without re-running the model-building code.
model.save('first_test_model.h5')

# Loading the model
from keras.models import load_model
savedModel = load_model('first_test_model.h5')
savedModel.predict([[10, 2]])
In the next part, I will show what a TensorFlow model looks like in Neo4j and Linkurious.
Thanks for reading !!