Baby steps with TensorFlow

Learning TensorFlow has been on my TODO list for a long time and I finally got to it. As Bill Murray wisely pointed out, all I have to do is take one little step at a time and I can do anything:

So let’s start small and do three things:

  1. Create a simple logistic regression model to learn the logical OR function.
  2. Once it works, make sure it fails miserably at modeling the non-linearly separable XOR function.
  3. Save the day with a neural network.

Learning OR using logistic regression

The OR function is really simple. It takes two bits as input and returns 1 if either of them is 1. Visually, it looks like this:

Our goal is to learn a function that assigns a value near 0 to the blue dot and values near 1 to the red dots. Here is a small script that trains a logistic regression model to do just that:

import numpy as np
import tensorflow as tf

# 1. Let's prepare our dataset. The first two columns are the
# input bits and the last column is the label.

dataset = np.array(
    [0, 0, 0,
     0, 1, 1,
     1, 0, 1,
     1, 1, 1]).reshape(4, 3)

instances = dataset[:, 0:2]
labels = dataset[:, 2:3]


# 2. Now that we have a dataset, let's define our model
#
# We will use a standard logistic regression model
x = tf.placeholder(tf.float32, shape=[None, 2]) # placeholder for 2 input bits
w = tf.Variable(tf.random_normal([2, 1]))       # weight of each bit
b = tf.Variable(tf.random_normal([1, 1]))       # bias term
y = tf.sigmoid(tf.matmul(x, w) + b)             # logistic regression formula


# 3. Now that we have a model, let's optimize it using gradient
# descent on the Mean Square Error.

reference = tf.placeholder(tf.float32, shape=[None, 1]) # placeholder for our reference labels
mse = tf.reduce_mean(tf.square(y - reference))          # Mean Square Error

trainer = tf.train.GradientDescentOptimizer(0.05).minimize(mse)


# 4. We are now ready to train.

init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    
    # train for 10K iterations
    for i in range(10000):
        error, _ = session.run([mse, trainer], feed_dict={x: instances, reference: labels})
        print("Iteration {}, MSE {:.5f}".format(i, error))

    w_value, b_value, y_value = session.run(
        [w, b, y], feed_dict={x: instances})

    print("Final model w={}, b={}".format(w_value, b_value))
    print("Predictions on the training set:\n {}".format(y_value))

Let's break down the different parts of the script:

  1. Dataset Creation TensorFlow plays very nicely with NumPy matrices. We created a feature matrix (instances) where each row contains an instance and a label matrix (labels) where each row contains a label.
  2. Model Creation Our model is defined using three kinds of nodes: placeholders for observed variables, Variables for the parameters that are learned, and computation nodes to build the model itself.
  3. Learning Objective Definition The learning objective defines an error that we want to optimize. In this small script, we added one more placeholder for the reference labels and a new computation node to compute the MSE (Mean Square Error). Optimization is done using gradient descent with a learning rate of 0.05.
  4. Learning Routine This is where we iterate over our training dataset to learn our model. In our example, we train for 10,000 iterations and then stop (a small prediction sketch follows below).
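To make the last step concrete, here is a minimal usage sketch (my own, not part of the original script) showing how we could query the trained model on a new instance while the session is still open; new_instance is just an illustrative name:

# Hypothetical usage sketch: still inside the "with tf.Session()" block,
# we can feed any instance through the same placeholder, e.g. (1, 0).
new_instance = np.array([[1, 0]], dtype=np.float32)
prediction = session.run(y, feed_dict={x: new_instance})
print("P(OR(1, 0) = 1) = {:.3f}".format(prediction[0][0]))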

Running this script will produce an output similar to:

Iteration 1, MSE 0.46215
Iteration 2, MSE 0.45954
Iteration 3, MSE 0.45691
[snip]
Iteration 9998, MSE 0.00613
Iteration 9999, MSE 0.00613
Final model w=[[ 4.55628872][ 4.55603266]], b=[[-2.00899649]]
Predictions on the training set: [[ 0.11826158][ 0.92737412][ 0.92739147][ 0.99917823]]

Looks like success! Our MSE decreased over the iterations, and the model predicts values near 0 for (0, 0) and values near 1 for (1, 0), (0, 1) and (1, 1).

Here is what the model predictions look like as we iterate during training (code):

We can also run it on the AND function and get the expected results:
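The only change I would expect this variant to need is the label column of the dataset, something like this (a sketch, not from the original post):

# Assumed AND dataset: the label is 1 only when both input bits are 1.
dataset = np.array(
    [0, 0, 0,
     0, 1, 0,
     1, 0, 0,
     1, 1, 1]).reshape(4, 3)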

Neural networks to the rescue

Logistic regression is all fun and games until we try to learn a non-linearly separable function like XOR:

Since no line can separate our labels, the model cannot learn anything and assigns 0.5 to all instances.
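For reference, the XOR dataset only differs from the OR dataset in its last label (again a sketch, not from the original post):

# Assumed XOR dataset: the label is 1 only when exactly one input bit is 1.
dataset = np.array(
    [0, 0, 0,
     0, 1, 1,
     1, 0, 1,
     1, 1, 0]).reshape(4, 3)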

To deal with such problems, we add a hidden layer with a non-linear activation function, replacing section 2 of our code with:

x = tf.placeholder(tf.float32, shape=[None, 2]) # placeholder for 2 input bits
b = tf.Variable(tf.random_normal([1, 4]))       # bias of the hidden layer
w1 = tf.Variable(tf.random_normal([2, 4]))      # weights from input to hidden layer
h = tf.tanh(tf.matmul(x, w1) + b)               # hidden layer with tanh activation
w2 = tf.Variable(tf.random_normal([4, 1]))      # weights from hidden layer to output
y = tf.sigmoid(tf.matmul(h, w2))                # logistic output layer

where h is the hidden layer with 4 neurons using the tanh activation function.

Results are much more satisfying:

Our model is now powerful enough to handle the non-linearity.

Conclusion

Coding these toy examples was a real pleasure. TensorFlow is expressive, compact and leaves us in control of the training session.

I am still searching for my style though. I tried refactoring the code into reusable pieces and the result was less than satisfying: the cost of the abstraction was always higher than simple copy/pasting. I am also uneasy about the coupling of variable definition and initialization, as in this example:

w = tf.Variable(tf.random_normal([2, 1]))

where we specify both the shape of the variable and the use of a normal distribution to initialize it.
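One alternative I have come across (a sketch assuming the TensorFlow 1.x API, not something the script above uses) is tf.get_variable, which at least takes the shape and the initializer as separate arguments:

# Sketch: shape and initializer passed as separate arguments.
w = tf.get_variable("w", shape=[2, 1],
                    initializer=tf.random_normal_initializer())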

That being said, the experience was very enjoyable. It was refreshing to get so much done in so few lines of code.
