Baby steps with TensorFlow

Learning TensorFlow has been on my TODO list for a long time and I finally got to it. As Bill Murray wisely pointed out, all I have to do is take one little step at a time and I can do anything:

So let’s start small and do three things:

  1. Create a simple logistic regression model to learn the logical OR function.
  2. Once it works, make sure it fails miserably at modeling the non-linearly separable XOR function.
  3. Save the day with a neural network.

Learning OR using logistic regression

The OR function is really simple. It takes two bits as input and returns 1 if either of them is 1. Visually, it looks like this:

Our goal is to learn a function that assigns a value near 0 to the blue dot and values near 1 to the red dots. Here is a small script that trains a logistic regression model to do just that:

import numpy as np
import tensorflow as tf

# 1. Let's prepare our dataset. The first two columns are the
# input bits and the last column is the label.

dataset = np.array(
    [0, 0, 0,
     0, 1, 1,
     1, 0, 1,
     1, 1, 1]).reshape(4, 3)

instances = dataset[:, 0:2]
labels = dataset[:, 2:3]


# 2. Now that we have a dataset, let's define our model
#
# We will use a standard logistic regression model
x = tf.placeholder(tf.float32, shape=[None, 2]) # placeholder for 2 input bits
w = tf.Variable(tf.random_normal([2, 1]))       # weight of each bit
b = tf.Variable(tf.random_normal([1, 1]))       # bias term
y = tf.sigmoid(tf.matmul(x, w) + b)             # logistic regression formula


# 3. Now that we have a model, let's optimize it using gradient
# descent on the Mean Square Error.

reference = tf.placeholder(tf.float32, shape=[None, 1]) # placeholder for our reference labels
mse = tf.reduce_mean(tf.square(y - reference))          # Mean Square Error

trainer = tf.train.GradientDescentOptimizer(0.05).minimize(mse)


# 4. We are now ready to train.

init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    
    # train for 10K iterations
    for i in range(10000):
        error, _ = session.run([mse, trainer], feed_dict={x: instances, reference: labels})
        print("Iteration {}, MSE {:.5f}".format(i, error))

    w_value, b_value, y_value = session.run(
        [w, b, y], feed_dict={x: instances})

    print("Final model w={}, b={}".format(w_value, b_value))
    print("Predictions on the training set:\n {}".format(y_value))

Let's break down the different parts of the script:

  1. Dataset Creation TensorFlow plays very nicely with NumPy matrices. We created a feature matrix (instances) where each row contains an instance and a label matrix (labels) where each row contains a label.
  2. Model Creation Our model is defined using three kinds of nodes: placeholders for observed variables, Variables for the parameters that are learned, and computation nodes to build the model itself.
  3. Learning Objective Definition The learning objective defines an error that we want to optimize. In this small script, we added one more placeholder for the reference labels and a new computation node to compute the MSE (Mean Square Error). Optimization is done using gradient descent with a learning rate of 0.05.
  4. Learning Routine This is where we iterate over our training dataset to learn our model. In our example, we train for 10,000 iterations and then stop (a small prediction sketch follows below).
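To make the last step concrete, here is a minimal usage sketch (my own, not part of the original script) showing how we could query the trained model on a new instance while the session is still open; new_instance is just an illustrative name:

# Hypothetical usage sketch: still inside the "with tf.Session()" block,
# we can feed any instance through the same placeholder, e.g. (1, 0).
new_instance = np.array([[1, 0]], dtype=np.float32)
prediction = session.run(y, feed_dict={x: new_instance})
print("P(OR(1, 0) = 1) = {:.3f}".format(prediction[0][0]))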

Running this script will produce an output similar to:

Iteration 1, MSE 0.46215
Iteration 2, MSE 0.45954
Iteration 3, MSE 0.45691
[snip]
Iteration 9998, MSE 0.00613
Iteration 9999, MSE 0.00613
Final model w=[[ 4.55628872][ 4.55603266]], b=[[-2.00899649]]
Predictions on the training set: [[ 0.11826158][ 0.92737412][ 0.92739147][ 0.99917823]]

Looks like success! Our MSE decreased over the iterations, and the model predicts values near 0 for (0, 0) and values near 1 for (1, 0), (0, 1) and (1, 1).

Here is what the model predictions look like as we iterate during training (code):

We can also run it on the AND function and get the expected results:
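The only change I would expect this variant to need is the label column of the dataset, something like this (a sketch, not from the original post):

# Assumed AND dataset: the label is 1 only when both input bits are 1.
dataset = np.array(
    [0, 0, 0,
     0, 1, 0,
     1, 0, 0,
     1, 1, 1]).reshape(4, 3)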

Neural networks to the rescue

Logistic regression is all fun and games until we try to learn a non-linearly separable function like XOR:

Since no line can separate our labels, the model cannot learn anything and assigns 0.5 to all instances.
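For reference, the XOR dataset only differs from the OR dataset in its last label (again a sketch, not from the original post):

# Assumed XOR dataset: the label is 1 only when exactly one input bit is 1.
dataset = np.array(
    [0, 0, 0,
     0, 1, 1,
     1, 0, 1,
     1, 1, 0]).reshape(4, 3)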

To deal with such problems, we add a hidden layer with a non-linear activation function, replacing section 2 of our code with:

x = tf.placeholder(tf.float32, shape=[None, 2]) # placeholder for 2 input bits
b = tf.Variable(tf.random_normal([1, 4]))       # bias of the hidden layer
w1 = tf.Variable(tf.random_normal([2, 4]))      # weights from input to hidden layer
h = tf.tanh(tf.matmul(x, w1) + b)               # hidden layer with tanh activation
w2 = tf.Variable(tf.random_normal([4, 1]))      # weights from hidden layer to output
y = tf.sigmoid(tf.matmul(h, w2))                # logistic output layer

where h is the hidden layer with 4 neurons using the tanh activation function.

Results are much more satisfying:

Our model is now powerful enough to handle the non-linearity.

Conclusion

Coding these toy examples was a real pleasure. TensorFlow is expressive, compact and leaves us in control of the training session.

I am still searching for my style though. I tried refactoring the code into reusable pieces and the result was less than satisfying: the cost of the abstraction was always higher than simple copy/pasting. I am also uneasy about the coupling of variable definition and initialization, as in this example:

w = tf.Variable(tf.random_normal([2, 1]))

where we specify both the shape of the variable and the use of a normal distribution to initialize it.
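One alternative I have come across (a sketch assuming the TensorFlow 1.x API, not something the script above uses) is tf.get_variable, which at least takes the shape and the initializer as separate arguments:

# Sketch: shape and initializer passed as separate arguments.
w = tf.get_variable("w", shape=[2, 1],
                    initializer=tf.random_normal_initializer())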

That being said, the experience was very enjoyable. It was refreshing to get so much done in so few lines of code.
