A Neural Network In Under 4KB

Introduction

Years ago, I began writing C++ template libraries with the goal of instantiating and running neural networks and machine learning algorithms within embedded systems. I did not have an FPU (floating point unit), GPU (graphics processing unit), or any vectorized instructions (e.g. SIMD) at my disposal, so I needed to ensure I could run these algorithms on only a very simple CPU. Memory to store code and data was also highly constrained in my environment; I didn’t have room to store code and data that I wasn’t going to use.

My inspiration for these libraries was Andrei Alexandrescu’s Modern C++ Design. This book was my first exposure to the power of C++ templates and template metaprogramming. In the book he describes how to design your code using policy classes as template parameters to customize behavior. This fit my needs perfectly since I wanted the code to scale from one problem to the next (e.g. resolution, activation function(s), etc.). I also used this technique to make the code within these libraries as small and efficient as possible.

To demonstrate what my neural network library can do, I chose to implement a classic “hello world” kind of program for neural networks (xor.cpp). Here I will demonstrate how to instantiate and train a neural network to predict the outcome of the mathematical XOR function. We will generate known data, feed it into the neural network, and train it to become an XOR predictor. By inspecting the object file generated during compilation, we can prove to ourselves that the whole neural network and its associated code and data occupy a bit under 4KB in the final image, assuming we turn on compiler optimizations.

Q Format

To make the neural network super small and fast, we can’t afford the luxury of floating point. Plus, my hardware doesn’t have a floating point unit, which makes floating point even more expensive to emulate. We need to use fixed-point numbers to solve this problem. Tinymind contains a C++ template library to help us here, Q Format. See this article for a full explanation of how it works. We declare the size and resolution of our Q-Format type like this:

// Q-Format value type
static constexpr size_t NUMBER_OF_FIXED_BITS = 8;
static constexpr size_t NUMBER_OF_FRACTIONAL_BITS = 8;
typedef tinymind::QValue<NUMBER_OF_FIXED_BITS, NUMBER_OF_FRACTIONAL_BITS, true> ValueType;        

We have now declared a signed fixed-point type which can represent the values -128 to 127.99609375. Our resolution is 2^(-8) or 0.00390625 since we are using 8 bits as the fractional part of our Q-format number. You’ll see from this example that we have plenty of dynamic range and resolution to solve this problem without floating point. The integer and fractional portions of the value are concatenated into a 16-bit field. As an example, the floating point number 1.5 is represented as 0x180 in the Q-format chosen above. This is because the integer portion (1) is shifted up by the number of fractional bits (8), resulting in 0x100. We then OR in the value which represents one half at 8 fractional bits (128, or 0x80 in hex).
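
As a quick sanity check, here is a small, standalone sketch of that encoding. This is plain C++ rather than tinymind’s QValue implementation; it just shows how an integer part and a fractional part pack into a 16-bit Q8.8 value.

#include <cstdint>
#include <cstdio>

int main()
{
    // Pack 1.5 into signed Q8.8 by hand: the integer part is shifted up by
    // the number of fractional bits, the fractional part is ORed into the low byte.
    const int16_t integerPart = 1;
    const int16_t fractionalPart = 128; // 128/256 == 0.5
    const int16_t q = static_cast<int16_t>((integerPart << 8) | fractionalPart);

    std::printf("raw Q8.8 value: 0x%04X\n", q); // prints 0x0180
    std::printf("decoded value : %f\n", q / 256.0); // prints 1.500000
    return 0;
}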

Neural Network

We need to specify our neural network architecture. We do it as follows:

// Neural network architecture
static constexpr size_t NUMBER_OF_INPUTS = 2;
static constexpr size_t NUMBER_OF_HIDDEN_LAYERS = 1;
static constexpr size_t NUMBER_OF_NEURONS_PER_HIDDEN_LAYER = 3;
static constexpr size_t NUMBER_OF_OUTPUTS = 1;        

We want a neural network with 2 input neurons, 1 hidden layer with 3 neurons in it, and 1 output neuron. The 2 input neurons will receive the streams of 1s and 0s from the training data. The output neuron outputs the predicted result of the operation. The hidden layer learns the transfer function which maps the input values to the result, given the known outputs during training.
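
For reference, the function we are asking the network to learn is just exclusive-OR, whose entire truth table is four rows. A trivial standalone snippet (not part of the example code) that prints it:

#include <cstdio>

int main()
{
    // The four input/output pairs the network will be trained to reproduce.
    for (int x = 0; x <= 1; ++x)
    {
        for (int y = 0; y <= 1; ++y)
        {
            std::printf("%d XOR %d = %d\n", x, y, x ^ y);
        }
    }
    return 0;
}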

A graphical view of the neural network is provided here:

[Image: diagram of the neural network with 2 input neurons, 3 hidden neurons, and 1 output neuron]

We typedef the neural network transfer functions policy as well as the neural network itself:

// Typedef of transfer functions for the fixed-point neural network
typedef tinymind::FixedPointTransferFunctions<  ValueType,
                                                RandomNumberGenerator,
                                                tinymind::TanhActivationPolicy<ValueType>,
                                                tinymind::TanhActivationPolicy<ValueType>> TransferFunctionsType;
// typedef the neural network itself
typedef tinymind::MultilayerPerceptron<  ValueType,
                                         NUMBER_OF_INPUTS,
                                         NUMBER_OF_HIDDEN_LAYERS,
                                         NUMBER_OF_NEURONS_PER_HIDDEN_LAYER,
                                         NUMBER_OF_OUTPUTS,
                                         TransferFunctionsType> NeuralNetworkType;

The TransferFunctionsType specifies the ValueType (our Q-format type), the random number generator for the neural network, the activation policy for the Input->Hidden layers, and the activation policy for the Hidden->Output layer. The neural network code within tinymind makes no assumptions about its environment; it needs to be provided with template parameters which encapsulate policy.

Lookup Tables

Since I didn’t have floating point, I also did not have the luxury of the built-in math functions for tanh, sigmoid, etc. To get around this limitation, tinymind generates and utilizes lookup tables (LUTs) for these activation functions (activationTableGenerator.cpp). The generator produces header files for every supported activation function and Q-Format resolution. Because we don’t want to compile in code we’re not going to use, we need to define preprocessor symbols to compile in only the LUT(s) we need. If you look at the compile command you’ll see the following:

-DTINYMIND_USE_TANH_8_8=1        

This instructs the compiler to build the tanh activation LUT for the Q-Format type Q8.8, which is what we’re using in this example. We need to do this for every LUT we plan to use.
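
To make the idea concrete, here is a minimal, standalone sketch of generating a tanh table in Q8.8. This is not tinymind’s actual activationTableGenerator (the table size and input range here are made up for illustration); it just shows how the values end up precomputed so the target never calls tanh at runtime.

#include <cmath>
#include <cstdint>
#include <cstdio>

int main()
{
    // Sample tanh over an illustrative input range [-4, 4) and emit each
    // result as a signed Q8.8 value (multiply by 2^8 and round).
    constexpr int tableSize = 64;
    std::printf("static const int16_t tanhTableQ8_8[%d] = {\n", tableSize);
    for (int i = 0; i < tableSize; ++i)
    {
        const double x = -4.0 + (8.0 * i) / tableSize;
        const int16_t q = static_cast<int16_t>(std::lround(std::tanh(x) * 256.0));
        std::printf("    %d,\n", q);
    }
    std::printf("};\n");
    return 0;
}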

Generating Training Data

We’ll generate the training data programmatically. The function generateXorTrainingValue is called to generate a single training sample. We use the built-in random number generator to generate the 2 inputs: we generate a random number and then AND it with 0x1. If the random number was even, the input will be 0; if it was odd, the input will be 1.

static void generateXorTrainingValue(ValueType& x, ValueType& y, ValueType& z)
{
    const int randX = rand() & 0x1;
    const int randY = rand() & 0x1;
    const int result = (randX ^ randY);

    x = ValueType(randX, 0);
    y = ValueType(randY, 0);
    z = ValueType(result, 0);
}

We feed the data into the neural network, predict an output, and then train if the error is outside of our zero tolerance:

for (auto i = 0; i < TRAINING_ITERATIONS; ++i)
{
    generateXorTrainingValue(values[0], values[1], output[0]);
    testNeuralNet.feedForward(&values[0]);
    error = testNeuralNet.calculateError(&output[0]);
    if (!NeuralNetworkType::NeuralNetworkTransferFunctionsPolicy::isWithinZeroTolerance(error))
    {
        testNeuralNet.trainNetwork(&output[0]);
    }
    testNeuralNet.getLearnedValues(&learnedValues[0]);
}

We’ll train the neural network on every iteration where the prediction error is not within our zero tolerance. The error of the neural network’s predicted value(s) will virtually never be exactly zero, so we need a way to decide when we’re close enough. If we’re close enough to zero error, it’s best to leave the weights alone. The zero-tolerance policy is specified when declaring the transfer functions policy for the neural network. In this example, we’re just using the default values for these policies:

template<
    typename ValueType,
    class TransferFunctionRandomNumberGeneratorPolicy,
    class TransferFunctionHiddenNeuronActivationPolicy,
    class TransferFunctionOutputNeuronActivationPolicy,
    unsigned NumberOfOutputNeurons = 1,
    class TransferFunctionNetworkInitializationPolicy = tinymind::DefaultNetworkInitializer<ValueType>,
    class TransferFunctionErrorCalculatorPolicy = tinymind::MeanSquaredErrorCalculator<ValueType, NumberOfOutputNeurons>,
    class TransferFunctionZeroTolerancePolicy = tinymind::ZeroToleranceCalculator<ValueType> >
struct FixedPointTransferFunctions
{
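
To illustrate how a policy could be swapped, here is a minimal sketch of a custom zero-tolerance policy. The only interface assumed is the static isWithinZeroTolerance() call used in the training loop above, plus the (integer, fractional) constructor used elsewhere in the example; the comparison and subtraction operators on the Q-format type, and the threshold value itself, are assumptions for illustration.

// Hypothetical zero-tolerance policy: treat any error whose magnitude is
// below 4/256 (about 0.0156 in Q8.8) as "close enough" to skip training.
template<typename ValueType>
struct LooseZeroToleranceCalculator
{
    static bool isWithinZeroTolerance(const ValueType& error)
    {
        const ValueType zero(0, 0);
        const ValueType tolerance(0, 4); // 4 fractional counts == 4/256
        const ValueType magnitude = (error < zero) ? (zero - error) : error;

        return magnitude < tolerance;
    }
};

Passing a type like this as the TransferFunctionZeroTolerancePolicy parameter would loosen or tighten the training criterion without touching any other part of the network.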

Compiling The Example

Here I will show how I compiled the example on my Linux box (I’ve also compiled the code on Windows using Visual Studio). To compile the example without adding unwanted files to the repo, I simply create a directory outside the repo to hold the output executable and link directly into it.

mkdir -p ~/xor
g++ -O3 -o ~/xor/xor xor.cpp xornet.cpp ../../cpp/lookupTables.cpp -DTINYMIND_ENABLE_OSTREAMS=1 -DTINYMIND_USE_TANH_8_8=1 -I../../cpp -I../../include        

To run the example, just cd to the output directory (~/xor above) and run the generated executable file.

Plotting Results

After running the example, you will see that you have a text file output, nn_fixed_xor.txt. I have provided a python script (nn_plot.py) within the neural network unit test to plot the results of neural network training. It plots the neural network weights, expected output, predicted output, as well as the calculated error.

[Image: plot of the neural network weights, expected and predicted output, and calculated error over training iterations]

You can see the weights training through time, as well as the error reducing through time, as the predicted value approaches the expected value. Notice the magnitude of the numbers: expected values vary between 0 and 256. This maps to the representation of 0 and 1 for our Q-format type. The representation of 1 in a signed Q8.8 format (8 bits of integer representation, 8 bits of fractional resolution) is 0x100, which is simply the number 1 left-shifted by the number of bits of fractional resolution. Also notice the weights changing. It almost looks like floating point but it’s not; we have 2^(-8) or 0.00390625 of resolution between values.

To use the plotting script, just pass it the path to your results file from running the example program.

Determining Neural Network Size

To more easily determine how much space the neural network code and data occupy in this example, I made sure to place all of the neural network code into its own file, xornet.cpp. To determine how much space code and data take up in this program, we generate and parse the object files produced during compilation. The command line I used to compile xornet.cpp and move the resulting object files is here:

g++ -c xornet.cpp ../../cpp/lookupTables.cpp -O3 -DTINYMIND_USE_TANH_8_8=1 -I../../cpp && mv xornet.o ./output/. && mv lookupTables.o ./output/.        

Change directories to ./output so that we can evaluate the size of the neural network and LUT. We can use the size utility under Linux to inspect the object files:

size lookupTables.o
text    data     bss     dec     hex filename
 192       0       0     192      c0 lookupTables.o
size xornet.o
text    data     bss     dec     hex filename
3326       8     388    3722     e8a xornet.o        

You can see that the total space occupied by this neural network, Q-format support, and LUT was just under 4KB. The neural network’s data occupies 388 bytes (bss), the associated code takes up 3326 bytes plus 8 bytes of initialized data, and the LUT occupies 192 bytes, for a grand total of 3914 bytes. The neural network could be made smaller by saving off the weights and instantiating a new copy of the type as untrainable, but that's another article.

Conclusion

Using tinymind, neural networks can be instantiated using fixed-point numbers so that they take up a very small amount of code and data space. They can be trained to predict the outcome of a basic function in under 2,000 training samples. I hope this inspires you to clone tinymind and give the unit tests and examples a look. Maybe even generate one of your own.
