NanoNeuron — 7 simple JS functions that explain how machines learn

7 simple JavaScript functions that will give you a feeling of how machines can actually "learn".

TL;DR

NanoNeuron is an over-simplified version of the Neuron concept from Neural Networks. NanoNeuron is trained to convert temperature values from Celsius to Fahrenheit.

The NanoNeuron.js code example contains 7 simple JavaScript functions (model prediction, cost calculation, forward and backward propagation, training) that will give you a feeling of how machines can actually "learn". No 3rd-party libraries, no external data-sets or dependencies, only pure and simple JavaScript functions.

These functions are by no means a complete guide to machine learning. A lot of machine learning concepts are skipped and over-simplified here! This simplification is done on purpose to give the reader a really basic understanding and feeling of how machines can learn, and ultimately to make it possible for the reader to call it not "machine learning MAGIC" but rather "machine learning MATH".

What NanoNeuron will learn

You've probably heard about Neurons in the context of Neural Networks. The NanoNeuron that we're going to implement below is kind of like one, but much simpler. For simplicity we're not even going to build a network of NanoNeurons. We will have it all by itself, alone, doing some magic predictions for us. Namely, we will teach this one simple NanoNeuron to convert (predict) the temperature from Celsius to Fahrenheit.

By the way, the formula for converting Celsius to Fahrenheit is this:

f = 1.8 * c + 32

But for now our NanoNeuron doesn't know about it...

NanoNeuron model

Let's implement our NanoNeuron model function. It implements a basic linear dependency between x and y which looks like y = w * x + b. Simply put, our NanoNeuron is a "kid" that can draw a straight line in XY coordinates.

The variables w and b are parameters of the model. NanoNeuron knows only about these two parameters of the linear function.

These parameters are something that NanoNeuron is going to "learn" during the training process.

The only thing that NanoNeuron can do is imitate a linear dependency. In its predict() method it accepts some input x and predicts the output y. No magic here.

function NanoNeuron(w, b) {
  this.w = w; // weight
  this.b = b; // bias
  // Predict the output y for an input x using the linear function y = w * x + b.
  this.predict = (x) => {
    return x * this.w + this.b;
  }
}

(...wait... linear regression, is that you?)
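
Just to illustrate (this snippet isn't part of the original code), if we handed NanoNeuron the correct parameters up front, it would already convert temperatures correctly:

// Illustration only: with w = 1.8 and b = 32 hard-coded, the model
// already matches the Celsius to Fahrenheit formula shown above.
const cheatingNeuron = new NanoNeuron(1.8, 32);
cheatingNeuron.predict(10); // -> 50 (10 * 1.8 + 32)

During training, of course, NanoNeuron has to find these parameters by itself.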

Celsius to Fahrenheit conversion

The temperature value in Celsius can be converted to Fahrenheit using the following formula: f = 1.8 * c + 32, where c is the temperature in Celsius and f is the calculated temperature in Fahrenheit.

function celsiusToFahrenheit(c) {
  const w = 1.8;
  const b = 32;
  const f = c * w + b;
  return f;
}

Ultimately we want to teach our NanoNeuron to imitate this function (to learn that w = 1.8 and b = 32) without knowing these parameters in advance.

This is what the Celsius to Fahrenheit conversion function looks like:

[Plot: the linear Celsius to Fahrenheit conversion function f = 1.8 * c + 32]


Generating data-sets

Before the training we need to generate training and test data-sets based on the celsiusToFahrenheit() function. The data-sets consist of pairs of input values and correctly labeled output values.

In real life, in most cases, this data would be collected rather than generated. For example, we might have a set of images of hand-drawn digits and a corresponding set of numbers that explain what number is written on each picture.

We will use the TRAINING examples data to train our NanoNeuron. Before our NanoNeuron grows up and is able to make decisions on its own, we need to teach it what is right and what is wrong using the training examples.

We will use the TEST examples to evaluate how well our NanoNeuron performs on data that it didn't see during the training. This is the point where we can see whether our "kid" has grown and can make decisions on its own.

function generateDataSets() {
  // xTrain -> [0, 1, 2, ...],
  // yTrain -> [32, 33.8, 35.6, ...]
  const xTrain = [];
  const yTrain = [];
  for (let x = 0; x < 100; x += 1) {
    const y = celsiusToFahrenheit(x);
    xTrain.push(x);
    yTrain.push(y);
  }

  // xTest -> [0.5, 1.5, 2.5, ...]
  // yTest -> [32.9, 34.7, 36.5, ...]
  const xTest = [];
  const yTest = [];
  for (let x = 0.5; x < 100; x += 1) {
    const y = celsiusToFahrenheit(x);
    xTest.push(x);
    yTest.push(y);
  }

  return [xTrain, yTrain, xTest, yTest];
}

The cost (the error) of prediction

We need some metric that shows how close our model's prediction is to the correct value. The cost (the mistake) between the correct output value y and the prediction that NanoNeuron made will be calculated using the following formula:

cost = (y - prediction) ** 2 / 2

This is a simple difference between two values. The closer the values are to each other, the smaller the difference. We're using the power of 2 here just to get rid of negative numbers, so that (1 - 2) ^ 2 would be the same as (2 - 1) ^ 2. The division by 2 happens just to simplify the backward propagation formula (see below).

The cost function in this case will be as simple as:

function predictionCost(y, prediction) {
  return (y - prediction) ** 2 / 2; // i.e. -> 235.6
}
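
For example (the values here are illustrative), this is what the cost looks like for a bad prediction and for a perfect one:

predictionCost(32, 10); // -> (32 - 10) ** 2 / 2 = 242
predictionCost(32, 32); // -> 0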

Forward propagation

To do forward propagation means to make a prediction for all training examples from the xTrain and yTrain data-sets and to calculate the average cost of those predictions along the way.

We just let our NanoNeuron give its opinion at this point, just ask it to guess how to convert the temperature. It might be stupidly wrong here. The average cost will show how wrong our model is right now. This cost value is really valuable since, by changing the NanoNeuron parameters w and b and doing the forward propagation again, we will be able to evaluate whether NanoNeuron became smarter after the parameter changes.

The average cost will be calculated using the following formula:

averageCost = (1 / m) * Σ ((y_i - prediction_i) ** 2 / 2)

Where m is the number of training examples (in our case, 100).

Here is how we may implement it in code:

function forwardPropagation(model, xTrain, yTrain) {
  const m = xTrain.length;
  const predictions = [];
  let cost = 0;
  for (let i = 0; i < m; i += 1) {
    const prediction = model.predict(xTrain[i]);
    cost += predictionCost(yTrain[i], prediction);
    predictions.push(prediction);
  }
  cost /= m;
  return [predictions, cost];
}

Backward propagation

Now that we know how right or wrong our NanoNeuron's predictions are (based on the average cost at this point), what should we do to make the predictions more precise?

Backward propagation is the answer to this question. It is the process of evaluating the cost of the prediction and adjusting the NanoNeuron's parameters w and b so that the next predictions will be more precise.

This is the place where machine learning looks like magic. The key concept here is the derivative, which shows what step to take to get closer to the cost function minimum.

Remember, finding the minimum of the cost function is the ultimate goal of the training process. If we find values of w and b such that our average cost function is small, it would mean that the NanoNeuron model makes really good and precise predictions.

Derivatives are a big separate topic that we will not cover in this article. MathIsFun is a good resource for getting a basic understanding of them.

One thing about derivatives that will help you understand how backward propagation works is that the derivative is, by its meaning, the slope of the tangent line to the function curve, and it points out the direction to the function minimum.

[Plot: the slope of the tangent line to y = x ** 2 points toward the function minimum]

Image source: MathIsFun

For example, on the plot above you can see that if we're at the point (x = 2, y = 4), then the slope tells us to go left and down to get to the function minimum. Also notice that the bigger the slope, the faster we should move toward the minimum.
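
To build this intuition, here is a minimal sketch (not part of NanoNeuron itself) that approximates the slope of y = x ** 2 numerically:

// Approximate the derivative (slope) of fn at the point x.
function numericalDerivative(fn, x, epsilon = 1e-6) {
  return (fn(x + epsilon) - fn(x - epsilon)) / (2 * epsilon);
}

const parabola = (x) => x ** 2;
numericalDerivative(parabola, 2);  // -> ~4 (positive slope: the minimum is to the left)
numericalDerivative(parabola, -3); // -> ~-6 (negative slope: the minimum is to the right)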

The derivatives of our averageCost function with respect to the parameters w and b look like this:

dW = (1 / m) * Σ ((y_i - prediction_i) * x_i)
dB = (1 / m) * Σ (y_i - prediction_i)

Where m is the number of training examples (in our case, 100).

You may read more about derivative rules and how to get a derivative of complex functions here.

function backwardPropagation(predictions, xTrain, yTrain) {
  const m = xTrain.length;
  let dW = 0;
  let dB = 0;
  for (let i = 0; i < m; i += 1) {
    dW += (yTrain[i] - predictions[i]) * xTrain[i];
    dB += yTrain[i] - predictions[i];
  }
  dW /= m;
  dB /= m;
  return [dW, dB];
}

Training the model

Now we know how to evaluate the correctness of our model for all training set examples (forward propagation), and we also know how to do small adjustments to the parameters w and b of the NanoNeuron model (backward propagation). But the issue is that running forward propagation and then backward propagation only once won't be enough for our model to learn any laws/trends from the training data. You may compare it to a kid attending one day of elementary school. The kid should go to school not once but day after day, year after year, to learn something.

So we need to repeat forward and backward propagation for our model many times. That is exactly what the trainModel() function does. It is like a "teacher" for our NanoNeuron model:

  • it will spend some time (epochs) with our yet slightly stupid NanoNeuron model and try to train/teach it,
  • it will use specific "books" (xTrain and yTrain data-sets) for training,
  • it will push our kid to learn harder (faster) by using the learning rate parameter alpha.

A few words about the learning rate alpha. This is just a multiplier for the dW and dB values we calculated during the backward propagation. The derivative points out the direction we need to take to find the minimum of the cost function (the sign of dW and dB), and it also indicates how fast we need to go in that direction (the absolute value of dW and dB). Now we need to multiply those step sizes by alpha just to make our movement toward the minimum faster or slower. Sometimes, if we use a big value of alpha, we might simply jump over the minimum and never find it.

The analogy with the teacher would be that the harder the teacher pushes our "nano-kid", the faster the "nano-kid" will learn, but if the teacher pushes too hard, the "kid" will have a nervous breakdown and won't be able to learn anything.
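
Here is a toy sketch (again, not part of NanoNeuron itself) of gradient descent on y = x ** 2, with a reasonable and with a too big learning rate:

// Gradient descent on y = x ** 2: the derivative is 2 * x, the minimum is at x = 0.
function descend(alpha, steps) {
  let x = 5;
  for (let i = 0; i < steps; i += 1) {
    x -= alpha * 2 * x; // step against the slope
  }
  return x;
}

descend(0.1, 20); // -> ~0.06: we are smoothly approaching the minimum
descend(1.1, 20); // -> ~192: every step jumps over the minimum, further and further away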

Here is how we're going to update our model's w and b params:

w = w + alpha * dW
b = b + alpha * dB

And here is our trainer function:

function trainModel({model, epochs, alpha, xTrain, yTrain}) {
  const costHistory = [];

  // Let's start counting epochs.
  for (let epoch = 0; epoch < epochs; epoch += 1) {
    // Forward propagation.
    const [predictions, cost] = forwardPropagation(model, xTrain, yTrain);
    costHistory.push(cost);

    // Backward propagation.
    const [dW, dB] = backwardPropagation(predictions, xTrain, yTrain);

    // Adjust model parameters based on the derivatives and the learning rate.
    model.w += alpha * dW;
    model.b += alpha * dB;
  }

  return costHistory;
}

Putting all pieces together

Now let's use the functions we have created above.

Let's create our NanoNeuron model instance. At this moment NanoNeuron doesn't know what values should be set for parameters w and b. So let's set up w and b randomly.

const w = Math.random(); // i.e. -> 0.9492
const b = Math.random(); // i.e. -> 0.4570
const nanoNeuron = new NanoNeuron(w, b);

Generate training and test data-sets.

const [xTrain, yTrain, xTest, yTest] = generateDataSets();

Let's train the model with small (0.0005) steps over 70000 epochs. You can play with these parameters; they were defined empirically.

const epochs = 70000;
const alpha = 0.0005;
const trainingCostHistory = trainModel({model: nanoNeuron, epochs, alpha, xTrain, yTrain});

Let's check how the cost function was changing during the training. We expect the cost after the training to be much lower than before. This would mean that NanoNeuron got smarter. The opposite is also possible.

console.log('Cost before the training:', trainingCostHistory[0]); // i.e. -> 4694.3335043
console.log('Cost after the training:', trainingCostHistory[epochs - 1]); // i.e. -> 0.0000024

This is how the training cost changes over the epochs. On the x axis is the epoch number (×1000).

[Plot: the training cost drops sharply during the first epochs and then approaches zero]

Let's take a look at the NanoNeuron parameters to see what it has learned. We expect the NanoNeuron parameters w and b to be similar to the ones we have in the celsiusToFahrenheit() function (w = 1.8 and b = 32), since our NanoNeuron tried to imitate it.

console.log('NanoNeuron parameters:', {w: nanoNeuron.w, b: nanoNeuron.b}); // i.e. -> {w: 1.8, b: 31.99}

Let's evaluate our model's accuracy on the test data-set to see how well our NanoNeuron deals with new, unknown data. The cost of predictions on the test set is expected to be close to the training cost. This would mean that NanoNeuron performs well on both known and unknown data.

const [testPredictions, testCost] = forwardPropagation(nanoNeuron, xTest, yTest);
console.log('Cost on new testing data:', testCost); // i.e. -> 0.0000023

Now, since we see that our NanoNeuron "kid" has performed well in "school" during the training, and that it can convert Celsius to Fahrenheit temperatures correctly even for data it hasn't seen, we can call it "smart" and ask it some questions. This was the ultimate goal of the whole training process.

const tempInCelsius = 70;
const customPrediction = nanoNeuron.predict(tempInCelsius);
console.log(`NanoNeuron "thinks" that ${tempInCelsius}°C in Fahrenheit is:`, customPrediction); // -> 158.0002
console.log('Correct answer is:', celsiusToFahrenheit(tempInCelsius)); // -> 158

So close! Like all of us humans, our NanoNeuron is good but not ideal :)

Happy learning to you!

How to launch NanoNeuron

You may clone the repository and run it locally:

git clone https://github.com/trekhleb/nano-neuron.git
cd nano-neuron
node ./NanoNeuron.js

Skipped machine learning concepts

The following machine learning concepts were skipped or simplified to keep the explanation simple.

Train/test sets splitting

Normally you have one big set of data. Depending on the number of examples in that set, you may want to split it in a 70/30 proportion for train/test sets. The data in the set should be randomly shuffled before the split. If the number of examples is big (i.e. millions), then the split might happen in proportions closer to 90/10 or 95/5 for train/test data-sets.
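
A minimal sketch of how such a shuffled split could look in JavaScript (splitDataSet() is a hypothetical helper, not part of NanoNeuron):

function splitDataSet(x, y, trainRatio = 0.7) {
  // Shuffle the indices (Fisher-Yates) so the x and y arrays stay aligned.
  const indices = x.map((_, i) => i);
  for (let i = indices.length - 1; i > 0; i -= 1) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  const splitPoint = Math.floor(indices.length * trainRatio);
  const trainIndices = indices.slice(0, splitPoint);
  const testIndices = indices.slice(splitPoint);
  return [
    trainIndices.map((i) => x[i]),
    trainIndices.map((i) => y[i]),
    testIndices.map((i) => x[i]),
    testIndices.map((i) => y[i]),
  ];
}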

The network brings the power

Normally you won't use just one standalone neuron. The power is in the network of such neurons. A network can learn much more complex features. NanoNeuron alone looks more like simple linear regression than a neural network.

Input normalization

Before the training, it would be better to normalize the input values.
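
For example, a min-max normalization that scales inputs into the [0, 1] range might look like this (a sketch, not part of the original code):

function normalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Scale every value to the [0, 1] range.
  return values.map((v) => (v - min) / (max - min));
}

normalize([0, 50, 100]); // -> [0, 0.5, 1]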

Vectorized implementation

For networks, vectorized (matrix) calculations work much faster than for loops. Normally, forward/backward propagation works much faster if it is implemented in vectorized form and calculated using, for example, the NumPy Python library.

Minimum of cost function

The cost function we were using in this example is over-simplified. For classification tasks it should have logarithmic components. Changing the cost function also changes its derivatives, so the backward propagation step will use different formulas as well.
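
For example, for binary classification a common choice is the cross-entropy cost with its logarithmic components (a sketch for illustration; NanoNeuron doesn't use it):

// Cross-entropy cost for one example: y is the correct label (0 or 1),
// prediction is the model's output, expected to be in the (0, 1) range.
function crossEntropyCost(y, prediction) {
  return -(y * Math.log(prediction) + (1 - y) * Math.log(1 - prediction));
}

crossEntropyCost(1, 0.9); // -> ~0.105 (close to the correct label, small cost)
crossEntropyCost(1, 0.1); // -> ~2.3 (far from the correct label, big cost)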

Activation function

Normally the output of a neuron should be passed through an activation function like Sigmoid, ReLU, or others.
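
Here are two common activation functions, sketched in JavaScript:

function sigmoid(x) {
  return 1 / (1 + Math.exp(-x)); // squashes any input into the (0, 1) range
}

function relu(x) {
  return Math.max(0, x); // passes positive values through, zeroes out negatives
}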
