Machine Learning Explained

Machine learning (ML) has been around for quite a few years now, and there are countless information sources about it, including lectures, courses, and more.

As a developer, you have probably heard about ML by now and may even have sat through a few lectures on it, and still… for me at least, it took quite a bit of digging before I fully understood what this "ML" concept is all about.

I wrote this article to make it easy to understand ML without needing to spend hours or even days of learning.

Part 1 – What is machine learning?

One of the best-known definitions of machine learning is:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E (Tom Mitchell, 1998).

Sounds complicated?

To me it is….

Let's make it simpler:

The basic concepts of ML:

  1. There should be a task for your algorithm. It needs to be designed to perform a specific task, so there is no voodoo science fiction here in which it learns what to do on its own like an AI in the movies. The task can be recognizing a car in a picture, translating human speech to text, or many other tasks.
  2. An ML algorithm needs training data. The data helps it "learn" the mathematical formula that can best perform the task. This data is usually called the training set.
  3. An ML algorithm needs to be able to evaluate success or failure (or how far it was from performing the task). This is called "performance measurement".

OK, so we need to define a task, have data to train on, and a way to recognize how well we have done the task at hand so we can improve.

Let's take a simple example and try to solve the following formula:

Y = a*X + b

And with it the following training set:

  • X=1 Y=3
  • X=2 Y=5
  • X=3 Y=7

While we can easily solve this formula using the mathematical tools we have, let's assume for a second this is a complicated formula that we cannot solve easily.

So, let's perform the following:

1. First, guess some initial values for "a" and "b" to start with.

2. Now, for each example in the training set:

a. Calculate the prediction with those "a" and "b" values:

Ypred = a*X + b

b. See how far we are from the truth. This can be called the delta from the prediction, or the loss function:

| Ypred – Y |

3. Now we can average the loss function values over all training set examples. This value is called the "cost" of our algorithm and gives us a rough estimate of its performance.

4. We can now try one of the following changes:

a. Increase or decrease a by 0.1

b. Increase or decrease b by 0.1

5. After that, we calculate the "cost" of our algorithm again using the new values, and if we get a better result, we keep the change.

6. We go back and repeat steps 4-5 over and over until we are satisfied with the result.


Once we have finished training, we can try our mathematical formula on new data to get a prediction of the expected Y value.
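Here is a minimal sketch of this guess-and-check procedure in Python. The training set and the 0.1 step come from the example above; the starting guess, the iteration limit, and the variable names are my own illustrative choices.

```python
# Guess-and-check training loop for Y = a*X + b (steps 1-6 above).

training_set = [(1, 3), (2, 5), (3, 7)]  # (X, Y) pairs from the example

def cost(a, b):
    """Average loss |Ypred - Y| over the training set (steps 2-3)."""
    return sum(abs(a * x + b - y) for x, y in training_set) / len(training_set)

a, b = 0.0, 0.0   # step 1: an initial guess for "a" and "b"
step = 0.1        # the change we try in step 4

for _ in range(1000):           # steps 4-6: repeat until nothing improves
    current = cost(a, b)
    improved = False
    for da, db in [(step, 0), (-step, 0), (0, step), (0, -step)]:
        if cost(a + da, b + db) < current:
            a, b = a + da, b + db   # step 5: keep the change if the cost got better
            improved = True
            break
    if not improved:
        break

# The exact fit would be a=2, b=1, but a blind fixed-step search like this
# may stop a little short of it, which is exactly why it is not optimal.
print(a, b)
```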

Is this a machine learning algorithm?

Yes, it absolutely is a machine learning algorithm. We will talk about more complex mathematical structures in the later parts of this article, but you have just experienced your first machine learning algorithm.

In step #4 we blindly changed the parameters of our ML model, and we paid for it with extra calculations, so learning performance is not going to be optimal with this approach.

Most ML algorithms use a method called gradient descent (or more complex optimizers, such as Adam), which relies on the derivatives of the mathematical function to advance toward the optimal point.

To be a successful machine learning engineer you won't need to know complex derivative algebra, because nowadays tools (like TensorFlow, for example) will do that math for you.

We used a step of 0.1 in this example. This is called the learning rate, and it is also something you can usually configure for the training stage.
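For comparison with the guess-and-check loop above, here is a minimal gradient-descent sketch for the same Y = a*X + b example. It uses the mean squared error so the derivatives stay simple; the 0.1 learning rate mirrors the step size above, and the iteration count is illustrative.

```python
# Gradient descent for Y = a*X + b on the same training set.

training_set = [(1, 3), (2, 5), (3, 7)]
a, b = 0.0, 0.0
learning_rate = 0.1
n = len(training_set)

for _ in range(1000):
    # Derivatives of the mean squared error with respect to a and b.
    grad_a = sum(2 * (a * x + b - y) * x for x, y in training_set) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in training_set) / n
    a -= learning_rate * grad_a   # move against the gradient
    b -= learning_rate * grad_b

print(a, b)  # converges to approximately a=2, b=1
```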

This algorithm is called linear regression (image below from Wikipedia).

Linear regression (source Wikipedia)

The mathematical function can also have more than one input feature.

For example, here is a training set with 4 features for each training example:

  • X1=… X2=… X3=… X4=… Y=…
  • X1=… X2=… X3=… X4=… Y=…
  • X1=… X2=… X3=… X4=… Y=…
  • X1=… X2=… X3=… X4=… Y=…

We call these input parameters "features", and in this example the mathematical formula will have a weight factor for each feature, meaning it will be:

Y = a*X1 + b*X2 + c*X3 + d*X4 + e
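For illustration, here is how such a multi-feature prediction looks in Python. The feature values and weights are made up; in a real model they would come from the data and from training.

```python
import numpy as np

# One training example with 4 features.
x = np.array([1.0, 2.0, 3.0, 4.0])          # X1..X4 (illustrative values)
weights = np.array([0.5, -0.2, 0.8, 0.1])   # a, b, c, d (one weight per feature)
bias = 0.3                                   # e

# Y = a*X1 + b*X2 + c*X3 + d*X4 + e, written as a dot product.
y_pred = np.dot(weights, x) + bias
print(y_pred)
```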

The algorithm we just saw is considered supervised machine learning. Supervised machine learning algorithms require training data with known Y values (also called labeled data). There are also unsupervised machine learning algorithms, in which labeling of the data is not required for learning. This article focuses on supervised machine learning algorithms.

Part 2 – More mathematical models

Logistic regression

Let's assume that we have a binary classification problem (1 = Yes / 0 = No) in which, if our X value is bigger than 10, the answer should be Yes; otherwise it should be No.

Linear regression will not be able to fit this properly, as we can have values such as X=100000, Y=1, and a straight line cannot handle that.

Think about the following values:

  • X=0 Y=0
  • X=5 Y=0
  • X=11 Y=1
  • X=100000 Y=1

A straight line like we saw above on the linear regression illustration will not work.

Here, logistic regression comes to the rescue.

Logistic regression uses a logistic function (hence the name).

Let's look at the graphical representation of the "sigmoid" function, which is commonly used for logistic regression (source: Wikipedia):

[Graph of the sigmoid function (source: Wikipedia)]

We can define the following decision rule for binary classification using the sigmoid output:

  • If Y>0.5 predict Yes
  • Otherwise, predict No

Other thresholds can be chosen depending on the trade-off between false positives and false negatives that best suits our algorithm's use case.
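Here is a minimal Python sketch of this thresholding. The weight and bias values are made up so that the decision boundary sits around X = 10, matching the example above; in a real model they would be learned from the data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 2.0, -20.0   # illustrative parameters: w*X + b crosses 0 at X = 10

for x in [0, 5, 11, 100000]:
    y = sigmoid(w * x + b)              # a value between 0 and 1
    print(x, "Yes" if y > 0.5 else "No")  # apply the 0.5 threshold
```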

Neural networks

So, what are neural networks?

You can think about the logistic regression unit we just saw as one neuron inside a network.

The features (or inputs) of a neuron are the outputs of the previous layer in the network or, if it is the first layer, the input features from the training example.

[Diagram of a feedforward neural network (source: Wikipedia)]

In the picture (from Wikipedia), the red circles are the inputs (let's assume the X1, X2, X3 features of our input example). The blue circles are neurons that get their inputs from the input features (X1, X2, X3) and pass their outputs to the next layer, and the green circles are the final layer, which gets its inputs from the blue layer and returns the estimation.
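To make the picture concrete, here is a minimal forward-pass sketch in Python for a network of that shape: 3 inputs, one hidden layer, and an output layer. The layer sizes and the random weights are illustrative; a real network would learn the weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])     # X1, X2, X3 input features (red circles)

W1 = rng.normal(size=(4, 3))       # hidden layer: 4 neurons, 3 inputs each (blue)
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))       # output layer: 2 neurons, 4 inputs each (green)
b2 = np.zeros(2)

hidden = sigmoid(W1 @ x + b1)      # each hidden neuron is a small logistic unit
output = sigmoid(W2 @ hidden + b2) # the final layer returns the estimation
print(output)
```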

There are more complex NN models, such as convolutional neural networks (which work best for machine vision) and recurrent neural networks (which work best for voice recognition or NLP tasks).

Part 3 – High Bias and High Variance

To evaluate a machine learning algorithm's performance after training, we will usually keep aside a small amount of labeled data which we will call the "test set". We will not touch this test set until we want to evaluate the performance of our algorithm.
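For illustration, here is a minimal Python sketch of holding out a test set. The labeled examples, the 80/20 split, and the random seed are all illustrative assumptions.

```python
import random

# Made-up labeled data: (X, Y) pairs following Y = 2*X + 1.
examples = [(x, 2 * x + 1) for x in range(100)]

random.seed(0)
random.shuffle(examples)             # shuffle so the split is not ordered

split = int(0.8 * len(examples))
training_set = examples[:split]      # used to fit the model
test_set = examples[split:]          # kept aside until final evaluation
```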

We will divide the performance cases of our ML algorithm into three states: "High Bias", "High Variance", or "Exactly Right".

High bias

High bias is also called underfitting.

In this case, your model fails to work well even on the training set, and it will not significantly improve its performance as training progresses.

You will notice during training that your cost function fails to improve or decrease over time, and your predictions will fail to satisfy your needs (for example, your algorithm will fail to properly detect the car in the image).

The usual cause for this is that your model is not sophisticated enough to perform the task.

Think about it this way: had I tried to detect a car in an image using a simple linear regression algorithm, my algorithm would probably fail. Machine vision requires a much more complicated model than a single linear regression unit, such as a deep NN.

Another reason can be that the data you have simply does not reflect the outcome (for example, trying to predict the weather from your appetite may not work if the two are not connected).

High Variance

Another very common fail case for ML algorithms is called high variance or overfitting.

You can think about high variance as the state in which the ML algorithm, instead of learning to generalize from the training set, learned exactly how to reproduce just the training set.

In this case, the performance on the training set will be very good, but when we go back and try our predictions on the test set, we will see a significant decrease in performance.

I will illustrate this with an example.

Let's look again at the training set from our first example:

  • X=1 Y=3
  • X=2 Y=5
  • X=3 Y=7

Now let's think about a more complicated ML structure (such as a neural network) which could learn the following:

  • If X=1 return Y=3
  • Else if X=2 return Y=5
  • Else if X=3 return Y=7
  • Otherwise, return 0

As you can see, the algorithm will excel on the training set but will fail to make meaningful predictions for any value outside of it.
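To make this concrete, here is a minimal Python sketch of such a "memorizing" model. The training pairs come from the earlier example; the function name is just illustrative.

```python
# An overfit "model" that has simply memorized the training set.
training_set = {1: 3, 2: 5, 3: 7}   # X -> Y from the training examples

def overfit_predict(x):
    return training_set.get(x, 0)   # otherwise, return 0

print(overfit_predict(2))   # 5  -> perfect on the training set
print(overfit_predict(4))   # 0  -> but the true value is 9 (Y = 2*X + 1)
```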

So, what can we do if we have a high variance problem?

There are a few possible solutions for this state. One is to increase the amount of training data: the larger the training set, the harder it is for the algorithm to overfit.

But sometimes it is very hard to get more data, especially because the data has to be labeled. So, another option is to synthesize new data. For example, if the ML algorithm is trying to classify cats in pictures, we can enlarge, distort, or change the lighting of the pictures to introduce more training examples.

Another possible solution for overfitting is introducing noise into the learning algorithm, using regularization techniques or dropout.

These techniques let us keep the training set the same size while still forcing the algorithm to generalize from it.
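As a hedged illustration, here is a minimal Keras sketch that adds dropout to a small fully connected model. The layer sizes, the 0.5 dropout rate, and the input shape are illustrative choices, not values taken from this article.

```python
import tensorflow as tf

# A small fully connected model with dropout between the layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                         # 4 input features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),                       # randomly drops units during training only
    tf.keras.layers.Dense(1),                           # single regression output
])
model.compile(optimizer="adam", loss="mse")
```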

If your algorithm suffers from neither high bias nor high variance, it is considered "exactly right", meaning it is well fitted to the task.

Conclusion

ML algorithms are becoming the standard nowadays for a lot of computational tasks.

In the past, algorithms for voice recognition, machine vision, and natural language processing were very complicated and comprised many tailor-made sub-algorithms.

Modern implementations of these tasks now use much simpler ML models and perform much better than the classic algorithms.

In this article, I described what ML is and showed a few basic concepts around it.

There are many other types of ML algorithms and creating the one which is best for a task is sometimes just a matter of trial and error until getting it right.

When designing and training an ML algorithm, there are many more factors that I have not shown here that can determine the efficiency and speed of learning and the resulting performance of the algorithm.

If you would like to learn more, I highly recommend the courses provided by deeplearning.ai.
