What really is machine learning?

To grasp the actual nature of machine learning, and to be ready to understand how those fancy and popular Generative AI tools really work, we're going to start with some basics – let's try to understand how classical software works.

Every single (classical) computer program, regardless of the programming language used to implement it, works more or less the same way:

  1. It takes some input data
  2. It transforms the input data
  3. It returns output data

The second step in the process – transformation of the data – is obviously the crucial part, and we even have a name for it – an algorithm.

An algorithm is simply a well-defined recipe that our computer follows step-by-step in order to produce output data (on the basis of input data).

Just like with a cooking recipe: you have a well-defined list of ingredients (input data) and a step-by-step process for manipulating those ingredients (the algorithm) to finally achieve the desired dish (output data).
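
To make the recipe analogy concrete, here is a tiny illustrative algorithm (my own toy example, written in Python) that finds the largest number in a list:

	# Input data: a list of numbers (the "ingredients")
	numbers = [4, 17, 8, 23, 15]

	# Algorithm: step through the list, remembering the largest value seen so far
	largest = numbers[0]
	for number in numbers[1:]:
	    if number > largest:
	        largest = number

	# Output data: the "dish"
	print(largest)  # prints 23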

There are plenty of well-known algorithms, from the simplest academic-grade sorting algorithms to very sophisticated ones like the FFT (fast Fourier transform) or RSA cryptography. Algorithms are the brains of the software we use. And we can obviously combine those algorithms together to build really outstanding software (pick whatever you like: an operating system like Windows or Linux, Adobe Photoshop, Excel, or maybe the latest Call of Duty).

However, there are types of problems and use cases where classical algorithms fail, either because we still haven't discovered an algorithm for solving a given problem (in reasonable computing time), or because the task is non-algorithmic (meaning we know that this kind of problem cannot be solved by a specific set of rules).

Non-algorithmic problems include, for example: true random number generation (yeah, computers can only generate pseudo-random numbers), simulation of creative processes (composing music or art, writing poetry, copywriting) or human decision making – I guess you see where we are going with this...

I personally often refer to a simple thought experiment when trying to visualize the actual challenge with non-algorithmic problems – imagine your input data is a greyscale 2D image (let's say 400x400 pixels). You have to solve two use cases:

Use case 1: Count how many black pixels are in the image

Before we solve this problem, think about it for a second – can you, as a human, answer this question just by looking at the image?

Remember, we are talking about 160,000 pixels here. I mean, maybe there are some genius savants somewhere out there (joking, it's impossible). No one could possibly solve this task, at least not without spending hours, days or even weeks of manual labor. And what if this was a 4K image instead (8,294,400 pixels)?

Fortunately, we have computers, and this is a super simple algorithmic task. All we need to do is iterate over the whole bitmap and check whether a given pixel is black. Example pseudocode could look like this:

	count = 0
	for each pixel in image:
		if pixel is black then increase count by 1
	display “count” variable on the screen        
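
And for the curious, a runnable Python version of the same idea – a minimal sketch assuming the Pillow imaging library and a hypothetical file called image.png:

	from PIL import Image

	# Load the image and convert it to 8-bit greyscale ("L" mode)
	image = Image.open("image.png").convert("L")

	# Iterate over the whole bitmap and count the pixels whose value is 0 (black)
	count = sum(1 for pixel in image.getdata() if pixel == 0)

	print(count)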

Job done. We've written a recipe (algorithm) that takes the image (input data) and returns the exact number of black pixels (output data). Our computers will answer this question in a couple of milliseconds, even for very high-resolution images.


So now let’s move on to the second, slightly more sophisticated use case.


Use case 2: Assuming that the provided 2D image is a photograph of either a cat or a dog, write an algorithm that classifies the image (the software must decide whether the provided input image represents a dog or a cat)

And now it starts getting tricky. There is no known algorithm that answers whether the provided image contains a cat or a dog. How do we even approach it? Maybe there are some typical features that distinguish cats from dogs (e.g. pointed ears for cats, long snouts for dogs), but how do we exploit this knowledge programmatically? How do we write an algorithm that detects exactly where the ears or noses are in the image (generally speaking this is algorithmically possible, but definitely not easy)? How do we take into account that there are different breeds of dogs and cats, and how do we mathematically define those differences?

Let's put it like this: even though I've been programming for more than 20 years, I would not tackle this task with a classical software engineering approach, meaning I would definitely not try to write an algorithm for this.

Does it mean that we are doomed? You already know that we are not, and you have most likely already seen a lot of demos of software that can detect a variety of things in images, much more sophisticated than a simple binary classification of dogs and cats. So what is the trick? You guessed it – AI and machine learning!

[Image: image recognition example, generated with ChatGPT]


I will deep dive into exactly how AI/ML can be used in our dogs/cats classification, but for now I will just say probably the most important sentence of this whole article: machine learning is used to approximate complex algorithms.

To be more specific, machine learning is used to mimic algorithms that are not easy, or simply not possible, to implement using the classical approach (a set of rules), achieving this by creating so-called machine learning models.

You don't know how to write an algorithm for a given problem? You can use ML to generate such an algorithm for you – it will not be a deterministic, 100% accurate algorithm, but if the training of the model is done well, its accuracy will surely be high enough. Stay tuned, we will come back to ML in a second.

Another aspect to consider about this use case is the 'human factor' – can you answer the original question (is there a dog or a cat in the picture) just by looking at the image, instantaneously? I guess your answer is yes. So how did you decide whether it is a dog or a cat? What was your decision-making process? You probably looked at the nose, eyes, tail and pose of the animal and made a decision, but can you write down 'the recipe' your brain followed during this judgement? I guess not. You just "feel and know" that you are looking at a cat (or a dog), don't you?

If you have ever used Excel, you've most likely already done machine learning on your own, and you didn't even know it!

If we take one step back and look from a different angle, we notice that an algorithm can also be defined as a mathematical function. You know: y = f(x), where:

  1. x is input data
  2. f is an algorithm (a function that takes x as input and generates the output: y)
  3. y is output data

An algorithm can be considered a "box" that transforms input data into output data.

We also already know that there are algorithms (functions) that are relatively easy to define and implement (like counting black pixels in an image), and that there are families of problems for which defining such an algorithm is hard or impossible (like deciding whether we have a dog or a cat in an image). As already mentioned, machine learning can be used to approximate the unknown algorithm. So how does it work, and what do we need?

Well, there are different flavors of machine learning (supervised, semi-supervised, unsupervised, reinforcement) and different techniques (linear/logistic regression, decision trees, neural networks, SVMs, and many others), and each type requires different kinds of prerequisites and preparation. Today I will focus on probably the most popular type of machine learning – supervised machine learning – as it has the most generic spectrum of application.

Long story short: in machine learning we want to train a model, which in essence will be our algorithm – our previously mentioned "box" (or rather "black box") that will take any input data and transform it into the desired output data.

To be more mathematically concrete, we want to approximate the function "f(x)" that generates the "y". The clever part of supervised machine learning (and its prerequisite) is that we train the model on a prepared dataset of "x" values and corresponding "y" values. The machine learning model "learns" the relationship between each x and its corresponding y, and thus, in essence, approximates the function "f" – which is the desired algorithm.

Let's consider a simple example of supervised machine learning modeling (one you've probably already done at least a couple of times before without being aware of it). You have a dataset of ice cream sales capturing the relationship between the outdoor temperature and the revenue generated during 12 days of sales:

Your job is to write software that estimates the revenue based on a given temperature. We don't know the actual algorithm for it, right? We need to 'learn' it from the data itself. First, let's plot our dataset:

We can notice that there seems to be a linear relationship between the temperature variable and the revenue. So let's try to apply a supervised machine learning technique (linear regression) and construct the model – I will use Excel's built-in trend line functionality to achieve it:

What we've got is a relationship function – an algorithm, or basically a model – that can be used to forecast the revenue based on the temperature:

f(x) = y = 22.137 * x + 31.835
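
In Python, this model is literally a one-line function. A minimal sketch (the 25-degree input below is just a hypothetical query, not part of the original dataset):

	# The model "learned" from the 12 data points (coefficients from the Excel trend line)
	def predicted_revenue(temperature):
	    return 22.137 * temperature + 31.835

	# Forecast the revenue for a temperature we never measured
	print(predicted_revenue(25))  # ~585.26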

And now we can use this function to calculate the forecasted revenue:

One could say that the values are not 100% accurate (in the end, we are approximating the algorithm), but to me they seem "good enough". To prove it to you, below I plot the forecasted revenue – which was estimated based on only the 12 initial data points – against a much bigger data set:

You can clearly see that the orange line (our model) approximates the reality pretty well.

Believe it or not, this trivial example is exactly what supervised machine learning is all about – we've taken an initial data set of x and y (12 data points) and, based on that, created a model (a linear regression model) that represents the algorithm (f) able to determine the potential revenue (y) from the temperature (x). Without writing a single line of 'classical algorithm'. There is no 'recipe' here.

OK, but why am I talking about school-grade linear functions that everyone solved in math classes? How does this relate to Generative AI, ChatGPT and discovering whether there is a dog or a cat in an image?

Well, it turns out that you cannot approximate every possible function f with just a simple statistical model like linear regression.

There are certain limitations to this approach which I don't want to dive into in this article, to avoid overwhelming you – if you are interested, start by googling the "linear separability" problem.
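
If you want to see this limitation with your own eyes, the classic demonstration is the XOR function – no straight line can separate its two classes. A minimal sketch, assuming scikit-learn is installed:

	import numpy as np
	from sklearn.linear_model import LogisticRegression

	# XOR: output is 1 only when exactly one of the two inputs is 1
	X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
	y = np.array([0, 1, 1, 0])

	# A linear model (logistic regression) cannot draw a boundary that works
	linear_model = LogisticRegression().fit(X, y)
	print(linear_model.score(X, y))  # around 0.5 - no better than a coin flip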

Neural networks to the rescue!

And that's the moment when we introduce the big player – the almighty artificial neural network. Yet again, my goal today is not to elaborate on the technical aspects of the underlying mechanisms and mathematics of neural nets (I will tackle that in another article), but rather to describe them on the basis of the knowledge we already have.

So what makes neural networks so powerful?

Simply put, neural networks are considered universal approximators, which means that they can approximate any given function f(x).

If any relationship exists, a neural network can theoretically find "an algorithm" to model it. And this gives almost limitless possibilities for computing, giving us the ability to create software that can mimic our own brains (in certain pretrained scenarios). For more details: https://en.wikipedia.org/wiki/Universal_approximation_theorem
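
As a toy illustration (a sketch, not a proof), even a tiny neural network cracks the XOR problem that defeated the linear model above:

	import numpy as np
	from sklearn.neural_network import MLPClassifier

	# The same XOR data the linear model could not separate
	X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
	y = np.array([0, 1, 1, 0])

	# One hidden layer of 8 neurons is enough to bend the decision boundary
	net = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", random_state=0).fit(X, y)
	print(net.score(X, y))  # typically 1.0 - all four points classified correctly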

Let's come back to our use case of classifying dogs vs. cats. Knowing that neural networks can approximate any relationship, we should be able to use them to create an algorithm (or rather a model) that will be able to distinguish between the animals. In the end, our problem can be narrowed down to finding a function "f" that takes a 2D image as input and returns an output representing either a dog or a cat:

f(image) = {0: cat, 1: dog}

Images are simply bitmaps consisting of numbers that represent pixel colors, so in the end they are simple numeric inputs. The function f – our "black box" – is a neural network, and our output is represented by a single digit (0 means we have a cat in the image, 1 means a dog).

Given an initial data set of images of dogs and cats, we can feed them through the neural network (teach the neural network the relationship), and after the training process the neural network will be our model (algorithm) that, for any given image, returns 0 for a cat and 1 for a dog.
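
To give you a feeling for what this looks like in practice, here is a heavily simplified sketch using the Keras library – the folder layout ("pets/cat", "pets/dog") and the network shape are hypothetical, and a real solution would involve far more care:

	import tensorflow as tf

	# Hypothetical dataset: photos sorted into the folders "pets/cat" and "pets/dog"
	train = tf.keras.utils.image_dataset_from_directory(
	    "pets", image_size=(400, 400), color_mode="grayscale")

	# A tiny convolutional network: pixel values in, a single 0..1 score out
	model = tf.keras.Sequential([
	    tf.keras.layers.Rescaling(1.0 / 255),
	    tf.keras.layers.Conv2D(16, 3, activation="relu"),
	    tf.keras.layers.MaxPooling2D(),
	    tf.keras.layers.Conv2D(32, 3, activation="relu"),
	    tf.keras.layers.MaxPooling2D(),
	    tf.keras.layers.Flatten(),
	    tf.keras.layers.Dense(1, activation="sigmoid"),  # ~0 = cat, ~1 = dog
	])

	# "Teach the relationship": nudge the parameters until predictions match labels
	model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
	model.fit(train, epochs=10)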

Again, it will probably not be 100% accurate (just like our simple linear regression example with ice cream sales), but it will be "good enough" (let's say 98-99%).

Job done! Without writing a single line of classical algorithm code, we have developed software capable of understanding whether it sees a dog or a cat in an image. Amazing, isn't it?

Obviously, I am simplifying the whole process a lot – machine learning engineering consists of many steps like data preparation, feature selection and engineering, model selection, training, hyperparameter tuning etc. – but my goal is to share with you the general ideas that stand behind machine learning.

The rise of Generative AI

Finally, we can let our imagination run wild and start considering more interesting use cases for AI and machine learning (knowing it can approximate almost any relationship), and see how Generative AI was born.

From time to time, wise data scientists discover new types of neural networks, new ways to process signals, new architectures, new ways to train models, new ways to leverage GPUs and TPUs for computing, etc., which eventually – together with the growth of available computational power – let us tackle the holy grail of machine learning: natural language processing.

Knowing what you already know from this article, you should now have at least an idea of how tools like ChatGPT work. In the end, the way they work is really no different from the ice cream sales or dog/cat use cases:

  1. we have input data (a prompt in natural language)
  2. we have output data (a response in the form of generated text or an image)
  3. we have an algorithm (in the case of ChatGPT, a GPT model, which in essence is a sophisticated neural network) that transforms the prompt into the response

The key aspects of this kind of generative model (LLM – large language model) are its tremendous size (the number of trainable parameters of the neural network), the vast initial data set used to train the relationship between prompt and desired output, the horrendous training time, and the super clever architecture of the neural network with the so-called attention mechanism behind it (for those who are interested, here is more about the Transformer neural network: https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)).
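
To demystify that attention mechanism a tiny bit: its mathematical core is surprisingly compact. A minimal numpy sketch of (self-)attention – illustrative only, real models wrap many layers and learned weights around this:

	import numpy as np

	def attention(Q, K, V):
	    # Compare every token's "query" against every token's "key"...
	    scores = Q @ K.T / np.sqrt(K.shape[-1])
	    # ...turn the scores into weights that sum to 1 (softmax)...
	    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
	    # ...and mix the "values" of all tokens according to those weights
	    return weights @ V

	# Hypothetical input: 4 tokens, each encoded as an 8-dimensional vector
	tokens = np.random.rand(4, 8)
	print(attention(tokens, tokens, tokens).shape)  # (4, 8)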

I hope you now understand a little bit more about the fancy, "not-really-magical" AI world!
