Artificial Intelligence #13: An easy maths-based strategy to understand machine learning and deep learning
Welcome to Artificial Intelligence #13
For this episode, I was originally going to post on a different theme, but I got quite a few comments on a post I made about maths on LinkedIn.
Because a few people found that post useful, I thought I would expand on it and describe my approach to teaching AI through maths.
I use a similar approach when teaching #artificialintelligence at the #universityofoxford.
Previously, I discussed the significance of maths in learning AI.
To recap, there are four main areas of maths you need in order to understand machine learning and deep learning:
· Probability theory
· Statistics
· Linear Algebra
· Optimization
In this post, I am going to show you a simple approach to understanding machine learning and deep learning, based on maths that most of you already know (from year 12 / A levels, if you took maths or science).
Here is the chain of thought I use.
The idea is that you start with simple concepts and gradually build on them using familiar maths.
Given the limits of this article, I will illustrate only a small number of steps, but I hope even these will be useful to you.
What is a function
Let’s start with functions
In mathematics, a function is a binary relation between two sets that associates each element of the first set to exactly one element of the second set.
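To make the definition concrete, here is a minimal sketch in Python. It assumes a relation between two finite sets is stored as a set of (input, output) pairs. The helper `is_function` and the example relations are hypothetical and exist only for illustration. The check confirms that every element of the first set is paired with exactly one element of the second.

```python
def is_function(relation, domain):
    """Return True if every element of `domain` is paired with exactly one output."""
    outputs = {}
    for x, y in relation:
        # Collect every output associated with each input.
        outputs.setdefault(x, set()).add(y)
    # Each domain element must appear, and must map to a single value.
    return all(len(outputs.get(x, set())) == 1 for x in domain)

domain = {1, 2, 3}
squares = {(1, 1), (2, 4), (3, 9)}                  # each input has one output
not_a_function = {(1, 1), (1, -1), (2, 4), (3, 9)}  # input 1 has two outputs

print(is_function(squares, domain))         # True
print(is_function(not_a_function, domain))  # False
```

The second relation fails because the input 1 is associated with two outputs, 1 and -1. It is a relation but not a function.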
Image source: https://xaktly.com/MathFunctions.html
Our job is to find this function.
Whether you consider statistics, machine learning or deep learning, it is the same problem. But, as we will see below, the approach varies.
Function approximation
In supervised machine learning, the task of finding the missing function that maps one domain to another is called function approximation. Given a dataset consisting of inputs and outputs, we assume there is an unknown underlying function that consistently maps inputs to outputs in the target domain and that generated the dataset. A function approximation problem asks us to select a function from a well-defined class that closely matches ("approximates") the target function in a task-specific way.
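Here is a minimal sketch of function approximation. It makes two assumptions: the "well-defined class" is the set of straight lines y = a·x + b, and "closely matches" means the smallest squared error (ordinary least squares). The target function 2x + 1, the seed and the helper `fit_line` are all hypothetical, and the data are deliberately noise-free.

```python
import random

# The "unknown" target function. In practice we never see it directly;
# it is written out here only so that we can generate a dataset.
def target(x):
    return 2.0 * x + 1.0

# A dataset of (input, output) pairs produced by the target function.
rng = random.Random(0)
xs = [rng.uniform(-5, 5) for _ in range(50)]
ys = [target(x) for x in xs]

def fit_line(xs, ys):
    """Pick the line y = a*x + b with the smallest squared error (closed-form OLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line(xs, ys)
print(f"approximation: y = {a:.3f} x + {b:.3f}")
```

The data are noise-free and the target happens to lie inside the chosen class. Under those conditions, least squares recovers the slope 2 and intercept 1, up to floating-point rounding. With real data, neither condition is usually guaranteed.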
Examples of simple functions and complex functions
Let’s start with the simplest case. In the diagram below, the relationship is a straight line, i.e. a linear relationship between x and y.
In the graph below, by contrast, there is no apparent functional relationship between x and y. However, function approximation assumes that there is an underlying function that maps x to y, i.e. y = f(x).
(source unknown)
In contrast to the linear relationship, the diagram below shows a function that separates the blue dots from the green dots (on the right). It is more complex than the first case because it is non-linear.
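The contrast between linear and non-linear relationships can also be seen numerically. This sketch uses regression rather than the classification shown in the diagram, and assumes a hypothetical target y = x² sampled at the integers from -3 to 3. It first fits the best straight line in x by ordinary least squares. It then fits the same kind of line on the transformed input x². The point is that a linear model cannot capture a curved relationship unless it is given a non-linear feature.

```python
# Symmetric inputs and a non-linear target: y = x**2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def mse(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b on the data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 1) Best straight line in x: it cannot bend, so it misses the curve.
a, b = fit_line(xs, ys)
print("line in x:   ", (a, b), "MSE =", mse(a, b, xs, ys))

# 2) The same linear fit, but on the non-linear feature z = x**2.
zs = [x ** 2 for x in xs]
a2, b2 = fit_line(zs, ys)
print("line in x**2:", (a2, b2), "MSE =", mse(a2, b2, zs, ys))
```

The best straight line in x comes out flat (slope 0, intercept 4) with a mean squared error of 12. The fit on x² recovers y = x² exactly, with zero error.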
Stochastic vs deterministic
There is one more important complication in the quest to find this missing function.
In data science, we usually deal with stochastic processes rather than deterministic functions. In fact, most relationships in real life are stochastic. In science, you may find deterministic functions, for example Celsius to Fahrenheit conversion, which has only one answer. In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions. Stochastic models possess some inherent randomness: the same set of parameter values and initial conditions will lead to an ensemble of different outputs. Stochastic models are therefore considerably more complicated. Real-life models are complex because they have to cater for random noise.
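The difference can be sketched in a few lines of Python. Celsius to Fahrenheit conversion is the deterministic example from the text. The stochastic version is a hypothetical noisy thermometer that adds Gaussian noise with an assumed standard deviation of 0.5 °F. It stands in for the random noise that real-life models have to cater for.

```python
import random

def celsius_to_fahrenheit(c):
    """Deterministic: the same input always gives the same output."""
    return c * 9 / 5 + 32

def noisy_measurement(c, rng, noise_sd=0.5):
    """Stochastic (hypothetical): the true value plus Gaussian noise,
    e.g. a thermometer reading with random measurement error."""
    return celsius_to_fahrenheit(c) + rng.gauss(0.0, noise_sd)

rng = random.Random(42)  # seeded so the run is reproducible

# Deterministic model: repeating the call cannot change the answer.
print([celsius_to_fahrenheit(100) for _ in range(3)])  # [212.0, 212.0, 212.0]

# Stochastic model: same input and parameters, an ensemble of different outputs.
ensemble = [noisy_measurement(100, rng) for _ in range(1000)]
mean = sum(ensemble) / len(ensemble)
print(f"min={min(ensemble):.2f}, max={max(ensemble):.2f}, mean={mean:.2f}")
```

The deterministic function always returns 212.0. The stochastic one returns a spread of values centred near 212, so a model of it has to describe a distribution of outputs rather than a single answer.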
Bias Variance tradeoff
Extending the idea of finding a function that maps the inputs to the outputs, we meet another idea: a tradeoff between how much of the noise your function learns and how much of the genuine relationship it misses. This is a tradeoff between two sources of error, bias and variance, hence the name bias-variance tradeoff. Variance is the error from sensitivity to small fluctuations in the training set; high variance can result from an algorithm modelling the random noise in the training data (overfitting). Bias is the error from erroneous, typically over-simple, assumptions in the learning algorithm; high bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
Image source: https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-learning-and-how-to-deal-with-it-6803a989c76
This also leads to two other well-known terms: overfitting and underfitting. Overfitting means the function has learnt the noise, i.e. high variance (sensitivity to noise). Underfitting means the function has not learnt enough of the relationship between the inputs and the outputs, i.e. high bias. Reducing one typically increases the other, hence the tradeoff.
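The tradeoff can be seen numerically. This is a minimal sketch using k-nearest-neighbour regression, a technique not discussed in the article. It is used here only because its single parameter k moves cleanly between the two extremes. With k = 1, the model memorises every training point, noise included (high variance). With k equal to the training-set size (50), it predicts the same average everywhere (high bias). The sine target, the noise level of 0.3, the seed and the sample sizes are all arbitrary assumptions.

```python
import math
import random

rng = random.Random(1)

def make_data(n):
    """Noisy samples of a hypothetical target y = sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return xs, [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

def knn_predict(x, train_x, train_y, k):
    """Average the outputs of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    return sum((knn_predict(x, train_x, train_y, k) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(200)

results = {}
for k in (1, 7, 50):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(test_x, test_y, train_x, train_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k = 1 the training error is exactly zero, because each training point is its own nearest neighbour, yet the test error is higher (overfitting). With k = 50 every prediction is the overall mean (underfitting). An intermediate k typically does best on unseen data.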
Inference
There is, of course, no point in just learning a function. The whole point of learning is to infer or predict, i.e. to extend the knowledge you have learnt to new, unseen data. That brings us into the realm of statistical inference, which is a complex subject for another time. Even so, you can already see how connections can be built incrementally, learning new ideas from ones you already know.
Neural networks
So far, we have seen two types of functions: linear and non-linear. These can be handled by traditional statistical and machine learning methods.
But what if you have a relationship that is complex and hierarchical, such as in images, text or video (as opposed to traditional tabular data)?
To learn this hidden relationship, we need to provide examples to a neural network, which learns the relationship at a higher level of abstraction. From a corpus of text, it can detect relationships between words; from a set of images, it can detect what makes up an object (for example, a cat has fur, whiskers and so on).
The caveat, of course, is that the data must reflect the problem (i.e. capture the relevant features) and that you need many examples. This is because you are now learning not only the function but also the structure of the data, and that structure may be hierarchical. The number of training examples needed grows with the number of features and with how hierarchical the relationships between them are, because each layer of the neural network learns one level of the representation and the next layer builds on it. For example, the input layer sees raw pixels, the next layer learns edges, the layer after that combines edges into corners and contours, and so on.
Image source: https://www.deeplearningbook.org/
This brings us to the fact that ‘deep learning’ is best described as representation learning.
In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.
PS I don’t know the source of this diagram
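The idea that each layer builds a new representation for the next can be seen in a tiny example. This is a minimal sketch of a two-layer network computing XOR, which outputs 0 when its two inputs are equal and 1 otherwise. No single linear function of the two inputs can produce those outputs. The weights below are a well-known hand-set solution rather than learned ones, and the helpers `relu`, `dense` and `network` are hypothetical names. In real deep learning, training would have to discover weights of this kind from examples.

```python
def relu(v):
    """Rectified linear activation, applied element-wise."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

# Hand-set weights (not learned) for a well-known XOR solution.
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]   # hidden layer: 2 units
W2, b2 = [[1.0, -2.0]], [0.0]                     # output layer: 1 unit

def network(x):
    hidden = relu(dense(x, W1, b1))   # layer 1: a new representation of the input
    output = dense(hidden, W2, b2)    # layer 2: a linear function of that representation
    return hidden, output[0]

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    hidden, y = network(x)
    print(x, "-> hidden", hidden, "-> output", y)
```

The hidden layer maps the inputs (0, 1) and (1, 0) to the same point, [1.0, 0.0]. In this new representation, the output layer only needs a linear function to finish the job.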
If you have followed me so far, congratulations. While there is much more to go, even from this flow you will have learnt far more than most people know. I enjoyed writing this because it is also the approach I use for teaching.
If you come from a pure development background, don't ignore the maths (see my earlier post on the significance of maths in learning AI).
And the good news is that there are only four main areas you need to know to understand the maths of AI:
· Probability theory
· Statistics
· Linear Algebra
· Optimization
Cloud Engineer and Data Scientist
3 years ago: Great Article Ajit Jaokar
Strategic B2B SaaS Product Leader | Advisor | Legal Tech | Compliance
3 years ago: Excellent article Ajit Jaokar! Will be keen on seeing more from you on statistical inference in deep machine learning and how businesses can gain trust in deep learning flow outputs.
Managing Director at Alvarez & Marsal
3 years ago: Very interesting Ajit Jaokar, thanks for sharing it Steven Coates!
CSO at Chthonian | Experienced in AI Deployment at Scale | Available for Interim Advisory Roles
3 years ago: Ajit, I love this article. Very clear and relatively straightforward explanation. I shall direct my clients to this when they want to dig a bit deeper into AI. Thanks for sharing.
Principal Consultant - Data Engineering
3 years ago: Very clear explanation. Thanks a lot Ajit. Helped me better understand the application of maths in AI.