Black and white boxes: explaining the maths of machine learning

Background - black box models

I am considering using this idea to explain the maths of machine learning to students.

We are used to calling machine learning and deep learning algorithms ‘black boxes’

However, to understand the maths behind machine learning and deep learning algorithms, we may need to consider the idea of ‘black and white boxes’ - as I explain below

Machine learning algorithms can be expressed as a hidden function between x and y, i.e. inputs and outputs.

In layman’s terms: Imagine you have a magic box. You can put something into this box (let's call it 'X'), and the box will give you something back (let's call it 'Y'). The magic box is doing something inside, but you can't see what it is. All you know is that whenever you put in a certain 'X', you'll get out a certain 'Y'.

So, saying "machine learning algorithms can be expressed as a hidden function between 'X' (inputs) and 'Y' (outputs)" is just a fancy way of saying: Machine learning is about figuring out the formula that transforms your inputs into the outputs you want, even if we can't see exactly how that formula works on the inside.
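
To make the ‘magic box’ idea concrete, here is a minimal sketch (assuming scikit-learn and synthetic data invented purely for illustration, not any particular dataset): we fit a model on pairs of X and Y and then ask it for predictions, without ever inspecting the mechanism inside.

```python
# A minimal sketch of the "black box" view: fit a model on (X, y) pairs
# and query it for new predictions without inspecting its internals.
# Assumes scikit-learn is installed; the data here is synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))                 # inputs 'X'
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)    # outputs 'Y' (hidden rule: sin + noise)

# The "magic box": we only specify its shape, not the rule it should learn
black_box = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
black_box.fit(X, y)

# We can use the box without knowing what it does inside
print(black_box.predict([[1.0], [2.0]]))
```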

This is all well and good - but why can we not figure out the mechanism of the black box?

Data driven approaches

Firstly, whatever the approach, the internal mechanism needs the parameters of the algorithm to be determined. In the simplest case of a straight line, there are two parameters (m and c) for the equation y = mx + c. In the case of deep learning and LLM models, the number of parameters runs into the millions or billions.
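
As a rough illustration (a sketch with made-up layer sizes, not a description of any specific model), you can count the parameters yourself: the straight line needs exactly two numbers, while a fully connected network needs (inputs + 1) × outputs per layer, which grows very quickly with width and depth.

```python
import numpy as np

# A straight line y = m*x + c: exactly two parameters, m and c
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])
m, c = np.polyfit(x, y, deg=1)
print(f"line parameters: m={m:.2f}, c={c:.2f}")   # two numbers describe the whole model

# A fully connected network: each layer contributes weights + biases
def count_params(layer_sizes):
    # (inputs + 1 bias) * outputs, summed over consecutive layer pairs
    return sum((a + 1) * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

print(count_params([1, 32, 32, 1]))              # a toy network: ~1,150 parameters
print(count_params([1024, 4096, 4096, 1024]))    # wider layers: ~25 million parameters
```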

Now, black box approaches are data driven. Hence, they can be shown to work (based on model evaluation metrics) even though their mechanism remains unknown (black box operations)
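
In practice, ‘works based on model evaluation metrics’ means something like the sketch below (again assuming scikit-learn and synthetic data): we hold out part of the data and judge the black box purely by how well its predictions match that held-out data.

```python
# Evaluating a black box: we score its predictions on unseen data,
# rather than reasoning about its internal mechanism.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# The model "works" if this number is small - no mechanism required
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```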

So, the next logical question is: what are the alternatives to a black box model?

Alternative to black box models

What is the alternative way of expressing a relationship between x and y?

That’s where the traditional / statistical approaches come in - what we can see as the ‘white box’, i.e. the transformation is not hidden but is rather explicitly known to some degree.

Given x and y, you could express a relationship between them as:

Linear Regression: y = mx + c

Statistical Correlation: Correlation measures how closely two variables are related. For example, if X increases and Y also tends to increase, they may have a positive correlation. This doesn't tell you exactly how X causes Y to change but indicates whether there's a relationship and how strong it is.
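
For example, the correlation coefficient can be computed directly (a sketch assuming scipy and made-up data); the coefficient and its p-value are the entire, fully visible output.

```python
# Correlation as a white-box summary: one interpretable number (and a p-value)
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)   # Y tends to increase with X

r, p_value = pearsonr(x, y)
print(f"correlation r={r:.2f}, p-value={p_value:.3g}")  # strong positive correlation
```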

Rules-based Systems: Sometimes, the relationship between X and Y can be defined by a set of rules or logic. For instance, if X is "temperature," and Y is "state of water," then the rules could be simple: if X is below 0°C, Y is "ice"; if X is between 0°C and 100°C, Y is "liquid"; if X is above 100°C, Y is "steam".
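
The water example translates almost directly into code; a toy sketch (a real system would also handle pressure, boundary values and so on):

```python
# A rules-based "white box": the mechanism is simply the rules, written out.
def state_of_water(temperature_celsius: float) -> str:
    if temperature_celsius < 0:
        return "ice"
    elif temperature_celsius <= 100:
        return "liquid"
    else:
        return "steam"

print(state_of_water(-5))   # ice
print(state_of_water(25))   # liquid
print(state_of_water(120))  # steam
```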

Non-Linear Models: Sometimes, X and Y have a more complicated relationship that might involve curves, where increasing X doesn't always increase Y in a straightforward way. This can involve polynomial equations, logarithmic or exponential functions, etc.
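
As a sketch (assuming scipy and synthetic data generated from an exponential rule), fitting an explicit non-linear form still leaves you with a readable equation and a small number of coefficients:

```python
# Fitting an explicit non-linear form y = a * exp(b * x): still a white box,
# because the final model is a readable equation with two coefficients.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(3)
x = np.linspace(0, 2, 50)
y = 1.5 * np.exp(0.8 * x) + 0.05 * rng.normal(size=50)

(a, b), _ = curve_fit(model, x, y, p0=(1.0, 0.5))
print(f"fitted equation: y = {a:.2f} * exp({b:.2f} * x)")
```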

Decision Trees: These models use a tree-like graph or model of decisions and their possible consequences to express the relationship between inputs and outputs. Starting from a root, decision branches are created based on conditions or choices, leading to different outcomes or predictions.
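
Decision trees sit in an interesting middle ground: they are fitted from data, but the fitted rules can be printed and read. A small sketch (assuming scikit-learn; the temperature data below is made up to mirror the water example):

```python
# A decision tree learned from data, then printed as human-readable rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data mirroring the water example: temperature -> state
temperatures = [[-10], [-2], [5], [40], [90], [105], [150]]
states = ["ice", "ice", "liquid", "liquid", "liquid", "steam", "steam"]

tree = DecisionTreeClassifier(random_state=0).fit(temperatures, states)
print(export_text(tree, feature_names=["temperature_c"]))
```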

Difference between the two approaches

The key difference between machine learning approaches and the traditional methods discussed above (like linear regression, statistical correlation, rules-based systems, non-linear models, and decision trees) lies in how they learn and adapt, their complexity, and their interpretability.

Learning and Adaptation: Machine Learning Approaches typically adjust their internal parameters based on the data. They're designed to learn complex patterns through a process of trial and error, using a large amount of data. This includes adapting to new data without being explicitly programmed to do so after the initial training. In contrast, statistical methods don't ‘learn’ from data in the same way.

Complexity: Machine Learning Approaches can be highly complex, especially with deep learning models, which can have millions of parameters. This allows them to capture very subtle and complicated patterns in data but at the cost of requiring a lot of computational resources. In contrast, traditional methods are generally simpler and more transparent. A linear regression model, for example, can be fully described by its slope and intercept. This simplicity can be an advantage when you need to explain your model's predictions clearly.

Flexibility and Application: Machine Learning Approaches are very flexible and can be applied to a wide range of complex tasks, such as image recognition, natural language processing, and predicting highly non-linear patterns. In contrast, while traditional algorithms have limitations in handling complex patterns as effectively as machine learning models, they are highly effective for simpler, well-defined problems. They are also useful when data is limited or when models need to be easily explained.

Implications - hidden functions and statistical tests

Thus, we have two options

  1. We can learn the function from data (black box) OR
  2. We can define the underlying mechanism as explicitly as we can

Now, once you see it in this way, hidden functions and statistical tests are two sides of the same coin.

Statistical tests are procedures used to make decisions or inferences about populations based on sample data. Statistical tests provide a framework to evaluate hypotheses, assess relationships between variables, and determine the significance of predictive features.

Thus, statistical tests provide the ‘white box’ mechanism instead of the data driven hidden function
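
As an illustration (a sketch assuming scipy and synthetic data), a test on a regression slope makes the white-box character explicit: we state a hypothesis about the mechanism (the slope is zero) and the test tells us whether the data justifies rejecting it.

```python
# A statistical test as a white-box procedure: we test an explicit hypothesis
# (here: "the slope relating x and y is zero") and read off a p-value.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(4)
x = rng.normal(size=80)
y = 0.7 * x + rng.normal(scale=1.0, size=80)

result = linregress(x, y)
print(f"slope={result.slope:.2f}, p-value={result.pvalue:.3g}")
if result.pvalue < 0.05:
    print("reject the null hypothesis of no linear relationship")
```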

It’s not all (statistically) black and white

It’s not (statistically) black and white :) - pun intended

  1. Some algorithms are used in both statistics and machine learning - for example, linear regression
  2. Some machine learning algorithms are interpretable - for example, decision trees
  3. The comparison of statistical tests vs hidden functions is a simplification. It excludes some other cases (for example, rule-based systems)

Next steps?

If we extend the comparison of statistical tests vs hidden functions in machine learning, we need to list ML functions and statistical tests and see how statistical tests can be used with ML

I welcome your thoughts

If you are a non-developer and want to learn AI with me, please see Erdos Research Labs

You can meet me and our team at our Oxford AI summit

If you would like to study with me, see our courses

Low code AI course at the University of Oxford for non-developers

AI and digital twins

If you found this useful, you can sign up for my book

Image source: dall-e
