Artificial Intelligence No 50: Machine Learning vs Statistics

Welcome to the 50th edition of Artificial Intelligence

We crossed 45,000 subscribers this week!

In this edition, I will extend the discussion from the last edition.

But first, a couple of announcements:

If you want to study with us at #universityofoxford for our next course, see Developing Artificial Intelligence Applications using Python and TensorFlow. Also, our course on Digital Twins at #universityofoxford is almost full.

Also, I follow interesting trends in AI for my future teaching at Oxford. ARTIFICIAL INTELLIGENCE EXECUTIVE BRIEFING 2021 IN REVIEW AND 2022 FORECASTS by snglr group highlights some interesting trends which I have also been following at Oxford.

In the last edition, we discussed the difference between statistical inference and predictive inference.

In this edition, we will expand on this idea by considering statistics vs machine learning.

We are not really arguing that machine learning is better than statistics, or vice versa.

Instead, we are using this thought experiment as a way to learn the mathematical foundations of AI.

We are also referring to a specific 2001 paper by Leo Breiman: Statistical Modeling: The Two Cultures.

The abstract says:

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.


Quite strong stuff! Written back in 2001.

But I think it illustrates some key ideas and helps to unify mathematical thinking with statistical thinking.

I shall expand on the key ideas of this paper.

Statistics starts with data. Think of the data as being generated by a black box into which a vector of input variables x (the independent variables) goes in one side, and out of the other side the response variables y come out. In the most general case, you have:

[Image: nature's black box mapping input variables x to response variables y. Image source: Leo Breiman]

In the statistical approach, we assume a stochastic data model. For example, data are generated by independent draws from

response variables = f(predictor variables, random noise, parameters)

The values of the parameters are estimated from the data, and the model is then used for information and/or prediction. The belief is that a statistician, by imagination and by looking at the data, can invent a reasonably good parametric class of models for a complex mechanism devised by nature. Then parameters are estimated and conclusions are drawn. So, in this case, you have:

[Image: the statistical view, with a stochastic data model filling the inside of the box. Image source: Leo Breiman]
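To make this concrete, here is a minimal sketch (my own illustration, not from Breiman's paper) of the data-modelling culture: we assume a parametric form for the black box, here a simple linear model with Gaussian noise, and estimate its parameters from the data.

```python
import numpy as np

# Hypothetical stochastic data model: we assume nature generates
#   y = beta0 + beta1 * x + noise,  with noise ~ N(0, sigma^2)
rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 3.0, 0.5

x = rng.uniform(0, 10, size=500)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=500)

# The statistician assumes this parametric class and estimates the
# parameters from the data (ordinary least squares here).
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
print(beta0_hat, beta1_hat)  # estimates close to the true (2.0, 3.0)
```

The point is that the conclusions (the estimated parameters) are statements about the assumed model of the box, not just about predictions.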

In the machine learning case, we consider the inside of the box complex and unknown. The approach is to find a function f(x) to predict the responses y. Here, we focus on finding a good solution based on predictive accuracy.

So, we have:

[Image: the machine learning view, treating the inside of the box as unknown and fitting an algorithmic model. Image source: Leo Breiman]
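As a companion to the sketch above (again my own hypothetical illustration), here is the algorithmic-modelling culture in miniature: we make no assumption about how nature maps x to y, fit a purely algorithmic predictor (a simple k-nearest-neighbour regressor), and judge it only by held-out predictive accuracy.

```python
import numpy as np

# The black-box view: we do not model how nature maps x to y;
# we only search for an f(x) with good predictive accuracy.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=400)
y = np.sin(x) + rng.normal(0, 0.1, size=400)  # unknown to the modeller

x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

def knn_predict(x_new, k=10):
    """A purely algorithmic f(x): average the k nearest training responses."""
    nearest = np.argsort(np.abs(x_train - x_new))[:k]
    return y_train[nearest].mean()

preds = np.array([knn_predict(v) for v in x_test])
test_mse = np.mean((preds - y_test) ** 2)
print(test_mse)  # the model is judged purely by held-out error
```

Notice that nothing in the fit tells us *why* y behaves as it does; the only yardstick is predictive accuracy on unseen data.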

Now, the machine learning community started to evolve and develop separately from the pure statistical community through two powerful new algorithms developed in the mid-1980s, neural networks and decision trees, and later SVMs by Vladimir Vapnik. This allowed complex prediction problems to be tackled where there was no chance of knowing the underlying data model.

The paper also argues that the machine learning community benefited from three revolutions that allowed it to progress even further.

1. Rashomon: There is often not a single model that fits a data set best; there is usually a multiplicity of models that are similarly appropriate.

2. Occam: The simplest model is often the most interpretable. However, the simplest model is also often not the most accurate model for a given task.

3. Bellman: Large feature spaces are often seen as a disadvantage (the curse of dimensionality). However, modern machine learning methods rely heavily on expanding the feature space in order to improve predictive accuracy.
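The Rashomon point can be illustrated with a toy sketch (my own hypothetical example, not from the paper): two structurally different models, a cubic polynomial and a crude piecewise-constant "tree", fit the same data with errors of the same order.

```python
import numpy as np

# Rashomon effect in miniature: two quite different model families
# can fit the same data about equally well.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=300)
y = x**3 + rng.normal(0, 0.05, size=300)

# Model A: a cubic polynomial (the "right" parametric form here).
coef = np.polyfit(x, y, deg=3)
mse_a = np.mean((np.polyval(coef, x) - y) ** 2)

# Model B: a piecewise-constant fit over 20 bins (a crude "tree").
bins = np.linspace(-1, 1, 21)
idx = np.clip(np.digitize(x, bins) - 1, 0, 19)
means = np.array([y[idx == b].mean() for b in range(20)])
mse_b = np.mean((means[idx] - y) ** 2)

print(mse_a, mse_b)  # both small and of the same order of magnitude
```

Two very different stories about the data, yet similar fit: picking one model and interpreting it as *the* mechanism is therefore risky.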

So, to recap:

In the machine learning community, we evaluate a variety of models, select the best-performing model, and empirically determine the loss on a test set, with the goal of predicting the outcome for new/unseen samples.

In the statistical community, we try to understand the data generation process and select a model whose assumptions seem most reasonable for that distribution. Using goodness-of-fit tests, we use the model to explain the data generation process and understand the parameters.
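A minimal sketch of that machine-learning workflow (my own illustration, with hypothetical model choices): try several candidate models, pick the one with the lowest validation error, then report the loss on an untouched test set.

```python
import numpy as np

# ML-culture workflow: the candidate "models" here are k-nearest-
# neighbour regressors with different k, compared purely empirically.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=600)
y = np.sin(x) + rng.normal(0, 0.2, size=600)

x_tr, y_tr = x[:400], y[:400]
x_val, y_val = x[400:500], y[400:500]
x_te, y_te = x[500:], y[500:]

def knn(xq, k):
    """Predict each query point as the mean of its k nearest training responses."""
    return np.array([y_tr[np.argsort(np.abs(x_tr - v))[:k]].mean() for v in xq])

# Select the model purely by validation error ...
val_err = {k: np.mean((knn(x_val, k) - y_val) ** 2) for k in (1, 5, 15, 50)}
best_k = min(val_err, key=val_err.get)

# ... then report the loss once, on a final untouched test set.
test_mse = np.mean((knn(x_te, best_k) - y_te) ** 2)
print(best_k, test_mse)
```

No goodness-of-fit test appears anywhere: the selection criterion is held-out prediction error, which is exactly the cultural difference the recap describes.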

After this paper (2001), of course, neural networks formed the basis of deep learning from around 2010, and in the latter half of the 2010s we saw an increase in unsupervised models based on deep learning, as well as reinforcement learning models and large language models.

So, the conclusion is that these two cultures (statistical and machine learning) evolved separately but converged post-2010. However, appreciating the emphasis in their evolution allows us to understand the mathematical foundations of data science.

Image source: Mad magazine - Spy vs Spy. Could not resist it :) maybe an opportunity to introduce it to younger readers!


Jennifer Wines, JD, CPWA

#1 Amazon Bestselling Author | Speaker | Invisible Wealth Consulting

2 years ago

I really appreciate how you make these topics approachable, Ajit Jaokar

Ridwanullahi Abdulrauf

Embedded Software Engineer

2 years ago

Thank you very much sir. So the strengths and weaknesses of statistical models and machine learning models are a result of their underlying approaches to data analysis.

Luca Sambucci

AI Security Innovator | 30+ Years in Cybersecurity | Protecting the Future of AI

2 years ago

Dear Ajit, I have followed your very interesting newsletter since the beginning, and it's an honour to see our AI brief being mentioned there today.

Atul Shukla

Connecting the dots, delivering IT solutions

2 years ago

So essentially we are saying - in statistics get data in a fixed format and start building models but in ML build models first, get data to validate that model and then keep refining the model based on the available data set...I am sure both are going to converge at some point of time ....how and when we converge is worth exploring, no? Thoughts? Or we let them remain distinct and separate?
