The difference between Statistical Modeling and Machine Learning, as I see it
Oliver Schabenberger
Professor of Practice, Virginia Tech | Data & Analytics | Lifelong Learner
I frequently get asked about the differences between Statistics (statistical modeling in particular), Machine Learning and Artificial Intelligence. There is indeed overlap in goals, technologies and algorithms. Confusion arises not only from this overlap, but from the buzzword salad we are being fed in non-scientific articles.
Statistical Modeling
The basic goal of Statistical Modeling is to answer the question, “Which probabilistic model could have generated the data I observed?” So you:
- Select a candidate model from a reasonable family of models
- Estimate its unknown quantities (the parameters), that is, fit the model to the data
- Compare the fitted model to alternative models
For example, if your data represent counts, such as the number of customers churned or cells divided, then a model from the Poisson family, or the Negative Binomial family, or a zero-inflated model might be appropriate.
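As a minimal sketch of the fit-and-compare step, assuming NumPy and SciPy and an invented series of monthly churn counts (none of these numbers come from the article): for the Poisson family, the maximum-likelihood estimate of the rate is simply the sample mean, and the resulting log-likelihood or AIC is what you would set against an alternative such as a negative binomial fit.

```python
import numpy as np
from scipy import stats

# Hypothetical monthly churn counts (invented, purely for illustration).
counts = np.array([2, 0, 3, 1, 4, 2, 1, 0, 2, 3])

# For the Poisson family, the maximum-likelihood estimate of the
# rate parameter lambda is the sample mean.
lam_hat = counts.mean()

# Log-likelihood of the fitted model, the basis for comparing it
# against alternative families (e.g. via AIC).
loglik = stats.poisson.logpmf(counts, lam_hat).sum()
aic = 2 * 1 - 2 * loglik  # one estimated parameter

print(f"lambda_hat = {lam_hat:.2f}, log-likelihood = {loglik:.2f}, AIC = {aic:.2f}")
```

The same log-likelihood computation, repeated for a negative binomial or zero-inflated fit, gives the comparison among candidate models described above.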
Once a statistical model has been chosen, the estimated model serves as the device for inquiries: testing hypotheses, creating predicted values, and deriving measures of confidence. The estimated model becomes the lens through which we interpret the data. We never claim that the selected model generated the data; rather, we view it as a reasonable approximation of the stochastic process on which confirmatory inference is based.
Confirmatory inference is an important aspect of statistical modeling. For example, to decide which one of three possible medical devices has the greatest benefit to patients, you are interested in a model that captures the mechanism by which patient benefits are differentiated by treatment. It will often be the case that the model that captures the data-generating mechanism well is also a model that predicts well within the range of observed data—and possibly predicts new observations.
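To make the three-device comparison concrete, here is a hedged sketch using SciPy’s one-way ANOVA on invented patient-benefit scores (the data are assumptions for illustration, not from the article):

```python
from scipy import stats

# Hypothetical patient-benefit scores for three medical devices
# (invented numbers, purely for illustration).
device_a = [5, 6, 5, 7, 6]
device_b = [8, 9, 8, 7, 9]
device_c = [5, 4, 6, 5, 5]

# One-way ANOVA: do the mean benefits differ across the three devices?
f_stat, p_value = stats.f_oneway(device_a, device_b, device_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value is evidence that the devices differ in mean benefit; follow-up comparisons would then identify which device stands out.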
Classical machine learning
Classical machine learning is a data-driven effort, focused on algorithms for regression and classification, and motivated by pattern recognition. The underlying stochastic mechanism is often secondary and not of immediate interest. Of course, many machine learning techniques can be framed through stochastic models and processes, but the data are not thought of as having been generated by that model. Instead, the primary concern is to identify the algorithm or technique (or ensemble thereof) that performs the specific task: Are customers best segmented by k-means clustering, or DBSCAN, or a decision tree, or random forest, or SVM?
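As one small example of such a technique, a minimal k-means segmentation (Lloyd’s algorithm) in NumPy; the six “customers” and their two features are invented for illustration:

```python
import numpy as np

# Six hypothetical customers described by two scaled features
# (invented data, purely for illustration).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.2]])

# Lloyd's algorithm for k-means with k = 2, seeded with two data points.
centers = X[[0, 3]].copy()
for _ in range(10):
    # Assign each customer to the nearest center...
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    # ...then move each center to the mean of its assigned customers.
    centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])

print(labels)  # → [0 0 0 1 1 1]: two customer segments
```

Note that nothing here asks which stochastic model generated the points; the algorithm simply partitions the data.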
In a nutshell, for the Statistician the model comes first; for the Machine Learner the data come first. Because the emphasis in machine learning is on the data, not the model, validation techniques that separate data into training and test sets are very important. The quality of a solution lies not in a p-value, but in demonstrating how well the solution performs on previously unseen data. Both fitting a statistical model to data and training a decision tree on data involve estimating unknown quantities: the best split points of the tree are determined from the data, as are the estimates of the parameters of the conditional distribution of the dependent variable.
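A minimal sketch of that train/test discipline, using a nearest-centroid classifier on synthetic two-class data (all numbers are invented for illustration): the model is fit on the training split only, and its quality is read off the held-out split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic, well-separated classes (invented data).
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hold out a test set the model never sees during training.
idx = rng.permutation(len(y))
train, test = idx[:70], idx[70:]

# "Train" a nearest-centroid classifier: estimate one centroid per class.
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

# The quality measure is performance on the unseen test data,
# not a p-value on the training fit.
dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == y[test]).mean()
print(f"held-out accuracy = {accuracy:.2f}")
```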
Neither technique can claim to be learning, in my opinion. Training is the process of shaping something. Learning, on the other hand, implies gaining a new skill, and training is part of learning. By training a deep neural network, that is, by determining its weights and biases from the input data, the network has learned to classify; it has morphed into a classifier.
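To illustrate what “determining its weights and biases given the input data” means in the smallest possible case, here is a single perceptron (one neural unit, not a deep network, and an assumption for illustration only) trained on the AND truth table. The program encodes only a learning rule; the function itself is acquired from the examples.

```python
import numpy as np

# The AND truth table is the task to be learned; nothing in the
# code below hard-wires the AND function itself.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = np.zeros(2)  # weights
b = 0.0          # bias

# Perceptron learning rule: nudge weights and bias after each mistake.
for _ in range(20):
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        w += (yi - pred) * xi
        b += (yi - pred)

learned = [int(w @ xi + b > 0) for xi in X]
print(learned)  # → [0, 0, 0, 1]
```

After training, the unit reproduces AND on all four inputs: the weights and biases were shaped by the data, which is exactly the sense in which the network has “learned” the task.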
Modern Machine Learning
A machine learning system is truly a learning system if it is not programmed to perform a task, but is programmed to learn to perform the task. I refer to this as Modern Machine Learning. Like the classical variant, it is a data-driven exercise. Unlike the classical variant, modern machine learning does not rely on a rich set of algorithmic techniques. Almost all applications of this form of machine learning are based on deep neural networks.
This is the area we now tend to call Deep Learning, a specialization of Machine Learning, and frequently applied in weak Artificial Intelligence applications, where machines perform a human task.
Role of the data
We can now distinguish statistical modeling, classical machine learning and modern machine learning by the role of the data.
In statistical modeling, the data guide us to the selection of a stochastic model which serves as the abstraction for making probabilistic statements about questions of interest, such as hypotheses, predictions and forecasts.
In classical machine learning, the data drive the selection of the analytic technique that best performs the task at hand. The data train the algorithms.
In modern machine learning, the data drive systems based on neural networks that self-determine the regularities in the data in order to learn a task. Through training on the data, the network learns the task. As someone put it, “The data does the programming.”