All Models Are Wrong, But Some Are Useful: Navigating the Statistical Minefield

Sidharth Mahotra

Senior Principal Data and Computer Vision Scientist | IEEE member | Career Coach

Published: October 27, 2024

The most that can be expected from any model is that it can supply a useful approximation to reality: All models are wrong; some models are useful - George Box [1]

The Map Is Not the Territory

This quote, by the famous statistician George Box, is a reminder that all models are simplifications of reality. They can never be "the truth" if truth means a complete representation of reality. They help us navigate complex systems and make predictions, but they can never capture every nuance of the real world. To quote [2]: "A simple [map] might be good enough to drive between cities, but you'd want something more detailed if you're trying to find your way through the countryside."

This analogy beautifully illustrates the balance between simplicity and utility in modeling. Just as different maps serve different purposes, various statistical models are designed to address specific questions or problems.

The Danger of Overconfidence

While models are invaluable tools, there is a risk in placing too much faith in them. The text warns of the "danger of actually starting to believe in them too much." When we forget that models are simplifications, we can make serious mistakes. For example, the financial crisis of 2007-2008 was caused in part by an overreliance on complex financial models. These models assumed only a moderate correlation between mortgage failures. That assumption held while the property market was booming, but when conditions changed and mortgages started failing together, the models failed to predict the risk. The underlying problem was that the models rested on a simplification of reality that no longer held.
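To make the role of the correlation assumption concrete, here is a minimal sketch. It is not a reconstruction of any model that was actually used before the crisis. It assumes a one-factor Gaussian setup in which each loan's fate mixes a shared market shock with loan-specific noise; the function names, the 5% default probability, the portfolio size, and the two correlation values are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def simulate_default_rates(rho, p_default=0.05, n_loans=200, n_trials=2000, seed=0):
    """Simulate portfolio default rates under a one-factor Gaussian model.

    A loan defaults when sqrt(rho) * market + sqrt(1 - rho) * noise falls
    below a threshold chosen so that each loan defaults with p_default.
    rho is the assumed correlation between loans (hypothetical values).
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(p_default)
    shared_weight = rho ** 0.5
    own_weight = (1.0 - rho) ** 0.5
    rates = []
    for _ in range(n_trials):
        market = rng.gauss(0.0, 1.0)  # one shock shared by every loan
        defaults = sum(
            1
            for _ in range(n_loans)
            if shared_weight * market + own_weight * rng.gauss(0.0, 1.0) < threshold
        )
        rates.append(defaults / n_loans)
    return rates


def tail_probability(rates, level):
    """Fraction of simulated scenarios whose default rate exceeds level."""
    return sum(r > level for r in rates) / len(rates)


low = simulate_default_rates(rho=0.05)   # "moderate correlation" assumption
high = simulate_default_rates(rho=0.50)  # loans that tend to fail together

print("P(default rate > 15%) with rho=0.05:", tail_probability(low, 0.15))
print("P(default rate > 15%) with rho=0.50:", tail_probability(high, 0.15))
```

Both portfolios have the same average default rate, yet a very bad scenario is far more likely when defaults are strongly correlated. A model calibrated on the low-correlation world says little about the high-correlation one.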

The Box Paradox: Wrong but Useful

Another statement by George Box [3] provides some context: "The practical question is how wrong do [models] have to be to not be useful." Models can still be useful even if they are not perfect; the key is to use them wisely and to be aware of their limitations. A model's value lies not in its absolute accuracy, but in its ability to provide insights and guide decision-making. For example, weather models are not perfect, but they can still be used to make reasonably accurate predictions about the weather.

The Evolution of the Concept

The idea that models are inherently imperfect isn't new. In 1960, Georg Rasch observed that no models are ever truly correct, not even Newton's laws of physics. He emphasized that models should be evaluated based on their applicability for a given purpose, not on their absolute truth.
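The point about Newton's laws can be illustrated with a small worked comparison. The sketch below contrasts the Newtonian kinetic energy, ½mv², with the special-relativistic value, (γ − 1)mc², both expressed in units of mc². The function names and the three speeds are chosen here purely for illustration.

```python
import math


def newtonian_ke(beta):
    """Newtonian kinetic energy in units of m*c^2, where beta = v / c."""
    return 0.5 * beta ** 2


def relativistic_ke(beta):
    """Relativistic kinetic energy in units of m*c^2: gamma - 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma - 1.0


def relative_error(beta):
    """Relative error of the Newtonian value against the relativistic one."""
    exact = relativistic_ke(beta)
    return abs(exact - newtonian_ke(beta)) / exact


# Illustrative speeds: everyday scale, fast, and close to light speed.
for beta in (0.001, 0.1, 0.9):
    print(f"v = {beta:g}c: Newtonian error = {relative_error(beta):.2%}")
```

At everyday speeds the Newtonian formula is wrong by a negligible amount, which is why it remains useful; near the speed of light the error becomes too large for most purposes.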

The Paradox of Model Complexity

Interestingly, there's a paradox in model building known as Bonini's paradox or Valéry's paradox. It suggests that as models become more complete and accurate, they also become less understandable and thus less useful for reasoning. This underscores the need for balance between accuracy and simplicity in model design. The paradox is evident in the progression from simple perceptrons to complex deep neural networks (DNNs). While DNNs achieve impressive predictive accuracy, their intricate architectures often obscure the reasoning behind their predictions, leading to concerns about black-box models and driving research into explainable AI (XAI). Balancing predictive power with transparency and interpretability remains an ongoing challenge.

Conclusion: Use Models Wisely

In the end, the value of a model lies not in its perfection, but in its utility. As statisticians and data scientists, our challenge is to create models that, while inevitably "wrong" in some sense, provide meaningful insights and guide effective decision-making. Here are a few tips for using models wisely:

Remember that models are not perfect. Do not expect any model to be a perfect representation of reality. When assessing a model [4], consider its longevity and whether it has withstood scrutiny over time. Evaluate whether the model accurately represents reality or relies on abstractions. Determine the model's versatility and its applicability across multiple disciplines. Investigate the model's origins and whether it stems from fundamental scientific or mathematical concepts. Additionally, check whether the model is based on foundational principles that are self-evident and do not require further justification, as models relying on infinite regress often have limited practical use.
Use models for what they are good for. Models can be used to make predictions, to understand relationships between variables, and to test hypotheses.
Be aware of the limitations of models. Models are based on assumptions and simplifications. These assumptions and simplifications may not always be valid.
Use multiple models. Do not rely on a single model to make decisions. Different models may capture different aspects of a system, offering a more comprehensive view. Moreover, combining the results of multiple models can increase the robustness of the predictions.
Use common sense. Do not let models override your common sense. If a model is telling you something that does not make sense, then it is probably wrong.
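The "use multiple models" tip above can be sketched with a toy example. It assumes three hypothetical models whose predictions equal the truth plus independent random noise; the data, the noise level, and the names are invented for illustration and do not come from any real system.

```python
import random
from statistics import mean


def rmse(predictions, targets):
    """Root-mean-square error between predictions and targets."""
    return mean((p - t) ** 2 for p, t in zip(predictions, targets)) ** 0.5


rng = random.Random(42)
truth = [0.1 * x for x in range(200)]  # synthetic ground truth

# Three hypothetical models: each is "wrong" in its own independent way.
models = [[t + rng.gauss(0.0, 1.0) for t in truth] for _ in range(3)]

# Simple ensemble: average the three predictions point by point.
ensemble = [mean(preds) for preds in zip(*models)]

individual_errors = [rmse(m, truth) for m in models]
ensemble_error = rmse(ensemble, truth)

print("Individual RMSEs:", [round(e, 3) for e in individual_errors])
print("Ensemble RMSE:   ", round(ensemble_error, 3))
```

The averaged prediction has a lower error than any single model here because the individual errors are independent and partly cancel. If the models shared the same blind spot, averaging would help much less, so this sketch should not be read as a guarantee.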

Citations:

[1] https://en.wikipedia.org/wiki/All_models_are_wrong

[2] Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data.

[3] https://www.kdnuggets.com/2019/06/all-models-are-wrong.html

[4] https://fs.blog/all-models-are-wrong/
