All Models Are Wrong, But Some Are Useful: Navigating the Statistical Minefield
Sidharth Mahotra
Senior Principal Data and Computer Vision Scientist | IEEE member | Career Coach
The most that can be expected from any model is that it can supply a useful approximation to reality: All models are wrong; some models are useful - George Box [1]
The Map Is Not the Territory
This quote, by the famous statistician George Box, is a reminder that all models are simplifications of reality. They can never be "the truth" if truth means a complete representation of reality. They help us navigate complex systems and make predictions, but they can never capture every nuance of the real world. To quote [2], "A simple [map] might be good enough to drive between cities, but you'd want something more detailed if you're trying to find your way through the countryside."
This analogy beautifully illustrates the balance between simplicity and utility in modeling. Just as different maps serve different purposes, various statistical models are designed to address specific questions or problems.
The Danger of Overconfidence
While models are invaluable tools, there's a risk in placing too much faith in them. The same text warns of the "danger of actually starting to believe in them too much". When we forget that models are simplifications, we can make serious mistakes. For example, the financial crisis of 2007-2008 was caused in part by an overreliance on complex financial models. These models assumed that mortgage failures were only moderately correlated with one another. That assumption held while the property market was booming. But when conditions changed and mortgages started failing together, the models failed to predict the risk. The problem was not that the models simplified reality, which every model does, but that their users kept trusting a simplification after it had stopped holding.
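The role of the correlation assumption can be illustrated with a small simulation. The sketch below is not one of the models used before the crisis; it is a simplified, assumed one-factor setup in which each loan's fate depends on a shared "market" factor plus its own independent noise. The function name, portfolio size, default probability, and loss threshold are all hypothetical values chosen for illustration.

```python
import random
from statistics import NormalDist


def tail_probability(rho, n_loans=100, p_default=0.05,
                     loss_threshold=0.20, n_trials=2000, seed=0):
    """Estimate the chance that at least `loss_threshold` of loans default.

    Assumed toy model: loan i defaults when
        sqrt(rho) * market + sqrt(1 - rho) * noise_i < cutoff,
    where market and noise_i are standard normal draws. `rho` is the
    correlation between the latent variables of any two loans.
    """
    rng = random.Random(seed)
    # Cutoff chosen so each loan defaults with probability p_default,
    # regardless of rho.
    cutoff = NormalDist().inv_cdf(p_default)
    shared_weight = rho ** 0.5
    own_weight = (1.0 - rho) ** 0.5

    bad_trials = 0
    for _ in range(n_trials):
        market = rng.gauss(0.0, 1.0)  # one shared shock per scenario
        defaults = sum(
            1
            for _ in range(n_loans)
            if shared_weight * market + own_weight * rng.gauss(0.0, 1.0) < cutoff
        )
        if defaults / n_loans >= loss_threshold:
            bad_trials += 1
    return bad_trials / n_trials


if __name__ == "__main__":
    for rho in (0.05, 0.5):
        print(f"rho={rho}: P(at least 20% default) ~ {tail_probability(rho):.4f}")
```

In both runs each individual loan has the same default probability; only the correlation changes. With these assumed parameters, the estimated chance of a large cluster of defaults should come out far higher in the high-correlation run, which is the kind of risk a low-correlation assumption hides.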
The Box Paradox: Wrong but Useful
George Box's other statement [3] provides some context: "The practical question is how wrong do [models] have to be to not be useful." This is a reminder that models can still be useful even if they are not perfect. The key is to use models wisely and to be aware of their limitations. A model's value lies not in its absolute accuracy, but in its ability to provide insights and guide decision-making. For example, weather models are not perfect, but they can still be used to make reasonably accurate predictions about the weather.
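Box's question can be made concrete with a deliberately simple example that is not taken from the cited sources: the small-angle approximation sin(x) ≈ x. The "model" is always wrong, but how wrong depends on where it is used. The helper name and the chosen angles below are assumptions for illustration.

```python
import math


def relative_error(x):
    """Relative error of the approximation sin(x) ~ x, for x in radians."""
    true_value = math.sin(x)
    return abs(x - true_value) / abs(true_value)


if __name__ == "__main__":
    for angle_deg in (1, 5, 15, 45, 90):
        x = math.radians(angle_deg)
        print(f"{angle_deg:>3} degrees: relative error = {relative_error(x):.4%}")
```

For small angles the error is a tiny fraction of a percent, so the approximation is useful; near 90 degrees it is off by more than half, so it is not. Whether the model is "too wrong" depends on the purpose and the range in which it is applied.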
The Evolution of the Concept
The idea that models are inherently imperfect isn't new. In 1960, Georg Rasch observed that no models are ever truly correct, not even Newton's laws of physics. He emphasized that models should be evaluated based on their applicability for a given purpose, not on their absolute truth.
The Paradox of Model Complexity
Interestingly, there's a paradox in model building known as Bonini's paradox or Valéry's paradox. It suggests that as models become more complete and accurate, they also become less understandable and thus less useful for reasoning. This underscores the need for balance between accuracy and simplicity in model design. The paradox is evident in the progression from the simple perceptron to complex deep neural networks (DNNs). While DNNs achieve impressive predictive accuracy, their intricate architectures often obscure the reasoning behind their predictions, leading to concerns about black-box models and driving research into explainable AI (XAI). Balancing predictive power against transparency and interpretability remains an ongoing challenge.
Conclusion: Use Models Wisely
In the end, the value of a model lies not in its perfection, but in its utility. As statisticians and data scientists, our challenge is to create models that, while inevitably "wrong" in some sense, provide meaningful insights and guide effective decision-making. Here are a few tips for using models wisely:
Citations:
[2] Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data.