Data modeling culture versus algorithmic modeling culture
Leo Breiman (1) wrote an interesting article about the two cultures in the use of statistical modeling to reach conclusions from data: the data modeling culture and the algorithmic modeling culture. The differences between these two models have recently come to the fore again in a discussion between Noam Chomsky (2) and Google's director of AI research, Peter Norvig (3). To recap these two cultures:
The difference between these two approaches is that the conclusions drawn from data modeling are about the model, not about the nature of the phenomena.
Usually, simple parametric models (from data modeling culture) imposed on data generated by complex systems result in a loss of accuracy and information as compared to algorithmic models.
Leo Breiman (2001)
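To make the accuracy argument concrete, here is a minimal sketch in Python. It is an illustration of my own, not an experiment from Breiman's article: the data-generating process (a sine curve plus noise), the choice of a straight line as the "data model", and the choice of k-nearest neighbors as the "algorithmic model" are all assumptions made for the example, as are the function names.

```python
import math
import random


def make_data(n, rng):
    """Sample (x, y) pairs from an assumed nonlinear process: y = sin(3x) + noise."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Data modeling stand-in: ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    """Algorithmic modeling stand-in: average the k nearest training targets."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared prediction error on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(200, rng)

linear_mse = mse(fit_line(train_x, train_y), test_x, test_y)
knn_mse = mse(fit_knn(train_x, train_y), test_x, test_y)

print(f"linear model test MSE: {linear_mse:.3f}")
print(f"k-NN model test MSE:   {knn_mse:.3f}")
```

On data like this, the straight line should show a much larger test error than the nearest-neighbor predictor, because a line cannot follow the oscillation. The trade-off is the one the two cultures argue about: the line gives two readable parameters, while the k-NN predictor gives no compact description of the process at all.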
Breiman argues that data modeling culture has some limitations, such as its (sometimes) low accuracy, the inability to present a clear picture of nature’s mechanism when we have complex data, and the reasonable doubt about whether the chosen statistical model is the one that best reflects the nature of the phenomenon. Chomsky opposes the algorithmic model in his discussion because the function it produces is difficult to understand, which, in his opinion, makes no sense. He would rather think that the model used to explain this data must be relatively simple. Norvig says that reality is messier and "we shouldn't accept a theoretical framework that places a priority on making the model simple over making it accurately reflect reality."
The majority of data science today is actually based on a culture of data modeling, perhaps as a result of statisticians' influence on the development of machine learning technologies. But there is evidence today that data science is becoming more and more an empirical science.
"But if a method works, it should not be abandoned nor dismissed just because theorists haven’t yet figured out how to explain it".
Yann LeCun
Director of AI Research at Facebook and Professor at NYU
In conclusion, data modeling culture can be very useful for a large set of problems. But it is not possible to ignore the evidence showing that machine learning technologies are becoming more empirical. It is challenging to comprehend all of the complex processes that form nature with the data modeling culture, because this method produces statistical parameters rather than a thorough comprehension of the phenomena.
"This web of life, the most complex system we know of in the universe, breaks no law of physics, yet is partially lawless, ceaselessly creative."
Stuart Kauffman
Professor of Biological Sciences, Physics, Astronomy, University of Calgary
References: