Maybe you should be using Ordinary Least Squares Regression
Recently, I was curious about a simple question: How does Ordinary Least Squares (OLS) regression hold up against more complex models?
Here's what I tested and what I found.
OLS is often recommended as a starting point, and it's easy to overlook just how robust and competitive it can be, even when assumptions like linearity and normality of errors don't fully hold. More importantly, complex models like Random Forests and SVMs should have to earn their place by proving they're significantly better.
To put this to the test, I ran an experiment on datasets of varying types and sizes:
I compared OLS against Logistic Regression, Random Forests, SVMs, and SVR (Support Vector Regression), using appropriate metrics for binary and continuous outcomes.
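The setup above can be sketched in a few lines. The snippet below is a minimal illustration of the benchmarking idea, not the original experiment: it uses a synthetic linear dataset from scikit-learn (the actual runs used datasets of varying types and sizes), and it times OLS against a Random Forest and SVR on a continuous outcome, scoring each with held-out R².

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic continuous outcome; a stand-in for the real datasets
X, y = make_regression(n_samples=5000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": SVR(),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)          # measure training time only
    elapsed = time.perf_counter() - start
    r2 = model.score(X_test, y_test)     # R^2 on held-out data
    results[name] = (elapsed, r2)
    print(f"{name:>13}: fit {elapsed:.3f}s, R^2 = {r2:.3f}")
```

On data like this, where a linear fit is appropriate, OLS trains orders of magnitude faster than the ensemble and kernel models while matching or beating their held-out R², which is exactly the comparison the experiment formalizes.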
The Results
Speed:
Performance:
Key Takeaways
Why does this matter?
In a world of increasingly complex models, it’s easy to forget that sometimes the simplest approach is also the best. Whether you’re working with millions of rows or imperfect data, OLS remains a reliable, fast, and effective tool. Start simple. Benchmark with OLS. If the complex model can’t beat it, maybe it’s not needed at all.
Check out the notebook here: https://colab.research.google.com/drive/1aLWiKd3g1MqNdF6p8LOuz6oeUTXUriea?usp=sharing
#MachineLearning #DataScience #Regression #Benchmarking #OccamsRazor #Efficiency