Maybe you should be using Ordinary Least Squares Regression

Recently, I was curious about a simple question: How does Ordinary Least Squares (OLS) regression hold up against more complex models?

Here's what I tested and what I found.

OLS is sometimes recommended as a starting point, and it’s easy to overlook just how robust and competitive it can be—even when assumptions like linearity and normality don’t fully hold. More importantly, complex models like Random Forests and SVMs need to earn their place by proving they’re significantly better.

To put this to the test, I ran an experiment on datasets of varying types and sizes:

  • Linear vs. Non-linear relationships
  • Continuous vs. Binary outcomes
  • Smaller (1,000 rows) vs. Larger (100,000 rows) datasets

I compared OLS against Logistic Regression, Random Forests, SVMs, and SVR (Support Vector Regression), using appropriate metrics for binary and continuous outcomes.
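The linked notebook contains the full experiment; as a rough sketch of the comparison, the setup below times an OLS fit against a Random Forest and SVR on synthetic linear data (the data-generating process, model settings, and metric are my illustrative assumptions, not the notebook's exact choices):

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic, mostly linear data: 1,000 rows, 5 features, Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for name, model in [
    ("OLS", LinearRegression()),
    ("Random Forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("SVR", SVR()),
]:
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    results[name] = (elapsed, r2_score(y_test, model.predict(X_test)))

for name, (elapsed, r2) in results.items():
    print(f"{name:14s} fit time: {elapsed:.4f}s  test R^2: {r2:.3f}")
```

On linear data like this, the pattern in the results below tends to reproduce: OLS fits orders of magnitude faster and matches or beats the more complex models on R².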

The Results

Speed:

  • OLS was consistently the fastest model, even on 100,000 rows.
  • SVMs and Random Forests slowed down significantly as the data scaled.

Performance:

  • On linear data, OLS matched or outperformed more complex models.
  • On non-linear data, Random Forests performed best, but OLS still delivered reasonable results.
  • Even for binary outcomes, OLS produced meaningful coefficients and predictions, often close to logistic regression.
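The binary-outcome case above is essentially a linear probability model: fitting OLS on a 0/1 target and thresholding the fitted values at 0.5. A small sketch (the data-generating process here is my assumption for illustration) shows how closely its predictions can track logistic regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score

# Binary outcome drawn from a logistic data-generating process.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
p = 1 / (1 + np.exp(-(X @ np.array([1.0, -1.0, 0.5]))))
y = (rng.uniform(size=1000) < p).astype(int)

ols = LinearRegression().fit(X, y)      # linear probability model
logit = LogisticRegression().fit(X, y)

# Threshold OLS's fitted "probabilities" at 0.5 to get class predictions.
ols_pred = (ols.predict(X) >= 0.5).astype(int)
logit_pred = logit.predict(X)

print("prediction agreement:", (ols_pred == logit_pred).mean())
print("OLS accuracy:  ", accuracy_score(y, ols_pred))
print("logit accuracy:", accuracy_score(y, logit_pred))
```

The two models typically agree on the vast majority of predictions, which is why OLS remains a useful sanity check even for classification-style problems.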

Key Takeaways

  • OLS is more than a starting point—it’s a benchmark. Before adding complexity, ask: “Is my model meaningfully better than OLS?”
  • Simplicity often wins. In the spirit of Occam’s razor, if a complex model doesn’t outperform OLS, it may not justify the extra cost.
  • OLS is robust. Even when assumptions don’t fully hold, it delivers interpretable and competitive results.

Why does this matter?

In a world of increasingly complex models, it’s easy to forget that sometimes the simplest approach is also the best. Whether you’re working with millions of rows or imperfect data, OLS remains a reliable, fast, and effective tool. Start simple. Benchmark with OLS. If the complex model can’t beat it, maybe it’s not needed at all.

Check out the notebook here: https://colab.research.google.com/drive/1aLWiKd3g1MqNdF6p8LOuz6oeUTXUriea?usp=sharing

#MachineLearning #DataScience #Regression #Benchmarking #OccamsRazor #Efficiency


Cole Napper

VP Research & Innovation | People Analytics, Workforce Planning, & Talent Intelligence | Directionally Correct - #1 People Analytics Podcast & Substack Newsletter | Prolific Author, Writer, Speaker | HR Tech Advisor

3 months ago

“The more you know, the less you use” is one of my favorite stats quotes about OLS regression.

Manpreet(Manny) Sidhu

Cloud Strategy & AGI Applied Data Science Leader | Author | Speaker | Mentor

3 months ago

Insightful article; it will be interesting to see OLS applied to time series data.
