Piecewise Linear Regression

Srivatsan Srinivasan

Chief Data Scientist | Gen AI | AI Advocate | YouTuber (bit.ly/AIEngineering)

Published: February 17, 2019

Real-world data is not always linear. In many cases it is very difficult to fit a single line and get a well-fitting model on non-linear and non-monotonic datasets. While one can resort to complex models such as SVMs, trees, or even neural networks, they come at the cost of interpretability and explainability.

Is there a middle ground that can be used when the decision boundary is not very complex? The answer is in the title.

Piecewise regression breaks the data into individual segments and fits a linear regression within each segment. The locations where one segment ends and the next begins are called breakpoints.
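As a rough formalization of the description above, a piecewise linear model with K segments can be written as follows. The symbols (intercepts \alpha_k, slopes \beta_k, breakpoints b_k) are notation assumed here for illustration and do not come from the original article.

```latex
\hat{y}(x) =
\begin{cases}
\alpha_1 + \beta_1 x, & x < b_1 \\
\alpha_2 + \beta_2 x, & b_1 \le x < b_2 \\
\;\;\vdots \\
\alpha_K + \beta_K x, & x \ge b_{K-1}
\end{cases}
\qquad b_1 < b_2 < \dots < b_{K-1}
```

If the fitted curve is required to be continuous, neighbouring segments must also agree at each breakpoint: \alpha_k + \beta_k b_k = \alpha_{k+1} + \beta_{k+1} b_k for k = 1, ..., K-1.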

Let's take a very simple dataset for illustration and visualize the output of linear and piecewise linear regression.

Refer to my repo for the code on piecewise regression and the plots: https://github.com/srivatsan88/piecewise-regression/blob/master/piecewise_linear_regression.ipynb

If you check the plots, the linear fit results in a larger standard error than the piecewise fit. The piecewise fit might look like it is overfitting, but it is not: the technique generalizes well to new data points. In this case we segment the data points into 3 buckets and fit a regression line within each segment.

Piecewise regression works by finding the set of breakpoints that minimizes the sum of squared errors. Within each segment, an ordinary least-squares fit is used. For problems with a large number of segments, a multi-start gradient-based search is used to speed up detection of the optimal breakpoints.
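The breakpoint search can be illustrated with a minimal sketch. It assumes a single breakpoint, fits the two segments independently (so the fitted lines are not forced to meet at the breakpoint), and uses an exhaustive search over split positions rather than the multi-start gradient-based search mentioned above. The function names and the synthetic data are hypothetical and are not taken from the linked notebook.

```python
import numpy as np

def fit_segment(x, y):
    """Ordinary least-squares line on one segment; returns (coefficients, SSE)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def best_single_breakpoint(x, y, min_points=3):
    """Try every split of the sorted data and keep the one with the lowest total SSE."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_sse, best_break = np.inf, None
    for i in range(min_points, len(x) - min_points + 1):
        _, sse_left = fit_segment(x[:i], y[:i])
        _, sse_right = fit_segment(x[i:], y[i:])
        total = sse_left + sse_right
        if total < best_sse:
            # Report the breakpoint as the midpoint between the two neighbouring points.
            best_sse, best_break = total, 0.5 * (x[i - 1] + x[i])
    return best_break, best_sse

# Synthetic illustration data (an assumption, not the article's dataset):
# the slope changes from +2 to -1 at x = 5.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 101)
y = np.where(x < 5, 2.0 * x, 10.0 - (x - 5)) + rng.normal(0, 0.2, x.size)

breakpoint_est, sse_piecewise = best_single_breakpoint(x, y)
_, sse_linear = fit_segment(x, y)
print(f"estimated breakpoint: {breakpoint_est:.2f}")
print(f"SSE linear: {sse_linear:.2f}, SSE piecewise: {sse_piecewise:.2f}")
```

Exhaustive search is only practical for one or two breakpoints, since the number of candidate splits grows combinatorially with the number of segments. That is the motivation for a smarter search when many segments are involved.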

Piecewise linear functions can reduce model bias by segmenting on key decision variables, and they are used in highly regulated business cases such as credit decisions and risk-based simulation, where model explainability is mandatory.

How to use a piecewise function?

A typical linear regression model expects the relationship between the independent and dependent variables to be linear. Piecewise regression can be considered a model within your final linear model: it splits a non-linear variable into segments that are each linear.

With piecewise regression, a non-linear independent variable is broken down into intervals, and each interval is introduced as a separate feature in the underlying linear regression model.
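One common way to turn intervals into features is a hinge basis, where each breakpoint b contributes a column max(0, x - b). The sketch below assumes the breakpoints are already known; the helper name hinge_features and the noise-free synthetic data are hypothetical choices for illustration, and other encodings of the intervals are possible.

```python
import numpy as np

def hinge_features(x, breakpoints):
    """Return columns [x, max(0, x - b1), max(0, x - b2), ...]."""
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.maximum(0.0, x - b) for b in breakpoints]
    return np.column_stack(cols)

# Synthetic, noise-free illustration data (an assumption):
# slope 1 up to x = 3, slope -2 between 3 and 7, slope 0.5 after 7.
x = np.linspace(0, 10, 51)
y = 1.0 * x - 3.0 * np.maximum(0, x - 3) + 2.5 * np.maximum(0, x - 7)

# An ordinary linear regression on the expanded features recovers the
# intercept, the initial slope, and the change in slope at each breakpoint.
X = np.column_stack([np.ones_like(x), hinge_features(x, [3.0, 7.0])])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Because each hinge coefficient is the change in slope at its breakpoint, the fitted model stays continuous and remains easy to read: the slope in any interval is the sum of the coefficients of the hinges that are active there.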

There are other methods for dealing with non-linearity, such as polynomial functions, but to model variables with complicated structure one typically ends up with high-degree polynomial features. This might result in unstable models.
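One way to see the instability is to look at the condition number of the polynomial design matrix, which grows quickly with the degree; a large condition number means small changes in the data can cause large changes in the fitted coefficients. The sketch below uses an arbitrary grid of x values chosen purely for illustration.

```python
import numpy as np

# Arbitrary illustration grid (an assumption, not the article's data).
x = np.linspace(0, 10, 50)
degrees = (1, 3, 6, 9)

# np.vander builds the matrix with columns x^degree, ..., x^1, x^0.
conds = [np.linalg.cond(np.vander(x, d + 1)) for d in degrees]

for d, c in zip(degrees, conds):
    print(f"degree {d}: condition number {c:.3e}")
```

Rescaling x or using an orthogonal polynomial basis reduces this effect, but a piecewise linear fit avoids the issue by keeping every feature at degree one.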

References

https://en.wikipedia.org/wiki/Piecewise_linear_function

https://jekel.me/piecewise_linear_fit_py/
