Piecewise Linear Regression
Source: wikipedia


Real-world data is not always linear. In many cases it is very difficult to fit a single line and get a good model on nonlinear, non-monotonic datasets. While one can resort to complex models like SVMs, trees, or even neural networks, they come at the cost of interpretability and explainability.

Is there a middle ground that can be used when the decision boundary is not very complex? The answer is in the title.

Piecewise regression breaks the data into individual segments and fits a linear regression within each segment. The locations where one segment ends and the next begins are called breakpoints.

Let's take a very simple dataset for illustration and compare the output of linear and piecewise linear regression.

Refer to my repo for the piecewise regression code and the plots above: https://github.com/srivatsan88/piecewise-regression/blob/master/piecewise_linear_regression.ipynb

In the plot above, the linear fit results in a larger standard error than the piecewise fit. The piecewise fit might look like it is overfitting, but it is not: the technique generalizes well to new data points. In this case we split the data points into 3 buckets and fit a regression line within each segment.

Piecewise regression works by finding the set of breakpoints that minimizes the sum of squared errors. For a given set of breakpoints, an ordinary least-squares fit is used within each segment, which minimizes the sum of squared errors for that segment. For problems with a large number of segments, a multi-start gradient-based search is used to speed up detection of the optimal breakpoints.
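The breakpoint search can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the repo above: it assumes a continuous piecewise linear model, uses synthetic data made up for the example, and swaps the multi-start gradient-based search for a plain exhaustive grid search, which is only practical for a small number of breakpoints. The helper names (hinge_design, sse_for, grid_search_breakpoints) are hypothetical.

```python
import itertools
import numpy as np

def hinge_design(x, breakpoints):
    """Design matrix for a continuous piecewise linear function:
    columns are 1, x, and max(0, x - b) for each breakpoint b."""
    cols = [np.ones_like(x), x]
    cols += [np.maximum(0.0, x - b) for b in breakpoints]
    return np.column_stack(cols)

def sse_for(x, y, breakpoints):
    """Least-squares fit for fixed breakpoints; returns (SSE, coefficients)."""
    A = hinge_design(x, breakpoints)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), coef

def grid_search_breakpoints(x, y, n_breaks=2, n_grid=40):
    """Try every increasing combination of candidate breakpoints on a grid
    and keep the combination with the smallest SSE."""
    candidates = np.linspace(x.min(), x.max(), n_grid + 2)[1:-1]
    best = (np.inf, None, None)
    for combo in itertools.combinations(candidates, n_breaks):
        sse, coef = sse_for(x, y, combo)
        if sse < best[0]:
            best = (sse, np.array(combo), coef)
    return best

# Synthetic data (assumption, for illustration only): three linear pieces
# with true breakpoints at x = 3 and x = 7, plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y_true = 1.0 + 2.0 * x - 3.0 * np.maximum(0, x - 3) + 4.0 * np.maximum(0, x - 7)
y = y_true + rng.normal(scale=0.3, size=x.size)

linear_sse, _ = sse_for(x, y, [])  # no breakpoints = ordinary straight line
best_sse, best_breaks, best_coef = grid_search_breakpoints(x, y)
print("linear SSE:", round(linear_sse, 2))
print("piecewise SSE:", round(best_sse, 2), "breakpoints:", best_breaks)
```

The inner step (least squares for fixed breakpoints) is cheap and exact; the outer search over breakpoint locations is the hard part, which is why a smarter optimizer is needed once the number of segments grows.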

Piecewise linear functions can reduce model bias by segmenting on key decision variables, and they are used in highly regulated business cases such as credit decisions and risk-based simulation, where model explainability is mandatory.

How to use a piecewise function?

A typical linear regression model expects the relationship between the independent and dependent variables to be linear. A piecewise function can be thought of as a model within your final linear model: it splits a nonlinear variable into segments, each of which is linear.

Using a piecewise function, a nonlinear independent variable is broken down into intervals, and each interval is introduced as a separate feature in the underlying linear regression model.
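One possible way to turn intervals into features is sketched below. It assumes the breakpoints (3 and 7 here) are already known, for example from a breakpoint search, and that the lower edge of the variable is 0. The data, the variable names x1 and x2, and the helper interval_features are all made up for illustration.

```python
import numpy as np

def interval_features(x, breakpoints, lower=0.0):
    """Split x into one feature per interval.

    With edges lower < b_1 < b_2 < ..., the k-th feature is the part of x
    that falls inside the k-th interval: clip(x - edge_k, 0, width_k).
    For x >= lower the features sum to x - lower, and each one gets its
    own slope in the linear model.
    """
    edges = np.concatenate(([lower], np.sort(breakpoints), [np.inf]))
    widths = np.diff(edges)
    return np.column_stack(
        [np.clip(x - lo, 0.0, w) for lo, w in zip(edges[:-1], widths)]
    )

# Synthetic data (assumption): x1 acts on y with a different slope in each
# of three intervals, x2 acts linearly.
rng = np.random.default_rng(1)
n = 300
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.normal(size=n)
F = interval_features(x1, breakpoints=[3.0, 7.0])
true_coef = np.array([0.5, 2.0, -1.0, 3.0, 1.5])  # intercept, 3 slopes, x2
X = np.column_stack([np.ones(n), F, x2])
y = X @ true_coef + rng.normal(scale=0.2, size=n)

# An ordinary linear regression on the expanded feature set.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("slopes per interval of x1:", np.round(coef[1:4], 2))
print("slope of x2:", round(float(coef[4]), 2))
```

Each fitted coefficient is still a plain slope ("one more unit of x1 inside this interval changes y by this much"), so the model stays as easy to explain as an ordinary linear regression.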

There are other methods for dealing with nonlinearity, such as polynomial functions, but to model variables with a complicated structure one typically ends up with features of a high polynomial degree. This can result in unstable models.
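One rough way to see the instability is to look at the condition number of the polynomial design matrix, which grows quickly with the degree. A large condition number means small changes in the data can produce large changes in the fitted coefficients. The sketch below uses an arbitrary grid of x values chosen for illustration, and the helper name poly_condition_number is hypothetical.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 50)

def poly_condition_number(x, degree):
    """Condition number of the design matrix with columns 1, x, ..., x^degree."""
    A = np.vander(x, degree + 1, increasing=True)
    return np.linalg.cond(A)

conds = {d: poly_condition_number(x, d) for d in (1, 3, 6, 10)}
for degree, c in conds.items():
    print(f"degree {degree:2d}: condition number {c:.2e}")
```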

References

https://en.wikipedia.org/wiki/Piecewise_linear_function

https://jekel.me/piecewise_linear_fit_py/
