
Linear Regression. Making Sense Of The Future Based On The Past.

Adriaan Stander

Head of Software Engineering - Digital Banking

Published: 9 February 2020

Using your vehicle's fuel consumption to predict how far you can travel once the fuel warning light comes on has become a natural part of our lives. In fact, we probably assume that a vehicle should have this feature, no questions asked. But what is actually happening behind the scenes? How does the car decide what to show you?

Based on the last trip? Well, that could be biased. You could have been driving at 120 mph, hauling a caravan or driving backwards for 200 miles. Whatever you were up to, a single trip is not enough.

Using linear regression, the computer can look at your last X trips, incorporate some basic attributes of each journey and infer a predictive model that estimates the distance you can travel based on your previous behavior. If you have mostly been speeding, your consumption will be higher, but the estimate should settle back down once you start sticking to the speed limit.

But how does this work? Let's give it a go.
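As a rough sketch of the range estimate described above, the Python below fits fuel consumption against average speed over the most recent trips and then converts the fuel left into a distance. The trip log, the choice of average speed as the only attribute, and the names `TRIPS`, `fit_line`, `estimate_range_km` and `last_n` are all hypothetical; a real vehicle will use its own inputs and logic.

```python
from statistics import mean

# Hypothetical trip log (made-up numbers): (average speed in km/h, litres per 100 km).
TRIPS = [
    (50.0, 6.0),
    (80.0, 6.6),
    (100.0, 7.0),
    (120.0, 7.4),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x for a single input."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def estimate_range_km(trips, expected_speed, litres_left, last_n=10):
    """Fit consumption against speed on the most recent trips, then turn fuel left into distance."""
    recent = trips[-last_n:]  # only the last N trips influence the estimate
    b0, b1 = fit_line([s for s, _ in recent], [c for _, c in recent])
    litres_per_100km = b0 + b1 * expected_speed
    return litres_left / litres_per_100km * 100.0


if __name__ == "__main__":
    print(f"{estimate_range_km(TRIPS, expected_speed=100.0, litres_left=7.0):.0f} km")
```

For this made-up log the fitted line is 5.0 + 0.02 × speed litres per 100 km, so at 100 km/h seven litres give about 100 km, and a higher expected speed gives a shorter range.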

What is Linear Regression

Linear regression is a statistical method that models the relationship between one or more independent variables X and a dependent variable Y as a linear function (Preacher, et al., 2006). The outcome is a formula that, given an input or set of input attributes, predicts an output value lying on the derived line. It does not, however, guarantee that every training point will fall on that line, so it is important to measure the error between the observed values and the predicted values in order to fine-tune the model (Hyndman & Koehler, 2006).

(Chakure, 2019)

Mathematically, what does it look like?

Below is a sample of what to expect from a linear regression formula with one or more attributes and their associated weights.

(SuperDataScience, 2018)
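In the usual textbook notation (the symbol names here are chosen for illustration and may differ from the cited figure), the single-input and multiple-input cases, together with the least-squares estimates for the single-input case over m training pairs, are:

```latex
% Simple linear regression: one input x_1
y = b_0 + b_1 x_1

% Multiple linear regression: n inputs x_1, \dots, x_n, each with its own weight
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n

% Least-squares estimates for the single-input case,
% over m training pairs (x^{(i)}, y^{(i)}) with means \bar{x} and \bar{y}
b_1 = \frac{\sum_{i=1}^{m} \left(x^{(i)} - \bar{x}\right)\left(y^{(i)} - \bar{y}\right)}
           {\sum_{i=1}^{m} \left(x^{(i)} - \bar{x}\right)^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```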

What does this mean for me?

A simple example is predicting salary expectations from experience, as in the graph below. The relationship could be roughly linear: you would expect a basic salary when you start, followed by a linear increase with each year in the field. Alternatively, several features such as years of experience, specific field, geo-location, training and skill sets could be combined into a single experience score, which is then plotted against salary expectations.

(SuperDataScience, 2018)

So many inputs, which to use?

Multiple input variables can greatly increase the accuracy of a prediction and its correlation with the actual observed value, but they also increase the complexity of the algorithm. Factors to consider are storage of the inputs, calculation complexity and computation time. It is therefore important to use feature selection to pick out the attributes that add the most predictive value (Ludwig, et al., 2015). Feature selection simplifies visualization and makes the data easier to understand, reduces the complexity of the algorithm, can help counter the curse of dimensionality and shortens prediction time (Guyon & Elisseeff, 2003).

The M5 algorithm builds a model tree: it chooses splits using standard deviation reduction and approximates the values at the leaves with linear regression models. It further improves its predictions through smoothing and by pruning the tree into smaller models. The M5 Prime (M5') algorithm improves on the original M5 by handling missing values and enumerated (nominal) features. It has been used in practical areas such as streamflow prediction, modeling sediment yield, approximating breakwater scour depth and predicting the compressive strength of concrete (Díaz, et al., 2017).

How do you know which models fit best?

Error metrics are used to assess the validity of a model's predictions against the observed values. Measures such as the Mean Absolute Error (MAE) and Mean Squared Error (MSE) summarize the overall error of the model, whereas the correlation coefficient describes the linear relationship between predictions and observations: zero indicates no linear relationship, while values close to one or minus one indicate a strong one.

(Pascual, 2019)

(mathsisfun, 2018)

References

Chakure, A., 2019. Types of Linear Regression. [Online] Available at: https://hackernoon.com/types-of-linear-regression-w4o227s5 [Accessed 08 Feb 2020].

Díaz, I. et al., 2017. Machine learning applied to the prediction of citrus production. Spanish Journal of Agricultural Research, 15(2), pp. 1–12.

Guyon, I. & Elisseeff, A., 2003. An introduction to variable and feature selection. The Journal of Machine Learning Research, Volume 3, pp. 1157–1182.

Hyndman, R. J. & Koehler, A. B., 2006. Another look at measures of forecast accuracy. International Journal of Forecasting, Volume 22, pp. 679–688.

Ludwig, N., Feuerriegel, S. & Neumann, D., 2015. Putting Big Data analytics to work: Feature selection for forecasting electricity prices using the LASSO and random forests. Journal of Decision Systems, 24(1), pp. 1–28.

mathsisfun, 2018. Correlation. [Online] Available at: https://www.mathsisfun.com/data/correlation.html [Accessed 08 Feb 2020].

Pascual, C., 2019. Tutorial: Understanding Regression Error Metrics in Python. [Online] Available at: https://www.dataquest.io/blog/understanding-regression-error-metrics/ [Accessed 08 Feb 2020].

Preacher, K. J., Curran, P. J. & Bauer, D. J., 2006. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis. Journal of Educational and Behavioral Statistics, 31(3), pp. 437–448.

SuperDataScience, 2018. Regression & Classification - Logistic Regression. [Online] Available at: https://www.superdatascience.com/blogs/regression-classification-logistic-regression [Accessed 08 Feb 2020].


