Linear Regression (Less Linear Than You Might Think)
For a very long time I associated Linear Regression with fitting a straight line (or hyperplane in higher dimensions) to a number of data points as shown below:
Obviously, the line represents a linear relation between x and f(x), so the name "Linear Regression" comes as no surprise. The line is a model of the underlying relationship and is fitted such that it is "close" to the sample data according to the usual squared error. The resulting linear model can be used to predict the value of f(x) for x-values not contained in the sample data set.
However, at some point I learned that linear regression can also look like this
or like this
and even like this:
What is going on here? Apart from the first case, these are definitely non-linear relationships between the underlying variable x and the function f(x) used for fitting the data.
The Explanation
We can solve this apparent paradox by looking at the following characterization (AI-generated but approved by yours truly):
Linear regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables, focusing on minimizing the sum of squared differences between observed and predicted values. Despite the name, linear regression can model both linear and non-linear relationships. The "linear" aspect refers to the model being linear in its coefficients, meaning it can incorporate variables in forms like polynomials f(x) = ax^4 + bx^3 + cx^2 + dx + e while remaining linear with respect to the coefficients a, b, c, d, e. Thus, linear regression applies to a wide range of scenarios, including those with non-linear data relationships, as long as the equation is linear in the parameters.
That sums it up nicely and, luckily, linear regression problems can be solved directly using standard matrix operations (like the inverse and the pseudoinverse), which I describe only verbally below since LinkedIn does not exactly excel at displaying formulas.
Solving Linear Regression Problems
In linear regression, we deal with a system of equations derived from our data, where each data point contributes one equation. Here, n represents the number of data points, and m refers to the number of coefficients (including the intercept) we need to estimate to define our linear model.
When n>m, meaning we have more data points than coefficients, the system is overdetermined. In such cases, it's impossible to find a perfect fit for all data points due to inherent data noise. The objective shifts towards finding coefficients that minimize the error between the observed and predicted values, a process known as least squares minimization.
The solution involves using the pseudoinverse, a method that allows for the estimation of the most suitable coefficients under these conditions. This approach ensures that our linear regression model is as accurate and reliable as possible, capable of making predictions on new data while acknowledging the limitations imposed by having more data points than parameters to estimate.
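To make this concrete, here is a minimal NumPy sketch of solving an overdetermined system with the pseudoinverse. The slope, intercept, and noise level are invented for illustration and are not taken from the article's examples.

```python
import numpy as np

# 100 noisy samples of a line: n = 100 equations, m = 2 coefficients (slope, intercept)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)  # noisy "observations"

# Design matrix: one row per data point, one column per coefficient ([x, 1])
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution via the Moore-Penrose pseudoinverse
coeffs = np.linalg.pinv(A) @ y
print(coeffs)  # approximately [2.5, 1.0]
```

Each row of the design matrix corresponds to one data point, and multiplying by the pseudoinverse yields the coefficients that minimize the sum of squared errors.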
Choice of Features
In linear regression, selecting appropriate features is critical for model accuracy and prediction quality. Any feature that combines linearly with coefficients is viable, offering the flexibility to model complex relationships. This includes not just polynomial features, which capture non-linear patterns through powers of variables, but also functions like sine, cosine, exponential, and logarithmic. These can model behaviors like cycles and growth patterns while adhering to the linear regression principle.
Feature selection should be informed by data analysis and domain expertise to ensure model relevance. Properly chosen features can significantly enhance model performance, whether by capturing the essence of the data with polynomial terms or by modeling specific behaviors with functional transformations. The balance is crucial: too simple a model may miss underlying patterns (under-fitting), while an overly complex model risks fitting noise rather than signal (over-fitting).
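As a hypothetical illustration of how such features enter the model, the sketch below stacks a quadratic, a sine, a logarithmic, and a constant column into one design matrix. The particular feature set, the generating function, and all coefficient values are made up purely for demonstration; the point is only that the model stays linear in the coefficients that weight these columns.

```python
import numpy as np

def design_matrix(x):
    # Arbitrary (possibly non-linear) features, one column each.
    return np.column_stack([
        x**2,            # quadratic feature
        np.sin(2 * x),   # periodic feature
        np.log1p(x),     # growth-type feature, log(1 + x), defined for x >= 0
        np.ones_like(x), # constant feature (intercept)
    ])

x = np.linspace(0, 5, 50)
y = 0.7 * x**2 + 3 * np.sin(2 * x) + 1.5  # some assumed underlying relation

# Least squares over the chosen features; still linear in the coefficients
coeffs, *_ = np.linalg.lstsq(design_matrix(x), y, rcond=None)
print(coeffs)  # roughly [0.7, 3.0, 0.0, 1.5]
```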
Revisiting the Examples
To reiterate a key point, linear regression is only required to be linear in the coefficients, not in the so-called "features". For example, if we try to model a data set using the quadratic function
f(x) = ax^2 + bx + c,
we have three coefficients a, b, c. We also have one quadratic feature (x^2), one linear feature (x) and one constant feature (1). As described above, every sample data pair (x, y) gives rise to one equation
y = ax^2 + bx + c.
100 data points (like in the examples shown) thus result in a highly overdetermined equation system with 100 equations and only three coefficients where we can compute a least-squares solution using the pseudoinverse.
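A possible NumPy sketch of exactly this setup, with 100 noisy samples and the three feature columns x^2, x, and 1 (the ground-truth coefficients and the noise level are chosen arbitrarily here, not taken from the plots):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)
y = 1.2 * x**2 - 0.5 * x + 2.0 + rng.normal(scale=0.5, size=x.shape)

# 100 equations, 3 unknowns (a, b, c): columns are the features x^2, x, 1
A = np.column_stack([x**2, x, np.ones_like(x)])
a, b, c = np.linalg.pinv(A) @ y
print(a, b, c)  # close to 1.2, -0.5, 2.0
```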
Below, you see the same examples as in the beginning of the article, supplemented with the usually unknown "ground truth" function (green line) which was used to generate the data before Gaussian noise was added. Please note that for each example an "appropriate" model was chosen, which is usually not possible since the relation between x and y is unknown in practice. Unsurprisingly, the results of the linear regression are excellent in all cases. The following three examples are based on polynomials of degree 1, 2, and 3:
The example below is special since the generating function is the trigonometric sine function; in particular, the ground truth function was f(x) = 3sin(2x). Knowing this, we also chose a sine feature and used the correct internal factor 2, i.e. we took sin(2x) as the feature. That led to a nearly perfect result, with the linear regression identifying the coefficient value 3 very closely.
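For this sine case, a sketch along the same lines might look as follows. The ground truth 3sin(2x) is the one stated above, while the noise level is an assumption made for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 100)
y = 3 * np.sin(2 * x) + rng.normal(scale=0.3, size=x.shape)

# Single feature sin(2x): the frequency 2 is assumed to be known in advance
A = np.sin(2 * x).reshape(-1, 1)
coeff = np.linalg.pinv(A) @ y
print(coeff)  # approximately [3.0]
```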
Please note that this is a very contrived example. If we use a different frequency of the sine wave, e.g. sin(10x), the linear regression results in a huge fitting error:
Similar problems occur with a phase shift like sin(2x+pi/2):
Automatically determining parameters like frequency and phase in this example exceeds the capabilities of linear regression, requiring advanced non-linear optimization or search methods that are beyond the scope of this article.
Confusing (but Common) Names for Special Cases of Linear Regression
In many books one can find the term "Quadratic Regression" for cases where a quadratic function like f(x) = ax^2 + bx + c is fitted to the data using linear regression.
Similarly, we can find the term "Cubic Regression" for cases where a cubic polynomial like f(x) = ax^3 + bx^2 + cx + d is fitted to the data using linear regression.
And with "Polynomial Regression" many texts denote the general case that a polynomial function of any degree is used to model the relationship between x and y.
All these cases are still linear regression, since f(x) is a linear combination (with coefficients a, b, ...) of the potentially non-linear features. The often shown linear regression using a polynomial of degree 1 is just a special case, where linear regression is done using a linear model.
Summary
Linear regression computes an error-minimizing linear superposition of arbitrary (linear or non-linear!) functions ("features") to model the relation underlying the given data. The choice of functions determines the goodness of fit with the training data and the ability to predict values for new data. A good choice usually requires knowledge or assumptions of the phenomenon which generated the data.
Note: A good alternative for learning non-linear relations in unknown data are often neural networks, which can model quite different relations in different parts of the data space.