Linear Regression: Concepts, Notation, and Overfitting
Notation Key
New Insights
We want to find relationships between different variables in order to uncover new patterns.
Regression Framework
The data consist of n pairs (X₁, Y₁), …, (Xₙ, Yₙ). Each Xᵢ is a vector of dimension m; each Yᵢ is a scalar. A new X then arrives with its Y unknown:

    X₁ (vector, dimension m) | Y₁ (scalar)
    ⋮                        | ⋮
    Xₙ                       | Yₙ
    -------------------------+------------
    X (new)                  | Ŷ = ?
Regressor/predictor: Ŷ = g(X)
Goal: Build a function g so that when a new X comes in, we can output the predicted value Ŷ.  X → [g] → Ŷ
We need to learn a good g from the data.
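As a minimal sketch of this setup in Python (assuming NumPy is available; the data and the predictor here are made up for illustration), a predictor is just a function from an attribute vector to a scalar:

```python
import numpy as np

# Hypothetical data: n observations; each X_i is a vector of m attributes, Y_i a scalar.
rng = np.random.default_rng(0)
n, m = 100, 3
X = rng.normal(size=(n, m))   # rows are X_1, ..., X_n
Y = rng.normal(size=n)        # Y_1, ..., Y_n

def g(x):
    """A placeholder predictor: maps an attribute vector x to a scalar Y-hat."""
    return float(np.mean(x))  # deliberately naive; a good g must be learned from the data

x_new = rng.normal(size=m)    # a new X comes in ...
y_hat = g(x_new)              # ... and the predictor outputs Y-hat
```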
Objective Function
We judge a predictor g by the sum of squared errors it makes on the data:

    Σᵢ (Yᵢ − g(Xᵢ))²  summed over the n data points

Where:
Yᵢ is the observed value and g(Xᵢ) is the predicted value for the i-th data point.
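Evaluating this objective for a candidate predictor is short in code; the sketch below reuses the hypothetical X, Y, and g from above:

```python
import numpy as np

def sum_of_squared_residuals(g, X, Y):
    """Sum over all data points of (Y_i - g(X_i))^2."""
    predictions = np.array([g(x) for x in X])
    return float(np.sum((Y - predictions) ** 2))

# Reusing X, Y, and g from the sketch above:
# sum_of_squared_residuals(g, X, Y)
```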
Overfitting Warning
If g is an arbitrary curve (one that passes through all the data points, so the error is 0), we cannot believe it. This is called overfitting, and we need to avoid it since it leads to nonsensical conclusions.
[Graph showing overfitting: A wiggly line that perfectly passes through all points, versus a simpler linear fit]
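To make the warning concrete, here is a small illustration on made-up 1D data, comparing a straight-line fit to a degree-7 polynomial that passes through every training point (numpy.polyfit does both fits):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 8)
y = 2 * x + rng.normal(scale=0.2, size=x.size)  # roughly linear data with noise

line = np.polyfit(x, y, deg=1)     # simple linear fit
wiggly = np.polyfit(x, y, deg=7)   # degree-7 curve: interpolates all 8 points

# The wiggly fit has ~zero training error but oscillates between the points:
x_mid = 0.5 * (x[:-1] + x[1:])     # evaluate between the training points
print(np.polyval(line, x_mid))     # sensible interpolations
print(np.polyval(wiggly, x_mid))   # can swing far away from the trend
```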
Linear Regression Model
We prohibit g from being arbitrarily general and restrict it to a limited class of predictors:
Within linear regression, we restrict to the class of predictors that are linear in the attributes of the X vector.
When a person comes in with attributes X₁ up to Xₘ, we form a linear combination of these attributes:

    Ŷ = β₀ + β₁X₁ + β₂X₂ + … + βₘXₘ
A choice of the predictor is a choice of β; in 2D, β determines the location of the line.
The intercept here would be β₀ and the slope would be β₁. By playing with β, we can move the line around.
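A sketch of this predictor class in code (the data and β values below are illustrative only):

```python
import numpy as np

def linear_predictor(beta, X):
    """Y-hat = beta_0 + beta_1 * X_1 + ... + beta_m * X_m for each row of X."""
    return beta[0] + X @ beta[1:]

X = np.array([[1.0], [2.0], [3.0]])                # m = 1: a single attribute
print(linear_predictor(np.array([0.0, 1.0]), X))   # slope 1 through the origin
print(linear_predictor(np.array([2.0, 0.5]), X))   # shifted up, shallower slope
```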
Residuals
A residual is the error between the predicted value and the observed value: eᵢ = Yᵢ − Ŷᵢ.
We need to find β so that the sum of squared residuals is as small as possible:

    minimize over β:  Σᵢ (Yᵢ − (β₀ + β₁Xᵢ₁ + … + βₘXᵢₘ))²

Any choice of the line gives a certain numerical value for the sum of squared residuals; picking the β that minimizes it is called ordinary least squares (OLS).
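The minimizing β can be computed with a standard least-squares solver. A sketch using NumPy, where the design matrix gets a column of ones so that β₀ acts as the intercept:

```python
import numpy as np

def ols_fit(X, Y):
    """Return beta = (beta_0, ..., beta_m) minimizing the sum of squared residuals."""
    design = np.column_stack([np.ones(len(Y)), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta
```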
Assumptions of Linear Regression
Example Application
Linear regression software produces the coefficients that multiply the ad expenditure on the different channels.
In the multiple regression, the Newspaper coefficient of -0.001 is unusual, as it suggests that the more you spend, the lower the sales.
Simple linear regression example: Sales = 12.35 + 0.055(Newspaper)
The 0.055 contradicts the -0.001, so which one is true?
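Both numbers can be valid OLS outputs on the same data: when channels are correlated, a channel's coefficient typically changes once the other channels are included. A sketch with synthetic data (all numbers below are invented, not the ones above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
radio = rng.normal(size=n)
newspaper = 0.8 * radio + rng.normal(scale=0.3, size=n)     # correlated with radio
sales = 12.0 + 3.0 * radio + rng.normal(scale=0.5, size=n)  # no direct newspaper effect

def ols(X, y):
    """OLS via least squares; the first returned entry is the intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

print(ols(newspaper[:, None], sales))                    # simple: Newspaper looks helpful
print(ols(np.column_stack([radio, newspaper]), sales))   # multiple: Newspaper coefficient ~ 0
```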