BxD Primer Series: Ridge Regression Models and L2 Regularization in general
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a 'one-post-one-topic' format. Today's post is on Ridge Regression models and the use of L2 regularization in other models. Let's get started:
The What:
The term 'Ridge Regression' refers specifically to linear regression with L2 regularization, while 'L2 regularization' is used more generally for adding an L2 penalty term to the objective function of any model whose parameters are learned by optimization.
The Ridge Regression model is used to handle multicollinearity in a dataset. Multicollinearity is a situation where two or more independent variables in a regression model are highly correlated with each other. This can lead to unstable and unreliable coefficient estimates, making the results difficult to interpret.
Ridge Regression addresses this issue by introducing a regularization term to the least squares objective function of linear regression. This regularization term is a penalty that shrinks the coefficient estimates towards zero, reducing their variance and improving their stability.
The How:
The regularization term in Ridge Regression is known as the L2 penalty. It is defined as the sum of the squares of the coefficients (weights), multiplied by a factor alpha:
L2 penalty = alpha * (sum of squares of weights)
Here, alpha is a hyperparameter that controls the strength of the penalty. A higher value of alpha will result in more shrinkage of the coefficients towards zero, reducing their variance but increasing their bias. A lower value of alpha will result in less shrinkage, increasing the variance but reducing the bias.
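As a quick illustration, here is a minimal sketch of the penalty computation in Python with NumPy; the weight values and alpha below are made up purely for illustration:

```python
import numpy as np

# Illustrative (made-up) weights and regularization strength
w = np.array([0.5, -1.2, 3.0])
alpha = 0.1

# L2 penalty = alpha * (sum of squares of weights)
l2_penalty = alpha * np.sum(w ** 2)
print(l2_penalty)  # 0.1 * (0.25 + 1.44 + 9.0) = 1.069
```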
The cost function with the L2 penalty can be written as:
Cost = Original cost function + alpha * (sum of squares of weights)
In the case of Ridge Regression, the original cost function is usually the Mean Squared Error (MSE). Hence the cost function will be:
Cost = (1/n) * (sum of squared errors between actual and predicted values) + alpha * (sum of squares of weights)
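For concreteness, here is a minimal Python sketch of this cost function. It assumes X already contains any intercept column and, for simplicity, penalizes all weights (in practice the intercept is usually left unpenalized):

```python
import numpy as np

def ridge_cost(w, X, y, alpha):
    """Mean squared error plus the L2 penalty on the weights."""
    residuals = y - X @ w
    mse = np.mean(residuals ** 2)
    l2_penalty = alpha * np.sum(w ** 2)
    return mse + l2_penalty
```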
The optimal values of the weights are found by minimizing this cost function, while alpha is chosen separately, typically via cross-validation. The minimization can be done using various methods, including a closed-form solution and iterative methods such as gradient descent.
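When the loss is written as a sum of squared errors, the closed-form solution is w = (X^T X + alpha * I)^(-1) X^T y; with MSE the same formula holds with alpha effectively rescaled by the number of samples. A minimal NumPy sketch of that solution (intercept handling left aside for simplicity):

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for the ridge weights."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)
```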
The penalty term acts as a constraint on the magnitude of the coefficients. Because the penalty grows as the coefficients grow, the model is encouraged to prefer smaller, more stable coefficients; larger coefficients result in a higher penalty and a higher overall cost.
By minimizing the combined cost function, the model finds coefficient values that balance the tradeoff between fitting the data well and keeping the coefficients small and stable. The result is a more robust model that generalizes better and is less likely to overfit the data.
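To see this tradeoff in action, here is a small, self-contained sketch using scikit-learn's Ridge on synthetic data with a nearly collinear feature; all data and alpha values below are made up for illustration. The coefficient magnitudes shrink as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Append a nearly collinear copy of the first feature to mimic multicollinearity
X = np.hstack([X, X[:, [0]] + 0.01 * rng.normal(size=(100, 1))])
y = X[:, 0] + 2 * X[:, 1] + rng.normal(size=100)

for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 3))  # coefficients shrink as alpha grows
```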
The Why:
Some common reasons for using Ridge Regression or L2 regularization are:
- To handle multicollinearity among independent variables, which otherwise makes coefficient estimates unstable and hard to interpret.
- To reduce overfitting by shrinking coefficients and lowering the variance of the model.
- To improve the stability of coefficient estimates and the model's generalization to unseen data.
The Why Not:
Some reasons to avoid or be careful with Ridge Regression or L2 regularization are:
- It does not perform feature selection: coefficients are shrunk towards zero but rarely become exactly zero, so all features stay in the model.
- The penalty introduces bias into the coefficient estimates, and the penalty strength alpha has to be tuned, typically via cross-validation.
- It is sensitive to the scale of the features (see the note below).
Note on scaling of features: Ridge Regression is sensitive to the scaling of the features. If the features are not on the same scale, the regularization penalty will affect them differently. It is important to scale the features appropriately to avoid this issue.
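A common way to handle this is to standardize the features before fitting, for example with a scikit-learn pipeline; X_train and y_train below are placeholders for your own data:

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features so the L2 penalty treats them on a comparable scale
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# model.fit(X_train, y_train)  # X_train / y_train are placeholders for your data
```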
Time for you to help in return:
In the coming posts, we will cover two more types of regression models, Lasso and Elastic Net, in a similar format. After that, we will move on to decision-tree-based models.
Let us know your feedback!
Until then,
Enjoy life in full!
Founding Partner - BUSINESS x DATA
If you prefer email updates, visit here: https://anothermayank.substack.com #substack