Applied Data Science: Regularization Techniques
Perfect Accuracy on the Training Set but Poor Results on the Test Set? Try these regularization techniques to tackle overfitting
L1 and L2 Regularization: Instead of minimizing the loss function alone, a regularization term is added to the loss function to form the cost function that is actually minimized. This shrinks the weights/coefficients towards zero, yielding a simpler, less flexible model (see the code sketch after the definitions below).
Cost function = Loss function + Regularization term
L1: the regularization term is the sum of the absolute values of the weights/coefficients, scaled by a penalty factor
L2: the regularization term is the sum of the squares of the weights/coefficients, scaled by a penalty factor
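As a minimal sketch, assuming a scikit-learn workflow and a synthetic regression dataset (both are illustrative choices): Lasso applies the L1 penalty and Ridge applies the L2 penalty, with alpha controlling the weight of the regularization term.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# Synthetic data: 100 samples, 50 features, only 10 of which are informative
X, y = make_regression(n_samples=100, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)

# Plain least squares: cost function = loss function (no regularization term)
ols = LinearRegression().fit(X, y)

# L1 penalty (Lasso): alpha * sum(|w|) -- tends to drive coefficients to exactly zero
lasso = Lasso(alpha=1.0).fit(X, y)

# L2 penalty (Ridge): alpha * sum(w^2) -- shrinks coefficients smoothly towards zero
ridge = Ridge(alpha=1.0).fit(X, y)

print("non-zero coefficients:",
      np.sum(ols.coef_ != 0), np.sum(lasso.coef_ != 0), np.sum(ridge.coef_ != 0))
```

Larger alpha values shrink the coefficients more aggressively, trading flexibility for simplicity.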
Data Augmentation: Sometimes, simply having more training data can help reduce overfitting. However, labeled data can be costly to collect. One way around this is to transform the existing training data to generate additional labeled examples (known as data augmentation) that can be used to train the model.
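A minimal sketch of the idea, assuming an image setting where horizontal flips and small pixel noise preserve the labels (the toy batch and `augment` helper below are hypothetical; real projects typically rely on library utilities such as torchvision transforms or Keras preprocessing layers):

```python
import numpy as np

def augment(images, labels, rng=np.random.default_rng(0)):
    """Generate extra labeled examples via label-preserving transforms."""
    flipped = images[:, :, ::-1]                                            # horizontal flip
    noisy = np.clip(images + rng.normal(0, 0.02, images.shape), 0.0, 1.0)   # small pixel noise
    # Each transformed image keeps the label of the original image
    return (np.concatenate([images, flipped, noisy]),
            np.concatenate([labels, labels, labels]))

# Hypothetical toy batch: 32 grayscale 28x28 images with labels in [0, 9]
images = np.random.rand(32, 28, 28)
labels = np.random.randint(0, 10, size=32)
X_aug, y_aug = augment(images, labels)
print(X_aug.shape, y_aug.shape)   # (96, 28, 28) (96,) -- three times the labeled data
```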
Early Stopping: This requires evaluating the model on held-out validation data after each training step/epoch and stopping the training once model performance starts to deteriorate on the validation data.
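A hand-rolled sketch of the idea using scikit-learn's SGDClassifier (the validation split, patience value and epoch budget are illustrative assumptions; deep learning frameworks ship ready-made early-stopping callbacks):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
# Hold out a validation set; the test set stays untouched until the very end
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDClassifier(random_state=0)
classes = np.unique(y_train)

best_score, patience, bad_epochs = -np.inf, 5, 0
for epoch in range(200):
    model.partial_fit(X_train, y_train, classes=classes)   # one more pass over the data
    score = model.score(X_val, y_val)                       # validation accuracy
    if score > best_score:
        best_score, bad_epochs = score, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:   # no improvement for `patience` epochs -> stop early
        print(f"Stopped at epoch {epoch}, best validation accuracy {best_score:.3f}")
        break
```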
Dropout/ max_depth/ #_leaves: This strategy reduces the flexibility/complexity of the model by setting appropriate hyperparameters in the algorithm. For example, max_depth, #_leaves and #_levels are hyperparameters in decision-tree-based ensemble algorithms. Similarly, dropout is useful for reducing flexibility in deep learning models.
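A rough illustration with scikit-learn's random forest; the particular caps (max_depth=4, max_leaf_nodes=16) are assumptions to be tuned on validation data, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Unconstrained trees are free to memorize the training data
flexible = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Capping depth and leaf count limits the flexibility/complexity of each tree
constrained = RandomForestClassifier(max_depth=4, max_leaf_nodes=16,
                                     random_state=0).fit(X_train, y_train)

for name, model in [("unconstrained", flexible), ("constrained", constrained)]:
    print(name,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```

In neural networks, a dropout layer plays an analogous role by randomly zeroing a fraction of activations during training, so the network cannot rely too heavily on any single unit.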
Note: While applying these techniques to tackle overfitting, ensure that the resulting model is not over-simplified; otherwise it may end up with high bias. More on the bias-variance trade-off and its relation to overfitting...in the next article.
What are your thoughts on overfitting? How do you handle it?