Regularization in Neural Networks

The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain, so that we can make predictions on data the model has never seen. Generalization refers to how well the concepts learned by a machine learning model apply to specific examples the model did not see while it was learning. It is bounded by two undesirable outcomes, high bias and high variance, and detecting which one a model suffers from is the responsibility of the model developer.

When we talk about how well a machine learning model learns and generalizes to new data, two terms come up: overfitting and underfitting. These are the two biggest causes of poor performance in machine learning algorithms.

Underfitting: high bias and low variance.

Symptoms:

  • High training error.
  • Training error close to testing error.
  • High bias.

Underfitting refers to a model that can neither model the training data nor generalize to new data. Techniques to reduce underfitting:

  1. Increase model complexity.
  2. Increase the number of features by performing feature engineering.
  3. Remove noise from the data.
  4. Increase the number of epochs or the duration of training.


Overfitting: high variance and low bias.

Symptoms:

  • Very low training error.
  • Training error much lower than testing error.
  • High variance.

Overfitting refers to a model that models the training data too well. It happens when the model learns the detail and noise in the training data to the extent that this negatively impacts its performance on new data. Techniques to reduce overfitting (regularization):

  1. Reduce model complexity (L1 regularization).
  2. Reduce model complexity (L2 regularization).
  3. Use dropout for neural networks (Dropout).
  4. Increase the training data (Data Augmentation).
  5. Stop during the training phase (Early stopping): keep an eye on the validation loss during training and stop as soon as it begins to increase.


L1 regularization

L1 regularization adds the absolute value (magnitude) of each coefficient as a penalty term to the cost. Weights are pushed toward smaller values on the assumption that smaller weights give simpler models.

cost = loss + (lambda / 2m) * sum(|w|)
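As a rough illustration, the penalty can be computed in a few lines of NumPy. This is only a sketch; the function name and arguments (loss, weights, lam, m) are assumptions for illustration, not part of any particular framework.

    import numpy as np

    # "loss" is the unregularized loss, "weights" is a list of weight arrays,
    # "lam" is lambda, and "m" is the number of training examples.
    def l1_regularized_cost(loss, weights, lam, m):
        l1_penalty = sum(np.sum(np.abs(w)) for w in weights)
        return loss + (lam / (2 * m)) * l1_penalty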

Pros:

  • Robustness: better than L2 because the loss due to outliers increases only linearly rather than quadratically.
  • Sparsity: zeros out coefficients, which leads to a sparse model. This can be used for feature selection, since unimportant features end up with zero coefficients.

Cons:

  • Weights may be reduced exactly to zero; this is positive if we want to compress the model, but important features can be dropped entirely if the penalty is too strong.


L2 regularization

L2 regularization adds the squared magnitude of each coefficient as a penalty term to the cost. Weights are pushed toward smaller values on the assumption that smaller weights give simpler models.

cost = loss + (lambda / 2m) * sum(w^2)
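A matching sketch for the L2 penalty, again with illustrative names rather than a framework API, also shows why this form of regularization is often called weight decay.

    import numpy as np

    # Same illustrative arguments as the L1 version above.
    def l2_regularized_cost(loss, weights, lam, m):
        l2_penalty = sum(np.sum(w ** 2) for w in weights)
        return loss + (lam / (2 * m)) * l2_penalty

    # The penalty adds (lam / m) * w to each weight's gradient, so plain
    # gradient descent becomes:
    #     w = w - learning_rate * (grad + (lam / m) * w)
    # which is why L2 regularization is often called "weight decay".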

Pros:

  • The loss due to outliers increases quadratically (not linearly as in L1), so large weights are penalized heavily.
  • Shrinkage: produces small but non-zero values for almost all coefficients.

Cons:

  • Weights are shrunk toward zero but rarely become exactly zero, so the model cannot be compressed by dropping features and no feature selection takes place.


Dropout

Dropout randomly selects some nodes at each training iteration and removes them, along with all of their incoming and outgoing connections. Each iteration therefore trains a different subset of nodes and produces a different set of outputs. Dropped neurons are not updated during the backward pass.
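A minimal NumPy sketch of the usual inverted-dropout formulation (the function name and the keep_prob value are illustrative assumptions, not from the article or a specific library) might look like this:

    import numpy as np

    def dropout_forward(activations, keep_prob=0.8, training=True):
        if not training:
            return activations          # no dropout at test time
        mask = np.random.rand(*activations.shape) < keep_prob
        # Zero out dropped units and rescale the survivors so the expected
        # activation stays the same as without dropout.
        return activations * mask / keep_prob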

Pros:

  • Robustness: the network cannot rely on any single neuron, so the final model behaves like an ensemble of smaller networks and generalizes well.

Cons:

  • Added complexity in implementation, training time, and memory consumption.
  • Should be used together with other techniques that constrain the parameters (e.g. a max-norm constraint) to simplify the choice of learning rate.


Data Augmentation

When dealing with images, we can increase the size of the training data by rotating, flipping, scaling, shifting the images, and so on.
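As a rough sketch of the idea in plain NumPy (the transformations chosen and the augment name are illustrative, not a prescribed pipeline), each variant keeps the original label:

    import numpy as np

    # "image" is assumed to be a 2-D NumPy array.
    def augment(image):
        yield np.fliplr(image)                 # horizontal flip
        for k in (1, 2, 3):
            yield np.rot90(image, k)           # 90/180/270 degree rotations
        yield np.roll(image, shift=2, axis=1)  # shift a few pixels sideways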

Pros:

  • Effectiveness: training on more data generally produces better models.

Cons:

  • Expensive in time and resources.


Early stopping

When we see that performance on the validation set is getting worse, we stop training the model. This is known as early stopping.
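A minimal sketch of the usual patience-based loop, where train_one_epoch and validation_loss are assumed helper functions and patience counts how many epochs we tolerate without improvement on the validation set:

    def train_with_early_stopping(model, train_one_epoch, validation_loss,
                                  max_epochs=100, patience=5):
        best = float("inf")
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model)
            val_loss = validation_loss(model)
            if val_loss < best:
                best = val_loss
                epochs_without_improvement = 0
                # in practice we would also checkpoint the best weights here
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break   # stop: validation loss no longer improves
        return model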

Pros:

  • Very simple
  • Highly recommended for all training runs, in combination with other techniques.

Cons:

  • It may not work so well: it is possible that after the defined limit the model would start improving again.



