Regularization in Neural Networks

The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain, so that we can make predictions on data the model has never seen. Generalization refers to how well the concepts learned by a machine learning model apply to specific examples the model did not see while it was learning. It is bounded by two undesirable outcomes, high bias and high variance, and detecting which one a model suffers from is the responsibility of the model developer.

When we talk about how well a machine learning model learns and generalizes to new data, two terms come up: overfitting and underfitting. These are the two biggest causes of poor performance in machine learning algorithms.

Underfitting: high bias and low variance.

Symptoms:

  • High training error.
  • Training error close to testing error.
  • High bias.

Underfitting refers to a model that can neither model the training data nor generalize to new data. Techniques to reduce underfitting:

  1. Increase model complexity.
  2. Increase the number of features by performing feature engineering.
  3. Remove noise from the data.
  4. Increase the number of epochs or the duration of training.


Overfitting: high variance and low bias.

Symptoms:

  • Very low training error.
  • Training error much lower than testing error.
  • High variance.

Overfitting refers to a model that models the training data too well. It happens when the model learns the detail and noise in the training data to the extent that this negatively impacts its performance on new data. Techniques to reduce overfitting (regularization):

  1. Reduce model complexity (L1 regularization).
  2. Reduce model complexity (L2 regularization).
  3. Use dropout for neural networks (Dropout).
  4. Increase the training data (Data Augmentation).
  5. Stop during the training phase (Early stopping): keep an eye on the validation loss during training and stop as soon as it begins to increase.


L1 regularization

L1 regularization adds the absolute value (magnitude) of each coefficient as a penalty term to the cost. Weights are pushed toward smaller values on the assumption that smaller weights give simpler models.

cost = loss + (lambda / 2m) * sum(|w|)
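As a rough illustration, the penalty can be computed in a few lines of NumPy. This is only a sketch; the function name and arguments (loss, weights, lam, m) are assumptions for illustration, not part of any particular framework.

    import numpy as np

    # "loss" is the unregularized loss, "weights" is a list of weight arrays,
    # "lam" is lambda, and "m" is the number of training examples.
    def l1_regularized_cost(loss, weights, lam, m):
        l1_penalty = sum(np.sum(np.abs(w)) for w in weights)
        return loss + (lam / (2 * m)) * l1_penalty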

Pros:

  • Robustness: better than L2 because the loss due to outliers increases only linearly rather than quadratically.
  • Sparsity: zeros out coefficients, which leads to a sparse model. This can be used for feature selection, since unimportant features end up with zero coefficients.

Cons:

  • Weights may be reduced exactly to zero; this is positive if we want to compress the model, but important features can be dropped entirely if the penalty is too strong.


L2 regularization

L2 regularization adds the squared magnitude of each coefficient as a penalty term to the cost. Weights are pushed toward smaller values on the assumption that smaller weights give simpler models.

cost = loss + (lambda / 2m) * sum(w^2)
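A matching sketch for the L2 penalty, again with illustrative names rather than a framework API, also shows why this form of regularization is often called weight decay.

    import numpy as np

    # Same illustrative arguments as the L1 version above.
    def l2_regularized_cost(loss, weights, lam, m):
        l2_penalty = sum(np.sum(w ** 2) for w in weights)
        return loss + (lam / (2 * m)) * l2_penalty

    # The penalty adds (lam / m) * w to each weight's gradient, so plain
    # gradient descent becomes:
    #     w = w - learning_rate * (grad + (lam / m) * w)
    # which is why L2 regularization is often called "weight decay".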

Pros:

  • The loss due to outliers increases quadratically (not linearly as in L1), so large weights are penalized heavily.
  • Shrinkage: produces small but non-zero values for almost all coefficients.

Cons:

  • Weights are shrunk toward zero but rarely become exactly zero, so the model cannot be compressed by dropping features and no feature selection takes place.


Dropout

Dropout randomly selects some nodes at each training iteration and removes them, along with all of their incoming and outgoing connections. Each iteration therefore trains a different subset of nodes and produces a different set of outputs. Dropped neurons are not updated during the backward pass.
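A minimal NumPy sketch of the usual inverted-dropout formulation (the function name and the keep_prob value are illustrative assumptions, not from the article or a specific library) might look like this:

    import numpy as np

    def dropout_forward(activations, keep_prob=0.8, training=True):
        if not training:
            return activations          # no dropout at test time
        mask = np.random.rand(*activations.shape) < keep_prob
        # Zero out dropped units and rescale the survivors so the expected
        # activation stays the same as without dropout.
        return activations * mask / keep_prob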

Pros:

  • Robustness: the network cannot rely on any single neuron, so the final model behaves like an ensemble of smaller networks and generalizes well.

Cons:

  • Added complexity in implementation, training time, and memory consumption.
  • Should be used together with other techniques that constrain the parameters (e.g. a max-norm constraint) to simplify the choice of learning rate.


Data Augmentation

When dealing with images, we can increase the size of the training data by rotating, flipping, scaling, shifting the images, and so on.
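As a rough sketch of the idea in plain NumPy (the transformations chosen and the augment name are illustrative, not a prescribed pipeline), each variant keeps the original label:

    import numpy as np

    # "image" is assumed to be a 2-D NumPy array.
    def augment(image):
        yield np.fliplr(image)                 # horizontal flip
        for k in (1, 2, 3):
            yield np.rot90(image, k)           # 90/180/270 degree rotations
        yield np.roll(image, shift=2, axis=1)  # shift a few pixels sideways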

Pros:

  • Effectiveness: training on more data generally produces better models.

Cons:

  • Expensive in time and resources.


Early stopping

When we see that performance on the validation set is getting worse, we stop training the model. This is known as early stopping.
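A minimal sketch of the usual patience-based loop, where train_one_epoch and validation_loss are assumed helper functions and patience counts how many epochs we tolerate without improvement on the validation set:

    def train_with_early_stopping(model, train_one_epoch, validation_loss,
                                  max_epochs=100, patience=5):
        best = float("inf")
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model)
            val_loss = validation_loss(model)
            if val_loss < best:
                best = val_loss
                epochs_without_improvement = 0
                # in practice we would also checkpoint the best weights here
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break   # stop: validation loss no longer improves
        return model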

Pros:

  • Very simple
  • Highly recommended for all training runs, in combination with other techniques.

Cons:

  • It may not work so well: it is possible that after the defined limit the model would start improving again.



