What Uncertainties Do We Need to Capture in Deep Learning? [with code]
Figure: Illustration of epistemic and aleatoric uncertainty (https://www.researchgate.net/figure/Illustration-of-epistemic-and-aleatoric-uncertainty_fig3_358723173)


There are two major types of uncertainty one can model:

  • Aleatoric uncertainty captures noise inherent in the observations, for example sensor noise or motion noise, resulting in uncertainty that cannot be reduced even if more data were collected. Aleatoric uncertainty draws its name from the Latin root aleatorius, referring to games of chance: it describes randomness arising from the data generating process itself, noise that cannot be eliminated simply by collecting more data.
  • Epistemic uncertainty accounts for uncertainty in the model parameters: it captures our ignorance about which model generated the collected data, and it can be explained away given enough data. Epistemic uncertainty is derived from the Greek root epistēmē, meaning knowledge. It measures our ignorance of the correct prediction arising from our ignorance of the correct model parameters.

Understanding what a model does not know is a critical part of many machine learning systems.

Bayesian deep learning can capture:

  • Epistemic uncertainty: formalized as probability distributions over model parameters
  • Aleatoric uncertainty: formalized as probability distributions over model outputs

Aleatoric and epistemic uncertainty models are not mutually exclusive: combining the two is what allowed Kendall and Gal (2017) to achieve new state-of-the-art results.

Aleatoric uncertainty can further be categorized into:

  • homoscedastic uncertainty, which stays constant for different inputs.
  • heteroscedastic uncertainty, which depends on the inputs to the model, with some inputs potentially having noisier outputs than others.

For example, homoscedastic regression assumes a constant observation noise σ for every input point x, whereas heteroscedastic regression assumes that the observation noise σ(x) can vary with the input x.

Approaches:

To capture epistemic uncertainty in a neural network (NN), we put a prior distribution over its weights, for example a Gaussian prior W ~ N(0, I). Bayesian neural networks replace the deterministic network's weight parameters with distributions over these parameters, and instead of optimizing the network weights directly we average over all possible weights (marginalization). Bayesian inference is used to compute the posterior over the weights, p(W | X, Y). This posterior captures the set of plausible model parameters given the data. We can capture model uncertainty by approximating p(W | X, Y) with dropout variational inference, a practical approach for approximate inference in large and complex models. The approximation is obtained by training a model with dropout before every weight layer and also performing dropout at test time to sample from the approximate posterior (Monte Carlo dropout).
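As a concrete illustration, here is a minimal Monte Carlo dropout sketch in Keras. The toy architecture, dropout rate, and number of samples T are illustrative assumptions, not values from the paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

def build_mc_dropout_model(input_dim, dropout_rate=0.1):
    # Dropout is placed before each weight layer and will also be kept active at test time.
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dropout(dropout_rate)(inputs)
    x = keras.layers.Dense(64, activation="relu")(x)
    x = keras.layers.Dropout(dropout_rate)(x)
    outputs = keras.layers.Dense(1)(x)
    return keras.Model(inputs, outputs)

model = build_mc_dropout_model(input_dim=8)
model.compile(optimizer="adam", loss="mse")
# model.fit(X_train, y_train, epochs=...)  # train as usual, with dropout active

def mc_dropout_predict(model, x, T=100):
    # Keep dropout active at inference (training=True) to sample from the
    # approximate posterior; the spread across the T stochastic forward
    # passes reflects epistemic uncertainty.
    preds = np.stack([model(x, training=True).numpy() for _ in range(T)], axis=0)
    return preds.mean(axis=0), preds.std(axis=0)
```

The predictive mean and standard deviation returned by mc_dropout_predict summarize the model's belief and its uncertainty for a given input.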

We can model heteroscedastic aleatoric uncertainty simply by changing our loss function. Because this uncertainty is a function of the input data, we can learn to predict it using a deterministic mapping from inputs to model outputs. For example, in regression the model predicts not only a mean value ŷ but also a variance σ². Homoscedastic aleatoric uncertainty can be modeled in a similar way, except that the uncertainty parameter is no longer a model output but a free parameter we optimize.
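Below is a minimal sketch of such a loss in the spirit of Kendall and Gal (2017): the network predicts the log variance rather than the variance itself, which keeps training numerically stable and lets the model attenuate the effect of noisy training points. The two-output head and the function names are illustrative assumptions.

```python
import tensorflow as tf

def heteroscedastic_nll(y_true, y_pred):
    # y_pred packs [mean, log_variance]; predicting log sigma^2 avoids a
    # positivity constraint on the variance.
    mean, log_var = tf.split(y_pred, num_or_size_splits=2, axis=-1)
    return tf.reduce_mean(
        0.5 * tf.exp(-log_var) * tf.square(y_true - mean) + 0.5 * log_var
    )

# Homoscedastic variant: the log-variance is a single trainable scalar rather
# than a model output. (When training with model.fit, this variable must be
# attached to the model so the optimizer updates it.)
log_var = tf.Variable(0.0, trainable=True)

def homoscedastic_nll(y_true, mean_pred):
    return tf.reduce_mean(
        0.5 * tf.exp(-log_var) * tf.square(y_true - mean_pred) + 0.5 * log_var
    )
```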

Conclusion

  • Epistemic uncertainty is important for safety-critical applications, because it is required to understand examples that are different from the training data.
  • Aleatoric uncertainty is important in large-data regimes, where epistemic uncertainty is mostly explained away, and in real-time applications, where aleatoric models can be formed without expensive Monte Carlo sampling.


Summary

Epistemic uncertainty:

  • Uncertainty in the model parameters
  • Formalized as probability distributions over model parameters
  • Can be explained away given enough data
  • Instead of learning specific weight values, the Bayesian approach learns weight distributions, from which we sample to produce an output for a given input.
  • The model produces a different output each time we call it with the same input, since a new set of weights is sampled from the distributions each time to construct the network and produce an output.
  • The less certain the model weights are, the more variability (a wider range) we will see in the outputs for the same input.


Aleatoric uncertainty:

  • Noise inherent in the observations
  • Formalized as probability distributions over model outputs
  • Cannot be reduced even if more data were to be collected
  • Create a probabilistic NN by letting the model output a distribution.
  • For example, model the output as an IndependentNormal distribution with learnable mean and variance parameters.
  • The output is a distribution, and we can use its mean and variance to compute confidence intervals (CI) for the prediction (see the sketch below).
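Here is a minimal sketch of such a probabilistic NN using TensorFlow Probability's IndependentNormal layer, in the spirit of the Keras example in reference 3; the hidden-layer size, the 95% interval, and the helper names are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras
import tensorflow_probability as tfp

def build_probabilistic_model(input_dim):
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dense(16, activation="relu")(inputs)
    # The Dense layer produces the parameters (mean and scale) that the
    # IndependentNormal layer turns into a distribution over the output.
    params = keras.layers.Dense(tfp.layers.IndependentNormal.params_size(1))(x)
    outputs = tfp.layers.IndependentNormal(1)(params)
    return keras.Model(inputs, outputs)

model = build_probabilistic_model(input_dim=8)
# Train against the negative log-likelihood of the predicted distribution.
model.compile(optimizer="adam", loss=lambda y, dist: -dist.log_prob(y))

# At prediction time the model returns a distribution; its mean and stddev
# give an approximate 95% confidence interval, e.g.:
# dist = model(x_test)
# lower = dist.mean() - 1.96 * dist.stddev()
# upper = dist.mean() + 1.96 * dist.stddev()
```

Training against the negative log-likelihood is what makes the predicted variance meaningful: inputs with noisier targets are pushed toward wider output distributions.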


References:

  1. Kendall, A., & Gal, Y. (2017). What uncertainties do we need in bayesian deep learning for computer vision?. In?Advances in neural information processing systems?(pp. 5574-5584).
  2. Uncertainty: a Tutorial https://blog.evjang.com/2018/12/uncertainty.html
  3. Probabilistic Bayesian Neural Networks: Code https://keras.io/examples/keras_recipes/bayesian_neural_networks/


Thank you
