[Discussion] [Research] Variational Bayesian Inference vs Monte-Carlo Dropout for Uncertainty Quantification in DL
- Regarding variational methods, I’m not really sure these are a panacea either. Bayes by backdrop etc normally make heavy independence assumptions to make things tractable.
- To me, uncertainty estimates in deep learning are still really open problems.
- forthispost96: I was fortunate enough to talk to a few ML Scientists from Deepmind who were working on similar content and even they said implementing these Bayesian networks becomes a hassle.
- You could say that enough samples would cover your estimated approximate posterior, but this is certainly not the ‘true’ posterior. MC-Dropout makes strong assumptions on the distributional form of this approximation, and the objective it minimises is only questionably Bayesian (https://arxiv.org/abs/1807.01969). (未读)
- MC-Dropout is at best a variational method.
- Currently there’s push back into how well suited (mean field) variational methods are (including MC-Dropout) to neural networks (https://arxiv.org/abs/1909.00719). （未读）
- I’d float an alternative method you didn’t mention: ensembles (https://arxiv.org/abs/1612.01474). These often outperform variational methods and are arguably simpler and more scalable, e.g. (https://arxiv.org/abs/1906.01620). Finally some work explores the connection between ensembles and Bayesian posteriors (https://arxiv.org/abs/1810.05546).
TF blog: Regression with Probabilistic Layers in TensorFlow Probability
- Note that in this example we are training both P(w) and Q(w). This training corresponds to using Empirical Bayes or Type-II Maximum Likelihood. We used this method so that we wouldn’t need to specify the location of the prior for the slope and intercept parameters, which can be tough to get right if we do not have prior knowledge about the problem. Moreover, if you set the priors very far from their true values, then the posterior may be unduly affected by this choice. A caveat of using Type-II Maximum Likelihood is that you lose some of the regularization benefits over the weights. If you wanted to do a proper Bayesian treatment of uncertainty (if you had some prior knowledge, or a more sophisticated prior), you could use a non-trainable prior (see Appendix B).
RG: Is it possible to get confidence intervals in LSTM forecasting?
- use simulations of multiple predictions to then calculate the prediction intervals
- predict the parameters of a predefined distribution
- predict forecast quantiles directly: Amazon’s MQ-RNN forecaster uses this approach (check this)
Gaussian process 的重要组成部分——关于那个被广泛应用的Kernel的林林总总
Latent variable model
An Introduction to Latent Variable Models
Stochastic (partial) differential equations and Gaussian processes, Simo Särkkä, Aalto University, Finland
Gaussian Process Regression using Spectral Mixture Kernel in GPflow