Daily Reading Logs (Clean up regularly)

Posted on 2020-06-30 Edited on 2022-09-01 In Others

Academics

chrisorm:
- Regarding variational methods, I’m not really sure these are a panacea either. Bayes by Backprop etc. normally make heavy independence assumptions to make things tractable.
- To me, uncertainty estimates in deep learning are still really open problems.
forthispost96: I was fortunate enough to talk to a few ML scientists from DeepMind who were working on similar content, and even they said implementing these Bayesian networks becomes a hassle.
margaret_spintz:
- You could say that enough samples would cover your estimated approximate posterior, but this is certainly not the ‘true’ posterior. MC-Dropout makes strong assumptions on the distributional form of this approximation, and the objective it minimises is only questionably Bayesian (https://arxiv.org/abs/1807.01969). (unread)
- MC-Dropout is at best a variational method.
- Currently there’s pushback on how well suited (mean-field) variational methods (including MC-Dropout) are to neural networks (https://arxiv.org/abs/1909.00719). (unread)
- I’d float an alternative method you didn’t mention: ensembles (https://arxiv.org/abs/1612.01474). These often outperform variational methods and are arguably simpler and more scalable, e.g. (https://arxiv.org/abs/1906.01620). Finally, some work explores the connection between ensembles and Bayesian posteriors (https://arxiv.org/abs/1810.05546).
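
Note to self on the ensembles point: a minimal sketch of how per-member predictions are usually combined, assuming each member outputs a Gaussian mean and variance and the ensemble is treated as a uniformly weighted mixture. The function name `combine_ensemble` and the toy numbers are my own, not from the thread.

```python
import numpy as np

def combine_ensemble(means, variances):
    """Combine per-member Gaussian predictions into one mean and variance.

    means, variances: array-likes of shape (n_members, n_points).
    The ensemble is treated as a uniformly weighted mixture of Gaussians.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mix_mean = means.mean(axis=0)
    # Mixture variance = average member variance + spread of the member means.
    mix_var = (variances + means**2).mean(axis=0) - mix_mean**2
    return mix_mean, mix_var

# Toy numbers (made up): three members, two test points.
member_means = [[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]]
member_vars = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
mean, var = combine_ensemble(member_means, member_vars)
```

At the first test point the members disagree, so the combined variance is larger than any single member's variance; at the second they agree, so it stays at the member variance.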

Note that in this example we are training both P(w) and Q(w). This training corresponds to using Empirical Bayes or Type-II Maximum Likelihood. We used this method so that we wouldn’t need to specify the location of the prior for the slope and intercept parameters, which can be tough to get right if we do not have prior knowledge about the problem. Moreover, if you set the priors very far from their true values, then the posterior may be unduly affected by this choice. A caveat of using Type-II Maximum Likelihood is that you lose some of the regularization benefits over the weights. If you wanted to do a proper Bayesian treatment of uncertainty (if you had some prior knowledge, or a more sophisticated prior), you could use a non-trainable prior (see Appendix B).
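
To see why a trainable prior gives up some regularization, here is a small sketch of my own using the closed-form KL divergence between two scalar Gaussians. The posterior values are hypothetical, and this only illustrates the KL term, not the full training loop from the quoted example.

```python
import numpy as np

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalar Gaussians."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
            - 0.5)

mu_q, sigma_q = 3.0, 0.5  # hypothetical posterior Q(w) over a single weight

# Fixed prior N(0, 1): the KL term penalises Q for sitting far from 0.
kl_fixed = kl_gaussians(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0)

# Trainable prior location: the optimiser can move mu_p onto mu_q,
# which removes the location penalty entirely.
kl_trained_loc = kl_gaussians(mu_q, sigma_q, mu_p=mu_q, sigma_p=1.0)

# If the prior scale were trainable as well, the KL term could reach zero.
kl_trained_all = kl_gaussians(mu_q, sigma_q, mu_p=mu_q, sigma_p=sigma_q)
```

With a fixed prior the KL term pulls the weight toward the prior mean; once the prior location is trained, that pull disappears, which is the lost regularization mentioned above.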

- Simulate multiple predictions, then calculate the prediction intervals from them
- Predict the parameters of a predefined distribution
- Predict forecast quantiles directly: Amazon’s MQ-RNN forecaster uses this approach (check this)
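
A rough sketch of the three approaches above, for my own reference. The numbers are stand-ins for real model outputs, the Gaussian in approach 2 is an assumed choice of distribution, and `pinball_loss` is my own helper implementing the standard quantile-regression loss (I have not checked it against any specific forecaster's code).

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# 1) Simulation: draw many predictions, then take empirical percentiles.
samples = rng.normal(loc=10.0, scale=2.0, size=10_000)  # stand-in for simulated forecasts
lo_sim, hi_sim = np.percentile(samples, [5, 95])

# 2) Distribution parameters: the model outputs (mu, sigma) of a Gaussian.
mu, sigma = 10.0, 2.0  # stand-in for model outputs
z = NormalDist().inv_cdf(0.95)  # z-score for a central 90% interval
lo_par, hi_par = mu - z * sigma, mu + z * sigma

# 3) Direct quantiles: train one output per level q with the pinball loss.
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction costs q per unit, over-prediction costs (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))
```

For q = 0.9, under-predicting by one unit costs 0.9 while over-predicting by one unit costs only 0.1, so minimising the loss pushes the prediction up toward the 90th percentile.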