
Gal, Yarin, "Uncertainty in Deep Learning," PhD thesis, University of Cambridge, 2016.

# Main contributions of the thesis

(p15) We will thus concentrate on the development of practical techniques to obtain model confidence in deep learning, techniques which are also well rooted within the theoretical foundations of probability theory and Bayesian modelling. Specifically, we will make use of stochastic regularisation techniques (SRTs).

These techniques adapt the model output stochastically as a way of model regularisation (hence the name stochastic regularisation). This results in the loss becoming a random quantity, which is optimised using tools from the stochastic non-convex optimisation literature. Popular SRTs include dropout [Hinton et al., 2012], multiplicative Gaussian noise [Srivastava et al., 2014], dropConnect [Wan et al., 2013], and countless other recent techniques.
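To make the "random loss" concrete, here is a minimal pure-Python sketch of a dropout forward pass; repeating the stochastic pass at test time and reading off the sample mean and variance is the Monte Carlo estimate of model confidence that the thesis builds on. The network shape and weights below are made up for illustration.

```python
import random

def dropout_forward(x, w1, w2, p=0.5, rng=random):
    """One stochastic forward pass of a one-hidden-layer ReLU net:
    each hidden unit is kept with probability 1 - p and scaled by 1/(1 - p)."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    h = [hi / (1 - p) if rng.random() > p else 0.0 for hi in h]
    return sum(w2i * hi for w2i, hi in zip(w2, h))

# Toy weights (hypothetical, for illustration only).
rng = random.Random(0)
w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
w2 = [rng.gauss(0, 1) for _ in range(8)]
x = [0.5, -1.0, 2.0]

# MC dropout: repeat the stochastic pass, then read off mean and variance.
samples = [dropout_forward(x, w1, w2, rng=rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The nonzero variance across passes is exactly the "loss becomes a random quantity" effect; at test time it doubles as an uncertainty estimate.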

# The author's discussion of NNs

## CNN

Convolutional neural networks (CNNs). CNNs [LeCun et al., 1989; Rumelhart et al., 1985] are popular deep learning tools for image processing, which can solve tasks that until recently were considered to lie beyond our reach [Krizhevsky et al., 2012; Szegedy et al., 2014]. The model is made of a recursive application of convolution and pooling layers, followed by inner product layers at the end of the network (simple NNs as described above). A convolution layer is a linear transformation that preserves spatial information in the input image (depicted in figure 1.1). Pooling layers simply take the output of a convolution layer and reduce its dimensionality (by taking the maximum of each (2, 2) block of pixels for example). The convolution layer will be explained in more detail in section §3.4.1.
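A bare-bones sketch of the two layer types described above, in pure Python; the 4x4 image and the 1x2 edge kernel are toy choices of mine.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most DL code):
    a linear map that preserves the spatial layout of the input."""
    H, W = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(img[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool_2x2(fmap):
    """Reduce each non-overlapping (2, 2) block to its maximum."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

img = [[1, 2, 0, 1],
       [3, 1, 1, 0],
       [0, 2, 4, 1],
       [1, 0, 2, 3]]
edge = [[1, -1]]                  # a toy 1x2 horizontal edge detector
fmap = conv2d_valid(img, edge)    # 4x3 feature map
pooled = max_pool_2x2(fmap)       # dimensionality reduced by (2, 2) max pooling
```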

## RNN

Recurrent neural networks (RNNs). RNNs [Rumelhart et al., 1985; Werbos, 1988] are sequence-based models of key importance for natural language understanding, language generation, video processing, and many other tasks [Kalchbrenner and Blunsom, 2013; Mikolov et al., 2010; Sundermeyer et al., 2012; Sutskever et al., 2014].

## PILCO

PILCO [Deisenroth and Rasmussen, 2011], for example, is a data-efficient probabilistic model-based policy search algorithm. PILCO analytically propagates uncertain state distributions through a Gaussian process dynamics model. This is done by recursively feeding the output state distribution (output uncertainty) of one time step as the input state distribution (input uncertainty) of the next time step, until a fixed time horizon T.
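A sampling-based sketch of this recursive propagation. Note the hedge: PILCO itself does this analytically, by moment-matching through a GP dynamics model; here samples stand in for the analytic distributions, and the dynamics function is invented for illustration.

```python
import math
import random

def dynamics(s, rng):
    """Hypothetical stochastic one-step dynamics: next state = f(s) + noise."""
    return math.sin(s) + 0.9 * s + rng.gauss(0.0, 0.1)

def propagate(mean0, std0, T, n=5000, seed=0):
    """Feed the output state distribution of one time step in as the input
    state distribution of the next, up to a fixed horizon T."""
    rng = random.Random(seed)
    states = [rng.gauss(mean0, std0) for _ in range(n)]
    for _ in range(T):
        states = [dynamics(s, rng) for s in states]
    m = sum(states) / n
    v = sum((s - m) ** 2 for s in states) / n
    return m, v

m, v = propagate(0.0, 0.1, T=10)   # uncertainty after 10 recursive steps
```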

## Relationship to GPs

(p14) Even though modern deep learning models used in practice do not capture model confidence, they are closely related to a family of probabilistic models which induce probability distributions over functions: the Gaussian process. Given a neural network, by placing a probability distribution over each weight (a standard normal distribution for example), a Gaussian process can be recovered in the limit of infinitely many weights (see Neal [1995] or Williams [1997]). For a finite number of weights, model uncertainty can still be obtained by placing distributions over the weights—these models are called Bayesian neural networks.
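This limit can be glimpsed empirically: drawing the weights of a wide one-hidden-layer network from standard normals gives random functions whose pointwise mean is near zero, as the GP prior predicts. The 1/sqrt(H) output scaling and the tanh activation are my own choices to keep the limit finite, not details from the thesis.

```python
import math
import random

def sample_function(xs, hidden=500, rng=random):
    """Draw one random-weight one-hidden-layer network and evaluate it at xs.
    All weights ~ N(0, 1); output scaled by 1/sqrt(hidden)."""
    w = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(hidden)]  # (w1, b1)
    v = [rng.gauss(0, 1) for _ in range(hidden)]
    return [sum(vi * math.tanh(w1 * x + b1) for vi, (w1, b1) in zip(v, w))
            / math.sqrt(hidden)
            for x in xs]

rng = random.Random(1)
xs = [-1.0, 0.0, 1.0]
draws = [sample_function(xs, rng=rng) for _ in range(300)]

# The empirical mean at each input is near 0, matching the zero-mean GP limit.
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(xs))]
```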

# Monte Carlo integration

Assuming $\theta_i$ is sampled from the distribution $p(\theta|D)$, the Monte Carlo integration formula is:

$\mathbb{E}_{\theta\sim p(\theta|D)}[g(\theta)] = \int g(\theta)\, p(\theta|D)\, d\theta \approx \frac{1}{n} \sum_{i=1}^{n} g(\theta_i), \quad \theta_i \sim p(\theta|D)$, with an error of order $O(1/\sqrt{n})$.
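A quick numerical check of the formula; the Gaussian posterior and $g(\theta) = \theta^2$ are toy choices, for which $\mathbb{E}[g] = \mu^2 + \sigma^2 = 5$.

```python
import random

def mc_expectation(g, sampler, n, seed=0):
    """Monte Carlo estimate of E[g(theta)] from n samples of p(theta|D)."""
    rng = random.Random(seed)
    return sum(g(sampler(rng)) for _ in range(n)) / n

# Toy posterior: theta|D ~ N(2, 1); g(theta) = theta^2, so E[g] = 4 + 1 = 5.
g = lambda t: t * t
sampler = lambda rng: rng.gauss(2.0, 1.0)

rough = mc_expectation(g, sampler, 100)
fine = mc_expectation(g, sampler, 100_000)
# The error shrinks like O(1/sqrt(n)): the fine estimate is far closer to 5.
```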

The following discussion provides a very clear interpretation of Bayesian inference, but I'm not sure it is exact and 100% correct. Need to do more reading.
Can a posterior expectation be used as an approximation of the true (prior) expectation?

# Is MCMC suitable for large-scale problems with thousands of parameters?

The probabilistic models MCMC is applied to often have a huge number of parameter dimensions, but each parameter has a very small support. For example, in some NLP problems the parameters only take values in {0, 1}, yet the dimensionality often reaches thousands or even tens of thousands; this is precisely the kind of problem MCMC is well suited to.
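A minimal Gibbs-sampling sketch for exactly this setting, many binary variables each with support {0, 1}. The chain-coupled model below (and its size, 50 variables for speed) is a made-up example; each coordinate update samples from its exact conditional, which is cheap precisely because the support is tiny.

```python
import math
import random

def gibbs_binary_chain(d=50, J=0.5, steps=200, seed=0):
    """Gibbs sampling over d binary variables x_i in {0, 1} with a chain
    coupling: p(x) ∝ exp(J * sum_i s_i s_{i+1}), where s_i = 2 x_i - 1.
    Each sweep resamples every coordinate from its exact conditional."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(d)]
    for _ in range(steps):
        for i in range(d):
            field = 0.0                     # sum of neighbouring spins
            if i > 0:
                field += 2 * x[i - 1] - 1
            if i < d - 1:
                field += 2 * x[i + 1] - 1
            p1 = 1.0 / (1.0 + math.exp(-2.0 * J * field))  # p(x_i = 1 | rest)
            x[i] = 1 if rng.random() < p1 else 0
    return x

x = gibbs_binary_chain()
```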

# Study materials

[ref-2] daniel-D, 从随机过程到马尔科夫链蒙特卡洛方法 (not great; the exposition is rather muddled)

[ref-3] 靳志辉, LDA-math-MCMC 和 Gibbs Sampling (this is where I started working through the algorithms carefully; detailed balance condition)

[ref-4] shenxiaolu1984, 蒙特卡洛-马尔科夫链(MCMC)初步 (briefly introduces four sampling methods; the formulas for the specific algorithms are broken)

[ref-7] 随机模拟-Monte Carlo积分及采样（详述直接采样、接受-拒绝采样、重要性采样） (its explanations of Monte Carlo integration and the common sampling methods are intuitive and insightful; one of the main uses of MCMC is to support Monte Carlo integration, which involves sampling from some density $f(x)$)

[ref-8] Bin的专栏, 随机采样方法整理与讲解（MCMC、Gibbs Sampling等） (recommended; essentially the right order in which to understand the material)

[ref-9] 再谈MCMC方法

[ref-wiki-MCMC] Markov chain Monte Carlo (not read yet)

[ref-wiki-Gibbs]
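Rejection sampling, one of the methods surveyed in ref-7, can be sketched in a few lines; the triangular target and uniform proposal below are my own toy choices.

```python
import random

def rejection_sample(target_pdf, proposal_sampler, proposal_pdf, M, n, seed=0):
    """Accept a proposal x with probability target(x) / (M * proposal(x)).
    Requires target(x) <= M * proposal(x) everywhere."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = proposal_sampler(rng)
        if rng.random() < target_pdf(x) / (M * proposal_pdf(x)):
            out.append(x)
    return out

# Target: triangular density on [0, 1], f(x) = 2x; proposal: Uniform(0, 1).
target = lambda x: 2.0 * x
prop_pdf = lambda x: 1.0
prop_sample = lambda rng: rng.random()

xs = rejection_sample(target, prop_sample, prop_pdf, M=2.0, n=20_000)
mean = sum(xs) / len(xs)   # true mean of the triangular density is 2/3
```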

# My own summary

• Stochastic processes
• The Markov property (memorylessness)
• Limiting and stationary distributions of Markov chains
• Sampling from probability distributions; numerical methods
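The stationary-distribution item can be checked numerically: power-iterating a row-stochastic transition matrix converges to the stationary distribution, and for a reversible chain the detailed balance condition pi_i P_ij = pi_j P_ji holds as well. The 2-state matrix below is a made-up example.

```python
def stationary(P, iters=200):
    """Power iteration pi_{t+1} = pi_t P on a row-stochastic matrix P.
    For an ergodic chain this converges to the unique stationary distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# A small ergodic (and, here, reversible) 2-state chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
# Solving pi = pi P by hand gives pi = (5/6, 1/6), and detailed balance holds:
# pi[0] * P[0][1] == pi[1] * P[1][0] == 1/12.
```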

steinwart_support_2008 (Steinwart and Christmann, Support Vector Machines, 2008)

## The essence of Statistical Learning Theory

• assuming that the output value y to a given x is stochastically generated by P( · |x) accommodates the fact that in general the information contained in x may not be sufficient to determine a single response in a deterministic manner.
• assuming that the conditional probability P( · |x) is unknown contributes to the fact that we assume that we do not have a reasonable description of the relationship between the input and output values.

## The relationship between SVMs and GPs

For a brief description of kernel ridge regression and Gaussian processes, see Cristianini and Shawe-Taylor (2000, Section 6.2).

We refer to Wahba (1999) for the relationship between SVMs and Gaussian processes.


# Some small utilities I've collected

Language Switcher: customise the language/locale an app is launched with
~~TinkerTool: set Eclipse's system-related font sizes~~ (gave up on Eclipse as of 2019/03)
QBlocker: prevents quitting apps by accident. Sometimes stops working
~~清歌输入法: a reasonably usable Wubi input method~~

HyperSwitch: enhanced window switching

# Some problems I gave up on

## brack_inorbit_2017

In-Orbit Tracking of High Area-to-Mass Ratio Space Objects

## Singla, Puneet. 2016. “Certain Thoughts on Uncertainty Analysis for Dynamical Systems.” Department of Mechanical and Aerospace Engineering, University of Texas at Arlington, August 17. http://lairs.eng.buffalo.edu/wiki/images/a/ac/SinglaTalk.pdf.

The fusion of observational data with numerical simulation promises to provide greater understanding of physical phenomenon than either approach alone can achieve.

The most critical challenge here is to provide a quantitative assessment of how closely our estimates reflect reality in the presence of model uncertainty as well as measurement errors and uncertainty.

Uncertainty Propagation: Nonlinear Systems

• Approximate Solution to exact problem: Multiple-model estimation method, Unscented Kalman Filter (UKF), Monte Carlo (MC) methods.
• Exact solution to approximate problem: Extended Kalman Filter (EKF), Gaussian closure, Stochastic Linearization…
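The two families in the list above can be contrasted on a toy nonlinearity; the quadratic f and the standard Gaussian input are my own choices. Monte Carlo approximately solves the exact problem, while EKF-style linearization exactly solves an approximate (linearized) problem, and at x = 0 the linearization misses both moments entirely.

```python
import random

def propagate_mc(f, mean, std, n=100_000, seed=0):
    """Approximate solution to the exact problem: push samples of the exact
    input distribution through the exact nonlinearity."""
    rng = random.Random(seed)
    ys = [f(rng.gauss(mean, std)) for _ in range(n)]
    m = sum(ys) / n
    return m, sum((y - m) ** 2 for y in ys) / n

def propagate_linearized(f, df, mean, std):
    """Exact solution to the approximate problem (EKF-style): linearize f at
    the mean, then propagate the Gaussian exactly through the linear model."""
    return f(mean), (df(mean) * std) ** 2

f = lambda x: x * x          # strongly nonlinear around x = 0
df = lambda x: 2 * x
mc_mean, mc_var = propagate_mc(f, 0.0, 1.0)           # true moments: E=1, Var=2
lin_mean, lin_var = propagate_linearized(f, df, 0.0, 1.0)  # predicts 0 and 0
```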

Fokker-Planck-Kolmogorov equation (FPKE)

With a sufficient number of Gaussian components, any pdf can be approximated as closely as desired.

# Study notes on the neural network toolbox

both shallow and deep NN

• classification
• regression (these notes mainly cover this part; the other network types involve a lot of different material)
• clustering
• dimensionality reduction
• time-series forecasting: long short-term memory (LSTM) deep learning networks
• dynamic system modeling and control

For small training sets, you can quickly apply deep learning by performing transfer learning with pretrained deep network models (GoogLeNet, AlexNet, VGG-16, and VGG-19) and models from the Caffe Model Zoo. (What is this?)