Jonathan Ko, “Gaussian Process for Dynamic Systems”, PhD Thesis, University of Washington, 2011.
Bayes filter equation in Eq. 4.1 (p.34) has a typo (should be , not )
- The $p(x_t \mid x_{t-1}, u_{t-1})$ part is the dynamics model, describing how the state $x$ evolves in time based on the control input $u$ (p.34).
- The $p(z_t \mid x_t)$ part is the observation model, describing the likelihood of making an observation $z$ given the state $x$.
- GP-BayesFilter improves these two parts.
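To make the roles of the two parts concrete, here is a minimal sketch of one Bayes filter step on a discrete state space. The three-cell cyclic world, the probabilities, and the names `toy_dynamics` and `toy_observation` are assumptions made up for illustration; they are not from the thesis, which works with continuous states.

```python
def bayes_filter_step(belief, u, z, dynamics, observation):
    """One predict/correct step of a discrete Bayes filter.

    belief[i]          : probability of state i before the step
    dynamics(j, i, u)  : probability of moving from state i to j under control u
    observation(z, j)  : likelihood of observing z when in state j
    """
    n = len(belief)
    # Prediction: push the old belief through the dynamics model.
    predicted = [sum(dynamics(j, i, u) * belief[i] for i in range(n))
                 for j in range(n)]
    # Correction: weight by the observation likelihood, then normalize.
    unnormalized = [observation(z, j) * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


N_CELLS = 3  # hypothetical cyclic world with three cells


def toy_dynamics(j, i, u):
    # Hypothetical model: the move succeeds with probability 0.8.
    if u == 0:
        return 1.0 if j == i else 0.0
    if j == (i + u) % N_CELLS:
        return 0.8
    return 0.2 if j == i else 0.0


def toy_observation(z, j):
    # Hypothetical sensor: reports the true cell with probability 0.7.
    return 0.7 if z == j else 0.15


posterior = bayes_filter_step([1.0, 0.0, 0.0], u=1, z=1,
                              dynamics=toy_dynamics,
                              observation=toy_observation)
```

In a GP-BayesFilter the two callables would be replaced by learned GP models rather than hand-written tables.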
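The dynamics model maps the state and control $(x_k, u_k)$ to the state transition $\Delta x_k = x_{k+1} - x_k$. So, the training data is $D_g = \langle (X, U), X' \rangle$, where $X'$ holds the state transitions.
The observation model maps from the state $x_k$ to the observation $z_k$. So, the training data is $D_h = \langle X, Z \rangle$.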
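The two training sets can be assembled from a recorded trajectory roughly as follows. This is a sketch assuming NumPy arrays with one row per time step; the function name `make_training_sets` and the toy numbers are hypothetical.

```python
import numpy as np


def make_training_sets(X, U, Z):
    """Build GP training data from a trajectory.

    X: (T, dx) ground-truth states, U: (T, du) controls, Z: (T, dz) observations.
    Returns ((dynamics inputs, dynamics targets), (observation inputs, targets)).
    """
    X, U, Z = (np.asarray(a, dtype=float) for a in (X, U, Z))
    dyn_inputs = np.hstack([X[:-1], U[:-1]])  # rows are (x_k, u_k)
    dyn_targets = X[1:] - X[:-1]              # rows are x_{k+1} - x_k
    return (dyn_inputs, dyn_targets), (X, Z)


# Toy trajectory with 2-D state, 1-D control, 1-D observation (made-up values).
X = [[0.0, 0.0], [1.0, 0.5], [3.0, 1.0]]
U = [[1.0], [2.0], [0.0]]
Z = [[0.1], [1.1], [2.9]]
(dyn_in, dyn_out), (obs_in, obs_out) = make_training_sets(X, U, Z)
```

Note that the last state has no successor, so the dynamics set has one row fewer than the observation set.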
The resulting GP dynamics and observation models are (p.44)
4 Filtering and State Estimation with Gaussian
4.2 Related Work
The key novelty in this work is to not only consider the mean of the previous states, but also their uncertainty. … The drawback is that it requires large amounts of training data which may not be available for highly complex, high dimensional systems.
4.3 GP Bayesian Filters
4.3.1 GP Dynamics and Observation Models
- a sampling from the dynamics and observations of the system
- assumption that it is representative of the system, that is, that the training data covers those parts of the state space that are visited during normal operation
- explore the behavior of GP models when this is not the case in Section 4.4.3 (?? TODO: read and summarize this subsection)
- Dynamics model maps the state and control, $(x_k, u_k)$, to the state transition $\Delta x_k = x_{k+1} - x_k$.
- Observation model maps from the state, $x_k$, to the observation, $z_k$.
GP-BayesFilters represent models for vectorial outputs by learning a different GP for each output dimension. Since the output dimensions are now independent of each other, the resulting noise covariances are diagonal matrices. (A multi-output GP is not used.)
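A minimal sketch of the one-GP-per-output-dimension idea, assuming a squared-exponential kernel with fixed hyperparameters shared across dimensions (in practice each GP would have its own learned hyperparameters). The function names and the sine/cosine toy data are illustrative assumptions, and the predictive variance here includes the noise term, which is a choice of this sketch.

```python
import numpy as np


def rbf(A, B, length=1.0, signal=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal**2 * np.exp(-0.5 * d2 / length**2)


def gp_predict(X, y, x_star, length=1.0, signal=1.0, noise=0.1):
    # Standard GP regression for one scalar output at a single test input.
    K = rbf(X, X, length, signal) + noise**2 * np.eye(len(X))
    k_star = rbf(X, x_star[None, :], length, signal)[:, 0]
    mean = k_star @ np.linalg.solve(K, y)
    var = signal**2 - k_star @ np.linalg.solve(K, k_star) + noise**2
    return mean, var


def multi_output_predict(X, Y, x_star, **hyp):
    # One independent GP per output column, so the covariance is diagonal.
    preds = [gp_predict(X, Y[:, d], x_star, **hyp) for d in range(Y.shape[1])]
    mean = np.array([m for m, _ in preds])
    cov = np.diag([v for _, v in preds])
    return mean, cov


X_train = np.linspace(-3, 3, 20)[:, None]
Y_train = np.column_stack([np.sin(X_train[:, 0]), np.cos(X_train[:, 0])])
mean, cov = multi_output_predict(X_train, Y_train, np.array([0.5]))
```

Because each column is fit separately, any correlation between output dimensions is ignored, which is exactly why the off-diagonal entries of `cov` are zero.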
(p.45) We call the combination of GP and parametric models Enhanced-GP (EGP) models.
Essentially, EGP models learn the residual output after factoring the contributions of the parametric model.
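The residual idea can be sketched as follows: subtract the parametric prediction from the training targets, fit a GP to what is left, and add the two back together at prediction time. The "true" system, the approximate parametric model, and the fixed kernel hyperparameters below are all hypothetical choices for illustration.

```python
import numpy as np


def rbf(a, b, length):
    # Squared-exponential kernel for 1-D inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


def gp_mean(x_train, y_train, x_test, length=0.7, noise=0.05):
    # Posterior mean of a zero-mean GP with fixed hyperparameters.
    K = rbf(x_train, x_train, length) + noise**2 * np.eye(len(x_train))
    return rbf(x_test, x_train, length) @ np.linalg.solve(K, y_train)


def true_system(x):
    # Hypothetical ground truth: a linear trend plus an unmodeled effect.
    return 2.0 * x + 0.5 * np.sin(2.0 * x)


def parametric(x):
    # Hypothetical approximate parametric model (captures only the trend).
    return 2.0 * x


x_train = np.linspace(-3, 3, 25)
residual = true_system(x_train) - parametric(x_train)  # what the GP must learn

x_test = np.array([0.4, 1.3])
egp_prediction = parametric(x_test) + gp_mean(x_train, residual, x_test)
parametric_error = np.abs(parametric(x_test) - true_system(x_test))
egp_error = np.abs(egp_prediction - true_system(x_test))
```

Away from the training data the GP mean falls back to zero, so the combined prediction reverts to the parametric model instead of to zero.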
We thus conjecture that GP-BayesFilters are most useful when high accuracy is needed or for difficult to model dynamical systems.
We extend work on GP latent variable models to handle cases in which no ground truth is available for GP input values. (Why is the emphasis on input values here?)
5 Learning Latent States with GPs
- temporal sequences of observations
- and control inputs
- along with partial information about the underlying state of the system
- Determine a state sequence that best matches the above inputs.
- These states are then used along with the control and observations to learn a GP-BayesFilter.
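(p.74) optimizing over both the latent space $X$ and the hyperparameters $\theta$.
The likelihood function is exactly the same as for a standard GP; the only difference is that the optimization variables now also include $X$.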
(p.74) requires a good initialization to avoid local maxima. Typically, such initializations are done via PCA or Isomap.
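As a rough illustration, the sketch below writes the standard GPLVM log-likelihood as a function of the latent inputs and initializes those inputs with PCA. It assumes a squared-exponential kernel with fixed hyperparameters and synthetic observations generated from a one-dimensional latent variable; it is not the full objective used in the thesis, and the function names are hypothetical.

```python
import numpy as np


def pca_init(Z, latent_dim):
    # Project the centered observations onto their top principal directions.
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:latent_dim].T


def gplvm_log_likelihood(X, Z, length=1.0, signal=1.0, noise=0.1):
    # Same form as the GP marginal likelihood, but X is now a free variable.
    n, d = Z.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = signal**2 * np.exp(-0.5 * sq / length**2) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    data_fit = np.sum(Z * np.linalg.solve(K, Z))  # trace(K^-1 Z Z^T)
    return -0.5 * d * n * np.log(2 * np.pi) - 0.5 * d * logdet - 0.5 * data_fit


# Synthetic observations driven by a hidden 1-D variable t (made-up data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 3.0, 30)
Z = np.column_stack([np.sin(t), np.cos(t), t]) + 0.01 * rng.normal(size=(30, 3))
Zc = Z - Z.mean(axis=0)

X0 = pca_init(Z, latent_dim=1)
ll_pca = gplvm_log_likelihood(X0, Zc)
ll_shuffled = gplvm_log_likelihood(rng.permutation(X0), Zc)
```

A gradient-based optimizer would then adjust both `X0` and the hyperparameters; starting from the PCA layout rather than a scrambled one gives it a much higher initial likelihood on this toy data.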
6 System Control with Gaussian Processes
6.1 Gaussian Process for Reinforcement Learning
6.1.3 GP Reinforcement Learning
GP-based simulator + RL component
Once the parameters of a GP are learned from training data, the GP can be used to simulate the evolution of the dynamics process.
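The prediction step of a trained GP-KF is in fact the dynamics model, so it can be used directly as a simulator to generate the simulated trajectories that RL needs during its computation.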
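A minimal sketch of using a learned GP dynamics model as a simulator: fit a GP to state transitions from a toy system, then roll it forward under a control sequence. The first-order system in `true_step`, the fixed kernel hyperparameters, and the use of the GP mean only (no sampling from the predictive variance) are assumptions of this sketch, not details from the thesis.

```python
import numpy as np


def rbf(A, B, length):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)


def fit_gp(inputs, targets, length=2.0, noise=0.01):
    # Returns a function giving the GP posterior mean at query points.
    K = rbf(inputs, inputs, length) + noise**2 * np.eye(len(inputs))
    alpha = np.linalg.solve(K, targets)
    return lambda q: rbf(np.atleast_2d(q), inputs, length) @ alpha


def true_step(x, u):
    # Hypothetical system: the state relaxes toward the control value.
    return x + 0.1 * (u - x)


# Training data: random (state, control) pairs and the resulting transitions.
rng = np.random.default_rng(0)
xs = rng.uniform(-2, 2, size=80)
us = rng.uniform(-1, 1, size=80)
gp_delta = fit_gp(np.column_stack([xs, us]), true_step(xs, us) - xs)


def rollout(x0, controls):
    # Simulate by repeatedly adding the predicted state transition.
    traj = [x0]
    for u in controls:
        x = traj[-1]
        traj.append(x + gp_delta([x, u])[0])
    return np.array(traj)


controls = [0.5] * 20
simulated = rollout(0.0, controls)

reference = [0.0]
for u in controls:
    reference.append(true_step(reference[-1], u))
reference = np.array(reference)
```

An RL component could call `rollout` with candidate control sequences and score the resulting trajectories, without touching the real system.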