John L. Crassidis and John L. Junkins, Optimal Estimation of Dynamic Systems, CRC Press, 2011.

Corrections to the book can be found here.

# Chapter 2 Probability Concepts in Least Squares

## 2.6. Properties of Maximum Likelihood Estimation

Invariance Principle

Consistent Estimator

Asymptotically Gaussian Property

Asymptotically Efficient Property

## 2.7. Bayesian Estimation

### 2.7.1. MAP Estimation

Maximum a posteriori estimation (MAP)

• if the a priori distribution $p(\bm{x})$ is uniform, then MAP estimation is equivalent to maximum likelihood (ML) estimation.
• asymptotic consistency and efficiency
• the MAP estimator converges to the ML estimator for large samples
• obeys the invariance principle
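A quick numerical illustration of the first bullet (my own sketch, not from the book): with a uniform prior the log-posterior equals the log-likelihood plus a constant, so the MAP and ML estimates coincide. The Gaussian model and all numbers below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
y = 2.0 + sigma * rng.standard_normal(1000)   # made-up Gaussian measurements

# Log-likelihood of the data over a grid of candidate estimates.
x_grid = np.linspace(0.0, 4.0, 2001)
loglike = np.array([-0.5 * np.sum((y - x) ** 2) / sigma**2 for x in x_grid])

# Uniform prior on [0, 4]: log p(x) is a constant, so the log-posterior is
# the log-likelihood shifted by a constant and the argmax does not move.
logpost = loglike + np.log(1.0 / 4.0)

x_ml = x_grid[np.argmax(loglike)]    # maximum likelihood estimate
x_map = x_grid[np.argmax(logpost)]   # MAP estimate: identical grid point
```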

### 2.7.2. Minimum Risk Estimation

assume $\bm{x}$ is distributed according to the a posteriori density $p(\bm{x}|\tilde{\bm{y}})$

assign a cost to any loss suffered due to errors in the estimate

The risk function is

$J_{\rm MR}(\bm{x}^*) = \int_{-\infty}^{\infty} c(\bm{x}^*|\bm{x}) p(\bm{x}|\tilde{\bm{y}}) d\bm{x} \tag{2.171}$
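A sketch of how Eq. (2.171) is used (my own example, not from the book): for the quadratic cost $c(\bm{x}^*|\bm{x}) = (\bm{x}^*-\bm{x})^2$, minimizing the risk over a discretized posterior recovers the posterior mean, which is the standard result. The mixture posterior below is invented for illustration.

```python
import numpy as np

# A discretized, made-up posterior p(x | y~): a skewed two-component
# mixture, so the posterior mean and the posterior mode (MAP) differ.
x = np.linspace(-5.0, 10.0, 3001)
dx = x[1] - x[0]
post = np.exp(-0.5 * x**2) + 0.5 * np.exp(-0.5 * ((x - 4.0) / 1.5) ** 2)
post /= post.sum() * dx              # normalize numerically

# Risk (2.171) for the quadratic cost c(x*|x) = (x* - x)^2, on the grid.
risk = np.array([np.sum((xs - x) ** 2 * post) * dx for xs in x])

x_mr = x[np.argmin(risk)]            # minimum-risk estimate
x_mean = np.sum(x * post) * dx       # posterior mean
x_map = x[np.argmax(post)]           # posterior mode, for comparison
```

For quadratic cost the minimum-risk estimate lands on the posterior mean (here about 1.7), not on the MAP estimate (the mode, near 0).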

[WHAT’S THE RELATION TO GP ?? LOOKS LIKE A PREDICTIVE DISTRIBUTION.]

For these reasons (see p97), minimum risk approaches are often avoided in practical estimation problems, although the relationship between decision theory and optimal estimation is very interesting.

# 3. Sequential State Estimation

## 3.2. Full-Order Estimators

Ackermann’s formula

### 3.2.1. Discrete-Time Estimators

Discrete state-space representation

\begin{aligned} \bm{x}_{k+1} &= \Phi \bm{x}_k + \Gamma \bm{u}_k\\ \bm{y}_k &= H \bm{x}_k + D \bm{u}_k \end{aligned} \tag{A.122}

\begin{aligned} \hat{\bm{x}}_{k+1}^- &= \Phi \hat{\bm{x}}_k^+ + \Gamma\bm{u}_k\\ \hat{\bm{x}}_k^+ &= \hat{\bm{x}}_k^- + K [\tilde{\bm{y}}_k-H\hat{\bm{x}}_k^-] \end{aligned} \tag{3.20}
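A minimal sketch of Eq. (3.20) as a fixed-gain observer. The plant matrices and the gain $K$ below are hypothetical (in practice $K$ would come from, e.g., Ackermann's formula); with stable estimator poles the estimation error decays.

```python
import numpy as np

# Plant (Eq. A.122) and estimator (Eq. 3.20) with a fixed, hand-picked gain.
Phi = np.array([[1.0, 0.1], [0.0, 1.0]])   # double integrator, dt = 0.1
Gam = np.array([[0.005], [0.1]])
H = np.array([[1.0, 0.0]])                  # position is measured
K = np.array([[0.5], [1.0]])                # hypothetical gain

x = np.array([[1.0], [-0.5]])               # true state
xhat = np.zeros((2, 1))                     # estimator starts at zero
for k in range(200):
    u = np.array([[0.1]])
    y = H @ x                               # noise-free measurement
    xhat = xhat + K @ (y - H @ xhat)        # update: xhat_k^+
    xhat = Phi @ xhat + Gam @ u             # propagate: xhat_{k+1}^-
    x = Phi @ x + Gam @ u

# Error dynamics e_{k+1} = Phi (I - K H) e_k are stable for this gain,
# so the error decays toward zero.
err = np.linalg.norm(x - xhat)
```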

## 3.3. The Discrete-Time Kalman Filter

The Kalman filter provides a rigorous theoretical approach to “place” the poles of the estimator, based upon stochastic processes for the measurement error and model error.

• both discrete-time dynamic models and measurements
• both continuous-time dynamic models and measurements
• continuous-time dynamic models with discrete-time measurements

### 3.3.1. Kalman Filter Derivation

[INSERT TABLE 3.1 HERE]

$\bm{x}_{k+1} = \Phi_k \bm{x}_k + \Gamma_k \bm{u}_k + \Upsilon_k \bm{w}_k \tag{3.27a}$

$\tilde{\bm{y}}_k = H_k \bm{x}_k + \bm{v}_k \tag{3.27b}$

assuming the errors are not correlated forward or backward in time, so that

$E\{ \bm{v}_k \bm{v}_j^T \} = \begin{cases} 0 & k \neq j \\ R_k & k = j \end{cases} \tag{3.28}$

and

$E\{ \bm{w}_k \bm{w}_j^T \} = \begin{cases} 0 & k \neq j \\ Q_k & k = j \end{cases} \tag{3.29}$

$\tilde{\bm{x}}_{k+1}^- = \Phi_k \tilde{\bm{x}}_k^+ - \Upsilon_k \bm{w}_k \tag{3.33}$


where $\bm{w}_k$ and $\tilde{\bm{x}}_k^+$ are uncorrelated, since $\tilde{\bm{x}}_{k+1}^-$ (not $\tilde{\bm{x}}_k^+$) directly depends on $\bm{w}_k$.

$P_{k+1}^- = \Phi_k P_k^+ \Phi_k^T + \Upsilon_k Q_k \Upsilon_k^T \tag{3.35}$

$\tilde{\bm{x}}_k^+ = (I - K_k H_k) \tilde{\bm{x}}_k^- + K_k \bm{v}_k \tag{3.37}$
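One full predict/update cycle can be sketched for a scalar random walk with made-up numbers. The gain formula $K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R_k)^{-1}$ is the standard Kalman gain, stated here as an assumption since Table 3.1 is not reproduced in these notes.

```python
import numpy as np

# One full cycle for a scalar random walk. Note the PLUS sign in the
# covariance propagation P- = Phi P+ Phi^T + Q (Eq. 3.35).
rng = np.random.default_rng(1)
Phi = np.array([[1.0]])
H = np.array([[1.0]])
Q, R = np.array([[0.01]]), np.array([[0.25]])

x = np.array([[0.0]])                        # true state
xhat, P = np.array([[0.0]]), np.array([[1.0]])
for k in range(500):
    # measurement update with K = P- H^T (H P- H^T + R)^-1
    y = H @ x + np.sqrt(R) * rng.standard_normal((1, 1))
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    xhat = xhat + K @ (y - H @ xhat)
    P = (np.eye(1) - K @ H) @ P
    # propagation
    x = Phi @ x + np.sqrt(Q) * rng.standard_normal((1, 1))
    xhat = Phi @ xhat
    P = Phi @ P @ Phi.T + Q
```

The covariance settles quickly to a steady value (about 0.055 here), which is what motivates the steady-state filter of Sec. 3.3.4.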


### 3.3.2. Stability and Joseph’s Form

Another form for $P_k^+$ that is guaranteed to remain positive definite:

Using

$P_k^+ = [I-K_kH_k]P_k^-[I-K_kH_k]^T + K_kR_kK_k^T \tag{3.39}$

$P_k^+ = [I-K_kH_k]P_k^- \tag{3.44}$
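A small numerical check of why the Joseph form (3.39) is preferred: for a deliberately suboptimal gain it still returns a symmetric positive definite covariance, while the short form (3.44), which is valid only for the optimal gain, does not. The matrices below are arbitrary.

```python
import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])   # prior covariance (made up)
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
K_bad = np.array([[1.5], [-0.4]])        # deliberately NOT the optimal gain

I = np.eye(2)
# Joseph form (3.39): valid for ANY gain, preserves symmetry and positivity.
P_joseph = (I - K_bad @ H) @ P @ (I - K_bad @ H).T + K_bad @ R @ K_bad.T
# Short form (3.44): only valid for the optimal gain.
P_short = (I - K_bad @ H) @ P
```

Here `P_joseph` stays symmetric positive definite while `P_short` is not even symmetric, which is the numerical-stability argument for Joseph's form.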

Proof: see p149–151.

### 3.3.3. Information Filter and Sequential Processing

handle large measurement vectors

The information filter may be more computationally efficient than the traditional Kalman filter when the size of the measurement vector is much larger than the size of the state vector.

Another, more commonly used approach is sequential processing:
measurements are processed one at a time, and the gain and covariance are updated after each until all measurements at the sampling instant have been processed.
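A sketch of sequential processing versus the batch update, assuming a diagonal $R_k$ (uncorrelated measurement errors) and invented numbers; the two give identical results, while the sequential path needs no matrix inversion:

```python
import numpy as np

P = np.array([[4.0, 1.0], [1.0, 2.0]])
xhat = np.array([1.0, -1.0])
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three scalar measurements
R = np.diag([0.5, 0.8, 0.3])                         # diagonal: uncorrelated
y = np.array([1.2, -0.7, 0.6])

# Batch update (one 3x3 inversion).
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
x_batch = xhat + K @ (y - H @ xhat)
P_batch = (np.eye(2) - K @ H) @ P

# Sequential update: one scalar measurement at a time, scalar divisions only.
x_seq, P_seq = xhat.copy(), P.copy()
for i in range(3):
    h = H[i]
    s = h @ P_seq @ h + R[i, i]        # scalar innovation variance
    k = P_seq @ h / s                   # gain for this single measurement
    x_seq = x_seq + k * (y[i] - h @ x_seq)
    P_seq = P_seq - np.outer(k, h @ P_seq)
```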

### 3.3.4. Steady-State Kalman Filter

For time-invariant systems the error covariance $P$ reaches a steady-state value very quickly.
Therefore, a constant gain $K$ can be pre-computed using the steady-state covariance.
The solution is suboptimal in the strictest sense but can save a lot in computations.
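The pre-computation can be sketched by iterating the covariance recursion to a fixed point and then freezing the gain (a hypothetical stable two-state system, not from the book):

```python
import numpy as np

Phi = np.array([[0.9, 0.1], [0.0, 0.9]])   # made-up stable dynamics
H = np.array([[1.0, 0.0]])
Q = np.diag([1e-4, 1e-2])
R = np.array([[0.25]])

P = np.eye(2)
for _ in range(500):
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    P = Phi @ (np.eye(2) - K @ H) @ P @ Phi.T + Q   # one full recursion

K_ss = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)      # frozen constant gain

# After convergence, one more recursion leaves P essentially unchanged.
P_next = Phi @ (np.eye(2) - K_ss @ H) @ P @ Phi.T + Q
```

Running the filter with the constant `K_ss` avoids the per-step gain computation at the cost of strict optimality during the initial transient.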

### 3.3.5. Relationship to Least Squares Estimation

The Kalman filter can be derived using a least squares type loss function

$J = \frac{1}{2}(\hat{\bm{x}}_0-\bm{x}_0)^T P_0^{-1}(\hat{\bm{x}}_0-\bm{x}_0) + \frac{1}{2} \sum_{i=1}^k (\tilde{\bm{y}}_i-H_i\hat{\bm{x}}_i)^T R_i^{-1} (\tilde{\bm{y}}_i-H_i\hat{\bm{x}}_i) \tag{3.109}$

subject to the constraint

$\hat{\bm{x}}_{i+1} = \Phi(i+1,i)\hat{\bm{x}}_i,\quad i=1,2,\dots,k-1 \tag{3.110}$

### 3.3.6. Correlated Measurement and Process Noise

Consider the correlation between $\bm{w}_{k-1}$ and $\bm{v}_k$

$E\{ \bm{w}_{k-1} \bm{v}_k^T \} = S_k \tag{3.127}$

because $\bm{w}_{k-1}$ and $\bm{v}_k$ are the only noise terms that affect the measurement at the same time instant (substitute (3.27a) into (3.27b)), as shown below,

\begin{aligned} \tilde{\bm{y}}_k &= H_k \bm{x}_k + \bm{v}_k \\ &= H_k \left( \Phi_{k-1} \bm{x}_{k-1} + \Gamma_{k-1} \bm{u}_{k-1} + \Upsilon_{k-1} \bm{w}_{k-1} \right) + \bm{v}_k. \end{aligned}

In Eq. (3.38), the following expectation is not zero:

$E\{\tilde{\bm{x}}_k^-\bm{v}_k^T\} = E\{ \left( \Phi_{k-1}\tilde{\bm{x}}_{k-1}^+ - \Upsilon_{k-1}\bm{w}_{k-1} \right) \bm{v}_k^T \} = - E\{ \Upsilon_{k-1}\bm{w}_{k-1} \bm{v}_k^T \} = - \Upsilon_{k-1} S_k \tag{3.128}$

This leads to a different covariance update

$P_{k}^+ = \dots \quad \color{red} (\text{Only use its trace to derive } K_k) \tag{3.129}$

and Kalman gain (still by minimizing the above $P_{k}^+$)

$K_k = \dots \tag{3.130}$

which could simplify Eq. (3.129) to the final covariance update formula

$P_k^+ = \dots \quad \color{red} (\text{Use this one for propagation})$

(AN APPLICATION EXAMPLE, refined by me) an aircraft flying through a field of random turbulence. The effect of turbulence on the aircraft’s acceleration is modeled as the process noise $\bm{w}_{k-1}$. Since any sensor mounted on the aircraft is also corrupted by the turbulence, the measurement error $\bm{v}_k$ is correlated with the process noise $\bm{w}_{k-1}$.

### 3.3.7. Cramer-Rao Lower Bound

see Wikipedia:
In its simplest form, the bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information.
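A Monte Carlo sanity check of the bound (my own sketch): for $N$ Gaussian samples with known $\sigma$, the Fisher information for the mean is $N/\sigma^2$, and the sample mean is unbiased and attains the bound $\sigma^2/N$.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, N, trials = 2.0, 50, 20000

# 'trials' independent experiments, each with N samples of N(3, sigma^2).
samples = 3.0 + sigma * rng.standard_normal((trials, N))
est = samples.mean(axis=1)    # sample mean: unbiased, efficient estimator

crlb = sigma**2 / N           # inverse Fisher information = 0.08
var_hat = est.var()           # Monte Carlo variance of the estimator
```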

### 3.3.8. Orthogonality Principle

the orthogonality of the estimate and its error

$E\{\hat{\bm{x}}_k^+\tilde{\bm{x}}_k^{+T}\} = 0 \tag{3.152}$

where

$\tilde{\bm{x}}_k^+ \equiv \hat{\bm{x}}_k^+ - \bm{x}_k \tag{3.32}$

This orthogonality principle is extremely important in the derivation of the linear quadratic-Gaussian controller of Sec. 8.6.

### Example 3.3: derive the process noise

For the real system (where there is always a random process term):

Continuous system:

\begin{aligned} \dot{\theta} &= \tilde{\omega} - \beta - \eta_v \\ \dot{\beta} &= \eta_u \end{aligned}

Discrete system:

Let $\Delta t = t_{k+1} - t_{k}$.

\begin{aligned} \theta_{k+1} - \theta_{k} &= \int_{t_k}^{t_{k+1}} [\tilde{\omega} - \beta - \eta_v] d\tau \\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau - \int_{t_k}^{t_{k+1}} \textcolor{red}{\beta(\tau)} d\tau - \int_{t_k}^{t_{k+1}} \eta_v d\tau\\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau - \left[ \beta_k \Delta t + \int_{t_k}^{t_{k+1}} (t_{k+1} - \tau) \eta_u(\tau)d\tau \right] - \int_{t_k}^{t_{k+1}} \eta_v d\tau \quad \text{[see Eq. eg.3.3.x1]}\\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau - \beta_k \Delta t - \int_{t_k}^{t_{k+1}} \left[ \eta_v(\tau) + (t_{k+1} - \tau)\eta_u(\tau) \right] d\tau \\ \\ \beta_{k+1} - \beta_{k} &= \int_{t_k}^{t_{k+1}} \eta_u d\tau \quad\Rightarrow\quad \textcolor{red}{\beta(\tau) = \beta_k + \int_{t_k}^{\tau} \eta_u(\nu) d\nu} \\ \end{aligned}

\begin{aligned} \int_{t_k}^{t_{k+1}} \textcolor{red}{\beta(\tau)} d\tau &= \int_{t_k}^{t_{k+1}} \left[ \textcolor{red}{\beta_k + \int_{t_k}^{\tau} \eta_u(\nu) d\nu} \right] d\tau \\ &= \int_{t_k}^{t_{k+1}} \textcolor{red}{\beta_k} d\tau + \int_{t_k}^{t_{k+1}} \textcolor{red}{\int_{t_k}^{\tau} \eta_u(\nu) d\nu} d\tau \\ &= \textcolor{red}{\beta_k} \Delta t + \textcolor{magenta}{\int_{t_k}^{t_{k+1}} \int_{t_k}^{\tau} \eta_u(\nu) d\nu d\tau }\\ &= \beta_k \Delta t + \textcolor{magenta}{\int_{t_k}^{t_{k+1}} (t_{k+1} - \tau) \eta_u(\tau) d\tau } \quad \text{[see Eq. eg.3.3.x2]}\\ \end{aligned} \tag{eg.3.3.x1}

\begin{aligned} \textcolor{magenta}{\int_{t_k}^{t_{k+1}} \left[ \int_{t_k}^{\tau} \eta_u(\nu) d\nu \right] d\tau} &= \int_{t_k}^{t_{k+1}} \left[ \left[ \tau \int_{t_k}^\tau \eta_u(\nu)d\nu \right]' - \tau \eta_u(\tau) \right] d\tau \quad \text{[see Eq. eg.3.3.x3]}\\ &= \int_{t_k}^{t_{k+1}} \left[ \tau \int_{t_k}^\tau \eta_u(\nu)d\nu \right]' d\tau - \int_{t_k}^{t_{k+1}} \tau \eta_u(\tau) d\tau \\ &= t_{k+1} \int_{t_k}^{t_{k+1}} \eta_u(\nu)d\nu - \cancel{t_k \int_{t_k}^{t_k} \eta_u(\nu)d\nu} - \int_{t_k}^{t_{k+1}} \tau \eta_u(\tau) d\tau \\ &= \textcolor{magenta}{\int_{t_k}^{t_{k+1}} (t_{k+1} - \tau) \eta_u(\tau) d\tau} \end{aligned} \tag{eg.3.3.x2}

$\left[ \tau \int_{t_k}^\tau \eta_u(\nu)d\nu \right]' = \textcolor{red}{\int_{t_k}^\tau \eta_u(\nu)d\nu} + \tau \eta_u(\tau) \quad \text{[product rule]} \tag{eg.3.3.x3}$

[Keep for records. Only after using the following series derivation, I realized I should have used integration by parts.]

Assuming $\{ \tau_1, \tau_2, \cdots, \tau_n \}$, where $\tau_1 = t_k$ and $\tau_n = t_{k+1}$, and $\nu_i=\tau_i$.

\begin{aligned} \int_{t_k}^{t_{k+1}} \int_{t_k}^{\tau} \eta_u(\nu) d\nu d\tau &= \lim_{n\rightarrow\infty} \sum_{i = 1}^{n} \sum_{j=1}^{i} \eta_u(\nu_j) \delta\nu \delta\tau \\ &= \lim_{n\rightarrow\infty} \{ \eta_u(\nu_1) \\ &\phantom{=}+ \eta_u(\nu_1) + \eta_u(\nu_2) \\ &\phantom{=}+ \eta_u(\nu_1) + \eta_u(\nu_2) + \eta_u(\nu_3) \\ &\phantom{=}+ \cdots \\ &\phantom{=}+ \eta_u(\nu_1) + \eta_u(\nu_2) + \eta_u(\nu_3) + \cdots + \eta_u(\nu_n) \} \cdot \delta\nu \delta\tau \\ &= \lim_{n\rightarrow\infty} \sum_{i=1}^{n} (n-i+1) \eta_u(\nu_i) \delta\nu \delta\tau \\ &= \lim_{n\rightarrow\infty} \sum_{i=1}^{n} (n-i+1)\delta\tau \eta_u(\nu_i)\delta\nu \\ &= \lim_{n\rightarrow\infty} \sum_{i=1}^{n} (t_{k+1}-\tau_i) \eta_u(\nu_i)\delta\nu \\ &= \int_{t_k}^{t_{k+1}} (t_{k+1}-\tau) \eta_u(\tau) d\tau \end{aligned}

For the estimation system:

Continuous system:

\begin{aligned} \dot{\hat{\theta}} &= \tilde{\omega} - \hat{\beta} \\ \dot{\hat{\beta}} &= 0 \end{aligned}

Discrete system:

\begin{aligned} \hat{\theta}_{k+1} - \hat{\theta}_k &= \int_{t_k}^{t_{k+1}} \dot{\hat{\theta}}(\tau) d\tau \\ &= \int_{t_k}^{t_{k+1}} [\tilde{\omega}(\tau) - \hat{\beta}(\tau)] d\tau \\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega}(\tau) d\tau - \hat{\beta}_k \Delta t \\ \hat{\beta}_{k+1} - \hat{\beta}_k &= 0\\ \end{aligned}

Then, we have the final discrete-time error propagation equation

\begin{aligned} \theta_{k+1} - \hat{\theta}_{k+1} &= \theta_k - \hat{\theta}_k \\ &\phantom{=} + \left\{ \cancel{\int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau} - \beta_k \Delta t - \int_{t_k}^{t_{k+1}} \left[ \eta_v(\tau) + (t_{k+1} - \tau)\eta_u(\tau) \right] d\tau \right\}\\ &\phantom{=} - \left\{ \cancel{\int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau} - \hat{\beta}_k \Delta t \right\} \\ &= \theta_k - \hat{\theta}_k - (\beta_k - \hat{\beta}_k) \Delta t + \textcolor{magenta}{\int_{t_k}^{t_{k+1}} \left[ - \eta_v(\tau) - (t_{k+1} - \tau)\eta_u(\tau) \right] d\tau} \\ &\equiv \theta_k - \hat{\theta}_k - (\beta_k - \hat{\beta}_k) \Delta t + \textcolor{magenta}{p_k} \\ \beta_{k+1} - \hat{\beta}_{k+1} &= \beta_{k} - \hat{\beta}_{k} + \textcolor{magenta}{\int_{t_k}^{t_{k+1}} \eta_u d\tau} \\ &\equiv \beta_{k} - \hat{\beta}_{k} + \textcolor{magenta}{q_k} \end{aligned}

The discrete process noise is

$\newcommand{\E}[1]{E\left\{ #1\right\}} Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \equiv \begin{bmatrix} \E{p_k^2} & \E{p_kq_k} \\ \E{q_kp_k} & \E{q_k^2} \end{bmatrix}$
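Assuming $\eta_v,\eta_u$ are white with spectral densities $\sigma_v^2,\sigma_u^2$, the standard closed-form entries for this example are $Q_{11} = \sigma_v^2\Delta t + \sigma_u^2\Delta t^3/3$, $Q_{12}=Q_{21} = -\sigma_u^2\Delta t^2/2$, and $Q_{22} = \sigma_u^2\Delta t$. A Monte Carlo check of these expectations, with made-up numbers:

```python
import numpy as np

# Monte Carlo check of Q for Example 3.3, assuming eta_v, eta_u are white
# with spectral densities sv^2 and su^2.
rng = np.random.default_rng(3)
sv, su, dt, n = 0.3, 0.1, 1.0, 100        # n substeps per sampling interval
trials = 50000
tau = (np.arange(n) + 0.5) * dt / n        # substep midpoints in [0, dt]

# White-noise increments: integral of eta over a substep ~ N(0, s^2 dt/n).
dWv = sv * np.sqrt(dt / n) * rng.standard_normal((trials, n))
dWu = su * np.sqrt(dt / n) * rng.standard_normal((trials, n))
p = -(dWv.sum(axis=1) + ((dt - tau) * dWu).sum(axis=1))   # p_k above
q = dWu.sum(axis=1)                                        # q_k above

Q_mc = np.cov(np.vstack([p, q]))
Q_exact = np.array([[sv**2 * dt + su**2 * dt**3 / 3.0, -su**2 * dt**2 / 2.0],
                    [-su**2 * dt**2 / 2.0, su**2 * dt]])
```

Note the negative cross term $Q_{12}$, which follows directly from the sign of $p_k$ in the error propagation above.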




### MY-QUESTION-1: If bias is not estimated (Not in the book, double check this!)

For the real system:

Continuous system:

\begin{aligned} \dot{\theta} &= \tilde{\omega} - \beta_\text{ML} - \eta_v \\ \end{aligned}

Discrete system:

Let $\Delta t = t_{k+1} - t_{k}$.

\begin{aligned} \theta_{k+1} - \theta_{k} &= \int_{t_k}^{t_{k+1}} [\tilde{\omega} - \beta_\text{ML} - \eta_v] d\tau \\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau - \int_{t_k}^{t_{k+1}} \beta_\text{ML}(\tau) d\tau - \int_{t_k}^{t_{k+1}} \eta_v d\tau\\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau - \beta_{\text{ML},k} \Delta t - \int_{t_k}^{t_{k+1}} \eta_v d\tau \end{aligned}

For the estimation system:

Continuous system:

$\dot{\hat{\theta}} = \tilde{\omega} - \beta_\text{ML}$

Discrete system:

\begin{aligned} \hat{\theta}_{k+1} - \hat{\theta}_k &= \int_{t_k}^{t_{k+1}} \dot{\hat{\theta}}(\tau) d\tau \\ &= \int_{t_k}^{t_{k+1}} \left[ \tilde{\omega}(\tau) - \beta_\text{ML}(\tau) \right] d\tau \\ &= \int_{t_k}^{t_{k+1}} \tilde{\omega}(\tau) d\tau - \beta_{\text{ML},k} \Delta t \end{aligned}

Then, we have the final discrete-time error propagation equation

\begin{aligned} \theta_{k+1} - \hat{\theta}_{k+1} &= \theta_k - \hat{\theta}_k \\ &\phantom{=} + \left\{ \cancel{\int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau} - \bcancel{\beta_{\text{ML},k} \Delta t} - \int_{t_k}^{t_{k+1}} \eta_v(\tau) d\tau \right\}\\ &\phantom{=} - \left\{ \cancel{\int_{t_k}^{t_{k+1}} \tilde{\omega} d\tau} - \bcancel{\beta_{\text{ML},k} \Delta t} \right\} \\ &= \theta_k - \hat{\theta}_k + \textcolor{magenta}{\int_{t_k}^{t_{k+1}} - \eta_v(\tau) d\tau} \\ &\equiv \theta_k - \hat{\theta}_k + \textcolor{magenta}{p_k} \end{aligned}


### MY-QUESTION-2: Other process noise?

The above derivations only considered the process noise due to

1. the difference between the real system model and the estimation system model;
2. the time discretization.

So it does not account for process noise due to other discrepancies with reality (beyond the real system model). If so, do we need an additional term to compensate for those additional uncertainties?
(Is my speculation correct?) Maybe not, because the real system model is itself only a model of reality.

Can I simply increase $\sigma_v$ and $\sigma_u$ to reflect other model errors? Since the real system model is an approximation of reality, the last term $\bm{\eta}_v$ should include or absorb all the remaining differences from reality; similarly for $\sigma_u$.
(Is this speculation correct?)

## 3.5. The Continuous-Discrete Kalman Filter

\begin{aligned} \dot{\bm{x}}(t) &= F(t)\bm{x}(t) + B(t)\bm{u}(t) + G(t)\bm{w}(t)\\ \tilde{\bm{y}}_k &= H_k\bm{x}_k + \bm{v}_k \end{aligned}

[INSERT TABLE 3.7]

The continuous-time propagation model equation does not involve the measurement directly. The covariance propagation follows a continuous-time Lyapunov differential equation.

the sample times of the measurements need not occur at regular intervals. In fact, different measurement sets can be spread out over various time intervals.

The real beauty of the continuous-discrete Kalman filter is that it can handle different scattered measurement sets quite easily.
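The propagation step between measurements can be sketched as integrating $\dot{\bm{x}} = F\bm{x}$ together with the Lyapunov differential equation $\dot{P} = FP + PF^T + GQG^T$ (RK4 below; matrices are invented and the input term is omitted):

```python
import numpy as np

F = np.array([[0.0, 1.0], [0.0, 0.0]])   # made-up double-integrator dynamics
G = np.array([[0.0], [1.0]])
Qc = np.array([[0.05]])                   # continuous-time spectral density

def deriv(x, P):
    # xdot = F x, Pdot = F P + P F^T + G Q G^T (Lyapunov differential eq.)
    return F @ x, F @ P + P @ F.T + G @ Qc @ G.T

def rk4_step(x, P, h):
    k1x, k1P = deriv(x, P)
    k2x, k2P = deriv(x + 0.5 * h * k1x, P + 0.5 * h * k1P)
    k3x, k3P = deriv(x + 0.5 * h * k2x, P + 0.5 * h * k2P)
    k4x, k4P = deriv(x + h * k3x, P + h * k3P)
    return (x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            P + h / 6 * (k1P + 2 * k2P + 2 * k3P + k4P))

x, P = np.array([1.0, 0.0]), np.eye(2)
for _ in range(100):                      # propagate over 1 s in 0.01 s steps
    x, P = rk4_step(x, P, 0.01)
```

When a measurement arrives, the usual discrete update is applied to the propagated `x` and `P`; since the integration steps are free-running, the measurement times need not be equally spaced.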

QUESTIONS:

• During the propagation, how is $G(t)Q(t)G(t)^T$ handled?
• This is just a normal term.
• $G(t)\bm{w}(t)$ shall be handled as a random term.
• How is it handled in Orekit?

## 3.6. Extended Kalman Filter

The problem with this nonlinear model is that a Gaussian input does not necessarily produce a Gaussian output (unlike the linear case).

The fundamental concept of this filter involves the notion that the true state is sufficiently close to the estimated state. Therefore, the error dynamics can be represented fairly accurately by a linearized first-order Taylor series expansion.
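A minimal sketch of that linearized update for a scalar state with a made-up nonlinear measurement $y = x^2 + v$ (my own example, not from the book): linearize $h$ about the current estimate and reuse the linear update equations.

```python
import numpy as np

def ekf_update(xhat, P, y, R):
    # First-order EKF measurement update for h(x) = x^2.
    h = xhat**2                  # predicted measurement h(xhat)
    H = 2.0 * xhat               # Jacobian dh/dx evaluated at xhat
    S = H * P * H + R            # innovation variance
    K = P * H / S                # gain from the linearized model
    xhat_new = xhat + K * (y - h)
    P_new = (1.0 - K * H) * P
    return xhat_new, P_new

xhat, P = 1.8, 0.5               # prior estimate near the truth x = 2
y = 2.0**2                       # noise-free measurement for the sketch
xhat, P = ekf_update(xhat, P, y, R=0.01)
```

Because the prior estimate is close to the truth, one linearized step already moves the estimate near $x = 2$; a poor prior would degrade the linearization.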

[INSERT TABLE 3.8]

A steady-state gain cannot be found, because $F(t)$ and $H(t)$ in Table 3.8 will not be constant in general.

Another approach involves linearizing about the nominal (a priori) state vector $\bar{\bm{x}}(t)$ instead of the current estimate $\hat{\bm{x}}(t)$.

This gives the linearized Kalman filter. In general, the linearized Kalman filter is less accurate than the extended Kalman filter since $\bar{\bm{x}}(t)$ is usually not as close to the truth as is $\hat{\bm{x}}(t)$. However, since the nominal state is known a priori the gain $K(t)$ can be pre-computed and stored, which reduces the on-line computational burden.

[INSERT TABLE 3.9]

Proving convergence in the extended Kalman filter is difficult (if not impossible!) even for simple systems where the initial condition is not well known.
But it is often robust to initial condition errors, which can often be verified through simulation.

iterated extended Kalman filter:
local iteration at step $k$, repeated until the estimate no longer improves.
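A sketch of the local iteration on a made-up scalar measurement model $y = x^2 + v$: relinearize about the latest iterate until it stops changing. The standard iterated-update formula $\bm{x}_{i+1} = \hat{\bm{x}}^- + K_i\left[\tilde{y} - h(\bm{x}_i) - H_i(\hat{\bm{x}}^- - \bm{x}_i)\right]$ is assumed here.

```python
import numpy as np

def iekf_update(xhat, P, y, R, iters=15):
    # Iterated EKF measurement update for h(x) = x^2, scalar state.
    xi = xhat
    for _ in range(iters):
        H = 2.0 * xi                 # Jacobian at the CURRENT iterate
        S = H * P * H + R
        K = P * H / S
        xi_new = xhat + K * (y - xi**2 - H * (xhat - xi))
        if abs(xi_new - xi) < 1e-12:  # stop when no longer improving
            break
        xi = xi_new
    P_new = (1.0 - K * H) * P         # covariance from the final iterate
    return xi, P_new

x_it, P_it = iekf_update(xhat=1.5, P=0.5, y=4.0, R=1e-4)
```

A single EKF step from the same prior lands near 2.08; the iteration pulls the estimate much closer to the truth $x = 2$.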

nonlinear smoother back to time $t_{k-1}$ (see Sec. 5.1.3):
The reference trajectory over $[t_{k-1},t_k)$ can also be improved once the measurement $\tilde{\bm{y}}_k$ is given.

## 3.9. Summary

All important formulas are summarized here.

Smoothing ??

# 6. Parameter Estimation: Applications (to read now)

## 6.4. Orbit Determination

Gaussian Least Squares Differential Correction (GLSDC)

$?? \tag{6.53}$

$?? \tag{6.54}$

$F \equiv \frac{\partial\bm{f}}{\partial\bm{x}} = \begin{bmatrix}0_{3\times3}&I_{3\times3}\\F_{21}&0_{3\times3}\end{bmatrix} \tag{6.57}$

$\frac{\partial \bm{h}}{\partial\bm{x}} = \begin{bmatrix}H_{11}&0_{3\times3}\end{bmatrix}$

# 7. Estimation of Dynamic Systems: Applications

## 7.1. Attitude Estimation

$\dot{\mathbf{q}} = \frac{1}{2} \Omega(\bm{\omega}) \mathbf{q} \tag{7.1}$

### 7.1.2. Discrete-Time Attitude Estimation

The following approach, called the “power series approach,” is used to derive a discrete approximation to the continuous propagation. Here, the example in the book is the discrete quaternion propagation of the kinematic equation

$\dot{\mathbf{q}} = \frac{1}{2} \Omega(\bm{\omega}) \mathbf{q} \tag{7.1}$
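For constant $\bm{\omega}$ over a step, the power series truncates to a closed form because $\Omega(\bm{\omega})^2 = -\|\bm{\omega}\|^2 I$: $\exp\!\left(\tfrac{\Delta t}{2}\Omega\right) = \cos\!\left(\tfrac{\|\bm{\omega}\|\Delta t}{2}\right) I + \frac{\sin(\|\bm{\omega}\|\Delta t/2)}{\|\bm{\omega}\|}\,\Omega$. A sketch, assuming the scalar-last quaternion convention $\mathbf{q} = [q_1\ q_2\ q_3\ q_4]^T$ and $\Omega(\bm{\omega}) = \begin{bmatrix} -[\bm{\omega}\times] & \bm{\omega} \\ -\bm{\omega}^T & 0 \end{bmatrix}$:

```python
import numpy as np

def omega_matrix(w):
    # 4x4 skew-symmetric Omega(w); satisfies Omega @ Omega == -|w|^2 I.
    wx, wy, wz = w
    return np.array([[0.0,  wz, -wy, wx],
                     [-wz, 0.0,  wx, wy],
                     [ wy, -wx, 0.0, wz],
                     [-wx, -wy, -wz, 0.0]])

def propagate(q, w, dt):
    # Closed-form transition matrix for constant w over the step.
    nw = np.linalg.norm(w)
    Om = omega_matrix(w)
    Phi = np.cos(0.5 * nw * dt) * np.eye(4) + np.sin(0.5 * nw * dt) / nw * Om
    return Phi @ q

q = np.array([0.0, 0.0, 0.0, 1.0])        # identity attitude, scalar last
w = np.array([0.1, -0.2, 0.05])           # rad/s, constant over each step
for _ in range(100):
    q = propagate(q, w, dt=0.1)
```

Because $\Omega$ is skew-symmetric, the transition matrix is orthogonal, so the propagation preserves the quaternion norm exactly (to roundoff).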


