Unscented Kalman Filter#
New in version 0.4.0.
The Unscented Kaman Filter (UKF) was first proposed by S. Julier and J. Uhlmann [JU97] as an alternative to the Kalman Fiter for nonlinear systems.
The UKF approximates the mean and covariance of the state distribution using a set of discretely sampled points, called the Sigma Points, obtained through a deterministic sampling technique called the Unscented Transform.
Contrary to the EKF, the UKF does not linearize the models, but uses each of the sigma points as an input to the state transition and measurement functions to get a new set of transformed state points, thus avoiding the need for Jacobians, and yielding an accuracy similar to the KF for linear systems.
The UKF offers significant advantages over the EKF in terms of handling nonlinearities, achieving higher-order accuracy, robustness to initial estimates, and consistent performance.
However, the UKF has disadvantages related to computational complexity, memory requirements, and parameter tuning. These factors can make the UKF less suitable for certain applications, particularly those with limited computational resources.
The implementation in this module is based on the UKF algorithm for nonlinear estimations proposed by Wan and van de Merwe [WvdM00], and further developed by Kraft [Kra03] and Klingbeil [Kli06] for orientation estimation using quaternions.
Kalman Filter
We have a discrete system, whose states are described by a vector \(\mathbf{x}_t\) at each time \(t\).
This vector has \(n\) items, which quantify the position, velocity, orientation, etc. Basically, anything that can be measured or estimated can be a state, as long as it can be described numerically.
Knowing how the state was at time \(t-1\), we want to predict how the state is at time \(t\). In addition, we also have a set of measurements \(\mathbf{z}_t\), that can be used to improve the prediction of the state.
The traditional Kalman filter, as described by [Kal60] computes a state in two steps:
The prediction step computes a guess of the current state, \(\hat{\mathbf{x}}_t\), and its covariance \(\hat{\mathbf{P}}_t\), at time \(t\), given the previous state and covariance at time \(t-1\).
The correction step improves this prediction using a measurement (or set of measurements) \(\mathbf{z}_t\) at time \(t\).
The Kalman filter, however, is limited to linear systems, rendering the process above inapplicable to nonlinear systems like our attitude estimation problem.
Extended Kalman Filter
A common solution to the nonlinearity issue is the Extended Kalman Filter (EKF), which linearizes the system model and measurement functions to __approximate__ the terms, allowing the use of the Kalman filter as if it were a linear system.
In this approach the predicted mean and covariance are computed using the linearized models:
where \(\mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_t)\) is the nonlinear dynamic model function, whose Jacobian is:
whereas the measurement model [1] is linearized as:
where \(\mathbf{h}(\mathbf{x}_t)\) is the nonlinear measurement model function, whose Jacobian is:
Unfortunately, these approximations can introduce large errors in the posterior mean and covariance of the transformed random variable, which may lead to sub-optimal performance.
To avoid these issues, a solution using unscented transforms was proposed by Julier and Uhlmann [JU97], which is the basis for the Unscented Kalman Filter (UKF).
Unscented Kalman Filter#
Unscented Transform
The UKF is a type of Kalman filter that replaces the linearization with a deterministic sampling technique called the Unscented Transform.
This transformation generates a set of points that capture the mean and covariance of the state distribution, called the Sigma Points.
Each of the sigma points is used as an input to the state transition and measurement functions to get a new set of transformed state points.
The unscented transformation … is founded on the intuition that it is easier to approximate a Gaussian distribution than it is to approximate an arbitrary nonlinear function or transformation.
—Jeffrey K. Uhlmann
Imagine there is a set of random points \(\mathbf{x}\) with mean \(\bar{\mathbf{x}}\), and covariance \(\mathbf{P_{xx}}\), and there is another set of random points \(\mathbf{y}\) related to \(\mathbf{x}\) by a nonlinear function \(\mathbf{y} = f(\mathbf{x})\).
Our goal is to find the mean \(\bar{\mathbf{y}}\) and covariance \(\mathbf{P_{yy}}\) of \(\mathbf{y}\). The unscented transform approximates them by sampling a set of points from \(\mathbf{x}\) and applying the nonlinear function \(f\) to each of the sampled points.
Information about the distribution can be captured using a small number of points [JU97]. The samples are not drawn at random but according to a deterministic method.
Sigma Points
The \(n\)-dimensional random variable \(\mathbf{x}\) with mean \(\bar{\mathbf{x}}\) and covariance \(\mathbf{P_{xx}}\) is approximated by \(2n + 1\) points computed with:
where \((\sqrt{(n + \lambda)\mathbf{P_{xx}}})_i\) is the \(i\)-th column of the matrix square root, and \(\lambda=\alpha^2(n + \kappa) - n\) is a scaling parameter.
\(\alpha\) determines the spread of the sigma points around the mean, usually set to \(0.001\), and \(\kappa\) is a secondary scaling parameter, usually set to \(0\) [WvdM00].
But, how do we obtain the matrix form of the square root of \((n + \lambda)\mathbf{P_{xx}}\)?
Because \(\mathbf{P_{xx}}\) is a covariance matrix, it means it is symmetric and positive-definite.
The Cholesky decomposition of a positive-definite matrix \(\mathbf{A}\) is a lower triangular matrix \(\mathbf{L}\) such that \(\mathbf{A} = \mathbf{LL}^T\).
The square root of \((n + \lambda)\mathbf{P_{xx}}\) is then \(\mathbf{L}=\mathrm{chol}((n + \lambda)\mathbf{P_{xx}})\).
The Cholesky decomposition is preferred because:
It efficiently computes (roughly \(\frac{n^3}{3}\) operations for an \(n\times n\) matrix) the lower triangular matrix \(\mathbf{L}\).
It’s numerically stable.
It naturally handles the positive-definiteness requirement of covariance matrices.
Therefore, we first calculate the Cholesky decomposition of \((n + \lambda)\mathbf{P_{xx}}\), and then we obtain the Sigma Points by adding and subtracting the columns of \(\mathbf{L}\) to the mean.
where \(\mathbf{L}\) is the Cholesky decomposition of \((n + \lambda)\mathbf{P_{xx}}\), as mentioned above.
We pass these sigma points through the nonlinear function \(f\) to get the transformed points \(\mathcal{Y}\).
Their mean is given by their wieghted sum:
And their covariance by their weighted outer product:
with \(\mathbf{Q}\) being the \(n\times n\) process noise covariance matrix.
The weights \(\mathbf{w}\) are computed as:
The weights \(\mathbf{w}^{(m)}\) are used to compute the mean, and the weights \(\mathbf{w}^{(c)}\) are used to compute the covariance.
The constant \(\beta\) is used to incorporate prior knowledge about the distribution of the random variable, and is usually set to \(2\) for Gaussian distributions [WvdM00].
UKF Summary
Given the previous state \(\mathbf{x}_{t-1}\), its covariance matrix \(\mathbf{P}_{t-1}\), a control vector \(\mathbf{u}_t\), and a vector with the most recent \(m\) measurements \(\mathbf{z}_t\), the UKF algorithm can be summarized as follows:
Prediction:
Compute Sigma Points.
Propagate Sigma Points through Process model to get Transformed Sigma Points.
Predict State Mean and Covariance.
Correction:
Instantiate each Transformed Sigma Point through Measurement model.
Compute Predicted Measurement Mean and Covariance.
Compute Cross-Covariance.
Compute Kalman gain.
Compute Innovation (residual.)
Update (Correct) State and Covariance
UKF for Attitude Estimation#
In this implementation, we build the simplest UKF for attitude estimation using gyroscopes and accelerometers (and magnetometer, if available), so that we can focus on the details of the algorithm. Once the basic structure is understood, we could extend the model to create more complex systems.
PREDICTION MODEL
We start by defining the vectors of the Prediction Model, a.k.a. Process Model:
The state vector \(\mathbf{x}_t\in\mathbb{R}^4\) has the elements of a quaternion representing the orientation at any time \(t\).
We have added the control vector \(\mathbf{u}_t\in\mathbb{R}^3\) containing the angular velocity readings from a tri-axial gyroscope. This control vector is used to propagate the state vector through the process model. Normally, it is ignored when the gyroscope is not available or when we assume the model is not affected by external forces, but this is not the case in our implementation.
The quaternion describing the orientation is also known as a versor, and it is a unit quaternion, meaning \(\|\mathbf{q}\|=1\). Thus, we normalize the quaternion after each transformation.
Note
Notice we don’t extend the state vector to include the gyroscope biases like other models do. For the sake of simplicity we don’t estimate these biases, and assume the sensor readings are already calibrated.
Sigma Points
Given the previous state and covariance, the sigma points are computed first.
Using the cholesky decomposition we obtain the matrix square root:
where the previous covariance is set as \(\mathbf{P_{xx}}=\mathbf{P}_{t-1}\), \(n=4\) is the number of items in the state vector \(\mathbf{x}\), and \(\lambda=\alpha^2(n + \kappa) - n\) is the scaling parameter.
Hint
Using the default values \(\alpha=0.001\), and \(\kappa=0\), we get:
which yields \(\mathbf{L} = \mathrm{chol}\big(\sqrt{0.000004\mathbf{P}_{t-1}}\big)\).
Then, we compute the sigma points using the equations:
The first sigma point \(\mathcal{X}_0\) is always equal to the previous state \(\mathbf{x}_{t-1}\). The rest are obtained by adding and subtracting the columns of \(\mathbf{L}\) to the mean.
Because the state vector has 4 items, we obtain a set of 9 sigma points:
The estimation process is done as a two-step filter consisting of an attitude propagation (using the gyroscope) and a correction (using the accelerometer.)
Attitude Propagation
Based on the time spent between \(t-1\) and \(t\) (known as the time step \(\Delta t\)) we can compute the angular displacement \(\boldsymbol\Delta_\theta\) using the measured angular velocities \(\boldsymbol\omega\) from the gyroscopes, and add it to the previous attitude \(\mathbf{x}_{t-1}\) to obtain the predicted attitude \(\hat{\mathbf{x}}_t\):
This is called attitude propagation. However, this is a nonlinear operation and we cannot use it in the Kalman Filter. Therefore, we approximate it to define our required linear Process Model:
where the rotation operator \(\big[\mathbf{I}_4 + \frac{\Delta t}{2} \boldsymbol\Omega_t(\boldsymbol\omega_t)\big]\) is a truncation up to the second term of the Taylor series expansion of \(\int_{t-1}^t\boldsymbol\omega\, dt\).
Note
For more details about this linear operation, please refer to the documentation of the Attitude from Angular Rate.
We assume in this description, that the time step \(\Delta t\) is constant. However, it can be changed at any time during the implementation.
We propagate each of the sigma points through the process model \(f\) to get a new set of transformed state points \(\mathcal{Y}\):
Every \(\mathcal{Y}_i\) describes a quaternion. They must be normalized after the transformation, so that \(\forall i \in \{0, \ldots, 2n\} \;, \|\mathcal{Y}_i\|=1\).
Now we compute the Predicted State Mean.
This predicted state represents the mean of the transformed state points as a quaternion. Therefore, we must normalize it:
We proceed to compute the Predicted State Covariance.
Notice the product \((\mathcal{Y}_i - \bar{\mathbf{y}})(\mathcal{Y}_i - \bar{\mathbf{y}})^T\) is the outer product of the vector \((\mathcal{Y}_i - \bar{\mathbf{y}})\), which results in a \(4\times 4\) matrix.
MEASUREMENT MODEL
Our strategy is to use Earth’s known physical reference values, rotate them around the set of predicted orientations \(\mathcal{Y}\), and compare these against actual sensor readings \(\mathbf{z}\).
Their difference tells us how “off” the predicted orientation is. To ease the comparison both the references and the sensor readings are normalized.
The two main physical references are commonly used in attitude estimation, and we will use them in our Measurement Model:
Earth’s gravitational vector \(\mathbf{g}\).
Earth’s geomagnetic vector \(\mathbf{r}\).
In order to perform the rotations, we use the direction cosine matrix, a.k.a. rotation matrix, built from each predicted orientation (transformed points) to rotate the reference vectors to the sensor frame.
For Earth’s gravitational vector, we set the normalized reference on local tangent plane coordinates (LTP) as:
We assume Earth’s gravitational vector is aligned with the Z-axis of the LTP and, therefore, the X- and Y-axes are equal to zero.
Earth’s Magnetic Field is not aligned with a particular axis of Earth’s ellipsoid, and it is fully described with:
where \(r_x\), \(r_y\), and \(r_z\) are the components of the reference geomagnetic vector, which can be obtained from the World Magnetic Model (WMM) based on the sensor’s geographical location.
This vector represents the direction and intensity of the Earth’s magnetic field at the sensor’s location.
The reference geomagnetic vector \(\mathbf{r}\) is also normalized to unit length, so that we can use it as a direction vector too:
Sensor Readings
If only a tri-axial accelerometer is available to correct (update) the predicted attitude, the measurement vector \(\mathbf{z}_t\) has normalized accelerometer readings only.
However, if both tri-axial accelerometer and magnetometer are available, the measurement vector \(\mathbf{z}_t\) includes both:
Given one predicted orientation \(\mathcal{Y}_i\) we rotate a reference vector \(\mathbf{g}\) or \(\mathbf{r}\) to get its corresponding expected sensor reading. We call this the Measurement Model Function.
If only the accelerometer is available, the measurement model function yields:
If both accelerometer and magnetometer are available, it is stacked as:
Note
Notice we use the transpose of the rotation matrix to rotate the gravity vector from the global frame to the sensor frame (the opposite of what it describes.) We do this, so that we can compare it against the accelerometer readings in sensor frame.
We execute the measurement model function over each of the predicted sigma points \(\mathcal{Y}\) to get the expected accelerometer readings \(\mathcal{Z}\):
With this set of expected accelerometer readings \(\mathcal{Z}\) we can compute the Measurement Mean:
And the Measurement Covariance:
The Measurement Noise Covariance Matrix \(\mathbf{R}\) is built based on whether we have accelerometer readings only, or both accelerometer and magnetometer readings.
where \(\sigma_{a_x}\), \(\sigma_{a_y}\), and \(\sigma_{a_z}\) are the standard deviations of the accelerometer readings, and \(\sigma_{m_x}\), \(\sigma_{m_y}\), and \(\sigma_{m_z}\) are the standard deviations of the magnetometer readings.
The Cross-Covariance Matrix \(\mathbf{P_{yz}}\) represents how changes in the state variables correlate with changes in the measurement variables. Specifically, it quantifies how errors in the predicted states are related to errors in the expected measurements.
With these matrices we compute the Kalman Gain.
Notice this is a much simpler operation than the Extended Kalman Filter (EKF) where we need to compute the Jacobian matrix.
We compare the expected measurement mean \(\bar{\mathbf{z}}\) against the actual measurement reading \(\mathbf{z}\) to get the innovation at the curent time step \(t\):
This tells us, basically, how “far off” the predicted state is from the actual measurements.
Finally, we use all this information to correct the state and covariance:
Footnotes#
See also
EKF - Extended Kalman Filter for orientation estimation.
AngularRate - Attitude propagation using angular rate.
- class ahrs.filters.ukf.UKF(gyr: ndarray = None, acc: ndarray = None, mag: ndarray = None, frequency: float = 100.0, alpha: float = 0.001, beta: float = 2, kappa: float = 0, **kwargs)#
Unscented Kalman Filter to estimate orientation as Quaternion.
Examples
>>> import numpy as np >>> from ahrs.filters import UKF >>> from ahrs.common.orientation import ecompass >>> ukf = UKF() >>> num_samples = 1000 # Assuming sensors have 1000 samples each >>> Q = np.zeros((num_samples, 4)) # Allocate array for quaternions >>> Q[0] = ecompass(acc_data[0], mag_data[0]) # First sample of tri-axial accelerometer and magnetometer >>> for t in range(1, num_samples): ... Q[t] = ukf.update(Q[t-1], gyr_data[t], acc_data[t], mag_data[t])
The estimation can be simplified by giving all sensor values at the construction of the UKF object.
>>> ukf = UKF(gyr=gyr_data, acc=acc_data, mag=mag_data) >>> ukf.Q.shape (1000, 4)
This will perform all steps above and store the estimated orientations, as quaternions, in the attribute
Q.The most common sampling frequency is 100 Hz, which is used in the filter. If the sampling frequency is different in the given sensor data, it can be changed too.
>>> ukf = UKF(gyr=gyr_data, acc=acc_data, frequency=200.0) # Sampling frequency is 200 Hz
The initial quaternion is estimated with the first observations of the tri-axial accelerometers and, if available, magnetometers, but it can also be given directly in the parameter
q0.>>> ukf = UKF(gyr=gyr_data, acc=acc_data, mag=mag_data, q0=[0.7071, 0.0, -0.7071, 0.0])
- Parameters:
gyr (numpy.ndarray, default: None) – N-by-3 array with measurements of angular velocity in rad/s
acc (numpy.ndarray, default: None) – N-by-3 array with measurements of acceleration in in m/s^2
mag (numpy.ndarray, default: None) – N-by-3 array with measurements of magnetic field in mT. If not given, the filter will estimate orientation using only the accelerometer data.
frequency (float, default: 100.0) – Sampling frequency in Herz.
q0 (numpy.ndarray, default: None) – Initial orientation, as a versor (normalized quaternion).
Dt (float, default: 0.01) – Sampling step in seconds. Inverse of sampling frequency. NOT required if
frequencyvalue is given.alpha (float, default: 1e-3) – Parameter controlling spread of Sigma points.
beta (float, default: 2) – Parameter controlling distribution of Sigma points.
kappa (float, default: 0) – Secondary scaling parameter. Usually set to 0.
- Omega(x: ndarray) ndarray#
Omega operator.
Given a vector \(\mathbf{x}\in\mathbb{R}^3\), return a \(4\times 4\) matrix of the form:
\[\begin{split}\boldsymbol\Omega(\mathbf{x}) = \begin{bmatrix} 0 & -\mathbf{x}^T \\ \mathbf{x} & -\lfloor\mathbf{x}\rfloor_\times \end{bmatrix} = \begin{bmatrix} 0 & -x_1 & -x_2 & -x_3 \\ x_1 & 0 & x_3 & -x_2 \\ x_2 & -x_3 & 0 & x_1 \\ x_3 & x_2 & -x_1 & 0 \end{bmatrix}\end{split}\]- Parameters:
x (numpy.ndarray) – Three-dimensional vector.
- Returns:
Omega – Omega matrix.
- Return type:
numpy.ndarray
- compute_sigma_points(state: ndarray, state_covariance: ndarray) ndarray#
Sigma Points computation.
Given a state \(\mathbf{x}\) and its covariance \(\mathbf{P_{xx}}\), compute the sigma points \(\mathcal{X}\) as:
\[\begin{split}\begin{array}{rcl} \mathcal{X}_0 &=& \mathbf{x}_{t-1} \\ \mathcal{X}_i &=& \mathbf{x}_{t-1} + \mathbf{L}_i \\ \mathcal{X}_{i+n} &=& \mathbf{x}_{t-1} - \mathbf{L}_i \end{array}\end{split}\]where the Square Root covariance matrix \(\mathbf{L}\) is obtained using the Cholesky decomposition:
\[\mathbf{L} = \mathrm{chol}\Big(\sqrt{(n + \lambda)\mathbf{P_{xx}}}\Big)\]- Returns:
sigma_points – Sigma points array of shape (2n+1, n), where n is the state dimension.
- Return type:
numpy.ndarray
- set_weights() tuple#
Set weights for mean and covariance computation
The weights \(\mathbf{w}=\begin{bmatrix}\mathbf{w}^{(m)}& \mathbf{w}^{(c)}\end{bmatrix}\) are computed as:
\[\begin{split}\begin{array}{rcl} w_0^{(m)} &=& \frac{\lambda}{n + \lambda} \\ w_0^{(c)} &=& \frac{\lambda}{n + \lambda} + (1 - \alpha^2 + \beta) \\ w_i^{(m)} = w_i^{(c)} &=& \frac{1}{2(n + \lambda)} \quad \text{for} \quad i=1,2,\ldots,2n \end{array}\end{split}\]The weights \(\mathbf{w}^{(m)}\) are used to compute the mean, and the weights \(\mathbf{w}^{(c)}\) are used to compute the covariance.
The constant \(\beta\) is used to incorporate prior knowledge about the distribution of the random variable, and is usually set to \(2\).
The scaling parameter \(\alpha\) determines the spread of the sigma points around the mean. A small value of \(\alpha\) results in sigma points that are close to the mean, while a larger value results in sigma points that are more spread out. The value of \(\alpha\) is usually set to a small positive number, typically in the range of \(10^{-3}\) to \(10^{-1}\).
- Returns:
Weights for mean and covariance.
weight_mean: Weights for mean.weight_covariance: Weights for covariance.
- Return type:
tuple
- update(q: ndarray, gyr: ndarray, acc: ndarray, mag: ndarray = None, dt: float = None) ndarray#
Perform an update of the state.
- Parameters:
q (numpy.ndarray) – A-priori state describing orientation as quaternion.
gyr (numpy.ndarray) – Sample of tri-axial Gyroscope in rad/s.
acc (numpy.ndarray) – Sample of tri-axial Accelerometer in m/s^2.
dt (float, default: None) – Time step, in seconds, between consecutive Quaternions.
- Returns:
updated_quaternion – Estimated state describing orientation as quaternion.
- Return type:
numpy.ndarray