Visual Inertial Odometry (VIO) is a method for estimating a device's pose by fusing data from a camera and an inertial measurement unit (IMU). By combining visual features from the camera with IMU data, VIO provides accurate motion tracking and position estimation, even in environments with poor GPS signals.
This repository implements a nonlinear Kalman Filter, specifically the Extended Kalman Filter (EKF), to estimate the pose of a quadcopter drone.
The Kalman Filter assumes that the process and observation models are linear, but this assumption rarely holds in the real world. The Extended Kalman Filter handles nonlinear functions by linearizing them at each timestep using a first-order Taylor expansion.
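Concretely, the linearization replaces each model with its first-order Taylor expansion about the current estimate. For the process model f this reads (standard EKF notation):

```latex
f(x_t, u_t) \approx f(\mu_{t-1}, u_t) + F_t\,(x_t - \mu_{t-1}),
\qquad
F_t = \left.\frac{\partial f}{\partial x}\right|_{x = \mu_{t-1}}
```

The Jacobian F_t then plays the role of the linear transition matrix in the covariance propagation further below.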
Details and an implementation of the linear Kalman Filter can be found here.
We use the IMU readings in the process model and then update the estimates with the pose estimated from the camera.
To estimate the pose of the quadcopter drone, we will apply the Perspective-n-Point (PnP) problem-solving approach. This technique involves determining the position and orientation of the drone by relating known 3D points in the environment to their corresponding 2D projections in the camera's image plane. Specifically, we utilize AprilTag markers placed in the environment as 3D reference points. These AprilTag corners provide fixed, recognizable locations in the world, and their projections onto the 2D image plane, captured by the onboard camera, allow us to compute the drone's pose relative to the environment.
By solving the PnP problem, we can efficiently estimate the drone's position and orientation using visual data, which can then be combined with other sensor measurements (e.g., from an IMU) to improve the overall accuracy of the pose estimation.
The camera calibration matrix and distortion coefficients are provided as follows:
Camera Matrix:
[ 314.1779     0        199.4848
     0      314.2218    113.7838
     0         0            1    ]
Distortion Coefficients:
[ -0.438607  0.248625  0.00072  -0.000476  -0.0911 ]
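As a minimal sketch (assuming OpenCV and NumPy; the AprilTag corner arrays tag_corners_world and tag_corners_image are hypothetical placeholders), the calibration above plugs directly into cv2.solvePnP:

```python
import numpy as np
import cv2

# Camera intrinsics and distortion coefficients as given above
K = np.array([[314.1779, 0.0,      199.4848],
              [0.0,      314.2218, 113.7838],
              [0.0,      0.0,      1.0]])
dist_coeffs = np.array([-0.438607, 0.248625, 0.00072, -0.000476, -0.0911])

def estimate_camera_pose(tag_corners_world, tag_corners_image):
    """Solve the PnP problem: known 3D AprilTag corners -> camera pose.

    tag_corners_world: (N, 3) array of corner positions in the world frame.
    tag_corners_image: (N, 2) array of the corresponding pixel coordinates.
    """
    ok, rvec, tvec = cv2.solvePnP(tag_corners_world.astype(np.float64),
                                  tag_corners_image.astype(np.float64),
                                  K, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP failed")
    # rvec/tvec express world points in the camera frame
    return rvec, tvec
```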
The camera model establishes the relationship between the homogeneous world coordinates and the homogeneous image coordinates. It is represented as:
[x]   [ fx  γ  u0 ]   [ r11 r12 r13 t1 ]   [ X ]
[y] = [ 0   fy v0 ] * [ r21 r22 r23 t2 ] * [ Y ]
[z]   [ 0   0   1 ]   [ r31 r32 r33 t3 ]   [ Z ]
                                           [ 1 ]
Where:
fx and fy are the focal lengths in the x and y directions, respectively.
u0 and v0 are the coordinates of the principal point.
γ is the skew coefficient (typically zero for most cameras).
r11, r12, ..., r33 are the components of the rotation matrix, and t1, t2, t3 are the translation parameters that define the pose of the camera with respect to the world.
After obtaining the orientation data in the form of a rotation vector, we will convert this vector into a rotation matrix. Using the rotation matrix, we can extract the three Euler angles. This allows us to compute the transformation matrix from the camera frame to the world frame.
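A sketch of that conversion, assuming a ZYX Euler convention (the repository's convention may differ):

```python
import numpy as np
import cv2

def rvec_to_euler(rvec):
    """Rotation vector -> rotation matrix -> ZYX (roll, pitch, yaw) Euler angles."""
    R, _ = cv2.Rodrigues(rvec)              # 3x3 rotation matrix
    yaw   = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-R[2, 0])             # assumes cos(pitch) != 0
    roll  = np.arctan2(R[2, 1], R[2, 2])
    return roll, pitch, yaw

def camera_to_world(rvec, tvec):
    """Invert the PnP result (world -> camera) to get the camera pose in the world frame."""
    R, _ = cv2.Rodrigues(rvec)
    return R.T, -R.T @ tvec.reshape(3)
```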
Additionally, the given extrinsic information, specifically a yaw rotation of π/4 and a roll rotation of π, is used to determine the transformation from the camera frame to the IMU frame.
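Composing those two rotations gives the camera-to-IMU rotation. The order below (yaw first, then roll) is an assumption; the repository's convention may reverse it:

```python
import numpy as np

def Rz(a):  # rotation about z (yaw)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def Rx(a):  # rotation about x (roll)
    return np.array([[1.0, 0.0,        0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

# Yaw of pi/4 followed by a roll of pi (composition order assumed)
R_cam_to_imu = Rz(np.pi / 4) @ Rx(np.pi)
```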
Because the IMU and the Vicon system collect data at different rates, the timestamps must be aligned. The alignment function performs linear interpolation to estimate the ground-truth state of the drone at a given timestamp, drone_timestamp.
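A minimal sketch of this interpolation, with vicon_times and vicon_states as hypothetical names for the Vicon timestamps and the stacked ground-truth states:

```python
import numpy as np

def interpolate_ground_truth(vicon_times, vicon_states, drone_timestamp):
    """Linearly interpolate each state dimension at the drone's timestamp.

    vicon_times:  (N,) sorted array of Vicon timestamps.
    vicon_states: (N, D) array, one ground-truth state per row.
    """
    # Note: naive linear interpolation of Euler angles ignores angle wrap-around.
    return np.array([np.interp(drone_timestamp, vicon_times, vicon_states[:, d])
                     for d in range(vicon_states.shape[1])])
```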
Our goal is to estimate the covariance matrix R of the observation noise ν ~ N(0, R), assuming the noise is zero-mean:
R = (1 / (n - 1)) * Σ (ν_t * ν_t^T)
z = [p,q] + ν
We estimate ν by running the process step over the entire dataset and taking the difference between the sensor measurement and the estimated value.
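In code, the sample-covariance formula above becomes a one-liner; a sketch assuming the residuals are stacked row-wise in nu (shape (n, 6) for the 6-dimensional measurement [p, q]):

```python
import numpy as np

def estimate_R(nu):
    """Sample covariance of the zero-mean residuals: R = 1/(n-1) * Σ ν_t ν_t^T."""
    nu = np.asarray(nu)          # (n, 6): measured pose minus predicted pose
    n = nu.shape[0]
    return (nu.T @ nu) / (n - 1)
```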
The following equations define the EKF prediction and update steps:
μ̄_t = μ_(t-1) + δ_t * f(μ_(t-1), u_t, 0)
Σ̄_t = F_t * Σ_(t-1) * F_t^T + V_t * Q_t * V_t^T
K_t = Σ̄_t * G_t^T * (G_t * Σ̄_t * G_t^T + W_t * R_t * W_t^T)^(-1)
μ_t = μ̄_t + K_t * (z_t - g(μ̄_t, 0))
Σ_t = Σ̄_t - K_t * G_t * Σ̄_t
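These five equations map directly onto a predict/update pair. A minimal sketch, assuming the process model f, the measurement model g, and their Jacobians are supplied as callables (all names here are placeholders):

```python
import numpy as np

def ekf_predict(mu, Sigma, u, dt, f, F_jac, V_jac, Q):
    """Propagate the mean with one Euler step and the covariance with the Jacobians."""
    mu_bar = mu + dt * f(mu, u)
    F = F_jac(mu, u, dt)
    V = V_jac(mu, u, dt)
    Sigma_bar = F @ Sigma @ F.T + V @ Q @ V.T
    return mu_bar, Sigma_bar

def ekf_update(mu_bar, Sigma_bar, z, g, G_jac, W_jac, R):
    """Correct the prediction with the camera pose measurement z."""
    G = G_jac(mu_bar)
    W = W_jac(mu_bar)
    S = G @ Sigma_bar @ G.T + W @ R @ W.T      # innovation covariance
    K = Sigma_bar @ G.T @ np.linalg.inv(S)     # Kalman gain
    mu = mu_bar + K @ (z - g(mu_bar))
    Sigma = Sigma_bar - K @ G @ Sigma_bar
    return mu, Sigma
```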
We use a 15-state model for this scenario. The system state is represented by the vector x, which is given by:
[x] = [ p
        q
        ṗ
        bg
        ba ]
where:
p is a 3x1 position vector.
q is a 3x1 orientation vector.
ṗ is a 3x1 velocity vector (time derivative of the position).
bg is a 3x1 gyroscope bias vector.
ba is a 3x1 accelerometer bias vector.
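For bookkeeping, the 15 states can be addressed with fixed slices of the state vector; a small sketch assuming the layout follows the order listed above:

```python
import numpy as np

# Index layout of the 15-state vector x = [p, q, p_dot, bg, ba]
POS, ORI, VEL, BG, BA = (slice(0, 3), slice(3, 6), slice(6, 9),
                         slice(9, 12), slice(12, 15))

x = np.zeros(15)
x[ORI] = [0.0, 0.0, np.pi / 4]   # e.g. initialize the yaw component
```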
The continuous-time state-space model can be derived as:
[ẋ] = [ ṗ
        G(q)^(-1) * u_ω
        g + R(q) * u_a
        n_gb
        n_ab ]
We obtain the next state using the one-step Euler approximation:
x_t = x_(t-1) + δ_t * ẋ_(t-1)
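A sketch of this propagation, assuming a ZYX Euler parameterization of q and that the raw IMU readings are corrected by the bias states (both assumptions, since the repository's exact conventions are not shown here):

```python
import numpy as np

def R_from_euler(q):
    """Body-to-world rotation from ZYX Euler angles q = [roll, pitch, yaw] (assumed)."""
    r, p, y = q
    cr, sr = np.cos(r), np.sin(r)
    cp, sp = np.cos(p), np.sin(p)
    cy, sy = np.cos(y), np.sin(y)
    return np.array([[cy*cp, cy*sp*sr - sy*cr, cy*sp*cr + sy*sr],
                     [sy*cp, sy*sp*sr + cy*cr, sy*sp*cr - cy*sr],
                     [-sp,   cp*sr,            cp*cr]])

def G_inv(q):
    """Maps body angular velocity to ZYX Euler-angle rates (assumed convention)."""
    r, p, _ = q
    cr, sr = np.cos(r), np.sin(r)
    cp, tp = np.cos(p), np.tan(p)
    return np.array([[1.0, sr * tp, cr * tp],
                     [0.0, cr,      -sr],
                     [0.0, sr / cp, cr / cp]])

def f(x, u_omega, u_acc):
    """Continuous-time dynamics x_dot = f(x, u) with the noise terms set to zero."""
    g = np.array([0.0, 0.0, -9.81])   # gravity in the world frame (sign convention assumed)
    q, v, bg, ba = x[3:6], x[6:9], x[9:12], x[12:15]
    x_dot = np.zeros(15)
    x_dot[0:3] = v                                    # p_dot
    x_dot[3:6] = G_inv(q) @ (u_omega - bg)            # q_dot
    x_dot[6:9] = g + R_from_euler(q) @ (u_acc - ba)   # p_ddot
    # the bias rows are driven purely by noise, so their deterministic part is zero
    return x_dot

def euler_step(x, u_omega, u_acc, dt):
    """One-step Euler approximation: x_t = x_(t-1) + dt * f(x_(t-1), u_t)."""
    return x + dt * f(x, u_omega, u_acc)
```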
(Figure: Extended Kalman Filter based VIO, estimated trajectory plotted against the Vicon ground truth.)
In summary, the Extended Kalman Filter (EKF) offers a robust approach for state estimation in nonlinear systems by integrating sensor measurements with system dynamics. This repository has covered the core principles of EKF, including system modeling, prediction, and measurement updates, and we have visualized the results against the Vicon ground truth. However, it is important to note that while the EKF is highly effective, it is computationally intensive due to the Jacobian calculations involved.