Implementing a Kalman filter in R for Android GPS measurements
The problem
I downloaded my data from Google Takeout and have been playing around with it for my thesis. This is roughly what the data looks like after some pre-processing:
lon | lat | accuracy (m) |
---|---|---|
5.115447 | 52.10574 | 877 |
5.115447 | 52.10574 | 877 |
5.113952 | 52.10451 | 18 |
5.115447 | 52.10574 | 877 |
5.115447 | 52.10574 | 877 |
5.115447 | 52.10574 | 877 |
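For reference, the pre-processing can be sketched roughly like this, assuming the export is the usual "Location History.json" with coordinates stored as E7 integers; the file name and field names here are assumptions, not necessarily what Takeout gives you today:

# a rough pre-processing sketch (file and field names are assumptions)
library(jsonlite)

raw <- fromJSON("Location History.json")$locations
day <- data.frame(
  lon = raw$longitudeE7 / 1e7,  # Takeout stores degrees * 10^7
  lat = raw$latitudeE7 / 1e7,
  accuracy = raw$accuracy,      # reported accuracy radius in metres
  time = as.POSIXct(as.numeric(raw$timestampMs) / 1000, origin = "1970-01-01")
)
# seconds elapsed before each fix; the name matches the filter code below
day$timeNextMeasure <- c(NA, diff(as.numeric(day$time)))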
As soon as I looked into it I realised there were some issues with the measurements. They were all over the place!
The circles are the measurements (with their radius representing the accuracy and the center being the point) and lines connect these measurements in time. As you can see, on this particular day in February, I went to classes, to the gym and probably to the pub to drink a Heineken.
What is a Kalman filter?
The Kalman filter was proposed by Rudolf E. Kálmán, a Hungarian mathematician, as a method to “filter” out inaccuracies and other types of noise. I find this surprising because in my experience Hungarian mathematicians usually propose getting drinks.
By far the best resource on the filter I have found on the internet is this old-fashioned website. I encourage you to read the introduction there, as I have paraphrased almost all of the explanation (badly). However, I did find most online guides to be a bit confusing in their use of terminology regarding states and processes, so I’ll try to apply everything here to my relatively simple problem.
If you care little about how the model is built, just scroll down to see a cool map.
How is the model built?
A state \(x \in \mathbb{R}^n\) is the current state of whatever one is interested in. In my case this is a human being located on earth, as defined by two parameters: latitude and longitude. This depends entirely on the subject of interest: if you are more interested in ballistic missiles, you could perhaps also add altitude and speed to this state.
The state \(x\) at a given point in time \(k\) is modelled by what is known as the state equation:
\[ x_k = Ax_{k-1} + Bu_{k-1}+w_{k-1} \]
Here \(w_k\) is known as the process noise and is conceptually equivalent to the error term in a linear regression. For instance, if measurements are not evenly spaced (e.g. not every two minutes) then it makes sense for the process noise to increase with the amount of time since the last measurement.
\(A\) is an \(n \times n\) matrix which relates our current state to the previous state. In my case, I cannot think of a better matrix to use than an identity matrix: my best guess of where I will be next is where I was previously. This would not be true if I had information about my speed and my direction.
\(B\) is an \(n \times l\) matrix which relates the optional control input \(u \in \mathbb{R}^l\) to the state \(x\). A textbook example: if you tell your robot to walk 100 meters north (this is the control input \(u\)), but you know your robot tends to forget a step every 3 steps, this is the part of the equation where you can incorporate that information. Given that I do not have any control input, I ignore this term in what follows.
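In R terms the prediction part of the state equation then becomes trivial; a minimal sketch with illustrative values (the variable names here are just for illustration):

# state prediction with an identity transition matrix
A <- diag(2)                     # 2x2 identity: best guess = previous state
x_prev <- c(5.115447, 52.10574)  # previous state estimate (lon, lat)

# x_k = A x_{k-1} + B u_{k-1} + w_{k-1}; without a control input the
# prediction collapses to the previous state:
x_pred <- A %*% x_prev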
On the other hand, we also have a measurement equation that models our measurements \(z \in \mathbb{R}^{m}\) at a given point in time \(k\):
\[ z_k = H x_k +v_k \]
Like in the previous equation, \(v_k\) is known as the measurement noise and represents the errors in measurement.
\(H\) is an \(m \times n\) matrix relating the state to the measurement. Say you had a biased sensor that always overestimated the state by a factor of two; this is where you could correct for it.
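For instance, that factor-of-two bias would be encoded like this (a purely hypothetical sensor, not my data):

# hypothetical sensor that reports twice the true value: z = 2x + v
H_biased <- 2 * diag(2)
x_true <- c(5.1154, 52.1057)  # an invented true state (lon, lat)
H_biased %*% x_true           # the measurement such a sensor would produce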
The noise in both equations is assumed to be:
\[p(w) \sim N(0,Q)\]
\[p(v) \sim N(0,R)\]
Where \(Q\) is the process noise covariance matrix and \(R\) is the measurement noise covariance. In our case, Google provides an accuracy indication for the measurements, so it makes sense for \(R_k\) not to be constant but rather to be a function of this accuracy. Moreover, since we know the amount of time that has elapsed between measurements, it would also make sense for \(Q_k\) to be a function of the amount of time between \(k\) and \(k-1\).
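Concretely, these are the two functions used in the implementation below: \(R_k\) grows with the square of the reported accuracy, and \(Q_k\) with the time elapsed since the previous fix (the exponent is hand-tuned):

# measurement noise covariance: a function of Google's reported accuracy (m)
R <- function(k) diag(2) * day$accuracy[k]^2

# process noise covariance: a function of the seconds elapsed since the
# previous measurement; the exponent 1.5 is simply a hand-tuned choice
Q <- function(k) diag(2) * as.numeric(day$timeNextMeasure[k])^1.5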
The Kalman filter
Let's define \(\widehat{x}^{-}_k \in \mathbb{R}^n\) as the a priori state estimate given our knowledge of the process. In practical terms, this is our guess of what the state is going to be without taking the measurements into consideration. Thus, for my case it is the guess of where I will be at time point \(k\) given where I was at time point \(k-1\).
Moreover, let \(\widehat{x}_k \in \mathbb{R}^n\) be the a posteriori state estimate at step \(k\) given measurement \(z_k\). For my case it is the guess of where I am given the measurement at that point in time.
These two state estimates have error covariances denoted \(P^{-}_k\) and \(P_k\) respectively.
Thus the filter computes:
an a posteriori state estimate as a linear combination of an a priori estimate and the weighted difference between the actual measurement and the measurement prediction.
Algebraically it is: \[ \widehat{x}_k = \widehat{x}^{-}_k + K_k (z_k - H \widehat{x}^{-}_k) \] where \(K_k\) is the weight (also known as the Kalman gain) and the difference in parentheses can be understood as a residual. Basically, we want to improve the guess based on our knowledge of the process using the measurement we actually received.
The Kalman gain is an \(n \times m\) matrix which minimizes the a posteriori error covariance. A handy way of thinking about this weighting is that it ensures that:
- If the measurement error decreases the measurements are trusted more.
- If the measurement error increases the measurements are trusted less.
- If the process error decreases the measurements are trusted less.
- If the process error increases the measurements are trusted more.
In my simple example the measurement error is simply the accuracy of the measurements provided by Google (the radius of the circle) and the process error could be understood as a function of the amount of time in between measurements.
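This weighting is easiest to see in one dimension, where the gain collapses to \(K = P^-/(P^- + R)\); a small numeric sketch using the accuracies from the table above:

# 1-D illustration: with A = H = 1 the gain is K = P_minus / (P_minus + R)
kalman_gain <- function(P_minus, R) P_minus / (P_minus + R)

kalman_gain(P_minus = 10, R = 877^2)  # very noisy fix: K ~ 1e-5, barely trusted
kalman_gain(P_minus = 10, R = 18^2)   # accurate fix: K ~ 0.03, trusted more
kalman_gain(P_minus = 1e4, R = 18^2)  # large process error: K ~ 0.97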
The algorithm itself
1: Initial estimates
Initial estimates of the state \(\widehat{x}_{k-1}\) and of the error covariance matrix \(P_{k-1}\) are chosen.
2: Predicting the state
Project the state and error covariances
\[ \widehat{x}^{-}_k = A \widehat{x}_{k-1} \]
Remember, here I’ve ignored the control input.
\[ P^{-}_k = A P_{k-1} A^T + Q \]
3: Measurement Update (“Correcting the prediction based on the measurements”)
Computing the Kalman gain
\[ K_k = P^-_k H^T (H P^-_k H^T + R)^{-1} \]
Update estimate of state with measurement
\[ \widehat{x}_k = \widehat{x}^{-}_k + K_k (z_k - H \widehat{x}^{-}_k) \]
Update estimate of error covariance
\[ P_k = (I - K_k H) P^-_k \]
Repeat steps 2 and 3 for as many \(k\) as there are.
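Putting the two steps together in one dimension (\(A = H = 1\), all numbers invented for illustration):

# one predict-correct cycle in 1-D
xhat_prev <- 5.1154  # previous a posteriori estimate
P_prev <- 1e-6       # previous a posteriori error variance
Q_k <- 1e-7          # process noise for this interval
R_k <- 1e-6          # measurement noise for this fix
z_k <- 5.1158        # incoming measurement

# step 2: predict
xhat_minus <- xhat_prev  # A = 1, no control input
P_minus <- P_prev + Q_k  # A P A^T + Q

# step 3: correct
K_k <- P_minus / (P_minus + R_k)               # Kalman gain, ~0.52 here
xhat <- xhat_minus + K_k * (z_k - xhat_minus)  # pulled about halfway to z_k
P_k <- (1 - K_k) * P_minus                     # error variance shrinks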
Implementation in R
Here I basically played around with the functional forms of \(Q\) and \(R\) until I found something reasonable.
# initializing variables
count <- nrow(day)           # number of data points in the day
z <- cbind(day$lon, day$lat) # measurements

# allocate space:
xhat <- matrix(0, nrow = count, ncol = 2)      # a posteriori estimate at each step
P <- array(0, dim = c(2, 2, count))            # a posteriori error covariance
xhatminus <- matrix(0, nrow = count, ncol = 2) # a priori estimate
Pminus <- array(0, dim = c(2, 2, count))       # a priori error covariance
K <- array(0, dim = c(2, 2, count))            # Kalman gain

# initializing matrices
A <- diag(2)
H <- diag(2)
R <- function(k) diag(2) * day$accuracy[k]^2  # measurement noise covariance
Q <- function(k) diag(2) * as.numeric(day$timeNextMeasure[k])^1.5  # process noise covariance

# initial guesses:
xhat[1, ] <- z[1, ]
P[, , 1] <- diag(2)

for (k in 2:count) {
  # time update
  # project the state ahead
  xhatminus[k, ] <- A %*% xhat[k - 1, ]  # + B %*% u[k-1], but there is no control input

  # project the error covariance ahead
  Pminus[, , k] <- A %*% P[, , k - 1] %*% t(A) + Q(k)

  # measurement update
  # compute the Kalman gain; elementwise division works here because
  # every matrix involved is diagonal
  K[, , k] <- Pminus[, , k] %*% t(H) / (H %*% Pminus[, , k] %*% t(H) + R(k))

  # guard against NaN from a zero denominator
  K[, , k][is.nan(K[, , k])] <- 0

  # update the estimate with the measurement
  xhat[k, ] <- xhatminus[k, ] + K[, , k] %*% (z[k, ] - H %*% xhatminus[k, ])

  # update the error covariance
  P[, , k] <- (diag(2) - K[, , k] %*% H) %*% Pminus[, , k]
}
Let's see the results!
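A minimal sketch for comparing the raw and the filtered track, assuming ggplot2 is available (colours and layout are illustrative choices):

library(ggplot2)

filtered <- data.frame(lon = xhat[, 1], lat = xhat[, 2])

ggplot() +
  geom_path(data = day, aes(lon, lat), colour = "red") +        # raw fixes
  geom_path(data = filtered, aes(lon, lat), colour = "blue") +  # filtered track
  coord_quickmap()  # roughly correct aspect ratio for lon/lat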
What else there is to do
There are a lot of things that could be done to improve this model. The most obvious would be to do some fine tuning with respect to the error covariances.
In addition, there may be ways to incorporate information from further back than \(k-1\): for instance, I usually cycled one of two routes to university in Utrecht three days a week, but this model remains oblivious to that.