Links between Kalman Filtering and Data Assimilation with Generalized Least Squares
1. Introduction
In this paper, we compare two data assimilation methods that are routinely applied to monitor time-dependent linear systems, one based on Generalized Least Squares (GLS) [1] and the other on Kalman Filtering (KF) [2] [3]. We are motivated by our anecdotal observation that, while both are widely used tools that do broadly similar things, GLS and KF tend to be used by different communities, each of which often regards its own method as the "best". Our goal is to enumerate and study the similarities and differences between GLS and KF. In particular, we wish to determine whether or not the same prior information is used in each. Establishing a link between KF and GLS provides a clear pathway for applying GLS concepts, and especially resolution analysis [4] [5], to KF, and also a pathway for extending GLS analysis to the real-time scenarios for which KF is already well-suited.
A synopsis of the variables used in this paper is provided in Table 1. At any time, $t_i$, $i = 1, \ldots, K$, a linear system is described by a state vector (model parameter vector), $\mathbf{x}_i$. This state vector evolves away from the initial condition

$$\mathbf{x}_1 = \mathbf{x}_1^{A} \qquad (1)$$

according to the dynamical equation:

$$\mathbf{x}_i = \mathbf{D}\,\mathbf{x}_{i-1} + \mathbf{s}_i, \qquad i = 2, \ldots, K \qquad (2)$$

Here, $\mathbf{D}$ is the dynamics matrix and $\mathbf{s}_i$ is the source vector. Neither the initial condition nor the source is known exactly; rather, each has an uncertainty described by its respective covariance matrix, $\mathbf{C}_A$ and $\mathbf{C}_s$.
This formulation well approximates the behavior of systems described by linear partial differential equations that are first order in time. For example, let $x_i^{(m)}$ be temperature at time, $t_i = i\,\Delta t$, and position, $z_m = m\,\Delta z$, where $\Delta t$ and $\Delta z$ are small increments, and suppose that the temperature satisfies the thermal diffusion equation, $\partial x / \partial t = \kappa\, \partial^2 x / \partial z^2$ (with zero boundary conditions). This partial differential equation can be approximated by:

$$x_i^{(m)} = x_{i-1}^{(m)} + \frac{\kappa\,\Delta t}{(\Delta z)^2} \left( x_{i-1}^{(m+1)} - 2\,x_{i-1}^{(m)} + x_{i-1}^{(m-1)} \right) \qquad (3)$$

which has the form of dynamical Equation (2). Here, the choices:

$$D_{mn} = \delta_{mn} + \frac{\kappa\,\Delta t}{(\Delta z)^2} \left( \delta_{n,m+1} - 2\,\delta_{nm} + \delta_{n,m-1} \right) \qquad (4)$$

encode both the differential equation and the boundary conditions. The matrix, $\mathbf{D}$, is sparse in this example, as well as in many other cases in which it approximates a differential operator.
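To make the discretization concrete, the following sketch builds the sparse dynamics matrix for Equation (3) and advances the state one step. It is a minimal illustration assuming a 1-D grid with zero-temperature boundaries; the grid size, diffusivity, and step sizes are placeholder values, not those used later in the paper.

```python
# Sketch: sparse dynamics matrix D for 1-D thermal diffusion (Equation (3)),
# x_i = D x_{i-1}, with zero-temperature boundary conditions.
# M, kappa, dt, dz are illustrative values, not those used in the paper.
import numpy as np
from scipy.sparse import diags

M, kappa, dt, dz = 50, 1.0, 0.001, 0.1
r = kappa * dt / dz**2          # diffusion number; r <= 1/2 for stability

# Interior update: x_m <- x_m + r*(x_{m+1} - 2*x_m + x_{m-1}).  With zero
# boundary values, the neighbors outside the grid contribute nothing, so the
# first and last rows simply omit them.
D = diags([np.full(M - 1, r), np.full(M, 1.0 - 2.0 * r), np.full(M - 1, r)],
          offsets=[-1, 0, 1]).tocsr()

x0 = np.exp(-((np.arange(M) - M / 2) ** 2) / 20.0)   # initial temperature bump
x1 = D @ x0                                          # one step of Equation (2), with zero source
```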
The data equation expresses the relationship between the state vector and the observables:

$$\mathbf{d}_i = \mathbf{G}_i\,\mathbf{x}_i \qquad (5)$$

Here, $\mathbf{d}_i$ is the data vector, determined to covariance, $\mathbf{C}_d$, and $\mathbf{G}_i$ is the data kernel. In the simplest case, the observations may be of selected elements of the state vector, itself, in which case each row, $j$, of $\mathbf{G}_i$ is zero, except for a single element, say in column, $k$, which is unity. Here, $k(j)$ is a function that associates observation $j$ with element $k$ of the state vector. Other more complicated relationships are possible. For instance, in tomography, the data is a line integral through $x(y, z)$ (with $y$ another spatial dimension). The data kernel, $\mathbf{G}_i$, is sparse in these two cases. In other cases, it may not be sparse.
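The following sketch illustrates the simplest, sampling-type data kernel described above, in which each row of the kernel picks out a single element of the state vector. The state length, observed positions, and noise level are arbitrary placeholders.

```python
# Sketch: a "sampling" data kernel G whose j-th row is zero except for a one
# in column k(j), the index of the sampled state-vector element.
# The state length, observed positions, and noise level are placeholders.
import numpy as np

rng = np.random.default_rng(0)
M = 50                                   # length of the state vector
observed_cols = np.array([3, 17, 31])    # k(j) for j = 0, 1, 2
N = observed_cols.size

G = np.zeros((N, M))
G[np.arange(N), observed_cols] = 1.0     # each row samples one element

x_true = rng.normal(size=M)
d = G @ x_true + 0.05 * rng.normal(size=N)   # noisy observations of x_true
```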
The data assimilation problem is to estimate the set of state vectors, $\mathbf{x}_i$, $i = 1, \ldots, K$, using the dynamical equation, the data equation and the initial condition. One possible approach is based on Generalized Least Squares (GLS); another upon Kalman Filtering (KF). In this paper, we demonstrate that these two apparently very different methods are, in fact, exactly equivalent. In order to simplify notation, we concatenate the state vectors for all times into an overall vector, $\mathbf{x}$, and the data vectors into an overall vector, $\mathbf{d}$. By assumption, no observations are made at time, $t_1$.
2. Generalized Least Squares Applied to the Data Assimilation Problem
Generalized Least Squares (GLS) [1] [6] [7] [8] is a technique used to estimate $\mathbf{x}$ when two types of information are available: prior information and data. By prior information, we mean expectations about the behavior of $\mathbf{x}$ that are based on past experience or general physical considerations. The dynamical equation and initial condition discussed in the previous section are examples of prior information. By data, we mean direct observations, as typified by the data equation discussed in the previous section.

Prior information can be represented by the linear equation, $\mathbf{H}\,\mathbf{x} = \mathbf{h}$ (with prior-information vector, $\mathbf{h}$, and where the equation is accurate to covariance, $\mathbf{C}_h$), and observations can be represented by the linear equation, $\mathbf{G}\,\mathbf{x} = \mathbf{d}$ (with covariance $\mathbf{C}_d$). The Bayesian principle leads to the optimal solution, which we denote the Generalized Least Squares (GLS) solution, $\mathbf{x}^{\text{est}}$ [1] [6] [7] [8]. It minimizes a combination of the weighted $L_2$ error in the prior information and the weighted $L_2$ error in the data (where the weighting depends upon the covariances).
Several equivalent forms of the GLS solution, $\mathbf{x}^{\text{est}}$, and its posterior variance, $\mathbf{C}^{\text{est}}$, are common in the literature. We enumerate a few of the more commonly-used forms here:

Form 1 [8] groups the prior information and data equations into a single equation, $\mathbf{F}\,\mathbf{x} = \mathbf{f}$:

$$\mathbf{F} \equiv \begin{bmatrix} \mathbf{C}_d^{-1/2}\,\mathbf{G} \\ \mathbf{C}_h^{-1/2}\,\mathbf{H} \end{bmatrix}, \qquad \mathbf{f} \equiv \begin{bmatrix} \mathbf{C}_d^{-1/2}\,\mathbf{d} \\ \mathbf{C}_h^{-1/2}\,\mathbf{h} \end{bmatrix}, \qquad \left[ \mathbf{F}^{T} \mathbf{F} \right] \mathbf{x}^{\text{est}} = \mathbf{F}^{T} \mathbf{f}, \qquad \mathbf{C}^{\text{est}} = \left[ \mathbf{F}^{T} \mathbf{F} \right]^{-1} \qquad (6)$$

Note that the factors of $\mathbf{C}_d^{-1}$ and $\mathbf{C}_h^{-1}$ are weights proportional to "certainty".
Form 2 [1] introduces generalized inverses,
and
:
(7)
Here,
is the estimated solution and
is its posterior covariance.
Form 3 [7] organizes the solution in terms of the prior state vector,
; that is, the state vector implied by the prior information, acting alone:
(8)
The matrix,
, plays the role of a projection matrix. See Appendix A.1 for a derivation of the covariance equation.
Form 4 [1] introduces the deviation,
, of the solution from the prior state vector, and the corresponding deviation,
, of the data from that predicted by the prior state vector:
(9)
Finally, Form 5 [6] [9] uses an alternate form of the generalized inverse:
(10)
The equality of the two forms was proven by [6] using a matrix identity that we denote TV82-A (Appendix A.2). Because
is
and
is
, the first form is most useful when
; the second when
. However, a decision to use one or the other must also take into consideration the sparsity of the various matrix products.
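As a numerical illustration of Form 5, the sketch below evaluates both versions of the generalized inverse on small random matrices and confirms that they agree. The symbols G, Cd, and Ch are generic stand-ins for the data kernel, data covariance, and prior covariance.

```python
# Sketch: the two equivalent forms of the GLS generalized inverse,
#   (G^T Cd^-1 G + Ch^-1)^-1 G^T Cd^-1     (inverts an M x M matrix)
#   Ch G^T (G Ch G^T + Cd)^-1              (inverts an N x N matrix)
# checked numerically on random stand-ins for the kernel and covariances.
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 4
G = rng.normal(size=(N, M))
Cd = 0.1 * np.eye(N)     # data covariance (stand-in)
Ch = np.eye(M)           # prior covariance (stand-in)

Ginv_MM = np.linalg.solve(G.T @ np.linalg.solve(Cd, G) + np.linalg.inv(Ch),
                          G.T @ np.linalg.inv(Cd))
Ginv_NN = Ch @ G.T @ np.linalg.inv(G @ Ch @ G.T + Cd)

assert np.allclose(Ginv_MM, Ginv_NN)   # the two forms agree
```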
Form 4 is derived from Form 3 by subtracting
from both sides of the Gram equation,
(11)
and then by requiring that the second term on the right-hand side vanish, which leads to:
(12)
That is,
is due to the prior information acting alone. The deviatoric manipulation is completely general; alternately, the first term could have been made to vanish, which leads to:
(13)
Here,
is due to the data acting alone. Note that deviatoric manipulations of this type never change the form of the matrix,
. We will apply this principle later in the paper.
In the subsequent analysis, we will focus on the Gram equations:
(14)
The initial condition and the dynamical equation can be combined into a single prior information equation of the form,
:
(15)
Here,
. Several quantities derived from
, and which we will use later, are:
(16)
The existence of
implies that
can be uniquely specified. The data equation expresses the relationship between the state vector and the observables, and presuming that no data are available for time,
, has the form:
(17)
This equation is taken to have an accuracy described by the summary covariance matrix,
. We note that:
(18)
The matrix,
, in the Gram equation is symmetric and block-tridiagonal:
(19)
with elements:
(20)
The vector,
, on the right-hand side of the Gram equation, is:
(21)
Here, we define
to be zero.
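The block-tridiagonal structure can be verified directly. The sketch below assembles the prior-information matrix that stacks the initial condition and the dynamical equation, with unit covariances and omitting the data term (which only adds block-diagonal contributions), and checks that Gram-matrix blocks more than one block away from the diagonal vanish. The sizes and the random dynamics matrix are illustrative.

```python
# Sketch: the Gram matrix built from the prior-information equations
# (initial condition plus dynamics) is block-tridiagonal.  Unit covariances
# are used and the data term is omitted; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
M, K = 3, 5
D = 0.5 * rng.normal(size=(M, M))

# H stacks x_1 = x_1^A (first block row) and x_i - D x_{i-1} = s_i (i = 2..K).
H = np.zeros((K * M, K * M))
for i in range(K):
    H[i*M:(i+1)*M, i*M:(i+1)*M] = np.eye(M)
    if i > 0:
        H[i*M:(i+1)*M, (i-1)*M:i*M] = -D

A = H.T @ H                                   # Gram matrix (unit covariances)
assert np.allclose(A[0:M, 2*M:3*M], 0.0)      # blocks beyond the tridiagonal vanish
```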
3. Recursive Solution Using the Thomas Method
Insight into the behavior of the GLS solution can be gained by solving the Gram equation iteratively [10]. We use the Thomas method [11] [12] (see Appendix A.3), though other methods are viable alternatives. It consists of a forward-in-time pass through the system that recursively calculates two quantities,
and
:
(22)
After the forward recursion, the system is block-upper-bidiagonal with row i having elements
and
(except for the last row, which lacks the
) and the modified right-hand side is
. The solution,
, is achieved through a backward recursion:
(23)
It is evident that information is propagated both forward and backward in time during the solution process. Furthermore, computation time grows no faster than the number of steps, K, in the recursion.
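A minimal implementation of the block Thomas recursion is sketched below. The block names A (diagonal), B (super-diagonal), and b (right-hand side) are generic labels for the quantities in Equations (19)-(23), not the paper's notation.

```python
# Sketch: Thomas recursion for a symmetric block-tridiagonal system.
# Row i has blocks (B_{i-1}^T, A_i, B_i) and right-hand side b_i.
import numpy as np

def block_thomas(A, B, b):
    """Forward elimination followed by back substitution.
    A: K diagonal blocks, B: K-1 super-diagonal blocks, b: K right-hand sides."""
    K = len(A)
    Amod = [a.copy() for a in A]
    bmod = [v.copy() for v in b]
    for i in range(1, K):                        # forward (in time) pass
        W = B[i - 1].T @ np.linalg.inv(Amod[i - 1])
        Amod[i] = Amod[i] - W @ B[i - 1]
        bmod[i] = bmod[i] - W @ bmod[i - 1]
    x = [None] * K                               # backward (in time) pass
    x[K - 1] = np.linalg.solve(Amod[K - 1], bmod[K - 1])
    for i in range(K - 2, -1, -1):
        x[i] = np.linalg.solve(Amod[i], bmod[i] - B[i] @ x[i + 1])
    return x

# Small usage example with diagonally dominant (hence invertible) blocks.
rng = np.random.default_rng(0)
K, m = 4, 3
B = [rng.normal(size=(m, m)) for _ in range(K - 1)]
A = [10.0 * np.eye(m) for _ in range(K)]
b = [rng.normal(size=m) for _ in range(K)]
x = block_thomas(A, B, b)
```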
The Thomas method has a disadvantage in the common case where the covariance matrices,
and
are diagonal and when
and
are sparse, because although
is then also sparse, the matrices,
, are in general not sparse, so the effort needed to compute them scales with
. Other direct methods share this limitation, too. Consequently, the overall calculation scales with
. An iterative method [10], such as the conjugate gradient method, applied to the Gram equation,
, is usually a better choice. This method requires that the quantity,
, be calculated for an arbitrary vector,
, and this quantity can be very efficiently calculated as
[13]. In cases in which the dynamical equation approximates a partial differential equation, the number of non-zero elements in the matrix,
, scales with
. The conjugate gradient algorithm requires no more than
iterations (and often much fewer), each requiring
multiplications. Thus, the overall solution time scales with
. Consequently, the conjugate gradient method has a speed advantage when
.
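The sketch below shows the matrix-free strategy described above: the Gram operator is applied through two sparse matrix-vector products inside scipy's conjugate gradient solver, so the (generally dense) Gram matrix is never formed. The sparse matrix F and vector f are random stand-ins, and the small damping term stands in for the prior-information block of the Gram matrix.

```python
# Sketch: matrix-free conjugate-gradient solution of the Gram (normal)
# equations F^T F x = F^T f, applying F and F^T as sparse mat-vecs so the
# dense Gram matrix is never formed.  F and f are random stand-ins.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
M, N = 200, 300
F = sparse_random(N, M, density=0.02, random_state=0, format="csr")
f = rng.normal(size=N)

def gram_matvec(v):
    # F^T (F v) + damping: two sparse products per CG iteration.
    return F.T @ (F @ v) + 1e-3 * v

A = LinearOperator((M, M), matvec=gram_matvec, dtype=float)
x_est, info = cg(A, F.T @ f)      # info == 0 signals convergence
```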
4. Present-Time Solution
Suppose that the analysis focuses on the “present-time”, i, in the sense that only the solution,
, determined using data up to and including time, i, is of interest. One can assemble a sequence of present-time solutions during the forward recursion, using the fact that the ith solution can always be considered to be the final one. No backwards recursion is needed to compute the solution for the final time. However, the forms of the “final”
and
differ from those of the previous
s in the recursion, so a separate computation is needed:
(24)
Consequently, in order to create a sequence of present-time solutions, the two linear systems must be solved at each step in the forward recursion. The present-time solution is the same as the reference solution,
, defined in the previous section.
Kalman Filtering
Kalman Filtering (KF) is a solution method with an algorithm that, like the Thomas present-time solution, is forward-in-time only [2] [3]. It consists of four steps, the final three of which are iterated.
Step 1 assigns the
solution,
, and its covariance,
:
(25)
Step 2 propagates the solution and its covariance forward in time using the dynamical equation, and considers it to be prior information.
(26)
Step 3 uses GLS to combine the prior information,
, with covariance
, and data,
, with covariance
, into a solution,
, with covariance,
. Any of the equivalent forms of the GLS solutions described in Section 2 can be used in this step.
Step 4 increments i and returns to Step 2, creating a recursion.
Most often, the GLS solution and its variance are written as:
(27)
However, any of the equivalent forms described above can be substituted, such as:
(28)
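The four steps translate into a short recursion, sketched below with generic names (D for the dynamics matrix, Cs for the source covariance, G and Cd for the data kernel and data covariance at each time); Step 3 uses the Form 5 ("Kalman gain") version of the GLS update. The variable names are illustrative, not the paper's.

```python
# Sketch: the four-step Kalman Filtering recursion, in generic notation.
import numpy as np

def kalman_filter(x1, C1, D, Cs, G_list, d_list, Cd_list):
    """Return the present-time estimates and their posterior covariances."""
    x, C = x1.copy(), C1.copy()                  # Step 1: initial condition
    x_out, C_out = [x.copy()], [C.copy()]
    for G, d, Cd in zip(G_list, d_list, Cd_list):
        # Step 2: propagate with the dynamics; the result is prior information.
        x_pri = D @ x
        C_pri = D @ C @ D.T + Cs
        # Step 3: GLS combination of prior information and data
        # (Form 5 / "Kalman gain" version of the generalized inverse).
        K = C_pri @ G.T @ np.linalg.inv(G @ C_pri @ G.T + Cd)
        x = x_pri + K @ (d - G @ x_pri)
        C = C_pri - K @ G @ C_pri
        x_out.append(x.copy())
        C_out.append(C.copy())
        # Step 4: the loop increments i and returns to Step 2.
    return x_out, C_out
```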
5. Kalman Filtering Is Not “Filtering” in the Strict Sense
A standard Infinite Impulse Response (IIR) filter has the form
, where
is the “input” timeseries,
is the “output” timeseries,
and
are filters (with
) and
signifies convolution [14]. Key to this formulation is that the filter coefficients are constants; that is, they are not a function of time.
If the generalized inverse in KF were time-independent, so that
, then KF could be put into the form of an IIR filter:
(29)
because the convolution reproduces the KF solution:
(30)
So, from this point of view, the KF has a
of length 2 (each element of which is a matrix), and a
of length 1 (each element of which is a row vector of two matrices). However, this formulation does not really correspond to a standard IIR filter, because the filter coefficients, which depend upon the generalized inverse,
, depend upon time, i. Hence, the word “filter”, though generally indicative of the KF process, oversimplifies the actual operation being performed. KF is not filtering in the strict sense.
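For contrast, a genuine constant-coefficient IIR filter can be applied with a single call, as in the sketch below; the coefficient values are arbitrary. In KF, the corresponding coefficients would have to be recomputed at every time step, because the generalized inverse changes with i.

```python
# Sketch: a true constant-coefficient IIR filter.  lfilter applies
# a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] - a[1]*y[n-1] with fixed coefficients;
# in KF the analogous coefficients would change at every step i.
import numpy as np
from scipy.signal import lfilter

x = np.random.default_rng(0).normal(size=100)   # "input" timeseries
b = [0.4, 0.3]                                  # feed-forward coefficients
a = [1.0, -0.5]                                 # feedback coefficients, a[0] = 1
y = lfilter(b, a, x)                            # "output" timeseries
```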
6. The Present-Time Thomas and Kalman Filtering Solutions Are Equal
We will now demonstrate that the present-time Thomas solution,
, and the Kalman filtering solution,
, are equal. We will make use of an identity, abbreviated TV82-B, that is due to [6], which shows that for invertible symmetric matrices
and
and arbitrary matrix,
:
(31)
Thus, for instance, when
,
and
:
(32)
The PT and KF recursions both start with
and
. The
case is irregular, and must be examined separately. The KF solution is:
(33)
This can be compared with the present-time Thomas solution:
(34)
Note that we used TV82-B to simplify the expression for
. By inspection,
. Thus, the two solutions are equal if
. The terms involving
match. The terms involving
would match if it could be shown that:
(35)
But this equation is true by TV82-B. The terms
also match, because of the matrix identity
(36)
derived in Appendix A.4. Consequently, the solutions,
, and their posterior covariances,
are equal. Applying
to the KF recursion, and TV82-B and
to the present-time Thomas recursion, leads to:
(37)
Thus,
as long as
. Because the latter is true for
, the formula can be successively applied to show
for all
. Similarly, the procedure that demonstrated the equality of
and
can be extended to
and
(38)
Here we have used TV82-B and
. The terms ending in
match. The terms ending in
also match, since it has been established previously that
. In order for
to equal
, we must have
and:
(39)
This equation has the same form as identity (36), where the equality has been demonstrated. Starting with
, we have
and
, which implies
and
, which implies
. This process can be iterated indefinitely, establishing that the present-time Thomas and Kalman solutions, and their posterior variances, are equal.
7. Comparison between the Present-Time Solution and GLS
The present-time solution at time, j, depends on information available for times,
but not upon information that subsequently becomes available (that is, for times
). This limitation is necessary in a real-time scenario. However, the lack of future data leads to a solution that is a poorer estimate of the true solution than a GLS solution in which the state vectors at all times are globally adjusted to best fit all the prior information and data.
An outlier that occurs at, or immediately before, the present moment can cause large error in the present-time solution. The full GLS solution is less affected because measurements in the near future may compensate (Figure 1).
Having established links between KF and GLS, we are now able to apply several useful inverse theory concepts and especially resolution [4] [5] [7]. In a GLS problem, model resolution refers to the ability of the data assimilation process to reconstruct deviations of the true model from the one predicted by the prior information, alone [1]. Data resolution refers to its ability to reconstruct deviations of the data from the one predicted by the prior information [9]. Model resolution is quantified by a
matrix,
and data resolution by a
matrix,
, that satisfy:
(40)
Figure 1. Hypothetical data assimilation scenario, with
,
,
,
. The initial condition (red box) satisfies
. The dynamical equation requires that the slope (dotted red lines) be about 1/4. The data,
, are noisy versions of state
, which is expected to linearly increase with time with slope, s. When the “present-time” is
, the present-time solution (grey),
, is pulled down by the noisy datum,
, leading to a poor fit to the dynamics at that time. The GLS solution,
, is less affected by the outlier, because the datum at time,
, better defines the linear trend, leading to a solution at
that better matches the dynamics.
Resolution is perfect when
. When
, an element of the reconstructed state vector is a weighted average of all elements of the true state vector. When
, an element of the reconstructed data vector is a weighted average of all elements of the true data vector. The resolution matrices quantify resolution in both time and space. As we will show in the example, below, the model resolution (or data resolution) can be temporally poor, even when it is spatially good.
Another important quantity is the full
posterior covariance matrix,
. In addition to correlations between elements of the state vector at a given time, it contains the correlations between elements of the state vectors at different times. These coefficients are needed for computing confidence intervals of quantities that depend of the state vectors at two or more times. Although
,
and
are large matrices, methods are available for efficiently computing selected elements of them using the conjugate gradient method [1].
Finally, there may be instances in which the covariance,
, is known only up to some scalar parameter,
. For example, the choice
expresses decorrelation over a scale length, p. An initially poor guess of the parameter, p, can be “tuned” using partial derivatives based on the Bayesian principle [15] and the gradient descent method [16].
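The sketch below builds such a single-parameter covariance and the Gaussian negative log-likelihood that a gradient-descent tuning of p would minimize. The exponential-decorrelation form and the likelihood are generic stand-ins for the Bayesian objective referenced in the text.

```python
# Sketch: a covariance known only up to a scalar decorrelation length p,
# C_ij(p) = sigma^2 * exp(-|i - j| / p), and the negative log-likelihood
# that a gradient-descent tuning of p would minimize.
import numpy as np

def cov(p, n, sigma=1.0):
    idx = np.arange(n)
    return sigma**2 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / p)

def neg_log_likelihood(p, residual):
    C = cov(p, residual.size)
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + residual @ np.linalg.solve(C, residual))

residual = np.random.default_rng(0).normal(size=50)   # illustrative misfit vector
costs = [neg_log_likelihood(p, residual) for p in (1.0, 2.0, 5.0, 10.0)]
```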
8. Example
We consider a data assimilation problem based on the heat diffusion Equation (3), with
and
. The state vector,
, represents temperature at position,
, and is of length
. The source is a Gaussian function in space and an impulsive function in time, plus additive noise:
(41)
with scale length,
, and peak position,
. Here,
is a Normally-distributed random variable with zero mean and variance,
. The initial condition is:
(42)
where
and
is a Normally-distributed random variable with zero mean and variance,
. The dynamical Equation (2) is iterated for
time steps, to provide the "true" state vector. As expected, the solution has a Gaussian shape with a width that increases, and an amplitude that decreases, with time (Figure 2(A)). The data are a total of
temperature measurements at each time,
, made at randomly-selected positions (without duplications) and perturbed with Normally-distributed random noise with zero mean and variance,
.
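A compact version of this experimental setup is sketched below: the diffusion dynamics are iterated with a Gaussian-in-space, impulsive-in-time source, and noisy samples are drawn at random positions at each later time. The grid size, number of time steps, and noise levels are placeholder values, not those used in the paper.

```python
# Sketch of the exemplary setup: iterate the diffusion dynamics with a
# Gaussian-in-space, impulsive-in-time source plus noise, then sample noisy
# temperatures at random positions.  M, K, r, and noise levels are placeholders.
import numpy as np
from scipy.sparse import diags

rng = np.random.default_rng(0)
M, K, r = 41, 80, 0.2
z = np.arange(M)
D = diags([np.full(M - 1, r), np.full(M, 1 - 2 * r), np.full(M - 1, r)],
          offsets=[-1, 0, 1]).tocsr()

bump = np.exp(-0.5 * ((z - M // 2) / 3.0) ** 2)       # Gaussian-in-space source
x_true = np.zeros((K, M))
x_true[0] = 0.01 * rng.normal(size=M)                 # noisy initial condition
for i in range(1, K):
    s_i = (bump if i == 1 else 0.0) + 0.01 * rng.normal(size=M)  # impulsive in time
    x_true[i] = D @ x_true[i - 1] + s_i

n_obs = 5                                             # observations per time step
rows = [rng.choice(M, size=n_obs, replace=False) for _ in range(1, K)]
data = [x_true[i][rows[i - 1]] + 0.02 * rng.normal(size=n_obs)
        for i in range(1, K)]
```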
GLS solutions were computed by both the full Thomas algorithm (Figure 2(B)) and by solving the
system by the conjugate gradient method (not
Figure 2. Comparison of solutions. (A) True solution. (B) Difference between the Generalized Least Squares solution and the true solution. (C) Difference between the present-time Thomas solution and the true solution. (D) Difference between the Kalman solution and the true solution. Note that the color scale for differences is expanded with respect to the one for the true model.
shown). They were found to be identical to machine precision. Present-time solutions were computed for both the present-time Thomas (Figure 2(C)) and KF algorithms (Figure 2(D)). They were also found to be identical to machine precision.
In general, both the GLS and present-time solutions fit the data well. However, the present-time solution matches the true model more poorly than does the GLS solution (Figure 3). In this numerical experiment, the present-time solution is about 10% poorer than the GLS solution, quantified with the root mean squared deviation from the true solution. However, the percentage, while always positive, varies considerably when the underlying parameters are changed.
Both the Thomas and Kalman versions of the present-time algorithm are well suited for providing ongoing diagnostic information, such as posterior covariance and root mean squared data fitting error (Figure 4), which can provide quality control in real time applications.
Figure 3. Histogram of ratios of the root mean squared error between the present-time estimate,
,
, and the true state, and the root mean squared error between the GLS estimate,
,
, and the true state, for 1000 realizations of the exemplary data assimilation problem.
Figure 4. Kalman Filtering solution of the exemplary data assimilation problem. (A) the data,
,
. (B) The estimated state
,
. (C) The root mean square data prediction error as a function of time, i. (D) The posterior variance as a function of time, i.
The model resolution matrix,
, (Figure 5(A)) for this exemplary problem has a poorly-populated central diagonal, meaning that some elements of the state vector,
, are well-resolved from their spatial neighbors,
while others are very poorly resolved. The matrix has large elements along other diagonals, corresponding to elements offset from the main diagonal by M rows, indicating that some elements are not well resolved from their temporal neighbors,
. The temporal width of the resolving kernel is about ±4, and the shape is very irregular, indicating very uneven averaging is taking place (Figure 6). The data resolution matrix,
, (Figure 5(B)) has a well-populated central diagonal, meaning that most elements of the predicted data vector,
, are well-resolved from their spatial neighbors,
. Like
, it also has elements along other diagonals, corresponding to elements offset from the main diagonal by M rows, indicating that some elements are not well resolved from their temporal neighbors,
.
Figure 5. Central portion of the resolution matrices for the exemplary data assimilation problem. (A) The model resolution matrix,
. (B) The data resolution matrix,
. The portion shown corresponds to time slices
.
Figure 6. Rows of the model resolution matrix,
(colors), rearranged into an
grid, for three target points (circles) (A)
; (B)
; (C)
. The high intensities are localized near the target points (good), but have a rough pattern (bad).
9. Conclusion
In this paper, we examine a data processing scenario in which real-time data assimilation is performed using Kalman Filtering (KF), and then reanalysis is performed using Generalized Least Squares (GLS). In this problem, the spatial characteristics of the system are described by a state vector (model parameter vector), and its temporal characteristics by the evolution of the state vector with time. We explore the relationship between the real-time Kalman Filter estimate and the GLS reanalysis estimate of the state vector. We show that the KF solution at a given time is equal to the GLS solution that one would obtain if it excluded data for future times. Furthermore, we show that the recursive procedure in KF is exactly equivalent to the solution of the GLS problem via the Thomas algorithm for solving the block-tridiagonal matrix that arises in the reanalysis problem. This connection indicates that GLS reanalysis is better considered the final step of a single process, rather than a "different method" arbitrarily applied post facto. Now that this connection between KF and GLS has been identified, the familiar GLS concepts of model and data resolution can be applied to KF. We provide an exemplary problem, based on thermal diffusion. In addition to showcasing our result, the example demonstrates that the state vector and the vector of predicted data can be poorly resolved in time, even when they are well resolved in space.
Acknowledgements
The author thanks the students in his 2022 Geophysical Inverse Theory course at Columbia University (New York, USA), and especially George Lu, for helpful discussion.
Appendix
A.1) Proof that
. This derivation is well known [6], but is presented here for completeness and to point out potential pitfalls should it be applied incorrectly. Because
and
are independent of one another,
, the normal rules of error propagation apply. The posterior covariance,
, is:
(A.1)
Here, we have used the fact that
, with
. Although
, has the form of a projection operator, it is a function of
, and has deceptive properties. Consider the case in which
, where
is an invertible symmetric matrix. In the limit of the parameter,
, becoming indefinitely large,
does not also become indefinitely large, but rather tends to
, which is the posterior variance that arises from the data, only. The zero limit tends to zero; that is, when
is known very accurately, so is
.
A.2) Derivation of the TV82-A and TV82-B identities, following [6]. Consider invertible symmetric matrices,
and
, and arbitrary matrix,
. The expression
can alternately be factored:
(A.2)
Multiplying by the inverses yields identity TV82-A:
(A.3)
Now consider the expression
, which by the above identity equals
. Factoring out the term in brackets
(A.4)
cancelling terms yields identity TV82-B:
(A.5)
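Both identities can be checked numerically in a few lines; the sketch below states them in the standard Tarantola and Valette (1982) form, with A and B playing the roles of the two symmetric covariance matrices and G an arbitrary rectangular matrix.

```python
# Sketch: numerical check of the two identities, stated here in the standard
# Tarantola & Valette (1982) form; A and B are symmetric positive-definite
# stand-ins for the covariances and G is an arbitrary rectangular matrix.
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 3
G = rng.normal(size=(N, M))
A = np.eye(M) + 0.1 * np.diag(rng.random(M))
B = np.eye(N) + 0.1 * np.diag(rng.random(N))

# TV82-A:  (G^T B^-1 G + A^-1)^-1 G^T B^-1  ==  A G^T (G A G^T + B)^-1
lhs_A = np.linalg.inv(G.T @ np.linalg.inv(B) @ G + np.linalg.inv(A)) @ G.T @ np.linalg.inv(B)
rhs_A = A @ G.T @ np.linalg.inv(G @ A @ G.T + B)
assert np.allclose(lhs_A, rhs_A)

# TV82-B:  (G^T B^-1 G + A^-1)^-1  ==  A - A G^T (G A G^T + B)^-1 G A
lhs_B = np.linalg.inv(G.T @ np.linalg.inv(B) @ G + np.linalg.inv(A))
rhs_B = A - A @ G.T @ np.linalg.inv(G @ A @ G.T + B) @ G @ A
assert np.allclose(lhs_B, rhs_B)
```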
A.3) The Thomas algorithm [10] for a symmetric block-tridiagonal matrix is well known; we reproduce it here for completeness. The ith row of the matrix has elements
,
,
and the right-hand side is
. Consider the step in the upper-triangularization process when rows
and above have been triangularized, but rows
and below have not:
(A.6)
The second row is modified by multiplying the top row by
and adding the result to the second, which eliminates the first term, yielding:
(A.7)
Note that the new row has two terms, and that the coefficient of the second is always
, which is the same pattern as the first row. Thus, the bottom row becomes a new top row, and the recursion is
(A.8)
After the recursion, the matrix is upper-bidiagonal with diagonals,
and
and the right-hand side is
. It is back-solved as:
(A.9)
A.4) Verification of the identity in (36):
(A.10)