Links between Kalman Filtering and Data Assimilation with Generalized Least Squares

Abstract

Kalman filtering (KF) is a popular form of data assimilation, especially in real-time applications. It combines observations with an equation that describes the dynamic evolution of a system to produce an estimate of its present-time state. Although KF does not use future information in producing an estimate of the state vector, later reanalysis of the archival data set can produce an improved estimate, in which all data, past, present and future, contribute. We examine the case in which the reanalysis is performed using generalized least squares (GLS), and establish the relationship between the real-time Kalman estimate and the GLS reanalysis. We show that the KF solution at a given time is equal to the GLS solution that one would obtain if the data excluded future times. Furthermore, we show that the recursive procedure in KF is exactly equivalent to the solution of the GLS problem via Thomas' algorithm for solving the block-tridiagonal matrix that arises in the reanalysis problem. This connection suggests that GLS reanalysis is better considered the final step of a single process, rather than a "different method" arbitrarily being applied, post facto. The connection also allows the concept of resolution, so important in other areas of inverse theory, to be applied to KF formulations. In an exemplary thermal diffusion problem, model resolution is found to be somewhat localized in both time and space, but with an extremely rough averaging kernel.

Share and Cite:

Menke, W. (2022) Links between Kalman Filtering and Data Assimilation with Generalized Least Squares. Applied Mathematics, 13, 566-584. doi: 10.4236/am.2022.136036.

1. Introduction

In this paper, we compare two data assimilation methods that are routinely applied to monitor time-dependent linear systems, one based on Generalized Least Squares (GLS) [1] and the other on Kalman Filtering (KF) [2] [3]. We are motivated by our anecdotal observation that, while both are widely used tools that do broadly similar things, GLS and KF tend to be used by different communities who often think of their method as the "best". Our goal is to enumerate and study the similarities and differences between GLS and KF. In particular, we wish to determine whether or not the same prior information is used in each. Establishing a link between KF and GLS provides a clear pathway for applying GLS concepts, and especially resolution analysis [4] [5], to KF, and also a pathway for extending GLS analysis to the real-time scenarios for which KF is already well-suited.

A synopsis of variables used in this paper is provided in Table 1. At any time, $t_i$, $1 \le i \le K$, a linear system is described by a state vector (model parameter vector), $\mathbf{m}^{(i)} \in \mathbb{R}^M$. This state vector evolves away from the initial condition

$\mathbf{m}^{(1)} = \mathbf{m}_A^{(1)}$ (1)

according to the dynamical equation:

$\mathbf{m}^{(i)} = \mathbf{D}\,\mathbf{m}^{(i-1)} + \mathbf{s}^{(i-1)}$ (2)

Here, $\mathbf{D}$ is the dynamics matrix and $\mathbf{s} \in \mathbb{R}^M$ is the source vector. Neither the initial condition nor the source is known exactly; rather, each has an uncertainty described by its covariance matrix, $\mathbf{C}_{A1}$ and $\mathbf{C}_s$, respectively.

Table 1. List of variables.

This formulation well approximates the behavior of systems described by linear partial differential equations that are first order in time. For example, let $m_n^{(i)} = m(x_n, t_i)$ be temperature at time $t_i = i\,\Delta t$ and position $x_n = n\,\Delta x$, where $\Delta t$ and $\Delta x$ are small increments, and suppose that $m(x,t)$ satisfies the thermal diffusion equation, $\partial m/\partial t = c\,\partial^2 m/\partial x^2 + q$ (with zero boundary conditions). This partial differential equation can be approximated by:

$\dfrac{\mathbf{m}^{(i)} - \mathbf{m}^{(i-1)}}{\Delta t} = \dfrac{c}{(\Delta x)^2}\,\boldsymbol{\Delta}^2\,\mathbf{m}^{(i-1)} + \mathbf{q}^{(i-1)}$ or $\mathbf{m}^{(i)} = \mathbf{D}\,\mathbf{m}^{(i-1)} + \mathbf{s}^{(i-1)}$

with $\mathbf{D} = \dfrac{c\,\Delta t}{(\Delta x)^2}\,\boldsymbol{\Delta}^2 + \mathbf{I}$ and $\mathbf{s}^{(i-1)} = \Delta t\,\mathbf{q}^{(i-1)}$ (3)

which has the form of dynamical Equation (2). Here, the choices:

$\boldsymbol{\Delta}^2 \equiv \begin{bmatrix} 0 & 0 & & & \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ & & & 0 & 0 \end{bmatrix}$ and $\mathbf{q}^{(i-1)} = \begin{bmatrix} 0 \\ q_2^{(i-1)} \\ \vdots \\ q_{M-1}^{(i-1)} \\ 0 \end{bmatrix}$ (4)

encode both the differential equation and the boundary conditions. The matrix, $\mathbf{D}$, is sparse in this example, as well as in many other cases in which it approximates a differential operator.
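To make the discretization concrete, the following sketch (Python with NumPy; all sizes, values and the zero-row boundary treatment are illustrative assumptions, not prescriptions from the text) builds $\boldsymbol{\Delta}^2$ and $\mathbf{D}$ as in Equations (3)-(4) and steps the dynamical Equation (2) forward in time:

    import numpy as np

    M, K = 31, 61                      # grid points and time steps (illustrative)
    dx, dt, c = 1.0, 1.0, 0.4

    # second-difference operator; first and last rows zeroed to hold the
    # boundary values at their (zero) initial values
    L = np.zeros((M, M))
    for j in range(1, M - 1):
        L[j, j - 1], L[j, j], L[j, j + 1] = 1.0, -2.0, 1.0

    D = np.eye(M) + (c * dt / dx**2) * L      # dynamics matrix of Equation (3)

    # hypothetical impulsive Gaussian source applied at the first step
    x = dx * np.arange(M)
    s = np.zeros((K - 1, M))
    s[0] = np.exp(-0.5 * (x - x.mean())**2 / 5.0**2)

    m = np.zeros((K, M))               # m[0] is the (zero) initial condition
    for i in range(1, K):
        m[i] = D @ m[i - 1] + s[i - 1]        # dynamical Equation (2)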

The data equation expresses the relationship between the state vector and the observables:

$\mathbf{G}^{(i)}\mathbf{m}^{(i)} = \mathbf{d}^{(i)}$ (5)

Here, $\mathbf{d}^{(i)} \in \mathbb{R}^N$ is the data vector, known to an accuracy described by the covariance matrix $\mathbf{C}_d$, and $\mathbf{G}^{(i)}$ is the data kernel. In the simplest case, the observations may be of selected elements of the state vector itself, in which case each row of $\mathbf{G}^{(i)}$ is zero except for a single element, say in column $k$, which is unity. Here, $k(i,n)$ is a function that associates $d_n^{(i)}$ with $m_k^{(i)}$. Other more complicated relationships are possible. For instance, in tomography, a datum is a line integral through $m(x,y,t)$ (with $y$ another spatial dimension). The data kernel, $\mathbf{G}^{(i)}$, is sparse in these two cases. In other cases, it may not be sparse.
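For the element-sampling case just described, the data kernel is simply a row-selection matrix. A minimal sketch (the sampled positions are arbitrary placeholders):

    import numpy as np

    M, N = 31, 10
    sampled = np.random.default_rng(0).choice(M, size=N, replace=False)

    G_i = np.zeros((N, M))
    G_i[np.arange(N), sampled] = 1.0   # row n has a single 1 in column k(i, n)

    # applying G_i to a state vector just picks out the sampled elements
    m_i = np.linspace(0.0, 1.0, M)
    d_i = G_i @ m_i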

The data assimilation problem is to estimate the set of state vectors, $\{\mathbf{m}^{(i)} : 1 \le i \le K\}$, using the dynamical equation, the data equation and the initial condition. One possible approach is based on Generalized Least Squares (GLS); another upon Kalman Filtering (KF). In this paper, we demonstrate that these two apparently very different methods are, in fact, exactly equivalent. In order to simplify notation, we concatenate the state vectors for all times into an overall vector, $\mathbf{m} = [\mathbf{m}^{(1)}, \cdots, \mathbf{m}^{(K)}]$, and the data vectors into an overall vector, $\mathbf{d} = [\mathbf{d}^{(2)}, \cdots, \mathbf{d}^{(K)}]$. By assumption, no observations are made at time $i = 1$.

2. Generalized Least Squares Applied to the Data Assimilation Problem

Generalized Least Squares (GLS) [1] [6] [7] [8] is a technique used to estimate m when two types of information are available: prior information and data. By prior information, we mean expectations about the behavior of m that are based on past experience or general physical considerations. The dynamical equation and initial condition discussed in the previous section are examples of prior information. By data we mean direct observations, as typified by the data equation discussed in the previous section.

Prior information can be represented by the linear equation, $\mathbf{H}\mathbf{m} = \mathbf{h}$ (with $\mathbf{h} \in \mathbb{R}^L$ and where the equation is accurate to covariance $\mathbf{C}_h$), and observations can be represented by the linear equation, $\mathbf{G}\mathbf{m} = \mathbf{d}$ (with covariance $\mathbf{C}_o$). The Bayesian principle leads to the optimal solution, which we denote the Generalized Least Squares (GLS) solution, $\mathbf{m}_G$ [1] [6] [7] [8]. It minimizes a combination of the weighted $L_2$ error in the prior information and the weighted $L_2$ error in the data (where the weighting depends upon the covariances).

Several equivalent forms of the GLS solution, $\mathbf{m}_G$, and its posterior covariance, $\mathbf{C}_m$, are common in the literature. We enumerate a few of the more commonly-used forms here:

Form 1 [8] groups the prior information and data equations into a single equation, $\mathbf{F}\mathbf{m} = \mathbf{f}$:

$\mathbf{m}_G = [\mathbf{F}^T\mathbf{F}]^{-1}\mathbf{F}^T\mathbf{f}$ with $\mathbf{F} \equiv \begin{bmatrix} \mathbf{C}_h^{-1/2}\mathbf{H} \\ \mathbf{C}_o^{-1/2}\mathbf{G} \end{bmatrix}$ and $\mathbf{f} \equiv \begin{bmatrix} \mathbf{C}_h^{-1/2}\mathbf{h} \\ \mathbf{C}_o^{-1/2}\mathbf{d} \end{bmatrix}$

$\mathbf{C}_m = [\mathbf{F}^T\mathbf{F}]^{-1}$ (6)

Note that the factors of $\mathbf{C}_h^{-1/2}$ and $\mathbf{C}_o^{-1/2}$ are weights proportional to "certainty".
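Form 1 translates directly into a few lines of linear algebra. The sketch below is a toy illustration with random placeholder matrices and diagonal covariances (so that the matrix square roots reduce to element-wise square roots):

    import numpy as np

    rng = np.random.default_rng(1)
    M, N, L = 4, 6, 4                           # sizes are illustrative
    H, h = np.eye(L), rng.normal(size=L)        # prior information H m = h
    G, d = rng.normal(size=(N, M)), rng.normal(size=N)   # data equation G m = d
    Ch, Co = 0.1 * np.eye(L), 0.2 * np.eye(N)   # diagonal covariances (assumed)

    # weight each equation by the inverse square root of its covariance
    Wh, Wo = np.linalg.inv(np.sqrt(Ch)), np.linalg.inv(np.sqrt(Co))
    F = np.vstack([Wh @ H, Wo @ G])
    f = np.concatenate([Wh @ h, Wo @ d])

    mG = np.linalg.solve(F.T @ F, F.T @ f)      # Equation (6)
    Cm = np.linalg.inv(F.T @ F)                 # posterior covariance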

Form 2 [1] introduces generalized inverses, $\mathbf{G}^{-g}$ and $\mathbf{H}^{-g}$:

$\mathbf{m}_G = \mathbf{G}^{-g}\mathbf{d} + \mathbf{H}^{-g}\mathbf{h}$

with $\mathbf{G}^{-g} \equiv \mathbf{A}^{-1}\mathbf{G}^T\mathbf{C}_o^{-1}$ and $\mathbf{H}^{-g} \equiv \mathbf{A}^{-1}\mathbf{H}^T\mathbf{C}_h^{-1}$ and $\mathbf{A} \equiv \mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{G} + \mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{H}$

$\mathbf{C}_m = \mathbf{A}^{-1}$ (7)

Here, $\mathbf{m}_G$ is the estimated solution and $\mathbf{C}_m$ is its posterior covariance.

Form 3 [7] organizes the solution in terms of the prior state vector, $\mathbf{m}_A$; that is, the state vector implied by the prior information, acting alone:

$\mathbf{m}_G = \mathbf{G}^{-g}\mathbf{d} + \mathbf{P}_G\,\mathbf{m}_A$ with $\mathbf{P}_G \equiv (\mathbf{I} - \mathbf{G}^{-g}\mathbf{G})$

and with $\mathbf{m}_A \equiv \begin{cases} \mathbf{H}^{-1}\mathbf{h} & \text{when } \mathbf{H}^{-1} \text{ exists} \\ [\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{H}]^{-1}\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{h} & \text{otherwise} \end{cases}$ and $\mathbf{C}_A = [\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{H}]^{-1}$

$\mathbf{C}_m = \mathbf{P}_G\,\mathbf{C}_A$ (8)

The matrix, $\mathbf{P}_G$, plays the role of a projection matrix. See Appendix A.1 for a derivation of the covariance equation.

Form 4 [1] introduces the deviation, $\Delta\mathbf{m}$, of the solution from the prior state vector, and the corresponding deviation, $\Delta\mathbf{d}$, of the data from the data predicted by the prior state vector:

$\Delta\mathbf{m} = \mathbf{G}^{-g}\Delta\mathbf{d}$ with $\Delta\mathbf{m} \equiv \mathbf{m}_G - \mathbf{m}_A$ and $\Delta\mathbf{d} \equiv \mathbf{d} - \mathbf{G}\mathbf{m}_A$ (9)

Finally, Form 5 [6] [9] uses an alternate form of the generalized inverse:

$\Delta\mathbf{m} = \mathbf{G}^{-g}\Delta\mathbf{d}$ with $\mathbf{G}^{-g} \equiv \mathbf{C}_A\mathbf{G}^T[\mathbf{C}_o + \mathbf{G}\mathbf{C}_A\mathbf{G}^T]^{-1}$ (10)

The equality of the two forms of $\mathbf{G}^{-g}$ was proven by [6] using a matrix identity that we denote TV82-A (Appendix A.2). Because $\mathbf{A}$ is $M \times M$ and $[\mathbf{C}_o + \mathbf{G}\mathbf{C}_A\mathbf{G}^T]$ is $N \times N$, the first form is most useful when $M < N$; the second when $M > N$. However, a decision to use one or the other must also take into consideration the sparsity of the various matrix products.
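The equality of the two forms is easy to confirm numerically. In the sketch below, the prior information has already been folded into a prior covariance $\mathbf{C}_A$, and all matrices are random placeholders:

    import numpy as np

    rng = np.random.default_rng(2)
    M, N = 5, 3
    G = rng.normal(size=(N, M))
    X = rng.normal(size=(M, M)); CA = X @ X.T + np.eye(M)   # prior covariance (SPD)
    Y = rng.normal(size=(N, N)); Co = Y @ Y.T + np.eye(N)   # data covariance (SPD)

    # Form 2-style generalized inverse: [G^T Co^-1 G + CA^-1]^-1 G^T Co^-1
    A = G.T @ np.linalg.inv(Co) @ G + np.linalg.inv(CA)
    Gg_form2 = np.linalg.inv(A) @ G.T @ np.linalg.inv(Co)

    # Form 5 generalized inverse: CA G^T [Co + G CA G^T]^-1
    Gg_form5 = CA @ G.T @ np.linalg.inv(Co + G @ CA @ G.T)

    print(np.allclose(Gg_form2, Gg_form5))    # True, by identity TV82-A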

Form 4 is derived from Form 3 by subtracting $\mathbf{A}\mathbf{m}_A$ from both sides of the Gram equation,

$\mathbf{A}(\mathbf{m}_G - \mathbf{m}_A) = \mathbf{a} - \mathbf{A}\mathbf{m}_A$

$[\mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{G} + \mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{H}](\mathbf{m}_G - \mathbf{m}_A) = \mathbf{G}^T\mathbf{C}_o^{-1}(\mathbf{d} - \mathbf{G}\mathbf{m}_A) + \mathbf{H}^T\mathbf{C}_h^{-1}(\mathbf{h} - \mathbf{H}\mathbf{m}_A)$ (11)

and then by requiring that the second term on the right-hand side vanish, which leads to:

$\mathbf{A}\,\Delta\mathbf{m} = \mathbf{G}^T\mathbf{C}_o^{-1}\Delta\mathbf{d}$ with $\Delta\mathbf{d} = \mathbf{d} - \mathbf{G}\mathbf{m}_A$ and $\mathbf{m}_A = [\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{H}]^{-1}\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{h}$ (12)

That is, $\mathbf{m}_A$ is due to the prior information acting alone. The deviatoric manipulation is completely general; alternately, the first term could have been made to vanish, which leads to:

$\mathbf{A}\,\Delta\mathbf{m} = \mathbf{H}^T\mathbf{C}_h^{-1}\Delta\mathbf{h}$ with $\Delta\mathbf{h} = \mathbf{h} - \mathbf{H}\mathbf{m}_D$ and $\mathbf{m}_D = [\mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{G}]^{-1}\mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{d}$ (13)

Here, $\mathbf{m}_D$ is due to the data acting alone. Note that deviatoric manipulations of this type never change the form of the matrix, $\mathbf{A}$. We will apply this principle later in the paper.

In the subsequent analysis, we will focus on the Gram equation:

$\mathbf{A}\,\mathbf{m} = \mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{d} + \mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{h} \equiv \mathbf{a}$ (14)

The initial condition and the dynamical equation can be combined into a single prior information equation of the form $\mathbf{H}\mathbf{m} = \mathbf{h}$:

$\begin{bmatrix} \mathbf{I} & & & & \\ -\mathbf{D} & \mathbf{I} & & & \\ & -\mathbf{D} & \mathbf{I} & & \\ & & \ddots & \ddots & \\ & & & -\mathbf{D} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{m}^{(1)} \\ \mathbf{m}^{(2)} \\ \mathbf{m}^{(3)} \\ \vdots \\ \mathbf{m}^{(K)} \end{bmatrix} = \begin{bmatrix} \mathbf{m}_A^{(1)} \\ \mathbf{s}^{(1)} \\ \mathbf{s}^{(2)} \\ \vdots \\ \mathbf{s}^{(K-1)} \end{bmatrix}$ with covariance $\mathbf{C}_h$ (15)

Here, $\mathbf{C}_h \equiv \mathrm{diag}(\mathbf{C}_{A1}, \mathbf{C}_s, \mathbf{C}_s, \cdots, \mathbf{C}_s)$. Several quantities derived from $\mathbf{H}$, and which we will use later, are:

$\mathbf{H}^T = \begin{bmatrix} \mathbf{I} & -\mathbf{D}^T & & & \\ & \mathbf{I} & -\mathbf{D}^T & & \\ & & \ddots & \ddots & \\ & & & \mathbf{I} & -\mathbf{D}^T \\ & & & & \mathbf{I} \end{bmatrix}$ and $\mathbf{H}^{-1} = \begin{bmatrix} \mathbf{I} & & & & \\ \mathbf{D} & \mathbf{I} & & & \\ \mathbf{D}^2 & \mathbf{D} & \mathbf{I} & & \\ \mathbf{D}^3 & \mathbf{D}^2 & \mathbf{D} & \mathbf{I} & \\ \vdots & & & & \ddots \end{bmatrix}$

$\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{H} = \begin{bmatrix} \mathbf{C}_{A1}^{-1} + \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} & -\mathbf{D}^T\mathbf{C}_s^{-1} & & & \\ -\mathbf{C}_s^{-1}\mathbf{D} & \mathbf{C}_s^{-1} + \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} & -\mathbf{D}^T\mathbf{C}_s^{-1} & & \\ & -\mathbf{C}_s^{-1}\mathbf{D} & \ddots & \ddots & \\ & & \ddots & \mathbf{C}_s^{-1} + \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} & -\mathbf{D}^T\mathbf{C}_s^{-1} \\ & & & -\mathbf{C}_s^{-1}\mathbf{D} & \mathbf{C}_s^{-1} \end{bmatrix}$

$\mathbf{H}^T\mathbf{C}_h^{-1}\mathbf{h} = \begin{bmatrix} \mathbf{C}_{A1}^{-1}\mathbf{m}_A^{(1)} - \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(1)} \\ \mathbf{C}_s^{-1}\mathbf{s}^{(1)} - \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(2)} \\ \vdots \\ \mathbf{C}_s^{-1}\mathbf{s}^{(K-2)} - \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(K-1)} \\ \mathbf{C}_s^{-1}\mathbf{s}^{(K-1)} \end{bmatrix}$ (16)

The existence of $\mathbf{H}^{-1}$ implies that $\mathbf{m}_A$ can be uniquely specified. The data equation expresses the relationship between the state vector and the observables and, presuming that no data are available for time $i = 1$, has the form:

$\begin{bmatrix} \mathbf{0} & \mathbf{G}^{(2)} & & \\ \vdots & & \ddots & \\ \mathbf{0} & & & \mathbf{G}^{(K)} \end{bmatrix} \begin{bmatrix} \mathbf{m}^{(1)} \\ \mathbf{m}^{(2)} \\ \vdots \\ \mathbf{m}^{(K)} \end{bmatrix} = \begin{bmatrix} \mathbf{d}^{(2)} \\ \vdots \\ \mathbf{d}^{(K)} \end{bmatrix}$ (17)

This equation is taken to have an accuracy described by the summary covariance matrix, $\mathbf{C}_o \equiv \mathrm{diag}(\mathbf{C}_d, \mathbf{C}_d, \mathbf{C}_d, \cdots, \mathbf{C}_d)$. We note that:

$\mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{G} = \begin{bmatrix} \mathbf{0} & & & \\ & \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{G}^{(2)} & & \\ & & \ddots & \\ & & & \mathbf{G}^{(K)T}\mathbf{C}_d^{-1}\mathbf{G}^{(K)} \end{bmatrix}$ and $\mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{d} = \begin{bmatrix} \mathbf{0} \\ \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{d}^{(2)} \\ \mathbf{G}^{(3)T}\mathbf{C}_d^{-1}\mathbf{d}^{(3)} \\ \vdots \\ \mathbf{G}^{(K)T}\mathbf{C}_d^{-1}\mathbf{d}^{(K)} \end{bmatrix}$ (18)

The matrix, $\mathbf{A}$, in the Gram equation is symmetric and block-tridiagonal:

$\begin{bmatrix} \mathbf{A}_1 & \mathbf{B}^T & & \\ \mathbf{B} & \mathbf{A}_2 & \mathbf{B}^T & \\ & \ddots & \ddots & \ddots \\ & & \mathbf{B} & \mathbf{A}_K \end{bmatrix} \begin{bmatrix} \mathbf{m}^{(1)} \\ \mathbf{m}^{(2)} \\ \mathbf{m}^{(3)} \\ \vdots \\ \mathbf{m}^{(K)} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \mathbf{a}_3 \\ \vdots \\ \mathbf{a}_K \end{bmatrix}$ (19)

with elements:

$\mathbf{A}_i = \begin{cases} \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1} & (i = 1) \\ \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_s^{-1} + \mathbf{G}^{(i)T}\mathbf{C}_d^{-1}\mathbf{G}^{(i)} & (1 < i < K) \\ \mathbf{C}_s^{-1} + \mathbf{G}^{(K)T}\mathbf{C}_d^{-1}\mathbf{G}^{(K)} & (i = K) \end{cases}$

$\mathbf{B} = -\mathbf{C}_s^{-1}\mathbf{D}$ (20)

The vector, a , on the right-hand side of the Gram equation, is:

$\mathbf{a}_i = \begin{cases} -\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(1)} + \mathbf{C}_{A1}^{-1}\mathbf{m}_A^{(1)} & (i = 1) \\ -\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(i)} + \mathbf{C}_s^{-1}\mathbf{s}^{(i-1)} + \mathbf{G}^{(i)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i)} & (1 < i < K) \\ \mathbf{C}_s^{-1}\mathbf{s}^{(i-1)} + \mathbf{G}^{(i)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i)} & (i = K) \end{cases}$ (21)

Here, we define $\mathbf{s}^{(K)}$ to be zero, so that the $1 < i < K$ formula also applies at $i = K$.
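The blocks of Equations (19)-(21) can be assembled mechanically. The sketch below does so under the simplifying assumptions of a time-independent $\mathbf{D}$ and small dense matrices; the function name assemble_gram_blocks is ours, introduced only for illustration:

    import numpy as np

    def assemble_gram_blocks(D, Gk, CA1, Cs, Cd, mA1, s, d):
        """Blocks A_i, B and right-hand sides a_i of Equations (19)-(21).
        Gk[i], d[i] hold the data kernel and data at time index i (unused for i = 0);
        s[i] is the source driving the step from time index i to i + 1."""
        K = len(s) + 1
        iCs, iCd, iCA1 = np.linalg.inv(Cs), np.linalg.inv(Cd), np.linalg.inv(CA1)
        B = -iCs @ D
        A, a = [], []
        for i in range(K):
            Ai = iCA1 if i == 0 else iCs
            ai = iCA1 @ mA1 if i == 0 else iCs @ s[i - 1]
            if i < K - 1:                      # coupling to the next time step
                Ai = Ai + D.T @ iCs @ D
                ai = ai - D.T @ iCs @ s[i]
            if i > 0:                          # data exist only for i >= 1 (time 2)
                Ai = Ai + Gk[i].T @ iCd @ Gk[i]
                ai = ai + Gk[i].T @ iCd @ d[i]
            A.append(Ai)
            a.append(ai)
        return A, B, a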

3. Recursive Solution Using the Thomas Method

Insight into the behavior of the GLS solution can be gained by solving the Gram equation iteratively [10]. We use the Thomas method [11] [12] (see Appendix A.3), though other methods are viable alternatives. It consists of a forward-in-time pass through the system that recursively calculates two quantities, $\hat{\mathbf{A}}_i$ and $\hat{\mathbf{a}}_i$:

$\hat{\mathbf{A}}_i^{-1} \equiv \begin{cases} \mathbf{A}_1^{-1} & (i = 1) \\ [\mathbf{A}_i - \mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}\mathbf{B}^T]^{-1} & (i > 1) \end{cases}$

$\hat{\mathbf{a}}_i \equiv \begin{cases} \mathbf{a}_1 & (i = 1) \\ \mathbf{a}_i - \mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}\hat{\mathbf{a}}_{i-1} & (i > 1) \end{cases}$ (22)

After the forward recursion, the system is block-upper-bidiagonal, with row $i$ having elements $\hat{\mathbf{A}}_i$ and $\mathbf{B}^T$ (except for the last row, which lacks the $\mathbf{B}^T$), and the modified right-hand side is $\hat{\mathbf{a}}_i$. The solution, $\mathbf{m}_G^{(i)}$, is obtained through a backward recursion:

$\mathbf{m}_G^{(i)} = \begin{cases} \hat{\mathbf{A}}_K^{-1}\hat{\mathbf{a}}_K & (i = K) \\ \hat{\mathbf{A}}_i^{-1}[\hat{\mathbf{a}}_i - \mathbf{B}^T\mathbf{m}_G^{(i+1)}] & (i < K) \end{cases}$ (23)

It is evident that information is propagated both forward and backward in time during the solution process. Furthermore, computation time grows no faster than the number of steps, K, in the recursion.
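A compact transcription of the forward recursion (22) and backward recursion (23) is given below (a sketch; it consumes the block lists produced by the hypothetical assemble_gram_blocks helper of the earlier sketch):

    import numpy as np

    def block_thomas_solve(A, B, a):
        """Solve the block-tridiagonal system (19) by Equations (22)-(23)."""
        K = len(A)
        Ahat, ahat = [A[0]], [a[0]]
        for i in range(1, K):                        # forward pass, Equation (22)
            W = B @ np.linalg.inv(Ahat[i - 1])
            Ahat.append(A[i] - W @ B.T)
            ahat.append(a[i] - W @ ahat[i - 1])
        m = [None] * K
        m[K - 1] = np.linalg.solve(Ahat[K - 1], ahat[K - 1])
        for i in range(K - 2, -1, -1):               # backward pass, Equation (23)
            m[i] = np.linalg.solve(Ahat[i], ahat[i] - B.T @ m[i + 1])
        return m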

The Thomas method has a disadvantage in the common case where the covariance matrices, $\mathbf{C}_{A1}$, $\mathbf{C}_s$ and $\mathbf{C}_d$, are diagonal and $\mathbf{D}$ and $\mathbf{G}^{(i)}$ are sparse, because although $\mathbf{F}$ is then also sparse, the matrices, $\hat{\mathbf{A}}_i^{-1}$, are in general not sparse, so the effort needed to compute them scales with $M^3$. Other direct methods share this limitation, too. Consequently, the overall calculation scales with $KM^3$. An iterative method [10], such as the conjugate gradient method, applied to the Gram equation, $\mathbf{F}^T\mathbf{F}\mathbf{m} = \mathbf{F}^T\mathbf{f}$, is usually a better choice. This method requires that the quantity, $\mathbf{u} = (\mathbf{F}^T\mathbf{F})\mathbf{v}$, be calculated for an arbitrary vector, $\mathbf{v}$, and this quantity can be computed very efficiently as $\mathbf{u} = \mathbf{F}^T(\mathbf{F}\mathbf{v})$ [13]. In cases in which the dynamical equation approximates a partial differential equation, the number of non-zero elements in the matrix, $\mathbf{F}$, scales with $KM$. The conjugate gradient algorithm requires no more than $KM$ iterations (and often far fewer), each requiring of order $KM$ multiplications. Thus, the overall solution time scales with $K^2M^2$. Consequently, the conjugate gradient method has a speed advantage when $M > K$.
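The matrix-free strategy described above amounts to supplying the product $\mathbf{u} = \mathbf{F}^T(\mathbf{F}\mathbf{v})$ to an iterative solver without ever forming $\mathbf{F}^T\mathbf{F}$. A sketch using SciPy's conjugate gradient solver, with a random sparse matrix standing in for the real $\mathbf{F}$:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(3)
    n_rows, n_cols = 2000, 500
    F = sp.random(n_rows, n_cols, density=0.01, random_state=1, format="csr")
    F = F + sp.eye(n_rows, n_cols)         # keep F^T F nonsingular for the demo
    f = rng.normal(size=n_rows)

    # normal-equations product computed as F^T (F v), never forming F^T F
    op = LinearOperator((n_cols, n_cols), matvec=lambda v: F.T @ (F @ v))
    m, info = cg(op, F.T @ f, atol=1e-10)
    print(info)                            # 0 indicates convergence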

4. Present-Time Solution

Suppose that the analysis focuses on the "present time", $i$, in the sense that only the solution, $\mathbf{m}_P^{(i)}$, determined using data up to and including time $i$, is of interest. One can assemble a sequence of present-time solutions during the forward recursion, using the fact that the $i$th solution can always be considered to be the final one. No backwards recursion is needed to compute the solution for the final time. However, the forms of the "final" $\hat{\mathbf{A}}_i$ and $\hat{\mathbf{a}}_i$ differ from those of the interior rows of the recursion, so a separate computation is needed:

$\mathbf{m}_P^{(i)} = \hat{\mathbf{A}}_i'^{-1}\hat{\mathbf{a}}_i'$

with $\hat{\mathbf{A}}_i' = \hat{\mathbf{A}}_i - \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D}$ and $\hat{\mathbf{a}}_i' = \hat{\mathbf{a}}_i + \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(i)}$ (24)

Here, the primes distinguish these "final-row" forms from the $\hat{\mathbf{A}}_i$ and $\hat{\mathbf{a}}_i$ of the interior recursion (22).

Consequently, in order to create a sequence of present-time solutions, two linear systems must be solved at each step of the forward recursion. The present-time solution is the same as the reference solution, $\mathbf{m}_D^{(i)}$, defined in the previous section.

Kalman Filtering

Kalman Filtering (KF) is a solution method with an algorithm that, like the Thomas present-time solution, is forward-in-time only [2] [3]. It consists of four steps, the final three of which are iterated.

Step 1 assigns the $i = 1$ solution, $\mathbf{m}_K^{(1)}$, and its covariance, $\mathbf{C}_m^{(1)}$:

$\mathbf{m}_K^{(1)} = \mathbf{m}_A^{(1)}$ and $\mathbf{C}_m^{(1)} = \mathbf{C}_{A1}$ (25)

Step 2 propagates the solution and its covariance forward in time using the dynamical equation, and considers the result to be prior information:

$\mathbf{m}_A^{(i)} = \mathbf{D}\mathbf{m}_K^{(i-1)} + \mathbf{s}^{(i-1)}$ and $\mathbf{C}_A^{(i)} = \mathbf{D}\mathbf{C}_m^{(i-1)}\mathbf{D}^T + \mathbf{C}_s$ (26)

Step 3 uses GLS to combine the prior information, $\mathbf{m}_A^{(i)}$, with covariance $\mathbf{C}_A^{(i)}$, and the data, $\mathbf{d}^{(i)}$, with covariance $\mathbf{C}_d$, into a solution, $\mathbf{m}_K^{(i)}$, with covariance $\mathbf{C}_m^{(i)}$. Any of the equivalent forms of the GLS solution described in Section 2 can be used in this step.

Step 4 increments $i$ and returns to Step 2, creating a recursion.

Most often, the GLS solution and its covariance are written as:

$\mathbf{m}_K^{(i)} = \mathbf{m}_A^{(i)} + \mathbf{G}_i^{-g}\Delta\mathbf{d}^{(i)}$ with $\Delta\mathbf{d}^{(i)} = \mathbf{d}^{(i)} - \mathbf{G}^{(i)}\mathbf{m}_A^{(i)}$

with $\mathbf{G}_i^{-g} = \mathbf{C}_A^{(i)}\mathbf{G}^{(i)T}[\mathbf{C}_d + \mathbf{G}^{(i)}\mathbf{C}_A^{(i)}\mathbf{G}^{(i)T}]^{-1}$ and $\mathbf{C}_A^{(i)} = \mathbf{D}\mathbf{C}_m^{(i-1)}\mathbf{D}^T + \mathbf{C}_s$

and $\mathbf{C}_m^{(i)} = [\mathbf{I} - \mathbf{G}_i^{-g}\mathbf{G}^{(i)}]\mathbf{C}_A^{(i)} = \mathbf{P}_G^{(i)}\mathbf{C}_A^{(i)}$ with $\mathbf{P}_G^{(i)} \equiv [\mathbf{I} - \mathbf{G}_i^{-g}\mathbf{G}^{(i)}]$ (27)

However, any of the equivalent forms described above can substitute, such as:

$\mathbf{m}_K^{(i)} = \tilde{\mathbf{A}}_i^{-1}\tilde{\mathbf{a}}_i$

$\mathbf{C}_A^{(i)} = \mathbf{D}\mathbf{C}_m^{(i-1)}\mathbf{D}^T + \mathbf{C}_s$ and $\tilde{\mathbf{A}}_i^{-1} = \left[\mathbf{G}^{(i)T}\mathbf{C}_d^{-1}\mathbf{G}^{(i)} + [\mathbf{C}_A^{(i)}]^{-1}\right]^{-1} = \mathbf{C}_m^{(i)}$

$\tilde{\mathbf{a}}_i = \mathbf{G}^{(i)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i)} + [\mathbf{C}_A^{(i)}]^{-1}\mathbf{D}\mathbf{m}_K^{(i-1)} + [\mathbf{C}_A^{(i)}]^{-1}\mathbf{s}^{(i-1)}$ (28)
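Steps 1-4 translate into a short loop. The sketch below implements the common form of Equation (27); the argument names are ours, and a constant $\mathbf{C}_d$ is assumed:

    import numpy as np

    def kalman_filter(D, G_list, d_list, s, mA1, CA1, Cs, Cd):
        """Kalman filtering, Equations (25)-(27). G_list[i], d_list[i] hold the
        data kernel and data for time index i (ignored for i = 0)."""
        K = len(s) + 1
        m, Cm = [mA1], [CA1]                       # Step 1: initial condition
        for i in range(1, K):
            mA = D @ m[i - 1] + s[i - 1]           # Step 2: propagate the state
            CA = D @ Cm[i - 1] @ D.T + Cs          #         and its covariance
            Gi, di = G_list[i], d_list[i]
            Gg = CA @ Gi.T @ np.linalg.inv(Cd + Gi @ CA @ Gi.T)   # generalized inverse
            m.append(mA + Gg @ (di - Gi @ mA))     # Step 3: combine prior and data
            Cm.append((np.eye(len(mA)) - Gg @ Gi) @ CA)
        return m, Cm                               # Step 4 is the loop itself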

5. Kalman Filtering Is Not “Filtering” in the Strict Sense

A standard Infinite Impulse Response (IIR) filter has the form $\mathbf{v} * \mathbf{m} = \mathbf{u} * \mathbf{z}$, where $\mathbf{z}$ is the "input" timeseries, $\mathbf{m}$ is the "output" timeseries, $\mathbf{u}$ and $\mathbf{v}$ are filters (with $v_1 = 1$), and $*$ signifies convolution [14]. Key to this formulation is that the filter coefficients are constants; that is, they are not a function of time.

If the generalized inverse in KF were time-independent, so that $\mathbf{G}_i^{-g} = \mathbf{G}^{-g}$, then KF could be put into the form of an IIR filter:

$\begin{bmatrix} [(\mathbf{G}^{-g}\mathbf{G} - \mathbf{I})\mathbf{D}] & [\mathbf{I}] \end{bmatrix} \begin{bmatrix} \mathbf{m}^{(i-1)} \\ \mathbf{m}^{(i)} \end{bmatrix} = \begin{bmatrix} \mathbf{G}^{-g} & (\mathbf{I} - \mathbf{G}^{-g}\mathbf{G}) \end{bmatrix} \begin{bmatrix} \mathbf{d}^{(i)} \\ \mathbf{s}^{(i-1)} \end{bmatrix}$ (29)

because the convolution reproduces the KF solution:

$\mathbf{m}^{(i)} = \mathbf{D}\mathbf{m}^{(i-1)} + \mathbf{s}^{(i-1)} + \mathbf{G}^{-g}\left(\mathbf{d}^{(i)} - \mathbf{G}(\mathbf{D}\mathbf{m}^{(i-1)} + \mathbf{s}^{(i-1)})\right)$ (30)

So, from this point of view, the KF has a $\mathbf{v}$ of length 2 (each element of which is a matrix), and a $\mathbf{u}$ of length 1 (each element of which is a row vector of two matrices). However, this formulation does not really correspond to a standard IIR filter, because the filter coefficients, which depend upon the generalized inverse, $\mathbf{G}_i^{-g}$, vary with time, $i$. Hence, the word "filter", though generally indicative of the KF process, oversimplifies the actual operation being performed. KF is not filtering in the strict sense.

6. The Present-Time Thomas and Kalman Filtering Solutions Are Equal

We will now demonstrate that the present-time Thomas solution, $\mathbf{m}_P^{(i)}$, and the Kalman filtering solution, $\mathbf{m}_K^{(i)}$, are equal. We will make use of an identity, abbreviated TV82-B, that is due to [6], which shows that for invertible symmetric matrices $\mathbf{C}_1$ and $\mathbf{C}_2$ and an arbitrary matrix, $\mathbf{M}$:

$\mathbf{C}_2 - \mathbf{C}_2\mathbf{M}^T[\mathbf{M}\mathbf{C}_2\mathbf{M}^T + \mathbf{C}_1]^{-1}\mathbf{M}\mathbf{C}_2 = [\mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M} + \mathbf{C}_2^{-1}]^{-1}$ (31)

Thus, for instance, when $\mathbf{C}_1^{-1} = \mathbf{C}_m^{(i-1)}$, $\mathbf{C}_2^{-1} = \mathbf{C}_s$ and $\mathbf{M} = \mathbf{D}^T$:

$[\mathbf{C}_A^{(i)}]^{-1} = [\mathbf{D}\mathbf{C}_m^{(i-1)}\mathbf{D}^T + \mathbf{C}_s]^{-1} = \mathbf{C}_s^{-1} - \mathbf{C}_s^{-1}\mathbf{D}\left[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + [\mathbf{C}_m^{(i-1)}]^{-1}\right]^{-1}\mathbf{D}^T\mathbf{C}_s^{-1}$ (32)

The present-time Thomas (PT) and KF recursions both start with $\mathbf{m}_K^{(1)} = \mathbf{m}_P^{(1)} = \mathbf{m}_A^{(1)}$ and $\mathbf{C}_m^{K(1)} = \mathbf{C}_m^{P(1)} = \mathbf{C}_{A1}$. The $i = 2$ case is irregular, and must be examined separately. The KF solution is:

$\mathbf{m}_K^{(2)} = \tilde{\mathbf{A}}_2^{-1}\tilde{\mathbf{a}}_2$

with $\tilde{\mathbf{A}}_2 = \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{G}^{(2)} + [\mathbf{C}_A^{(2)}]^{-1}$ with $\mathbf{C}_A^{(2)} = \mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s$

and $\tilde{\mathbf{a}}_2 = \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{d}^{(2)} + [\mathbf{C}_A^{(2)}]^{-1}\mathbf{D}\mathbf{m}_K^{(1)} + [\mathbf{C}_A^{(2)}]^{-1}\mathbf{s}^{(1)}$ (33)

This can be compared with the present-time Thomas solution:

$\mathbf{m}_P^{(2)} = \hat{\mathbf{A}}_2'^{-1}\hat{\mathbf{a}}_2'$

with $\hat{\mathbf{A}}_2' = \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{G}^{(2)} + \mathbf{C}_s^{-1} - \mathbf{C}_s^{-1}\mathbf{D}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}]^{-1}\mathbf{D}^T\mathbf{C}_s^{-1} = \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{G}^{(2)} + [\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s]^{-1}$

$\hat{\mathbf{a}}_2' = \mathbf{G}^{(2)T}\mathbf{C}_d^{-1}\mathbf{d}^{(2)} + \mathbf{C}_s^{-1}\mathbf{s}^{(1)} - \mathbf{C}_s^{-1}\mathbf{D}\hat{\mathbf{A}}_1^{-1}\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(1)} + \mathbf{C}_s^{-1}\mathbf{D}\hat{\mathbf{A}}_1^{-1}\mathbf{C}_{A1}^{-1}\mathbf{m}_A^{(1)}$ with $\hat{\mathbf{A}}_1 = \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}$ (34)

Note that we used TV82-B to simplify the expression for $\hat{\mathbf{A}}_2'$. By inspection, $\hat{\mathbf{A}}_2' = \tilde{\mathbf{A}}_2$. Thus, the two solutions are equal if $\hat{\mathbf{a}}_2' = \tilde{\mathbf{a}}_2$. The terms involving $\mathbf{d}^{(2)}$ match. The terms involving $\mathbf{s}^{(1)}$ would match if it could be shown that:

$\mathbf{C}_s^{-1} - \mathbf{C}_s^{-1}\mathbf{D}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}]^{-1}\mathbf{D}^T\mathbf{C}_s^{-1} \stackrel{?}{=} [\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s]^{-1}$ (35)

But this equation is true by TV82-B. The terms involving $\mathbf{m}_A^{(1)}$ also match, because of the matrix identity

$[\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s]^{-1}\mathbf{D} = \mathbf{C}_s^{-1}\mathbf{D}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}]^{-1}\mathbf{C}_{A1}^{-1}$ (36)

derived in Appendix A.4. Consequently, the solutions, $\mathbf{m}_K^{(2)} = \mathbf{m}_P^{(2)}$, and their posterior covariances, $\mathbf{C}_m^{K(2)} = \tilde{\mathbf{A}}_2^{-1}$ and $\mathbf{C}_m^{P(2)} = \hat{\mathbf{A}}_2'^{-1}$, are equal. Applying $\mathbf{C}_m^{K(i)} = \tilde{\mathbf{A}}_i^{-1}$ to the KF recursion, and TV82-B and $\hat{\mathbf{A}}_i = \hat{\mathbf{A}}_i' + \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D}$ to the present-time Thomas recursion, leads to:

$\tilde{\mathbf{A}}_{i+1} = \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{G}^{(i+1)} + [\mathbf{D}\tilde{\mathbf{A}}_i^{-1}\mathbf{D}^T + \mathbf{C}_s]^{-1}$

$\hat{\mathbf{A}}_{i+1}' = \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{G}^{(i+1)} + \mathbf{C}_s^{-1} - \mathbf{C}_s^{-1}\mathbf{D}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \hat{\mathbf{A}}_i']^{-1}\mathbf{D}^T\mathbf{C}_s^{-1} = \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{G}^{(i+1)} + [\mathbf{D}\hat{\mathbf{A}}_i'^{-1}\mathbf{D}^T + \mathbf{C}_s]^{-1}$ (37)

Thus, $\tilde{\mathbf{A}}_{i+1} = \hat{\mathbf{A}}_{i+1}'$ as long as $\tilde{\mathbf{A}}_i = \hat{\mathbf{A}}_i'$. Because the latter is true for $i = 2$, the formula can be applied successively to show $\tilde{\mathbf{A}}_i = \hat{\mathbf{A}}_i'$ for all $i > 2$. Similarly, the procedure that demonstrated the equality of $\hat{\mathbf{a}}_2'$ and $\tilde{\mathbf{a}}_2$ can be extended to

$\tilde{\mathbf{a}}_{i+1} = \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i+1)} + [\mathbf{D}\tilde{\mathbf{A}}_i^{-1}\mathbf{D}^T + \mathbf{C}_s]^{-1}\mathbf{s}^{(i)} + [\mathbf{D}\tilde{\mathbf{A}}_i^{-1}\mathbf{D}^T + \mathbf{C}_s]^{-1}\mathbf{D}\mathbf{m}_K^{(i)}$

and

$\hat{\mathbf{a}}_{i+1}' = \mathbf{C}_s^{-1}\mathbf{s}^{(i)} + \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i+1)} + \mathbf{C}_s^{-1}\mathbf{D}\hat{\mathbf{A}}_i^{-1}[\hat{\mathbf{a}}_i' - \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{s}^{(i)}]$
$= \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i+1)} + [\mathbf{C}_s^{-1} - \mathbf{C}_s^{-1}\mathbf{D}\hat{\mathbf{A}}_i^{-1}\mathbf{D}^T\mathbf{C}_s^{-1}]\mathbf{s}^{(i)} + \mathbf{C}_s^{-1}\mathbf{D}\hat{\mathbf{A}}_i^{-1}\hat{\mathbf{a}}_i'$
$= \mathbf{G}^{(i+1)T}\mathbf{C}_d^{-1}\mathbf{d}^{(i+1)} + [\mathbf{C}_s + \mathbf{D}\hat{\mathbf{A}}_i'^{-1}\mathbf{D}^T]^{-1}\mathbf{s}^{(i)} + \mathbf{C}_s^{-1}\mathbf{D}\hat{\mathbf{A}}_i^{-1}\hat{\mathbf{A}}_i'\mathbf{m}_P^{(i)}$ (38)

Here we have used TV82-B and $\mathbf{m}_P^{(i)} = \hat{\mathbf{A}}_i'^{-1}\hat{\mathbf{a}}_i'$. The terms ending in $\mathbf{d}^{(i+1)}$ match. The terms ending in $\mathbf{s}^{(i)}$ also match, since it has been established previously that $\tilde{\mathbf{A}}_i = \hat{\mathbf{A}}_i'$. In order for $\tilde{\mathbf{a}}_{i+1}$ to equal $\hat{\mathbf{a}}_{i+1}'$, we must have $\mathbf{m}_P^{(i)} = \mathbf{m}_K^{(i)}$ and:

$[\mathbf{D}\tilde{\mathbf{A}}_i^{-1}\mathbf{D}^T + \mathbf{C}_s]^{-1}\mathbf{D} \stackrel{?}{=} \mathbf{C}_s^{-1}\mathbf{D}[\hat{\mathbf{A}}_i' + \mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D}]^{-1}\hat{\mathbf{A}}_i'$ (39)

This equation has the same form as identity (36), whose equality has already been demonstrated. Starting with $i = 2$, we have $\mathbf{m}_P^{(2)} = \mathbf{m}_K^{(2)}$ and $\hat{\mathbf{A}}_2' = \tilde{\mathbf{A}}_2$, which implies $\hat{\mathbf{A}}_3' = \tilde{\mathbf{A}}_3$ and $\hat{\mathbf{a}}_3' = \tilde{\mathbf{a}}_3$, which implies $\mathbf{m}_P^{(3)} = \mathbf{m}_K^{(3)}$. This process can be iterated indefinitely, establishing that the present-time Thomas and Kalman solutions, and their posterior covariances, are equal.

7. Comparison between the Present-Time Solution and GLS

The present-time solution at time $j$ depends on information available for times $i \le j$, but not upon information that subsequently becomes available (that is, for times $i > j$). This limitation is necessary in a real-time scenario. However, the lack of future data leads to a solution that is a poorer estimate of the true solution than the GLS solution, in which the state vectors at all times are globally adjusted to best fit all the prior information and data.

An outlier that occurs at, or immediately before, the present moment can cause a large error in the present-time solution. The full GLS solution is less affected, because measurements in the near future may compensate (Figure 1).

Having established links between KF and GLS, we are now able to apply several useful inverse theory concepts, and especially resolution [4] [5] [7]. In a GLS problem, model resolution refers to the ability of the data assimilation process to reconstruct deviations of the true model from the one predicted by the prior information alone [1]. Data resolution refers to its ability to reconstruct deviations of the data from those predicted by the prior information [9]. Model resolution is quantified by a $(KM) \times (KM)$ matrix, $\mathbf{R} = \mathbf{G}^{-g}\mathbf{G}$, and data resolution by a $(KN) \times (KN)$ matrix, $\mathbf{N} = \mathbf{G}\mathbf{G}^{-g}$, that satisfy:

$(\mathbf{m}_G - \mathbf{m}_A) = \mathbf{R}(\mathbf{m}^{true} - \mathbf{m}_A)$ and $(\mathbf{d}^{pre} - \mathbf{G}\mathbf{m}_A) = \mathbf{N}(\mathbf{G}\mathbf{m}^{true} - \mathbf{G}\mathbf{m}_A)$ (40)

Figure 1. Hypothetical data assimilation scenario, with $K = 4$, $M = N = 1$, $D = 1$, $s = 0.25$. The initial condition (red box) satisfies $m_A^{(1)} = 0$. The dynamical equation requires that the slope (dotted red lines) be about 1/4. The data, $d^{(i)}$, are noisy versions of the state $m^{(i)}$, which is expected to increase linearly with time with slope $s$. When the "present time" is $i = 3$, the present-time solution (grey), $m_P^{(3)}$, is pulled down by the noisy datum, $d^{(3)}$, leading to a poor fit to the dynamics at that time. The GLS solution, $m_G^{(3)}$, is less affected by the outlier, because the datum at time $i = 4$ better defines the linear trend, leading to a solution at $i = 3$ that better matches the dynamics.

Resolution is perfect when $\mathbf{R} = \mathbf{N} = \mathbf{I}$. When $\mathbf{R} \ne \mathbf{I}$, an element of the reconstructed state vector is a weighted average of all elements of the true state vector. When $\mathbf{N} \ne \mathbf{I}$, an element of the reconstructed data vector is a weighted average of all elements of the true data vector. The resolution matrices quantify resolution in both time and space. As we will show in the example below, the model resolution (or data resolution) can be temporally poor, even when it is spatially good.

Another important quantity is the full $(KM) \times (KM)$ posterior covariance matrix, $\mathbf{C}_m = [\mathbf{F}^T\mathbf{F}]^{-1}$. In addition to correlations between elements of the state vector at a given time, it contains the correlations between elements of the state vectors at different times. These coefficients are needed for computing confidence intervals of quantities that depend on the state vectors at two or more times. Although $\mathbf{R}$, $\mathbf{N}$ and $\mathbf{C}_m$ are large matrices, methods are available for efficiently computing selected elements of them using the conjugate gradient method [1].
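For example, a single column of $\mathbf{R} = \mathbf{G}^{-g}\mathbf{G} = \mathbf{A}^{-1}\mathbf{G}^T\mathbf{C}_o^{-1}\mathbf{G}$ can be obtained with one conjugate gradient solve against the Gram matrix, without ever forming $\mathbf{R}$. A sketch, with dense random placeholders standing in for the sparse operators of a real problem:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(4)
    n_model, n_data, n_prior = 200, 120, 200
    G = rng.normal(size=(n_data, n_model)) / np.sqrt(n_model)
    H = np.eye(n_prior)                            # placeholder prior-information matrix
    iCo, iCh = 2.0 * np.eye(n_data), 4.0 * np.eye(n_prior)   # inverse covariances

    def gram(v):                                   # A v = (G^T Co^-1 G + H^T Ch^-1 H) v
        return G.T @ (iCo @ (G @ v)) + H.T @ (iCh @ (H @ v))

    A_op = LinearOperator((n_model, n_model), matvec=gram)

    k = 37                                         # column of R to compute
    ek = np.zeros(n_model); ek[k] = 1.0
    rhs = G.T @ (iCo @ (G @ ek))                   # G^T Co^-1 G e_k
    R_col_k, info = cg(A_op, rhs, atol=1e-12)
    print(info, R_col_k[k])                        # R_kk is the diagonal resolution at k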

Finally, there may be instances in which the covariance, $\mathbf{C}_s(p)$, is known only up to some scalar parameter, $p$. For example, the choice $[\mathbf{C}_s(p)]_{ij} = \gamma^2\exp\{-\tfrac{1}{2}p^{-2}(x_i - x_j)^2\}$ expresses decorrelation over a scale length, $p$. An initially poor guess of the parameter, $p$, can be "tuned" using partial derivatives based on the Bayesian principle [15] and the gradient descent method [16].

8. Example

We consider a data assimilation problem based on the heat diffusion Equation (3), with $\Delta t = \Delta x = 1$ and $c = 0.4$. The state vector, $\mathbf{m}^{(i)}$, represents temperature at position $x$ and is of length $M = 31$. The source is a Gaussian function in space and an impulsive function in time, plus additive noise:

$s_j^{(i)} = \exp\left[-\frac{1}{2 s_x^2}(x_j - \bar{x})^2\right]\delta_{i1} + n_s^{(j)}$ (41)

with scale length $s_x = 5$ and peak position $\bar{x} = \tfrac{1}{2}M\,\Delta x$. Here, $n_s^{(j)}$ is a Normally-distributed random variable with zero mean and variance $\sigma_s^2 = 0.05$. The initial condition is:

$[\mathbf{m}_A^{(1)}]_j = m_0 + n_A^{(j)}$ (42)

where $m_0 = 0.1$ and $n_A^{(j)}$ is a Normally-distributed random variable with zero mean and variance $\sigma_A^2 = 0.07$. The dynamical Equation (2) is iterated for $K = 61$ time steps to provide the "true" state vector. As expected, the solution has a Gaussian shape, with a width that increases, and an amplitude that decreases, with time (Figure 2(A)). The data are a total of $N = 10$ temperature measurements at each time $i \ge 2$, made at randomly-selected positions (without duplication) and perturbed with Normally-distributed random noise with zero mean and variance $\sigma_d^2 = 0.10$.
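A reader wishing to reproduce an experiment of this kind might generate the synthetic truth and data along the following lines (a sketch; the random seed, the boundary treatment and the measurement placement are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(5)
    M, K, N = 31, 61, 10
    dx, dt, c = 1.0, 1.0, 0.4
    sx, m0 = 5.0, 0.1
    sig_s, sig_A, sig_d = np.sqrt(0.05), np.sqrt(0.07), np.sqrt(0.10)

    L = np.zeros((M, M))
    for j in range(1, M - 1):
        L[j, j - 1], L[j, j], L[j, j + 1] = 1.0, -2.0, 1.0
    D = np.eye(M) + (c * dt / dx**2) * L

    x = dx * np.arange(M)
    xbar = 0.5 * M * dx
    s = sig_s * rng.normal(size=(K - 1, M))                  # source noise n_s
    s[0] += np.exp(-0.5 * (x - xbar)**2 / sx**2)             # impulse, Equation (41)

    m_true = np.zeros((K, M))
    m_true[0] = m0 + sig_A * rng.normal(size=M)              # Equation (42)
    for i in range(1, K):
        m_true[i] = D @ m_true[i - 1] + s[i - 1]

    # N measurements at each time index i >= 1 (time 2 in the text)
    obs = [rng.choice(M, size=N, replace=False) for _ in range(K)]
    d = [None] + [m_true[i][obs[i]] + sig_d * rng.normal(size=N) for i in range(1, K)]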

GLS solutions were computed by both the full Thomas algorithm (Figure 2(B)) and by solving the $[\mathbf{F}^T\mathbf{F}]\mathbf{m} = \mathbf{F}^T\mathbf{f}$ system with the conjugate gradient method (not shown). They were found to be identical to machine precision. Present-time solutions were computed with both the present-time Thomas (Figure 2(C)) and KF algorithms (Figure 2(D)). They were also found to be identical to machine precision.

Figure 2. Comparison of solutions. (A) True solution. (B) Difference between the generalized least squares solution and the true solution. (C) Difference between the present-time Thomas solution and the true solution. (D) Difference between the Kalman solution and the true solution. Note that the color scale for the differences is expanded with respect to the one for the true model.

In general, both the GLS and present-time solutions fit the data well. However, the present-time solution matches the true model more poorly than does the GLS solution (Figure 3). In this numerical experiment, the present-time solution is about 10% poorer than the GLS solution, as quantified by the root mean squared deviation from the true solution. However, this percentage, while always positive, varies considerably as the underlying parameters are changed.

Both the Thomas and Kalman versions of the present-time algorithm are well suited for providing ongoing diagnostic information, such as the posterior covariance and the root mean squared data fitting error (Figure 4), which can provide quality control in real-time applications.

Figure 3. Histogram of the ratio of the root mean squared error between the present-time estimate, $\mathbf{m}_P^{(i)}, i = 1, \cdots, K$, and the true state to the root mean squared error between the GLS estimate, $\mathbf{m}_G^{(i)}, i = 1, \cdots, K$, and the true state, for 1000 realizations of the exemplary data assimilation problem.

Figure 4. Kalman Filtering solution of the exemplary data assimilation problem. (A) The data, $\mathbf{d}^{(i)}$. (B) The estimated state, $\mathbf{m}_K^{(i)}, i = 1, \cdots, K$. (C) The root mean squared data prediction error as a function of time, $i$. (D) The posterior variance as a function of time, $i$.

The model resolution matrix, $\mathbf{R}$ (Figure 5(A)), for this exemplary problem has a poorly-populated central diagonal, meaning that some elements of the state vector, $m_j^{(i)}$, are well-resolved from their spatial neighbors, $m_{j\pm1}^{(i)}$, while others are very poorly resolved. The matrix has large elements along other diagonals, offset from the main diagonal by multiples of $M$ rows (one time slice), indicating that some elements are not well resolved from their temporal neighbors, $m_j^{(i\pm1)}$. The temporal width of the resolving kernel is about ±4, and its shape is very irregular, indicating that very uneven averaging is taking place (Figure 6). The data resolution matrix, $\mathbf{N}$ (Figure 5(B)), has a well-populated central diagonal, meaning that most elements of the predicted data vector, $d_j^{pre(i)}$, are well-resolved from their spatial neighbors, $d_{j\pm1}^{pre(i)}$. Like $\mathbf{R}$, it also has elements along other diagonals, offset from the main diagonal by one time slice, indicating that some elements are not well resolved from their temporal neighbors, $d_j^{pre(i\pm1)}$.

Figure 5. Central portion of the resolution matrices for the exemplary data assimilation problem. (A) The model resolution matrix, $\mathbf{R}$. (B) The data resolution matrix, $\mathbf{N}$. The portion shown corresponds to time slices $20 \le i \le 23$.

Figure 6. Rows of the model resolution matrix, $\mathbf{R}$ (colors), rearranged onto an $(x, t)$ grid, for three target points (circles): (A) $x = 15$, $t = 12$; (B) $x = 15$, $t = 25$; (C) $x = 15$, $t = 38$. The high intensities are localized near the target points (good), but have a rough pattern (bad).

9. Conclusion

In this paper, we examine a data processing scenario in which real-time data assimilation is performed using Kalman Filtering (KF), and then reanalysis is performed using generalized least squares (GLS). In this problem, the spatial characteristics of the system are described by a state vector (model parameter vector), and its temporal characteristics by the evolution of the state vector with time. We explore the relationship between the real-time Kalman Filter estimate and the GLS reanalysis estimate of the state vector. We show that the KF solution at a given time is equal to the GLS solution that one would obtain if data for future times were excluded. Furthermore, we show that the recursive procedure in KF is exactly equivalent to the solution of the GLS problem via Thomas' algorithm for solving the block-tridiagonal matrix that arises in the reanalysis problem. This connection indicates that GLS reanalysis is better considered the final step of a single process, rather than a "different method" arbitrarily being applied, post facto. Now that this connection between KF and GLS has been identified, the familiar GLS concepts of model and data resolution can be applied to KF. We provide an exemplary problem, based on thermal diffusion. In addition to showcasing our result, the example demonstrates that the state vector and the vector of predicted data can be poorly resolved in time, even when they are well resolved in space.

Acknowledgements

The author thanks the students in his 2022 Geophysical Inverse Theory course at Columbia University (New York, USA), and especially George Lu, for helpful discussion.

Appendix

A.1) Proof that $\mathbf{C}_m = [\mathbf{I} - \mathbf{G}^{-g}\mathbf{G}]\mathbf{C}_A \equiv \mathbf{P}_G\mathbf{C}_A$. This derivation is well known [6], but is presented here for completeness and to point out potential pitfalls should it be applied incorrectly. Because $\mathbf{d}$ and $\mathbf{m}_A$ are independent of one another and $\mathbf{m}_G = \mathbf{G}^{-g}(\mathbf{d} - \mathbf{G}\mathbf{m}_A) + \mathbf{m}_A = \mathbf{G}^{-g}\mathbf{d} + (\mathbf{I} - \mathbf{G}^{-g}\mathbf{G})\mathbf{m}_A$, the normal rules of error propagation apply. The posterior covariance, $\mathbf{C}_m$, is:

$\mathbf{C}_m = \mathbf{G}^{-g}\mathbf{C}_d\mathbf{G}^{-gT} + (\mathbf{I} - \mathbf{G}^{-g}\mathbf{G})\mathbf{C}_A(\mathbf{I} - \mathbf{G}^T\mathbf{G}^{-gT})$
$= \mathbf{C}_A + \mathbf{G}^{-g}\mathbf{C}_d\mathbf{G}^{-gT} + \mathbf{G}^{-g}\mathbf{G}\mathbf{C}_A\mathbf{G}^T\mathbf{G}^{-gT} - \mathbf{C}_A\mathbf{G}^T\mathbf{G}^{-gT} - \mathbf{G}^{-g}\mathbf{G}\mathbf{C}_A$
$= \mathbf{C}_A + \mathbf{G}^{-g}[\mathbf{G}\mathbf{C}_A\mathbf{G}^T + \mathbf{C}_d]\mathbf{G}^{-gT} - \mathbf{C}_A\mathbf{G}^T\mathbf{G}^{-gT} - \mathbf{G}^{-g}\mathbf{G}\mathbf{C}_A$
$= \mathbf{C}_A + \mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}\mathbf{A}\mathbf{A}^{-1}\mathbf{G}\mathbf{C}_A - \mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}\mathbf{G}\mathbf{C}_A - \mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}\mathbf{G}\mathbf{C}_A$
$= \mathbf{C}_A + \mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}\mathbf{G}\mathbf{C}_A - 2\,\mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}\mathbf{G}\mathbf{C}_A = \mathbf{C}_A - \mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}\mathbf{G}\mathbf{C}_A$
$= \mathbf{C}_A - \mathbf{G}^{-g}\mathbf{G}\mathbf{C}_A = [\mathbf{I} - \mathbf{G}^{-g}\mathbf{G}]\mathbf{C}_A \equiv \mathbf{P}_G\mathbf{C}_A$ (A.1)

Here, we have used the fact that $\mathbf{G}^{-g} = \mathbf{C}_A\mathbf{G}^T\mathbf{A}^{-1}$, with $\mathbf{A} = [\mathbf{G}\mathbf{C}_A\mathbf{G}^T + \mathbf{C}_d]$. Although $\mathbf{P}_G$ has the form of a projection operator, it is a function of $\mathbf{C}_A$, and has deceptive properties. Consider the case in which $\mathbf{C}_A = \varepsilon\mathbf{S}$, where $\mathbf{S}$ is an invertible symmetric matrix. In the limit of the parameter, $\varepsilon$, becoming indefinitely large, $\mathbf{C}_m$ does not also become indefinitely large, but rather tends to $[\mathbf{G}^T\mathbf{C}_d^{-1}\mathbf{G}]^{-1}$, which is the posterior covariance that arises from the data only. In the $\varepsilon \to 0$ limit, $\mathbf{C}_m$ tends to zero; that is, when $\mathbf{m}_A$ is known very accurately, so is $\mathbf{m}_G$.

A.2) Derivation of the TV82-A and TV82-B identities, following [6]. Consider invertible symmetric matrices, $\mathbf{C}_1$ and $\mathbf{C}_2$, and an arbitrary matrix, $\mathbf{M}$. The expression $\mathbf{M}^T + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}\mathbf{C}_2\mathbf{M}^T$ can alternately be factored:

$\mathbf{M}^T\mathbf{C}_1^{-1}[\mathbf{C}_1 + \mathbf{M}\mathbf{C}_2\mathbf{M}^T] = [\mathbf{C}_2^{-1} + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}]\mathbf{C}_2\mathbf{M}^T$ (A.2)

Multiplying by the inverses yields identity TV82-A:

$\mathbf{C}_2\mathbf{M}^T[\mathbf{C}_1 + \mathbf{M}\mathbf{C}_2\mathbf{M}^T]^{-1} = [\mathbf{C}_2^{-1} + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}]^{-1}\mathbf{M}^T\mathbf{C}_1^{-1}$ (A.3)

Now consider the expression $\mathbf{C}_2 - \mathbf{C}_2\mathbf{M}^T[\mathbf{C}_1 + \mathbf{M}\mathbf{C}_2\mathbf{M}^T]^{-1}\mathbf{M}\mathbf{C}_2$, which by the above identity equals $\mathbf{C}_2 - [\mathbf{C}_2^{-1} + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}]^{-1}\mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}\mathbf{C}_2$. Factoring out the term in brackets,

$[\mathbf{C}_2^{-1} + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}]^{-1}\left[[\mathbf{C}_2^{-1} + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}]\mathbf{C}_2 - \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}\mathbf{C}_2\right]$ (A.4)

and cancelling terms yields identity TV82-B:

$\mathbf{C}_2 - \mathbf{C}_2\mathbf{M}^T[\mathbf{C}_1 + \mathbf{M}\mathbf{C}_2\mathbf{M}^T]^{-1}\mathbf{M}\mathbf{C}_2 = [\mathbf{C}_2^{-1} + \mathbf{M}^T\mathbf{C}_1^{-1}\mathbf{M}]^{-1}$ (A.5)
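Both identities are easily checked numerically; in the sketch below, $\mathbf{C}_1$, $\mathbf{C}_2$ and $\mathbf{M}$ are random placeholders (the covariances made symmetric positive-definite by construction):

    import numpy as np

    rng = np.random.default_rng(6)
    n1, n2 = 4, 6
    X, Y = rng.normal(size=(n1, n1)), rng.normal(size=(n2, n2))
    C1, C2 = X @ X.T + np.eye(n1), Y @ Y.T + np.eye(n2)       # SPD covariances
    Mmat = rng.normal(size=(n1, n2))
    inv = np.linalg.inv

    # TV82-A, Equation (A.3)
    lhsA = C2 @ Mmat.T @ inv(C1 + Mmat @ C2 @ Mmat.T)
    rhsA = inv(inv(C2) + Mmat.T @ inv(C1) @ Mmat) @ Mmat.T @ inv(C1)

    # TV82-B, Equation (A.5)
    lhsB = C2 - C2 @ Mmat.T @ inv(C1 + Mmat @ C2 @ Mmat.T) @ Mmat @ C2
    rhsB = inv(inv(C2) + Mmat.T @ inv(C1) @ Mmat)

    print(np.allclose(lhsA, rhsA), np.allclose(lhsB, rhsB))   # True True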

A.3) The Thomas algorithm [11] [12] for a symmetric block-tridiagonal matrix is well known; we reproduce it here for completeness. The $i$th row of the matrix has elements $\mathbf{B}$, $\mathbf{A}_i$, $\mathbf{B}^T$, and the right-hand side is $\mathbf{a}_i$. Consider the step in the upper-triangularization process when rows $(i-1)$ and above have been triangularized, but rows $i$ and below have not:

$\hat{\mathbf{A}}_{i-1}\mathbf{m}^{(i-1)} + \mathbf{B}^T\mathbf{m}^{(i)} = \hat{\mathbf{a}}_{i-1}$

$\mathbf{B}\mathbf{m}^{(i-1)} + \mathbf{A}_i\mathbf{m}^{(i)} + \mathbf{B}^T\mathbf{m}^{(i+1)} = \mathbf{a}_i$ (A.6)

The second row is modified by multiplying the top row by $-\mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}$ and adding the result to the second, which eliminates the first term, yielding:

$[\mathbf{A}_i - \mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}\mathbf{B}^T]\mathbf{m}^{(i)} + \mathbf{B}^T\mathbf{m}^{(i+1)} = \mathbf{a}_i - \mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}\hat{\mathbf{a}}_{i-1}$ (A.7)

Note that the new row has two terms, and that the coefficient of the second is always $\mathbf{B}^T$, which is the same pattern as the first row. Thus, the bottom row becomes a new top row, and the recursion is:

$\hat{\mathbf{A}}_1^{-1} = \mathbf{A}_1^{-1}$ followed by $\hat{\mathbf{A}}_i^{-1} = [\mathbf{A}_i - \mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}\mathbf{B}^T]^{-1}$

$\hat{\mathbf{a}}_1 = \mathbf{a}_1$ followed by $\hat{\mathbf{a}}_i = \mathbf{a}_i - \mathbf{B}\hat{\mathbf{A}}_{i-1}^{-1}\hat{\mathbf{a}}_{i-1}$ (A.8)

After the recursion, the matrix is block-upper-bidiagonal, with diagonals $\hat{\mathbf{A}}_i$ and $\mathbf{B}^T$, and the right-hand side is $\hat{\mathbf{a}}_i$. It is back-solved as:

$\mathbf{m}^{(K)} = \hat{\mathbf{A}}_K^{-1}\hat{\mathbf{a}}_K$ followed by $\mathbf{m}^{(i)} = \hat{\mathbf{A}}_i^{-1}[\hat{\mathbf{a}}_i - \mathbf{B}^T\mathbf{m}^{(i+1)}]$ (A.9)

A.4) Verification of the identity in (36):

$[\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s]^{-1}\mathbf{D} \stackrel{?}{=} \mathbf{C}_s^{-1}\mathbf{D}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}]^{-1}\mathbf{C}_{A1}^{-1}$

$[\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s]^{-1}\mathbf{D}\mathbf{C}_{A1} \stackrel{?}{=} \mathbf{C}_s^{-1}\mathbf{D}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}]^{-1}$

$\mathbf{D}\mathbf{C}_{A1}[\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{C}_{A1}^{-1}] \stackrel{?}{=} [\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T + \mathbf{C}_s]\mathbf{C}_s^{-1}\mathbf{D}$

$[\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{D}] = [\mathbf{D}\mathbf{C}_{A1}\mathbf{D}^T\mathbf{C}_s^{-1}\mathbf{D} + \mathbf{D}]$ (A.10)

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Menke, W. (2014) Review of the Generalized Least Squares Method. Surveys in Geophysics, 36, 1-25.
https://doi.org/10.1007/s10712-014-9303-1
[2] Stratonovich, R.L. (1959) Optimum Nonlinear Systems Which Bring about a Separation of a Signal with Constant Parameters from Noise. Radiofizika, 2, 892-901.
[3] Zarchan, P. and Musoff, H. (2000) Fundamentals of Kalman Filtering: A Practical Approach. American Institute of Aeronautics and Astronautics, Reston.
[4] Backus, G. and Gilbert, F. (1968) The Resolving Power of Gross Earth Data. Geophysical Journal International, 16, 169-205.
https://doi.org/10.1111/j.1365-246X.1968.tb00216.x
[5] Wiggins, R.A. (1972) The General Linear Inverse Problem: Implications of Surface Waves and free Oscillations for Earth Structure. Reviews of Geophysics, 10, 251-285.
https://doi.org/10.1029/RG010i001p00251
[6] Tarantola, A. and Valette, B. (1982) Inverse Problems = Quest for Information. Journal of Geophysics, 50, 159-170.
[7] Menke, W. (2018) Geophysical Data Analysis: Discrete Inverse Theory. 4th Edition (Textbook), Elsevier, Amsterdam, 350.
[8] Menke, W. and Menke, J. (2016) Environmental Data Analysis with MATLAB. 2nd Edition, Academic Press (Elsevier), Cambridge, MA, 342 p.
https://doi.org/10.1016/C2015-0-01993-1
[9] Menke, W. and Creel, R. (2021) Gaussian Process Regression Reviewed in the Context of Inverse Theory. Surveys in Geophysics, 42, 473-503.
https://doi.org/10.1007/s10712-021-09640-w
[10] Dax, A. (2020) The Equivalence between Orthogonal Iterations and Alternating Least Squares. Advances in Linear Algebra & Matrix Theory, 10, 7-21.
https://doi.org/10.4236/alamt.2020.102002
[11] Thomas, L.H. (1949) Elliptic Problems in Linear Difference Equations over a Network. Watson Scientific Computing Laboratory Report, Columbia University, New York, 71.
[12] Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (2007) Numerical Recipes: The Art of Scientific Computing. 3rd Edition, Cambridge University Press, New York.
[13] Menke, W. (2005) Case Studies of Seismic Tomography and Earthquake Location in a Regional Context. In: Levander, A. and Nolet, G., Eds., Seismic Earth: Array Analysis of Broadband Seismograms, Vol. 157, American Geophysical Union, Washington DC, 7-36.
https://doi.org/10.1029/157GM02
[14] Schlichthärle, D. (2000) Digital Filters: Basics and Design. Springer, Berlin, 361 p.
https://doi.org/10.1007/978-3-662-04170-3
[15] Menke, W. (2021) Tuning of Prior Covariance in Generalized Least Squares. Applied Mathematics, 12, 157-170.
https://doi.org/10.4236/am.2021.123011
[16] Snyman, J.A. and Wilke, D.N. (2018) Practical Mathematical Optimization: Basic Optimization Theory and Gradient-Based Algorithms. Springer Optimization and Its Applications, Vol. 133, 2nd Edition, Springer, New York, 340 p.
https://doi.org/10.1007/978-3-319-77586-9_9

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.