Rolling Gaussian Process Regression with Application to Regime Shifts
1. Introduction
Gaussian Process Regression (GPR) is a popular data assimilation method that can be used to reconstruct a field, $m(\mathbf{x})$, that varies with spatial coordinates, $\mathbf{x}$ (e.g., $\mathbf{x} = (x, y)$ in two dimensions) [1] [2] [3] [4]. Observations of the field at particular positions, together with prior information on its value at these positions and its spatial correlation function, are used to estimate the field at arbitrary positions. GPR is conceptually similar to interpolation, but is not true interpolation because the reconstructed field does not, in general, reproduce the observations (except for the special case called Kriging [1]). This behavior is advantageous in the case of noisy data, because the reconstruction contains fewer defects caused by data outliers.
In many real-world applications, the field that is being reconstructed also has time, $t$, dependence, that is, $m = m(\mathbf{x}, t)$. Furthermore, observations often are not made synchronously, but rather in a sporadic and ongoing manner. When the goal is to estimate the current state of the field, older measurements become obsolete and only the most recent observations are relevant. The choice of the length of the observation window, $T$, affects both the resolution of the reconstruction (for resolution improves with the number of data) and the variance of the reconstruction (for the inclusion of new data decreases variance but the inclusion of obsolete data increases it). The issue of window length is particularly acute when the field is relatively stable, except for undergoing sporadic reorganizations. These so-called regime shifts are common features of fields associated with the Earth’s biosphere [5] [6] [7], climate system [8] [9] and geodynamo [10] [11].
We use a formulation of GPR [3] in which the model parameters, $\mathbf{m} = [\mathbf{m}_t; \mathbf{m}_c]$, and corresponding spatial coordinates, $\mathbf{x} = [\mathbf{x}_t; \mathbf{x}_c]$ (where the semicolon implies vertical concatenation), are divided into a target group, $t$, of length $M$ and a control group, $c$, of length $N$. Only the control group is observed by data, $\mathbf{d}$. The control points represent observations of the field at irregularly-spaced positions and the target points represent its values on a regularly-spaced grid. A prior estimate of the model parameters, $\bar{\mathbf{m}} = [\bar{\mathbf{m}}_t; \bar{\mathbf{m}}_c]$, their covariance, $\mathbf{C}_m = \begin{bmatrix} \mathbf{C}_{tt} & \mathbf{C}_{tc} \\ \mathbf{C}_{ct} & \mathbf{C}_{cc} \end{bmatrix}$, and the covariance of the data, $\mathbf{C}_d$, are assumed to be known. The GPR estimate of the model parameters is then:
$\mathbf{m}_t^{est} = \bar{\mathbf{m}}_t + \mathbf{C}_{tc} \mathbf{A}^{-1} (\mathbf{d} - \bar{\mathbf{m}}_c), \quad \text{with} \quad \mathbf{A} \equiv \mathbf{C}_{cc} + \mathbf{C}_d$ (1)
The formula for the solution, $\mathbf{m}_t^{est}$, should be evaluated from right to left for maximum computational efficiency. The matrix inverse, $\mathbf{A}^{-1}$, appears both in the estimated solution, $\mathbf{m}_t^{est}$, and its posterior covariance, $\mathbf{C}_{tt}^{post} = \mathbf{C}_{tt} - \mathbf{C}_{tc} \mathbf{A}^{-1} \mathbf{C}_{ct}$ [4], which is an important diagnostic of the solution’s quality. In typical cases, $\mathbf{A}$ is far from being sparse, so when $N$ is large, $\mathbf{A}^{-1}$ can be computationally expensive to compute. Alternative methods that omit the calculation of $\mathbf{A}^{-1}$, such as solving the linear system, $\mathbf{A}\mathbf{u} = \mathbf{d} - \bar{\mathbf{m}}_c$, and then calculating $\mathbf{m}_t^{est} = \bar{\mathbf{m}}_t + \mathbf{C}_{tc}\mathbf{u}$, are nearly as expensive and do not provide a (simple) pathway for computing the posterior covariance.
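For concreteness, the following is a minimal NumPy sketch of Equation (1) and its posterior covariance. The function and parameter names (gaussian_cov, sig_m, lam) and the synthetic test field are illustrative choices, not quantities fixed by the formulation above; the Gaussian covariance function anticipates Equation (21).

```python
import numpy as np

def gaussian_cov(xa, xb, sig_m=1.0, lam=0.2):
    """Gaussian prior covariance between two sets of positions (cf. Equation (21))."""
    d2 = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    return sig_m**2 * np.exp(-d2 / (2.0 * lam**2))

rng = np.random.default_rng(0)
N, M = 50, 100                      # control (data) and target point counts
xc = rng.random((N, 2))             # observation positions
xt = rng.random((M, 2))             # reconstruction positions
d = np.sin(2 * np.pi * xc[:, 0])    # illustrative synthetic observations
sig_d = 0.1
Cd = sig_d**2 * np.eye(N)

mbar_t = np.zeros(M)                # zero prior value
mbar_c = np.zeros(N)
Ctc = gaussian_cov(xt, xc)
Ccc = gaussian_cov(xc, xc)
Ctt = gaussian_cov(xt, xt)

A = Ccc + Cd
Ainv = np.linalg.inv(A)             # the expensive N x N inverse
m_est = mbar_t + Ctc @ (Ainv @ (d - mbar_c))   # Equation (1), right to left
C_post = Ctt - Ctc @ Ainv @ Ctc.T              # posterior covariance
```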
Suppose that we know the GPR solution when the control group has $N$ elements, ordered by time of observation. Our goal is to compute an updated solution after we delete the first $K$ observations from the group, leaving $N - K$, and then append $L$ elements, so that it now has $N - K + L$ elements. This process, which we term rolling GPR (or moving-window GPR), is repeated many times as new data become available and old data are deemed obsolete. The process ensures that the estimate of the field is kept up-to-date.
The structure of this paper is as follows: The process of performing rolling GPR is detailed in the Methodology section, and issues concerning its implementation are discussed. This process is divided into four conceptually distinct parts: discarding obsolete data from the GPR problem, appending new data, detecting and correcting the error in the solution, and detecting and responding to regime changes through changes in window length. The complete process of performing rolling GPR is outlined in the Procedure section. The Examples section provides a demonstration of the technique, applied to the reconstruction of a two-dimensional field with a regime shift, using an exemplary synthetic dataset and a Gaussian prior covariance function. Finally, a discussion of the efficiency of the discard-append process, together with summary remarks, is presented in the Discussion and Conclusions section.
2. Methodology
Discarding Obsolete Data. The quantities $\mathbf{d}$, $\bar{\mathbf{m}}_c$, $\mathbf{C}_{tc}$, $\mathbf{C}_{cc}$, $\mathbf{C}_d$, $\mathbf{A}$ and $\mathbf{A}^{-1}$ all need to be modified when data are discarded. The vectors, $\mathbf{d}$ and $\bar{\mathbf{m}}_c$, are modified by retaining only their last $N - K$ elements:
$\mathbf{d} = [\mathbf{d}_1; \mathbf{d}_2] \rightarrow \mathbf{d}_2 \quad \text{and} \quad \bar{\mathbf{m}}_c = [\bar{\mathbf{m}}_{c1}; \bar{\mathbf{m}}_{c2}] \rightarrow \bar{\mathbf{m}}_{c2}$ (2)
Here, the subscripts $1$ and $2$ refer to the deleted group and retained group, respectively. The corresponding $\mathbf{x}$s must be modified in an identical manner. Similarly, only one block within the covariance matrices is retained:
$\mathbf{C}_{tc} = [\mathbf{C}_{tc,1}, \mathbf{C}_{tc,2}] \rightarrow \mathbf{C}_{tc,2}, \quad \mathbf{C}_{cc} = \begin{bmatrix} \mathbf{C}_{cc,11} & \mathbf{C}_{cc,12} \\ \mathbf{C}_{cc,21} & \mathbf{C}_{cc,22} \end{bmatrix} \rightarrow \mathbf{C}_{cc,22}, \quad \mathbf{C}_d \rightarrow \mathbf{C}_{d,22}$ (3)
The updating of $\mathbf{A}^{-1}$ requires more effort, but uses only well-known techniques. At the start of the process, the $N \times N$ matrix, $\mathbf{A}$, and its inverse, $\mathbf{B} \equiv \mathbf{A}^{-1}$, are known. These matrices are partitioned into submatrices:
$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{12}^T & \mathbf{A}_{22} \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{12}^T & \mathbf{B}_{22} \end{bmatrix}$ (4)
Here, $\mathbf{A}_{11}$ and $\mathbf{B}_{11}$ are $K \times K$, $\mathbf{A}_{12}$ and $\mathbf{B}_{12}$ are $K \times (N-K)$, and $\mathbf{A}_{22}$ and $\mathbf{B}_{22}$ are $(N-K) \times (N-K)$, with $\mathbf{A}$ and $\mathbf{B}$ symmetric.
We mention an identity involving $\mathbf{A}_{12}$, $\mathbf{A}_{22}$, $\mathbf{B}_{11}$ and $\mathbf{B}_{12}$ that will be used later in the paper:

$\mathbf{B}_{11}^{-1} \mathbf{B}_{12} = -\mathbf{A}_{12} \mathbf{A}_{22}^{-1}$ (5)
It can be derived by multiplying out the block matrix form of $\mathbf{A}\mathbf{B} = \mathbf{I}$, solving the diagonal elements for $\mathbf{B}_{11}$ and $\mathbf{B}_{22}$, and substituting the result into the off-diagonal elements. The process of removing the first $K$ data corresponds to replacing $\mathbf{A}$ with $\mathbf{A}_{22}$, and $\mathbf{A}^{-1}$ with $\mathbf{A}_{22}^{-1}$. An efficient method for computing $\mathbf{A}_{22}^{-1}$ can be designed using the Woodbury identity [12]:

$(\mathbf{A} + \mathbf{U}\mathbf{C}\mathbf{V})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}(\mathbf{C}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$ (6)
Here, $\mathbf{A}$, $\mathbf{U}$, $\mathbf{C}$ and $\mathbf{V}$ are conformable matrices. The reader may confirm by direct substitution that:

$\mathbf{U} = \begin{bmatrix} \mathbf{I}_{K \times K} & \mathbf{0} \\ \mathbf{0} & -\mathbf{A}_{12}^T \end{bmatrix}, \quad \mathbf{V} = \mathbf{U}^T, \quad \mathbf{C} = \begin{bmatrix} \mathbf{I} - \mathbf{A}_{11} & \mathbf{I}_{K \times K} \\ \mathbf{I}_{K \times K} & \mathbf{0} \end{bmatrix}, \quad \mathbf{A} + \mathbf{U}\mathbf{C}\mathbf{V} = \begin{bmatrix} \mathbf{I}_{K \times K} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_{22} \end{bmatrix}$ (7)
are consistent choices of the several matrices (with the subscripts indicating the sizes of selected matrices). Thus, $\mathbf{A}_{22}^{-1}$ can be read from the lower-right-hand block of $(\mathbf{A} + \mathbf{U}\mathbf{C}\mathbf{V})^{-1}$. Note that the matrix, $\mathbf{C}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U}$, is $2K \times 2K$, so that its inverse requires less computational effort than does the $N \times N$ matrix, $\mathbf{A}$, as long as relatively few data are being removed. Below, we will develop an analytic formula for $\mathbf{A}_{22}^{-1}$ that further reduces the size of the necessary matrices to $K \times K$. An explicit formula for $\mathbf{A}_{22}^{-1}$ is obtained by manipulating the block matrix form of the Woodbury formula, starting with:
$\mathbf{C}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{11}\mathbf{A}_{11} \\ \mathbf{A}_{11}\mathbf{B}_{11} & \mathbf{A}_{11}\mathbf{B}_{11}\mathbf{A}_{11} - \mathbf{I} \end{bmatrix}$ (8)
Note that the on-diagonal blocks of this matrix are symmetric, and the off-diagonal blocks are transposes of one another. Its inverse has the form:
$(\mathbf{C}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1} = \begin{bmatrix} \mathbf{B}_{11}^{-1} - \mathbf{A}_{11}\mathbf{A}_{11} & \mathbf{A}_{11} \\ \mathbf{A}_{11} & -\mathbf{I} \end{bmatrix}$ (9)
The formulas for the on-diagonal blocks of Equation (9) were derived by multiplying out the block matrix form of the product of Equation (8) with Equation (9), solving the diagonal elements, and applying identity Equation (5). Note that both on-diagonal blocks of Equation (9) are symmetric matrices. The products $\mathbf{A}^{-1}\mathbf{U}$ and $\mathbf{V}\mathbf{A}^{-1}$ have the forms:

$\mathbf{A}^{-1}\mathbf{U} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{11}\mathbf{A}_{11} - \mathbf{I} \\ \mathbf{B}_{12}^T & \mathbf{B}_{12}^T\mathbf{A}_{11} \end{bmatrix} \quad \text{and} \quad \mathbf{V}\mathbf{A}^{-1} = (\mathbf{A}^{-1}\mathbf{U})^T$ (10)
Thus, the expression, $\mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}(\mathbf{C}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$, can be simplified to:

$(\mathbf{A} + \mathbf{U}\mathbf{C}\mathbf{V})^{-1} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{B}_{22} - \mathbf{B}_{12}^T\mathbf{B}_{11}^{-1}\mathbf{B}_{12} \end{bmatrix}$ (11)
The lower right-hand block of Equation (11) becomes the new $\mathbf{A}^{-1}$:

$\mathbf{A}_{22}^{-1} = \mathbf{B}_{22} - \mathbf{B}_{12}^T\mathbf{B}_{11}^{-1}\mathbf{B}_{12}$ (12)
This formula has been tested numerically, and was found to be correct to within machine precision. Accuracy and speed are improved through the use of coding techniques that utilize and enforce the symmetry of all the symmetric matrices.
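The following sketch checks Equation (12) numerically on a random symmetric positive definite matrix; the names are illustrative. Note that only the $K \times K$ block $\mathbf{B}_{11}$ is inverted.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 5                        # window length, number of discarded data
X = rng.standard_normal((N, N))
A = X @ X.T + N * np.eye(N)          # symmetric positive definite test matrix
B = np.linalg.inv(A)

# Equation (12): inverse of the retained block from blocks of B = A^{-1} alone
B11, B12, B22 = B[:K, :K], B[:K, K:], B[K:, K:]
A22inv = B22 - B12.T @ np.linalg.inv(B11) @ B12

err = np.abs(A22inv - np.linalg.inv(A[K:, K:])).max()
print(f"max abs error: {err:.2e}")   # of order machine precision
```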
Appending New Data. As before, the quantities $\mathbf{d}$, $\bar{\mathbf{m}}_c$, $\mathbf{C}_{tc}$, $\mathbf{C}_{cc}$, $\mathbf{C}_d$, $\mathbf{A}$ and $\mathbf{A}^{-1}$ all need to be modified when new data are appended. The vectors, $\mathbf{d}$ and $\bar{\mathbf{m}}_c$, are modified by appending $L$ elements:

$\mathbf{d} \rightarrow [\mathbf{d}; \mathbf{d}_{new}] \quad \text{and} \quad \bar{\mathbf{m}}_c \rightarrow [\bar{\mathbf{m}}_c; \bar{\mathbf{m}}_{c,new}]$ (13)
The corresponding $\mathbf{x}$s must be modified in an identical manner. Similarly, new blocks are appended to the covariance matrices:

$\mathbf{C}_{tc} \rightarrow [\mathbf{C}_{tc}, \mathbf{C}'_{tc}], \quad \mathbf{C}_{cc} \rightarrow \begin{bmatrix} \mathbf{C}_{cc} & \mathbf{C}'_{cc} \\ \mathbf{C}'^{T}_{cc} & \mathbf{C}''_{cc} \end{bmatrix}, \quad \mathbf{C}_d \rightarrow \begin{bmatrix} \mathbf{C}_d & \mathbf{0} \\ \mathbf{0} & \mathbf{C}''_d \end{bmatrix}$ (14)

Here, the primed and double-primed blocks are the prior covariances between the existing and new control points and among the new control points, respectively, and the errors of the new observations are taken to be uncorrelated with those of the existing ones.
As before, the updating of $\mathbf{A}^{-1}$ requires more effort, but uses a well-known technique based on the Bordering method [13] [14]. We rename the existing matrix, $\mathbf{A}$, to $\mathbf{A}_{11}$, as it will become the upper-left block of the modified $\mathbf{A}$. Both $\mathbf{A}_{11}$ and its inverse, $\mathbf{B}_{11} \equiv \mathbf{A}_{11}^{-1}$, are known $(N-K) \times (N-K)$ matrices. We seek to add the blocks, $\mathbf{A}_{12}$ and $\mathbf{A}_{22}$, associated with the $L$ new data, creating an augmented $N' \times N'$ matrix, $\mathbf{A}$, with $N' = N - K + L$. The matrix, $\mathbf{A}$, and its inverse satisfy:

$\begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{12}^T & \mathbf{A}_{22} \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{B}}_{11} & \tilde{\mathbf{B}}_{12} \\ \tilde{\mathbf{B}}_{12}^T & \tilde{\mathbf{B}}_{22} \end{bmatrix} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}$ (15)
Multiplying out the equation and solving for $\tilde{\mathbf{B}}_{22}$, $\tilde{\mathbf{B}}_{12}$ and $\tilde{\mathbf{B}}_{11}$ yields:

$\tilde{\mathbf{B}}_{22} = (\mathbf{A}_{22} - \mathbf{A}_{12}^T\mathbf{B}_{11}\mathbf{A}_{12})^{-1}, \quad \tilde{\mathbf{B}}_{12} = -\mathbf{B}_{11}\mathbf{A}_{12}\tilde{\mathbf{B}}_{22}, \quad \tilde{\mathbf{B}}_{11} = \mathbf{B}_{11} - \tilde{\mathbf{B}}_{12}\mathbf{A}_{12}^T\mathbf{B}_{11}$ (16)

Note that only the $L \times L$ matrix, $\mathbf{A}_{22} - \mathbf{A}_{12}^T\mathbf{B}_{11}\mathbf{A}_{12}$, needs to be inverted.
The matrices then become:

$\mathbf{A} \rightarrow \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{12}^T & \mathbf{A}_{22} \end{bmatrix} \quad \text{and} \quad \mathbf{A}^{-1} \rightarrow \begin{bmatrix} \tilde{\mathbf{B}}_{11} & \tilde{\mathbf{B}}_{12} \\ \tilde{\mathbf{B}}_{12}^T & \tilde{\mathbf{B}}_{22} \end{bmatrix}$ (17)
These formulas have been tested numerically, and are correct to within machine precision. Accuracy and speed are improved through the use of coding techniques that utilize and enforce the symmetry of all the symmetric matrices.
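A corresponding sketch of the bordering update of Equations (16) and (17), again with illustrative names; only an $L \times L$ matrix is inverted.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L = 200, 5
X = rng.standard_normal((N + L, N + L))
Afull = X @ X.T + (N + L) * np.eye(N + L)    # target augmented matrix, for checking

A11 = Afull[:N, :N]
A12 = Afull[:N, N:]
A22 = Afull[N:, N:]
B11_old = np.linalg.inv(A11)                 # known from the previous iteration

# Equation (16): only the L x L Schur complement is inverted
T = B11_old @ A12                            # N x L workspace
B22 = np.linalg.inv(A22 - A12.T @ T)
B12 = -T @ B22
B11 = B11_old + T @ B22 @ T.T

Binv = np.block([[B11, B12], [B12.T, B22]])  # Equation (17)
print(np.abs(Binv - np.linalg.inv(Afull)).max())   # of order machine precision
```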
Detecting and Correcting Errors in the Solution. Numerical tests (not shown) indicate that the delete/append process can be repeated many times (e.g., hundreds) without significant loss of precision, at least for the class of matrices encountered in GPR. However, as a precaution, we recommend that the inverse be corrected every few hundred iterations, either through the direct calculation of $\mathbf{A}^{-1}$ from $\mathbf{A}$, or through one application of the Schultz method [15] [16] [17]:

$\mathbf{A}^{-1} \rightarrow \mathbf{A}^{-1}(2\mathbf{I} - \mathbf{A}\mathbf{A}^{-1})$ (18)
One way to monitor accuracy is to compute the absolute value of just one (or a few) of the diagonal elements of the error matrix, $\mathbf{E} \equiv \mathbf{A}\mathbf{A}^{-1} - \mathbf{I}$. For fixed $i$, the quantity $E_{ii}$ can be computed very efficiently, because only one inner product (between the $i$th row of $\mathbf{A}$ and the $i$th column of $\mathbf{A}^{-1}$) need be performed. The correction process then can be initiated when $|E_{ii}|$ exceeds a threshold, chosen to be a small fraction, say $10^{-8}$.
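The sketch below illustrates the monitor and one Schultz correction step (Equation (18)); the drift added to the inverse and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300
X = rng.standard_normal((N, N))
A = X @ X.T + N * np.eye(N)
B = np.linalg.inv(A)
B += 1e-8 * rng.standard_normal((N, N))      # simulate accumulated drift

i = 0
E_ii = A[i, :] @ B[:, i] - 1.0               # one diagonal element of A B - I
if abs(E_ii) > 1e-8:
    B = B @ (2.0 * np.eye(N) - A @ B)        # one Schultz correction, Equation (18)
    B = 0.5 * (B + B.T)                      # enforce symmetry

print(abs(A[i, :] @ B[:, i] - 1.0))          # much closer to zero
```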
Readjusting Window Length. The posterior data variance,

$E \equiv N^{-1}(\mathbf{d} - \mathbf{m}_c^{est})^T(\mathbf{d} - \mathbf{m}_c^{est})$ (19)

is a measure of how well the reconstruction fits the data. Here, $\mathbf{m}_c^{est}$ is the GPR reconstruction evaluated at the control points. The quantity, $NE/\sigma_d^2$, where $\sigma_d^2$ is the prior data variance, is approximately chi-squared distributed with $N$ degrees of freedom, and therefore has an expected value of $N$ and a variance of $2N$. Thus, the expected value of $E$ is $\sigma_d^2$, its variance is $2\sigma_d^4/N$ and its 95% confidence bound is approximately $\sigma_d^2(1 + 2\sqrt{2/N})$. However, $E$ can be expected to rise well above this bound during a regime shift due to model error; that is, two distinct spatial patterns are being comingled and cannot be fit simultaneously. Shortening the window length tends to bring $E$ closer to the bound, because obsolete data are being discarded more rapidly. This suggests the strategy of defining a threshold, guided by the value of the 95% bound, and decreasing the window length when $E$ is above it (by setting $L < K$), and increasing it once $E$ has dropped below it (by setting $L > K$) until the window length, $N$, has reached some upper limit.
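A sketch of this strategy is given below; the specific threshold, bounds and step sizes (N_min, N_max, changing L by one per cycle) are illustrative assumptions, as the discussion above fixes only the general rule.

```python
import numpy as np

def posterior_data_variance(d, m_c_est):
    """Equation (19): mean squared misfit between data and reconstruction."""
    e = d - m_c_est
    return (e @ e) / len(d)

def next_append_count(E, N, K, sig_d, N_min=30, N_max=90):
    """Choose L, the number of data to append, given K discarded per cycle."""
    E95 = sig_d**2 * (1.0 + 2.0 * np.sqrt(2.0 / N))  # approximate 95% bound on E
    if E > E95 and N > N_min:
        return K - 1        # L < K: the window shrinks during the regime shift
    if E < E95 and N < N_max:
        return K + 1        # L > K: the window grows back afterwards
    return K                # L = K: window length unchanged
```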
3. Procedure
Step 1. Decide upon initial values of $N$, $K$ and $L$. The choice $L = K$ implies that the same number of data are appended as are discarded, leaving the window length, $N$, unchanged. The rolling process is most efficient when $K \ll N$ and $L \ll N$.
Step 2. Solve the GPR problem for an initial group of $N$ data, as in Equation (1).
Step 3. Discard $K$ data by modifying $\mathbf{d}$ and $\bar{\mathbf{m}}_c$, and their corresponding $\mathbf{x}$s, as in Equation (2), $\mathbf{C}_{tc}$ and $\mathbf{C}_{cc}$ as in Equation (3), and $\mathbf{A}$ and $\mathbf{A}^{-1}$ as in Equation (12).
Step 4. Append $L$ data by modifying $\mathbf{d}$ and $\bar{\mathbf{m}}_c$, and their corresponding $\mathbf{x}$s, as in Equation (13), $\mathbf{C}_{tc}$ and $\mathbf{C}_{cc}$ as in Equation (14), and $\mathbf{A}$ and $\mathbf{A}^{-1}$ as in Equation (17).
Step 5. (Optional) Monitor the error $|E_{ii}|$, where $\mathbf{E} \equiv \mathbf{A}\mathbf{A}^{-1} - \mathbf{I}$, for a single index, $i$, and when it exceeds a threshold of, say, $10^{-8}$, refine $\mathbf{A}^{-1}$ as in Equation (18).
Step 6. Solve the GPR problem for the $N - K + L$ data, obtaining $\mathbf{m}_t^{est}$ and $\mathbf{C}_{tt}^{post}$ as in Equation (1).
Step 7. (Optional) Compute the posterior data variance, $E$, as in Equation (19) and reassign the values of $N$, $K$ and $L$, as needed (as discussed in the Readjusting Window Length section).
Step 8. Once $L$ new data are available, iterate the procedure, starting at Step 3.
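The following sketch strings Steps 2-6 together on synthetic data, with constant $K = L$ (Step 1) and the optional Steps 5 and 7 omitted; the names, test field and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
sig_d, sig_m, lam = 0.1, 1.0, 0.2

def cov(xa, xb):
    """Gaussian prior covariance (cf. Equation (21))."""
    d2 = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return sig_m**2 * np.exp(-d2 / (2 * lam**2))

def field(x):                               # stand-in "true" field
    return np.sin(2 * np.pi * x[:, 0]) * np.sin(2 * np.pi * x[:, 1])

# target points on a regular grid
xt = np.stack(np.meshgrid(np.linspace(0, 1, 10),
                          np.linspace(0, 1, 10)), -1).reshape(-1, 2)

# Step 2: initial solve for N data (zero prior mean assumed)
N, K, L = 60, 5, 5
xc = rng.random((N, 2))
d = field(xc) + sig_d * rng.standard_normal(N)
B = np.linalg.inv(cov(xc, xc) + sig_d**2 * np.eye(N))

for cycle in range(20):
    # Step 3: discard the K oldest data (Equations (2), (3) and (12))
    B11, B12, B22 = B[:K, :K], B[:K, K:], B[K:, K:]
    B = B22 - B12.T @ np.linalg.inv(B11) @ B12
    xc, d = xc[K:], d[K:]

    # Step 4: append L new data (Equations (13), (14), (16) and (17))
    xnew = rng.random((L, 2))
    dnew = field(xnew) + sig_d * rng.standard_normal(L)
    A12 = cov(xc, xnew)
    A22 = cov(xnew, xnew) + sig_d**2 * np.eye(L)
    T = B @ A12
    S = np.linalg.inv(A22 - A12.T @ T)
    B = np.block([[B + T @ S @ T.T, -T @ S], [-S @ T.T, S]])
    xc, d = np.vstack([xc, xnew]), np.concatenate([d, dnew])

    # Step 6: updated GPR estimate (Equation (1), zero prior mean)
    m_est = cov(xt, xc) @ (B @ d)

print(np.abs(m_est - field(xt)).max())      # reconstruction error at targets
```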
4. Examples
Test Scenario. In these examples, the two-dimensional field, $m(\mathbf{x}, t)$, with $\mathbf{x} = (x, y)$, is reconstructed on a regular 30 × 30 grid, using data that are acquired at the steady rate of 10 observations per time interval, $\Delta t$. The observations are made at randomly chosen positions and have uncorrelated and uniform error with prior variance, $\sigma_d^2$. The true field experiences a regime shift at time, $25\Delta t$, when it abruptly changes from a four-lobed to a three-lobed pattern:

$m(x, y, t) = \begin{cases} \sin(2\pi x)\sin(2\pi y), & t < 25\Delta t \\ \sin(3\pi x)\sin(\pi y), & t \geq 25\Delta t \end{cases}$ (20)

where the positions are scaled to the unit square, $0 \le x, y \le 1$.
The field is assumed to have zero prior value and Gaussian prior covariance:

$[\mathbf{C}_m]_{ij} = \sigma_m^2 \exp\left(-\frac{|\mathbf{x}_i - \mathbf{x}_j|^2}{2\lambda^2}\right)$ (21)

where $\sigma_m^2$ is the prior variance of the model parameters and $\lambda$ is a scale length.
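A sketch of this test scenario is given below; the unit-square domain and the specific sinusoidal lobed patterns follow the illustrative form of Equation (20), and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
sig_d = 0.1                          # illustrative prior data standard deviation

def true_field(x, y, t, dt=1.0):
    """Regime-shifting field of Equation (20): four lobes, then three."""
    if t < 25 * dt:
        return np.sin(2 * np.pi * x) * np.sin(2 * np.pi * y)
    return np.sin(3 * np.pi * x) * np.sin(np.pi * y)

def observe(t, n_per_dt=10):
    """One interval's worth of noisy observations at random positions."""
    xy = rng.random((n_per_dt, 2))
    d = true_field(xy[:, 0], xy[:, 1], t) + sig_d * rng.standard_normal(n_per_dt)
    return xy, d
```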
Example without Window Length Readjustment. In this example (Figure 1(A)), the size of the rolling set of observations increases with time up to an upper bound of 90. For times, $t < 25\Delta t$, the field is correctly reconstructed, showing the correct four-lobed pattern, and the posterior variance is about equal to the prior variance. The field is incorrectly reconstructed during the time interval in which the window comingles the two patterns, and the posterior variance is about one thousand times higher than the prior variance. For later times, the field is correctly reconstructed, showing the correct three-lobed pattern, and the posterior variance returns to its original level. This example confirms the ability of the method to reconstruct the field in the presence of regime shifts.

Figure 1. Example of rolling GPR. (A) Case where the window length is held constant. The window length (top curve) grows to $N = 90$ and then is held constant. The two-dimensional field, $m(\mathbf{x}, t)$ (colors), abruptly changes from a four-lobed pattern to a three-lobed pattern, at time, $25\Delta t$. The posterior data variance, $E$ (bottom curve), is low, except near the time of the change. (B) Case where the window length is varied. The posterior data variance, $E$ (top black curve), is monitored, and the time at which it exceeds a threshold (top red curve) is detected. The window length (bottom black curve) is decreased during the interval of high variance, and then increased afterwards, within bounds (bottom two red curves).
Example with Window Length Readjustment. In this example (Figure 1(B)), the posterior data variance, $E$ (a measure of the misfit of the data), is monitored, and the size of the rolling set of observations is reduced when it increases past a threshold (but not below a prescribed lower bound). Once the error has declined below the threshold, the set size is allowed to increase, up to an upper bound of 90. The duration of the interval of elevated error is reduced by a factor of about two compared to the first experiment. The presence of the second, three-lobed pattern is evident at an earlier time than in the first example, demonstrating the utility of the window length readjustment.
5. Discussion and Conclusions
The main advantage of the rolling Gaussian Process Regression method that we describe here is its ability to reconstruct a time-varying field without any assumptions about its dynamics. This is in contrast to other data assimilation techniques, such as Kalman filtering [18], in which the differential equation describing the time dependence is assumed to be known.
The $N \times N$ matrix, $\mathbf{A} \equiv \mathbf{C}_{cc} + \mathbf{C}_d$, that arises in Gaussian Process Regression is a non-sparse, symmetric positive definite matrix that takes $O(N^3)$ floating point operations to invert [19]. Careful counting of operations reveals that the discard/append process described above takes about $O(N^2(K+L))$ operations. Thus, the discard/append process is more efficient than direct inversion when $K + L \ll N$, that is, when just a few percent of the data are being updated.
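The following rough benchmark illustrates the comparison (wall-clock timings rather than an operation count, with illustrative sizes): one direct inverse versus one discard/append cycle with $K = L$ at two percent of $N$.

```python
import time
import numpy as np

rng = np.random.default_rng(6)
N, K = 2000, 40                      # update 2% of the data
X = rng.standard_normal((N + K, N + K))
Afull = X @ X.T + (N + K) * np.eye(N + K)
B = np.linalg.inv(Afull[:N, :N])     # inverse for the current window

t0 = time.perf_counter()
np.linalg.inv(Afull[K:, K:])         # direct O(N^3) inverse of the new window
t1 = time.perf_counter()

# discard K, then append K (Equations (12), (16) and (17)): O(N^2 K)
B11, B12, B22 = B[:K, :K], B[:K, K:], B[K:, K:]
Bd = B22 - B12.T @ np.linalg.inv(B11) @ B12
A12, A22 = Afull[K:N, N:], Afull[N:, N:]
T = Bd @ A12
S = np.linalg.inv(A22 - A12.T @ T)
Bu = np.block([[Bd + T @ S @ T.T, -T @ S], [-S @ T.T, S]])
t2 = time.perf_counter()

print(f"full inverse: {t1 - t0:.3f}s, discard/append: {t2 - t1:.3f}s")
```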
The procedure for implementing “rolling GPR” (or moving-window GPR) that we present here is more computationally efficient than solving the full GPR problem at each update, at least when the number of data that are deleted/appended is only a few percent of the total number used in the calculation. Regime shifts (sudden large changes in the field) can be detected by monitoring the posterior data variance (a measure of the misfit of the data) during the updates, and their detrimental effect is mitigated by shortening the time window as the variance rises, and then lengthening it as the variance falls (but within prior bounds). The numerical experiments presented here demonstrate the viability and usefulness of the procedure.
Acknowledgements
The author thanks Roger Creel of Columbia University (New York, USA) for the helpful discussion.