Fourth-Order Predictive Modelling: I. General-Purpose Closed-Form Fourth-Order Moments-Constrained MaxEnt Distribution ()
1. Introduction
Scientific progress stems from the judicious combination of experimental information with results produced by computational models, neither of which are perfectly accurate. On the one hand, computations are afflicted by errors stemming from numerical procedures, uncertain model parameters, boundary/initial conditions and/or imperfectly known physical processes. On the other hand, results of measurements are afflicted by experimental errors, which means that around any reported experimental value there always exists a range of values that may also be plausibly representative of the true but unknown value of the measured quantity. Extracting “best estimate” values for model parameters and predicted results, together with “best estimate” uncertainties for these parameters and results requires the combination of experimental and computational data and their uncertainties.
The earliest systematic activities aimed at extracting best-estimate values for model parameters were initiated in the mid-1960s [1] [2] , in the course of evaluating neutron cross sections by using time-independent reactor physics models and experiments for evaluating, computationally and experimentally, so-called “integral quantities” (also called “system responses”) such as reaction rates and multiplication factors. A decade later, these activities had reached conceptual maturity under the name of “cross section adjustment” methodology [3] [4] . This methodology employs a user-defined least-square functional for combining uncertainties in the model parameters with uncertainties in the experimental data, subject to the hard constraint represented by the reactor physics model linearized with respect to the model parameters. The resulting “adjusted” neutron cross sections (model parameters) and their “adjusted” uncertainties were subsequently employed in the respective reactor physics model to predict improved “model responses,” such as improved reaction rates and reaction rate ratios, reactor multiplication factors, Doppler coefficients. By the late 1970s, adjoint neutron fluxes were introduced and used [5] [6] to compute efficiently the first-order response sensitivities, which were used as weighting functions in the least squares adjustment procedure. It is important to note that all of these works dealt with the time-independent linear neutron transport or diffusion equation, as encountered in reactor physics and shielding, for which the corresponding adjoint equations were already known and readily available.
In the late 1980s and during the 1990s, the fundamental concepts underlying the above-mentioned “data adjustment” methodology were “rediscovered” in the geophysical sciences while developing the so-called “data assimilation” procedure, in that the concepts underlying “data assimilation” are the same as those underlying the previously developed “data adjustment” procedure. Since then, numerous works have been published on “data assimilation;” the most representative can be found cited in the books by Lewis et al. [7] , Lahoz et al. [8] , and Cacuci et al. [9] . The data assimilation procedures also minimize, in a least-squares procedure, a “user-defined quadratic functional” which is meant to represent the discrepancies between computed and measured responses. The main differences between the data adjustment and data assimilation methods are as follows: 1) the data adjustment method are aimed at time-dependent models/systems, whereas the data assimilation methods are aimed at time-dependent models/systems; 2) the data adjustment methods can adjust/calibrate simultaneously model parameters and model responses, whereas the data assimilation methods can adjust/calibrate only model responses and initial conditions.
In the late 1990s, Cacuci and Ionescu-Bujor [10] [11] have developed a predictive modeling methodology which significantly generalizes and extends the “data adjustment and/or assimilation” methods by dispensing with the need to minimize the a priori user-defined “cost functional”, and by replacing the respective least-squares minimization procedure with the “principle of maximum entropy” (MaxEnt), originally formulated by Jaynes [12] , to attain optimal compatibility of the available information, while simultaneously ensuring minimal spurious information content. This methodology has been extended by Cacuci [13] to include the predictive modeling of coupled multi-physics systems, leading to the development of the “BERRU-PM” mathematical framework [14] for obtaining best-estimate optimal results with reduced predicted uncertainties. The “BERRU-PM” (Best-Estimate Results with Reduced Uncertainties Predictive Modeling) methodology provides a probabilistic description of possible future outcomes based on all recognized errors and uncertainties, along with a quantitative indicator, constructed from sensitivity and covariance matrices, for determining the consistency (agreement or disagreement) among the a priori computational and experimental data for parameters and responses. This consistency indicator measures, in the corresponding metric, the deviations between the experimental and nominally computed responses, and can be evaluated directly from the originally given data (i.e., originally available parameters and responses, together with their original uncertainties), once the response sensitivities have become available.
The data adjustment and assimilation methodologies, as well as the BERRU-PM methodology, incorporate only 1st-order sensitivities of the model responses with respect to the model’s parameters. In generalizing the BERRU-PM methodology, this limitation has been removed by Cacuci [15] [16] by having recently conceived the “2nd-BERRU-PM” methodology, which can incorporate arbitrarily-high order sensitivities of model responses with respect to model parameters. The essential contributions of the second- and higher-order sensitivities for reducing predicted uncertainties in the model response have been illustrated in [17] [18] [19] by applying the 2nd-BERRU-PM methodology [15] [16] to the polyethylene-reflected plutonium (PERP) OECD/NEA reactor physics benchmark [20] . The response of interest for this benchmark was the leakage of neutrons through the benchmark’s outer surface, which was computed [21] using the neutron transport Boltzmann equation, involving 21,976 imprecisely known parameters, the solution of which is representative of “large-scale computations.” As detailed in [21] , the first- through fourth-order sensitivities of this benchmark’s leakage response to the benchmark’s parameters were most efficiently computed using the adjoint sensitivity analysis methodology conceived by Cacuci [22] [23] and subsequently generalized by Cacuci [24] [25] to enable the computation of arbitrarily-high order sensitivities of model responses to model parameters.
Although the 2nd-BERRU-PM methodology can incorporate arbitrarily high sensitivities, this methodology is limited to considering just second-order moments (hence the designation “2nd-”) of the experimentally measured and computed model responses. The third-order moments (skewness) and fourth-order moments (kurtosis) of the distribution of measured and computed responses cannot be considered within the framework of the 2nd-BERRU-PM methodology. Furthermore, the “output” produced by the 2nd-BERRU-PM methodology is limited to yielding optimal best-estimate values for the means and covariances (i.e., the first- and second- moments) of the best-estimate predicted distribution of responses and parameters.
On the other hand, skewness and kurtosis play an essential role in determining the asymmetries of distributions. It is therefore paramount to generalize the 2nd-BERRU-PM methodology to enable the incorporation of third- and fourth-order moments of measured and computed responses, as well as to enable the computation of skewness and kurtosis of the best-estimate predicted posterior distribution of calibrated model parameters and responses. However, such a generalization can be achieved only if the underlying moment-constrained MaxEnt distribution could be correspondingly generalized to include third- and fourth-order moments of the distributions of measured and computed model responses.
Various variants of the “moment constrained maximum entropy problem” arise in areas as diverse as solid-state physics [26] , econometrics [27] , statistical description of gas flows [28] , weather and climate prediction [29] [30] [31] . It appears that the most efficient computational algorithm currently available for solving the “multidimensional moment constrained maximum entropy problem” is the algorithm devised by Abramov [32] , which is practically capable of solving two-dimensional problems with moments of order up to 8, three-dimensional problems with moments of order up to 6, and four-dimensional problems of order up to 4. But being able to solve “four-dimensional problems of order up to 4” is woefully insufficient for realistic problems, which are characterized by multivariate quantities of interest having dimensions much larger than just “four-dimensional”. On the other hand, the unparalleled efficiency of 2nd-BERRU-PM methodology for handling large-scale systems was demonstrated by means of the OECD/NEA reactor physics benchmark [20] involving 21,976 imprecisely known parameters. This computational efficiency was enabled by the analytical expressions for the first-and second-order moments produced by the underlying posterior second-order MaxEnt distribution within the 2nd-BERRU-PM methodology. It is therefore logical to attempt to extend to fourth-order the mathematical concepts and algorithms underlying the moment-constrained MaxEnt distribution within the 2nd-BERRU-PM methodology. This extension has been accomplished as will be presented in this work.
This work is structured as follows: Section 2 presents the novel, closed-form, fourth-order moment-constrained MaxEnt probabilistic representation of the distribution of an imprecisely measured or computed multivariate quantity, the components of which will be called “responses.” Such responses are produced by computations, measurements or both. The first four moments of the unknown multivariate distribution of these response, namely their mean values, variances/ covariances, skewness, and kurtosis, are assumed to be known. Section 3 concludes this work by outlining the crucial role which the analytical expression of fourth-order moment-constrained MaxEnt distribution presented in this work will play in constructing the mathematical framework of the 4th-BERRU-PM methodology, which will be presented in the accompanying Part 2 [33] .
2. Construction of the Fourth-Order Moment-Constrained Maximum Entropy (MaxEnt) Representation of Uncertain Multivariate Quantities
The components of the imprecisely known multivariate quantity of interest will be called “system responses.” Such system responses usually stem from measurements. Also, a computed response is equivalent to a “measurement” when the details (equations, parameters, etc.) underlying the computational model are unavailable. The case when the details of the computational model are available will be analyzed in the accompanying Part 2 [33] . In order to introduce the mathematical definitions of the known information (i.e., the moments of the unknown distribution of system responses), the unknown distribution of measured model system responses will be denoted as
,
, where
denotes the “ith-system response”,
, and where TR denotes the total number of system responses of interest. The unknown distribution
is formally defined on a TR-dimensional real-valued domain
. The letter “e” will be used either as a superscript or a superscript to indicate “experimentally obtained” (e.g., measured or computed with an inaccessible model) quantities. Matrices will be denoted using capital bold letters while vectors will be denoted using either capital or lower-case bold letters. The symbol “
” will be used to denote “is defined as” or “is by definition equal to.” Transposition will be indicated by a dagger (
) superscript.
In this work, it is considered that the first four moments, namely: mean values, variances/covariances, triple correlations (skewness) and quadruple correlations (kurtosis), of the unknown distribution of system responses are available. The formal mathematical definitions of the first four moments, considered to be known, of the unknown distribution
, are as follows:
(i) Known mean/expectation values, denoted as
, for the system responses
, where
:
. (1)
(ii) Known covariances, denoted as
, for two system responses
and
, where
:
(2)
The covariances
,
, of the system responses are considered to be components of the
-dimensional covariance matrix of
system responses, which will be denoted as
.
(iii) Known triple correlations, denoted as
, for three system responses
,
,
, where
:
(3)
(iv) Known quadruple correlations, denoted as
, for four system responses
,
,
,
, where
:
(4)
When an unknown distribution, such as
, needs to be reconstructed from a finite number of its known moments, the principle of maximum entropy (MaxEnt) originally formulated by Jaynes [12] provides the optimal compatibility with the available information, while simultaneously ensuring minimal spurious information content, yielding an estimate of a probability density with the highest uncertainty among all densities satisfying the known moment constraints. According to the MaxEnt principle, such a MaxEnt probability density would satisfy the “available information” provided in Equations (1)-(4), without implying any spurious information or hidden assumptions, if the following conditions are satisfied:
(a)
maximizes the Shannon [34] information entropy, S, which is defined below:
, (5)
(b)
satisfies the four “moments constraints” defined by Equations (1) through (4);
(c)
satisfies the following normalization condition:
(6)
The MaxEnt distribution
is obtained as the solution of the variational problem
, where the entropy (Lagrangian functional)
is defined as follows:
(7)
In Equation (7), the quantities
,
,
,
, and
denote the respective Lagrange multipliers, and the factor 1/2 has been introduced for subsequent computational convenience. Introducing the variable:
(8)
and solving the equation
yields the following expression for the resulting MaxEnt distribution
:
(9)
where:
(10)
Re-writing Equations (1)-(4) in terms of the variable
yields the following formal expressions for the known first four moments of the MaxEnt distribution
:
(11)
(12)
(13)
(14)
The expression of
can be evaluated in closed-form up to fifth-order in the variables
by expanding in Taylor-series the third- and fourth-order terms in the expression of
, to obtain the following approximate expression:
, (15)
where the following definitions have been used:
(16)
(17)
(18)
where:
(19)
(20)
(21)
The closed form expression of the integral shown in Equation (19) is well known:
, (22)
where:
(23)
It follows from the definitions provided in Equations (19)-(21) that:
(24)
Inserting the results obtained in Equations (19)-(24) into Equation (18) yields the following expression:
(25)
The explicit expressions for the quantities
and
are derived in Appendix A. Inserting into Equation (25) the expression obtained in Equation (19) for
together with the expressions obtained in Equations (64) and (66), in Appendix A, for the quantities
and
, respectively, yields the following expression for the normalization integral
:
(26)
where the quantities
and
are defined in Appendix A.
In terms of the approximate expression shown in Equation (15) for the distribution
, the moments defined by Equations (11)-(14) take on the following expressions:
(27)
(28)
(29)
(30)
The relation shown in Equation (27) can be expressed in terms of the normalization integral
as follows:
. (31)
It follows from Equation (26) that:
(32)
It follows from Equations (31) and (32) that:
(33)
The last approximate equality on the right-side of Equation (33) was obtained by expanding the respective denominator in a power series and neglecting the sixth- and higher-order terms.
It follows from the expressions obtained in Equations (68) and (71) of Appendix A that:
(34)
It follows from Equations (33) and (34) that if the third-order correlations among the measured system responses are neglected by setting
,
, then the solution of these equations becomes:
(35)
The solution of Equation (33) cannot be obtained analytically in closed form if the third-order correlations among the measured system responses are not neglected.
Taking the second derivative of
with respect to
Differentiating the relation obtained in Equation (31) with respect to another Lagrange multiplier,
,
, yields the following relation:
(36)
The last equality on the right-side of Equation (36) has been obtained by using Equations (31) and (35).
On the other hand, differentiating the relation shown in Equation (33) with respect to
,
, yields the following result:
(37)
The result obtained on the rightmost-side of Equation (37) follows by evaluating all terms at
(which implies that third-order correlations were neglected) and by also neglecting the fourth-order correlations by setting
. The results obtained in Equations (37) and (36) indicate that
(38)
where
denotes the known
-dimensional covariance matrix of the system responses. It follows from Equation (38) that:
. (39)
In view of Equations (34) and (38), neglecting the third- and fourth-order system response correlations while obtaining the expressions shown in Equations (35) and (38) for the Lagrange multipliers (i.e.,
and
, respectively) is equivalent to neglecting double- and triple products of measured-system response correlations.
Differentiating the relation shown in Equation (37) with respect to another Lagrange multiplier,
,
, yields the following result:
. (40)
On the other hand, differentiating the relation shown in Equation (36) with respect to
and evaluating the resulting expression by using the results obtained in Equations (35) and (38) for the Lagrange multipliers
and
yields the following result:
(41)
The last equality shown on the right-side of Equation (41) follows from the relation provided in Equation (29).
Evaluating the expression obtained in Equation (40) by using Equations (70) and (79) from Appendix B, in conjunction with the results obtained in Equations (35) and (38) for the Lagrange multipliers
and
yields the following relation for determining the Lagrange multipliers
:
(42)
Evidently, Equation (42) could be solved numerically, but cannot be solved analytically to obtain a closed-form solution in the general case. Furthermore, the triple cross-correlations
,
, are seldom available in practice. Usually, only the third-order self-correlations
, for each system response
,
, are available in practice for the measured system responses. In this case, Equation (42) will reduce to the following form:
(43)
Equation (43) can be solved explicitly by inverting the matrix
to obtain:
(44)
In particular, if only the system response variances are available but the second-order correlations among the system responses are negligible or unavailable, then the result obtained in Equation (44) reduces to the following simple expression for determining the Lagrange multipliers
,
:
(45)
Differentiating the relation shown in Equation (40) with respect to another Lagrange multiplier,
,
, yields the following result:
. (46)
The expression of
is known, as provided in Equation (84) in Appendix A.
On the other hand, differentiating the relation shown in Equation (41) with respect to
and evaluating the resulting expression by using the results obtained in Equations (35) and (38) for the Lagrange multipliers
and
yields the following expression:
(47)
Performing the above differentiations and evaluating the resulting expression by using the results obtained in Equations (35) and (38) for the Lagrange multipliers
and
yields the following relation:
(48)
Using the expressions provided in Equations (28) and (30) to replace the expressions on the right-side of Equation (48) yields the following result:
(49)
It follows from the results obtained in Equations (49) and (46) that the following relation holds for
:
(50)
The Lagrange multipliers
for the fourth-order correlations of the measured system responses are to be obtained as the solution of Equation (50). This equation could be solved numerically but cannot be solved analytically to obtain closed-form solutions for these Lagrange multipliers. In addition, the general fourth-order correlations
are not expected to be available in practice; at most, the fourth-order self-correlations
, for each system response
,
, might be available in practice for the measured system responses. In this case, Equation (50) will reduce to the following form for the quantities
and
:
(51)
Using the result presented in Equation (84) in Appendix B yields the following expression for the quantity
, for
and
:
(52)
Inserting the result obtained in Equation (52) into the right-side of Equation (51) yields the following vector-matrix equation for determining the Lagrange multipliers
,
:
(53)
The solution
of Equation (53) is obtained in the following form:
(54)
In particular, if the (second-order) correlations among the measured system responses are negligible or unavailable, then the result obtained in Equation (54) reduces to the following simple expression for determining the Lagrange multipliers
,
:
(55)
The expressions of the Lagrange multipliers obtained in Equations (35), (38), (44) and (54) are not exact, but have been obtained in closed forms, so they can be used in Equations (9) and (10) to obtain the correspondingly approximate MaxEnt distribution, which will be denoted as
and which has the following expression:
(56)
where the Lagrange multipliers
,
and
,
, have the expressions obtained in Equations (38), (44) and (54), respectively. The approximations incurred during the course of obtaining the expression shown in Equation (56) for the fourth-order MaxEnt distribution
of experimentally measured system responses are summarized below:
1) When obtaining the result
in Equation (35) for the Lagrange multipliers for the expected/mean values of the system responses, the third-order correlations among the system responses were neglected. This approximation is confirmed, as shown in Appendix B, namely:
(57)
2) When obtaining the result
in Equation (38) for the Lagrange multipliers for the covariances of the system responses, the third- and fourth-order correlations among the system responses were neglected. This approximation is confirmed by using the expression of the fourth-order MaxEnt distribution
given in Equation (56), the result shown in Equation (57) and the definition of the system responses’ covariances, as shown in Appendix C, to obtain the following expression:
(58)
3) When obtaining the result
in Equation (44) for the Lagrange multipliers for the triple-correlations of the system responses, only the self-triple-correlations among the system responses were considered. This approximation is confirmed, as shown in Appendix B, by using:
i) the expression of the approximate fourth-order MaxEnt distribution
given in Equation (56);
ii) the result shown in Equation (57); and
iii) the definition of the self-correlations of third-order for the system responses.
Thus, as shown in Appendix B, the following expression is obtained for the triple-correlations
;
:
(59)
4) When obtaining the result
in Equation (54) for the Lagrange multipliers for the quadruple-correlations of the measured system responses, only the self-correlations of fourth order among the measured system responses were considered. This approximation is confirmed by using the expression of the approximate fourth-order MaxEnt distribution
given in Equation (56), the result shown in Equation (57), and the definition of the self-correlations of third-order for the measured system responses to obtain, as shown in Appendix B, the following expression for the quadruple-correlations
,
:
(60)
In summary, the results obtained in Equations (57)-(60) indicate that the known/given means, covariances, triple and quadruple correlations for the system responses can all be used as values for the corresponding moments of the approximate fourth-order MaxEnt distribution
given in Equation (56), since the approximations incurred when using this correspondence are at least of 2-orders higher than for the moment in question, namely:
1) The first-order moments of
have third-order errors by comparison to the measured mean values.
2) The second-order moments of
have fourth-order errors by comparison to the measured mean values.
3) The third-order moments of
have fifth-order errors by comparison to the measured mean values.
4) The fourth-order moments of
have sixth-order errors by comparison to the measured mean values.
5) Notably, if the triple- and the quadruple correlations are negligeable (or unavailable) then the MaxEnt distribution
reduces to a multivariate Gaussian with mean
and covariance matrix
.
3. Discussion and Conclusions
This work has presented a novel closed-form expression for the fourth-order moments-constrained Maximum Entropy (MaxEnt) probability distribution, which was constructed from the known first four moments (means, covariances, skewness, kurtosis) of an otherwise unknown distribution of a high-dimensional multivariate uncertain quantity of interest, which was called a “system response”. This fourth-order MaxEnt distribution provides optimal compatibility of the available information while simultaneously ensuring minimal spurious information content, yielding an estimate of a probability density with the highest uncertainty among all densities satisfying the known moment constraints. This novel generic fourth-order MaxEnt distribution is of interest in its own right for applications in many areas, including solid-state physics, econometrics, statistical description of gas flows, weather and climate prediction. In the accompanying work [33] , this novel generic fourth-order MaxEnt distribution will be used to construct a novel fourth-order predictive modeling methodology aimed at obtaining “best-estimate results with reduced uncertainties” for the first four moments (mean values, covariance, skewness and kurtosis) of the optimally predicted distribution of model results and calibrated model parameters, by combining fourth-order experimental and computational information, including fourth (and higher) order sensitivities of computed model system responses to model parameters.
Appendix A: Auxiliary Computations for Constructing the Moment-Constrained Fourth-Order MaxEnt Distribution
Recall that the following closed-form expression of
was provided in Equation (19):
. Differentiating this expression with respect to a Lagrange multiplier
yields the following relation:
. (61)
Differentiating the expressions in Equation (61) with respect to a Lagrange multiplier
yields the following relations:
(62)
where:
(63)
Differentiating the expressions in Equation (62) with respect to a Lagrange multiplier
yields the following relations:
(64)
where:
(65)
Differentiating the expressions in Equation (64) with respect to a Lagrange multiplier
yields the following relations:
(66)
where:
(67)
The various derivatives of the functions
and
with respect to the Lagrange multipliers
,
, will also be used when evaluating the various derivatives of the normalization integral
with respect to these Lagrange multipliers. The derivatives of the function
with respect to the Lagrange multipliers
,
, are obtained by using Equation (65), which yields the following expressions:
(68)
Differentiating the expression in Equation (68) with respect to a Lagrange multiplier
yields the following relation:
(69)
Differentiating the expression in Equation (69) with respect to a Lagrange multiplier
yields the following relation:
(70)
The derivatives of the function
with respect to a Lagrange multipliers
,
, are obtained by using Equation (67). Thus, the first derivative of the function
with respect to a Lagrange multipliers
,
, has the following expression:
(71)
Differentiating Equation (71) with respect to a Lagrange multiplier
, yields the following relation:
, (72)
where:
(73)
(74)
(75)
(76)
(77)
(78)
Differentiating Equation (72) with respect to a Lagrange multiplier
, yields the following relation:
(79)
where:
(80)
(81)
(82)
(83)
Differentiating Equation (79) with respect to a Lagrange multiplier
,
, yields the following relation:
(84)
where:
(85)
(86)
(87)
(88)
Appendix B: Approximations Inherent to the Fourth-Order Maximum Entropy Distribution
Recalling Equation (56), the expression of the approximate fourth-order MaxEnt probability distribution function
for the responses has the following form:
(89)
where:
(90)
(91)
. (92)
The MaxEnt probability distribution function
is properly normalized, as shown below:
(93)
The first-order moments
,
, of
are given by the following expression:
(94)
As indicated by the result obtained in Equation (94), if the third-order correlations
among the measured responses are all neglected, then the MaxEnt moments
coincide with the measured moments,
(i.e.,
if
).
Within the approximation
, the second-order moments
,
, of
are given by the following expression:
(95)
Within the approximation
, the third-order moments
,
, of
are given by the following expression:
(96)
where the acronym “HOT” denotes “higher order terms.”
Within the approximation
, the fourth-order moments
,
, of
are given by the following expression:
(97)
where the acronym “HOT” denotes “higher order terms”.