Some Applications of Higher Moments of the Linear Gaussian White Noise Process
1. Introduction
The objective of estimation procedures is to produce residuals (the estimated noise sequence) with no apparent deviations from stationarity and, in particular, with no dependence among them. If there is no dependence among the residuals, then we can regard them as observations of independent random variables; there is no further modeling to be done except to estimate their mean and variance. If there is significant dependence among the residuals, then we need to look for a noise sequence that accounts for the dependence [1].
In this paper, we examine the covariance structure of powers of the noise sequence when the noise sequence is assumed to consist of independent and identically distributed normal (Gaussian) random variates with mean zero and finite variance, $\sigma^2$. Some simple tests for checking the hypothesis that the residuals and their powers are observed values of independent and identically distributed random variables are also considered, as are tests for normality of the residuals and their powers.
The stochastic process $\{X_t\}$ is said to be strictly stationary if the distribution function is time invariant. That is,

$$F(x_{t_1}, x_{t_2}, \ldots, x_{t_n}) = F(x_{t_1+k}, x_{t_2+k}, \ldots, x_{t_n+k}) \qquad (1.1)$$

where

$$F(x_{t_1}, x_{t_2}, \ldots, x_{t_n}) = P\left(X_{t_1} \le x_{t_1}, X_{t_2} \le x_{t_2}, \ldots, X_{t_n} \le x_{t_n}\right) \qquad (1.2)$$

That is, the probability measure for the sequence $\{X_{t_1}, X_{t_2}, \ldots, X_{t_n}\}$ is the same as that for $\{X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}\}$ for all $k$. If a series satisfies the next three equations, it is said to be weakly or covariance stationary.
$$E(X_t) = \mu \ \text{for all } t, \qquad E\left[(X_t - \mu)^2\right] = \sigma_X^2 \ \text{for all } t, \qquad \operatorname{Cov}\left(X_{t_1}, X_{t_2}\right) = \gamma\left(t_2 - t_1\right) \qquad (1.3)$$

If the process is covariance stationary, all the variances are the same and all the covariances depend only on the difference between $t_1$ and $t_2$. The moments

$$\gamma_k = E\left[(X_t - \mu)(X_{t+k} - \mu)\right] \qquad (1.4)$$

are known as the autocovariance function. The autocorrelations, which do not depend on the units of measurement of $X_t$, are given by

$$\rho_k = \frac{\gamma_k}{\gamma_0} \qquad (1.5)$$
A stochastic process $\{e_t\}$, where $t = 0, \pm 1, \pm 2, \ldots$, is called a white noise if, with finite mean and variance, all the autocovariances (1.4) are zero except at lag zero [$\gamma_k = 0$ for $k \ne 0$]. In many applications, $\{e_t\}$ is assumed to be normally distributed with mean zero and variance $\sigma^2$, and the series is called a linear Gaussian white noise process if:

$$\rho_k = \begin{cases} 1, & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad (1.6)$$

and

$$\phi_{kk} = \begin{cases} 1, & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad (1.7)$$

where $\phi_{kk}$ is known as the partial autocorrelation function. For large $n$, the sample autocorrelations
$$\hat{\rho}_k = \frac{\sum_{t=1}^{n-k}\left(x_t - \bar{x}\right)\left(x_{t+k} - \bar{x}\right)}{\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^2} \qquad (1.8)$$

of an iid sequence $x_1, x_2, \ldots, x_n$ with finite variance are approximately distributed as $N(0, 1/n)$ [1] [2] [3]. We can use this to do significance tests for the autocorrelation coefficients by constructing a confidence interval. If $\{x_t\}$ is a realization of such an iid sequence, about 95% of the sample autocorrelations should fall between the bounds

$$\pm \frac{1.96}{\sqrt{n}} \qquad (1.9)$$

where 1.96 is the 0.975 quantile of the standard normal distribution. The null and alternative hypotheses are:

$$H_0: \rho_k = 0 \quad \text{vs.} \quad H_1: \rho_k \ne 0 \qquad (1.10)$$

where $\rho_k$ are the autocorrelations at lag $k$ computed for $\{x_t\}$.
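As an illustration, the sample autocorrelations (1.8) and the bounds (1.9) are easy to compute directly. The sketch below (plain Python with illustrative names, not code from the paper) checks a simulated Gaussian white noise series against the 95% bounds $\pm 1.96/\sqrt{n}$.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k of Equation (1.8), for k = 0..max_lag."""
    n = len(x)
    xbar = sum(x) / n
    denom = sum((v - xbar) ** 2 for v in x)
    return [sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

random.seed(1)
n = 500
e = [random.gauss(0.0, 1.0) for _ in range(n)]   # simulated Gaussian white noise
r = sample_acf(e, 20)
bound = 1.96 / math.sqrt(n)                       # 95% bounds under the iid hypothesis
n_outside = sum(1 for v in r[1:] if abs(v) > bound)   # roughly 5% of the 20 lags expected
```

For an iid series, individual lags occasionally exceed the bounds by chance, which is why the joint portmanteau tests below are preferred to lag-by-lag inspection.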
We can also test the joint hypothesis that all $m$ of the autocorrelation coefficients are simultaneously equal to zero. The null and alternative hypotheses are:

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_m = 0 \quad \text{vs.} \quad H_1: \rho_k \ne 0 \ \text{for some } k \in \{1, \ldots, m\} \qquad (1.11)$$

The most popular test for (1.11) is the [4] portmanteau test, which admits the following form

$$Q_{BP} = n \sum_{k=1}^{m} \hat{\rho}_k^2 \qquad (1.12)$$

where $m$ is the so-called lag truncation number [5] and is (typically) assumed to be fixed [6]. Under the assumption that $\{x_t\}$ is an iid sequence, $Q_{BP}$ is asymptotically a chi-squared random variable with $m$ degrees of freedom. [7] modified the $Q_{BP}$ statistic to increase the power of the test in finite samples as

$$Q_{LB} = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2}{n-k} \qquad (1.13)$$

Several values of $m$ are often used, and simulation studies suggest that the choice $m \approx \ln(n)$ provides better power performance [8].
Another portmanteau test, formulated by [9], can be used as a further test of the iid hypothesis, since if the data are iid, then the squared data are also iid. It is based on the same statistic used for the Ljung-Box test,

$$Q_{ML} = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2\left(x^2\right)}{n-k} \qquad (1.14)$$

where the sample autocorrelations of the data are replaced by the sample autocorrelations of the squared data, $\hat{\rho}_k\left(x^2\right)$.
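Both portmanteau statistics can be sketched in a few lines. The helper below is illustrative (the names are not from the paper); the McLeod-Li variant is simply the Ljung-Box statistic (1.13) applied to the squared data, and the p-value comes from the chi-squared($m$) limit via SciPy.

```python
import math
import random
from scipy.stats import chi2

def sample_acf(x, max_lag):
    """Sample autocorrelations (1.8)."""
    n = len(x)
    xbar = sum(x) / n
    denom = sum((v - xbar) ** 2 for v in x)
    return [sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def ljung_box(x, m):
    """Ljung-Box statistic (1.13) and its asymptotic chi-squared(m) p-value."""
    n = len(x)
    r = sample_acf(x, m)
    q = n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, m + 1))
    return q, float(chi2.sf(q, m))

random.seed(2)
e = [random.gauss(0.0, 1.0) for _ in range(400)]
m = round(math.log(len(e)))                      # m = ln(n), the choice suggested in the text
q_lb, p_lb = ljung_box(e, m)                     # Ljung-Box on the data
q_ml, p_ml = ljung_box([v * v for v in e], m)    # McLeod-Li: same statistic on squared data
```

Large values of either statistic (small p-values) reject the iid hypothesis (1.11).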
According to [6], the methodology for testing for white noise can be roughly divided into two categories: time domain tests and frequency domain tests. Other time domain tests include the turning point test, the difference-sign test and the rank test [1]. Another time domain test is to fit an autoregressive model to the data, choosing the order which minimizes the AICC statistic; a selected order equal to zero suggests that the data are white noise [1].
Let

$$f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \rho_k e^{-ik\omega}, \quad -\pi \le \omega \le \pi \qquad (1.15)$$

be the normalized spectral density of $\{X_t\}$. The normalized spectral density function for the linear Gaussian white noise process is

$$f(\omega) = \frac{1}{2\pi}, \quad -\pi \le \omega \le \pi \qquad (1.16)$$

The equivalent frequency domain expressions to $H_0$ and $H_1$ are

$$H_0: f(\omega) = \frac{1}{2\pi} \ \text{for all } \omega \in [-\pi, \pi] \quad \text{vs.} \quad H_1: f(\omega) \ne \frac{1}{2\pi} \ \text{for some } \omega \qquad (1.17)$$

In the frequency domain, [10] proposed test statistics based on the famous $U_p$ and $T_p$ processes [6], and a rigorous theoretical treatment of their limiting distributions was provided by [11]. Some contributions to the frequency domain tests can be found in [12] and [13], among others. This study will concentrate on the time domain approach only.
A stochastic process $\{X_t\}$ may have the covariance structure (1.6) even when it is not the linear Gaussian white noise process. Examples are found in the study of bilinear time series processes [14] [15]. Researchers are often confronted with the choice of the linear Gaussian white noise process for use in constructing time series models or generating other stationary processes in simulation experiments. The question now is, "How do we distinguish the linear Gaussian white noise process from other processes with a similar covariance structure?" Additional properties of the linear Gaussian white noise process are needed for its proper identification and characterization among processes with a similar covariance structure. Therefore, the ultimate aim of this study is the use of higher moments for the acceptability of the linear Gaussian white noise process. The first moment (mean) and second or higher moments (variance, covariances, skewness and kurtosis) of powers of the linear Gaussian white noise process are established in Section 2. The methodology is discussed in Section 3, the results are contained in Section 4, while Section 5 is the conclusion.
2. Mean, Variance and Covariances of Powers of the Linear Gaussian White Noise Process
2.1. Mean of Powers of the Linear Gaussian White Noise Process
Let $Y_t = e_t^d$, $d = 1, 2, 3, \ldots$, where $\{e_t\}$ is the linear Gaussian white noise process. The expected values of $e_t^d$ are needed for the effective determination of the variance and covariance structure of $Y_t$. Lemma 2.1 gives the required result.

Lemma 2.1: Let $\{e_t\}$ be a linear Gaussian white noise process with mean zero and variance $\sigma^2$ ($e_t$ follows iid $N\left(0, \sigma^2\right)$), then

$$E\left(e_t^d\right) = \begin{cases} 1 \cdot 3 \cdot 5 \cdots (d-1)\,\sigma^d, & d \text{ even} \\ 0, & d \text{ odd} \end{cases} \qquad (2.1)$$

where [16]

$$f(e) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{e^2}{2\sigma^2}\right), \quad -\infty < e < \infty \qquad (2.2)$$

is the probability density function of $e_t$.
Proof:

Let $Y_t = e_t^d$, then

$$E(Y_t) = E\left(e_t^d\right) = \int_{-\infty}^{\infty} e^d f(e)\, de = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^d \exp\left(-\frac{e^2}{2\sigma^2}\right) de \qquad (2.3)$$

Note that

$$\int_{-\infty}^{\infty} e^d \exp\left(-\frac{e^2}{2\sigma^2}\right) de = \int_{-\infty}^{0} e^d \exp\left(-\frac{e^2}{2\sigma^2}\right) de + \int_{0}^{\infty} e^d \exp\left(-\frac{e^2}{2\sigma^2}\right) de \qquad (2.4)$$

and that the integrand is an even function of $e$ when $d$ is even and an odd function when $d$ is odd, so that

$$E\left(e_t^d\right) = \begin{cases} \dfrac{2}{\sigma\sqrt{2\pi}} \displaystyle\int_{0}^{\infty} e^d \exp\left(-\dfrac{e^2}{2\sigma^2}\right) de, & d \text{ even} \\ 0, & d \text{ odd} \end{cases} \qquad (2.5)$$

1) Case I: $d$ even

Equation (2.5) reduces to

$$E\left(e_t^d\right) = \frac{2}{\sigma\sqrt{2\pi}} \int_{0}^{\infty} e^d \exp\left(-\frac{e^2}{2\sigma^2}\right) de \qquad (2.6)$$

Let

$$u = \frac{e^2}{2\sigma^2}, \quad \text{so that} \quad e = \sigma\sqrt{2u} \quad \text{and} \quad de = \frac{\sigma}{\sqrt{2u}}\, du \qquad (2.7)$$

$$E\left(e_t^d\right) = \frac{2^{d/2}\,\sigma^d}{\sqrt{\pi}} \int_{0}^{\infty} u^{(d-1)/2} e^{-u}\, du \qquad (2.8)$$

The integral in Equation (2.8) is a gamma function [17] and by definition

$$\Gamma(n) = \int_{0}^{\infty} u^{n-1} e^{-u}\, du \qquad (2.9)$$

$$\Gamma\!\left(\frac{d+1}{2}\right) = \frac{d-1}{2} \cdot \frac{d-3}{2} \cdots \frac{1}{2}\, \Gamma\!\left(\frac{1}{2}\right), \qquad \Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi} \qquad (2.10)$$

Thus

$$E\left(e_t^d\right) = \frac{2^{d/2}\,\sigma^d}{\sqrt{\pi}}\, \Gamma\!\left(\frac{d+1}{2}\right) = 1 \cdot 3 \cdot 5 \cdots (d-1)\, \sigma^d \qquad (2.11)$$

2) Case II: $d$ odd

The integrand in (2.3) is an odd function of $e$, so

$$E\left(e_t^d\right) = 0 \qquad (2.12)$$

Thus the lemma is proved.
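Lemma 2.1 can be checked numerically. The sketch below (illustrative names, not code from the paper) compares the closed form (2.1) with a Gauss-Hermite quadrature of the integral in (2.3); the substitution $e = \sigma\sqrt{2}\,x$ puts the integral into Gauss-Hermite form.

```python
import math
import numpy as np

def mean_power_formula(d, sigma=1.0):
    """Closed form (2.1): 1*3*5***(d-1) * sigma^d for even d, 0 for odd d."""
    if d % 2 == 1:
        return 0.0
    prod = 1
    for j in range(1, d, 2):
        prod *= j
    return prod * sigma ** d

def mean_power_quadrature(d, sigma=1.0, n_nodes=32):
    """E(e^d) via Gauss-Hermite quadrature of the integral in (2.3).

    With e = sigma*sqrt(2)*x, E(e^d) = (1/sqrt(pi)) * integral of
    (sigma*sqrt(2)*x)^d * exp(-x^2) dx, which hermgauss evaluates exactly
    for polynomial integrands of this degree.
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return float(np.sum(w * (sigma * math.sqrt(2.0) * x) ** d) / math.sqrt(math.pi))
```

For $d \le 6$ the quadrature is exact up to floating-point rounding, so the two functions agree to machine precision.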
2.2. Variances of Powers of the Linear Gaussian White Noise Process

Theorem 2.2: Let $\{e_t\}$ be a linear Gaussian white noise process with mean zero and variance $\sigma^2$ ($e_t$ follows iid $N\left(0, \sigma^2\right)$), then

$$\operatorname{Var}\left(e_t^d\right) = \begin{cases} \left[1 \cdot 3 \cdot 5 \cdots (2d-1) - \left(1 \cdot 3 \cdot 5 \cdots (d-1)\right)^2\right] \sigma^{2d}, & d \text{ even} \\ 1 \cdot 3 \cdot 5 \cdots (2d-1)\, \sigma^{2d}, & d \text{ odd} \end{cases} \qquad (2.13)$$

Proof:

Let $Y_t = e_t^d$, then the expected value of $Y_t$ is given by Equation (2.1).

Case I: $d$ even

Now $\operatorname{Var}(Y_t) = E\left(e_t^{2d}\right) - \left[E\left(e_t^d\right)\right]^2$. From Equation (2.1), since $2d$ is even,

$$E\left(e_t^{2d}\right) = 1 \cdot 3 \cdot 5 \cdots (2d-1)\, \sigma^{2d} \qquad (2.14)$$

and

$$\left[E\left(e_t^d\right)\right]^2 = \left(1 \cdot 3 \cdot 5 \cdots (d-1)\right)^2 \sigma^{2d} \qquad (2.15)$$

$$\operatorname{Var}\left(e_t^d\right) = \left[1 \cdot 3 \cdot 5 \cdots (2d-1) - \left(1 \cdot 3 \cdot 5 \cdots (d-1)\right)^2\right] \sigma^{2d} \qquad (2.16)$$

Case II: $d$ odd

From Equation (2.1), $E\left(e_t^d\right) = 0$ and

$$E\left(e_t^{2d}\right) = 1 \cdot 3 \cdot 5 \cdots (2d-1)\, \sigma^{2d} \qquad (2.17)$$

and

$$\operatorname{Var}\left(e_t^d\right) = 1 \cdot 3 \cdot 5 \cdots (2d-1)\, \sigma^{2d} \qquad (2.18)$$

Generally,

$$\operatorname{Var}\left(e_t^d\right) = E\left(e_t^{2d}\right) - \left[E\left(e_t^d\right)\right]^2 \qquad (2.19)$$
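Equations (2.1) and (2.19) generate the entries of a table of means and variances mechanically. The sketch below (illustrative helper names) evaluates the mean and variance of $e_t^d$ for $d = 1, \ldots, 6$ with $\sigma^2 = 1$, the setting used in Table 1.

```python
def dfact(k):
    """Product 1*3*5***k over odd integers down from k (returns 1 for k <= 0)."""
    p = 1
    while k > 1:
        p *= k
        k -= 2
    return p

def mean_power(d, sigma2=1.0):
    """E(e^d), Equation (2.1)."""
    return 0.0 if d % 2 else dfact(d - 1) * sigma2 ** (d / 2)

def var_power(d, sigma2=1.0):
    """Var(e^d) = E(e^{2d}) - [E(e^d)]^2, Equation (2.19)."""
    return dfact(2 * d - 1) * sigma2 ** d - mean_power(d, sigma2) ** 2

# Mean and variance of e^d for d = 1..6 with sigma^2 = 1
table1 = {d: (mean_power(d), var_power(d)) for d in range(1, 7)}
```

For example, $d = 4$ gives mean $3\sigma^4$ and variance $(105 - 9)\sigma^8 = 96\sigma^8$, and the variance grows roughly like $(2d-1)!!\,\sigma^{2d}$, which is the explosive growth visible in Figure 1.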
Table 1 summarizes the mean and variances of $e_t^d$, $d = 1, 2, \ldots, 6$. The standard deviation of $e_t^d$ is also included when $\sigma = 1$. A plot of the standard deviation of $e_t^d$ against $d$ for fixed $\sigma = 1$ is given in Figure 1. From Figure 1, we note that for fixed $\sigma$, an increase in $d$ leads to an exponential increase in the standard deviation.

The specific objective of this paper is to investigate whether powers of $e_t$ are also iid and to determine the distribution of $e_t^d$, especially for $d = 2$. The analytical proofs are provided in Section 2.3.
2.3. Covariances of Powers of the Linear Gaussian White Noise Process

Figure 1. Plot of standard deviation of $e_t^d$ against power (d) for fixed σ = 1.

Table 1. Mean, variance and standard deviation of $e_t^d$.

Theorem 2.3: If $\{e_t\}$ is a linear Gaussian white noise process, then higher powers of $e_t$ are also white noise processes (iid) but not normally distributed.

Proof:

Since the $e_t$ are iid and $Y_t = e_t^d$, we consider, for $k \ne 0$,

$$E\left(e_t^d\, e_{t+k}^d\right) = E\left(e_t^d\right) E\left(e_{t+k}^d\right)$$

However, for $k = 0$, $E\left(e_t^d\, e_{t+k}^d\right) = E\left(e_t^{2d}\right)$. Hence

$$\operatorname{Cov}\left(e_t^d, e_{t+k}^d\right) = \begin{cases} \operatorname{Var}\left(e_t^d\right), & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad (2.20)$$

It is clear from Equation (2.20) that when the $e_t$ are iid, the powers $e_t^d$ of $e_t$ are also iid. That is,

$$\rho_k\left(e_t^d\right) = \begin{cases} 1, & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad (2.21)$$
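A quick simulation is consistent with (2.20) and (2.21): the sample autocorrelations of $e_t^2$ and $e_t^3$ behave like those of white noise. This is a sketch with illustrative names, using the sample autocorrelation of Equation (1.8).

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations (1.8)."""
    n = len(x)
    xbar = sum(x) / n
    denom = sum((v - xbar) ** 2 for v in x)
    return [sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

random.seed(3)
e = [random.gauss(0.0, 1.0) for _ in range(2000)]
# Sample ACFs of e, e^2 and e^3: all should be near zero at nonzero lags
acfs = {d: sample_acf([v ** d for v in e], 10) for d in (1, 2, 3)}
```

The nonzero-lag autocorrelations of all three series stay within a few multiples of $1/\sqrt{n}$, as (1.9) predicts for iid data.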
The probability density function (p.d.f.) of $e_t^2$ can be obtained to enable a detailed study of the series. Theorem 2.4 gives the p.d.f. of $e_t^2$.

Theorem 2.4: If $\{e_t\}$ is a linear Gaussian white noise process, then $Y_t = e_t^2$ has the p.d.f.

$$f_Y(y) = \frac{1}{\sigma\sqrt{2\pi y}} \exp\left(-\frac{y}{2\sigma^2}\right), \quad 0 < y < \infty \qquad (2.22)$$

Proof:

If $Y = e^2$ and $e \sim N\left(0, \sigma^2\right)$, the distribution function of $Y$ is, for $y \ge 0$,

$$F_Y(y) = P(Y \le y) = P\left(-\sqrt{y} \le e \le \sqrt{y}\right) = 2\int_{0}^{\sqrt{y}} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{e^2}{2\sigma^2}\right) de$$

Let $e = \sqrt{w}$, then since $de = \frac{1}{2\sqrt{w}}\, dw$, we have

$$F_Y(y) = \int_{0}^{y} \frac{1}{\sigma\sqrt{2\pi w}} \exp\left(-\frac{w}{2\sigma^2}\right) dw$$

Of course $F_Y(y) = 0$ for $y < 0$. The p.d.f. of $Y$ is $f_Y(y) = F_Y'(y)$, and by one form of the fundamental theorem of calculus [17],

$$f_Y(y) = \frac{1}{\sigma\sqrt{2\pi y}} \exp\left(-\frac{y}{2\sigma^2}\right), \quad 0 < y < \infty$$

Note that the p.d.f. of $e_t^2$ is the p.d.f. of a gamma distribution with parameters $\alpha = 1/2$ and $\beta = 2\sigma^2$. That is, $e_t^2 \sim \text{Gamma}\left(\tfrac{1}{2}, 2\sigma^2\right)$.
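The identification of (2.22) as a Gamma$(1/2, 2\sigma^2)$ density is easy to confirm pointwise. The sketch below (illustrative names) compares (2.22) against the gamma density written out with `math.gamma`.

```python
import math

def pdf_e_squared(y, sigma2=1.0):
    """Equation (2.22): density of Y = e^2 when e ~ N(0, sigma^2)."""
    return math.exp(-y / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2 * y)

def gamma_pdf(y, alpha, beta):
    """Gamma density with shape alpha and scale beta."""
    return y ** (alpha - 1.0) * math.exp(-y / beta) / (math.gamma(alpha) * beta ** alpha)
```

With $\alpha = 1/2$ and $\beta = 2\sigma^2$, the two expressions coincide for every $y > 0$, since $\Gamma(1/2) = \sqrt{\pi}$ turns the gamma normalizing constant into $\sqrt{2\pi\sigma^2}$.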
However, for a more detailed study of the behavior of the linear Gaussian white noise process, the coefficients of symmetry and kurtosis for powers of the process are provided in Section 2.4.
2.4. Coefficient of Symmetry and Kurtosis for Powers of the Linear Gaussian White Noise Process

Non-normality of higher powers of $e_t$ ($d \ge 2$) can also be confirmed by the coefficients of symmetry and kurtosis defined by

$$\beta_1(d) = \frac{\mu_3(d)}{\left[\mu_2(d)\right]^{3/2}} \qquad (2.23)$$

$$\beta_2(d) = \frac{\mu_4(d)}{\left[\mu_2(d)\right]^{2}} \qquad (2.24)$$

where

$$\mu_2(d) = E\left[\left(e_t^d - E\left(e_t^d\right)\right)^2\right] \qquad (2.25)$$

$$\mu_3(d) = E\left[\left(e_t^d - E\left(e_t^d\right)\right)^3\right] \qquad (2.26)$$

and

$$\mu_4(d) = E\left[\left(e_t^d - E\left(e_t^d\right)\right)^4\right] \qquad (2.27)$$

Note that

$$\mu_3(d) = E\left(e_t^{3d}\right) - 3E\left(e_t^{2d}\right)E\left(e_t^{d}\right) + 2\left[E\left(e_t^{d}\right)\right]^3 \qquad (2.28)$$

$$\mu_4(d) = E\left(e_t^{4d}\right) - 4E\left(e_t^{3d}\right)E\left(e_t^{d}\right) + 6E\left(e_t^{2d}\right)\left[E\left(e_t^{d}\right)\right]^2 - 3\left[E\left(e_t^{d}\right)\right]^4 \qquad (2.29)$$

The kurtosis for $d = 1, 2, \ldots, 6$ is given in Table 2. A plot of the kurtosis coefficient against $d$ is given in Figure 2. From Figure 2, we note that an increase in $d$ leads to an exponential increase in the kurtosis.
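Using the raw moments from (2.1), the standard central-moment expansions (in terms of $E\left(e_t^{jd}\right)$) give the skewness (2.23) and kurtosis (2.24) of any power in closed form. The sketch below (illustrative names) recovers $\beta_1(1) = 0$ and $\beta_2(1) = 3$ for the Gaussian case, and $\beta_2(2) = 15$ for the squared process.

```python
def dfact(k):
    """Product 1*3*5***k over odd integers down from k (returns 1 for k <= 0)."""
    p = 1
    while k > 1:
        p *= k
        k -= 2
    return p

def raw_moment(j, d, sigma2=1.0):
    """E(e^{j*d}) from Equation (2.1)."""
    p = j * d
    return 0.0 if p % 2 else dfact(p - 1) * sigma2 ** (p / 2)

def skew_kurt(d, sigma2=1.0):
    """beta1(d) of (2.23) and beta2(d) of (2.24) via central-moment expansions."""
    m1, m2, m3, m4 = (raw_moment(j, d, sigma2) for j in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2
```

For $d = 2$ this reproduces the skewness $\sqrt{8} \approx 2.83$ and kurtosis 15 of the scaled chi-squared distribution with one degree of freedom, far from the Gaussian benchmark of (0, 3).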
Figure 2. Plot of kurtosis coefficient against power of the linear Gaussian white noise process.
Table 2. Coefficient of symmetry and kurtosis for $e_t^d$.
3. Methodology
3.1. Checking for Normality
If the noise process is Gaussian (that is, if all of its joint distributions are normal), then stronger conclusions can be drawn when a model is fitted to the data. We have shown that all higher powers of the linear Gaussian process are non-normal. The only reasonable test is the one that enables us to check whether the observations are from an iid normal sequence. The Jarque-Bera (JB) test [18] [19] [20] for normality can be used. The JB test is based on the fact that the normal distribution (with any mean or variance) has a skewness coefficient of zero and a kurtosis coefficient of three. We can test whether these two conditions hold against a suitable alternative, and the JB test statistic is

$$JB = n\left[\frac{S^2}{6} + \frac{(K-3)^2}{24}\right] \qquad (3.1)$$

where

$$S = \frac{\frac{1}{n}\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^3}{\left[\frac{1}{n}\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^2\right]^{3/2}} \qquad (3.2)$$

$$K = \frac{\frac{1}{n}\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^4}{\left[\frac{1}{n}\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^2\right]^{2}} \qquad (3.3)$$

$n$ is the sample size while $S$ and $K$ are the sample skewness and kurtosis coefficients. The asymptotic null distribution of JB is $\chi^2$ with 2 degrees of freedom.
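The JB statistic (3.1)-(3.3) is straightforward to implement. Below is a minimal sketch (plain Python, illustrative names) applied to a simulated normal series and to its square which, by Theorem 2.3, is iid but not normal and should therefore fail the test.

```python
import random

def jarque_bera(x):
    """JB statistic (3.1) built from the sample skewness (3.2) and kurtosis (3.3)."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n
    m3 = sum((v - xbar) ** 3 for v in x) / n
    m4 = sum((v - xbar) ** 4 for v in x) / n
    s = m3 / m2 ** 1.5          # sample skewness, Equation (3.2)
    k = m4 / m2 ** 2            # sample kurtosis, Equation (3.3)
    return n * (s ** 2 / 6.0 + (k - 3.0) ** 2 / 24.0)

random.seed(4)
e = [random.gauss(0.0, 1.0) for _ in range(3000)]
jb_normal = jarque_bera(e)                      # should be small: the data are normal
jb_squared = jarque_bera([v * v for v in e])    # e^2 is iid but far from normal
```

Comparing each statistic with the $\chi^2_2$ critical value (5.99 at the 5% level) accepts normality for the raw series and rejects it emphatically for the squared series.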
3.2. White Noise Testing
We have shown that the powers $e_t^d$ of an iid sequence $\{e_t\}$ are also iid, so their sample autocorrelations are those of a white noise series. We will adopt the Ljung-Box test by replacing the sample autocorrelations of the data $\{e_t\}$ with those of $\{e_t^d\}$ and use the statistic

$$Q(d) = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2\left(e^d\right)}{n-k} \qquad (3.4)$$

The hypothesis of iid data is then rejected at level $\alpha$ if the observed $Q(d)$ is larger than the $1-\alpha$ quantile of the $\chi^2_m$ distribution.
3.3. Determining the Optimal Value of d
Figure 1 suggests two growth models: 1) the quadratic growth model and 2) the exponential growth model. We are going to use the behavior of the variance and kurtosis coefficient to determine the optimal value of $d$. The optimal value is that value of $d$ that gives a perfect fit for either the quadratic or the exponential growth curve. Using the standard deviation for $d = 1, 2, \ldots, 6$, the exponential growth curve performs better than the quadratic growth curve. The quadratic growth curve fitted negative values to positive values at the different data points, while the exponential curve fitted only positive values. However, the residual of the resulting exponential curve is very large, as measured by the following accuracy measures [21].

Mean Absolute Error (MAE)

$$\text{MAE} = \frac{1}{m}\sum_{t=1}^{m}\left|y_t - \hat{y}_t\right| \qquad (3.5)$$

Mean Absolute Percentage Error (MAPE)

$$\text{MAPE} = \frac{100}{m}\sum_{t=1}^{m}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \qquad (3.6)$$

Mean Squared Error (MSE)

$$\text{MSE} = \frac{1}{m}\sum_{t=1}^{m}\left(y_t - \hat{y}_t\right)^2 \qquad (3.7)$$

where $m$ is the value of $d$ used in the trend analysis and

$$\hat{e}_t = y_t - \hat{y}_t \qquad (3.8)$$

is the residual, with $y_t$ the observed value and $\hat{y}_t$ the fitted value of the growth curve at point $t$.

Table 3 gives the accuracy measures for the trend analysis of the standard deviation of $e_t^d$ when $\sigma^2 = 1$, while Table 4 gives detailed results for optimality.
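The accuracy measures (3.5)-(3.7) can be sketched as a single helper (illustrative names; the inputs are the observed values and the fitted growth-curve values):

```python
def accuracy_measures(y, yhat):
    """MAE (3.5), MAPE (3.6) and MSE (3.7) for fitted values yhat against observed y."""
    m = len(y)
    errors = [a - b for a, b in zip(y, yhat)]              # residuals (3.8)
    mae = sum(abs(e) for e in errors) / m
    mape = 100.0 * sum(abs(e / a) for a, e in zip(y, errors)) / m
    mse = sum(e * e for e in errors) / m
    return mae, mape, mse
```

A perfect fit returns (0, 0, 0), which is the "zero residual" criterion used below to select the optimal power.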
When $d \le 3$, the quadratic growth curve performs better than the exponential curve, with minimal residual. Both curves fitted positive values at the different data points. We also observed from Table 3 that with $d \le 3$, the quadratic

Table 3. Summary of accuracy measures for the exponential and quadratic curves using the standard deviation of $e_t^d$ for $d = 1, 2, \ldots, 6$.

Table 4. Fitting exponential and quadratic curves to the standard deviation of powers of the linear Gaussian white noise process when $d \le 3$ and $\sigma = 1$.

*Exponential and quadratic trend analysis is not possible for $d = 1$ or $d = 2$.

growth curve performs better than the exponential growth curve. The resulting quadratic curve yielded zero residual. The implication of the result is that we obtain a perfect fit for the data points when $d \le 3$ for the quadratic curve only. Hence, the optimal value of $d$ is 3 when we use the standard deviation curve.
Figure 2 also suggests two growth models: 1) the quadratic growth model and 2) the exponential growth model. Using the kurtosis coefficient for $d = 1, 2, \ldots, 6$, the exponential growth curve performs better than the quadratic growth curve. The quadratic growth curve fitted negative values to positive values at the different data points, while the exponential curve fitted only positive values.

When $d \le 3$, the quadratic growth curve performs better than the exponential growth curve. The resulting quadratic curve yielded zero residual, as did that of the standard deviation curve. The implication of these results is that we obtain a perfect fit for the data points when $d \le 3$ for the quadratic curve only. Hence, the optimal value of $d$ is 3. Therefore, we recommend that, in order to stop the variance from exploding, the data points should not be raised to a power greater than three.
3.4. On the Use of Higher Moment for the Acceptability of the Linear Gaussian White Noise Process
We have shown that if $\{e_t\}$ is a linear Gaussian white noise process, $e_t^d$ is also iid but not normally distributed. Using the variances and kurtosis of $e_t^d$, we were able to establish that the optimal value of $d$ is three. The variances and kurtosis of $e_t^d$ are given in Table 5 and Table 6, respectively. It is also clear from Equation (2.24) that the kurtosis itself is a function of variances. We therefore insist that for a stochastic process to be accepted as a linear Gaussian white noise process, the following variances must hold:

$$\operatorname{Var}\left(e_t\right) = \sigma^2 \qquad (3.9)$$

$$\operatorname{Var}\left(e_t^2\right) = 2\sigma^4 \qquad (3.10)$$

and

$$\operatorname{Var}\left(e_t^3\right) = 15\sigma^6 \qquad (3.11)$$

Table 5. Summary of accuracy measures for the exponential and quadratic curves using the kurtosis coefficient of $e_t^d$ for $d = 1, 2, \ldots, 6$.

*Exponential and quadratic trend analysis is not possible for $d = 1$ or $d = 2$.

Table 6. Fitting exponential and quadratic curves to the kurtosis coefficient of powers of the linear Gaussian white noise process when $d \le 3$ and $\sigma = 1$.

In view of these, we suggest that the following two null hypotheses be tested before a stochastic process is accepted as a linear Gaussian white noise process:

$$H_0: \operatorname{Var}\left(e_t^2\right) = 2\sigma^4 \quad \text{vs.} \quad H_1: \operatorname{Var}\left(e_t^2\right) \ne 2\sigma^4 \qquad (3.12)$$

and

$$H_0: \operatorname{Var}\left(e_t^3\right) = 15\sigma^6 \quad \text{vs.} \quad H_1: \operatorname{Var}\left(e_t^3\right) \ne 15\sigma^6 \qquad (3.13)$$

Then, the chi-square test statistic [22] for testing (3.12) is

$$\chi^2 = \frac{(n-1)\,S_2^2}{2\sigma^4} \qquad (3.14)$$

while that for (3.13) is

$$\chi^2 = \frac{(n-1)\,S_3^2}{15\sigma^6} \qquad (3.15)$$

where $S_2^2$ and $S_3^2$ are the estimated variances of the second and third powers of the stochastic process, $\sigma^2$ is the null value for the true variance of the stochastic process and $n$ is the number of observations. The null hypothesis is rejected at level $\alpha$ if the observed value of $\chi^2$ is larger than the $1-\alpha$ quantile of the chi-square distribution with $n-1$ degrees of freedom.
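The statistics (3.14) and (3.15) are one-sample chi-square variance statistics applied to the second and third powers. A minimal sketch follows (illustrative names, with simulated data standing in for the paper's Minitab series):

```python
import random

def chi_square_variance_stat(x, null_variance):
    """Statistics (3.14)/(3.15): (n-1) * S^2 / (hypothesized variance)."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)   # sample variance S^2
    return (n - 1) * s2 / null_variance

random.seed(5)
sigma2 = 1.0
e = [random.gauss(0.0, 1.0) for _ in range(200)]
# H0: Var(e^2) = 2*sigma^4 (3.12) and H0: Var(e^3) = 15*sigma^6 (3.13)
stat2 = chi_square_variance_stat([v ** 2 for v in e], 2.0 * sigma2 ** 2)
stat3 = chi_square_variance_stat([v ** 3 for v in e], 15.0 * sigma2 ** 3)
```

Each statistic is then compared with the quantiles of the chi-square distribution with $n-1$ degrees of freedom, as described in the text.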
4. Results
For illustration, six (6) white noise series were simulated using Minitab 16 (see Appendix). The simulated series met the following conditions: 1) the simulated series $\{e_t\}$ are normal, and 2) powers of $e_t$ are shown to be iid but not normally distributed (see Table 7).

Table 7. Descriptive statistics and estimate of the test statistic for rejecting the null hypothesis of equality of the variance of higher moments for six simulated series, $\{e_t\}$, as linear Gaussian white noise processes.

The values of the chi-square test statistic for testing (3.12) and (3.13) are also shown in Table 7. We observed that the null hypothesis is rejected at level $\alpha = 5\%$ for two of the simulated series and is not rejected for the other four. The result clearly shows that testing the variance of higher moments of $e_t$ is a necessary condition for accepting the linear Gaussian white noise process.
5. Conclusion
We have been able to show that if the $e_t$ are iid, then all powers of $e_t$ are also iid but non-normal. Hence, we computed the kurtosis of some higher powers of $e_t$ and established that an increase in the power of $e_t$ leads to an exponential increase in the kurtosis. We recommend that stochastic processes (white noise processes) and processes with a similar covariance structure be tested for normality, for white noise, and for equality of the variances of their higher moments to the theoretical values of Table 1, with $d \le 3$.
Appendix
Table A1. Six simulated white noise series: $e_t$ data.