Can Choice of Reference Density Improve Power of M-Estimation Based Unit Root Tests?
1. Introduction
Following the seminal work of Dickey and Fuller [1], there has been immense interest in statistical tests of the unit root hypothesis in time series data. Stock [2] and Phillips and Xiao [3] have surveyed many such tests. Elliot, Rothenberg, and Stock [4] proposed a family of tests, including a modified Dickey-Fuller test, for Gaussian time series with an unknown mean or trend. Most of these tests are based on the OLS method assuming Gaussian errors. Monte Carlo evidence indicates that these tests do not have high statistical power when the errors show heavy-tailed behaviour.
Many scholars explored tests with higher power for non-Gaussian errors based on M-estimation. Cox and Llatas [5], Lucas [6], Rothenberg and Stock [7], Knight [8], Herce [9], Xiao [10], Thompson [11] and [12] and others have made significant contributions in the M-estimation domain.
They all assumed a known density function on the error process for the model to estimate the model parameters using the M-estimation procedure. Henceforth we shall call this assumed density function on the error process the “reference density”.
For example, Lucas [6] investigated the Student t distribution. Herce [9] used the absolute value function (double exponential density). Hansen [13] proposed a unit root test using a covariate approach; however, identifying the covariates may be difficult in practice. Hasan and Koenker [14] proposed rank-based tests. Thompson [11] [12] showed that for each rank-based test, there exists an M-estimation test with the same asymptotic power. Shin and So [15] used a nonparametric method to estimate the unknown error process, but their method is difficult to implement; moreover, for Cauchy errors, their test has very poor power [11]. Koenker and Xiao [16] applied quantile regression to analyse the unit root process. In his monograph, Choi [17] explained (page 97) that the LAD estimation-based test of Herce [9] has lower power than the quantile regression test. Using the approach proposed by Potscher and Prucha [18], Lima and Xiao [19] used a partially adaptive estimation method (PADF) to estimate the unknown error density. Hallin, Van Den Akker, and Werker [20] proposed a class of tests using the ranks of the samples. The rank-based test, however, requires an independently and identically distributed (IID) error process, which may not be feasible in many practical applications.
Johnson SU (JHSU) and Pearson Type IV (PIV) distributions have been used to model financial time series data and in the risk management literature. Nagahara [21] used PIV density to model the stock return distribution. Bhattacharyya and Madhav [22] used the Johnson SU distribution and other methods to estimate the VaR for leptokurtic equity index returns. Bhattacharyya, Chaudhary, and Yadav [23] used PIV distribution to obtain the conditional VaR. Bhattacharyya, Misra, and Kodase [24] used PIV distribution to obtain the conditional MaxVaR.
The Johnson SU distribution covers a wide range of shapes depending on its parameter values. It may also be a good approximation to the Pearson Type IV distribution [25]. The main advantage of this distribution is its ability to capture the high kurtosis and skewness commonly observed in financial and economic time series data.
The main objective of this paper is to explore whether the use of the Johnson SU distribution (see Section 2) as a reference density (hereafter we call this test the JHSU test) could improve power. This is, perhaps, the first use of the Johnson SU distribution in the M-estimation literature. Using some results in Lucas [6], Xiao [10], and Thompson [11] [12], we obtain our test statistic and its asymptotic properties.
In our simulation studies, we generate data assuming various time series models, such as AR(1) and MA(1), with error processes drawn from standard normal, Student t, lognormal, and chi-square distributions. Note that, in practice, the data generating process, and therefore the error process, will be unknown. We use the M-estimation method to estimate the parameters of a selected model from the data thus generated. In the M-estimation method, unlike OLS estimation, we need to assume a probability density for the error process; this assumed density is the "reference density". We explore Johnson SU and Pearson Type IV as the reference densities, especially to investigate if there is any improvement in the local power. We compare our choices with the ADF and PADF tests and with a test that uses the Student t distribution with 3 degrees of freedom as the reference density.
From the Monte Carlo simulations, we observe that the JHSU test, in finite samples, is as efficient as the augmented Dickey-Fuller (ADF) test for normal errors, and more powerful than many existing traditional tests for non-normal errors. The JHSU test is surprisingly easy to implement.
In Section 2, we sketch a brief outline of the Johnson SU distribution. Section 3 presents the time series model that we study in this paper and derives the test statistic and its asymptotic distribution. The method of estimation of the parameters, the calculation of the test statistic, and the critical value are explained in Section 4. Section 5 presents Monte Carlo simulation results. Section 6 describes the empirical studies, and we conclude in Section 7.
2. Johnson SU Distribution
Johnson [25] proposed three transformations, $f$, of the following form:

$$Z = \gamma + \delta\, f\!\left(\frac{X - \xi}{\lambda}\right),$$

where $Z$ is a standard normal variable and $X$ is a continuous random variable whose distribution is unknown, with shape parameters $\gamma$ and $\delta$, scale parameter $\lambda$ and location parameter $\xi$; $\delta > 0$ and $\lambda > 0$. $f$ may be chosen as $\log(y)$, $\sinh^{-1}(y)$ or $\log\!\left(y/(1-y)\right)$.
For the Johnson SU distribution, $f$ is chosen as $\sinh^{-1}$, so that

$$Z = \gamma + \delta \sinh^{-1}\!\left(\frac{X - \xi}{\lambda}\right).$$

The parameters $\gamma$ and $\delta$ control skewness and kurtosis. The distribution is positively (negatively) skewed if $\gamma$ is negative (positive). Increasing $\delta$, holding $\gamma$ constant, reduces the kurtosis. The Johnson SU distribution can capture a wide range of shapes depending on its parameter values.
The probability density function of the Johnson SU distribution is given by

$$g(x) = \frac{\delta}{\lambda\sqrt{2\pi}} \frac{1}{\sqrt{1+z^2}} \exp\!\left\{-\frac{1}{2}\left[\gamma + \delta \sinh^{-1}(z)\right]^2\right\}, \qquad z = \frac{x - \xi}{\lambda}. \quad (1)$$

The mean is $E(X) = \xi - \lambda \omega^{1/2} \sinh(\Omega)$, where $\omega = \exp(\delta^{-2})$ and $\Omega = \gamma/\delta$. When the mean is zero, $\xi = \lambda \omega^{1/2} \sinh(\Omega)$.
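These expressions can be checked numerically. The following sketch (in Python, although the paper's own computations were carried out in R; the parameter values are arbitrary illustrations) compares the density in Equation (1) and the mean formula against `scipy.stats.johnsonsu`, whose shape parameters `a` and `b` correspond to $\gamma$ and $\delta$:

```python
import numpy as np
from scipy import stats

# arbitrary illustrative parameter values (gamma, delta, lambda, xi)
g, d, lam, xi = 0.8, 1.5, 2.0, 0.5

def johnson_su_pdf(x, g, d, lam, xi):
    # Equation (1): density of the Johnson SU distribution
    z = (x - xi) / lam
    return (d / (lam * np.sqrt(2 * np.pi)) / np.sqrt(1 + z**2)
            * np.exp(-0.5 * (g + d * np.arcsinh(z))**2))

x = np.linspace(-6.0, 6.0, 241)
pdf_manual = johnson_su_pdf(x, g, d, lam, xi)
pdf_scipy = stats.johnsonsu(g, d, loc=xi, scale=lam).pdf(x)

# mean formula: E(X) = xi - lambda * sqrt(omega) * sinh(Omega)
omega, Omega = np.exp(d**-2), g / d
mean_formula = xi - lam * np.sqrt(omega) * np.sinh(Omega)
mean_scipy = stats.johnsonsu(g, d, loc=xi, scale=lam).mean()
```

Setting $\xi = \lambda\omega^{1/2}\sinh(\Omega)$ in this parameterisation centres the distribution at zero, which is the restriction imposed later when estimating the error density.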
3. The Model and Asymptotic Distribution of the Test Statistic
We assume the following data generating process (DGP) for our analysis:

$$y_t = \rho y_{t-1} + u_t, \qquad a(L) u_t = \varepsilon_t, \qquad t = 1, \ldots, N.$$

The errors $\varepsilon_t$ are independently and identically distributed with expectation zero and finite variance $\sigma^2$, with $\sigma^2 > 0$. The term $a(L)$ is the lag polynomial $a(L) = 1 - a_1 L - \cdots - a_p L^p$. All the roots of $a(L)$ lie outside the unit circle. First, we re-write this model in the augmented Dickey-Fuller format, which is defined below:

$$\Delta y_t = \theta y_{t-1} + \sum_{j=1}^{p} c_j \Delta y_{t-j} + \varepsilon_t,$$

where $\theta = a(1)(\rho - 1)$ and the coefficients $c_j$ are functions of $\rho$ and the $a_j$, so that $\theta = 0$ if and only if $\rho = 1$. For the DGP, the null and alternative hypotheses are

$$H_0: \rho = 1 \ (\text{i.e., } \theta = 0) \quad \text{against} \quad H_1: \rho < 1 \ (\text{i.e., } \theta < 0).$$
Let $\hat{\theta}$ be the M-estimator of $\theta$ using Johnson SU as a reference density. Then $\hat{\theta}$ will minimize the following objective function:

$$\sum_{t} \rho(\varepsilon_t),$$

where $\rho = -\log g$ (g = Johnson SU density function). For the Johnson SU density, $\rho$ is given below:

$$\rho(u) = \frac{1}{2}\left[\gamma + \delta \sinh^{-1}(z)\right]^2 + \frac{1}{2}\log(1+z^2) + \log \lambda - \log \delta + \frac{1}{2}\log(2\pi), \quad (2)$$

where $z$, $\omega$ and $\Omega$ are given by

$$z = \frac{u - \xi}{\lambda}, \quad (3)$$

$$\omega = \exp(\delta^{-2}), \quad (4)$$

and

$$\Omega = \gamma/\delta. \quad (5)$$

Under the zero-mean restriction on the errors, $\xi = \lambda \omega^{1/2} \sinh(\Omega)$.
Assumption 1. The function $\rho$ is continuously differentiable and its second and higher order derivatives are bounded.

We denote the first and the second derivatives of $\rho$ by $\psi$ and $\psi'$ respectively. Further, we define the nuisance parameters $\nu = E[\psi'(\varepsilon_t)]$ and $\sigma_\psi^2 = E[\psi(\varepsilon_t)^2]$.
Following Thompson (2004), an approximate estimate of $\theta$ is

$$\hat{\theta} \approx \left(\hat{\nu}\, y_{-1}' P\, y_{-1}\right)^{-1} y_{-1}' P\, \hat{\Psi}, \quad (6)$$

where $y_{-1} = (y_0, \ldots, y_{N-1})'$ is the vector of lagged observations and $X$ is the matrix with $t$-th row equal to $x_t'$. When a time trend is not present (i.e., the drift-only model), $x_t$ collects an intercept and the lagged differences $\Delta y_{t-1}, \ldots, \Delta y_{t-p}$. $\hat{\beta}$ is the parameter vector that minimizes the objective function $\sum_t \rho(\Delta y_t - x_t' \beta)$. $\hat{\nu}$ is an estimate of $\nu = E[\psi'(\varepsilon_t)]$. $\hat{\Psi}$ is the $N$-dimensional column vector with components $\psi(\hat{\varepsilon}_t)$. $P$ is the projection matrix defined by $P = I - X(X'X)^{-1}X'$, where $I$ is the identity matrix. For Johnson SU as a reference density, $\psi$ is defined below:

$$\psi(u) = \frac{1}{\lambda}\left[\frac{z}{1+z^2} + \frac{\delta\left(\gamma + \delta \sinh^{-1}(z)\right)}{\sqrt{1+z^2}}\right], \quad (7)$$

where $z$, $\omega$ and $\Omega$ are given in Equations (3)-(5), respectively.

Following Thompson [26], we can remove $\hat{\nu}$ from Equation (6), since it does not affect the asymptotic power but, in small samples, can affect the size of the test because of poor estimation of $\nu$. After removing $\hat{\nu}$, the test statistic $S$ in the t-ratio format is

$$S = \frac{y_{-1}' P\, \hat{\Psi}}{\hat{\sigma}_\psi \left(y_{-1}' P\, y_{-1}\right)^{1/2}}.$$

We reject the null for small values of $S$. Our next task is to find the asymptotic distribution of $S$.
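As a numerical sanity check on the Johnson SU score, the following Python sketch (an illustration only; the parameter values are arbitrary) verifies that the closed-form first derivative of the negative log Johnson SU density matches a finite-difference derivative:

```python
import numpy as np
from scipy import stats

# arbitrary illustrative parameter values (gamma, delta, lambda, xi)
g, d, lam, xi = 0.8, 1.5, 2.0, 0.5

def neg_log_density(u):
    # rho(u) = -log g(u) for the Johnson SU density
    return -stats.johnsonsu(g, d, loc=xi, scale=lam).logpdf(u)

def psi(u):
    # closed-form first derivative of -log g(u)
    z = (u - xi) / lam
    return (z / (1 + z**2)
            + d * (g + d * np.arcsinh(z)) / np.sqrt(1 + z**2)) / lam

u = np.linspace(-4.0, 4.0, 81)
h = 1e-6
numeric = (neg_log_density(u + h) - neg_log_density(u - h)) / (2 * h)
analytic = psi(u)
```

The central difference agrees with the analytic score to high accuracy over the whole grid.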
In this paper, we restrict the alternative hypothesis to the AR coefficient $\rho$ being very close to 1. When making a limiting argument, we set $\rho = 1 + C/N$, where $C$ is a constant and $N$ is the sample size. So, the parameter space is a shrinking neighbourhood of zero (see Chan and Wei [27] and Phillips [28]). In the presence of a unit root, $C$ is, obviously, equal to zero.
Assumption 2.
Asymptotics of the Test Statistic

The asymptotic distribution of the test statistic is represented in terms of functionals of Brownian motion. Let $W(r)$ be a standard Brownian motion defined on $[0,1]$ and let $J_C(r)$ be a related diffusion process that satisfies the stochastic differential equation

$$dJ_C(r) = C J_C(r)\, dr + dW(r), \qquad J_C(0) = 0.$$

Let $J_C^d$ be another process defined, for the intercept-only model, by

$$J_C^d(r) = J_C(r) - \int_0^1 J_C(s)\, ds,$$

and, for the model with time trend, by the residual from the $L^2$ projection of $J_C$ on the space spanned by $1$ and $r$.

Thompson [12], under Assumptions 1 and 2, proved that $S$ converges weakly to the random variable $\Lambda$, which is defined below:

$$\Lambda = \frac{\int_0^1 J_C^d(r)\, dW_\psi(r)}{\left(\int_0^1 J_C^d(r)^2\, dr\right)^{1/2}} + C k_2 \left(\int_0^1 J_C^d(r)^2\, dr\right)^{1/2},$$

where $W_\psi$ is a standard Brownian motion whose correlation with $W$ is $k_1$.
From the above, it is clear that, while $k_1$ controls the null distribution (under the null, $C = 0$), the power is determined by both $k_1$ and $k_2$: the power of the test is $P(\Lambda < c_\alpha)$, where $c_\alpha$ is obtained from the null distribution and $\alpha$ is the size of the test. Thompson [12] argues that $k_2$ dominates the power function relative to $k_1$.

As the value of $k_2$ increases, the asymptotic distribution shifts to the left, because $C$ is negative when the alternative hypothesis is true. Since the alternative hypothesis is one-sided (left), the rejection zone is in the left tail of the distribution of the test statistic. Therefore, a left shift of the asymptotic distribution is the source of the power improvement.
Table 1 lists the functional forms of $k_1$ and $k_2$ for different reference densities.
4. Calculation of the Test Statistic and Critical Value
To compute the test statistic, we adopt the following steps.
1) Select the lag length $p$ using the MAIC criterion developed by Ng and Perron [29], setting the maximum lag at $\lfloor 12(N/100)^{1/4} \rfloor$.
Table 1. Expressions of $k_1$ and $k_2$ for different reference density functions.
2) Run the following regression:

$$\Delta \tilde{y}_t = \theta \tilde{y}_{t-1} + \sum_{j=1}^{p} c_j \Delta \tilde{y}_{t-j} + \varepsilon_t,$$

where $\tilde{y}_t$ is the de-trended series according to Elliot, Rothenberg and Stock [4].
3) Estimate the parameters of the Johnson SU density from the estimated residuals of the above regression by the maximum likelihood method, imposing the condition $\xi = \lambda \omega^{1/2} \sinh(\Omega)$ (since the mean of the true error process is zero), where $\omega$ and $\Omega$ are as in Section 2.
4) Find $\hat{\beta}$ by minimizing the objective function of Section 3 and define the residuals $\hat{\varepsilon}_t = \Delta \tilde{y}_t - x_t' \hat{\beta}$, where $x_t'$ is the corresponding row vector of regressors, because the errors are estimated under the null hypothesis (see Hasan and Koenker [14]).
5) The estimates of the nuisance parameters $\nu = E[\psi'(\varepsilon_t)]$ and $\sigma_\psi^2 = E[\psi(\varepsilon_t)^2]$ are calculated as

$$\hat{\nu} = \frac{1}{N} \sum_t \psi'(\hat{\varepsilon}_t), \qquad \hat{\sigma}_\psi^2 = \frac{1}{N} \sum_t \psi(\hat{\varepsilon}_t)^2,$$

i.e., the sample means of $\psi'(\hat{\varepsilon}_t)$ and $\psi(\hat{\varepsilon}_t)^2$, respectively.
6) Calculate the test statistic in the t-ratio format given in Section 3.
7) Compute the approximate critical value for model M using the polynomial whose coefficients are adapted from Thompson [12] (p. 368); the coefficients are reported in Table 2 for the two models, for ready reference.
8) Reject the null hypothesis for model M if the test statistic falls below the critical value at the chosen level of significance.
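Step 3 above can be sketched in Python (the paper's implementation is in R; the helper names and starting values below are illustrative assumptions, not the authors' code). The sketch fits the Johnson SU parameters to a residual series by maximum likelihood under the zero-mean restriction $\xi = \lambda\omega^{1/2}\sinh(\Omega)$:

```python
import numpy as np
from scipy import optimize, stats

def fit_johnson_su_zero_mean(resid):
    """MLE of (gamma, delta, lambda), with xi tied to the zero-mean condition."""
    def xi_of(g, d, lam):
        # zero-mean restriction: xi = lambda * sqrt(omega) * sinh(Omega)
        return lam * np.sqrt(np.exp(d**-2)) * np.sinh(g / d)

    def nll(par):
        g, log_d, log_lam = par
        d, lam = np.exp(log_d), np.exp(log_lam)   # enforce delta, lambda > 0
        xi = xi_of(g, d, lam)
        return -stats.johnsonsu(g, d, loc=xi, scale=lam).logpdf(resid).sum()

    res = optimize.minimize(nll, x0=np.zeros(3), method="Nelder-Mead")
    g, d, lam = res.x[0], np.exp(res.x[1]), np.exp(res.x[2])
    return g, d, lam, xi_of(g, d, lam)

# illustrative residuals drawn from a centred Johnson SU distribution
rng = np.random.default_rng(1)
resid = stats.johnsonsu(0.5, 2.0).rvs(size=400, random_state=rng)
resid -= resid.mean()

g_hat, d_hat, lam_hat, xi_hat = fit_johnson_su_zero_mean(resid)

# the implied mean of the fitted density is zero by construction
implied_mean = xi_hat - lam_hat * np.sqrt(np.exp(d_hat**-2)) * np.sinh(g_hat / d_hat)
```

Reparameterising $\delta$ and $\lambda$ on the log scale keeps the optimizer inside the admissible region without an explicit constrained solver (the paper itself uses R's constrOptim).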
5. Monte Carlo Evidence
In this section, using Monte Carlo simulation, we evaluate the small-sample performance of the tests for sample size 100.

*The suffix M is removed for notational simplicity. Source: Thompson [12] (p. 368).

For our simulations, we have considered two sets of values for the deterministic components (a drift-only specification and a specification with a linear time trend). For each such set, we have assumed three different error processes for $v_t$. Finally, we have assumed four different distributions for the errors $\varepsilon_t$. Thus, a total of 24 different models have been used in the simulations. On all these 24 models, five different tests, based on different reference densities, have been investigated. All the calculations have been performed in RStudio.
Data have been generated according to the model defined below:

$$y_t = \mu + \beta t + u_t, \qquad u_t = \rho u_{t-1} + v_t, \qquad \rho = 1 + C/N.$$

The two sets of values considered for $(\mu, \beta)$ correspond to the drift-only model and the model with a linear time trend. The three selected error processes for $v_t$ are:

1) IID: $v_t = \varepsilon_t$;

2) AR(1): $v_t = \phi v_{t-1} + \varepsilon_t$; and

3) MA(1): $v_t = \varepsilon_t + \vartheta \varepsilon_{t-1}$.

We set the initial condition $u_0 = 0$.

The error process $\varepsilon_t$ has been generated from the following four distributions.

1) Standard normal distribution.

2) Student t distribution.

3) Lognormal distribution with mean centred at zero.

4) Chi-square distribution with mean centred at zero.
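The data generating process above can be sketched as follows (a Python illustration, though the paper's simulations were run in R; the AR and MA coefficient values are arbitrary placeholders, since the paper's exact values are not reproduced here, and only the standard normal innovation case is shown):

```python
import numpy as np

def generate_dgp(N, C, error_type="iid", phi=0.5, theta=0.5,
                 mu=0.0, beta=0.0, rng=None):
    """Simulate y_t = mu + beta*t + u_t, u_t = rho*u_{t-1} + v_t, rho = 1 + C/N."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(N)   # standard normal innovations
    if error_type == "iid":
        v = eps
    elif error_type == "ar1":
        v = np.zeros(N)
        for t in range(1, N):
            v[t] = phi * v[t - 1] + eps[t]
    elif error_type == "ma1":
        v = eps + theta * np.concatenate([[0.0], eps[:-1]])
    else:
        raise ValueError(error_type)
    rho = 1.0 + C / N
    u = np.zeros(N)                # initial condition u_0 = 0
    for t in range(1, N):
        u[t] = rho * u[t - 1] + v[t]
    return mu + beta * np.arange(N) + u
```

With $C = 0$ and IID errors, the recursion collapses to a pure random walk driven by the innovations, which is the null model of the study.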
Five tests, with the following notations, have been used for all the 24 models described above.
1) ADF—Augmented Dickey-Fuller test.
2) T3—Student t distribution with 3 degrees of freedom as the reference density.
3) PADF—Partially adaptive estimation method proposed by Lima and Xiao [19].
4) PIV—Pearson Type IV distribution as reference density.
5) JHSU—Johnson SU distribution as reference density.
We have used the R package urca; its function ur.df has been used to perform the ADF test, with the lag length selected as in Section 4. The JHSU test has been performed according to the steps described in Section 4. To estimate the parameters of the Johnson SU density by the maximum likelihood method, we use the constrOptim function.
We have performed 1000 replications of each test for a sample size of 100. All the results are reported at a 5% significance level. The numbers in the tables below represent the proportion of the 1000 replications in which each test rejects the null hypothesis. We have also investigated the ERS (Elliot et al. [4]) test and compared it with ours; we found that the power of the JHSU test is better than that of the ERS test for asymmetric error processes. Hence, we have not reported the results of the ERS test.
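The rejection-ratio computation can be illustrated with a stripped-down example (Python; for brevity it uses the plain Dickey-Fuller t-ratio with an intercept and its approximate asymptotic 5% critical value of -2.86, not the JHSU statistic, and a reduced number of replications):

```python
import numpy as np

def df_tstat(y):
    """t-ratio on y_{t-1} in the regression dy_t = a + theta*y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def rejection_ratio(rho, N=100, reps=200, crit=-2.86, seed=0):
    """Fraction of replications in which the DF test rejects the unit root."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        eps = rng.standard_normal(N)
        y = np.zeros(N)
        for t in range(1, N):
            y[t] = rho * y[t - 1] + eps[t]
        if df_tstat(y) < crit:
            rejections += 1
    return rejections / reps

size_null = rejection_ratio(rho=1.0)  # near the nominal 5% under the null
power_alt = rejection_ratio(rho=0.5)  # close to 1 under a stationary alternative
```

The paper's tables report exactly this kind of ratio, with 1000 replications and the five reference-density-based statistics in place of the plain DF t-ratio.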
Tables 3-5 report the results for the intercept-only case with the error process $v_t$ as IID, AR(1), and MA(1), respectively. Tables 6-8 report the results when the time trend is included in the model and the error process $v_t$ is IID, AR(1), and MA(1), respectively. The first column in each table represents the assumed distribution
Table 3. Rejection ratios of the null hypothesis of different tests among 1000 replications with the drift-only model (i.e., $\beta = 0$), taking the IID error process, 5% significance level.

Table 4. Rejection ratios of the null hypothesis of different tests among 1000 replications with the drift-only model (i.e., $\beta = 0$), taking the AR(1) error process, 5% significance level.

Table 5. Rejection ratios of the null hypothesis of different tests among 1000 replications with the drift-only model (i.e., $\beta = 0$), taking the MA(1) error process, 5% significance level.

Table 6. Rejection ratios of the null hypothesis of different tests among 1000 replications with the time trend model (i.e., $\beta \neq 0$), taking the IID error process, 5% significance level.

Table 7. Rejection ratios of the null hypothesis of different tests among 1000 replications with the time trend model (i.e., $\beta \neq 0$), taking the AR(1) error process, 5% significance level.

Table 8. Rejection ratios of the null hypothesis of different tests among 1000 replications with the time trend model (i.e., $\beta \neq 0$), taking the MA(1) error process, 5% significance level.
of $\varepsilon_t$; the second gives the values of $C$; the third column shows the sample size; and the fourth to eighth columns report the rejection ratios when tests 1 - 5 (described above) are respectively applied to the DGP. The boldface numbers in each table indicate the highest rejection ratios. The R code of the simulation studies is available upon request.
The above results clearly suggest that the JHSU test has very good small-sample power. The JHSU test is as powerful as the ADF test when the innovation process is Gaussian and shows a substantial gain in power when the errors are non-Gaussian. The JHSU test also has higher power than the PIV, T3, and PADF tests for the lognormal and chi-square error distributions.
6. Empirical Evidence
We have considered two data sets for application. The first is the extended Nelson and Plosser (1982) data set and the second is the nominal monthly interest rate of India from January 2005 to March 2017. The extended Nelson and Plosser data are openly available as the NPEXT data set in the R package urca (Unit Root and Cointegration Tests for Time Series Data, rdrr.io). The second data set is collected from the International Monetary Fund International Financial Statistics (IMF-IFS) database (https://data.imf.org/?sk=4c514d48-b6ba-49ed-8ab9-52b0c1a0179b&sId=1409151240976).
First, we consider the case of the Nelson and Plosser data set. Many researchers have used the data of Nelson and Plosser [30] to investigate whether macroeconomic time series are random walks or stationary processes around a level or a trend. As their data set is considered a testing ground for new procedures, we also implement our proposed method on the Nelson and Plosser data. The lag length is obtained by the MAIC criterion, setting the maximum lag at $\lfloor 12(N/100)^{1/4} \rfloor$, where $N$ is the sample size.
Table 9 reports the Jarque-Bera test statistic for all the series of Nelson and Plosser data sets. Table 10 contains the unit root analysis. We considered three tests, viz., ADF, ERS, and JHSU. We have taken the time trend model for our analysis.
We observe from Table 10 that the JHSU test rejects the hypothesis of a unit root for "GNP per Capita" at the 1% level and for "Unemployment" at the 5% level. For "Real GNP" and "Unemployment", the ERS test rejects only at the 5% level, but for no series at the 1% level. For "GNP per Capita", the ADF test rejects at the 5% level.
The normality assumption is rejected at the 5% significance level for the "GNP per Capita" series. The JHSU test rejects the unit root hypothesis at the 1% level and the ADF test rejects it at the 5% significance level, while the ERS test is not able to reject the null hypothesis. This clearly shows the power improvement of the JHSU test.
From Table 10, we note that for the "Real GNP" series, ERS rejects the null but ADF cannot reject the null at the 5% significance level. Also, for the "Real GNP" series, the normality assumption is rejected (see Table 9) at the 5% significance level. The JHSU test cannot reject the null for the "Real GNP" series, which accords with the ADF result.
Table 9. Jarque-Bera statistic of Nelson Plosser data set.
Table 10. Unit root analysis of Nelson Plosser data set.
Note: * 5%; ** 1%.
Table 11. Descriptive statistics of interest rate of India.
Table 12. Unit root analysis of interest rate of India.
JHSU and ERS tests give the same result (reject the null at 5% significance level) when the normality assumption has not been rejected, e.g., for the “Unemployment” series.
In our second study, we consider the case of the nominal monthly interest rate of India from January 2005 to March 2017. Table 11 gives the descriptive statistics and Table 12 reports the unit root analysis.
Here, we use the drift-only model. In this study, the series has excess kurtosis (Table 11) and the normality assumption is rejected. The JHSU test rejects the unit root hypothesis at the 5% significance level (Table 12), but the other tests do not. Again, this shows that the JHSU test has higher power than the others.
7. Conclusions
In this paper, we have explored a unit root test based on the Johnson SU distribution as a reference density. A step-by-step method for computing the test statistic is detailed. Monte Carlo evidence shows a significant power improvement over the ADF test when the innovations are non-Gaussian. The choice of JHSU is much better than the other reference densities used in the literature. The JHSU test is very powerful for asymmetric error processes: it dominates the partially adaptive estimation method proposed by Lima and Xiao [19], and it also dominates the Pearson Type IV and Student t3 density-based tests when the errors follow asymmetric distributions. For symmetric errors, the JHSU test performs as well as most other traditional procedures.
We have also obtained very satisfactory results when the proposed test procedure is applied to real data sets. Therefore, the JHSU test can be a viable and much better option for practitioners and researchers.
Apart from the application of this test procedure, future studies can be carried out on whether one should use a deterministic time trend model or drift while testing stationarity using the JHSU test.
Acknowledgements
The authors wish to thank the editor and the referee for their valuable comments that helped improve the presentation of the paper.