Pseudodistance Methods Using Simultaneously Sample Observations and Nearest Neighbour Distance Observations for Continuous Multivariate Models
1. Introduction
For statistical inference methods for continuous multivariate models, we often assume that we have a random sample of size n of multivariate observations
which are independent and identically distributed as the d-dimensional random vector
with a d-dimensional density function
.
For the parametric set-up
with the vector
denoting the true vector of parameters, we would like to have statistical methods for estimating the vector
if the parametric model
can be assumed and inference methods to validate the assumption of the model
by means of various goodness-of-fit statistics. This leads to a composite null hypothesis and ideally we would like to use goodness-of-fit test statistics which follow a unique asymptotic distribution
,
assumed to be compact.
Multidimensional model testing often poses difficulties. The goodness-of-fit test statistics in use often have very complicated distributions, as is the case for statistics based on multivariate empirical characteristic functions, see Csörgö [1]. For the classical chi-square tests, the asymptotic distributions for simple and composite hypotheses are simple, but observations must be grouped into cells and there is some arbitrariness in choosing the cells, see Moore [2] and Klugman et al. [3] (pages 208-209) on extending chi-square tests to continuous multidimensional models. Goodness-of-fit test statistics using the multivariate sample distribution function often have a very complicated null distribution, see Babu and Rao [4], and extensive simulations are needed to obtain the p-values of the tests. For applications in various fields, there appears to be a need for goodness-of-fit test statistics which are relatively simple to implement and which lead to consistent tests.
Multivariate modelling is used in many fields, including actuarial science and finance. For financial applications, Moore [2] used chi-square tests for testing whether the joint weekly returns of two assets follow a bivariate normal distribution but, as mentioned earlier, chi-square tests require partitioning the sample space into cells and the tests are not consistent, even though the asymptotic null distributions of the statistics are simple.
In this paper, we shall introduce a class of pseudodistances
constructed based on a class of convex functions
which measures the discrepancy between the two density functions g and f, see details in Section 2.3. Goodness-of-fit test statistics for model testing based on
will preserve the property of having a simple asymptotic null distribution, comparable to chi-square tests, but unlike chi-square tests, the tests based on
are consistent for model testing.
It is also interesting to note that within this class
, the statistic based on
with
can also accommodate parameters estimated by the maximum likelihood (ML) method under a composite hypothesis. On the estimation side, estimators based on
have the potential for good efficiency and robustness properties. Furthermore, estimation and model testing can be handled in a unified way.
The inference methods proposed extend previous methods for univariate models to multivariate continuous models. This paper can be considered a follow-up to previous papers by Luong [5] and Luong [6]. The nearest neighbour distance (NND) notion is used in this paper to replace a similar notion of distance which was used when considering spacings and order statistics, see Ranneby et al. [7]. Order statistics are used for defining spacings for univariate models.
The class of pseudo distances
is constructed using the following class of strictly convex functions
with its second derivative
. For each chosen
we then have a corresponding pseudodistance
and
is a discrepancy measure between density g and density f.
Explicitly, the function
takes the form
, (1)
is a known constant with
and in practice we choose
near 0, and
can also have the form
. (2)
Note that
as
decreases to 0 and
only needs
to be defined up to an additive and a positive multiplicative constant and, provided that these constants are known, inference procedures based on
with
using
decreasing to 0 have efficiencies very close to inference procedures using
with
.
Furthermore, if
is used to construct the pseudo distance
, estimation using this pseudo distance will give the maximum likelihood estimators. This
as a pseudodistance is, up to a few terms which do not depend on
, the Kullback-Leibler (KL) distance used to generate ML estimators. These few terms, which do not involve
, do not affect estimation, but they are very significant for the construction of goodness-of-fit test statistics: goodness-of-fit test statistics constructed using
will have an asymptotic normal distribution for model testing, whereas goodness-of-fit test statistics using the KL pseudodistance do not have a simple asymptotic distribution, especially in the composite null hypothesis case where parameters must be estimated by the ML estimators. We give more discussion in Section 2.2.
The paper is organized as follows. In Section 2, we introduce the auxiliary observations obtained from the NND observations. The class of pseudodistances
is also introduced in this section. Asymptotic properties of estimators based on
are considered in Section 3. Estimators obtained using
with
are identical to ML estimators, which are fully efficient. If another
is used for
, the corresponding estimators have the potential of good efficiencies and some robustness properties. These properties allow flexibility for balancing efficiency and robustness. In Section 4, goodness-of-fit statistics based on the class
are shown to have an asymptotic normal distribution and in Section 5 an example is provided for illustration of the proposed techniques.
2. Pseudo Distances
2.1. Nearest Neighbour Distance (NND) Observations
For each observation vector
in the random sample, we define
the nearest neighbour distance (NND) to
with
,
is the commonly used Euclidean distance
and
clearly can be obtained using the sample of d-variate observations.
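The NND computation described above can be sketched in a few lines. The following is only a minimal illustration using a brute-force search over all pairs; the function name `nearest_neighbour_distances` and the toy sample are hypothetical and not part of the original paper.

```python
import math

def nearest_neighbour_distances(points):
    """For each point, return the Euclidean distance to its nearest other point."""
    if len(points) < 2:
        raise ValueError("need at least two observations")
    distances = []
    for i, p in enumerate(points):
        # Brute-force search over all other observations (O(n^2) overall).
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(nearest)
    return distances

# Toy bivariate sample (illustrative values only).
sample = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
nnd = nearest_neighbour_distances(sample)
```

For large samples a spatial index such as a k-d tree would be a more efficient choice, but the brute-force version keeps the definition transparent.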
In the literature, these
’s have been used to construct goodness-of-fit statistics, see Bickel and Breiman [8] and Zhou and Jammalamadaka [9], but often these statistics for multivariate models do not have a simple asymptotic distribution, which might create difficulties for applications. Now, we can define
as given by Proposition 2 of Ranneby et al. [7] (page 433) with
or equivalently
,
,
is the usual constant pi used in formulas for volume or area and
is the commonly used gamma function.
Note that we have
which are n univariate auxiliary observations obtained from NND observations. Therefore, from the original observations of the sample
and using the n auxiliary observations, we can form the following
multivariate observations
.
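As a hedged sketch of how the auxiliary observations might be computed, the code below assumes that each auxiliary observation is n times the volume of the d-dimensional ball whose radius is the NND, that is, n c_d R^d with c_d = pi^(d/2) / Gamma(d/2 + 1). This form is an assumption inferred from the references to pi and the gamma function above; the names `unit_ball_volume` and `auxiliary_observations` are hypothetical.

```python
import math

def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def auxiliary_observations(nnd, d):
    """Assumed form of the auxiliary observations: n * c_d * R_i^d."""
    n = len(nnd)
    c_d = unit_ball_volume(d)
    return [n * c_d * r ** d for r in nnd]

# NND values from a toy bivariate sample of size 3 (illustrative only).
y = auxiliary_observations([1.0, 1.0, 4.0], d=2)
```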
These n observations for
are asymptotically independent and have a common density function given by the density of
below,
, (3)
see the end of Section 2 of Kuljus and Ranneby [10] (p1094). In fact, the situation is similar to the univariate case where spacings were used, see Luong [5] (pages 619-620).
Now we can consider the random criterion function
(4)
for the class of functions h defined by expressions (1) and (2). We shall see subsequently that inference methods based on the objective function (4) are pseudodistance methods based on a class of pseudodistances
where g and f are density functions.
Minimizing
with
, we obtain the pseudodistance
estimators, which are identical to maximum likelihood (ML) estimators. ML estimators can be viewed as pseudodistance estimators based on the Kullback-Leibler (KL) distance, but we shall see that goodness-of-fit test statistics are complicated with the use of the KL distance, unlike the ones which are based on
and consequently based on
. The KL pseudodistance used to derive ML estimators will be discussed in Section 2.2 and the class of pseudodistances will be introduced in Section 2.3.
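The equivalence with ML estimation can be checked numerically. The sketch below assumes that the criterion function (4) has the form (1/n) times the sum of h(y_i f(x_i)), with h(x) = -log x for the ML case; under that assumption the criterion differs from the negative mean log-likelihood only by a term that does not involve the parameters. The function names, the toy data and the identity-covariance normal model are all illustrative assumptions.

```python
import math

def criterion(h, y, density_values):
    """Assumed form of expression (4): (1/n) * sum of h(y_i * f(x_i))."""
    return sum(h(yi * fi) for yi, fi in zip(y, density_values)) / len(y)

def neg_log(x):
    return -math.log(x)

def normal_density_2d(x, mean):
    """Bivariate normal density with identity covariance (illustrative model)."""
    q = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-0.5 * q) / (2 * math.pi)

xs = [(0.1, -0.3), (1.2, 0.4), (-0.7, 0.9), (0.5, 0.5)]
y = [0.8, 1.5, 0.6, 2.1]  # placeholder auxiliary observations

def criterion_and_nll(mean):
    f = [normal_density_2d(x, mean) for x in xs]
    q = criterion(neg_log, y, f)
    nll = -sum(math.log(fi) for fi in f) / len(f)
    return q, nll

q0, nll0 = criterion_and_nll((0.0, 0.0))
q1, nll1 = criterion_and_nll((0.5, 0.2))
# The gap between criterion and negative mean log-likelihood is parameter-free.
gap0, gap1 = q0 - nll0, q1 - nll1
```

Since the gap equals minus the mean of log y_i for every parameter value, minimizing the criterion and maximizing the likelihood pick the same parameter vector.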
Furthermore, if we use
to construct
then
should be set near 0 but within the range
for robust estimation without relying on an explicit multivariate density estimate, which is needed for the minimum Hellinger method proposed by Tamura and Boos [11]. Therefore, the class of pseudodistance methods being considered appears to be very useful for applications, and the methods are relatively simple to implement, so practitioners might want to use them in applied work.
2.2. Kullback-Leibler (KL) Pseudo-Distance
The negative of the log likelihood function can be expressed as
and ML estimators can be viewed as the values obtained by minimizing the observed version, also called the sample version, of the Kullback-Leibler (KL) pseudo-distance (
), i.e.,
defined as
, (5)
denotes convergence in probability,
and minimizing
is equivalent to maximizing the log of the likelihood function.
The KL pseudo-distance is defined as
, is the KL pseudo-distance.
However, for testing the validity of the model with the composite null hypothesis given by
and since
appears in the LHS of expression (5), it must be estimated, and replacing
say by a multivariate density estimate
will make the distribution of the LHS of expression (5) complicated, even though we can replace
by
. This might explain the limited use of the KL pseudo-distance for construction of statistics for model testing with the use of
.
2.3. The Class of Pseudo-Distances Dh
We shall focus on pseudo-distance methods based on
for parametric models, with emphasis on continuous multivariate models, but some of the previous univariate results, which are scattered, can also be unified by viewing them as pseudo-distance methods.
In general for pseudodistances we require the following property:
if
,
if and only if
, (6)
where g and f are density functions. The property given by (6) is needed for establishing consistency of estimation and consistency of goodness-of-fit tests; see Broniatowski et al. [12] for more notions and properties of pseudodistances.
Since
in general is not observable if g is unknown, we shall see at the end of this section that we can define an observed version
with the property
.
will satisfy the property given by expression (6) in probability, which is similar to
with the property
for the KL distance.
Now we work with the following pairs of observations to develop
methods,
.
For this sample, the observations are asymptotically independent by Proposition 3 of Ranneby et al. [7] (p413) and as
, the distribution of
tends to a common distribution, i.e., the common distribution is the distribution of the random vector
with joint density function given by
,
see Kuljus and Ranneby [10] (p1101).
Therefore, the results are very similar to the univariate case, with the interpretation of
as a multivariate density here instead of a univariate density. The results given by Luong [5] (page 624) continue to hold and we also have:
1)
follows a standard exponential distribution with density $e^{-z}$, $z > 0$.
2) Z and X are independent.
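The standard exponential limit can be examined by a small simulation. The sketch below assumes that Z is formed as the auxiliary observation multiplied by the true density at the observation, with the auxiliary observation taken as n times the area of the nearest-neighbour disc (d = 2). These forms, the function name `simulate_z` and the choice of a bivariate standard normal sample are assumptions made for illustration only.

```python
import math
import random

def simulate_z(n, seed=0):
    """Simulate Z_i = n * pi * R_i^2 * g(X_i) for a bivariate standard normal sample."""
    rng = random.Random(seed)
    pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    z = []
    for i, p in enumerate(pts):
        # Nearest neighbour distance by brute force.
        r = min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
        # True bivariate standard normal density at the observation.
        g = math.exp(-0.5 * (p[0] ** 2 + p[1] ** 2)) / (2 * math.pi)
        z.append(n * math.pi * r ** 2 * g)
    return z

z = simulate_z(500)
mean_z = sum(z) / len(z)  # should be roughly 1 if Z is close to standard exponential
```

A sample mean near 1 is only a rough check; a histogram or a quantile comparison against the exponential distribution would give a fuller picture.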
If we use Jensen’s inequality it follows that
for
, since
and
for
.
Now, we can define
and
is a known constant which does not depend on the parameters given by the vector
.
Under the parametric model, we would like to minimize
, but since g is unknown, this leads us to consider the observed objective function
defined below which is based on
with
,
note that we have
.
The pseudodistance
estimators given by the vector
based on
are obtained by minimizing
. Equivalently, they are obtained by minimizing
which is expression (4).
3. Asymptotic Properties of Dh estimators
3.1. Consistency
It is not difficult to see that the
estimators given by the vector
which minimizes expression (4) are consistent by defining
and by using assumptions and results of Section 3.1 as given by Luong [5] (pages 622-624).
Limit laws such as the uniform weak law of large numbers (UWLLN) and the central limit theorem (CLT) are applicable by using the property of
being a mixing sequence, which is due to the fact that
are asymptotically independent with a common distribution as
. Therefore,
.
3.2. Asymptotic Normality
Using the CLT and the results given in Section 2 of Luong [5] (pages 626-631), we can conclude that
,
denotes convergence in law,
is the commonly used information matrix with
, if the function
is used to define
and
and
if the function
is used to define
and
with the first and second derivatives of h denoted respectively by
and
. The random variable Z follows a standard exponential distribution as given by expression 25 in Luong [5] (page 631) and from the standard exponential distribution, we also have
and note that
as
.
The
estimators using
, might have some robustness properties by M-estimation theory, see Luong [5] (page 632), and might be preferred over the ML estimators.
Since the proposed
methods are density based but do not require an explicit density estimate, they appear to be simpler for practitioners and can be used as an alternative to other robust methods such as the Hellinger methods proposed by Tamura and Boos [11]. Besides, the observed pseudodistances
based on
can also be used for the construction of goodness-of-fit statistics and lead to statistics which are relatively simple to implement.
4. Goodness-of-Fit Test Statistics Using
For model selection and model testing we are primarily interested in testing the composite null hypothesis
.
However, it might be easier to follow the construction of the test statistics by first considering the test based on
which is also implicitly based on
for the simple hypothesis, where there is no unknown parameter.
4.1. Simple Null Hypothesis
For simple
,
A natural statistic to use can be based on
and since
forms a mixing sequence, the CLT can be applied, with the distribution of each
tending to that of a standard exponential random variable, and Slutsky’s Theorem can also be used if needed. Therefore, the following test statistic
can be used and
where
is the variance of
where Z follows a standard exponential distribution.
For an
level test, we can reject
if
where
is the
th percentile of the standard normal distribution.
Equivalently, we can reject
if
. (7)
Note that
and
is also the variance of
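For a general convex function h, the centring and scaling constants of the test statistic are the mean and variance of h(Z) with Z standard exponential. The following sketch approximates them by a midpoint rule after the substitution u = exp(-z); the function name `exp1_moments` and the number of grid points are illustrative assumptions, and the accuracy is only adequate for a quick check.

```python
import math

def exp1_moments(h, steps=200_000):
    """Approximate E[h(Z)] and Var[h(Z)] for Z ~ Exp(1).

    With u = exp(-z), E[h(Z)] equals the integral over (0, 1) of h(-log u) du,
    which is evaluated here by a midpoint rule.
    """
    m1 = 0.0
    m2 = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        v = h(-math.log(u))
        m1 += v
        m2 += v * v
    m1 /= steps
    m2 /= steps
    return m1, m2 - m1 * m1

# Constants for h(x) = -log x.
mu_h, var_h = exp1_moments(lambda z: -math.log(z))
# Sanity check with h(x) = x, for which both mean and variance equal 1.
mu_id, var_id = exp1_moments(lambda z: z)
```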
Now if we use
with
,
and using the moment generating function of
which is given by
with
being the gamma function so that the cumulant generating function
and by differentiating it, we can obtain the first two cumulants which are given by
,
,
and
are respectively the digamma function and the trigamma function, and they are available in most statistical packages.
The test statistic given by expression (7) can be expressed explicitly as
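The cumulant calculation can be verified numerically without a digamma routine. For Z standard exponential, E[Z^s] = Gamma(1 + s), so log Gamma(1 + s) is the cumulant generating function of log Z. The sketch below differentiates it by central finite differences; the step size and function names are illustrative choices. The resulting values should agree with digamma(1) = -0.5772... (minus Euler's constant) and trigamma(1) = pi^2/6.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def cgf_log_exp1(s):
    """Cumulant generating function of log Z for Z ~ Exp(1): log Gamma(1 + s)."""
    return math.lgamma(1.0 + s)

def first_two_cumulants(step=1e-3):
    """Central finite-difference approximations to K'(0) and K''(0)."""
    k_plus = cgf_log_exp1(step)
    k_zero = cgf_log_exp1(0.0)
    k_minus = cgf_log_exp1(-step)
    kappa1 = (k_plus - k_minus) / (2 * step)
    kappa2 = (k_plus - 2 * k_zero + k_minus) / step ** 2
    return kappa1, kappa2

kappa1, kappa2 = first_two_cumulants()
# For h(x) = -log x the mean changes sign and the variance is unchanged.
mean_neg_log = -kappa1
var_neg_log = kappa2
```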
(8)
and reject the simple
if
for an
level test,
.
Note that the test is consistent as
as
if
, so we will reject
with probability 1 should
but this property is not shared by chi-square tests. There is also the difficulty of arbitrariness in grouping observations into cells for chi-square tests, see Bickel and Breiman [8] for more discussion.
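A hedged sketch of the test for a simple null hypothesis with h(x) = -log x follows. It assumes that the statistic is the standardized mean of -log(y_i f_0(x_i)), centred at Euler's constant and scaled by pi^2/6, and that the null is rejected for large values, which matches the consistency argument above. The function name `simple_gof_test` and the input values are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329   # mean of -log Z for Z ~ Exp(1)
VAR_NEG_LOG = math.pi ** 2 / 6     # variance of -log Z for Z ~ Exp(1)

def simple_gof_test(y, density_values, alpha=0.05):
    """Return the standardized statistic and the one-sided rejection decision.

    y              : auxiliary observations built from the NND observations
    density_values : null density evaluated at each sample observation
    """
    n = len(y)
    q_n = -sum(math.log(yi * fi) for yi, fi in zip(y, density_values)) / n
    t_n = math.sqrt(n) * (q_n - EULER_GAMMA) / math.sqrt(VAR_NEG_LOG)
    critical = NormalDist().inv_cdf(1 - alpha)
    return t_n, t_n > critical

# Artificial inputs: the first set sits exactly at the null centring value.
t_null, reject_null = simple_gof_test([math.exp(-EULER_GAMMA)] * 4, [1.0] * 4)
# The second set mimics a poor fit, where y_i * f_0(x_i) is far too small.
t_alt, reject_alt = simple_gof_test([1e-3] * 4, [1.0] * 4)
```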
Furthermore, if we use
with
,
,
and
.
The corresponding test statistic given by expression (7) can be expressed explicitly as
(9)
and reject the simple
if
4.2. Composite Null Hypothesis
For model testing, we consider the composite
, since
is unknown, first we estimate
by
which minimizes
, then we can form the following statistic,
and we shall show that
which is similar to the statistic for the simple
and, unlike other statistics which have complicated null distributions when parameters are estimated, we shall show that the statistics behave like the one used for simple
in Section 4.1 and the equivalent rejection rules are similar to the ones given by expression (8) and expression (9) depending on the choice of
used for
.
In fact, these expressions remain valid for the composite
provided that we replace
by
when they appear in these expressions.
As we have seen by using a version of CLT for mixing sequences if needed,
, now if we can establish
(10)
with
being a term which converges to
in probability and by Slutsky’s Theorem,
Now, we will proceed to establish the property given by expression (10). Using the Mean Value Theorem and the following expansion around
, we have
with
lying on the line segment joining
and
.
Since
and using
,
see Luong [5] (page 630). Also, since
is positive definite,
and
, we then have the relation as given by expression (10).
Furthermore, if we use
for
,
,
.
The use of the ML estimators
for chi-square distance type statistics often creates complications when deriving the asymptotic distributions of these statistics, see Chernoff and Lehmann [13] (p580), Luong and Thompson [14] (p249-251).
For applications, it has been recognized that the maximum value attained by the log of the likelihood function can provide information on goodness-of-fit for the model being used. The test given by expression (8) with
replaced by
formalizes the informal procedures which use the maximum value of the log likelihood function for assessing goodness-of-fit of the model, see Klugman et al. [15]. Note, however, that the condition of no tied observations is needed for the test based on the log of the likelihood function as given by expression (8) with
replaced by
, otherwise there are some values of
and the logs of these values are undefined, whereas the test based on
with
is well defined even in the presence of tied observations, and in general we should fix a value for
near 0 for balancing efficiency and robustness for the estimation procedures.
5. Illustration
For illustration of the proposed methods, we use the d-dimensional multivariate normal model; its density function is often parameterized using the mean
and the covariance matrix
and it is given by
(11)
is the determinant of the matrix
, see Anderson [16] (page 20).
There is redundancy when using elements of the matrix
as parameters since
, being a covariance matrix, is symmetric.
We can eliminate the redundancy by defining the vector of parameters as
with
.
The Vech operator when applied to
extracts the lower triangular elements of
and stacks them in a vector. Equivalently, we can use the vector of parameters
instead of
and
and express the multivariate normal density as
to avoid the redundancy of the previous parameterization. We assume that we have a random sample of size n which allows us to obtain the auxiliary univariate observations
from NND observations, and that there are no tied observations, so that
For illustration, say we use
with
, the vector of estimators in this case coincides with maximum likelihood (ML) estimators, i.e.,
but for the multivariate normal model, it is well known that
can be obtained explicitly, see Anderson [16] (page 112).
Explicitly,
,
,
,
is the sample mean and
is the sample covariance matrix which can also be expressed as
.
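The explicit ML estimators and the non-redundant parameter vector can be sketched as follows. The covariance estimate uses the divisor n, which is the usual ML convention for the multivariate normal model; the function names `normal_mle` and `vech` and the toy data are hypothetical.

```python
def normal_mle(data):
    """ML estimates for the multivariate normal model: mean and covariance (divisor n)."""
    n = len(data)
    d = len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    cov = [
        [sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in data) / n
         for b in range(d)]
        for a in range(d)
    ]
    return mean, cov

def vech(matrix):
    """Stack the lower-triangular elements (diagonal included), column by column."""
    d = len(matrix)
    return [matrix[i][j] for j in range(d) for i in range(j, d)]

# Toy bivariate sample (illustrative values only).
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean_hat, cov_hat = normal_mle(data)
# Non-redundant parameter vector: mean followed by vech of the covariance matrix.
theta_hat = mean_hat + vech(cov_hat)
```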
For model testing then we can use the test statistic
and reject the model if the statistic gives a value smaller than
for an
level test.
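The whole procedure for the bivariate normal case can be put together in one sketch. It assumes d = 2, auxiliary observations of the form n pi R_i^2, the ML estimates plugged into the density, and a standardized statistic based on h(x) = -log x that rejects for large values. Under these assumptions, rejecting for a large standardized value corresponds to rejecting when the mean log-likelihood is small. All names and the simulated data are illustrative and not taken from the paper.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def bivariate_normal_gof(data, alpha=0.05):
    """Composite goodness-of-fit test for bivariate normality (sketch, d = 2 only)."""
    n = len(data)
    # ML estimates: sample mean and covariance with divisor n.
    m0 = sum(p[0] for p in data) / n
    m1 = sum(p[1] for p in data) / n
    s00 = sum((p[0] - m0) ** 2 for p in data) / n
    s11 = sum((p[1] - m1) ** 2 for p in data) / n
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in data) / n
    det = s00 * s11 - s01 ** 2
    total = 0.0
    for i, p in enumerate(data):
        # Nearest neighbour distance and assumed auxiliary observation n * pi * R^2.
        r = min(math.dist(p, q) for j, q in enumerate(data) if j != i)
        y = n * math.pi * r ** 2
        # Log of the fitted bivariate normal density at the observation.
        a, b = p[0] - m0, p[1] - m1
        quad = (s11 * a * a - 2 * s01 * a * b + s00 * b * b) / det
        log_f = -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
        total += -(math.log(y) + log_f)
    q_n = total / n
    t_n = math.sqrt(n) * (q_n - EULER_GAMMA) / math.sqrt(math.pi ** 2 / 6)
    return t_n, t_n > NormalDist().inv_cdf(1 - alpha)

rng = random.Random(1)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
t_n, reject = bivariate_normal_gof(sample)
```

The code requires that no two observations coincide, since a zero NND makes the logarithm undefined, which mirrors the no-ties condition stated above.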
As Tamura and Boos [11] have pointed out,
might not be robust, and hence they proposed multivariate minimum Hellinger distance estimators, but a multivariate density estimate is needed for their procedures. For robust estimation, or in the case of tied observations, we might want to use
with
with
being a positive number but near 0.
In this paper, we focus on presentations of methodologies of
, leaving for subsequent work simulation studies for assessing the power of the tests, the use of distributions other than the normal distribution for the null distribution of the goodness-of-fit test statistics, and the assessment of efficiency in finite samples. Practitioners might be encouraged to use these
methods.
Acknowledgements
The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and support from the editorial staff of Open Journal of Statistics in processing the paper are all gratefully acknowledged.