Minimum Quadratic Distance Methods Using Grouped Data for Parametric Families of Copulas

Minimum quadratic distance (MQD) methods are used to construct chi-square test statistics for simple and composite hypotheses for parametric families of copulas. The methods are aimed at grouped data which form a contingency table, but by defining a rule to group the data using Quasi-Monte Carlo numbers and two marginal empirical quantiles, the methods can be extended to handle complete data. The rule implicitly defines points on the nonnegative quadrant to form quadratic distances, and the similarities of the rule with the use of random cells for classical minimum chi-square methods are indicated. The methods are relatively simple to implement and might be useful for applied work in various fields such as actuarial science.

Share and Cite:

Luong, A. (2018) Minimum Quadratic Distance Methods Using Grouped Data for Parametric Families of Copulas. Open Journal of Statistics, 8, 427-456. doi: 10.4236/ojs.2018.83028.

1. Introduction

In actuarial science or biostatistics we often encounter bivariate data which are already grouped into cells forming a contingency table; see Partrat (p 225) and Gibbons and Chakraborti (p 511-512) for examples. The primary focus is then on the study of dependency, and we only want to make inference on the association parameters of the parametric survival copula used to model the dependency between the two components of the bivariate observations.

For the complete data, in actuarial science or biostatistics usually we assume to have a sample of nonnegative bivariate observations ${Z}_{i}={\left({X}_{i},{Y}_{i}\right)}^{\prime }$ which are independent and identically distributed (iid) as $Z={\left(X,Y\right)}^{\prime }$ with the bivariate survival function expressible as

${S}_{\theta }\left(x,y\right)={C}_{\theta }\left(\stackrel{¯}{F}\left(x\right),\stackrel{¯}{G}\left(y\right)\right)$ (1)

where ${C}_{\theta }\left(u,v\right),0\le u\le 1,0\le v\le 1$ is the survival copula function, $\stackrel{¯}{F}\left(x\right)=P\left(X>x\right)$ and $\stackrel{¯}{G}\left(y\right)=P\left(Y>y\right)$ are the marginal survival functions. The bivariate model with deductibles in actuarial science as given by Klugman and Parsa  can be considered as having complete data within this framework as we still have a sample of bivariate observations which are iid.

In this paper, we emphasize nonnegative distributions, so in general we use survival functions and survival copula functions. It is not difficult to see, however, that the statistical procedures developed can be adjusted to handle the situation where distribution functions and distribution copulas are used instead of survival functions and survival copulas. If we use distribution functions, then the bivariate distribution function is

$H\left(x,y\right)={C}_{\theta }\left(F\left(x\right),G\left(y\right)\right)$

where the marginal distribution functions are given respectively by $F\left(x\right)$ and $G\left(y\right)$ . In the paper by Dobric and Schmid , distribution functions are used, as the authors emphasize financial applications instead of actuarial science applications. It is not difficult to see that the statistical procedures are similar.

For illustration, we shall discuss a few examples of parametric models for survival copulas. In general, a survival copula can be viewed as a bivariate survival function, but the bivariate sample of observations given by the complete data is not drawn directly from this bivariate survival function. This should be taken into account when developing inference methods, even when the data are complete. It is natural to have procedures which provide a unified approach for grouped data and for complete data which must first be grouped, so a rule for grouping the complete data needs to be specified. We shall see that a rule for grouping the data is equivalent to a rule for choosing points on the nonnegative quadrant. We propose inference procedures which are based on quadratic distances and which lead to chi-square test statistics for the composite hypothesis

${H}_{0}:C\left(u,v\right)\in \left\{{C}_{\theta }\left(u,v\right)\right\}$ (2)

with the vector of parameters given by $\theta ={\left({\theta }_{1},\cdots ,{\theta }_{m}\right)}^{\prime }$ . In most applications we need just one or two parameters. The true vector of parameters for the copula model is denoted by ${\theta }_{0}$ . Also, by copula we generally mean survival copula.

In actuarial science we often encounter grouped data; see Klugman et al. for the univariate case. Inference procedures for bivariate censored data have been developed by Shih and Louis (see also the review paper by Genest et al.), but inference procedures for grouped data do not seem to have received attention. Furthermore, although the chi-square test statistics that Dobric and Schmid propose make use of a contingency table, complete data must be available first; the data are then transformed by the marginal empirical distribution functions and subsequently put into the cells of a contingency table. By making use of the multinomial distributions induced by a contingency table, chi-square tests can be proposed. In practice, if the data are grouped into a contingency table without being transformed, then these test procedures are no longer applicable. Dobric and Schmid also note that chi-square test statistics can have good power along some directions of the alternatives while being simple to apply, and so might be of interest to practitioners.

We also know that chi-square test statistics in one dimension might not be consistent in all directions of the alternatives. Yet they are simple to apply: since there is a unique asymptotic chi-square distribution across the composite hypothesis, one can control the size of the test. Depending on the alternatives, and by carefully choosing the intervals that partition the real line, chi-square tests can still have good power against some directions of the alternatives, and in practice we are often primarily concerned with some types of alternatives rather than all alternatives. For these reasons, chi-square tests are still used even though there are more powerful tests such as the Cramer-Von Mises tests; see Greenwood and Nikulin (p 124-126) for power under contaminated mixture distribution alternatives and Lehmann (p 326-329) for discussions of how the power of chi-square tests relates to the way intervals are created to group the data in one dimension.

Therefore, if we can retain the advantage of chi-square tests in two dimensions, namely a unique chi-square distribution across the null composite hypothesis, and reduce the arbitrariness of the grouping rule, the inference procedures might still be attractive for practitioners, as implementing other test procedures might need extensive simulations to approximate a null distribution which depends on ${\theta }_{0}\in \Omega$ .

In this paper, we develop minimum quadratic distance (MQD) procedures for grouped data. The procedures can be extended to the situation where complete data is available and must be grouped, by specifying a rule which makes use of the Halton sequence of Quasi-Monte Carlo (QMC) numbers and two empirical quantiles from the two marginal distributions or marginal survival functions. Tests for copula models can be performed using chi-square test statistics with data already grouped, and if complete data is available they can be grouped according to a more clearly defined rule. As mentioned earlier, the rule to select cells to group the data is a rule to select points on the nonnegative quadrant to construct quadratic distances. If complete data is available, the rule is established using QMC methods and is based on the idea of selecting points in the nonnegative quadrant so that Cramer-Von Mises distances can be approximated by quadratic distances. The methods can also be applied to copula models with a singular component when $u=v$ provided that the copula function is differentiable with respect to the parameters given by $\theta$ . An example of such a copula is the one-parameter Marshall-Olkin (MO) copula; for discussions on MO copulas, see Dobrowolski and Kumar and Marshall and Olkin .

We briefly list some copula models often encountered in practice. Most of them just have one or two parameters. A subclass of Archimedean copulas has the representation using a generator which is the Laplace transform (LT) of a nonnegative random variable $T\ge 0$ denoted by $\psi \left(s;\theta \right)={\psi }_{\theta }\left(s\right)$ . The class can be represented as

${C}_{\theta }\left(u,v\right)={\psi }_{\theta }\left[{\psi }_{\theta }{}^{-1}\left(u\right)+{\psi }_{\theta }{}^{-1}\left(v\right)\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}0\le u\le 1,\text{\hspace{0.17em}}0\le v\le 1$ .

If we specify a gamma LT with ${\psi }_{\theta }\left(s\right)={\left(1+s\right)}^{-\frac{1}{\theta }},\theta >0$ , then we have the Clayton or Cook-Johnson copula model

${C}_{\theta }\left(u,v\right)={\left({u}^{-\theta }+{v}^{-\theta }-1\right)}^{-\frac{1}{\theta }},\theta >0$ .

If we specify a positive stable LT ${\psi }_{\theta }\left(s\right)={\text{e}}^{-{s}^{\theta }}$ then we have the positive stable copula model which is also called positive stable frailties model with

${C}_{\theta }\left(u,v\right)=\mathrm{exp}\left[-{\left\{{\left(-\mathrm{log}u\right)}^{\frac{1}{\theta }}+{\left(-\mathrm{log}v\right)}^{\frac{1}{\theta }}\right\}}^{\theta }\right],\text{\hspace{0.17em}}0<\theta <1$ ,

see Shih and Louis  for these families and for simulations from these copulas, and see the algorithms given by Mai and Scherer  (p 98-99).
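As a minimal sketch, the two Archimedean families above can be evaluated with a few lines of Python. The function names `clayton_copula` and `positive_stable_copula` are illustrative choices made here and do not come from the cited references; the formulas are the closed forms displayed above.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton (Cook-Johnson) copula, theta > 0, for 0 <= u, v <= 1."""
    if u <= 0.0 or v <= 0.0:
        return 0.0  # a copula vanishes when either argument is 0
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def positive_stable_copula(u, v, theta):
    """Positive stable copula, 0 < theta < 1, for 0 <= u, v <= 1."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    a = (-math.log(u)) ** (1.0 / theta)
    b = (-math.log(v)) ** (1.0 / theta)
    return math.exp(-((a + b) ** theta))
```

A quick sanity check is the boundary property $C\left(u,1\right)=u$ , which both functions satisfy up to rounding.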

Besides this subclass, the one- and two-parameter Marshall-Olkin copula models are also frequently used. The two-parameter MO model can be expressed as

${C}_{\theta }\left(u,v\right)={u}^{1-{\theta }_{1}}v$ if ${u}^{{\theta }_{1}}\ge {v}^{{\theta }_{2}}$ and ${C}_{\theta }\left(u,v\right)=u{v}^{1-{\theta }_{2}}$ if ${u}^{{\theta }_{1}}\le {v}^{{\theta }_{2}}$ ,

$0<{\theta }_{1}<1,\text{\hspace{0.17em}}0<{\theta }_{2}<1,\text{\hspace{0.17em}}\theta ={\left({\theta }_{1},{\theta }_{2}\right)}^{\prime }$ .

The model has a singular component and if ${\theta }_{1}={\theta }_{2}$ , the MO copula model just has one parameter and

${C}_{\theta }\left(u,v\right)={u}^{1-\theta }v$ if $u\ge v$ and ${C}_{\theta }\left(u,v\right)=u{v}^{1-\theta }$ if $u\le v$ ,

note that ${C}_{\theta }\left(u,v\right)$ is singular for $u=v$ but, as a function of $\theta$ , ${C}_{\theta }\left(u,v\right)$ is differentiable. For further discussions on MO copulas see Dobrowolski and Kumar , and see Ross (p 103-108) for simulations from MO copulas and Gaussian copulas. The Gaussian copula model can be represented by

${C}_{\rho }\left(u,v\right)={\int }_{-\infty }^{{\Phi }^{-1}\left(u\right)}{\int }_{-\infty }^{{\Phi }^{-1}\left(v\right)}\frac{1}{2\text{π}\sqrt{1-{\rho }^{2}}}\mathrm{exp}\left\{-\frac{1}{2\left(1-{\rho }^{2}\right)}\left({x}^{2}+{y}^{2}-2\rho xy\right)\right\}\text{d}x\text{d}y$ ,

with the standard normal univariate quantile function denoted by ${\Phi }^{-1}\left(.\right)$ . The integrand of the above integral is a bivariate normal density function with standard normal marginals and correlation parameter ρ.
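Returning to the Marshall-Olkin family defined above, the piecewise definition can be sketched in Python as follows. The function name `marshall_olkin_copula` and the convention that omitting `theta2` gives the one-parameter model are assumptions made for this illustration.

```python
def marshall_olkin_copula(u, v, theta1, theta2=None):
    """Marshall-Olkin copula with 0 < theta1, theta2 < 1.

    If theta2 is omitted, the one-parameter model theta1 = theta2 is used.
    """
    if theta2 is None:
        theta2 = theta1
    if u ** theta1 >= v ** theta2:
        return u ** (1.0 - theta1) * v
    return u * v ** (1.0 - theta2)
```

On the boundary ${u}^{{\theta }_{1}}={v}^{{\theta }_{2}}$ the two branches agree, so the function is continuous there, and it coincides with the minimum of the two branch expressions.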

Copulas are often used to create bivariate distributions and for inference procedures for these distributions for actuarial science, see Klugman and Parsa  , Klugman et al.  , Frees and Valdez  for examples.

Before giving further details and properties of MQD methods, we shall give the logic behind the MQD procedures.

Let the bivariate empirical survival function be defined as

${S}_{n}\left(x,y\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{x}_{i}>x,{y}_{i}>y\right]$

with $I\left[.\right]$ being the usual indicator function, let $C\left(u,v\right)={C}_{{\theta }_{0}}\left(u,v\right)$ and define the two univariate empirical marginal survival functions as

${\stackrel{¯}{F}}_{n}\left(x\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{x}_{i}>x\right]$ and ${\stackrel{¯}{G}}_{n}\left(x\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{y}_{i}>x\right]$ ,

we then have the following convergence in probability properties,

${S}_{n}\left(x,y\right)\stackrel{p}{\to }S\left(x,y\right),\text{\hspace{0.17em}}{\stackrel{¯}{F}}_{n}\left(x\right)\stackrel{p}{\to }\stackrel{¯}{F}\left(x\right),\text{\hspace{0.17em}}{\stackrel{¯}{G}}_{n}\left(x\right)\stackrel{p}{\to }\stackrel{¯}{G}\left(x\right)$

where the true survival function and the marginal survival functions are given respectively by $S\left(x,y\right)$ , $\stackrel{¯}{F}\left(x\right)$ and $\stackrel{¯}{G}\left(y\right)$ . We shall assume that $\stackrel{¯}{F}\left(x\right)$ and $\stackrel{¯}{G}\left(y\right)$ are absolutely continuous and that $S\left(x,y\right)$ is either absolutely continuous or absolutely continuous everywhere except when $x=y$ , where the survival distribution can be singular, as in the case of the bivariate exponential model introduced by Marshall and Olkin .
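The empirical survival functions defined above translate directly into code. The sketch below assumes the complete sample is held as a list of $\left({x}_{i},{y}_{i}\right)$ pairs; the helper name `empirical_survival` is hypothetical.

```python
def empirical_survival(sample):
    """Return the functions S_n, F_bar_n and G_bar_n for a bivariate sample.

    `sample` is a list of (x_i, y_i) pairs.
    """
    n = len(sample)

    def S_n(x, y):
        # proportion of observations with x_i > x and y_i > y
        return sum(1 for xi, yi in sample if xi > x and yi > y) / n

    def F_bar_n(x):
        # proportion of observations with x_i > x
        return sum(1 for xi, _ in sample if xi > x) / n

    def G_bar_n(y):
        # proportion of observations with y_i > y
        return sum(1 for _, yi in sample if yi > y) / n

    return S_n, F_bar_n, G_bar_n
```

For example, with the four observations (1, 2), (2, 1), (3, 3), (4, 5), two of them exceed 1.5 in both coordinates, so the joint empirical survival function at (1.5, 1.5) equals 0.5.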

Now if the parametric survival copula model is valid,

$S\left(x,y\right)=C\left(\stackrel{¯}{F}\left(x\right),\stackrel{¯}{G}\left(y\right)\right),\text{\hspace{0.17em}}C\left(u,v\right)={C}_{{\theta }_{0}}\left(u,v\right)$

so that

${S}_{n}\left(x,y\right)-{C}_{\theta }\left({\stackrel{¯}{F}}_{n}\left(x\right),{\stackrel{¯}{G}}_{n}\left(y\right)\right)\stackrel{p}{\to }0$ for $\theta ={\theta }_{0}$ .

For the time being assume that the M points given by ${\left({x}_{l},{y}_{l}\right)}^{\prime },l=1,\cdots ,M$ are already chosen, then we can define the vector of empirical components,

${\stackrel{^}{z}}_{n}={\left({S}_{n}\left({x}_{1},{y}_{1}\right),\cdots ,{S}_{n}\left({x}_{M},{y}_{M}\right)\right)}^{\prime }$

with the counterpart vector which makes use of the copula model,

${\stackrel{^}{z}}_{\theta }={\left({C}_{\theta }\left({\stackrel{¯}{F}}_{n}\left({x}_{1}\right),{\stackrel{¯}{G}}_{n}\left({y}_{1}\right)\right),\cdots ,{C}_{\theta }\left({\stackrel{¯}{F}}_{n}\left({x}_{M}\right),{\stackrel{¯}{G}}_{n}\left({y}_{M}\right)\right)\right)}^{\prime }$ ,

and form the vector of differences ${G}_{n}\left(\theta \right)={\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }$ . By choosing a symmetric positive definite matrix $W$ we can form a class of quadratic distances (QD) given by

${Q}_{n}\left(\theta \right)={\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)}^{\prime }W\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)$ .

A positive definite matrix can be used to create a weighted Euclidean norm, so we can also let

${Q}_{n}\left(\theta \right)={‖{G}_{n}\left(\theta \right)‖}^{2}$ ,

where $‖\text{ }.\text{ }‖$ is the weighted Euclidean norm induced by $W$ ; if we let $W=I$ then we obtain the classical Euclidean norm. The QD inference procedures developed subsequently are based on ${Q}_{n}\left(\theta \right)$ and are similar to those for the univariate case. For MQD procedures with univariate observations, see Luong and Thompson .
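The construction of the two vectors and of the quadratic distance can be sketched as follows. This is only an illustration under stated assumptions: the function names `mqd_vectors` and `quadratic_distance` are hypothetical, the copula is passed in as a Python callable `copula(u, v, theta)`, and `W` is a list of lists, with `None` standing for $W=I$ .

```python
def mqd_vectors(sample, points, copula, theta):
    """Build z_n and z_theta at the chosen points (x_l, y_l), l = 1..M."""
    n = len(sample)
    z_n, z_theta = [], []
    for x, y in points:
        s = sum(1 for xi, yi in sample if xi > x and yi > y) / n  # S_n(x, y)
        u = sum(1 for xi, _ in sample if xi > x) / n              # F_bar_n(x)
        v = sum(1 for _, yi in sample if yi > y) / n              # G_bar_n(y)
        z_n.append(s)
        z_theta.append(copula(u, v, theta))
    return z_n, z_theta


def quadratic_distance(z_n, z_theta, W=None):
    """Return (z_n - z_theta)' W (z_n - z_theta); W=None means W = I."""
    d = [a - b for a, b in zip(z_n, z_theta)]
    if W is None:
        return sum(x * x for x in d)
    m = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(m) for j in range(m))
```

Keeping the vector construction separate from the distance makes it easy to reuse the same vectors with different weight matrices.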

The paper is organized as follows.

In Section 3, MQD methods are developed using predetermined grouped data such as data presented in a contingency table. The efficient quadratic distance is derived and can be used for estimation and model testing. Asymptotic theory is established for MQD estimators, and chi-square tests using quadratic distances are constructed for testing copula models. In Section 4, by viewing grouped data as defining a set of points on the nonnegative quadrant, a rule to select points based on Quasi-Monte Carlo numbers and two sample quantiles is proposed, so that the methods can be extended to the situation where complete data is available. The methods can be seen as similar to minimum chi-square methods with random cells but with a rule to define these cells; the choice of random cells for minimum chi-square methods is less well defined. Section 5 illustrates the implementation of MQD methods with a limited simulation study for the one-parameter Marshall-Olkin model, comparing the method of moments (MM) estimator based on the sample Spearman rho, which requires complete data, with the MQD estimator, which uses grouped data. From the study, it appears that the chi-square tests have some power to detect alternatives which can be represented as a mixture or contaminated copula model, such as a mixture of the one-parameter Marshall-Olkin copula model and the Gaussian copula model. The findings appear to be in line with chi-square tests in one dimension, which display similar properties if intervals are chosen properly.

2. MQD Methods Using Grouped Data

2.1. Contingency Tables

Contingency table data can be viewed as a special form of two-dimensional grouped data. We will give some more details about this form of grouped data.

Assume that we have a sample ${Z}_{i}={\left({X}_{i},{Y}_{i}\right)}^{\prime },i=1,\cdots ,n$ which are independent and identically distributed as $Z={\left(X,Y\right)}^{\prime }$ which follows a non-negative continuous bivariate distribution with model survival function given by ${C}_{\theta }\left(\stackrel{¯}{F}\left(x\right),\stackrel{¯}{G}\left(y\right)\right)$ . The marginal survival functions are given respectively by $\stackrel{¯}{F}\left(x\right)$ and $\stackrel{¯}{G}\left(y\right)$ assumed to be absolutely continuous but there is no parametric model assumed for the marginals.

The vector of parameters is $\theta ={\left({\theta }_{1},\cdots ,{\theta }_{m}\right)}^{\prime }$ and the true vector of parameters is denoted by ${\theta }_{0}$ . We do not observe the original sample; observations are grouped and put into a contingency table, and only the numbers which fall into each cell of the contingency table are recorded or, equivalently, the sample proportions which fall into these cells are recorded. Contingency tables are often encountered in actuarial science and biostatistics, see Partrat (p 225), Gibbons and Chakraborti (p 511-512), and we shall give a brief description below.

Let the nonnegative axis X be partitioned into disjoint intervals ${\cup }_{i=1}^{I}\left[{s}_{i-1},{s}_{i}\right)$ with ${s}_{0}=0,{s}_{I}=\infty$ and similarly, let the axis Y be partitioned into disjoint intervals ${\cup }_{j=1}^{J}\left[{t}_{j-1},{t}_{j}\right)$ with ${t}_{0}=0,{t}_{J}=\infty$ .

The nonnegative quadrant can be partitioned into nonoverlapping cells of the form

${C}_{ij}=\left[{s}_{i-1},{s}_{i}\right)×\left[{t}_{j-1},{t}_{j}\right),\text{\hspace{0.17em}}i=1,\cdots ,I,\text{\hspace{0.17em}}j=1,\cdots ,J$ .

The contingency table $T=\left({C}_{ij}\right)$ is formed which can be viewed as a matrix with elements given by

${C}_{ij},\text{\hspace{0.17em}}i=1,\cdots ,I,\text{\hspace{0.17em}}j=1,\cdots ,J$ .

The empirical bivariate survival function is as defined earlier with ${S}_{n}\left(x,y\right)\stackrel{p}{\to }S\left(x,y\right)$ , the underlying bivariate survival distribution. We assume that $S\left(x,y\right)$ is either absolutely continuous or it can have a singular component when $X=Y$ as in the case of the bivariate exponential distribution of Marshall Olkin  but absolutely continuous elsewhere. Implicitly, the marginal survival functions $\stackrel{¯}{F}\left(x\right)$ and $\stackrel{¯}{G}\left(y\right)$ are assumed to be absolutely continuous.

The sample proportion or empirical probability for one observation which falls into cell ${C}_{ij}$ can be obtained using ${S}_{n}\left(x,y\right)$

$\stackrel{^}{{p}_{n}}\left({C}_{ij}\right)={S}_{n}\left({s}_{i-1},{t}_{j-1}\right)-{S}_{n}\left({s}_{i-1},{t}_{j}\right)-{S}_{n}\left({s}_{i},{t}_{j-1}\right)+{S}_{n}\left({s}_{i},{t}_{j}\right)$ (3)

and the corresponding probability $\stackrel{^}{{p}_{\theta }}\left({C}_{ij}\right)$ using the copula model coupled with the empirical survival distributions ${\stackrel{¯}{F}}_{n}$ and ${\stackrel{¯}{G}}_{n}$ with $\stackrel{^}{{S}_{\theta }}\left(s,t\right)={C}_{\theta }\left({\stackrel{¯}{F}}_{n}\left(s\right),{\stackrel{¯}{G}}_{n}\left(t\right)\right)$ is given by

$\stackrel{^}{{p}_{\theta }}\left({C}_{ij}\right)=\stackrel{^}{{S}_{\theta }}\left({s}_{i-1},{t}_{j-1}\right)-\stackrel{^}{{S}_{\theta }}\left({s}_{i-1},{t}_{j}\right)-\stackrel{^}{{S}_{\theta }}\left({s}_{i},{t}_{j-1}\right)+\stackrel{^}{{S}_{\theta }}\left({s}_{i},{t}_{j}\right)$ .
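The rectangle formula for a cell probability applies to any bivariate survival function, empirical or model based. The following sketch uses a hypothetical helper `cell_probability` and, purely for illustration, a survival function with independent unit exponential margins; neither the helper name nor this choice of survival function comes from the text.

```python
import math


def cell_probability(S, s_lo, s_hi, t_lo, t_hi):
    """Probability of the cell [s_lo, s_hi) x [t_lo, t_hi) under survival function S."""
    return S(s_lo, t_lo) - S(s_lo, t_hi) - S(s_hi, t_lo) + S(s_hi, t_hi)


def S_example(x, y):
    """Illustrative survival function only: independent unit exponentials."""
    return math.exp(-(x + y))


s_cuts = [0.0, 1.0, 2.0, math.inf]  # s_0 = 0, ..., s_I = infinity
t_cuts = [0.0, 0.5, math.inf]       # t_0 = 0, ..., t_J = infinity

table = [
    [cell_probability(S_example, s_cuts[i - 1], s_cuts[i], t_cuts[j - 1], t_cuts[j])
     for j in range(1, len(t_cuts))]
    for i in range(1, len(s_cuts))
]
total = sum(sum(row) for row in table)  # cell probabilities add up to 1
```

Because the last cut points are infinite, the survival function is 0 there, which is the source of the redundancy in the table.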

It is not difficult to see that there is redundant information displayed by a contingency table; one way to see the duplication is to note that

$\stackrel{^}{{S}_{\theta }}\left({s}_{i},{t}_{J}\right)=0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}{S}_{n}\left({s}_{i},{t}_{J}\right)=0,\text{\hspace{0.17em}}i=1,\cdots ,I$ (4)

and similarly, $\stackrel{^}{{S}_{\theta }}\left({s}_{I},{t}_{j}\right)=0,\text{\hspace{0.17em}}{S}_{n}\left({s}_{I},{t}_{j}\right)=0,\text{\hspace{0.17em}}j=1,\cdots ,J$ .

Therefore, the set of points given by $\left\{\left({s}_{I},{t}_{j}\right),\left({s}_{i},{t}_{J}\right),i=1,\cdots ,I,j=1,\cdots ,J\right\}$ can be discarded without affecting the information provided by the contingency table. Consequently, we can view a contingency table as implicitly defining a grid on the nonnegative quadrant with only $M=\left(I-1\right)\left(J-1\right)$ points. It is also clear that if we want a rule to choose cells, the same rule will allow us to choose points on the nonnegative quadrant.

The objective function of the proposed quadratic form will be given below. It is a natural extension of the objective function used in the univariate case. Define a vector with empirical components so that we only need one subscript by collapsing the points of the contingency table given by

$\left\{\left({s}_{i},{t}_{j}\right),i=1,\cdots ,I-1,j=1,\cdots ,J-1\right\}$ into a vector by putting the first row of the matrix as the first batch of elements of the vector, the second row as the second batch of elements, and so on, i.e., let

$\stackrel{^}{{z}_{n}}={\left({S}_{n}\left({s}_{1},{t}_{1}\right),\cdots ,{S}_{n}\left({s}_{M},{t}_{M}\right)\right)}^{\prime },\text{\hspace{0.17em}}M=\left(I-1\right)\left(J-1\right)$ . (5)

and its counterpart which makes use of the copula model is

$\stackrel{^}{{z}_{\theta }}={\left(\stackrel{^}{{S}_{\theta }}\left({s}_{1},{t}_{1}\right),\cdots ,\stackrel{^}{{S}_{\theta }}\left({s}_{M},{t}_{M}\right)\right)}^{\prime }$ . (6)

The number of components of ${\stackrel{^}{z}}_{n}$ is M with the assumption $M>m$ .
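When only the cell counts are available, the empirical survival values at the interior grid points can be recovered by summing counts. The sketch below assumes the table is stored as a list of rows `counts[i][j]` with zero-based indices, so that row `i` holds the cells $\left[{s}_{i},{s}_{i+1}\right)$ ; the function name `grid_vectors_from_table` is hypothetical.

```python
def grid_vectors_from_table(counts):
    """Collapse an I x J table of counts into row-major vectors of length M.

    Returns (z_n, F_bar, G_bar), where entry l holds S_n, F_bar_n and G_bar_n
    evaluated at the l-th interior grid point (s_i, t_j).
    """
    I, J = len(counts), len(counts[0])
    n = sum(sum(row) for row in counts)
    z_n, F_bar, G_bar = [], [], []
    for i in range(1, I):          # interior cut point s_i
        for j in range(1, J):      # interior cut point t_j
            joint = sum(counts[a][b] for a in range(i, I) for b in range(j, J))
            above_s = sum(sum(counts[a]) for a in range(i, I))
            above_t = sum(counts[a][b] for a in range(I) for b in range(j, J))
            z_n.append(joint / n)
            F_bar.append(above_s / n)
            G_bar.append(above_t / n)
    return z_n, F_bar, G_bar
```

The model counterpart at each point is then obtained by applying the copula to the matching entries of `F_bar` and `G_bar`.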

A class of quadratic distances can be defined as

${Q}_{n}\left(\theta \right)={\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)}^{\prime }W\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)$ (7)

with $W$ being a symmetric and positive definite matrix. In this class, we focus on two choices of $W$ .

Letting $W=I$ , we obtain the unweighted quadratic distance. This choice is not optimum but it produces consistent estimators, which can be used as preliminary estimates for $\theta$ to start the numerical procedures for finding more efficient estimators. The matrix $W$ is defined up to a positive constant, as minimizing the objective function multiplied by a positive constant still gives the same estimators, and a consistent estimate $\stackrel{^}{W}$ of $W$ can be used to replace $W$ without affecting the asymptotic theory for estimation or the asymptotic distribution of the test statistics. Using quadratic distance theory or generalized method of moments (GMM) theory, it is not difficult to see that an optimum choice for $W$ is to let $W={W}_{0}$ where ${\Omega }_{0}={W}_{0}^{-1}$ and ${\Omega }_{0}$ is the asymptotic covariance matrix given by

$\sqrt{n}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)\stackrel{L}{\to }N\left(0,{\Omega }_{0}\right),$

see Remark 2.4.3 given by Luong and Thompson  (p 245).

Clearly, ${\Omega }_{0}$ depends on ${\theta }_{0}$ . We shall obtain the expression for ${\Omega }_{0}$ and show that ${\Omega }_{0}$ can be estimated by ${\stackrel{^}{\Omega }}_{0}$ in the next section, as we can obtain a preliminary consistent estimate for ${\theta }_{0}$ by using the unweighted quadratic distance or other quick methods; see the method of moments using Spearman rho in Section 5.2 for example. Consequently, by quadratic distance we mean the following efficient version with the objective function defined as

${Q}_{n}\left(\theta \right)={\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)}^{\prime }\stackrel{^}{{W}_{0}}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)$ with $\stackrel{^}{{W}_{0}}={\stackrel{^}{\Omega }}_{0}^{-1}$ . (8)

The version with $W=I$ will be called unweighted quadratic distance. In the next section we shall use the influence function representation for $\sqrt{n}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ to derive ${\Omega }_{0}$ and we shall also propose ${\stackrel{^}{\Omega }}_{0}$ a consistent estimate for ${\Omega }_{0}$ .

2.2. Optimum Matrix W0

The matrix ${\Omega }_{0}$ , which is the asymptotic covariance matrix of the vector $\sqrt{n}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ , plays an important role for MQD methods: by using ${\Omega }_{0}$ or a consistent estimate of ${\Omega }_{0}$ we obtain estimators with good efficiencies and we also obtain chi-square test statistics. Although ${\Omega }_{0}$ is unknown, its elements are not complicated and, moreover, it can be replaced by a consistent estimate without affecting the asymptotic properties of the procedures. We shall give more details about this matrix and construct $\stackrel{^}{{\Omega }_{0}}$ , a consistent estimate of ${\Omega }_{0}$ .

Using influence representation for the vector of functions of $\sqrt{n}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ which depend on three functions ${S}_{n},\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}$ as discussed by Reid  , see technical appendix (TA1) in the Appendices for more details, it can be seen that ${\Omega }_{0}$ is the covariance matrix of the vector $h\left(x,y\right)$ under ${\theta }_{0}$ with

$h\left(x,y\right)=\left[\begin{array}{c}{{a}^{\prime }}_{1}{Y}_{1}\\ ⋮\\ {{a}^{\prime }}_{M}{Y}_{M}\end{array}\right]$ ,

${{a}^{\prime }}_{i}=\left(1,-\frac{\partial C\left(\stackrel{¯}{F}\left({s}_{i}\right),\stackrel{¯}{G}\left({t}_{i}\right)\right)}{\partial u},-\frac{\partial C\left(\stackrel{¯}{F}\left({s}_{i}\right),\stackrel{¯}{G}\left({t}_{i}\right)\right)}{\partial v}\right),\text{\hspace{0.17em}}i=1,\cdots ,M$ ,

${Y}_{i}=\left[\begin{array}{c}I\left[x>{s}_{i},y>{t}_{i}\right]-S\left({s}_{i},{t}_{i}\right)\\ I\left[x>{s}_{i}\right]-\stackrel{¯}{F}\left({s}_{i}\right)\\ I\left[y>{t}_{i}\right]-\stackrel{¯}{G}\left({t}_{i}\right)\end{array}\right],\text{\hspace{0.17em}}i=1,\cdots ,M$

and $I\left[.\right]$ is the usual indicator function,

$S\left({s}_{i},{t}_{i}\right)=C\left(\stackrel{¯}{F}\left({s}_{i}\right),\stackrel{¯}{G}\left({t}_{i}\right)\right)$ , $C={C}_{{\theta }_{0}}$ .

$\frac{\partial C\left(u,v\right)}{\partial u},\frac{\partial C\left(u,v\right)}{\partial v}$ are respectively the partial derivatives of $C\left(u,v\right)$ with respect to u and v.

It is not difficult to see that the elements of ${\Omega }_{0}$ are

${\Omega }_{0}\left(i,j\right)={{a}^{\prime }}_{i}Cov\left({Y}_{i},{Y}_{j}\right){a}_{j}$

with $Cov\left({Y}_{i},{Y}_{j}\right)=E\left({Y}_{i}{{Y}^{\prime }}_{j}\right)$ . Since ${Y}_{i}$ and ${Y}_{j}$ are not identically distributed, $Cov\left({Y}_{i},{Y}_{j}\right)$ is not symmetric in general and the matrix has 9 distinct elements; see technical Appendix (TA2) in the Appendices for more details. The elements can be expressed as

$\begin{array}{l}{c}_{11}=S\left(\mathrm{max}\left({s}_{i},{s}_{j}\right),\mathrm{max}\left({t}_{i},{t}_{j}\right)\right)-S\left({s}_{i},{t}_{i}\right)S\left({s}_{j},{t}_{j}\right)\\ {c}_{12}=S\left(\mathrm{max}\left({s}_{i},{s}_{j}\right),{t}_{i}\right)-S\left({s}_{i},{t}_{i}\right)\stackrel{¯}{F}\left({s}_{j}\right)\\ {c}_{13}=S\left({s}_{i},\mathrm{max}\left({t}_{i},{t}_{j}\right)\right)-S\left({s}_{i},{t}_{i}\right)\stackrel{¯}{G}\left({t}_{j}\right)\\ {c}_{21}=S\left(\mathrm{max}\left({s}_{i},{s}_{j}\right),{t}_{j}\right)-S\left({s}_{j},{t}_{j}\right)\stackrel{¯}{F}\left({s}_{i}\right)\\ {c}_{22}=\stackrel{¯}{F}\left(\mathrm{max}\left({s}_{i},{s}_{j}\right)\right)-\stackrel{¯}{F}\left({s}_{i}\right)\stackrel{¯}{F}\left({s}_{j}\right)\\ {c}_{23}=S\left({s}_{i},{t}_{j}\right)-\stackrel{¯}{F}\left({s}_{i}\right)\stackrel{¯}{G}\left({t}_{j}\right)\\ {c}_{31}=S\left({s}_{j},\mathrm{max}\left({t}_{i},{t}_{j}\right)\right)-S\left({s}_{j},{t}_{j}\right)\stackrel{¯}{G}\left({t}_{i}\right)\\ {c}_{32}=S\left({s}_{j},{t}_{i}\right)-\stackrel{¯}{F}\left({s}_{j}\right)\stackrel{¯}{G}\left({t}_{i}\right)\\ {c}_{33}=\stackrel{¯}{G}\left(\mathrm{max}\left({t}_{i},{t}_{j}\right)\right)-\stackrel{¯}{G}\left({t}_{i}\right)\stackrel{¯}{G}\left({t}_{j}\right)\end{array}$ (9)

The elements ${c}_{ij}$ can be estimated empirically by replacing $S,\stackrel{¯}{F},\stackrel{¯}{G}$ in the expressions of ${c}_{ij}$ by ${S}_{n},\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}$ for $i=1,2,3,\text{\hspace{0.17em}}j=1,2,3$ . The estimates $\stackrel{^}{{c}_{ij}},i=1,2,3,\text{\hspace{0.17em}}j=1,2,3$ can be formed.

Therefore, we can form $\stackrel{^}{Cov}\left({Y}_{i},{Y}_{j}\right)$ which estimates $Cov\left({Y}_{i},{Y}_{j}\right)$ . Similarly, by replacing ${\theta }_{0}$ by a consistent preliminary estimate ${\theta }_{0}^{\left(0\right)}$ which can be obtained using the unweighted quadratic distance for example and replacing $\stackrel{¯}{F},\stackrel{¯}{G}$ by $\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}$ we can estimate ${a}_{i}$ by $\stackrel{^}{{a}_{i}},i=1,\cdots ,M$ .

$\stackrel{^}{{\Omega }_{0}}$ an estimate for ${\Omega }_{0}$ will have the elements given by

$\stackrel{^}{{\Omega }_{0}}\left(i,j\right)={{\stackrel{^}{a}}^{\prime }}_{i}\stackrel{^}{Cov}\left({Y}_{i},{Y}_{j}\right){\stackrel{^}{a}}_{j},\text{\hspace{0.17em}}i=1,\cdots ,M,\text{\hspace{0.17em}}j=1,\cdots ,M$ (10)

and define $\stackrel{^}{{W}_{0}}={\stackrel{^}{\Omega }}_{0}^{-1}$ . The unknown matrix ${W}_{0}={\Omega }_{0}^{-1}$ can be replaced by its consistent estimate $\stackrel{^}{{W}_{0}}$ without affecting the asymptotic theory for estimation and tests, so $\stackrel{^}{{W}_{0}}$ will be used as the optimum matrix for constructing the quadratic distance.
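The following sketch shows how one entry of the covariance matrix can be assembled from the 3 by 3 block of covariances between the centred indicator vectors at two grid points. The nine entries are computed directly as expectations of products of centred indicators; for instance the (3, 2) entry pairs $I\left[y>{t}_{i}\right]$ with $I\left[x>{s}_{j}\right]$ and therefore involves $S\left({s}_{j},{t}_{i}\right)$ . The function names, the argument layout and the example with independent unit exponential margins are all assumptions made for illustration. In practice `S`, `F_bar` and `G_bar` would be the empirical functions and the partial derivatives would be evaluated at a preliminary estimate.

```python
import math


def cov_block(S, F_bar, G_bar, p_i, p_j):
    """3 x 3 matrix Cov(Y_i, Y_j) for grid points p_i = (s_i, t_i), p_j = (s_j, t_j)."""
    (si, ti), (sj, tj) = p_i, p_j
    sm, tm = max(si, sj), max(ti, tj)
    Si, Sj = S(si, ti), S(sj, tj)
    return [
        [S(sm, tm) - Si * Sj, S(sm, ti) - Si * F_bar(sj), S(si, tm) - Si * G_bar(tj)],
        [S(sm, tj) - F_bar(si) * Sj, F_bar(sm) - F_bar(si) * F_bar(sj),
         S(si, tj) - F_bar(si) * G_bar(tj)],
        [S(sj, tm) - G_bar(ti) * Sj, S(sj, ti) - G_bar(ti) * F_bar(sj),
         G_bar(tm) - G_bar(ti) * G_bar(tj)],
    ]


def omega_entry(a_i, C, a_j):
    """Return a_i' C a_j, where a = (1, -dC/du, -dC/dv) at the grid point."""
    return sum(a_i[r] * C[r][c] * a_j[c] for r in range(3) for c in range(3))


# Illustrative check only: independent unit exponential margins, for which the
# copula is C(u, v) = u v, so dC/du = v and dC/dv = u.
F_bar = lambda x: math.exp(-x)
G_bar = lambda y: math.exp(-y)
S = lambda x, y: math.exp(-(x + y))


def gradient(p):
    u, v = F_bar(p[0]), G_bar(p[1])
    return (1.0, -v, -u)


p1, p2 = (math.log(2.0), math.log(2.0)), (1.0, 0.3)
omega_11 = omega_entry(gradient(p1), cov_block(S, F_bar, G_bar, p1, p1), gradient(p1))
omega_12 = omega_entry(gradient(p1), cov_block(S, F_bar, G_bar, p1, p2), gradient(p2))
omega_21 = omega_entry(gradient(p2), cov_block(S, F_bar, G_bar, p2, p1), gradient(p1))
```

In this example the diagonal entry at the point where both margins equal 0.5 works out to 0.0625, and the two off-diagonal entries agree, as they should for a symmetric covariance matrix.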

3. MQD Methods Using Grouped Data

3.1. Estimation

The MQD estimators can be seen as given by the vector $\stackrel{^}{\theta }$ which minimizes

${Q}_{n}\left(\theta \right)={\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)}^{\prime }\stackrel{^}{{W}_{0}}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)$ . (11)

and since

$\stackrel{^}{{z}_{n}}={\left({S}_{n}\left({s}_{1},{t}_{1}\right),\cdots ,{S}_{n}\left({s}_{M},{t}_{M}\right)\right)}^{\prime }$ , (12)

$\stackrel{^}{{z}_{\theta }}={\left(\stackrel{^}{{S}_{\theta }}\left({s}_{1},{t}_{1}\right),\cdots ,\stackrel{^}{{S}_{\theta }}\left({s}_{M},{t}_{M}\right)\right)}^{\prime }$ , (13)

$\stackrel{^}{{S}_{\theta }}\left({s}_{i},{t}_{i}\right)={C}_{\theta }\left(\stackrel{¯}{{F}_{n}}\left({s}_{i}\right),\stackrel{¯}{{G}_{n}}\left({t}_{i}\right)\right),\text{\hspace{0.17em}}i=1,\cdots ,M$ , (14)

${G}_{n}\left(\theta \right)=\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)$ ,

we can also use the weighted Euclidean norm $‖\text{ }\cdot \text{ }‖$ induced by $\stackrel{^}{{W}_{0}}$ and let

${Q}_{n}\left(\theta \right)={‖{G}_{n}\left(\theta \right)‖}^{2}$ . (15)
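For a one-parameter copula, the minimization of the quadratic distance can be sketched with a simple golden-section search. This is a swapped-in, generic numerical method and not the procedure used in the paper; the names `golden_section_minimize` and `q_n`, the use of the Clayton family, the search interval and the identity weight matrix are all assumptions for illustration.

```python
import math


def clayton_copula(u, v, theta):
    """Clayton copula, theta > 0, for 0 < u, v <= 1."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def q_n(theta, z_n, margins, W=None):
    """Quadratic distance between z_n and the Clayton model vector at theta."""
    d = [z - clayton_copula(u, v, theta) for z, (u, v) in zip(z_n, margins)]
    if W is None:  # unweighted version, W = I
        return sum(x * x for x in d)
    m = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(m) for j in range(m))


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


# Values of (F_bar_n, G_bar_n) at M = 4 grid points, chosen for illustration.
margins = [(0.8, 0.7), (0.8, 0.3), (0.4, 0.7), (0.4, 0.3)]
theta_true = 2.0
z_n = [clayton_copula(u, v, theta_true) for u, v in margins]  # noise-free "data"
theta_hat = golden_section_minimize(lambda th: q_n(th, z_n, margins), 0.1, 10.0)
```

With noise-free inputs the search recovers the generating parameter. With real data, a first pass with the identity weight gives a preliminary estimate, which can then be used to build the estimated optimum weight matrix for a second pass.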

Consistency of quadratic distance estimators, whether using predetermined grouped data or complete data which must be grouped according to a rule, can be treated in a unified way using the following Theorem 1, which is essentially Theorem 3.1 of Pakes and Pollard (p 1038); the proof has been given by the authors. In fact, their Theorems 3.1 and 3.3 are also useful for Section 4, where we have complete data and have choices in grouping the data into cells or, equivalently, in forming the artificial sample points on the nonnegative quadrant used to form the quadratic distances.

Theorem 1 (Consistency)

Under the following conditions $\stackrel{^}{\theta }$ converges in probability to ${\theta }_{0}$ :

1) $‖{G}_{n}\left(\stackrel{^}{\theta }\right)‖\le {o}_{p}\left(1\right)+{\mathrm{inf}}_{\theta \in \Omega }\left(‖{G}_{n}\left(\theta \right)‖\right)$ , the parameter space Ω is compact

2) $‖{G}_{n}\left({\theta }_{0}\right)‖={o}_{p}\left(1\right)$ ,

3) ${\mathrm{sup}}_{‖\theta -{\theta }_{0}‖>\delta }\left(\frac{1}{‖{G}_{n}\left(\theta \right)‖}\right)={O}_{p}\left(1\right)$ for each $\delta >0$ .

Theorem 3.1 of Pakes and Pollard states condition 2) as ${G}_{n}\left({\theta }_{0}\right)={o}_{p}\left(1\right)$ , but their proof only uses $‖{G}_{n}\left({\theta }_{0}\right)‖={o}_{p}\left(1\right)$ , so we state condition 2) in the latter form.

An expression is ${o}_{p}\left(1\right)$ if it converges to 0 in probability, ${O}_{p}\left(1\right)$ if it is bounded in probability and ${o}_{p}\left({n}^{-\frac{1}{2}}\right)$ if it converges to 0 in probability faster than ${n}^{-\frac{1}{2}}\to 0$ . The infimum ${\mathrm{inf}}_{\theta \in \Omega }\left(‖{G}_{n}\left(\theta \right)‖\right)$ is attained at the vector of MQD estimators, so conditions 1) and 2) are satisfied for both versions. Implicitly, we make the assumption that the parameter space Ω is compact. Also, for both versions $‖{G}_{n}\left(\theta \right)‖\stackrel{p}{\to }0$ only at $\theta ={\theta }_{0}$ in general if the number of components of ${G}_{n}\left(\theta \right)$ is greater than the number of parameters of the model, i.e., $M>m$ .

For $\theta \ne {\theta }_{0}$ we have $0<{Q}_{n}\left(\theta \right)\le B$ for some $B>0$ since survival functions evaluated at points are components of ${G}_{n}\left(\theta \right)$ and these functions are bounded. This implies that there exist real numbers u and v with $0 < u < v < \infty $ such that

$P\left(u\le {\mathrm{sup}}_{‖\theta -{\theta }_{0}‖>\delta }\left(\frac{1}{‖{G}_{n}\left(\theta \right)‖}\right)\le v\right)\to 1$ as $n\to \infty$ .

Therefore, the minimum quadratic distance (MQD) estimators are consistent, i.e., $\stackrel{^}{\theta }\stackrel{p}{\to }{\theta }_{0}$ . Theorem 3.1 of Pakes and Pollard  (p 1038-1039) is an elegant theorem which uses the concept of a norm from functional analysis. We now turn to asymptotic normality of the quadratic distance estimators, for which a unified approach is again possible using their Theorem 3.3, see Pakes and Pollard  (p 1040-1043). We restate their theorem as Theorem 2 and Corollary 1 below, after a discussion of the ideas behind it; the theorem gives asymptotic normality for estimators obtained as the extremum of a smooth or nonsmooth objective function.

Note that ${G}_{n}\left(\theta \right)\stackrel{p}{\to }G\left(\theta \right)$ (16)

with

$G\left(\theta \right)={\left(S\left({s}_{1},{t}_{1}\right)-{C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{1}\right),\stackrel{¯}{G}\left({t}_{1}\right)\right),\cdots ,S\left({s}_{M},{t}_{M}\right)-{C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{M}\right),\stackrel{¯}{G}\left({t}_{M}\right)\right)\right)}^{\prime }$ . (17)

The points ${\left({s}_{1},{t}_{1}\right)}^{\prime },\cdots ,{\left({s}_{M},{t}_{M}\right)}^{\prime }$ are predetermined by a contingency table we give and we have no choice but to analyze the grouped data as they are presented.

Note that $G\left(\theta \right)$ is non-random and if we assume $G\left(\theta \right)$ is differentiable with respect to $\theta $ with derivative matrix $\Gamma \left(\theta \right)$ , then we can define the random function ${Q}_{n}^{a}\left(\theta \right)$ to approximate ${Q}_{n}\left(\theta \right)$ with

${Q}_{n}^{a}\left(\theta \right)={\left(‖{L}_{n}\left(\theta \right)‖\right)}^{2},\text{\hspace{0.17em}}{L}_{n}\left(\theta \right)={G}_{n}\left({\theta }_{0}\right)+\Gamma \left(\theta \right)\left(\theta -{\theta }_{0}\right)$ . (18)

By using $\frac{\partial {C}_{\theta }\left(u,v\right)}{\partial {\theta }_{j}}$ , the partial derivative of ${C}_{\theta }\left(u,v\right)$ with respect to ${\theta }_{j},j=1,\cdots ,m$ , the matrix $\Gamma \left(\theta \right)$ can be displayed explicitly as

$\Gamma \left(\theta \right)=-\left[\begin{array}{ccc}\frac{\partial {C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{1}\right),\stackrel{¯}{G}\left({t}_{1}\right)\right)}{\partial {\theta }_{1}}& \cdots & \frac{\partial {C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{1}\right),\stackrel{¯}{G}\left({t}_{1}\right)\right)}{\partial {\theta }_{m}}\\ ⋮& \ddots & ⋮\\ \frac{\partial {C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{M}\right),\stackrel{¯}{G}\left({t}_{M}\right)\right)}{\partial {\theta }_{1}}& \cdots & \frac{\partial {C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{M}\right),\stackrel{¯}{G}\left({t}_{M}\right)\right)}{\partial {\theta }_{m}}\end{array}\right]$ . (19)
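When the partial derivatives in expression (19) are not available in closed form, they can be approximated numerically. The sketch below uses central finite differences for a one-parameter family ( $m=1$ ), so that $\Gamma \left(\theta \right)$ reduces to a single column. The Clayton copula, the step size `h` and the input values are assumptions chosen for illustration only.

```python
def clayton(u, v, theta):
    """Clayton copula C_theta(u, v), theta > 0 (illustrative family, an assumption)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gamma_column(theta, marginal_values, copula=clayton, h=1e-5):
    """Central-difference approximation of the column of Gamma(theta), m = 1.

    marginal_values holds the pairs (Fbar(s_i), Gbar(t_i)), i = 1..M.
    Each entry is -dC_theta(u, v)/dtheta, matching the minus sign in (19).
    """
    column = []
    for u, v in marginal_values:
        derivative = (copula(u, v, theta + h) - copula(u, v, theta - h)) / (2.0 * h)
        column.append(-derivative)
    return column


# Hypothetical marginal survival values at three points (s_i, t_i).
marginal_values = [(0.75, 0.75), (0.5, 0.5), (0.25, 0.5)]
gamma = gamma_column(2.0, marginal_values)
```

Because the Clayton family is increasing in its parameter, every entry of this column is negative; with several parameters the same difference quotient would be applied to each ${\theta }_{j}$ in turn.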

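Note that, with the derivative matrix evaluated at ${\theta }_{0}$ , i.e., $\Gamma =\Gamma \left({\theta }_{0}\right)$ , ${Q}_{n}^{a}\left(\theta \right)$ is a differentiable quadratic function of $\theta$ , so the vector ${\theta }^{*}$ which minimizes ${Q}_{n}^{a}\left(\theta \right)$ can be obtained explicitly with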

${\theta }^{*}-{\theta }_{0}=-{\left({\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\Gamma \right)}^{-1}{\Gamma }^{\prime }\stackrel{^}{{W}_{0}}{G}_{n}\left({\theta }_{0}\right)$ (20)

and since $\stackrel{^}{{W}_{0}}\stackrel{p}{\to }{W}_{0}$ , where ${W}_{0}$ is assumed to be a positive definite matrix, we have

$\begin{array}{c}\sqrt{n}\left({\theta }^{*}-{\theta }_{0}\right)=-{\left({\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\Gamma \right)}^{-1}{\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)\\ =-{\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}{\Gamma }^{\prime }{W}_{0}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)+{o}_{p}\left(1\right).\end{array}$ (21)
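The explicit minimizer in expression (20) is a weighted least-squares step. A minimal sketch for a one-parameter copula ( $m=1$ ), where ${\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\Gamma $ is a scalar, is given below; the function name and the numerical values are hypothetical, and the identity matrix is used in place of $\stackrel{^}{{W}_{0}}$ .

```python
def linearized_step(gamma, W, g0):
    """theta* - theta_0 = -(Gamma' W Gamma)^{-1} Gamma' W G_n(theta_0) for m = 1.

    gamma : list of M entries, the single column of Gamma
    W     : M x M weight matrix as nested lists
    g0    : list of M entries, G_n(theta_0)
    """
    M = len(gamma)
    gamma_W_gamma = sum(gamma[i] * W[i][j] * gamma[j] for i in range(M) for j in range(M))
    gamma_W_g0 = sum(gamma[i] * W[i][j] * g0[j] for i in range(M) for j in range(M))
    return -gamma_W_g0 / gamma_W_gamma


# Hypothetical inputs: here G_n(theta_0) is exactly proportional to Gamma.
gamma = [-1.0, -2.0]
g0 = [0.1, 0.2]
identity = [[1.0, 0.0], [0.0, 1.0]]

step = linearized_step(gamma, identity, g0)
# Linear approximation L_n(theta*) = G_n(theta_0) + Gamma (theta* - theta_0).
residual = [g0[i] + gamma[i] * step for i in range(len(g0))]
```

In this constructed case the step equals 0.1 and the linear approximation vanishes at ${\theta }^{*}$ ; in general only its weighted norm is minimized.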

Clearly, this set-up fits into the scope of their Theorem 3.3. We shall rearrange the results to make them more suitable for MQD methods and verify that the regularity conditions of Theorem 3.3 can be satisfied. Theorem 2 and Corollary 1 below are essentially their Theorem 3.3, and the proofs have been given by Pakes and Pollard  . Note that condition 4) is slightly more stringent but simpler to check than condition 3) in their theorem.

Theorem 2

Let $\stackrel{^}{\theta }$ be a vector of consistent estimators for ${\theta }_{0}$ , the unique vector which satisfies $G\left({\theta }_{0}\right)=0$ .

Under the following conditions:

1) The parameter space Ω is compact, $\stackrel{^}{\theta }$ is an interior point of Ω.

2) $‖{G}_{n}\left(\stackrel{^}{\theta }\right)‖\le {o}_{p}\left({n}^{-\frac{1}{2}}\right)+{\mathrm{inf}}_{\theta \in \Omega }‖{G}_{n}\left(\theta \right)‖$

3) $G\left(.\right)$ is differentiable at ${\theta }_{0}$ with a derivative matrix $\Gamma =\Gamma \left({\theta }_{0}\right)$ of full rank.

4) ${\mathrm{sup}}_{‖\theta -{\theta }_{0}‖\le {\delta }_{n}}\sqrt{n}‖{G}_{n}\left(\theta \right)-G\left(\theta \right)-{G}_{n}\left({\theta }_{0}\right)‖={o}_{p}\left(1\right)$ for every sequence $\left\{{\delta }_{n}\right\}$ of positive numbers which converge to zero.

5) $‖{G}_{n}\left({\theta }_{0}\right)‖={o}_{p}\left(1\right)$ .

6) ${\theta }_{0}$ is an interior point of Ω.

Then, we have the following representation which will give the asymptotic distribution of $\stackrel{^}{\theta }$ in Corollary 1, i.e.,

$\sqrt{n}\left(\stackrel{^}{\theta }-{\theta }_{0}\right)=-{\left({\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\Gamma \right)}^{-1}{\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)+{o}_{p}\left(1\right)$ , (21)

or equivalently, using equality in distribution,

$\sqrt{n}\left(\stackrel{^}{\theta }-{\theta }_{0}\right){=}^{d}-{\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}\sqrt{n}{\Gamma }^{\prime }{W}_{0}{G}_{n}\left({\theta }_{0}\right)$ (22)

or equivalently,

$\sqrt{n}\left(\stackrel{^}{\theta }-{\theta }_{0}\right){=}^{d}-{\left({\Gamma }^{\prime }\stackrel{^}{{W}_{0}}\Gamma \right)}^{-1}\sqrt{n}{\Gamma }^{\prime }\stackrel{^}{{W}_{0}}{G}_{n}\left({\theta }_{0}\right)$ (23)

The proofs of these results follow the results used to prove Theorem 3.3 given by Pakes and Pollard  (p 1040-1043). For expression (22) or expression (23) to hold, in general only condition 5) of Theorem 2 is needed and there is no need to assume that ${G}_{n}\left({\theta }_{0}\right)$ has an asymptotic distribution. From the results of Theorem 2, it is easy to see that we can obtain the main result of the following Corollary 1 which gives the asymptotic covariance matrix for the quadratic distance estimators for both versions.

Corollary 1

Let ${Y}_{n}=\sqrt{n}{\Gamma }^{\prime }{W}_{0}{G}_{n}\left({\theta }_{0}\right)$ , if ${Y}_{n}\stackrel{L}{\to }N\left(0,V\right)$ then $\sqrt{n}\left(\stackrel{^}{\theta }-{\theta }_{0}\right)\stackrel{L}{\to }N\left(0,T\right)$ with

$T={\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}V{\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}$ , (24)

The matrices $T$ and $V$ depend on ${\theta }_{0}$ ; we also adopt the notations $T=T\left({\theta }_{0}\right),V=V\left({\theta }_{0}\right)$ .

We observe that applying condition 4) of Theorem 2 to MQD methods in general involves technicalities. Note that verifying condition 4) is equivalent to verifying

${\mathrm{sup}}_{‖\theta -{\theta }_{0}‖\le {\delta }_{n}}n{\left(‖{G}_{n}\left(\theta \right)-G\left(\theta \right)-{G}_{n}\left({\theta }_{0}\right)‖\right)}^{2}={o}_{p}\left(1\right)$ , (25)

a regularity condition stating that the approximation is of the right order; it implies condition 3) of their Theorem 3.3, which might be the most difficult one to check. The remaining conditions of Theorem 2 are satisfied in general.

Let

${g}_{n}\left(\theta \right)=n{\left(‖{G}_{n}\left(\theta \right)-G\left(\theta \right)-{G}_{n}\left({\theta }_{0}\right)‖\right)}^{2}$ (26)

and define ${u}_{n}\left(\theta \right)={G}_{n}\left(\theta \right)-G\left(\theta \right)-{G}_{n}\left({\theta }_{0}\right)$ which can be expressed as

${u}_{n}\left(\theta \right)={\left({u}_{n,1}\left(\theta \right),\cdots ,{u}_{n,M}\left(\theta \right)\right)}^{\prime }$ with components, for $i=1,\cdots ,M$ ,

${u}_{n,i}\left(\theta \right)=\left[{C}_{{\theta }_{0}}\left(\stackrel{¯}{{F}_{n}}\left({s}_{i}\right),\stackrel{¯}{{G}_{n}}\left({t}_{i}\right)\right)-S\left({s}_{i},{t}_{i}\right)\right]-\left[{C}_{\theta }\left(\stackrel{¯}{{F}_{n}}\left({s}_{i}\right),\stackrel{¯}{{G}_{n}}\left({t}_{i}\right)\right)-{C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{i}\right),\stackrel{¯}{G}\left({t}_{i}\right)\right)\right]$ . (27)

Consequently, ${g}_{n}\left(\theta \right)$ can also be expressed as

${g}_{n}\left(\theta \right)=n{{u}^{\prime }}_{n}\left(\theta \right)\stackrel{^}{{W}_{0}}{u}_{n}\left(\theta \right)$ .

Since the elements of $\sqrt{n}{{u}^{\prime }}_{n}\left(\theta \right)$ are bounded in probability, it is not difficult to see that the sequence $\left\{{g}_{n}\left(\theta \right)\right\}$ is bounded in probability and continuous in probability with ${g}_{n}\left(\theta \right)\stackrel{p}{\to }{g}_{n}\left({\theta }^{\prime }\right)$ as $\theta \to {\theta }^{\prime }$ . Also note that ${g}_{n}\left({\theta }_{0}\right)=0$ . Therefore, results given in Luong et al.  (p 218) can be used to justify that the sequence of functions $\left\{{g}_{n}\left(\theta \right)\right\}$ attains its maximum on the compact set ${C}_{n}=\left\{\theta :‖\theta -{\theta }_{0}‖\le {\delta }_{n}\right\}$ in probability and hence has the property ${\mathrm{sup}}_{‖\theta -{\theta }_{0}‖\le {\delta }_{n}}{g}_{n}\left(\theta \right)\stackrel{p}{\to }0$ as $n\to \infty$ and $\theta \to {\theta }_{0}$ .

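Since $\sqrt{n}{G}_{n}\left({\theta }_{0}\right)\stackrel{L}{\to }N\left(0,{W}_{0}^{-1}\right)$ , the matrix $V$ of Corollary 1 is $V={\Gamma }^{\prime }{W}_{0}{W}_{0}^{-1}{W}_{0}\Gamma ={\Gamma }^{\prime }{W}_{0}\Gamma $ . Using the results of Corollary 1, we therefore have asymptotic normality for the MQD estimators, which is given by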

$\sqrt{n}\left(\stackrel{^}{\theta }-{\theta }_{0}\right)\stackrel{L}{\to }N\left(0,{\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}\right)$ , (28)

$\Gamma$ as given by expression (19) can be estimated once the parameters are estimated.

3.2. Model Testing

3.2.1. Simple Hypothesis

In this section, the quadratic distance ${Q}_{n}\left(\theta \right)$ will be used to construct goodness of fit test statistics for the simple hypothesis

H0: the data come from a specified distribution ${F}_{{\theta }_{0}}$ , where ${\theta }_{0}$ is specified. The chi-square test statistic, its asymptotic chi-square distribution and its degrees of freedom are given by

$n{Q}_{n}\left({\theta }_{0}\right)\stackrel{L}{\to }{\chi }^{2}\left(r=M\right)$ . (29)

It is not difficult to see that indeed we have the above asymptotic chi-square distribution as $n{Q}_{n}\left({\theta }_{0}\right)=\sqrt{n}{{G}^{\prime }}_{n}\left({\theta }_{0}\right)\stackrel{^}{{W}_{0}}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)$ and

$\sqrt{n}{G}_{n}\left({\theta }_{0}\right)\stackrel{L}{\to }N\left(0,{W}_{0}^{-1}\right)$ , ${W}_{0}^{-1}={\Omega }_{0},\text{\hspace{0.17em}}\stackrel{^}{{W}_{0}}\stackrel{p}{\to }{W}_{0}$ , using standard results for distribution of quadratic forms, see Luong and Thompson  (p 247) for example.

3.2.2. Composite Hypothesis

The quadratic distances ${Q}_{n}\left(\theta \right)$ can also be used to construct test statistics for the composite hypothesis

H0: the data come from a parametric model $\left\{{S}_{\theta }\right\}$ . The chi-square test statistic and its asymptotic distribution are given similarly in this case by

$n{Q}_{n}\left(\stackrel{^}{\theta }\right)\stackrel{L}{\to }{\chi }^{2}\left(r=M-m\right)$ , (30)

with $M>m$ . To justify the asymptotic chi-square distribution given above, note that we have the equality in probability $n{Q}_{n}\left(\stackrel{^}{\theta }\right)=n{Q}_{n}^{a}\left(\stackrel{^}{\theta }\right)+{o}_{p}\left(1\right)$ . It therefore suffices to consider the asymptotic distribution of $n{Q}_{n}^{a}\left(\stackrel{^}{\theta }\right)$ , as we also have the following equalities in distribution,

$n{Q}_{n}\left(\stackrel{^}{\theta }\right){=}^{d}n{Q}_{n}^{a}\left(\stackrel{^}{\theta }\right)=n{‖{L}_{n}\left(\stackrel{^}{\theta }\right)‖}^{2}=\sqrt{n}{{L}^{\prime }}_{n}\left(\stackrel{^}{\theta }\right)\stackrel{^}{{W}_{0}}\sqrt{n}{L}_{n}\left(\stackrel{^}{\theta }\right)$ ,

with ${L}_{n}\left(\theta \right)$ as given by expression (18). Therefore we also have the following equality in distribution, $\sqrt{n}{L}_{n}\left(\stackrel{^}{\theta }\right){=}^{d}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)+\Gamma \sqrt{n}\left(\stackrel{^}{\theta }-{\theta }_{0}\right)$ , which can be reexpressed as

$\sqrt{n}{L}_{n}\left(\stackrel{^}{\theta }\right){=}^{d}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)-\Gamma {\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}{\Gamma }^{\prime }{W}_{0}\sqrt{n}{G}_{n}\left({\theta }_{0}\right)$

or equivalently, $\sqrt{n}{L}_{n}\left(\stackrel{^}{\theta }\right){=}^{d}\left(I-\Gamma {\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}{\Gamma }^{\prime }{W}_{0}\right)\sqrt{n}{G}_{n}\left({\theta }_{0}\right)$ with $\sqrt{n}{G}_{n}\left({\theta }_{0}\right)\stackrel{L}{\to }N\left(0,{W}_{0}^{-1}\right)$ .

We have

$\sqrt{n}{L}_{n}\left(\stackrel{^}{\theta }\right)\stackrel{L}{\to }N\left(0,\Sigma \right)$ ,

$\Sigma =\left(I-\Gamma {\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}{\Gamma }^{\prime }{W}_{0}\right){W}_{0}^{-1}\left(I-{W}_{0}\Gamma {\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}{\Gamma }^{\prime }\right)$ (31)

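and note that $\Sigma {W}_{0}=B$ , where the matrix $B=I-\Gamma {\left({\Gamma }^{\prime }{W}_{0}\Gamma \right)}^{-1}{\Gamma }^{\prime }{W}_{0}$ has trace $trace\left(B\right)=M-m$ ; the rank of the matrix $B$ is also equal to its trace, using the techniques given by Luong and Thompson  (p 248-249). Since $B$ is idempotent with rank $M-m$ , the quadratic form $n{Q}_{n}\left(\stackrel{^}{\theta }\right)$ has an asymptotic chi-square distribution with $M-m$ degrees of freedom, as stated in expression (30).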
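The test decisions based on expressions (29) and (30) can be sketched as follows. The sketch computes the statistic $n{Q}_{n}$ from a given vector ${G}_{n}$ and weight matrix, and a p-value from a closed-form series for the chi-square upper tail with integer degrees of freedom. All names and numbers are hypothetical; in practice the weight matrix should be $\stackrel{^}{{W}_{0}}$ , and the identity matrix below only keeps the example short, so the resulting p-value is not meaningful.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(chi-square with df degrees of freedom > x), integer df >= 1."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = math.exp(-x / 2.0)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2.0 * j + 1.0)
    return total


def mqd_chi2_test(n, g, W, m=0):
    """Return (n * g' W g, p-value) with M - m degrees of freedom.

    Use m = 0 with g = G_n(theta_0) for the simple hypothesis, and
    m = number of estimated parameters with g = G_n(theta_hat) otherwise.
    """
    M = len(g)
    stat = n * sum(g[i] * W[i][j] * g[j] for i in range(M) for j in range(M))
    return stat, chi2_sf(stat, M - m)


# Hypothetical values of G_n(theta_hat) at M = 3 points, one estimated parameter.
g = [0.02, -0.01, 0.015]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
stat, p_value = mqd_chi2_test(200, g, identity, m=1)
```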

4. Estimation and Model Testing Using Complete Data

4.1. Preliminaries

In Section 4.1 and Section 4.2, we shall define a rule for selecting the points $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ when complete data are available. Selecting points is equivalent to defining the cells used to group the data. The points $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ are constructed from Quasi-Monte Carlo (QMC) numbers on the unit square multiplied by two chosen sample quantiles from the two marginal distributions, so the resulting cells are random. The points can be viewed as sample points on the nonnegative quadrant forming an artificial sample. For minimum chi-square methods it appears to be difficult to have a rule to choose cells to group the data, see discussions by Greenwood and Nikulin  (p 194-208). We need a few preliminary notions and tools: we define sample quantiles and view these statistics as functionals of the sample distribution; the notion of influence function is also introduced, and this useful tool will be used to find the asymptotic variance of such functionals.

We shall define the pth sample quantile of a distribution as we shall need two sample quantiles from the marginal distributions together with QMC numbers to construct an approximation of an integral. Our quadratic distance based on selected points can be viewed as an approximation of a continuous version given by an integral as given by expression (33).

From a bivariate distribution we have two marginal distributions $F\left(x\right)$ and $G\left(y\right)$ . The univariate sample pth quantile of the distribution $F\left(x\right)$ , assumed to be continuous, is based on the sample distribution function

${F}_{n}\left(x\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{x}_{i}\le x\right]$ and is defined to be ${\alpha }_{p}^{\left(n\right)}=\mathrm{inf}\left\{x:{F}_{n}\left(x\right)\ge p\right\}$ ; its model counterpart is given by ${\alpha }_{p}=\mathrm{inf}\left\{x:F\left(x\right)\ge p\right\}$ . We also use the notation ${\alpha }_{p}^{\left(n\right)}={F}_{n}^{-1}\left(p\right)$ and ${\alpha }_{p}={F}^{-1}\left(p\right)$ . We define similarly the qth sample quantile for the distribution $G\left(y\right)$ as ${\beta }_{q}^{\left(n\right)}={G}_{n}^{-1}\left(q\right)$ and its model counterpart ${\beta }_{q}={G}^{-1}\left(q\right)$ with $0 < p, q < 1$ .

The sample survival function is defined as

$\stackrel{¯}{{F}_{n}}\left(x\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{x}_{i}>x\right]=1-{F}_{n}\left(x\right)$

The sample quantile functions ${\alpha }_{p}^{\left(n\right)}$ or ${\beta }_{q}^{\left(n\right)}$ can be viewed as statistical functionals of the form $T\left({H}_{n}\right)$ with ${H}_{n}={F}_{n}$ or ${H}_{n}={G}_{n}$ . The influence function of $T\left({H}_{n}\right)$ is a valuable tool to study the asymptotic properties of the statistical functional and will be introduced below. Let H be the true distribution and let ${H}_{n}$ be the usual empirical distribution which estimates H; also let ${\delta }_{x}$ be the degenerate distribution at x, i.e., ${\delta }_{x}\left(u\right)=1$ if $u\ge x$ and ${\delta }_{x}\left(u\right)=0$ otherwise. The influence function of T, viewed as a function of x and denoted $I{C}_{T,H}\left(x\right)$ , is defined as a functional directional derivative at H in the direction of $\left({\delta }_{x}-H\right)$ . Letting ${H}_{\epsilon }=H+\epsilon \left({\delta }_{x}-H\right)$ , $I{C}_{T,H}\left(x\right)$ is defined as

$I{C}_{T,H}\left(x\right)={\mathrm{lim}}_{\epsilon \to 0}\frac{T\left({H}_{\epsilon }\right)-T\left(H\right)}{\epsilon }={{T}^{\prime }}_{H}\left({\delta }_{x}-H\right)$ and ${{T}^{\prime }}_{H}$ is a linear functional.

Alternatively, it is easy to see that $I{C}_{T,H}\left(x\right)={\frac{\partial T\left({H}_{\epsilon }\right)}{\partial \epsilon }|}_{\epsilon =0}$ and this gives a convenient way to compute the influence function. It can be shown that the influence function of the pth sample quantile $T\left({H}_{n}\right)$ is given by

$I{C}_{T,H}\left(x\right)=\frac{p-1}{h\left({H}^{-1}\left(p\right)\right)},x<{H}^{-1}\left(p\right)$ and $I{C}_{T,H}\left(x\right)=\frac{p}{h\left({H}^{-1}\left(p\right)\right)},x>{H}^{-1}\left(p\right)$

with h being the density function of the distribution H which is assumed to be absolutely continuous, see Huber  (p 56), Hogg et al.  (p 593). A statistical functional with a bounded influence function is considered to be robust (B-robust); consequently, the pth sample quantile is a robust statistic.

Furthermore, as $I{C}_{T,H}\left(x\right)$ is based on a linear functional, the asymptotic variance of $T\left({H}_{n}\right)$ is simply $\frac{1}{n}V\left(I{C}_{T,H}\left(x\right)\right)$ with $V\left(.\right)$ being the variance of the expression inside the bracket, since in general we have $E\left(I{C}_{T,H}\left(x\right)\right)=0$ and the following representation holds when $I{C}_{T,H}\left(x\right)$ is bounded as a function of $x$ ,

$T\left({H}_{n}\right)=T\left(H\right)+{{T}^{\prime }}_{H}\left({H}_{n}-H\right)+{o}_{p}\left({n}^{-\frac{1}{2}}\right)$

and ${{T}^{\prime }}_{H}\left({H}_{n}-H\right)=\frac{1}{n}{\sum }_{i=1}^{n}{{T}^{\prime }}_{H}\left({\delta }_{{x}_{i}}-H\right)$ ,

${{T}^{\prime }}_{H}\left({\delta }_{{x}_{i}}-H\right)=I{C}_{T,H}\left({x}_{i}\right)$ , see Hogg et al.  (p 593). Consequently, for a functional with a bounded influence function, a central limit theorem (CLT) applied to this mean gives in general the following convergence in distribution

$\sqrt{n}\left(T\left({H}_{n}\right)-T\left(H\right)\right)\stackrel{L}{\to }N\left(0,{\sigma }_{IC}^{2}\right)$ , ${\sigma }_{IC}^{2}=V\left(I{C}_{T,H}\left(x\right)\right)$ .
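For the pth sample quantile the variance ${\sigma }_{IC}^{2}$ can be worked out directly, because the influence function takes the value $\frac{p-1}{h\left({H}^{-1}\left(p\right)\right)}$ with probability $p$ and $\frac{p}{h\left({H}^{-1}\left(p\right)\right)}$ with probability $1-p$ . The short sketch below carries out this computation; the unit exponential distribution and the function name are assumptions used only as an example.

```python
import math


def quantile_asymptotic_variance(p, density_at_quantile):
    """Variance of the influence function of the p-th quantile.

    The influence function is a two-point random variable: (p - 1)/h with
    probability p (x below the quantile) and p/h with probability 1 - p.
    """
    low = (p - 1.0) / density_at_quantile
    high = p / density_at_quantile
    mean = p * low + (1.0 - p) * high   # equals 0 up to rounding
    return p * (low - mean) ** 2 + (1.0 - p) * (high - mean) ** 2


# Median of a unit exponential distribution (an illustrative choice).
p = 0.5
quantile = -math.log(1.0 - p)   # H^{-1}(p) = log 2
density = math.exp(-quantile)   # h(H^{-1}(p)) = 1/2
sigma2 = quantile_asymptotic_variance(p, density)
```

The result agrees with the familiar closed form $\frac{p\left(1-p\right)}{{h}^{2}\left({H}^{-1}\left(p\right)\right)}$ , which equals 1 in this example.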

The influence function representation of a functional which depends only on one function such as ${H}_{n}$ is the equivalent of a Taylor expansion of a univariate function, and the influence function representation of a functional which depends on many functions is the equivalent of a Taylor expansion of a real-valued multivariate function with domain in a Euclidean space. Since we work with marginal survival functions, we define the pth sample quantiles of the marginal survival functions as

${\stackrel{¯}{F}}_{n}^{-1}\left(p\right)={F}_{n}^{-1}\left(1-p\right),\text{\hspace{0.17em}}{\stackrel{¯}{G}}_{n}^{-1}\left(p\right)={G}_{n}^{-1}\left(1-p\right)$ .

The influence functions for ${\stackrel{¯}{F}}_{n}^{-1}\left(p\right)$ and ${\stackrel{¯}{G}}_{n}^{-1}\left(p\right)$ can be derived using the definitions of influence functions or obtained from the influence functions of ${F}_{n}^{-1}\left(1-p\right)$ and ${G}_{n}^{-1}\left(1-p\right)$ .

Subsequently, we shall introduce the Halton sequences with the bases ${b}_{1}=2$ and ${b}_{2}=3$ and the first M terms are denoted by

$\left({u}_{l},{v}_{l}\right)=\left({\phi }_{{b}_{1}}\left(l\right),{\phi }_{{b}_{2}}\left(l\right)\right),l=1,2,\cdots ,M$ .

We also use ${H}_{M}$ to denote the set of points $\left\{\left({u}_{l},{v}_{l}\right),l=1,2,\cdots ,M\right\}$ . The points of the sequence belong to the unit square $\left(0,1\right)×\left(0,1\right)$ and can be obtained as follows.

For ${b}_{1}=2$ , we divide the interval $\left(0,1\right)$ into halves ( ${b}_{1}=2$ ), then into fourths ( ${b}_{1}^{2}={2}^{2}$ ), and so on, to obtain the sequence $\frac{1}{2},\frac{1}{4},\frac{3}{4},\cdots$ .

For ${b}_{2}=3$ , we divide the interval into thirds ( ${b}_{2}=3$ ), then into ninths ( ${b}_{2}^{2}={3}^{2}$ ), and so on, to obtain the sequence $\frac{1}{3},\frac{2}{3},\frac{1}{9},\cdots$ . Pairing them up, we obtain the Halton sequence $\left(\frac{1}{2},\frac{1}{3}\right),\left(\frac{1}{4},\frac{2}{3}\right),\left(\frac{3}{4},\frac{1}{9}\right),\cdots$ . Matlab and R have packages to generate the sequences; see Glasserman  (p 293-297) for the related pseudo codes, the seminal paper by Halton  , and Glasserman  (p 281-292) for the general principles of QMC methods. The Halton sequence, together with two chosen sample quantiles from the two marginal distributions, allows us to choose the points at which the bivariate empirical survival function is matched with its model counterpart; these points form an artificial sample with values on the nonnegative quadrant. Since the points depend on quantiles, which are robust, the artificial sample can be viewed as free of outliers and the methods which make use of it will be robust.
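The construction of the Halton terms can be sketched in a few lines of Python using the radical inverse (the digits of the index in base $b$ mirrored about the decimal point). This is a minimal illustration with hypothetical function names, not the code of the packages mentioned above; exact fractions are used so that the first terms can be compared with the values listed in the text.

```python
from fractions import Fraction


def radical_inverse(index, base):
    """Radical inverse phi_base(index), returned as an exact fraction."""
    value = Fraction(0)
    denominator = 1
    while index > 0:
        index, digit = divmod(index, base)   # peel off the lowest digit
        denominator *= base
        value += Fraction(digit, denominator)
    return value


def halton_points(M, bases=(2, 3)):
    """First M terms (u_l, v_l), l = 1, ..., M, of the bivariate Halton sequence."""
    return [tuple(radical_inverse(l, b) for b in bases) for l in range(1, M + 1)]


points = halton_points(3)
# points == [(1/2, 1/3), (1/4, 2/3), (3/4, 1/9)] as exact fractions
```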

Note that the numbers of the Halton sequence are deterministic and useful for approximating an integral. Suppose we would like to compute numerically an integral of the form $A={\int }_{0}^{1}{\int }_{0}^{1}\psi \left(x,y\right)\text{d}x\text{d}y$ with $\psi \left(x,y\right)$ being a bivariate function. Using the M terms of the Halton sequence and QMC principles, it can be approximated as

$A\approx \frac{1}{M}{\sum }_{l=1}^{M}\psi \left({u}_{l},{v}_{l}\right)$ . (32)

If we are used to integration by simulation, we might want to think of the M terms as a quasi-random sample of size M from a bivariate uniform distribution, which is useful for approximating A.
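The QMC approximation (32) amounts to averaging $\psi $ over the Halton terms on the unit square. A small sketch follows; the integrand $\psi \left(x,y\right)=xy$ , whose exact integral is $\frac{1}{4}$ , and the choice $M=1000$ are assumptions made for the example.

```python
def radical_inverse(index, base):
    """Radical inverse phi_base(index) in floating point."""
    value = 0.0
    scale = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def qmc_integral(psi, M):
    """Average of psi over the first M Halton terms with bases 2 and 3."""
    total = 0.0
    for l in range(1, M + 1):
        total += psi(radical_inverse(l, 2), radical_inverse(l, 3))
    return total / M


# Illustrative integrand with known integral 1/4 over the unit square.
approx = qmc_integral(lambda x, y: x * y, 1000)
```

The approximation should be close to $\frac{1}{4}$ ; unlike simulation with pseudo-random numbers, repeating the computation gives the same value because the sequence is deterministic.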

The observations ${Z}_{i}={\left({X}_{i},{Y}_{i}\right)}^{\prime },i=1,\cdots ,n$ are iid with common bivariate survival function $S\left(x,y\right)$ . Let the two marginal survival functions be denoted by $\stackrel{¯}{F}\left(x\right)$ and $\stackrel{¯}{G}\left(y\right)$ , which are absolutely continuous by assumption; also define the bivariate empirical distribution function, the analogue of the bivariate empirical survival function, as

${K}_{n}\left(x,y\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{x}_{i}\le x,{y}_{i}\le y\right]$ .

The two empirical marginal survival functions are defined respectively by

${\stackrel{¯}{F}}_{n}\left(x\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{x}_{i}>x\right]$ and ${\stackrel{¯}{G}}_{n}\left(y\right)=\frac{1}{n}{\sum }_{i=1}^{n}I\left[{y}_{i}>y\right]$ .

We might want to think of the method as approximating the following Cramér-von Mises distance, expressed as the integral

${\int }_{0}^{\infty }{\int }_{0}^{\infty }{\left({S}_{n}\left(x,y\right)-{\stackrel{^}{S}}_{\theta }\left(x,y\right)\right)}^{2}\text{d}{K}_{n}\left(x,y\right)$ (33)

which is similar to the univariate Cramér-von Mises (CVM) distance; minimizing the distance with respect to $\theta$ gives the CVM estimator for $\theta$ , see Luong and Blier-Wong  for CVM estimation for example.

In the next section we shall give details on how to form a type of quasi sample or artificial sample of size M using the first M terms of the Halton sequence and two sample quantiles of the marginal distributions F and G or, equivalently, the corresponding quantiles of the empirical survival functions as discussed earlier. This will allow us to define the sequence $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ so that the above integral can be approximated by the following finite sum, an average of M terms,

$\frac{1}{M}{\sum }_{l=1}^{M}{\left({S}_{n}\left({s}_{l},{t}_{l}\right)-{\stackrel{^}{S}}_{\theta }\left({s}_{l},{t}_{l}\right)\right)}^{2}$ . (34)

We can see that expression (34) is an unweighted quadratic distance using the identity matrix $I$ as weight matrix instead of $\stackrel{^}{{W}_{0}}$ . The unweighted quadratic distance still produces consistent estimators, but for large samples they are possibly less efficient than the estimators based on the quadratic distance with $\stackrel{^}{{W}_{0}}$ ; for finite samples the estimators obtained using $I$ might still perform reasonably while being simple to obtain.
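To show how simple the unweighted version is to use, the sketch below minimizes expression (34) over a grid of parameter values. The Clayton copula, the grid, the toy data and the points are all hypothetical, and a grid search is only one of several possible ways to carry out the minimization.

```python
def clayton(u, v, theta):
    """Clayton copula C_theta(u, v), theta > 0 (illustrative family, an assumption)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def unweighted_distance(theta, data, points, copula=clayton):
    """Average squared gap between S_n and the fitted survival function, as in (34)."""
    n = len(data)
    total = 0.0
    for s, t in points:
        S_n = sum(1 for x, y in data if x > s and y > t) / n
        F_bar = sum(1 for x, _ in data if x > s) / n
        G_bar = sum(1 for _, y in data if y > t) / n
        total += (S_n - copula(F_bar, G_bar, theta)) ** 2
    return total / len(points)


def grid_minimizer(data, points, grid):
    """Grid value of theta with the smallest unweighted distance."""
    return min(grid, key=lambda theta: unweighted_distance(theta, data, points))


# Toy data with positive but imperfect dependence (hypothetical values).
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0),
        (5.0, 6.0), (6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]
points = [(2.5, 2.5), (4.5, 4.5), (6.5, 6.5), (2.5, 4.5)]
grid = [0.25 * k for k in range(1, 81)]

theta_hat = grid_minimizer(data, points, grid)
```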

The set of points $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ is proposed for forming optimum quadratic distances when complete data are available. We shall see that the points depend on two quantiles chosen from the two marginal distributions and are consequently random. We might want to think that we end up working with random cells.

As for minimum chi-square methods, if random cells stabilize into fixed cells, the methods in general have the same efficiency as methods based on the stabilized fixed cells, see Pollard  (p 324-326) and Moore and Spruill  for the notion of random cells; quadratic distance methods share the same property. The chosen points are random, but it will be shown that they do stabilize; they can therefore be treated as fixed at the stabilized points, and their randomness does not affect the efficiency of the estimators or the asymptotic distributions of the goodness-of-fit test statistics which make use of them. These properties will be studied in more detail in the next section, along with the introduction of an artificial sample of size M given by the points $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ on the nonnegative quadrant, which gives a guideline on how to choose points if complete data are available.

4.2. Halton Sequences and an Artificial Sample

From the M terms of the Halton sequences, we have $\left({u}_{l},{v}_{l}\right),l=1,\cdots ,M$ .

Let $\eta =\frac{1}{\mathrm{max}\left({u}_{l},l=1,\cdots ,M\right)}$ and $\varrho =\frac{1}{\mathrm{max}\left({v}_{l},l=1,\cdots ,M\right)}$ . We can form the artificial sample with elements given by $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ with ${s}_{l}=\eta {u}_{l}{\stackrel{¯}{F}}_{n}^{-1}\left(p\right),{t}_{l}=\varrho {v}_{l}{\stackrel{¯}{G}}_{n}^{-1}\left(p\right)$ with $0.01\le p\le 0.05$ . Note that empirical quantiles based on distribution functions and on survival functions are related, for example ${\stackrel{¯}{F}}_{n}^{-1}\left(0.01\right)={F}_{n}^{-1}\left(0.99\right)$ and ${\stackrel{¯}{G}}_{n}^{-1}\left(0.01\right)={G}_{n}^{-1}\left(0.99\right)$ .
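The rule can be sketched as follows: Halton terms are rescaled by $\eta $ and $\varrho $ so that the largest coordinate in each direction equals the chosen empirical quantile. The function names, the toy data and the small tolerance used when computing the quantile index are assumptions of this illustration.

```python
import math


def radical_inverse(index, base):
    """Radical inverse phi_base(index) in floating point."""
    value = 0.0
    scale = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def survival_quantile(sample, p):
    """Empirical quantile of the survival function: inf{x : F_n(x) >= 1 - p}."""
    ordered = sorted(sample)
    n = len(ordered)
    # Smallest k with k/n >= 1 - p; the tolerance guards against rounding.
    k = max(1, math.ceil(n * (1.0 - p) - 1e-12))
    return ordered[k - 1]


def artificial_sample(xs, ys, M, p=0.05):
    """Points (s_l, t_l), l = 1..M, built from Halton terms and two quantiles."""
    u = [radical_inverse(l, 2) for l in range(1, M + 1)]
    v = [radical_inverse(l, 3) for l in range(1, M + 1)]
    eta = 1.0 / max(u)
    rho = 1.0 / max(v)
    qx = survival_quantile(xs, p)
    qy = survival_quantile(ys, p)
    return [(eta * ul * qx, rho * vl * qy) for ul, vl in zip(u, v)]


# Toy marginal samples (hypothetical): x = 1, ..., 100 and y = 2, ..., 200.
xs = [float(i) for i in range(1, 101)]
ys = [2.0 * i for i in range(1, 101)]
pts = artificial_sample(xs, ys, M=20, p=0.05)
```

With these toy data the two quantiles are 95 and 190, so all points lie in the rectangle $\left(0,95\right]×\left(0,190\right]$ and the upper bounds are attained.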

We can view $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ as a form of quasi-random sample on the nonnegative quadrant, and these are the points proposed for use when complete data are available. In general, we might want to choose $20\le M\le 35$ if ${M}^{2}\le n$ and, if n is small, we try to ensure $M\le \sqrt{n}$ . Consequently, as $n\to \infty$ , M remains bounded. If $M>35$ , it might be difficult to obtain the matrix ${\stackrel{^}{W}}_{0}$ as ${\stackrel{^}{\Omega }}_{0}$ might be nearly singular. In practice, we tend to replace ${\stackrel{^}{W}}_{0}$ by a near-optimum matrix obtained from ${\stackrel{^}{\Omega }}_{0}$ by regularizing its eigenvalues, since unstable eigenvalues cause ${\stackrel{^}{\Omega }}_{0}$ to be nearly singular so that ${\stackrel{^}{W}}_{0}$ is not available; see Section 5.1 for more discussions on these issues.

Since ${\stackrel{¯}{F}}_{n}^{-1}\left(p\right)\stackrel{p}{\to }{\stackrel{¯}{F}}^{-1}\left(p\right)$ and ${\stackrel{¯}{G}}_{n}^{-1}\left(p\right)\stackrel{p}{\to }{\stackrel{¯}{G}}^{-1}\left(p\right)$ , we have $\left({s}_{l},{t}_{l}\right)\stackrel{p}{\to }\left({s}_{l}^{0},{t}_{l}^{0}\right)$ with ${s}_{l}^{0}=\eta {u}_{l}{\stackrel{¯}{F}}^{-1}\left(p\right)$ and ${t}_{l}^{0}=\varrho {v}_{l}{\stackrel{¯}{G}}^{-1}\left(p\right)$ for $l=1,\cdots ,M$ ; the points $\left({s}_{l}^{0},{t}_{l}^{0}\right),l=1,\cdots ,M$ are non-random or fixed.

It turns out that quadratic distances for both versions constructed with the points $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ are asymptotically equivalent to quadratic distances using the points $\left({s}_{l}^{0},{t}_{l}^{0}\right),l=1,\cdots ,M$ , so that asymptotic theory developed with the points $\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M$ considered as fixed continues to be valid; we shall show that this is indeed the case. Similar conclusions have been established for the minimum chi-square methods with the use of random cells provided that these cells stabilize to fixed cells, see Theorem 2 given by Pollard  (p 324-326). We shall define a few notations to make the arguments easier to follow.

Define $\left\{\left(s,t\right)\right\}=\left\{\left({s}_{l},{t}_{l}\right),l=1,\cdots ,M\right\}$ and similarly let

$\left\{\left({s}^{0},{t}^{0}\right)\right\}=\left\{\left({s}_{l}^{0},{t}_{l}^{0}\right),l=1,\cdots ,M\right\}$ .

We work with the quadratic distance defined using $\left\{\left(s,t\right)\right\}$ , which leads us to consider quadratic forms of the type ${‖{G}_{n}\left(\beta \right)‖}^{2}$ . Now, to emphasize that $\stackrel{^}{{z}_{n}}$ and $\stackrel{^}{{z}_{\theta }}$ depend on $\left\{\left(s,t\right)\right\}$ , we also use respectively the notations $\stackrel{^}{{z}_{n}}=\stackrel{^}{{z}_{n}}\left(\left\{\left(s,t\right)\right\}\right)$ and $\stackrel{^}{{z}_{\theta }}=\stackrel{^}{{z}_{\theta }}\left(\left\{\left(s,t\right)\right\}\right)$ and define

${\stackrel{^}{z}}_{n}^{0}=\stackrel{^}{{z}_{n}}\left(\left\{\left({s}^{0},{t}^{0}\right)\right\}\right),\text{\hspace{0.17em}}{\stackrel{^}{z}}_{\theta }^{0}=\stackrel{^}{{z}_{\theta }}\left(\left\{\left({s}^{0},{t}^{0}\right)\right\}\right)$ .

It suffices to verify that results of Theorem 1, Theorem 2 and its corollary in Section 3 continue to hold.

Observe that we have

$\left(\stackrel{^}{{z}_{n}}-\stackrel{^}{{z}_{\theta }}\right)\stackrel{p}{\to }G\left(\theta \right)$ (35)

and

$\left({\stackrel{^}{z}}_{n}^{0}-{\stackrel{^}{z}}_{\theta }^{0}\right)\stackrel{p}{\to }G\left(\theta \right)$ (36)

$G\left(\theta \right)={\left(S\left({s}_{1}^{0},{t}_{1}^{0}\right)-{C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{1}^{0}\right),\stackrel{¯}{G}\left({t}_{1}^{0}\right)\right),\cdots ,S\left({s}_{M}^{0},{t}_{M}^{0}\right)-{C}_{\theta }\left(\stackrel{¯}{F}\left({s}_{M}^{0}\right),\stackrel{¯}{G}\left({t}_{M}^{0}\right)\right)\right)}^{\prime }$ .

This also means that we have the same limit in probability for $\left(\stackrel{^}{{z}_{n}}-\stackrel{^}{{z}_{\theta }}\right)$ and $\left({\stackrel{^}{z}}_{n}^{0}-{\stackrel{^}{z}}_{\theta }^{0}\right)$ as we have $\left\{\left(s,t\right)\right\}\stackrel{p}{\to }\left\{\left({s}^{0},{t}^{0}\right)\right\}$ and ${S}_{n}\left(x,y\right)\stackrel{p}{\to }S\left(x,y\right)$ .

Clearly, $\stackrel{^}{{W}_{0}}\left(\left\{\left(s,t\right)\right\}\right)\stackrel{p}{\to }{W}_{0}\left({s}^{0},{t}^{0}\right)$ .

It remains to establish $\sqrt{n}\left(\stackrel{^}{{z}_{n}}-\stackrel{^}{{z}_{\theta }}\right)=\sqrt{n}\left({\stackrel{^}{z}}_{n}^{0}-{\stackrel{^}{z}}_{\theta }^{0}\right)+{o}_{p}\left(1\right)$ .

Using results on the influence function representations of functionals, it suffices to show that the vector $\left(\stackrel{^}{{z}_{n}}-\stackrel{^}{{z}_{\theta }}\right)$ has the same influence representation as the vector $\left({\stackrel{^}{z}}_{n}^{0}-{\stackrel{^}{z}}_{\theta }^{0}\right)$ to conclude that all the asymptotic results remain valid even when $\left\{\left(s,t\right)\right\}$ are random.

We shall derive the influence functions for the elements of the vector of functionals $\left(\stackrel{^}{{z}_{n}}-\stackrel{^}{{z}_{\theta }}\right)$ and show that they are the same as those of the corresponding elements of the vector of functionals $\left({\stackrel{^}{z}}_{n}^{0}-{\stackrel{^}{z}}_{\theta }^{0}\right)$ . Let $S\left(x,y\right)$ be the true bivariate survival function; under the parametric model being considered, $S\left(x,y\right)={C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left(x\right),\stackrel{¯}{G}\left(y\right)\right)$ , and we also use the notation $C\left(u,v\right)={C}_{{\theta }_{0}}\left(u,v\right)$ .

Let ${\delta }_{x,y}^{S}\left(u,v\right)$ be the degenerate bivariate survival function at the point $\left(x,y\right)$ , i.e., ${\delta }_{x,y}^{S}\left(u,v\right)=1$ if $u<x$ and $v<y$ , and ${\delta }_{x,y}^{S}\left(u,v\right)=0$ otherwise.

Let the degenerate survival function at x be defined as ${\delta }_{x}^{S}\left(u\right)=1$ if $x>u$ and ${\delta }_{x}^{S}\left(u\right)=0$ otherwise. Similarly, let the degenerate survival function at y be defined as ${\delta }_{y}^{S}\left(v\right)=1$ if $y>v$ and ${\delta }_{y}^{S}\left(v\right)=0$ otherwise. Now we can define the following contaminated bivariate survival and marginal survival functions,

${S}_{\epsilon }\left(u,v\right)=S\left(u,v\right)+\epsilon \left({\delta }_{x,y}^{S}\left(u,v\right)-S\left(u,v\right)\right),\text{\hspace{0.17em}}0\le \epsilon \le 1$

which is a contaminated bivariate survival function and

${\stackrel{¯}{F}}_{{\epsilon }_{1}}\left(u\right)=\stackrel{¯}{F}\left(u\right)+{\epsilon }_{1}\left({\delta }_{x}^{S}\left(u\right)-\stackrel{¯}{F}\left(u\right)\right),\text{\hspace{0.17em}}0\le {\epsilon }_{1}\le 1.$

Similarly for the marginals,

${\stackrel{¯}{G}}_{{\epsilon }_{2}}\left(v\right)=\stackrel{¯}{G}\left(v\right)+{\epsilon }_{2}\left({\delta }_{y}^{S}\left(v\right)-\stackrel{¯}{G}\left(v\right)\right),\text{\hspace{0.17em}}0\le {\epsilon }_{2}\le 1.$

Now, we consider $\left({\stackrel{^}{z}}_{jn}-{\stackrel{^}{z}}_{j{\theta }_{0}}\right)$ the jth element of $\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ , $\left({\stackrel{^}{z}}_{jn}-{\stackrel{^}{z}}_{j{\theta }_{0}}\right)={T}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right),j=1,\cdots ,M$ with each

${T}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right)={S}_{n}\left({s}_{j}\left({\stackrel{¯}{F}}_{n}\right),{t}_{j}\left(\stackrel{¯}{{G}_{n}}\right)\right)-{C}_{{\theta }_{0}}\left({\stackrel{¯}{F}}_{n}\left({s}_{j}\left({\stackrel{¯}{F}}_{n}\right)\right),\stackrel{¯}{{G}_{n}}\left({t}_{j}\left(\stackrel{¯}{{G}_{n}}\right)\right)\right),j=1,\cdots ,M.$
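As a rough illustration of how the vector with elements ${T}_{j}$ might be evaluated from data, the following Python sketch computes the empirical bivariate and marginal survival functions at given points and subtracts the copula term. The function names `mo_copula` and `distance_vector` and the use of NumPy are assumptions made for illustration only; the points $\left({s}_{j},{t}_{j}\right)$ are taken as already constructed, and the MO copula of expression (37) is used merely as an example of ${C}_{\theta }$ .

```python
import numpy as np

def mo_copula(u, v, theta):
    """One-parameter MO copula: u^(1-theta) * v if u >= v, u * v^(1-theta) otherwise."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return np.where(u >= v, u ** (1.0 - theta) * v, u * v ** (1.0 - theta))

def distance_vector(x, y, s, t, copula, theta):
    """M-vector with entries S_n(s_j, t_j) - C_theta(Fbar_n(s_j), Gbar_n(t_j))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s, t = np.asarray(s, float), np.asarray(t, float)
    # Indicator matrices of shape (n, M): does observation i exceed point j?
    above_s = x[:, None] > s[None, :]
    above_t = y[:, None] > t[None, :]
    s_n = np.mean(above_s & above_t, axis=0)   # empirical bivariate survival
    f_bar = np.mean(above_s, axis=0)           # empirical marginal survival of X
    g_bar = np.mean(above_t, axis=0)           # empirical marginal survival of Y
    return s_n - copula(f_bar, g_bar, theta)
```

With perfectly dependent data and $\theta =1$ the MO copula reduces to $\mathrm{min}\left(u,v\right)$ , so the returned vector should vanish, which gives a simple sanity check of the sketch.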

Clearly, ${T}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right)$ depends on ${S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}$ and

$\begin{array}{l}{T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)\\ ={S}_{\epsilon }\left({s}_{j}\left({\stackrel{¯}{F}}_{{\epsilon }_{1}}\right),{t}_{j}\left(\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)\right)-{C}_{{\theta }_{0}}\left({\stackrel{¯}{F}}_{{\epsilon }_{1}}\left({s}_{j}\left({\stackrel{¯}{F}}_{{\epsilon }_{1}}\right)\right),\stackrel{¯}{{G}_{{\epsilon }_{2}}}\left({t}_{j}\left(\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)\right)\right),j=1,\cdots ,M,\end{array}$

but we can still use the influence function representation of ${T}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right)$ , a technique proposed by Reid  (p 80-81); in this case it requires three influence functions, which are given by

${\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial \epsilon }|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}=I\left[x>{s}_{j}^{0},y>{t}_{j}^{0}\right]-S\left({s}_{j}^{0},{t}_{j}^{0}\right)$

which is bounded with respect to $\left(x,y\right)$ ,

$\begin{array}{l}{\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{1}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}\\ =\frac{\partial S\left({s}_{j}^{0},{t}_{j}^{0}\right)}{\partial s}{\frac{\partial {s}_{j}\left(\stackrel{¯}{{F}_{{\epsilon }_{1}}}\right)}{\partial {\epsilon }_{1}}|}_{{\epsilon }_{1}=0}-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{j}^{0}\right),\stackrel{¯}{G}\left({t}_{j}^{0}\right)\right)}{\partial u}\left(\frac{\partial \stackrel{¯}{F}\left({s}_{j}^{0}\right)}{\partial s}{\frac{\partial {s}_{j}\left(\stackrel{¯}{{F}_{{\epsilon }_{1}}}\right)}{\partial {\epsilon }_{1}}|}_{{\epsilon }_{1}=0}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{j}^{0}\right),\stackrel{¯}{G}\left({t}_{j}^{0}\right)\right)}{\partial u}\left({\delta }_{x}^{S}\left({s}_{j}^{0}\right)-\stackrel{¯}{F}\left({s}_{j}^{0}\right)\right)\end{array}$

and the expression is reduced to

${\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{1}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}=-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{j}^{0}\right),\stackrel{¯}{G}\left({t}_{j}^{0}\right)\right)}{\partial u}\left({\delta }_{x}^{S}\left({s}_{j}^{0}\right)-\stackrel{¯}{F}\left({s}_{j}^{0}\right)\right)$

by noting that the first two terms on the RHS of the above expression cancel each other, since we have $S\left(s,t\right)={C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left(s\right),\stackrel{¯}{G}\left(t\right)\right)$ which implies

$\frac{\partial S\left(s,t\right)}{\partial s}=\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left(s\right),\stackrel{¯}{G}\left(t\right)\right)}{\partial u}\frac{\partial \stackrel{¯}{F}\left(s\right)}{\partial s}$ .

Similarly,

${\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{2}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}=-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{j}^{0}\right),\stackrel{¯}{G}\left({t}_{j}^{0}\right)\right)}{\partial v}\left({\delta }_{y}^{S}\left({t}_{j}^{0}\right)-\stackrel{¯}{G}\left({t}_{j}^{0}\right)\right)$

If we compare with the corresponding jth term of $\left({\stackrel{^}{z}}_{n}^{0}-{\stackrel{^}{z}}_{{\theta }_{0}}^{0}\right)$ given by the functional ${G}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right)={S}_{n}\left({s}_{j}^{0},{t}_{j}^{0}\right)-{C}_{{\theta }_{0}}\left({\stackrel{¯}{F}}_{n}\left({s}_{j}^{0}\right),\stackrel{¯}{{G}_{n}}\left({t}_{j}^{0}\right)\right)$ , we can verify that the functional ${G}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right)$ has the same influence functions as the functional ${T}_{j}\left({S}_{n},{\stackrel{¯}{F}}_{n},\stackrel{¯}{{G}_{n}}\right)$ . It is not difficult to see that we have the equalities

${\frac{\partial {G}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial \epsilon }|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}={\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial \epsilon }|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0},$

${\frac{\partial {G}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{1}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}={\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{1}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0},$

${\frac{\partial {G}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{2}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0}={\frac{\partial {T}_{j}\left({S}_{\epsilon },{\stackrel{¯}{F}}_{{\epsilon }_{1}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{2}}|}_{\epsilon ={\epsilon }_{1}={\epsilon }_{2}=0},j=1,\cdots ,M.$

Therefore, all the asymptotic results of Section 3 remain valid, and all these influence functions are bounded, so that inference methods making use of these functionals are robust in general. Furthermore, inference procedures based on quadratic distances can be carried out as if the non-random points $\left\{\left({s}^{0},{t}^{0}\right)\right\}$ were used, since these can be replaced by $\left\{\left(s,t\right)\right\}$ without affecting the asymptotic results already established in Section 3. For more discussion of random cells and influence function techniques for minimum chi-square methods and related quadratic distance methods, see Luong  .

5. Numerical Issues and a Limited Study

5.1. Numerical Issues

In this section we shall consider the numerical problem of not being able to obtain the matrix ${\stackrel{^}{W}}_{0}$ because ${\stackrel{^}{\Omega }}_{0}$ might be nearly singular, so that we need to replace ${\stackrel{^}{W}}_{0}$ by a near optimum matrix ${\stackrel{˜}{W}}_{0}$ obtained from ${\stackrel{^}{\Omega }}_{0}$ . The techniques of regularizing a matrix have been introduced by Carrasco and Florens  (p 809-810) for GMM estimation with a continuum of moment conditions. MQD methods can be viewed as similar to GMM with a finite number of moment conditions, and clearly the techniques can also be applied to MQD methods. We use the spectral decomposition of ${\stackrel{^}{\Omega }}_{0}$ to obtain its eigenvalues and eigenvectors; see Hogg et al.  (p 179) for the spectral decomposition of a symmetric positive definite matrix, which allows us to express

${\stackrel{^}{\Omega }}_{0}={\sum }_{i=1}^{M}{\lambda }_{i}{v}_{i}{{v}^{\prime }}_{i}$

where the ${\lambda }_{i}$ 's are positive eigenvalues of the matrix ${\stackrel{^}{\Omega }}_{0}$ with corresponding eigenvectors given by the ${v}_{i}$ 's. Now, observe that

${\stackrel{^}{W}}_{0}={\stackrel{^}{\Omega }}_{0}^{-1}={\sum }_{i=1}^{M}\frac{1}{{\lambda }_{i}}{v}_{i}{{v}^{\prime }}_{i}$

is not obtainable numerically when some of the eigenvalues are not stable. Regularization of ${\stackrel{^}{\Omega }}_{0}$ leads to the following matrix, which hopefully is obtainable and approximates ${\stackrel{^}{W}}_{0}$ . It consists of perturbing the ${\lambda }_{i}$ 's by a small positive number a and defining the approximate optimum matrix as

${\stackrel{˜}{W}}_{0}={\sum }_{i=1}^{M}\frac{|{\lambda }_{i}|}{{\lambda }_{i}^{2}+a}{v}_{i}{{v}^{\prime }}_{i},a>0$ .

Carrasco and Florens  (p 809-810) have shown, for GMM estimation with a continuum of moment conditions, that the asymptotic theory remains unchanged if $a\to 0$ at a suitable rate as $n\to \infty$ . This condition is difficult to verify in practice. However, we might want to continue to use the asymptotic theory in an approximate sense, i.e., we can replace ${\stackrel{^}{W}}_{0}$ by ${\stackrel{˜}{W}}_{0}$ and regard such a replacement as not modifying the asymptotic theory in practice.

A more rigorous approach to justify the chi-square distribution for goodness-of-fit tests is to proceed in two steps: first use ${\stackrel{˜}{W}}_{0}$ to construct the distance for estimation and let $\stackrel{^}{\theta }$ be the vector which minimizes
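The regularization just described can be sketched in a few lines of Python. This is only a minimal illustration under the assumption that NumPy is available; the function name `regularized_inverse` is hypothetical, and the choice of the perturbation a is left to the user since the text gives no numerical value for it.

```python
import numpy as np

def regularized_inverse(omega, a):
    """Approximate inverse sum_i |lam_i| / (lam_i^2 + a) * v_i v_i' of a symmetric matrix."""
    if a <= 0:
        raise ValueError("the perturbation a must be positive")
    omega = np.asarray(omega, float)
    sym = 0.5 * (omega + omega.T)        # guard against round-off asymmetry
    lam, vecs = np.linalg.eigh(sym)      # eigenvalues and orthonormal eigenvectors
    weights = np.abs(lam) / (lam ** 2 + a)
    # Scale column i of vecs by weights[i], then recombine: V diag(weights) V'.
    return (vecs * weights) @ vecs.T
```

For a well-conditioned matrix and a very small a, the result should be close to the ordinary inverse, while for a singular matrix the weights stay bounded and the result remains finite.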

${\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)}^{\prime }{\stackrel{˜}{W}}_{0}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\theta }\right)$ .

Using Equation (31) we have

$\sqrt{n}{L}_{n}\left(\stackrel{^}{\theta }\right)\stackrel{L}{\to }N\left(0,\Sigma \right)$ , $\stackrel{^}{\Sigma }=\left(I-\stackrel{^}{\Gamma }{\left({\stackrel{^}{\Gamma }}^{\prime }{\stackrel{˜}{W}}_{0}\stackrel{^}{\Gamma }\right)}^{-1}{\stackrel{^}{\Gamma }}^{\prime }{\stackrel{˜}{W}}_{0}\right){\stackrel{^}{\Omega }}_{0}{\left(I-\stackrel{^}{\Gamma }{\left({\stackrel{^}{\Gamma }}^{\prime }{\stackrel{˜}{W}}_{0}\stackrel{^}{\Gamma }\right)}^{-1}{\stackrel{^}{\Gamma }}^{\prime }{\stackrel{˜}{W}}_{0}\right)}^{\prime }$ ,

also see expression (3.4.2) given by Luong and Thompson  (p 248). The matrices $\stackrel{^}{\Sigma }$ and $\stackrel{^}{\Gamma }$ are respectively consistent estimates of $\Sigma$ and $\Gamma$ .

It suffices to find the Moore-Penrose generalized inverse ${\stackrel{^}{\Sigma }}^{-}$ of $\stackrel{^}{\Sigma }$ and construct the test statistic as

$n{\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\stackrel{^}{\theta }}\right)}^{\prime }{\stackrel{^}{\Sigma }}^{-}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\stackrel{^}{\theta }}\right)$ .

The asymptotic distribution of the test statistic will again be chi-square with $M-m$ degrees of freedom, using distribution theory for quadratic forms; see Luong and Thompson  (p 247) for example, and for generalized inverses see Harville  (p 493-514).

Note that if ${\stackrel{^}{W}}_{0}$ can be used for estimation then we can let ${\stackrel{^}{\Sigma }}^{-}={\stackrel{^}{W}}_{0}$ , i.e. there is no need to use two quadratic distances separately.
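The second step can be sketched as follows, assuming the quantities $\stackrel{^}{\Gamma }$ , ${\stackrel{^}{\Omega }}_{0}$ , ${\stackrel{˜}{W}}_{0}$ and the vector $\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{\stackrel{^}{\theta }}\right)$ have already been computed. The function name `two_step_statistic`, its argument names and the tolerance `rcond` are hypothetical choices for illustration; NumPy's `pinv` is used for the Moore-Penrose inverse.

```python
import numpy as np

def two_step_statistic(d, gamma, omega, w_tilde, n, rcond=1e-10):
    """n * d' Sigma^- d, with Sigma = (I - P) Omega (I - P)' and
    P = Gamma (Gamma' W Gamma)^{-1} Gamma' W, where W is the estimation weight matrix."""
    d = np.asarray(d, float)
    gamma = np.asarray(gamma, float).reshape(len(d), -1)   # M x m
    omega = np.asarray(omega, float)
    w_tilde = np.asarray(w_tilde, float)
    proj = gamma @ np.linalg.solve(gamma.T @ w_tilde @ gamma, gamma.T @ w_tilde)
    resid = np.eye(len(d)) - proj
    sigma = resid @ omega @ resid.T
    sigma = 0.5 * (sigma + sigma.T)                        # enforce symmetry numerically
    sigma_pinv = np.linalg.pinv(sigma, rcond=rcond, hermitian=True)
    return float(n * d @ sigma_pinv @ d)
```

When the weight matrix equals the inverse of ${\stackrel{^}{\Omega }}_{0}$ and the vector d satisfies the first-order condition ${\stackrel{^}{\Gamma }}^{\prime }Wd=0$ , this statistic should coincide with $n{d}^{\prime }Wd$ , in line with the remark that no separate quadratic distance is needed in that case.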

5.2. A Limited Simulation Study

For the study, we fix the number of points $M=25$ . The two sample quantiles are the 0.99 quantiles (or the 0.01 quantiles if marginal empirical survival functions are used instead of distribution functions) for estimation without construction of goodness-of-fit tests. The points used are constructed using the procedures given in Section 4.2. We consider the one parameter MO copula model with

${C}_{\theta }\left(u,v\right)={u}^{1-\theta }v$ if $u\ge v$ and ${C}_{\theta }\left(u,v\right)=u{v}^{1-\theta }$ if $u\le v$ . (37)

${C}_{\theta }\left(u,v\right)$ is differentiable with respect to $\theta$ and is singular if $u=v$ and $\theta \in \left(0,1\right)$ , see Dobrowolski and Kumar  (p 2). For this model, the model Spearman rho is ${\rho }_{SP}=\frac{3{\theta }^{2}}{4\theta -{\theta }^{2}}=\frac{3\theta }{4-\theta }$ , see Dobrowolski and Kumar  (p 5).

The sample Spearman rho $\stackrel{^}{{\rho }_{SP}}$ is simply the Pearson correlation coefficient but computed using ranks of the observations from the two empirical marginal distributions, see Conover  (p 314-318).

If complete data are available, equating $\stackrel{^}{{\rho }_{SP}}={\rho }_{SP}$ gives the moment estimator

$\stackrel{˜}{\theta }=\frac{4\stackrel{^}{{\rho }_{SP}}}{3+\stackrel{^}{{\rho }_{SP}}}$ ,

and one might expect the moment estimator to have reasonable efficiency, as we only have one parameter in this model and the estimate is based on ranks.
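The moment estimator can be illustrated with a short Python sketch. The sampler below uses a standard common-shock construction with exponential shocks, chosen here so that the survival copula of the simulated pairs is the MO copula of expression (37); it is an assumption made for illustration and not necessarily the simulation procedure used in the study. All function names are hypothetical.

```python
import numpy as np

def sample_mo(n, theta, rng):
    """Draw n pairs (X, Y) whose survival copula is the MO copula (37), 0 < theta < 1."""
    e1 = rng.exponential(size=n) / (1.0 - theta)   # shock affecting X only, rate 1 - theta
    e2 = rng.exponential(size=n) / (1.0 - theta)   # shock affecting Y only, rate 1 - theta
    e12 = rng.exponential(size=n) / theta          # common shock, rate theta
    return np.minimum(e1, e12), np.minimum(e2, e12)

def spearman_rho(x, y):
    """Pearson correlation of the marginal ranks (no ties assumed within x or within y)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def moment_estimator(x, y):
    """Solve rho = 3 theta / (4 - theta) for theta at the sample Spearman rho."""
    rho = spearman_rho(x, y)
    return 4.0 * rho / (3.0 + rho)
```

Because Spearman's rho depends only on ranks, the estimate does not depend on the marginal distributions used in the sampler; with a large simulated sample it should fall close to the value of $\theta$ used to generate the data.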

The moment estimate can be used to compute ${\stackrel{^}{\Omega }}_{0}^{-1}=\stackrel{^}{{W}_{0}}$ , which is needed for chi-square tests and for estimation using quadratic distances. We use $M=25$ and there is no problem in inverting the matrix ${\stackrel{^}{\Omega }}_{0}$ . Clearly, if the data are already grouped we can use the unweighted quadratic distance to provide a consistent preliminary estimate for ${\theta }_{0}$ . The efficient MQD estimator is denoted by $\stackrel{^}{\theta }$ . In the simulation study, since there are so many marginal survival functions which could be used, we decide to draw observations directly from the copula models. This is not what happens in real life situations, but we want to test the procedures. We do not have the computing resources for a large scale study that tries various marginal survival functions. More work needs to be done, but we want to illustrate the procedures.

We use sample size $n=2000$ and the number of samples used is $N=100$ . For comparison of the MQD estimator $\stackrel{^}{\theta }$ versus the method of moments (MM) estimator $\stackrel{˜}{\theta }$ we use the relative efficiency ratio

$ARE=\frac{MSE\left(\stackrel{^}{\theta }\right)}{MSE\left(\stackrel{˜}{\theta }\right)}$

where the mean square error of an estimator $\stackrel{^}{\pi }$ for ${\pi }_{0}$ is defined as

$MSE\left(\stackrel{^}{\pi }\right)=E{\left(\stackrel{^}{\pi }-{\pi }_{0}\right)}^{2}$ ,

which can be estimated using N samples each of size n.

The unweighted QD estimator is denoted by $\stackrel{^}{{\theta }_{I}}$ as the identity matrix I is used for the unweighted quadratic distance. The corresponding

$ARE=\frac{MSE\left(\stackrel{^}{{\theta }_{I}}\right)}{MSE\left(\stackrel{˜}{\theta }\right)}$

can similarly be used for comparison and it can be estimated using simulated samples.
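Estimating these ratios from simulated samples amounts to averaging squared errors over the replications. A minimal Python sketch, with hypothetical function names and assuming the estimates from each simulated sample have been collected in arrays, is:

```python
import numpy as np

def mse(estimates, true_value):
    """Monte Carlo estimate of E(estimator - true_value)^2 over simulated samples."""
    return float(np.mean((np.asarray(estimates, float) - true_value) ** 2))

def relative_efficiency(est_qd, est_mm, true_value):
    """Ratio MSE(quadratic distance estimator) / MSE(moment estimator)."""
    return mse(est_qd, true_value) / mse(est_mm, true_value)
```

A ratio below one favours the quadratic distance estimator, and a ratio close to one indicates practically equal efficiency.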

The range of parameter being considered is $\theta =0.1,0.2,\cdots ,0.9$ , the results are summarized using the first table of Table 1 where we find that the MM estimator and the two quadratic distance estimators have practically equal efficiency up to 4 or 5 decimal precisions.

To study the size and the power of the chi-square tests, let ${H}_{0}$ : the model is the MO copula model ${C}^{MO}$ with ${C}_{\theta }\left(u,v\right)$ as given by expression (37) and $\theta =\frac{1}{2}$ . With $\theta =\frac{1}{2}$ , ${\rho }_{SP}=\frac{3}{7}$ . Observations are drawn from the model specified by ${H}_{a}$ , which is a contaminated model given by

$\left(1-\lambda \right){C}^{MO}\left(u,v\right)+\lambda {C}^{Gaussian}\left(u,v\right),\text{\hspace{0.17em}}0<\lambda <1.$

${C}^{MO}\left(u,v\right)$ is as defined earlier, ${C}^{Gaussian}\left(u,v\right)$ is the Gaussian copula defined as

${C}_{\rho }\left(u,v\right)={\int }_{-\infty }^{{\Phi }^{-1}\left(u\right)}{\int }_{-\infty }^{{\Phi }^{-1}\left(v\right)}\frac{1}{2\text{π}\sqrt{1-{\rho }^{2}}}\mathrm{exp}\left\{-\frac{1}{2\left(1-{\rho }^{2}\right)}\left({x}^{2}+{y}^{2}-2\rho xy\right)\right\}\text{d}x\text{d}y$

with $\rho =0.5$ .
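Drawing from the contaminated model can be sketched as a two-component mixture: each pair comes from the Gaussian copula with probability $\lambda$ and from the MO copula otherwise. The Python code below is an illustration under stated assumptions, not the simulation code of the study: the function names are hypothetical, the MO draws use a common-shock construction with exponential shocks, and the returned pairs $\left(U,V\right)$ have the mixture copula as their joint distribution function.

```python
import math
import numpy as np

# Standard normal distribution function, vectorized from math.erf.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def sample_gaussian_copula(n, rho, rng):
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return _norm_cdf(z1), _norm_cdf(z2)

def sample_mo_copula(n, theta, rng):
    e1 = rng.exponential(size=n) / (1.0 - theta)   # individual shocks, rate 1 - theta
    e2 = rng.exponential(size=n) / (1.0 - theta)
    e12 = rng.exponential(size=n) / theta          # common shock, rate theta
    # exp(-X) and exp(-Y) are uniform with joint distribution function C_theta.
    return np.exp(-np.minimum(e1, e12)), np.exp(-np.minimum(e2, e12))

def sample_contaminated(n, lam, theta=0.5, rho=0.5, seed=0):
    """Draw n pairs from (1 - lam) * C_MO + lam * C_Gaussian."""
    rng = np.random.default_rng(seed)
    u_mo, v_mo = sample_mo_copula(n, theta, rng)
    u_g, v_g = sample_gaussian_copula(n, rho, rng)
    from_gauss = rng.random(n) < lam               # component label for each pair
    return np.where(from_gauss, u_g, u_mo), np.where(from_gauss, v_g, v_mo)
```

Setting `lam` close to zero gives samples that are nearly from the null model, which is convenient for checking the size of a test before studying its power.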

Procedures to simulate from Gaussian and MO copulas are given in chapter 6 by Ross  (p 97-108). We use $M=25$ and $M=35$ , the sample size $n=3000$ and we use $N=30$ . Dobric and Schmid  (p 1060-1061) in their study have used $n=2500$ and their chi-square tests have around 70 degrees of freedom. With $M=35$ , only occasionally is ${\stackrel{^}{\Omega }}_{0}$ nearly singular; if this happens we discard the sample. We do not have resources for a larger scale study; each run takes around three minutes to complete. Since most of the time we are drawing observations from an alternative model while for testing we must estimate the parameter $\theta$ of the MO model, the algorithm tends to take time to converge. The study is very limited, as the number of simulated samples is small with $N=30$ and only a few copula models are considered, but it seems to point to the potential uses of MQD chi-square tests. The tests, especially with $M=35$ , seem to have power along some directions which can be represented as mixture type models, as shown by the means and standard deviations of the chi-square statistics displayed in the second and third tables of Table 1. More simulation work is needed to assess the power of the MQD tests using various copula models. There are not many statistical procedures for copula models using data that have already been grouped. MQD methods might be useful for this type of situation.

Table 1. Asymptotic relative efficiencies comparisons for MQD estimators versus MM estimator using N = 1000 samples of size n = 1000 for the one parameter MO copula Model.

(a) Power study using M = 25 points, n = 3000 and the alternative hypothesis specified as the contaminated model $\left(1-\lambda \right){C}^{MO}\left(u,v\right)+\lambda {C}^{Gaussian}\left(u,v\right)$ , 0 < λ < 1. Critical point for the test using the 95th percentile of a chi-square distribution, ${\chi }_{0.95}^{2}\left(24\right)=36.41$ .

(b) Power study using M = 35 points, n = 3000 and the alternative hypothesis specified as the contaminated model $\left(1-\lambda \right){C}^{MO}\left(u,v\right)+\lambda {C}^{Gaussian}\left(u,v\right)$ , 0 < λ < 1. Critical point for the test using the 95th percentile of a chi-square distribution, ${\chi }_{0.95}^{2}\left(34\right)=48.60$ .

6. Conclusions

Minimum Quadratic Distance (MQD) methods offer a unified approach for estimation and model testing using grouped data in the form of a contingency table for parametric copula models, without having to assume parametric models for the marginal distributions. Like minimum chi-square methods, they have a unique asymptotic distribution across the composite hypothesis for testing, which makes the implementation relatively simple without requiring extensive simulations for approximating the null asymptotic distribution. It is shown in this paper that if complete data are available, a rule to define points based on QMC numbers can be proposed to alleviate the arbitrariness in the choice of points used to construct quadratic distances. The rule will also make quadratic distances close to Cramer-Von Mises distances. It is well known that in one dimension chi-square tests cannot be consistent against all alternatives, but if the intervals are chosen properly the tests can still have good power against some forms of alternatives considered to be useful for applications.

MQD test statistics with the rule for choosing points might preserve the same properties and, being relatively simple to implement, they can be useful for applied work. More numerical and simulation work is needed to further study the power of the MQD tests.

Acknowledgements

The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and support from the editorial staff of Open Journal of Statistics in processing the paper are all gratefully acknowledged.

Appendices

Technical Appendix 1 (TA1)

In this technical appendix, we shall consider the influence function representation for the vector of functionals $\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ to justify the expression for ${\Omega }_{0}\left(i,j\right)$ given by expression (9) in Section 3.2.

Let $\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)={\left({T}^{\left(1\right)}\left({S}_{n},\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}\right),\cdots ,{T}^{\left(M\right)}\left({S}_{n},\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}\right)\right)}^{\prime }$ ,

consider the l-th element of $\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ , it is given by

${T}^{\left(l\right)}\left({S}_{n},\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}\right)={S}_{n}\left({s}_{l},{t}_{l}\right)-{C}_{{\theta }_{0}}\left({\stackrel{¯}{F}}_{n}\left({s}_{l}\right),{\stackrel{¯}{G}}_{n}\left({t}_{l}\right)\right)$

which is a functional that depends on the three functions ${S}_{n},\stackrel{¯}{{F}_{n}},\stackrel{¯}{{G}_{n}}$ , but we can still apply the techniques given by Reid  (p 80) to obtain an influence representation of the functional. Since it depends on three functions we shall have three corresponding influence functions. Let ${S}_{\epsilon }=S+\epsilon \left({\delta }_{x,y}^{S}-S\right)$ with ${\delta }_{x,y}^{S}\left(u,v\right)=1$ if $u<x$ and $v<y$ , and ${\delta }_{x,y}^{S}\left(u,v\right)=0$ elsewhere; similarly, let ${\stackrel{¯}{F}}_{{\epsilon }_{1}}=\stackrel{¯}{F}+{\epsilon }_{1}\left({\delta }_{x}^{S}-\stackrel{¯}{F}\right)$ with ${\delta }_{x}^{S}\left(u\right)=1$ if $u<x$ and ${\delta }_{x}^{S}\left(u\right)=0$ elsewhere, and let ${\stackrel{¯}{G}}_{{\epsilon }_{2}}=\stackrel{¯}{G}+{\epsilon }_{2}\left({\delta }_{y}^{S}-\stackrel{¯}{G}\right)$ with ${\delta }_{y}^{S}\left(v\right)=1$ if $v<y$ and ${\delta }_{y}^{S}\left(v\right)=0$ elsewhere, with $0\le \epsilon ,{\epsilon }_{1},{\epsilon }_{2}\le 1$ . Consequently,

${T}^{\left(l\right)}\left({S}_{\epsilon },\stackrel{¯}{{F}_{{\epsilon }_{1}}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)={S}_{\epsilon }\left({s}_{l},{t}_{l}\right)-{C}_{{\theta }_{0}}\left({\stackrel{¯}{F}}_{{\epsilon }_{1}}\left({s}_{l}\right),{\stackrel{¯}{G}}_{{\epsilon }_{2}}\left({t}_{l}\right)\right)$ and ${T}^{\left(l\right)}\left(S,\stackrel{¯}{F},\stackrel{¯}{G}\right)=0$ .

The three influence functions are given respectively by

${\frac{\partial {T}^{\left(l\right)}\left({S}_{\epsilon },\stackrel{¯}{{F}_{{\epsilon }_{1}}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial \epsilon }|}_{\epsilon =0,{\epsilon }_{1}=0,{\epsilon }_{2}=0}=I{C}_{1}^{\left(l\right)}\left(x,y\right)=I\left[x>{s}_{l},y>{t}_{l}\right]-S\left({s}_{l},{t}_{l}\right),$

$\begin{array}{l}{\frac{\partial {T}^{\left(l\right)}\left({S}_{\epsilon },\stackrel{¯}{{F}_{{\epsilon }_{1}}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{1}}|}_{\epsilon =0,{\epsilon }_{1}=0,{\epsilon }_{2}=0}\\ =I{C}_{2}^{\left(l\right)}\left(x\right)=-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{l}\right),\stackrel{¯}{G}\left({t}_{l}\right)\right)}{\partial u}\left(I\left[x>{s}_{l}\right]-\stackrel{¯}{F}\left({s}_{l}\right)\right),\end{array}$

$\begin{array}{l}{\frac{\partial {T}^{\left(l\right)}\left({S}_{\epsilon },\stackrel{¯}{{F}_{{\epsilon }_{1}}},\stackrel{¯}{{G}_{{\epsilon }_{2}}}\right)}{\partial {\epsilon }_{2}}|}_{\epsilon =0,{\epsilon }_{1}=0,{\epsilon }_{2}=0}\\ =I{C}_{3}^{\left(l\right)}\left(y\right)=-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{l}\right),\stackrel{¯}{G}\left({t}_{l}\right)\right)}{\partial v}\left(I\left[y>{t}_{l}\right]-\stackrel{¯}{G}\left({t}_{l}\right)\right).\end{array}$

Consequently, we have the influence representation for the l-th element of $\sqrt{n}\left({\stackrel{^}{z}}_{n}-{\stackrel{^}{z}}_{{\theta }_{0}}\right)$ with

$\begin{array}{c}\sqrt{n}\left({\stackrel{^}{z}}_{n,l}-{\stackrel{^}{z}}_{{\theta }_{0,l}}\right)=\frac{1}{\sqrt{n}}{\sum }_{i=1}^{n}I{C}_{1}^{\left(l\right)}\left({x}_{i},{y}_{i}\right)+\frac{1}{\sqrt{n}}{\sum }_{i=1}^{n}I{C}_{2}^{\left(l\right)}\left({x}_{i}\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\frac{1}{\sqrt{n}}{\sum }_{i=1}^{n}I{C}_{3}^{\left(l\right)}\left({y}_{i}\right)+{o}_{p}\left(1\right)\end{array}$

and since ${\left({x}_{i},{y}_{i}\right)}^{\prime }$ are iid we have the equality in distribution asymptotically,

$\sqrt{n}\left({\stackrel{^}{z}}_{n,l}-{\stackrel{^}{z}}_{{\theta }_{0,l}}\right){=}^{d}I{C}_{1}^{\left(l\right)}\left(x,y\right)+I{C}_{2}^{\left(l\right)}\left(x\right)+I{C}_{3}^{\left(l\right)}\left(y\right),\text{\hspace{0.17em}}l=1,\cdots ,M.$

Equivalently, using vector notations we have the following equality in distribution asymptotically by letting

${{a}^{\prime }}_{l}=\left(1,-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{l}\right),\stackrel{¯}{G}\left({t}_{l}\right)\right)}{\partial u},-\frac{\partial {C}_{{\theta }_{0}}\left(\stackrel{¯}{F}\left({s}_{l}\right),\stackrel{¯}{G}\left({t}_{l}\right)\right)}{\partial v}\right)$ ,

${Y}_{l}=\left[\begin{array}{c}I\left[x>{s}_{l},y>{t}_{l}\right]-S\left({s}_{l},{t}_{l}\right)\\ I\left[x>{s}_{l}\right]-\stackrel{¯}{F}\left({s}_{l}\right)\\ I\left[y>{t}_{l}\right]-\stackrel{¯}{G}\left({t}_{l}\right)\end{array}\right]$ .

$\sqrt{n}\left({\stackrel{^}{z}}_{n,l}-{\stackrel{^}{z}}_{{\theta }_{0,l}}\right){=}^{d}{{a}^{\prime }}_{l}{Y}_{l}$ , a result which is needed in Section 3.2.

Technical Appendix 2 (TA2)

In this technical appendix, we shall justify the validity of expression (9) of Section 3.2.

The covariance matrix $Cov\left({Y}_{i},{Y}_{j}\right)$ is defined as $E\left({Y}_{i}{{Y}^{\prime }}_{j}\right)$ , the vector

${Y}_{i}=\left(\begin{array}{c}I\left[x>{s}_{i},y>{t}_{i}\right]-S\left({s}_{i},{t}_{i}\right)\\ I\left[x>{s}_{i}\right]-\stackrel{¯}{F}\left({s}_{i}\right)\\ I\left[y>{t}_{i}\right]-\stackrel{¯}{G}\left({t}_{i}\right)\end{array}\right)$ and

${{Y}^{\prime }}_{j}=\left(I\left[x>{s}_{j},y>{t}_{j}\right]-S\left({s}_{j},{t}_{j}\right),I\left[x>{s}_{j}\right]-\stackrel{¯}{F}\left({s}_{j}\right),I\left[y>{t}_{j}\right]-\stackrel{¯}{G}\left({t}_{j}\right)\right)$ .

Therefore the elements of the matrix $Cov\left({Y}_{i},{Y}_{j}\right)$ are given by

${c}_{11}=E\left(\left(I\left[x>{s}_{i},y>{t}_{i}\right]-S\left({s}_{i},{t}_{i}\right)\right)\left(I\left[x>{s}_{j},y>{t}_{j}\right]-S\left({s}_{j},{t}_{j}\right)\right)\right)$

${c}_{12}=E\left(\left(I\left[x>{s}_{i},y>{t}_{i}\right]-S\left({s}_{i},{t}_{i}\right)\right)\left(I\left[x>{s}_{j}\right]-\stackrel{¯}{F}\left({s}_{j}\right)\right)\right)$

${c}_{13}=E\left(\left(I\left[x>{s}_{i},y>{t}_{i}\right]-S\left({s}_{i},{t}_{i}\right)\right)\left(I\left[y>{t}_{j}\right]-\stackrel{¯}{G}\left({t}_{j}\right)\right)\right)$

${c}_{21}=E\left(\left(I\left[x>{s}_{i}\right]-\stackrel{¯}{F}\left({s}_{i}\right)\right)\left(I\left[x>{s}_{j},y>{t}_{j}\right]-S\left({s}_{j},{t}_{j}\right)\right)\right)$

${c}_{22}=E\left(\left(I\left[x>{s}_{i}\right]-\stackrel{¯}{F}\left({s}_{i}\right)\right)\left(I\left[x>{s}_{j}\right]-\stackrel{¯}{F}\left({s}_{j}\right)\right)\right)$

${c}_{23}=E\left(\left(I\left[x>{s}_{i}\right]-\stackrel{¯}{F}\left({s}_{i}\right)\right)\left(I\left[y>{t}_{j}\right]-\stackrel{¯}{G}\left({t}_{j}\right)\right)\right)$

${c}_{31}=E\left(\left(I\left[y>{t}_{i}\right]-\stackrel{¯}{G}\left({t}_{i}\right)\right)\left(I\left[x>{s}_{j},y>{t}_{j}\right]-S\left({s}_{j},{t}_{j}\right)\right)\right)$

${c}_{32}=E\left(\left(I\left[y>{t}_{i}\right]-\stackrel{¯}{G}\left({t}_{i}\right)\right)\left(I\left[x>{s}_{j}\right]-\stackrel{¯}{F}\left({s}_{j}\right)\right)\right)$

${c}_{33}=E\left(\left(I\left[y>{t}_{i}\right]-\stackrel{¯}{G}\left({t}_{i}\right)\right)\left(I\left[y>{t}_{j}\right]-\stackrel{¯}{G}\left({t}_{j}\right)\right)\right)$
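The nine expectations above can also be approximated by simulation, which gives a quick sanity check on any closed form derived from them. The following is a minimal sketch only, not part of the original derivation: the function names `centered_indicators` and `cov_block`, the toy model (two independent unit exponentials) and the points $\left({s}_{i},{t}_{i}\right)=\left(0.5,1.0\right)$ , $\left({s}_{j},{t}_{j}\right)=\left(1.0,0.3\right)$ are all assumptions chosen for illustration.

```python
import numpy as np

def centered_indicators(x, y, s, t, S_st, F_s, G_t):
    """Return an (n, 3) array whose rows are realisations of
    Y = (I[x>s, y>t] - S(s,t), I[x>s] - Fbar(s), I[y>t] - Gbar(t))."""
    ix = (x > s).astype(float)
    iy = (y > t).astype(float)
    return np.column_stack([ix * iy - S_st, ix - F_s, iy - G_t])

def cov_block(x, y, pt_i, pt_j, surv, fbar, gbar):
    """Monte Carlo estimate of the 3x3 matrix E(Y_i Y_j'),
    whose (k, l) entry approximates c_kl."""
    (si, ti), (sj, tj) = pt_i, pt_j
    Yi = centered_indicators(x, y, si, ti, surv(si, ti), fbar(si), gbar(ti))
    Yj = centered_indicators(x, y, sj, tj, surv(sj, tj), fbar(sj), gbar(tj))
    return Yi.T @ Yj / len(x)

# Hypothetical toy model (an assumption for illustration only):
# X and Y independent unit exponentials, so S(s, t) = exp(-s - t).
rng = np.random.default_rng(0)
n = 200_000
x = rng.exponential(size=n)
y = rng.exponential(size=n)

fbar = lambda s: np.exp(-s)          # marginal survival function of X
gbar = lambda t: np.exp(-t)          # marginal survival function of Y
surv = lambda s, t: np.exp(-s - t)   # joint survival function

C = cov_block(x, y, (0.5, 1.0), (1.0, 0.3), surv, fbar, gbar)
print(np.round(C, 4))
```

In this toy model the indicators $I\left[x>{s}_{i}\right]$ and $I\left[y>{t}_{j}\right]$ are independent, so the estimate of ${c}_{23}$ (the entry `C[1, 2]`) should be close to zero up to simulation error; for a dependent copula model the same code applies once `surv` is replaced by the model survival function and the sample is drawn from that model.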

Evaluating each of these expectations in terms of $S$ , $\stackrel{¯}{F}$ and $\stackrel{¯}{G}$ gives the elements of the matrix $Cov\left({Y}_{i},{Y}_{j}\right)$ in the form stated in expression (9) of Section 3.2.
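As a worked sketch of this step, the following uses only the elementary identity for indicators of two events $A$ and $B$ . The shorthand $a\vee b=\mathrm{max}\left(a,b\right)$ is introduced here for convenience and is an assumption of this sketch, so the resulting forms should be compared term by term with expression (9) rather than taken as a restatement of it.

$E\left(\left(I\left[A\right]-P\left(A\right)\right)\left(I\left[B\right]-P\left(B\right)\right)\right)=P\left(A\cap B\right)-P\left(A\right)P\left(B\right)$

Since $\left\{x>{s}_{i}\right\}\cap \left\{x>{s}_{j}\right\}=\left\{x>{s}_{i}\vee {s}_{j}\right\}$ , and similarly for $y$ , applying the identity entry by entry gives

${c}_{11}=S\left({s}_{i}\vee {s}_{j},{t}_{i}\vee {t}_{j}\right)-S\left({s}_{i},{t}_{i}\right)S\left({s}_{j},{t}_{j}\right)$

${c}_{12}=S\left({s}_{i}\vee {s}_{j},{t}_{i}\right)-S\left({s}_{i},{t}_{i}\right)\stackrel{¯}{F}\left({s}_{j}\right)$

${c}_{13}=S\left({s}_{i},{t}_{i}\vee {t}_{j}\right)-S\left({s}_{i},{t}_{i}\right)\stackrel{¯}{G}\left({t}_{j}\right)$

${c}_{21}=S\left({s}_{i}\vee {s}_{j},{t}_{j}\right)-\stackrel{¯}{F}\left({s}_{i}\right)S\left({s}_{j},{t}_{j}\right)$

${c}_{22}=\stackrel{¯}{F}\left({s}_{i}\vee {s}_{j}\right)-\stackrel{¯}{F}\left({s}_{i}\right)\stackrel{¯}{F}\left({s}_{j}\right)$

${c}_{23}=S\left({s}_{i},{t}_{j}\right)-\stackrel{¯}{F}\left({s}_{i}\right)\stackrel{¯}{G}\left({t}_{j}\right)$

${c}_{31}=S\left({s}_{j},{t}_{i}\vee {t}_{j}\right)-\stackrel{¯}{G}\left({t}_{i}\right)S\left({s}_{j},{t}_{j}\right)$

${c}_{32}=S\left({s}_{j},{t}_{i}\right)-\stackrel{¯}{G}\left({t}_{i}\right)\stackrel{¯}{F}\left({s}_{j}\right)$

${c}_{33}=\stackrel{¯}{G}\left({t}_{i}\vee {t}_{j}\right)-\stackrel{¯}{G}\left({t}_{i}\right)\stackrel{¯}{G}\left({t}_{j}\right)$

For example, ${c}_{23}$ involves the events $\left\{x>{s}_{i}\right\}$ and $\left\{y>{t}_{j}\right\}$ , whose intersection has probability $S\left({s}_{i},{t}_{j}\right)$ , so ${c}_{23}$ vanishes exactly when $X$ and $Y$ are independent at that pair of thresholds.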

Conflicts of Interest

The author declares no conflicts of interest.
