Cross Validation Based Model Averaging for Varying-Coefficient Models with Response Missing at Random

Abstract

In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, the proposed method is proved to be asymptotically optimal in the sense of achieving the lowest possible squared error.

1. Introduction

Over the past two decades, research on model selection and model averaging has grown rapidly. Model selection chooses a single final model before estimation; because of model uncertainty, however, selecting a wrong model introduces bias. Model averaging instead estimates a large number of candidate models and assigns larger weights to the sub-models with better predictive performance; that is, it combines models after estimation and does not require the candidate set to contain the true model, which effectively overcomes the shortcomings of model selection. Model averaging is therefore increasingly used in a variety of fields.

Most existing model averaging methods are frequentist and were developed mainly for parametric models. For example, Hansen (2007) [1], Zhang et al. (2008) [2], and Wan et al. (2010) [3] proposed the Mallows-type model averaging (MMA) method. To further improve this approach, Zhang et al. (2016) [4] proposed a weight selection criterion based on the Kullback-Leibler loss with a penalty term for generalized linear models and generalized linear mixed-effects models. Using nonparametric kernel smoothing to estimate the nonparametric functions, Li et al. (2018) [5] and Zhang and Wang (2019) [6] extended the Mallows-type method to semiparametric models. Recently, Xia (2021) [7] extended the Mallows-type method to varying-coefficient models, using B-splines to estimate the nonparametric functions. However, the works mentioned above focus on the case where the data are completely observed. In statistical applications such as economics, market research, and medical research, missing data are a common phenomenon. Depending on the missing mechanism, missing data can be divided into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Little attention has been paid to model averaging with missing data, partly because the complexity of missing data makes it very challenging to extend existing complete-data methods. The study of model averaging with missing data is still a relatively new topic and needs further development.

In this paper, we focus on model averaging for varying-coefficient models with response missing at random (MAR), a setting that has been widely studied in the missing-data literature. To the best of our knowledge, there is little work on model averaging with missing responses. For example, Sun et al. (2014) [8] developed a model average estimation approach for linear regression models with response missing at random under a local misspecification framework, called the smoothed focused vector information criterion (SFVIC). Zeng et al. (2018) [9] extended the SFVIC to varying-coefficient partially linear models with response missing at random, employing profile least-squares estimation and the inverse probability weighted method to estimate the regression coefficients under the local misspecification framework. These studies are limited to the local misspecification framework, and work outside that framework is relatively scarce. For example, based on a weighted delete-one cross-validation (WDCV) criterion, Xie et al. (2021) [10] proposed a two-step model averaging procedure for high-dimensional linear regression models with missing responses without the local misspecification framework and, under certain conditions, proved that the WDCV criterion asymptotically achieves the lowest possible prediction loss. However, their criterion requires that the covariates used in different candidate models do not overlap. Based on the Mallows-type model averaging method, Wei et al. (2021) [11] established the HRCp criterion for linear models with missing responses, which remedies the shortcomings of MMA in estimating the error covariance matrix. Wei and Wang (2021) [12] developed a model averaging procedure for linear models with response missing at random by establishing a cross-validation-based weight choice criterion and proved its asymptotic optimality. These three works, however, are all limited to parametric linear models.

Inspired by Xia (2021) [7], Wei and Wang (2021) [12], and Wei et al. (2021) [11], in this paper we extend the jackknife model averaging (JMA) method (e.g. Hansen and Racine (2012) [13] and Zhang et al. (2013) [14]) to varying-coefficient models with response missing at random and heteroscedastic errors. The varying-coefficient model is a common nonparametric model: it imposes no fixed regression-function form, is flexible, can accommodate complex data, and is widely used in many fields. Following Xia (2021) [7], we use B-splines to estimate the nonparametric varying-coefficient functions and use the inverse probability weighted (IPW) method to deal with the missing data. To avoid the "curse of dimensionality", we assume a parametric model for the propensity score function. Compared with previous work, our method has the following advantages. First, we estimate the varying-coefficient functions with B-splines, which incurs less computational burden than local kernel methods. Second, we choose the weight vector with the JMA method; unlike MMA, it does not require estimating the error covariance matrix and is better suited to heteroscedastic models, which makes it easier to handle real data, and in the theoretical analysis we need not consider the relationship between the estimated and true error covariance matrices, which simplifies the proofs. Finally, our method can handle three configurations of candidate models (the nested, marginal, and full cases), not just nested candidate models. Under certain conditions, our proposal is asymptotically optimal in the sense of achieving the lowest possible squared error. The finite-sample performance of the proposal is studied by numerical simulations and compared with related methods, and the simulation results show that it performs well.

The rest of this paper is organized as follows. Section 2 and Section 3 describe the model averaging estimator and the weight selection method, respectively. The conditions and the theoretical result on asymptotic optimality are given in Section 4. The proofs of the main results are given in Section 5, and Section 6 concludes.

2. Model Framework and Model Averaging Estimation Method

We consider the following varying-coefficient model:

$$\begin{cases} Y_i = \mu_i + \varepsilon_i, \\ \mu_i = \sum_{j=1}^{p} X_{ij}\beta_j(U_i), \end{cases} \quad i = 1,2,\ldots,n, \tag{1}$$

where $Y_i$ is the response variable, the index variable $U_i$ is a scalar, and $\varepsilon_i$ is a random error with $E(\varepsilon_i \mid X_i, U_i) = 0$ and $\mathrm{Var}(\varepsilon_i \mid X_i, U_i) = \sigma_i^2$. We assume $\mu_i = E(Y_i \mid X_i, U_i) = \sum_{j=1}^{p} X_{ij}\beta_j(U_i) = X_i^{\mathrm T}\beta(U_i)$, where $X_i = (X_{i1},\ldots,X_{ip})^{\mathrm T}$ is a $p$-dimensional covariate vector and $\beta(U_i) = (\beta_1(U_i),\ldots,\beta_p(U_i))^{\mathrm T}$ is the vector of unknown coefficient functions.
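To make the data generating process concrete, the following minimal Python sketch simulates from model (1); the particular coefficient functions, covariate distribution, and heteroscedastic error scale are illustrative choices of ours, not specifications taken from the paper.

```python
# Illustrative simulation from model (1): Y_i = sum_j X_ij * beta_j(U_i) + eps_i.
# The functions beta_1, beta_2 and the error scale are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2
U = rng.uniform(0.0, 1.0, size=n)                    # scalar index variable on [0, 1]
X = rng.normal(size=(n, p))                          # covariates X_i = (X_i1, ..., X_ip)

beta = [lambda u: np.sin(2 * np.pi * u),             # beta_1(u)
        lambda u: 1.0 + u ** 2]                      # beta_2(u)

mu = sum(X[:, j] * beta[j](U) for j in range(p))     # mu_i = E(Y_i | X_i, U_i)
eps = rng.normal(scale=0.5 * (1.0 + U), size=n)      # heteroscedastic error with E(eps | X, U) = 0
Y = mu + eps
```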

Suppose that $X_i$ and $U_i$ are fully observed while $Y_i$ may be missing; let $\delta_i = 1$ if $Y_i$ is observed and $\delta_i = 0$ otherwise. In this paper, we assume that $Y_i$ is missing at random (MAR), that is,

$$P(\delta_i = 1 \mid Y_i, X_i, U_i) = P(\delta_i = 1 \mid X_i, U_i) := \pi(X_i, U_i). \tag{2}$$

Let $Z_{\pi,i} = \{\pi(X_i,U_i)\}^{-1}\delta_i Y_i$, $i = 1,2,\ldots,n$. Using model (1) and the MAR assumption, we obtain

$$\begin{cases} Z_{\pi,i} = \mu_i + \varepsilon_{\pi,i}, \\ \mu_i = E(Z_{\pi,i} \mid X_i, U_i), \end{cases} \quad i = 1,2,\ldots,n, \tag{3}$$

where $E(\varepsilon_{\pi,i} \mid X_i, U_i) = 0$, $\mathrm{Var}(\varepsilon_{\pi,i} \mid X_i, U_i) = \sigma_{\pi,i}^2$, and $\sigma_{\pi,i}^2 = \big[\{\pi(X_i,U_i)\}^{-1} - 1\big]\mu_i^2 + \{\pi(X_i,U_i)\}^{-1}\sigma_i^2$.
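The sketch below illustrates the MAR mechanism (2) and the IPW-transformed response $Z_{\pi,i}$ in (3). It assumes, purely for illustration, that the true selection probability is logistic in $(X_{i1}, U_i)$ and known; the paper only requires a correctly specified parametric form for $\pi$, and in practice $\pi$ is replaced by its estimate $\hat\pi$ as described in Section 3.

```python
# Sketch of the MAR mechanism (2) and the IPW transform (3).
# The logistic form of pi(X, U) below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
n = 200
U = rng.uniform(size=n)
X = rng.normal(size=(n, 2))
Y = X[:, 0] * np.sin(2 * np.pi * U) + X[:, 1] * (1.0 + U ** 2) + rng.normal(size=n)

def pi_fun(X, U):
    # selection probability depends only on the observed (X, U), never on Y (MAR)
    return 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0] + 0.5 * U)))

pi = pi_fun(X, U)
delta = rng.binomial(1, pi)          # delta_i = 1 if Y_i is observed
Z_pi = delta * Y / pi                # Z_{pi,i} = delta_i Y_i / pi(X_i, U_i)
# E(Z_{pi,i} | X_i, U_i) = mu_i, so the fully observed Z_pi plays the role of Y.
```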

We consider a set of approximating models. Specifically, we use $M_n$ candidate models $\{\mathcal{M}_1,\ldots,\mathcal{M}_{M_n}\}$ to approximate the true data generating process of $Y$. The $m$th candidate model $\mathcal{M}_m$ contains $k_m$ covariates, that is, $k_m = |\mathcal{M}_m|$ is the cardinality of $\mathcal{M}_m$. The total number of all possible candidate models is $M_n = 2^p$, which includes the empty model with no covariates. From model (3), under the $m$th candidate model we have

$$Z_{\pi,i} = \sum_{j=1}^{k_m} X_{ij}^{(m)}\beta_j^{(m)}(U_i) + \varepsilon_{\pi,i}. \tag{4}$$

The purpose of this paper is to construct an asymptotically optimal model average estimator of the conditional mean $\mu = (\mu_1,\mu_2,\ldots,\mu_n)^{\mathrm T}$ with missing responses under the MAR assumption.
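As mentioned in the introduction, the candidate set can be organized in several ways (nested, marginal, or all $2^p$ subsets). A small sketch of how such candidate sets might be enumerated is given below; the grouping itself is the practitioner's choice and is not prescribed by the method.

```python
# Enumerating candidate models as index sets of covariates (p = 3 for illustration).
from itertools import chain, combinations

p = 3
full = [set(s) for s in chain.from_iterable(
    combinations(range(p), k) for k in range(p + 1))]   # all 2^p subsets, including the empty model
nested = [set(range(k)) for k in range(1, p + 1)]       # M_1 within M_2 within ... within M_p
marginal = [{j} for j in range(p)]                      # one covariate per candidate model

print(len(full), nested, marginal)                      # 8 candidate models in the full case
```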

Note that $\{Z_{\pi,i}, X_i, U_i\}$ is fully observed. If the selection probability function $\pi(X_i,U_i)$ is known, the estimator of $\beta_j^{(m)}(u)$ can be obtained by approximating the coefficient functions with B-splines based on $\{Z_{\pi,i}, X_i, U_i\}$. Specifically, we first approximate each $\beta_j^{(m)}(\cdot)$ by a function in a polynomial spline space. Without loss of generality, we assume that $U$ has compact support $\mathcal{U} = [0,1]$. Let $B(u) = (B_1(u),\ldots,B_{L_n}(u))^{\mathrm T}$ be a B-spline basis of order $r$, where $L_n = J_n + r + 1$ and $J_n$ is the number of interior knots, which increases with the sample size $n$. According to the B-spline theory in De Boor (2001) [15], there exists a parameter vector $\gamma_j^{(m)}$ satisfying

$$\beta_j^{(m)}(u) \approx B(u)^{\mathrm T}\gamma_j^{(m)},$$

where $\beta_j^{(m)}(u)$ is the $j$th component of $\beta^{(m)}(u)$. So the $m$th model can be re-expressed as

$$Z_{\pi,i} \approx \sum_{j=1}^{k_m} X_{ij}^{(m)}B(U_i)^{\mathrm T}\gamma_j^{(m)} + \varepsilon_{\pi,i} = W_i^{(m)\mathrm T}\gamma^{(m)} + \varepsilon_{\pi,i},$$

where $W_i^{(m)} = \big(X_{i1}^{(m)}B(U_i)^{\mathrm T}, X_{i2}^{(m)}B(U_i)^{\mathrm T},\ldots, X_{ik_m}^{(m)}B(U_i)^{\mathrm T}\big)^{\mathrm T}$, $i = 1,2,\ldots,n$, is a $k_m L_n$-dimensional column vector, $B(U_i) = (B_1(U_i),\ldots,B_{L_n}(U_i))^{\mathrm T}$, and $\gamma^{(m)} = (\gamma_j^{(m)\mathrm T}, j \in \mathcal{M}_m)^{\mathrm T}$ stacks the spline coefficients of the covariates in $\mathcal{M}_m$.

Therefore, $\beta_j^{(m)}(u)$ can be estimated by solving the least squares problem

$$\min_{\gamma^{(m)}} \sum_{i=1}^{n}\Big[ Z_{\pi,i} - W_i^{(m)\mathrm T}\gamma^{(m)} \Big]^2. \tag{5}$$

Let $Z_\pi = (Z_{\pi,1}, Z_{\pi,2},\ldots, Z_{\pi,n})^{\mathrm T}$ and recall that $\gamma^{(m)} = (\gamma_j^{(m)\mathrm T}, j \in \mathcal{M}_m)^{\mathrm T}$ is a vector of length $k_m L_n$. Then the solution of (5) is given by

$$\hat\gamma^{(m)} = \big[W^{(m)\mathrm T}W^{(m)}\big]^{-1}W^{(m)\mathrm T}Z_\pi,$$

where $W^{(m)} = (W_1^{(m)},\ldots,W_n^{(m)})^{\mathrm T}$ is an $n \times k_m L_n$ matrix. Thus, the estimator of $\beta^{(m)}(u)$ is $\hat\beta^{(m)}(u) = (\hat\beta_j^{(m)}(u), j \in \mathcal{M}_m)^{\mathrm T}$ with $\hat\beta_j^{(m)}(u) = B(u)^{\mathrm T}\hat\gamma_j^{(m)}$.
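A minimal sketch of the spline least squares step (5) follows, using SciPy's `BSpline` to evaluate the basis. The number of interior knots and the spline degree are illustrative tuning choices, and the helper names (`bspline_design`, `fit_candidate`) are ours, not the paper's.

```python
# Build the B-spline design matrix W^(m) and solve (5) for one candidate model.
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(U, n_interior_knots=4, degree=3):
    """Return the n x L_n matrix whose (i, l) entry is B_l(U_i), for U in [0, 1]."""
    interior = np.linspace(0.0, 1.0, n_interior_knots + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    L_n = len(knots) - degree - 1
    # a BSpline with identity coefficients evaluates all L_n basis functions at once
    return BSpline(knots, np.eye(L_n), degree)(U)

def fit_candidate(Z_pi, X, U, model):
    """Least squares fit of the m-th candidate model; `model` is a set of covariate indices."""
    B = bspline_design(U)                                    # n x L_n basis matrix
    W = np.hstack([X[:, [j]] * B for j in sorted(model)])    # row i is W_i^{(m)}
    gamma_hat, *_ = np.linalg.lstsq(W, Z_pi, rcond=None)     # hat gamma^{(m)}
    mu_hat = W @ gamma_hat                                   # fitted conditional mean under model m
    return W, gamma_hat, mu_hat
```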

For the mth candidate model, the estimator of μ is then obtained as

$$\hat\mu_\pi^{(m)} = W^{(m)}\big[W^{(m)\mathrm T}W^{(m)}\big]^{-1}W^{(m)\mathrm T}Z_\pi = P^{(m)}Z_\pi, \tag{6}$$

where $P^{(m)} = W^{(m)}\big[W^{(m)\mathrm T}W^{(m)}\big]^{-1}W^{(m)\mathrm T}$. From (6), $\hat\mu_\pi^{(m)}$ is linear in $Z_\pi$, so we define the model average estimator of $\mu$ as the weighted average of the $\hat\mu_\pi^{(m)}$, that is,

$$\hat\mu_\pi(\omega) = \sum_{m=1}^{M_n}\omega_m\hat\mu_\pi^{(m)} = \sum_{m=1}^{M_n}\omega_m P^{(m)}Z_\pi = P(\omega)Z_\pi, \tag{7}$$

where $P(\omega) = \sum_{m=1}^{M_n}\omega_m P^{(m)}$ and $\omega = (\omega_1,\ldots,\omega_{M_n})^{\mathrm T}$ is the weight vector belonging to the set

$$\mathcal{H}_n = \Big\{\omega \in [0,1]^{M_n} : \sum_{m=1}^{M_n}\omega_m = 1\Big\}.$$
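Computationally, (7) amounts to stacking the candidate fitted vectors and taking a convex combination, as in the short sketch below; the helper name `model_average` and the equal-weight example are ours.

```python
# Model average estimator (7): hat mu_pi(omega) = sum_m omega_m * hat mu_pi^{(m)}.
import numpy as np

def model_average(mu_hats, weights):
    """mu_hats: list of candidate fitted vectors; weights: omega in the simplex H_n."""
    F = np.column_stack(mu_hats)                          # n x M_n matrix of candidate fits
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)    # omega must lie in H_n
    return F @ w

# e.g. equal weights: model_average(mu_hats, np.full(len(mu_hats), 1.0 / len(mu_hats)))
```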

3. Weight Selection

According to Hansen and Racine (2012) [13] , we give the details for the jackknife selection (also known as leave-one-out cross validation) of ω as follows.

The jackknife (leave-one-out) estimator of $\mu$ under the $m$th candidate model is denoted by

$$\tilde\mu_\pi^{(m)} = \big(\tilde\mu_{\pi,1}^{(m)},\tilde\mu_{\pi,2}^{(m)},\ldots,\tilde\mu_{\pi,n}^{(m)}\big)^{\mathrm T},$$

where $\tilde\mu_{\pi,i}^{(m)} = W_i^{(m)\mathrm T}\big[W_{(-i)}^{(m)\mathrm T}W_{(-i)}^{(m)}\big]^{-1}W_{(-i)}^{(m)\mathrm T}Z_{\pi,(-i)}$, and $W_{(-i)}^{(m)}$ and $Z_{\pi,(-i)}$ denote the matrix $W^{(m)}$ and the vector $Z_\pi$ with the $i$th row deleted. After some calculation, the jackknife estimator can be written as $\tilde\mu_\pi^{(m)} = \tilde P^{(m)}Z_\pi$ with

$$\tilde P^{(m)} = D^{(m)}\big(P^{(m)} - I_n\big) + I_n,$$

where $D^{(m)}$ is the $n \times n$ diagonal matrix whose $i$th diagonal element is $(1 - P_{ii}^{(m)})^{-1}$, with $P_{ii}^{(m)}$ the $i$th diagonal element of $P^{(m)}$. The jackknife model average estimator is then $\tilde\mu_\pi(\omega) = \tilde P(\omega)Z_\pi$ with $\tilde P(\omega) = \sum_{m=1}^{M_n}\omega_m\tilde P^{(m)}$. Therefore, we obtain the following JMA criterion for selecting the weight vector $\omega$:

$$CV_\pi(\omega) = \big\|Z_\pi - \tilde\mu_\pi(\omega)\big\|^2. \tag{8}$$
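The relation $\tilde P^{(m)} = D^{(m)}(P^{(m)} - I_n) + I_n$ means the leave-one-out fits never require $n$ refits: the ordinary residual is simply rescaled by $(1 - P_{ii}^{(m)})^{-1}$. A minimal sketch, with helper names of our choosing:

```python
# Leave-one-out fits via the hat-matrix diagonal, and the JMA criterion (8).
import numpy as np

def loo_fit(W, Z_pi):
    """tilde mu_pi^{(m)}: leave-one-out fitted values for one candidate design W^{(m)}."""
    P = W @ np.linalg.solve(W.T @ W, W.T)     # hat matrix P^{(m)}
    resid = Z_pi - P @ Z_pi                   # ordinary residuals
    h = np.diag(P)                            # diagonal elements P_ii^{(m)}
    return Z_pi - resid / (1.0 - h)           # equivalent to tilde P^{(m)} Z_pi

def cv_criterion(Z_pi, loo_fits, weights):
    """CV_pi(omega) = || Z_pi - sum_m omega_m tilde mu_pi^{(m)} ||^2."""
    F = np.column_stack(loo_fits)
    return float(np.sum((Z_pi - F @ np.asarray(weights)) ** 2))
```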

In practice, $\pi(X_i,U_i)$ in (8) is unknown and needs to be modeled. Following Wei and Wang (2021) [12], we assume a parametric model $\pi(X,U;\alpha)$ for $\pi(X,U)$, where $\pi(\cdot;\alpha)$ is a known, correctly specified function and $\alpha$ is an unknown parameter vector. Denote by $\hat\alpha_n$ the maximum likelihood estimator (MLE) of $\alpha$ and let $\hat\pi(X,U) = \pi(X,U;\hat\alpha_n)$. Replacing $\pi(X,U)$ in $CV_\pi(\omega)$ with $\hat\pi(X,U)$, we obtain the criterion

$$CV_{\hat\pi}(\omega) = \big\|Z_{\hat\pi} - \tilde\mu_{\hat\pi}(\omega)\big\|^2. \tag{9}$$

The optimal weight vector $\hat\omega_{\hat\pi}$ is then defined as

$$\hat\omega_{\hat\pi} = \arg\min_{\omega\in\mathcal{H}_n} CV_{\hat\pi}(\omega). \tag{10}$$

Thus, the model average estimator of $\mu$ is $\hat\mu_{\hat\pi}(\hat\omega_{\hat\pi})$, which we call the varying-coefficient jackknife model average (VC-JMA) estimator.
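Since $CV_{\hat\pi}(\omega)$ is a convex quadratic function of $\omega$, the minimization in (10) over the simplex $\mathcal{H}_n$ can be carried out with any constrained quadratic solver. The sketch below uses SciPy's general-purpose SLSQP routine for simplicity (a dedicated QP solver would work equally well); the function name `select_weights` is ours.

```python
# Solve (10): hat omega = argmin over H_n of || Z_hatpi - sum_m omega_m tilde mu_hatpi^{(m)} ||^2.
import numpy as np
from scipy.optimize import minimize

def select_weights(Z, loo_fits):
    """loo_fits: list of leave-one-out fitted vectors, one per candidate model."""
    F = np.column_stack(loo_fits)                              # n x M_n
    Mn = F.shape[1]
    objective = lambda w: float(np.sum((Z - F @ w) ** 2))      # the CV criterion as a function of omega
    constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * Mn
    w0 = np.full(Mn, 1.0 / Mn)                                 # start from equal weights
    res = minimize(objective, w0, method='SLSQP', bounds=bounds, constraints=constraints)
    return res.x                                               # hat omega_hatpi
```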

4. Asymptotic Optimality

In this section, we investigate the asymptotic optimality of the proposed estimator $\hat\mu_{\hat\pi}(\hat\omega_{\hat\pi})$. We first introduce some notation. Define

$$L_\pi(\omega) = \big\|\mu - \hat\mu_\pi(\omega)\big\|^2, \quad R_\pi(\omega) = E\{L_\pi(\omega)\mid X, U\}, \tag{11}$$

as the loss function and risk function of $\hat\mu_\pi(\omega)$, respectively, and let $\Omega_{e\pi} = \mathrm{diag}(\sigma_{\pi,1}^2,\ldots,\sigma_{\pi,n}^2)$. Let $\xi_\pi = \inf_{\omega\in\mathcal{H}_n}R_\pi(\omega)$, let $\omega_m^0$ denote the $M_n$-dimensional unit vector whose $m$th element is one and whose other elements are zero, and let $X_u = (X^{(1)},X^{(2)},\ldots,X^{(M_n)})^{\mathrm T}$.

Next, we give the conditions required for proving the asymptotic optimality of $\hat\mu_{\hat\pi}(\hat\omega_{\hat\pi})$. All limiting processes here and throughout the paper are as $n \to \infty$.

(C1) The MLE $\hat\alpha_n$ of $\alpha$ is $\sqrt{n}$-consistent and satisfies the regularity conditions for asymptotic normality. $\pi(X,U;\alpha)$ is bounded away from 0 and is twice continuously differentiable with respect to $\alpha$. For all $\alpha$ in a neighborhood of $\alpha_0$, $\max_{1\le i\le n}\big\|\partial\pi(X_i,U_i;\alpha)/\partial\alpha\big\| = O_p(1)$, where $\alpha_0$ is the true value of $\alpha$.

(C2) $\lambda_{\max}\{\Omega_{e\pi}\} \le C_e$ and $P_{ii}^{(m)} \le C_p k_m n^{-1}$, a.s., where $C_e$ and $C_p$ are constants.

(C3) $r_u \xi_\pi^{-1} \xrightarrow{\text{a.s.}} 0$, where $r_u$ is the rank of $X_u$.

(C4) $M_n \xi_\pi^{-2G}\sum_{m=1}^{M_n}\big(R_\pi(\omega_m^0)\big)^G \xrightarrow{\text{a.s.}} 0$, $n^{-1}\mu^{\mathrm T}\mu = O(1)$, and $E\big(\varepsilon_{\pi,i}^{4G}\mid X_i,U_i\big) \le C_v$ a.s., where $G$ is an integer satisfying $1 \le G < \infty$.

(C5) The functions $\beta_j(u)$, $j = 1,\ldots,p$, belong to a class of functions $\mathcal{B}$ whose $r$th derivatives $\beta_j^{(r)}(u)$ exist and are Lipschitz of order $\beta_0$.

(C6) The density $f_U$ of $U$ is bounded away from 0 and infinity on its support.

Remark 1 As Zhang et al. (2013) [14] pointed out, condition (C2) is commonly used in studies of the asymptotic optimality of cross-validation methods. Condition (C3) is based on condition (22) of Zhang et al. (2013) [14]; see that paper for a detailed explanation. Condition (C4) is commonly used in the model averaging literature; see, for example, Wan et al. (2010) [3], Zhang et al. (2013) [14], and Zhu et al. (2019) [16].

Remark 2 Conditions (C5) and (C6) are two common assumptions in approximating nonparametric coefficient functions with B-spline basis functions, which can be seen in Fan et al. (2014) [17] .

With these conditions, the following theorem states the asymptotic optimality of $\hat\mu_{\hat\pi}(\hat\omega_{\hat\pi})$.

Theorem 1 Suppose that conditions (C1) - (C6) hold. Then

$$\frac{L_{\hat\pi}(\hat\omega_{\hat\pi})}{\inf_{\omega\in\mathcal{H}_n}L_{\hat\pi}(\omega)} \xrightarrow{p} 1, \tag{12}$$

where $L_{\hat\pi}(\omega) = \|\mu - \hat\mu_{\hat\pi}(\omega)\|^2$.

Theorem 1 implies that the proposed estimator $\hat\mu_{\hat\pi}(\hat\omega_{\hat\pi})$ is asymptotically optimal: the selected weight vector $\hat\omega_{\hat\pi}$ yields a squared error that is asymptotically identical to that of the infeasible optimal weight vector.

5. Proofs of the Main Results

Before proving Theorem 1, we introduce some notation used in the proofs. We use $c$ to denote a generic positive constant that may take different values in different places. For any matrix $A$, $\lambda_{\max}\{A\}$ denotes its maximum singular value. Let $\tilde\xi_\pi = \inf_{\omega\in\mathcal{H}_n}\tilde R_\pi(\omega)$. Analogous to the definitions of $L_\pi(\omega)$ and $R_\pi(\omega)$ in (11), we define the loss and risk functions of $\tilde\mu_\pi(\omega)$ in (8) by

$$\tilde L_\pi(\omega) = \big\|\mu - \tilde\mu_\pi(\omega)\big\|^2, \quad \tilde R_\pi(\omega) = E\{\tilde L_\pi(\omega)\mid X, U\},$$

respectively, and then obtain

$$\tilde R_\pi(\omega) = \big\|\tilde A(\omega)\mu\big\|^2 + \mathrm{tr}\big[\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\big],$$

where $\tilde A(\omega) = I_n - \tilde P(\omega)$. Let $Q^{(m)}$ be the $n\times n$ diagonal matrix whose $i$th diagonal element is $P_{ii}^{(m)}(1-P_{ii}^{(m)})^{-1}$. Then, by the definition of $D^{(m)}$ in Section 3, we obtain

$$\tilde P^{(m)} = P^{(m)} - Q^{(m)}A^{(m)},$$

where $A^{(m)} = I_n - P^{(m)}$.

Next, we give four lemmas needed to prove Theorem 1.

Lemma 1 Let

$$\omega^* = \arg\min_{\omega\in\mathcal{H}_n}\big(\tilde L_\pi(\omega) + a_\pi(\omega)\big).$$

If, as $n\to\infty$,

$$\sup_{\omega\in\mathcal{H}_n}\frac{|a_\pi(\omega)|}{\tilde R_\pi(\omega)} \xrightarrow{p} 0, \quad \sup_{\omega\in\mathcal{H}_n}\left|\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\right| \xrightarrow{p} 0,$$

and there exists a constant $c$ such that

$$\inf_{\omega\in\mathcal{H}_n}\tilde R_\pi(\omega) \ge c > 0$$

almost surely, then

$$\frac{\tilde L_\pi(\omega^*)}{\inf_{\omega\in\mathcal{H}_n}\tilde L_\pi(\omega)} \xrightarrow{p} 1.$$

The proof of Lemma 1 follows Zhang et al. (2013) [14].

Lemma 2 If (C4) is satisfied, we have

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\right| = o_p(1). \tag{13}$$

Proof of Lemma 2: Note that

$$\big|\tilde L_\pi(\omega) - \tilde R_\pi(\omega)\big| = \Big|\big\|\mu - \tilde\mu_\pi(\omega)\big\|^2 - \big\|\tilde A(\omega)\mu\big\|^2 - \mathrm{tr}\big\{\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\big\}\Big| = \Big|\big\|\tilde P(\omega)\varepsilon_\pi\big\|^2 - \mathrm{tr}\big\{\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\big\} - 2\mu^{\mathrm T}\tilde A(\omega)^{\mathrm T}\tilde P(\omega)\varepsilon_\pi\Big|.$$

Therefore, to prove (13), it suffices to prove that

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big|\|\tilde P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\}\big|}{\tilde R_\pi(\omega)} = o_p(1), \tag{14}$$

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big|\mu^{\mathrm T}\tilde A(\omega)^{\mathrm T}\tilde P(\omega)\varepsilon_\pi\big|}{\tilde R_\pi(\omega)} = o_p(1). \tag{15}$$

By Zhang et al. (2013) [14], under condition (C4) we have

$$M_n\tilde\xi_\pi^{-2G}\sum_{m=1}^{M_n}\big(\tilde R_\pi(\omega_m^0)\big)^G \xrightarrow{\text{a.s.}} 0.$$

We use condition (C4), the Chebyshev inequality, and Theorem 2 of Whittle (1960) [18] to prove (14) and (15). For (14), for any $\delta > 0$ we have

$$\begin{aligned}
&P\Big\{\sup_{\omega\in\mathcal{H}_n}\big|\|\tilde P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\}\big|\big/\tilde R_\pi(\omega) > \delta \,\Big|\, X, U\Big\} \\
&\quad\le P\Big\{\sup_{\omega\in\mathcal{H}_n}\sum_{t=1}^{M_n}\sum_{m=1}^{M_n}\omega_t\omega_m\big|\varepsilon_\pi^{\mathrm T}\tilde P^{(t)\mathrm T}\tilde P^{(m)}\varepsilon_\pi - \mathrm{tr}\{\tilde P^{(t)\mathrm T}\Omega_{e\pi}\tilde P^{(m)}\}\big| > \delta\tilde\xi_\pi \,\Big|\, X, U\Big\} \\
&\quad\le P\Big\{\max_{1\le t\le M_n}\max_{1\le m\le M_n}\big|\varepsilon_\pi^{\mathrm T}\tilde P(\omega_t^0)^{\mathrm T}\tilde P(\omega_m^0)\varepsilon_\pi - \mathrm{tr}\{\tilde P(\omega_t^0)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega_m^0)\}\big| > \delta\tilde\xi_\pi \,\Big|\, X, U\Big\} \\
&\quad\le \sum_{t=1}^{M_n}\sum_{m=1}^{M_n}P\Big\{\big|\varepsilon_\pi^{\mathrm T}\tilde P(\omega_t^0)^{\mathrm T}\tilde P(\omega_m^0)\varepsilon_\pi - \mathrm{tr}\{\tilde P(\omega_t^0)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega_m^0)\}\big| > \delta\tilde\xi_\pi \,\Big|\, X, U\Big\} \\
&\quad\le \sum_{t=1}^{M_n}\sum_{m=1}^{M_n}\frac{E\Big(\big|\varepsilon_\pi^{\mathrm T}\tilde P(\omega_t^0)^{\mathrm T}\tilde P(\omega_m^0)\varepsilon_\pi - \mathrm{tr}\{\tilde P(\omega_t^0)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega_m^0)\}\big|^{2G}\,\Big|\,X,U\Big)}{\delta^{2G}\tilde\xi_\pi^{2G}} \\
&\quad\le c\,\delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{t=1}^{M_n}\sum_{m=1}^{M_n}\big(\mathrm{tr}\{\tilde P(\omega_t^0)^{\mathrm T}\tilde P(\omega_m^0)\}\big)^G \\
&\quad\le c\,\delta^{-2G}\tilde\xi_\pi^{-2G}\sum_{t=1}^{M_n}\sum_{m=1}^{M_n}\big(\mathrm{tr}\{\tilde P(\omega_m^0)^{\mathrm T}\tilde P(\omega_m^0)\}\big)^G \\
&\quad\le c\,\delta^{-2G}\tilde\xi_\pi^{-2G}M_n\sum_{m=1}^{M_n}\big(\tilde R_\pi(\omega_m^0)\big)^G \xrightarrow{\text{a.s.}} 0.
\end{aligned}$$

Therefore, by the dominated convergence theorem,

$$P\Big\{\sup_{\omega\in\mathcal{H}_n}\big|\|\tilde P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\}\big|\big/\tilde R_\pi(\omega) > \delta\Big\} = E\Big[P\Big\{\sup_{\omega\in\mathcal{H}_n}\big|\|\tilde P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{\tilde P(\omega)^{\mathrm T}\Omega_{e\pi}\tilde P(\omega)\}\big|\big/\tilde R_\pi(\omega) > \delta \,\Big|\, X, U\Big\}\Big] = o(1).$$

Thus (14) is proven. The proof of (15) is similar to that of (14), which completes the proof of Lemma 2.

Lemma 3 If (C1) - (C4) are satisfied, then

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)} - 1\right| = o_p(1). \tag{16}$$

Proof of Lemma 3: By Lemma 1 of Wei and Wang (2021) [12] and the Cauchy-Schwarz inequality, we have

$$\left|\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)} - 1\right| \le \left|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\right| + \frac{2\{L_\pi(\omega)\}^{1/2}\big\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\big\|}{R_\pi(\omega)} + \frac{\big\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\big\|^2}{R_\pi(\omega)}$$

and

$$\big\|\hat\mu_\pi(\omega) - \hat\mu_{\hat\pi}(\omega)\big\|^2 = \big\|P(\omega)Z_\pi - P(\omega)Z_{\hat\pi}\big\|^2 \le \big[\lambda_{\max}\{P(\omega)\}\big]^2\big\|Z_\pi - Z_{\hat\pi}\big\|^2 \le \big\|Z_\pi - Z_{\hat\pi}\big\|^2.$$

Therefore, to prove (16), it suffices to show that

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\right| = o_p(1), \tag{17}$$

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big\|Z_\pi - Z_{\hat\pi}\big\|^2}{R_\pi(\omega)} = o_p(1). \tag{18}$$

Note that

$$\left|\frac{L_\pi(\omega)}{R_\pi(\omega)} - 1\right| = \left|\frac{\|P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{P(\omega)^2\Omega_{e\pi}\} - 2\mu^{\mathrm T}A(\omega)P(\omega)\varepsilon_\pi}{R_\pi(\omega)}\right| \le \left|\frac{\|P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{P(\omega)^2\Omega_{e\pi}\}}{R_\pi(\omega)}\right| + 2\left|\frac{\mu^{\mathrm T}A(\omega)P(\omega)\varepsilon_\pi}{R_\pi(\omega)}\right|,$$

where $A(\omega) = I_n - P(\omega)$. Therefore, to prove (17), it suffices to show that

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\mu^{\mathrm T}A(\omega)P(\omega)\varepsilon_\pi}{R_\pi(\omega)}\right| = o_p(1), \tag{19}$$

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\|P(\omega)\varepsilon_\pi\|^2 - \mathrm{tr}\{P(\omega)^2\Omega_{e\pi}\}}{R_\pi(\omega)}\right| = o_p(1). \tag{20}$$

The proofs of (19) and (20) are similar to those of (15) and (14), respectively, and hold under condition (C4); the details are omitted. Thus (17) is proven, and it remains to prove (18).

By condition (C1) and the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
\sup_{\omega\in\mathcal{H}_n}\frac{\|Z_\pi - Z_{\hat\pi}\|^2}{R_\pi(\omega)} &\le \xi_\pi^{-1}\sum_{i=1}^n\left\{\frac{1}{\pi(X_i,U_i;\hat\alpha_n)} - \frac{1}{\pi(X_i,U_i)}\right\}^2(\mu_i + \varepsilon_i)^2 \\
&\le \xi_\pi^{-1}\left\{\max_{1\le i\le n}\left|\frac{1}{\pi(X_i,U_i;\hat\alpha_n)} - \frac{1}{\pi(X_i,U_i)}\right|\right\}^2\sum_{i=1}^n(\mu_i+\varepsilon_i)^2 \\
&\le \xi_\pi^{-1}\left\{\sqrt{n}\max_{1\le i\le n}\left|\frac{1}{\pi(X_i,U_i;\hat\alpha_n)} - \frac{1}{\pi(X_i,U_i)}\right|\right\}^2 c\left(\frac{1}{n}\mu^{\mathrm T}\mu + \frac{1}{n}\varepsilon^{\mathrm T}\varepsilon\right).
\end{aligned}$$

According to (C3) and (C4), to prove (18) it therefore suffices to prove that

$$(n\xi_\pi)^{-1}\varepsilon^{\mathrm T}\varepsilon = o_p(1), \tag{21}$$

$$\sqrt{n}\max_{1\le i\le n}\left|\frac{1}{\pi(X_i,U_i;\hat\alpha_n)} - \frac{1}{\pi(X_i,U_i)}\right| = O_p(1). \tag{22}$$

By (C2) - (C4) and the Markov inequality, for any $\delta > 0$ we have

$$P\big\{(n\xi_\pi)^{-1}\varepsilon^{\mathrm T}\varepsilon > \delta \mid X,U\big\} \le (n\xi_\pi\delta)^{-1}E\big\{\varepsilon^{\mathrm T}\varepsilon\mid X,U\big\} = (n\xi_\pi\delta)^{-1}\sum_{i=1}^n\sigma_i^2 \le \xi_\pi^{-1}\delta^{-1}\lambda_{\max}\{\Omega_{e\pi}\} \le C_e\xi_\pi^{-1}\delta^{-1} \xrightarrow{\text{a.s.}} 0.$$

This, together with the dominated convergence theorem, implies $(n\xi_\pi)^{-1}\varepsilon^{\mathrm T}\varepsilon = o_p(1)$, which is (21).

We perform a Taylor expansion of $\{\pi(X_i,U_i;\hat\alpha_n)\}^{-1}$ around the true value $\alpha_0$ and obtain

$$\sqrt{n}\max_{1\le i\le n}\left|\frac{1}{\pi(X_i,U_i;\hat\alpha_n)} - \frac{1}{\pi(X_i,U_i)}\right| = \sqrt{n}\max_{1\le i\le n}\left|\left(\frac{\partial\pi(X_i,U_i;\alpha)/\partial\alpha\big|_{\alpha=\alpha_{n,X_i}}}{\pi(X_i,U_i;\hat\alpha_n)\,\pi(X_i,U_i)}\right)^{\mathrm T}(\hat\alpha_n - \alpha_0)\right| \le c\max_{1\le i\le n}\Big\|\partial\pi(X_i,U_i;\alpha)/\partial\alpha\big|_{\alpha=\alpha_{n,X_i}}\Big\|\,\sqrt{n}\,\big\|\hat\alpha_n - \alpha_0\big\|,$$

where the last inequality follows from (C1) and the Cauchy-Schwarz inequality, and $\alpha_{n,X_i}$ lies between $\hat\alpha_n$ and $\alpha_0$. Since $\hat\alpha_n$ is the MLE, (C1) gives $\sqrt{n}\|\hat\alpha_n - \alpha_0\| = O_p(1)$. By the consistency of $\hat\alpha_n$ and (C1), we have $\max_{1\le i\le n}\big\|\partial\pi(X_i,U_i;\alpha)/\partial\alpha\big|_{\alpha=\alpha_{n,X_i}}\big\| = O_p(1)$. These results imply (22), which completes the proof of (18) and hence of Lemma 3.

Lemma 4 If (C1) - (C4) are satisfied, then

$$\frac{\tilde L_\pi(\hat\omega_{\hat\pi})}{\inf_{\omega\in\mathcal{H}_n}\tilde L_\pi(\omega)} - 1 = o_p(1). \tag{23}$$

Proof of Lemma 4: After some calculation, we have

$$CV_{\hat\pi}(\omega) = \tilde L_\pi(\omega) + 2a_{\hat\pi}(\omega) + \big\|Z_{\hat\pi} - \mu\big\|^2,$$

where

$$\begin{aligned}
a_{\hat\pi}(\omega) ={}& \varepsilon_\pi^{\mathrm T}\tilde A(\omega)\mu - \varepsilon_\pi^{\mathrm T}\tilde P(\omega)\varepsilon_\pi + \tfrac{1}{2}\big\|\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\big\|^2 + (Z_{\hat\pi} - Z_\pi)^{\mathrm T}\{\mu - \tilde\mu_\pi(\omega)\} \\
&+ (Z_{\hat\pi} - Z_\pi)^{\mathrm T}\{\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\} + \{\mu - \tilde\mu_\pi(\omega)\}^{\mathrm T}\{\tilde\mu_\pi(\omega) - \tilde\mu_{\hat\pi}(\omega)\} + \varepsilon_\pi^{\mathrm T}\tilde P(\omega)(Z_\pi - Z_{\hat\pi}).
\end{aligned}$$

Because $\|Z_{\hat\pi} - \mu\|^2$ does not depend on $\omega$, we have

$$\hat\omega_{\hat\pi} = \arg\min_{\omega\in\mathcal{H}_n}\big\{\tilde L_\pi(\omega) + 2a_{\hat\pi}(\omega)\big\}.$$

By Lemma 1, to prove (23) it suffices to prove (13) and

$$\sup_{\omega\in\mathcal{H}_n}\frac{|a_{\hat\pi}(\omega)|}{\tilde R_\pi(\omega)} = o_p(1). \tag{24}$$

Since (13) has already been proved, it remains to prove (24). By the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
|a_{\hat\pi}(\omega)| \le{}& \big|\varepsilon_\pi^{\mathrm T}\tilde A(\omega)\mu\big| + \big|\varepsilon_\pi^{\mathrm T}\tilde P(\omega)\varepsilon_\pi\big| + \tfrac{1}{2}\big\|\tilde\mu_{\hat\pi}(\omega) - \tilde\mu_\pi(\omega)\big\|^2 + \big\|Z_{\hat\pi} - Z_\pi\big\|\{\tilde L_\pi(\omega)\}^{1/2} \\
&+ \big\|Z_{\hat\pi} - Z_\pi\big\|\,\big\|\tilde\mu_{\hat\pi}(\omega) - \tilde\mu_\pi(\omega)\big\| + \{\tilde L_\pi(\omega)\}^{1/2}\big\|\tilde\mu_{\hat\pi}(\omega) - \tilde\mu_\pi(\omega)\big\| + \big\|\tilde P(\omega)^{\mathrm T}\varepsilon_\pi\big\|\,\big\|Z_{\hat\pi} - Z_\pi\big\|.
\end{aligned}$$

So to prove (24), it suffices to verify

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big|\varepsilon_\pi^{\mathrm T}\tilde A(\omega)\mu\big|}{\tilde R_\pi(\omega)} = o_p(1), \tag{25}$$

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big|\varepsilon_\pi^{\mathrm T}\tilde P(\omega)\varepsilon_\pi\big|}{\tilde R_\pi(\omega)} = o_p(1), \tag{26}$$

$$\sup_{\omega\in\mathcal{H}_n}\frac{\varepsilon_\pi^{\mathrm T}\tilde P(\omega)\tilde P(\omega)^{\mathrm T}\varepsilon_\pi}{\tilde R_\pi(\omega)} = o_p(1), \tag{27}$$

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big\|Z_{\hat\pi} - Z_\pi\big\|^2}{\tilde R_\pi(\omega)} = o_p(1), \tag{28}$$

$$\sup_{\omega\in\mathcal{H}_n}\frac{\big\|\tilde\mu_{\hat\pi}(\omega) - \tilde\mu_\pi(\omega)\big\|^2}{\tilde R_\pi(\omega)} = o_p(1). \tag{29}$$

The proofs of (25), (26), and (27) are similar to that of (14) and hold under condition (C4).

For (28), note that

$$\sup_{\omega\in\mathcal{H}_n}\frac{\|Z_{\hat\pi} - Z_\pi\|^2}{\tilde R_\pi(\omega)} = \sup_{\omega\in\mathcal{H}_n}\frac{\|Z_{\hat\pi} - Z_\pi\|^2}{R_\pi(\omega)}\left\{\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)} - 1 + 1\right\} \le \sup_{\omega\in\mathcal{H}_n}\frac{\|Z_{\hat\pi} - Z_\pi\|^2}{R_\pi(\omega)}\sup_{\omega\in\mathcal{H}_n}\left|\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\right| + \sup_{\omega\in\mathcal{H}_n}\frac{\|Z_{\hat\pi} - Z_\pi\|^2}{R_\pi(\omega)}. \tag{30}$$

By Zhang et al. (2013) [14], under (C2) - (C4) we have

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\right| = o_p(1),$$

which together with (18) implies (28). We next prove (29).

By Lemma 1 of Wei and Wang (2021) [12], we have

$$\big\|\tilde\mu_{\hat\pi}(\omega) - \tilde\mu_\pi(\omega)\big\|^2 = \big\|\tilde P(\omega)(Z_{\hat\pi} - Z_\pi)\big\|^2 \le \lambda_{\max}\big\{\tilde P(\omega)^{\mathrm T}\tilde P(\omega)\big\}\big\|Z_{\hat\pi} - Z_\pi\big\|^2 \le \big[\lambda_{\max}\{\tilde P(\omega)\}\big]^2\big\|Z_{\hat\pi} - Z_\pi\big\|^2 \le (1+\tilde p)^2\big\|Z_{\hat\pi} - Z_\pi\big\|^2. \tag{31}$$

By Zhang et al. (2013) [14], under (C2) we have $\tilde p \xrightarrow{\text{a.s.}} 0$, where $\tilde p$ is defined as in that paper. This, together with (31) and (28), implies (29), and (25) - (29) together establish (24). This completes the proof of Lemma 4.

Proof of Theorem 1: First, note that

$$\begin{aligned}
\frac{L_{\hat\pi}(\hat\omega_{\hat\pi})}{\inf_{\omega\in\mathcal{H}_n}L_{\hat\pi}(\omega)} - 1 &= \sup_{\omega\in\mathcal{H}_n}\left\{\frac{L_{\hat\pi}(\hat\omega_{\hat\pi})}{L_{\hat\pi}(\omega)} - 1\right\} \\
&= \sup_{\omega\in\mathcal{H}_n}\left\{\frac{L_{\hat\pi}(\hat\omega_{\hat\pi})}{R_\pi(\hat\omega_{\hat\pi})}\cdot\frac{R_\pi(\hat\omega_{\hat\pi})}{\tilde R_\pi(\hat\omega_{\hat\pi})}\cdot\frac{\tilde R_\pi(\hat\omega_{\hat\pi})}{\tilde L_\pi(\hat\omega_{\hat\pi})}\cdot\frac{\tilde L_\pi(\hat\omega_{\hat\pi})}{\tilde L_\pi(\omega)}\cdot\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)}\cdot\frac{\tilde R_\pi(\omega)}{R_\pi(\omega)}\cdot\frac{R_\pi(\omega)}{L_{\hat\pi}(\omega)} - 1\right\} \\
&\le \sup_{\omega\in\mathcal{H}_n}\left(\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{\tilde R_\pi(\omega)}{\tilde L_\pi(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{\tilde R_\pi(\omega)}{R_\pi(\omega)}\right)\sup_{\omega\in\mathcal{H}_n}\left(\frac{R_\pi(\omega)}{L_{\hat\pi}(\omega)}\right)\frac{\tilde L_\pi(\hat\omega_{\hat\pi})}{\inf_{\omega\in\mathcal{H}_n}\tilde L_\pi(\omega)} - 1.
\end{aligned}$$

Therefore, to prove Theorem 1, it suffices to prove that

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{R_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\right| = o_p(1), \tag{32}$$

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{\tilde L_\pi(\omega)}{\tilde R_\pi(\omega)} - 1\right| = o_p(1), \tag{33}$$

$$\sup_{\omega\in\mathcal{H}_n}\left|\frac{L_{\hat\pi}(\omega)}{R_\pi(\omega)} - 1\right| = o_p(1), \tag{34}$$

$$\frac{\tilde L_\pi(\hat\omega_{\hat\pi})}{\inf_{\omega\in\mathcal{H}_n}\tilde L_\pi(\omega)} - 1 = o_p(1). \tag{35}$$

According to Zhang et al. (2013) [14], (32) holds under (C2) - (C4). By Lemma 2, Lemma 3, and Lemma 4, (33) - (35) hold, so Theorem 1 is proved.

6. Conclusions

In this paper, we extend the JMA method to nonparametric varying-coefficient models with response missing at random. We first use the inverse probability weighted method to deal with the missing data, then use B-splines to estimate the nonparametric coefficient functions, and finally use the jackknife criterion to select the weight vector $\omega$. Under certain conditions, the asymptotic optimality of our method is proved.

In this paper, we only consider varying-coefficient models. It would undoubtedly be meaningful to extend our ideas to more complex models, such as partially linear models and semiparametric varying-coefficient models, although this is a challenging research problem. Second, in this article we assume that the parametric model for the selection probability function is correctly specified; further research is needed to develop a model averaging method that is robust against misspecification of the selection probability function. These are interesting directions for future research.

Acknowledgements

This research is supported by the Natural Science Foundation of Shandong Province of China (ZR2021MA077).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Hansen, B. (2007) Least Squares Model Averaging. Econometrica, 75, 1175-1189.
https://doi.org/10.1111/j.1468-0262.2007.00785.x
[2] Zhang, X., Wan, A.T.K. and Zou, G. (2008) Least Squares Model Combining by Mallows Criterion.
https://doi.org/10.2139/ssrn.1272288
[3] Wan, A.T.K, Zhang, X. and Zou, G. (2010) Least Squares Model Averaging by Mallows Criterion. Journal of Econometrics, 156, 277-283.
https://doi.org/10.1016/j.jeconom.2009.10.030
[4] Zhang, X., Yu, D., Zou, G. and Liang, H. (2016) Optimal Model Averaging Estimation for Generalized Linear Models and Generalized Linear Mixed-Effects Models. Journal of the American Statistical Association, 111, 1775-1790.
https://doi.org/10.1080/01621459.2015.1115762
[5] Li, J., Xia, X., Wong, W.K. and Nott, D. (2018) Varying-Coefficient Semiparametric Model Averaging Prediction. Biometrics, 74, 1417-1426.
https://doi.org/10.1111/biom.12904
[6] Zhang, X. and Wang, W. (2019) Optimal Model Averaging Estimation for Partially Linear Models. Statistica Sinica, 29, 693-718.
https://doi.org/10.5705/ss.202015.0392
[7] Xia, X. (2021) Model Averaging Prediction for Nonparametric Varying-Coefficient Models with B-Spline Smoothing. Statistical Papers, 62, 2885-2905.
https://doi.org/10.1007/s00362-020-01218-9
[8] Sun, Z., Su, Z. and Ma, J. (2014) Focused Vector Information Criterion Model Selection and Model Averaging Regression with Missing Response. Metrika, 77, 415-432.
https://doi.org/10.1007/s00184-013-0446-8
[9] Zeng, J., Cheng, W., Hu, G. and Rong, Y. (2018) Model Averaging Procedure for Varying-Coefficient Partially Linear Models with Missing Responses. Journal of the Korean Statistical Society, 47, 379-394.
https://doi.org/10.1016/j.jkss.2018.04.004
[10] Xie, J., Yan, X. and Tang, N. (2021) A Model-Averaging Method for High-Dimensional Regression with Missing Responses At Random. Statistica Sinica, 31, 1005-1026.
https://doi.org/10.5705/ss.202018.0297
[11] Wei, Y., Wang, Q. and Liu, W. (2021) Model Averaging for Linear Models with Responses Missing at Random. Annals of the Institute of Statistical Mathematics, 73, 535-553.
https://doi.org/10.1007/s10463-020-00759-y
[12] Wei, Y. and Wang, Q. (2021) Cross-Validation-Based Model Averaging in Linear Models with Response Missing at Random. Statistics and Probability Letters, 171, 108990.
https://doi.org/10.1016/j.spl.2020.108990
[13] Hansen, B. and Racine, J. (2012) Jackknife Model Averaging. Journal of Econometrics, 167, 38-46.
https://doi.org/10.1016/j.jeconom.2011.06.019
[14] Zhang, X., Wan, A.T.K. and Zou, G. (2013) Model Averaging by Jackknife Criterion in Models with Dependent Data. Journal of Econometrics, 174, 82-94.
https://doi.org/10.1016/j.jeconom.2013.01.004
[15] De Boor, C. (2001) A Practical Guide to Splines. Springer, New York.
[16] Zhu, R., Wan, A.T.K., Zhang, X. and Zou, G. (2019) A Mallows-Type Model Averaging Estimator for the Varying-Coefficient Partially Linear Model. Journal of the American Statistical Association, 114, 882-892.
https://doi.org/10.1080/01621459.2018.1456936
[17] Fan, J., Ma, Y. and Dai, W. (2014) Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Varying Coefficient Models. Journal of the American Statistical Association, 109, 1270-1284.
https://doi.org/10.1080/01621459.2013.879828
[18] Whittle, P. (1960) Bounds for the Moments of Linear and Quadratic Forms in Independent Variables. Theory of Probability & Its Applications, 5, 302-305.
https://doi.org/10.1137/1105028

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.