Nonparametric Feature Screening via the Variance of the Regression Function
1. Introduction
With advances in data collection technology, ultrahigh-dimensional data can easily be collected in many research areas, such as genetic data, microarray data, and high-volume financial data. In these examples, the number of predictors (p) is an exponential function of the number of observations (n); in other words, $\log p = O(n^{a})$ for some $a > 0$. The sparsity assumption, that only a small set of covariates has an effect on the response, makes inference possible for ultrahigh-dimensional data.
Popular variable selection methods may suffer technical difficulties and performance issues when analyzing ultrahigh-dimensional data, due to the simultaneous challenges of computational expediency, statistical accuracy, and algorithmic stability [1]. Motivated by this, Fan and Lv [2] recommended that a variable screening procedure be performed prior to variable selection. Working with a linear model, they introduced sure independence screening (SIS), a variable screening procedure based on Pearson's correlation coefficient. Assuming Gaussian predictors and response variable, they showed that SIS possesses the sure screening property, which means that the true predictors are retained with probability tending to one as the sample size tends to infinity. Since then, several feature screening methods based on SIS have been developed. Fan, Feng and Song [3] introduced a nonparametric independence screening procedure (NIS), which uses a spline-based nonparametric estimate of the marginal regression functions and ranks predictors by the Euclidean norm of the estimated marginal regression function (evaluated at the data points). Li, Zhong and Zhu [4] proposed a ranking procedure based on the distance correlation (DC-SIS); DC-SIS can be used for grouped predictors and multivariate responses. Li et al. [5] proposed a robust rank correlation screening (RRCS) procedure, which ranks predictors by Kendall's τ rank correlation coefficient. They showed that this procedure can handle semiparametric models under a monotonicity constraint on the link function, and that it can also be used in the presence of outliers, influential points, or heavy-tailed errors. Wang and Deng [6] introduced a model-free feature screening method for multi-classification problems with both categorical and continuous covariates, using the Gini impurity to evaluate the predictive power of covariates. Chen and Deng [7] proposed another model-free feature screening method for multi-classification, using the maximal information coefficient to evaluate the predictive power of the variables.
Variables that are relevant for prediction are of particular interest in most scientific research and its applications. The aforementioned feature screening methods fail to distinguish variables that have predictive significance from those that influence the variance function or other aspects of the conditional distribution of the response. We propose a method that screens out variables without (marginal) predictive significance. The basic idea is that if a variable $X_k$ has no predictive significance, the marginal regression function $m_k(X_k) = E(Y \mid X_k)$ has zero variance. This leads to a method which ranks the predictors according to the sample variance (evaluated at the data points) of the p estimated marginal regression functions, called RV-SIS for regression variance sure independence screening. We show that RV-SIS possesses the sure independence screening property under a general nonparametric regression setting. While the proofs use Nadaraya-Watson estimators for the marginal regression functions, they also hold (with mild modifications) for local linear estimators.
We conduct numerical simulation studies comparing RV-SIS to SIS, DC-SIS, RRCS and NIS. RV-SIS outperforms these methods in many different model settings, and it requires less computing time than both DC-SIS and NIS.
2. Nonparametric Independence Screening via the Variance of the Regression Function
2.1. Preliminaries
Consider a random sample $(Y_i, \mathbf{X}_i)$, $i = 1, \dots, n$, of iid $(p+1)$-dimensional random vectors, where $Y_i$ is univariate and $\mathbf{X}_i = (X_{i1}, \dots, X_{ip})^{T}$ is a p-dimensional covariate vector. Let $m(\mathbf{x}) = E(Y \mid \mathbf{X} = \mathbf{x})$ and write
$Y = m(\mathbf{X}) + \epsilon$, (1)
where $E(\epsilon \mid \mathbf{X}) = 0$. For $k = 1, \dots, p$, we consider the p marginal nonparametric regression functions
$m_k(x) = E(Y \mid X_k = x)$, (2)
of Y on each variable $X_k$, and define the sets of active and inactive predictors by
$\mathcal{A} = \{k : m_k(X_k) \text{ is not almost surely constant}\}$ and $\mathcal{I} = \{1, \dots, p\} \setminus \mathcal{A}$, (3)
respectively. The proposed screening procedure relies on ranking the significance of the p covariates according to the magnitude of the variances of their respective marginal regression functions,
$\sigma_k^2 = \operatorname{Var}\{m_k(X_k)\}, \quad k = 1, \dots, p$. (4)
Note that $\sigma_k^2 = 0$ for $k \in \mathcal{I}$, while $\sigma_k^2 > 0$ for $k \in \mathcal{A}$, making $\sigma_k^2$ a natural quantity to discriminate between the two classes of predictors. In addition, the variance of the regression function appears as the mean shift, under local alternatives, of the procedure for testing the significance of a covariate proposed in Wang, Akritas and Van Keilegom [8]. This suggests that $\sigma_k^2$ is a particularly well-suited quantity for discriminating between the two classes of predictors.
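To make this connection concrete, the law of total variance (a standard identity, not specific to this paper) links $\sigma_k^2$ directly to the predictive content of $X_k$:
$\operatorname{Var}(Y) = E\{\operatorname{Var}(Y \mid X_k)\} + \operatorname{Var}\{E(Y \mid X_k)\} = E\{\operatorname{Var}(Y \mid X_k)\} + \sigma_k^2$,
so $\sigma_k^2$ is exactly the portion of $\operatorname{Var}(Y)$ explained by the marginal regression on $X_k$, and it equals zero precisely when $m_k(X_k)$ is almost surely constant, i.e., when $X_k$ carries no marginal predictive information about the mean of Y.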
If $\hat{m}_k$ denotes an estimator of $m_k$, then $\sigma_k^2$ can be estimated by the sample variance of the fitted values $\hat{m}_k(X_{1k}), \dots, \hat{m}_k(X_{nk})$. The methodology described here works with any type of nonparametric estimator of $m_k$, but the theory has been developed for Nadaraya-Watson type estimators.
For a kernel function $K(\cdot)$ and bandwidth h, set $K_h(u) = h^{-1} K(u/h)$, and take
$\hat{m}_k(x) = \dfrac{\sum_{i=1}^{n} K_h(X_{ik} - x)\, Y_i}{\sum_{i=1}^{n} K_h(X_{ik} - x)}$ (5)
for the estimator of $m_k(x)$. The bandwidth is taken to be of polynomial order in n (that is, $h \propto n^{-\delta}$ for some $\delta > 0$) throughout this paper. Writing $\hat{\sigma}_k^2$ for the sample variance of the fitted values $\hat{m}_k(X_{ik})$, $i = 1, \dots, n$, RV-SIS estimates $\mathcal{A}$ by
$\hat{\mathcal{A}} = \{k : \hat{\sigma}_k^2 \geq \gamma_n\}$, (6)
for some threshold parameter $\gamma_n > 0$. Thus, the RV-SIS procedure reduces the dimension of the covariate vector from p to $|\hat{\mathcal{A}}|$, where $|\cdot|$ refers to the cardinality of a set. The choice of $\gamma_n$, which defines the RV-SIS procedure, is discussed below.
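Before turning to the choice of threshold, here is a minimal sketch of the marginal Nadaraya-Watson fit and the statistic $\hat{\sigma}_k^2$ for a single covariate. The function names are ours, and the Epanechnikov kernel and rule-of-thumb bandwidth are illustrative choices only, not the ones used in the paper.

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson estimate of E(Y | X = x) evaluated at the sample points x."""
    # Epanechnikov kernel (bounded support, symmetric, Lipschitz): K(u) = 0.75 (1 - u^2) for |u| <= 1
    u = (x[:, None] - x[None, :]) / h            # u[i, j] = (x_i - x_j) / h
    k = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u ** 2), 0.0)
    weights = k / k.sum(axis=1, keepdims=True)   # row i holds the NW weights at the point x_i
    return weights @ y                           # fitted values m_hat(x_i)

def rv_statistic(x, y, h=None):
    """Sample variance of the fitted marginal regression values (the RV-SIS statistic)."""
    if h is None:
        h = 1.06 * np.std(x) * len(x) ** (-1 / 5)   # illustrative rule-of-thumb bandwidth
    return np.var(nw_fit(x, y, h))

# toy example: one active and one inactive covariate
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = np.sin(2 * x1) + 0.5 * rng.normal(size=200)
print(rv_statistic(x1, y), rv_statistic(x2, y))  # the first value should be clearly larger
```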
2.2. Thresholding Rule
We adopt the idea of the soft thresholding rule of Zhu et al. [9] as a method for choosing the threshold parameter $\gamma_n$. This method consists of randomly generating a vector $\mathbf{Z} = (Z_1, \dots, Z_d)$ of d auxiliary random variables from the uniform distribution on (0, 1), with $Z_j \sim \mathrm{Unif}(0, 1)$ for $j = 1, \dots, d$, independent of both $Y$ and $\mathbf{X}$. By design, the auxiliary variables are inactive predictors. The soft thresholding rule chooses the threshold parameter as
$\gamma_n = \max_{j \in \mathcal{D}} \hat{\sigma}_j^2$, (7)
where $\mathcal{D}$ denotes the set of indices of the d auxiliary variables and $\hat{\sigma}_j^2$ is the RV-SIS statistic computed on the jth auxiliary variable.
Theorem 1 provides an upper bound on the probability of selecting inactive predictors when the proposed soft thresholding rule is used, provided the following exchangeability condition holds.
Exchangeability Condition: Let $k \in \mathcal{I}$ be an inactive predictor and let j index one of the d auxiliary variables. Then, the probability that $\hat{\sigma}_k^2$ is greater than $\hat{\sigma}_j^2$ is equal to the probability that $\hat{\sigma}_k^2$ is less than $\hat{\sigma}_j^2$.
Theorem 1. Under the exchangeability condition, for any integer $r \geq 1$, the probability that the selected set $\hat{\mathcal{A}}$ in (6), with threshold (7), contains at least r inactive predictors satisfies the bound in
(8)
for some positive constants.
A practical issue in using the soft thresholding rule is how to choose the number d of auxiliary variables. Numerical simulation results suggested a choice of d that works well on simulated data.
The RV-SIS procedure consists of the following steps (a code sketch is given after the list):
1. For each covariate $X_k$, $k = 1, \dots, p$, calculate the sample variance $\hat{\sigma}_k^2$ of the nonparametric estimator $\hat{m}_k$ evaluated at the data points.
2. Construct d auxiliary random variables and compute the sample variance of the corresponding nonparametric estimator for each auxiliary variable, $j = 1, \dots, d$.
3. Select the predictors whose sample variance exceeds the maximum sample variance of the auxiliary variables.
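A minimal sketch of the three steps, including the soft thresholding rule (7), might look as follows. The kernel, the bandwidth order, and the default number of auxiliary variables are illustrative assumptions of ours; the paper's recommended choices are not reproduced here.

```python
import numpy as np

def nw_variance(x, y, h):
    """Variance of the Nadaraya-Watson fitted values of Y on a single covariate x."""
    u = (x[:, None] - x[None, :]) / h
    k = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u ** 2), 0.0)   # Epanechnikov kernel
    m_hat = (k / k.sum(axis=1, keepdims=True)) @ y
    return np.var(m_hat)

def rv_sis(X, y, d=None, seed=0):
    """RV-SIS with the soft thresholding rule: returns indices of the selected covariates."""
    n, p = X.shape
    d = d if d is not None else p        # number of auxiliary variables; a tuning choice of ours
    rng = np.random.default_rng(seed)
    h = n ** (-1 / 5)                    # illustrative bandwidth order
    # Step 1: RV-SIS statistic for each covariate (bandwidth scaled by the covariate's spread)
    stats = np.array([nw_variance(X[:, k], y, h * max(np.std(X[:, k]), 1e-12)) for k in range(p)])
    # Step 2: statistic for each auxiliary Unif(0, 1) variable, independent of (X, y) by construction
    Z = rng.uniform(size=(n, d))
    aux = np.array([nw_variance(Z[:, j], y, h) for j in range(d)])
    # Step 3: soft threshold = largest auxiliary statistic
    gamma = aux.max()
    return np.where(stats > gamma)[0]
```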
2.3. Sure Screening Properties
In this section, we show that RV-SIS possesses the sure screening property, which is fundamental for a feature screening procedure: it ensures that all active predictors are retained in the screened submodel with probability tending to one as the sample size increases. The following conditions are required for the technical proofs:
(C1) There exist positive constants such that (a) the moment generating function of the absolute value of the error term of each marginal regression is finite in a neighborhood of the origin, and (b) the moment generating function of the squared difference of two independent copies of each covariate is finite in a neighborhood of the origin.
(C2) The kernel $K(\cdot)$ has bounded support, is symmetric, and is Lipschitz continuous; that is, for some constant $L_K > 0$ and all $u_1, u_2$, $|K(u_1) - K(u_2)| \leq L_K |u_1 - u_2|$.
(C3) If $f_k$ denotes the marginal density of the kth predictor, then $f_k$ is bounded away from zero and bounded above on its support, and $f_k$ is uniformly continuous, for all $k = 1, \dots, p$.
(C4) The conditional expected value $m_k(x) = E(Y \mid X_k = x)$ is Lipschitz continuous for all k; that is, for some constant $L_m > 0$ and all $x_1, x_2$, $|m_k(x_1) - m_k(x_2)| \leq L_m |x_1 - x_2|$.
(C5) For some positive constants, the variances $\sigma_k^2$, $k \in \mathcal{A}$, of the marginal regression functions of the active predictors are bounded below by a positive sequence of polynomial order in n.
In words, Condition (C1) requires that the moment generating functions of the absolute value of the error terms of the marginal regressions and of the squared difference between two covariates are finite at least for some $t > 0$. Conditions (C2) and (C3) are standard conditions for establishing the uniform convergence rates needed for the kernel density estimator. Condition (C5) sets a lower bound on the variance of the marginal regression functions of the active predictors.
Theorem 2. Let $\sigma_k^2$, $\hat{m}_k$, $\mathcal{A}$ and $\hat{\mathcal{A}}$ be defined in (4), (5), (3) and (6), respectively.
1. Under conditions (C1)-(C4), for any $\epsilon > 0$ there exist positive constants such that the probability that $|\hat{\sigma}_k^2 - \sigma_k^2|$ exceeds $\epsilon$ is bounded, uniformly in k, by an exponentially small term in n.
2. Under conditions (C1)-(C5), and with the constants as in part 1, the probability that $\hat{\mathcal{A}}$ contains all of $\mathcal{A}$ is bounded below by one minus $|\mathcal{A}|$ times the same exponentially small term, where $|\mathcal{A}|$ is the cardinality of $\mathcal{A}$.
The second part of Theorem 2 shows that the screened submodel includes all active predictors with probability approaching 1 at an exponential rate.
3. Numerical Results
3.1. Simulation Studies
Here we present the results of several simulation studies comparing the performance of the SIS, DC-SIS, NIS, RRCS and RV-SIS methods. In all cases, the covariate vector is generated from a multivariate normal distribution with mean zero and covariance matrix $\Sigma$. We use three different covariance matrices, labeled (i), (ii), and (iii). We set the dimension of the covariates p to 2000 and the sample size n to 200. We replicate each experiment 500 times and base the comparisons on the following three criteria.
R1: The 5%, 25%, 50%, 75%, 95% quantiles of the minimum model size that includes all active covariates.
R2: The proportion of times each individual active covariate is selected in models of size d1, d2 and d3.
R3: The proportion of times all active covariates are selected in models of size d1, d2 and d3.
Criterion R1 reflects the quality of the ranking of the predictors produced by the different screening procedures. Criteria R2 and R3 reflect the accuracy of the different screening procedures when the hard-threshold model sizes suggested by Fan and Lv [2] are used.
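For reference, criterion R1 can be computed directly from the ranking produced by any screening statistic. The sketch below assumes the statistics are stored in a vector stats and the indices of the truly active covariates in active; these names are ours, for illustration only.

```python
import numpy as np

def minimum_model_size(stats, active):
    """Smallest model size (by the screening ranking) that contains all active covariates."""
    order = np.argsort(-stats)                # covariates ranked from largest to smallest statistic
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(stats) + 1)
    return int(ranks[list(active)].max())     # criterion R1 for one replication

# example: five covariates, active set {0, 2}
print(minimum_model_size(np.array([0.9, 0.1, 0.4, 0.8, 0.2]), [0, 2]))  # -> 3
```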
To compare the performance of the screening procedures in both linear and nonlinear settings, we used four models, labeled (a)-(d), each with four active covariates (X1, X2, X12 and X22). All models include an indicator variable. Model (a) is linear, model (b) includes an interaction of two active predictors, model (c) is additive but nonlinear, and model (d) is nonlinear with an interaction term. A data-generation sketch in this spirit follows.
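As an illustration of this setup (the paper's exact covariance matrices and model equations are not reproduced here), one could generate data along the following lines. The AR(1)-type correlation $\rho^{|i-j|}$ and the particular response function are placeholders of our own, not the paper's choices.

```python
import numpy as np

def simulate(n=200, p=2000, rho=0.5, seed=0):
    """Draw X ~ N(0, Sigma) with Sigma_ij = rho^|i-j| (illustrative) and a nonlinear response
    with an indicator and an interaction term, in the spirit of models (b)-(d)."""
    rng = np.random.default_rng(seed)
    sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n, method="cholesky")
    # hypothetical response: interaction of X1 and X2, an indicator in X12, a nonlinear term in X22
    y = np.sin(X[:, 0]) * X[:, 1] + 2.0 * (X[:, 11] > 0) + X[:, 21] ** 2 + rng.normal(size=n)
    return X, y   # active covariates: X1, X2, X12, X22 (0-based columns 0, 1, 11, 21)
```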
Tables 1-3 present the simulation results for R1 using each of the above models under covariance structures (i), (ii), and (iii), respectively. Tables 4-6 present the simulation results for R2 and R3 under covariance structures (i), (ii), and (iii), respectively.
These results show that the comparisons in terms of the three criteria are similar. All procedures perform worse when the equal-correlation covariance matrix is used. SIS and RRCS perform rather poorly except in Model (a), where all methods have similar performance. For Model (b), NIS performs slightly better than RV-SIS, while RV-SIS performs better than DC-SIS, with the size of the improvement ranging from modest to substantial depending on the covariance structure. In Models (c) and (d), DC-SIS and NIS have similar performance, but RV-SIS performs significantly better than either of them.
Finally, Table 7 presents the execution time, in seconds, of the DC-SIS, NIS and RV-SIS for Model (d). The RV-SIS procedure takes significantly less time than the DC-SIS and slightly less time than the NIS.
Table 1. The 5%, 25%, 50%, 75%, and 95% quantiles of the minimum model size that includes all active covariates under covariance structure (i).

| Model | Method | 5% | 25% | 50% | 75% | 95% |
| (a) | SIS | 4.00 | 4.00 | 4.00 | 5.00 | 7.00 |
| (a) | DC-SIS | 4.00 | 4.00 | 4.00 | 5.00 | 6.00 |
| (a) | NIS | 4.00 | 4.00 | 4.00 | 5.00 | 7.05 |
| (a) | RRCS | 4.00 | 4.00 | 4.00 | 5.00 | 6.00 |
| (a) | RV-SIS | 4.00 | 4.00 | 4.00 | 5.00 | 9.05 |
| (b) | SIS | 84.60 | 526.75 | 1179.00 | 1655.00 | 1923.35 |
| (b) | DC-SIS | 9.00 | 26.00 | 68.50 | 169.25 | 516.50 |
| (b) | NIS | 4.00 | 4.00 | 6.00 | 14.00 | 100.20 |
| (b) | RRCS | 214.85 | 786.50 | 1355.50 | 1708.75 | 1931.10 |
| (b) | RV-SIS | 4.00 | 4.00 | 7.00 | 22.00 | 273.20 |
| (c) | SIS | 232.00 | 853.50 | 1363.50 | 1689.75 | 1933.00 |
| (c) | DC-SIS | 103.95 | 316.25 | 565.00 | 860.00 | 1420.50 |
| (c) | NIS | 55.00 | 312.25 | 749.00 | 1264.25 | 1786.15 |
| (c) | RRCS | 255.65 | 929.00 | 1384.50 | 1732.25 | 1943.10 |
| (c) | RV-SIS | 5.00 | 15.00 | 62.50 | 277.00 | 1208.10 |
| (d) | SIS | 106.90 | 583.75 | 1149.50 | 1628.75 | 1930.00 |
| (d) | DC-SIS | 102.90 | 326.25 | 654.50 | 1069.00 | 1583.70 |
| (d) | NIS | 33.50 | 389.00 | 882.00 | 1463.25 | 1915.00 |
| (d) | RRCS | 231.55 | 832.25 | 1337.00 | 1678.25 | 1944.05 |
| (d) | RV-SIS | 6.00 | 20.00 | 89.00 | 327.25 | 1144.55 |
Table 2. The 5%, 25%, 50%, 75%, and 95% quantiles of the minimum model size that includes all active covariates under covariance structure (ii).

| Model | Method | 5% | 25% | 50% | 75% | 95% |
| (a) | SIS | 8.00 | 11.00 | 17.00 | 37.25 | 249.55 |
| (a) | DC-SIS | 6.00 | 9.00 | 12.00 | 17.00 | 76.05 |
| (a) | NIS | 6.00 | 9.00 | 13.00 | 26.00 | 153.40 |
| (a) | RRCS | 6.00 | 9.00 | 13.00 | 22.00 | 141.35 |
| (a) | RV-SIS | 5.00 | 8.00 | 11.00 | 26.25 | 146.60 |
| (b) | SIS | 29.90 | 256.50 | 924.00 | 1544.75 | 1935.05 |
| (b) | DC-SIS | 8.00 | 10.00 | 13.00 | 18.00 | 40.00 |
| (b) | NIS | 4.00 | 6.00 | 8.00 | 10.00 | 22.00 |
| (b) | RRCS | 111.60 | 502.50 | 1133.00 | 1636.00 | 1938.10 |
| (b) | RV-SIS | 4.00 | 6.00 | 7.00 | 10.00 | 32.05 |
| (c) | SIS | 93.90 | 520.25 | 1122.50 | 1647.25 | 1925.15 |
| (c) | DC-SIS | 40.95 | 148.00 | 334.00 | 629.00 | 1149.85 |
| (c) | NIS | 16.00 | 74.00 | 239.50 | 625.00 | 1454.60 |
| (c) | RRCS | 145.85 | 595.50 | 1207.00 | 1585.25 | 1930.10 |
| (c) | RV-SIS | 9.00 | 17.00 | 55.50 | 244.00 | 978.30 |
| (d) | SIS | 34.80 | 183.75 | 701.50 | 1449.25 | 1899.40 |
| (d) | DC-SIS | 31.90 | 142.00 | 344.00 | 675.25 | 1322.10 |
| (d) | NIS | 18.00 | 106.00 | 418.00 | 1111.00 | 1815.20 |
| (d) | RRCS | 83.90 | 373.50 | 979.50 | 1534.25 | 1930.05 |
| (d) | RV-SIS | 9.00 | 20.00 | 45.00 | 171.50 | 893.40 |
Table 3. The 5%, 25%, 50%, 75%, and 95% quantiles of the minimum model size that includes all active covariates under covariance structure (iii).

| Model | Method | 5% | 25% | 50% | 75% | 95% |
| (a) | SIS | 36.00 | 47.00 | 55.00 | 67.00 | 91.00 |
| (a) | DC-SIS | 36.95 | 47.00 | 55.00 | 67.00 | 95.00 |
| (a) | NIS | 36.00 | 47.00 | 55.00 | 67.00 | 90.10 |
| (a) | RRCS | 36.00 | 47.00 | 56.00 | 68.00 | 94.00 |
| (a) | RV-SIS | 37.95 | 47.00 | 55.00 | 67.00 | 94.00 |
| (b) | SIS | 145.95 | 232.00 | 411.00 | 910.75 | 1979.70 |
| (b) | DC-SIS | 122.00 | 196.75 | 322.00 | 651.25 | 1716.40 |
| (b) | NIS | 78.00 | 119.75 | 180.00 | 267.75 | 919.40 |
| (b) | RRCS | 150.95 | 273.00 | 449.00 | 1089.00 | 2000.00 |
| (b) | RV-SIS | 77.00 | 119.75 | 180.50 | 314.00 | 1164.50 |
| (c) | SIS | 151.90 | 245.75 | 417.00 | 806.50 | 1999.05 |
| (c) | DC-SIS | 132.90 | 227.00 | 367.50 | 732.25 | 1902.45 |
| (c) | NIS | 115.95 | 192.75 | 310.50 | 593.25 | 1713.60 |
| (c) | RRCS | 148.95 | 251.75 | 451.50 | 1013.00 | 2000.00 |
| (c) | RV-SIS | 69.00 | 99.00 | 142.00 | 208.00 | 456.05 |
| (d) | SIS | 29.00 | 41.00 | 54.00 | 84.00 | 160.05 |
| (d) | DC-SIS | 27.95 | 39.00 | 53.00 | 77.00 | 169.10 |
| (d) | NIS | 28.00 | 39.00 | 52.00 | 78.00 | 164.40 |
| (d) | RRCS | 27.95 | 39.00 | 53.00 | 82.00 | 181.35 |
| (d) | RV-SIS | 24.00 | 31.00 | 40.00 | 54.25 | 101.15 |
Table 4. The proportion of times each individual active covariate and all active covariates are selected in models of size d1, d2 and d3 under covariance structure (i).

| Model | Method | Size | X1 | X2 | X12 | X22 | ALL |
| (a) | SIS | d1 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 |
| (a) | SIS | d2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | SIS | d3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | DC-SIS | d1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | DC-SIS | d2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | DC-SIS | d3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | RRCS | d1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | RRCS | d2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | RRCS | d3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | NIS | d1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | NIS | d2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | NIS | d3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | RV-SIS | d1 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 |
| (a) | RV-SIS | d2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (a) | RV-SIS | d3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (b) | SIS | d1 | 0.08 | 0.08 | 1.00 | 1.00 | 0.03 |
| (b) | SIS | d2 | 0.12 | 0.14 | 1.00 | 1.00 | 0.05 |
| (b) | SIS | d3 | 0.16 | 0.17 | 1.00 | 1.00 | 0.06 |
| (b) | DC-SIS | d1 | 0.51 | 0.51 | 1.00 | 1.00 | 0.33 |
| (b) | DC-SIS | d2 | 0.68 | 0.68 | 1.00 | 1.00 | 0.52 |
| (b) | DC-SIS | d3 | 0.76 | 0.78 | 1.00 | 1.00 | 0.65 |
| (b) | RRCS | d1 | 0.03 | 0.04 | 1.00 | 1.00 | 0.01 |
| (b) | RRCS | d2 | 0.07 | 0.07 | 1.00 | 1.00 | 0.02 |
| (b) | RRCS | d3 | 0.09 | 0.10 | 1.00 | 1.00 | 0.02 |
| (b) | NIS | d1 | 0.95 | 0.93 | 1.00 | 1.00 | 0.89 |
| (b) | NIS | d2 | 0.97 | 0.96 | 1.00 | 1.00 | 0.94 |
| (b) | NIS | d3 | 0.98 | 0.97 | 1.00 | 1.00 | 0.96 |
| (b) | RV-SIS | d1 | 0.89 | 0.88 | 1.00 | 1.00 | 0.80 |
| (b) | RV-SIS | d2 | 0.94 | 0.93 | 1.00 | 1.00 | 0.88 |
| (b) | RV-SIS | d3 | 0.95 | 0.94 | 1.00 | 1.00 | 0.90 |
| (c) | SIS | d1 | 0.01 | 0.03 | 1.00 | 1.00 | 0.00 |
| (c) | SIS | d2 | 0.03 | 0.05 | 1.00 | 1.00 | 0.00 |
| (c) | SIS | d3 | 0.06 | 0.07 | 1.00 | 1.00 | 0.01 |
| (c) | DC-SIS | d1 | 0.04 | 0.13 | 1.00 | 1.00 | 0.01 |
| (c) | DC-SIS | d2 | 0.08 | 0.26 | 1.00 | 1.00 | 0.04 |
| (c) | DC-SIS | d3 | 0.12 | 0.37 | 1.00 | 1.00 | 0.06 |
| (c) | RRCS | d1 | 0.01 | 0.02 | 1.00 | 1.00 | 0.00 |
| (c) | RRCS | d2 | 0.04 | 0.03 | 1.00 | 1.00 | 0.00 |
| (c) | RRCS | d3 | 0.06 | 0.05 | 1.00 | 1.00 | 0.00 |
| (c) | NIS | d1 | 0.04 | 0.59 | 1.00 | 1.00 | 0.03 |
| (c) | NIS | d2 | 0.09 | 0.68 | 1.00 | 1.00 | 0.07 |
| (c) | NIS | d3 | 0.12 | 0.74 | 1.00 | 1.00 | 0.10 |
| (c) | RV-SIS | d1 | 0.94 | 0.42 | 1.00 | 1.00 | 0.40 |
| (c) | RV-SIS | d2 | 0.97 | 0.54 | 1.00 | 1.00 | 0.53 |
| (c) | RV-SIS | d3 | 0.99 | 0.60 | 1.00 | 1.00 | 0.59 |
| (d) | SIS | d1 | 0.05 | 0.10 | 1.00 | 0.99 | 0.02 |
| (d) | SIS | d2 | 0.08 | 0.17 | 1.00 | 1.00 | 0.04 |
| (d) | SIS | d3 | 0.11 | 0.21 | 1.00 | 1.00 | 0.06 |
| (d) | DC-SIS | d1 | 0.04 | 0.11 | 1.00 | 1.00 | 0.01 |
| (d) | DC-SIS | d2 | 0.08 | 0.24 | 1.00 | 1.00 | 0.03 |
| (d) | DC-SIS | d3 | 0.11 | 0.35 | 1.00 | 1.00 | 0.06 |
| (d) | RRCS | d1 | 0.04 | 0.04 | 1.00 | 1.00 | 0.01 |
| (d) | RRCS | d2 | 0.06 | 0.07 | 1.00 | 1.00 | 0.01 |
| (d) | RRCS | d3 | 0.08 | 0.10 | 1.00 | 1.00 | 0.02 |
| (d) | NIS | d1 | 0.09 | 0.35 | 1.00 | 0.99 | 0.05 |
| (d) | NIS | d2 | 0.14 | 0.41 | 1.00 | 1.00 | 0.09 |
| (d) | NIS | d3 | 0.19 | 0.46 | 1.00 | 1.00 | 0.12 |
| (d) | RV-SIS | d1 | 0.58 | 0.61 | 1.00 | 0.97 | 0.35 |
| (d) | RV-SIS | d2 | 0.71 | 0.69 | 1.00 | 0.98 | 0.49 |
| (d) | RV-SIS | d3 | 0.79 | 0.72 | 1.00 | 0.99 | 0.57 |
Table 5. The proportion of times each individual active covariate and all active covariates are selected in models of size d1, d2 and d3 under covariance structure (ii).

| Model | Method | Size | X1 | X2 | X12 | X22 | ALL |
| (a) | SIS | d1 | 1.00 | 1.00 | 0.75 | 1.00 | 0.75 |
| (a) | SIS | d2 | 1.00 | 1.00 | 0.85 | 1.00 | 0.85 |
| (a) | SIS | d3 | 1.00 | 1.00 | 0.89 | 1.00 | 0.89 |
| (a) | DC-SIS | d1 | 1.00 | 1.00 | 0.90 | 1.00 | 0.90 |
| (a) | DC-SIS | d2 | 1.00 | 1.00 | 0.95 | 1.00 | 0.95 |
| (a) | DC-SIS | d3 | 1.00 | 1.00 | 0.97 | 1.00 | 0.97 |
| (a) | RRCS | d1 | 1.00 | 1.00 | 0.84 | 1.00 | 0.84 |
| (a) | RRCS | d2 | 1.00 | 1.00 | 0.91 | 1.00 | 0.92 |
| (a) | RRCS | d3 | 1.00 | 1.00 | 0.94 | 1.00 | 0.94 |
| (a) | NIS | d1 | 1.00 | 1.00 | 0.82 | 1.00 | 0.82 |
| (a) | NIS | d2 | 1.00 | 1.00 | 0.91 | 1.00 | 0.92 |
| (a) | NIS | d3 | 1.00 | 1.00 | 0.94 | 1.00 | 0.94 |
| (a) | RV-SIS | d1 | 1.00 | 1.00 | 0.81 | 1.00 | 0.81 |
| (a) | RV-SIS | d2 | 1.00 | 1.00 | 0.90 | 1.00 | 0.90 |
| (a) | RV-SIS | d3 | 1.00 | 1.00 | 0.92 | 1.00 | 0.93 |
| (b) | SIS | d1 | 0.10 | 0.11 | 0.98 | 1.00 | 0.07 |
| (b) | SIS | d2 | 0.16 | 0.18 | 0.99 | 1.00 | 0.11 |
| (b) | SIS | d3 | 0.20 | 0.23 | 1.00 | 1.00 | 0.15 |
| (b) | DC-SIS | d1 | 0.97 | 0.96 | 1.00 | 1.00 | 0.94 |
| (b) | DC-SIS | d2 | 0.99 | 0.99 | 1.00 | 1.00 | 0.99 |
| (b) | DC-SIS | d3 | 0.99 | 1.00 | 1.00 | 1.00 | 0.99 |
| (b) | RRCS | d1 | 0.02 | 0.04 | 1.00 | 1.00 | 0.01 |
| (b) | RRCS | d2 | 0.07 | 0.08 | 1.00 | 1.00 | 0.03 |
| (b) | RRCS | d3 | 0.11 | 0.12 | 1.00 | 1.00 | 0.05 |
| (b) | NIS | d1 | 1.00 | 1.00 | 0.99 | 1.00 | 0.98 |
| (b) | NIS | d2 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
| (b) | NIS | d3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| (b) | RV-SIS | d1 | 1.00 | 1.00 | 0.96 | 1.00 | 0.96 |
| (b) | RV-SIS | d2 | 1.00 | 1.00 | 0.99 | 1.00 | 0.98 |
| (b) | RV-SIS | d3 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 |
| (c) | SIS | d1 | 0.03 | 0.05 | 0.99 | 1.00 | 0.02 |
| (c) | SIS | d2 | 0.07 | 0.09 | 1.00 | 1.00 | 0.05 |
| (c) | SIS | d3 | 0.09 | 0.11 | 1.00 | 1.00 | 0.06 |
| (c) | DC-SIS | d1 | 0.09 | 0.12 | 1.00 | 1.00 | 0.05 |
| (c) | DC-SIS | d2 | 0.17 | 0.24 | 1.00 | 1.00 | 0.11 |
| (c) | DC-SIS | d3 | 0.27 | 0.33 | 1.00 | 1.00 | 0.18 |
| (c) | RRCS | d1 | 0.02 | 0.03 | 0.99 | 1.00 | 0.01 |
| (c) | RRCS | d2 | 0.05 | 0.07 | 1.00 | 1.00 | 0.03 |
| (c) | RRCS | d3 | 0.07 | 0.09 | 1.00 | 1.00 | 0.04 |
| (c) | NIS | d1 | 0.16 | 0.55 | 1.00 | 1.00 | 0.14 |
| (c) | NIS | d2 | 0.29 | 0.68 | 1.00 | 1.00 | 0.26 |
| (c) | NIS | d3 | 0.38 | 0.74 | 1.00 | 1.00 | 0.35 |
| (c) | RV-SIS | d1 | 0.99 | 0.43 | 1.00 | 1.00 | 0.43 |
| (c) | RV-SIS | d2 | 1.00 | 0.54 | 1.00 | 1.00 | 0.55 |
| (c) | RV-SIS | d3 | 1.00 | 0.62 | 1.00 | 1.00 | 0.62 |
| (d) | SIS | d1 | 0.09 | 0.20 | 1.00 | 0.86 | 0.06 |
| (d) | SIS | d2 | 0.17 | 0.27 | 1.00 | 0.94 | 0.15 |
| (d) | SIS | d3 | 0.23 | 0.31 | 1.00 | 0.96 | 0.19 |
| (d) | DC-SIS | d1 | 0.07 | 0.20 | 1.00 | 0.99 | 0.06 |
| (d) | DC-SIS | d2 | 0.17 | 0.34 | 1.00 | 0.99 | 0.15 |
| (d) | DC-SIS | d3 | 0.24 | 0.44 | 1.00 | 1.00 | 0.21 |
| (d) | RRCS | d1 | 0.03 | 0.09 | 1.00 | 0.96 | 0.02 |
| (d) | RRCS | d2 | 0.07 | 0.14 | 1.00 | 0.98 | 0.05 |
| (d) | RRCS | d3 | 0.10 | 0.19 | 1.00 | 0.99 | 0.08 |
| (d) | NIS | d1 | 0.20 | 0.38 | 1.00 | 0.88 | 0.13 |
| (d) | NIS | d2 | 0.28 | 0.46 | 1.00 | 0.95 | 0.21 |
| (d) | NIS | d3 | 0.34 | 0.50 | 1.00 | 0.96 | 0.26 |
| (d) | RV-SIS | d1 | 0.88 | 0.60 | 1.00 | 0.83 | 0.45 |
| (d) | RV-SIS | d2 | 0.94 | 0.70 | 1.00 | 0.92 | 0.61 |
| (d) | RV-SIS | d3 | 0.96 | 0.76 | 1.00 | 0.94 | 0.69 |
Table 6. The proportion of times each individual active covariate and all active covariates are selected in models of size d1, d2 and d3 under covariance structure (iii).

| Model | Method | Size | X1 | X2 | X12 | X22 | ALL |
| (a) | SIS | d1 | 0.09 | 0.09 | 0.64 | 1.00 | 0.06 |
| (a) | SIS | d2 | 0.85 | 0.85 | 1.00 | 1.00 | 0.86 |
| (a) | SIS | d3 | 0.99 | 0.99 | 1.00 | 1.00 | 0.99 |
| (a) | DC-SIS | d1 | 0.13 | 0.12 | 0.43 | 1.00 | 0.06 |
| (a) | DC-SIS | d2 | 0.86 | 0.86 | 0.99 | 1.00 | 0.86 |
| (a) | DC-SIS | d3 | 0.98 | 0.98 | 1.00 | 1.00 | 0.99 |
| (a) | RRCS | d1 | 0.13 | 0.12 | 0.40 | 1.00 | 0.05 |
| (a) | RRCS | d2 | 0.87 | 0.86 | 0.98 | 1.00 | 0.86 |
| (a) | RRCS | d3 | 0.99 | 0.99 | 1.00 | 1.00 | 0.99 |
| (a) | NIS | d1 | 0.09 | 0.09 | 0.71 | 1.00 | 0.07 |
| (a) | NIS | d2 | 0.87 | 0.86 | 1.00 | 1.00 | 0.87 |
| (a) | NIS | d3 | 0.98 | 0.98 | 1.00 | 1.00 | 0.98 |
| (a) | RV-SIS | d1 | 0.10 | 0.09 | 0.73 | 1.00 | 0.07 |
| (a) | RV-SIS | d2 | 0.85 | 0.85 | 1.00 | 1.00 | 0.85 |
| (a) | RV-SIS | d3 | 0.98 | 0.98 | 1.00 | 1.00 | 0.98 |
| (b) | SIS | d1 | 0.00 | 0.00 | 0.13 | 1.00 | 0.00 |
| (b) | SIS | d2 | 0.00 | 0.00 | 0.92 | 1.00 | 0.00 |
| (b) | SIS | d3 | 0.01 | 0.01 | 1.00 | 1.00 | 0.01 |
| (b) | DC-SIS | d1 | 0.00 | 0.00 | 0.08 | 1.00 | 0.00 |
| (b) | DC-SIS | d2 | 0.00 | 0.00 | 0.86 | 1.00 | 0.00 |
| (b) | DC-SIS | d3 | 0.03 | 0.03 | 0.99 | 1.00 | 0.03 |
| (b) | RRCS | d1 | 0.00 | 0.00 | 0.07 | 1.00 | 0.00 |
| (b) | RRCS | d2 | 0.00 | 0.00 | 0.84 | 1.00 | 0.00 |
| (b) | RRCS | d3 | 0.01 | 0.01 | 0.99 | 1.00 | 0.01 |
| (b) | NIS | d1 | 0.00 | 0.00 | 0.27 | 1.00 | 0.00 |
| (b) | NIS | d2 | 0.04 | 0.04 | 0.96 | 1.00 | 0.04 |
| (b) | NIS | d3 | 0.22 | 0.21 | 1.00 | 1.00 | 0.23 |
| (b) | RV-SIS | d1 | 0.00 | 0.00 | 0.31 | 1.00 | 0.00 |
| (b) | RV-SIS | d2 | 0.04 | 0.04 | 0.97 | 1.00 | 0.04 |
| (b) | RV-SIS | d3 | 0.21 | 0.21 | 1.00 | 1.00 | 0.22 |
| (c) | SIS | d1 | 0.00 | 0.00 | 0.15 | 1.00 | 0.00 |
| (c) | SIS | d2 | 0.00 | 0.00 | 0.91 | 1.00 | 0.00 |
| (c) | SIS | d3 | 0.01 | 0.01 | 1.00 | 1.00 | 0.01 |
| (c) | DC-SIS | d1 | 0.00 | 0.00 | 0.07 | 1.00 | 0.00 |
| (c) | DC-SIS | d2 | 0.00 | 0.00 | 0.83 | 1.00 | 0.00 |
| (c) | DC-SIS | d3 | 0.02 | 0.03 | 0.97 | 1.00 | 0.03 |
| (c) | RRCS | d1 | 0.00 | 0.00 | 0.07 | 1.00 | 0.00 |
| (c) | RRCS | d2 | 0.00 | 0.00 | 0.81 | 1.00 | 0.00 |
| (c) | RRCS | d3 | 0.02 | 0.02 | 0.96 | 1.00 | 0.02 |
| (c) | NIS | d1 | 0.00 | 0.00 | 0.27 | 1.00 | 0.00 |
| (c) | NIS | d2 | 0.00 | 0.00 | 0.95 | 1.00 | 0.00 |
| (c) | NIS | d3 | 0.05 | 0.05 | 1.00 | 1.00 | 0.05 |
| (c) | RV-SIS | d1 | 0.00 | 0.00 | 0.32 | 1.00 | 0.00 |
| (c) | RV-SIS | d2 | 0.09 | 0.09 | 0.96 | 1.00 | 0.09 |
| (c) | RV-SIS | d3 | 0.32 | 0.32 | 1.00 | 1.00 | 0.34 |
| (d) | SIS | d1 | 0.19 | 0.19 | 1.00 | 1.00 | 0.18 |
| (d) | SIS | d2 | 0.71 | 0.71 | 1.00 | 1.00 | 0.72 |
| (d) | SIS | d3 | 0.88 | 0.88 | 1.00 | 1.00 | 0.88 |
| (d) | DC-SIS | d1 | 0.22 | 0.24 | 1.00 | 1.00 | 0.22 |
| (d) | DC-SIS | d2 | 0.74 | 0.74 | 1.00 | 1.00 | 0.74 |
| (d) | DC-SIS | d3 | 0.87 | 0.87 | 1.00 | 1.00 | 0.88 |
| (d) | RRCS | d1 | 0.24 | 0.23 | 1.00 | 1.00 | 0.23 |
| (d) | RRCS | d2 | 0.72 | 0.72 | 1.00 | 1.00 | 0.72 |
| (d) | RRCS | d3 | 0.86 | 0.86 | 1.00 | 1.00 | 0.86 |
| (d) | NIS | d1 | 0.22 | 0.21 | 1.00 | 1.00 | 0.21 |
| (d) | NIS | d2 | 0.73 | 0.74 | 1.00 | 1.00 | 0.74 |
| (d) | NIS | d3 | 0.88 | 0.88 | 1.00 | 1.00 | 0.88 |
| (d) | RV-SIS | d1 | 0.45 | 0.45 | 1.00 | 1.00 | 0.44 |
| (d) | RV-SIS | d2 | 0.87 | 0.87 | 1.00 | 1.00 | 0.87 |
| (d) | RV-SIS | d3 | 0.97 | 0.97 | 1.00 | 1.00 | 0.97 |
Table 7. The 5%, 25%, 50%, 75%, and 95% quantiles of the execution time, in seconds, of DC-SIS, NIS and RV-SIS for Model (d).

| Method | 5% | 25% | 50% | 75% | 95% |
| DC-SIS | 18.92 | 19.17 | 19.30 | 19.45 | 19.86 |
| NIS | 2.32 | 2.35 | 2.36 | 2.38 | 2.46 |
| RV-SIS | 1.81 | 1.82 | 1.82 | 1.83 | 1.90 |
3.2. Thresholding Simulation
In this section, we use simulations to compare the soft thresholding rule with hard thresholding approaches for selecting the submodel. We consider three models, (e), (f), and (g), relating the response Y to the covariates, with 25, 10, and 5 active covariates, respectively. The covariate vector has a p-variate normal distribution with mean zero and covariance matrix $\Sigma$, and the coefficients of the active covariates were randomly generated from the uniform distribution on (1, 2.5) and kept fixed throughout the simulation. From each of these models, we generated 500 data sets.
For the soft thresholding approach, we randomly generate the auxiliary variables $Z_1, \dots, Z_d$, where the $Z_j$ are independent Unif(0, 1). For the hard thresholding approach we consider three model sizes, d1, d2 and d3. The two approaches are compared in terms of the proportion of times each active covariate is selected. We also record the 5%, 25%, 50%, 75% and 95% quantiles of the submodel size obtained with the soft thresholding rule; a minimal code sketch of the two rules is given below.
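In code, the two rules differ only in how the cutoff is chosen. The sketch below (variable names ours) assumes the screening statistics of the p covariates and of the d auxiliary variables have already been computed.

```python
import numpy as np

def hard_threshold(stats, model_size):
    """Keep the model_size covariates with the largest screening statistics."""
    return np.argsort(-stats)[:model_size]

def soft_threshold(stats, aux_stats):
    """Keep every covariate whose statistic exceeds the largest auxiliary statistic."""
    return np.where(stats > aux_stats.max())[0]
```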
The 5%, 25%, 50%, 75% and 95% quantiles of the submodel size using the soft thresholding rule for Models (e), (f) and (g) are presented in Table 11. The proportion of times each of the active covariates is selected under the different approaches for Models (e), (f) and (g) is shown in Tables 8-10, respectively.
From Table 11, it is seen that all percentiles of the submodel size decrease as the number of active covariates decreases; this is an attractive feature of the soft thresholding approach. Also, for all models, the median submodel size falls between d1 and d2, but is always closer to d1. Regarding the proportion of times each active predictor is included in the submodel, Table 8 and Table 9 show that soft thresholding outperforms hard thresholding with d1 in Model (e) but does slightly worse in Model (f), while hard thresholding with d2 or d3 outperforms soft thresholding. Finally, Table 10 shows that all active predictors were selected 100% of the time by all approaches.
Table 8. The proportion of times each individual active covariate is selected in models of size d1, d2 and d3, and under the soft thresholding rule, for Model (e).

| Covariate | Hard (d1) | Hard (d2) | Hard (d3) | Soft |
| X1 | 0.718 | 0.924 | 0.884 | 0.746 |
| X2 | 0.786 | 0.868 | 0.920 | 0.816 |
| X3 | 0.806 | 0.974 | 0.928 | 0.844 |
| X4 | 0.880 | 0.950 | 0.958 | 0.898 |
| X5 | 0.878 | 0.950 | 0.952 | 0.894 |
| X6 | 0.910 | 0.986 | 0.968 | 0.920 |
| X7 | 0.582 | 0.840 | 0.776 | 0.650 |
| X8 | 0.512 | 0.834 | 0.750 | 0.602 |
| X9 | 0.704 | 0.816 | 0.862 | 0.742 |
| X10 | 0.918 | 0.970 | 0.970 | 0.928 |
| X11 | 0.862 | 0.884 | 0.946 | 0.892 |
| X12 | 0.744 | 0.858 | 0.902 | 0.800 |
| X13 | 0.934 | 0.922 | 0.982 | 0.942 |
| X14 | 0.892 | 0.894 | 0.966 | 0.918 |
| X15 | 0.882 | 0.890 | 0.964 | 0.900 |
| X16 | 0.944 | 0.918 | 0.990 | 0.954 |
| X17 | 0.770 | 0.796 | 0.904 | 0.788 |
| X18 | 0.730 | 0.792 | 0.876 | 0.770 |
| X19 | 0.688 | 0.754 | 0.878 | 0.750 |
| X20 | 0.930 | 0.920 | 0.984 | 0.942 |
| X21 | 0.960 | 0.986 | 0.992 | 0.966 |
| X22 | 0.948 | 0.978 | 0.984 | 0.952 |
| X23 | 0.974 | 0.988 | 0.988 | 0.970 |
| X24 | 0.888 | 0.952 | 0.960 | 0.918 |
| X25 | 0.388 | 0.554 | 0.634 | 0.462 |
Table 9. The proportion of times each individual active covariate is selected in models of size d1, d2 and d3, and under the soft thresholding rule, for Model (f).

| Covariate | Hard (d1) | Hard (d2) | Hard (d3) | Soft |
| X1 | 1.000 | 1.000 | 1.000 | 1.000 |
| X2 | 1.000 | 1.000 | 1.000 | 1.000 |
| X3 | 1.000 | 1.000 | 1.000 | 1.000 |
| X4 | 1.000 | 1.000 | 1.000 | 1.000 |
| X5 | 1.000 | 1.000 | 1.000 | 1.000 |
| X6 | 1.000 | 1.000 | 1.000 | 1.000 |
| X7 | 1.000 | 1.000 | 1.000 | 0.994 |
| X8 | 0.976 | 0.990 | 0.996 | 0.968 |
| X9 | 0.986 | 0.992 | 0.996 | 0.978 |
| X10 | 0.988 | 0.994 | 1.000 | 0.986 |
Table 10. The proportion of times each individual active covariate is selected in models of size d1, d2 and d3, and under the soft thresholding rule, for Model (g).

| Covariate | Hard (d1) | Hard (d2) | Hard (d3) | Soft |
| X1 | 1.000 | 1.000 | 1.000 | 1.000 |
| X2 | 1.000 | 1.000 | 1.000 | 1.000 |
| X3 | 1.000 | 1.000 | 1.000 | 1.000 |
| X4 | 1.000 | 1.000 | 1.000 | 1.000 |
| X5 | 1.000 | 1.000 | 1.000 | 1.000 |
Table 11. The 5%, 25%, 50%, 75%, and 95% quantiles of the submodel size using the soft thresholding rule for Models (e), (f), and (g).

| Model | 5% | 25% | 50% | 75% | 95% |
| (e) | 20.00 | 37.00 | 53.00 | 75.00 | 116.00 |
| (f) | 13.00 | 26.00 | 43.00 | 66.25 | 109.00 |
| (g) | 8.00 | 19.75 | 38.00 | 62.00 | 100.15 |
3.3. A Real Data Example
Here we apply the DC-SIS, NIS and RV-SIS methods to identify the most influential genes for over-expression of a G protein-coupled receptor (Ro1) in mice, using the cardiomyopathy microarray dataset [10]. In this data set, which has also been used in [4], the covariates correspond to the expression levels of different genes. Figure 1 shows the scatterplots of the expression levels of two genes versus Ro1, with fitted cubic spline curves. Because these curves, which are typical for most genes, suggest nonlinear effects, we did not apply SIS to these data.
Figure 1. The spline curves of Msa.2877.0 and Msa.741.0.
The top two most influential genes identified by RV-SIS, DC-SIS and NIS are (Msa.2877.0, Msa.741.0), (Msa.2134.0, Msa.2877.0) and (Msa.2877.0, Msa.1166.0), respectively. To compare the models chosen by the three methods, we fit a semiparametric single index model (SIM) of the form $Y = g(\beta_1 X_{(1)} + \beta_2 X_{(2)}) + \epsilon$, where $(X_{(1)}, X_{(2)})$ are the top two variables chosen by RV-SIS, DC-SIS and NIS, respectively, and use the nonparametric coefficient of determination R2; see [11]. The R2-values achieved by RV-SIS, DC-SIS and NIS are 0.927, 0.976 and 0.844, respectively.
The top four most influential genes identified by RV-SIS, DC-SIS and NIS are (Msa.2877.0, Msa.741.0, Msa.1166.0, Msa.26025.0), (Msa.2134.0, Msa.2877.0, Msa.26025.0, Msa.5583.0) and (Msa.2877.0, Msa.1166.0, Msa.741.0, Msa.18571.0), respectively. Fitting semiparametric SIMs again, we obtain R2-values of 0.9995776, 0.9990484 and 0.9290883 for RV-SIS, DC-SIS and NIS, respectively.
It is seen that, though the selected sets of variables are not identical, RV-SIS and DC-SIS have similar behavior in terms of the nonparametric R2 criterion, while NIS does somewhat worse.
Kim et al. [12] analyzed the ovarian cancer data from The Cancer Genome Atlas (TCGA) to identify the genes important for predicting ovarian cancer. This data set consists of 258 subjects and 12,042 gene expressions. We apply the RV-SIS and NIS procedures to identify the gene expressions most influential for predicting ovarian cancer.
The submodel is selected by the soft thresholding rule for RV-SIS, and for NIS by a data-driven threshold obtained by permuting Y and using the 99.9th quantile of the resulting screening statistics. The submodel contains 12 covariates for the RV-SIS procedure and 9 covariates for the NIS procedure. We used the top 12 covariates from RV-SIS and the top 9 and 12 covariates from NIS to compare performance. Using the submodels selected by RV-SIS and NIS, we fit a logistic regression, a random forest, and Klein and Spady's binary choice estimator (KS) for classification, and record the overall correct classification ratio, specificity, and sensitivity.
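A sketch of this classification comparison, using scikit-learn for the logistic regression and random forest fits (the Klein-Spady estimator is not part of scikit-learn and is omitted), is given below. The names X_sub and y denote the selected submodel and the binary outcome, and the use of 5-fold cross-validated predictions is our choice; the paper's exact validation scheme is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

def classification_summary(model, X_sub, y):
    """Overall correct classification ratio, specificity, and sensitivity for one classifier."""
    pred = cross_val_predict(model, X_sub, y, cv=5)          # cross-validated class predictions
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()        # y assumed coded 0/1
    overall = (tp + tn) / len(y)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    return overall, specificity, sensitivity

# e.g. classification_summary(LogisticRegression(max_iter=1000), X_sub, y)
#      classification_summary(RandomForestClassifier(n_estimators=500), X_sub, y)
```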
Table 12 shows that RV-SIS combined with Klein and Spady's binary choice estimator performs best in terms of overall classification, specificity, and sensitivity.
4. Discussion
In this article, we propose the screening procedure RV-SIS in a general nonparametric setting. Using a soft thresholding rule for the size of the submodel, it is shown that RV-SIS possesses the sure screening property.
Table 12. Overall classification rate, specificity, and sensitivity for the submodels selected by RV-SIS and NIS, combined with the Klein-Spady (KS), logistic regression, and random forest (RF) classifiers (submodel size in parentheses).

| Model | Overall Classification | Specificity | Sensitivity |
| RV-SIS-KS (12) | 0.794 | 0.664 | 0.891 |
| NIS-KS (12) | 0.755 | 0.582 | 0.885 |
| NIS-KS (9) | 0.720 | 0.582 | 0.824 |
| RV-SIS-Logistic (12) | 0.713 | 0.600 | 0.797 |
| NIS-Logistic (12) | 0.689 | 0.554 | 0.790 |
| NIS-Logistic (9) | 0.682 | 0.563 | 0.770 |
| RV-SIS-RF (12) | 0.733 | 0.655 | 0.791 |
| NIS-RF (12) | 0.744 | 0.636 | 0.824 |
| NIS-RF (9) | 0.713 | 0.655 | 0.757 |
RV-SIS uses the variance of the marginal regression function in order to rank the predictors. Compared to rankings based on a measure of marginal correlation, the advantage of this ranking is that predictors are ranked according to their predictive significance. Simulations suggest that RV-SIS is more efficient in selecting predictors which influence the response in a nonlinear or nonmonotone fashion; on the other hand, RV-SIS will not select covariates that influence other aspects of the conditional distribution of the response, such as the variance function. The execution time for RV-SIS is competitive compared to other nonparametric methods, making RV-SIS a good candidate for applications to ultrahigh-dimensional data.
One issue of practical importance is the choice of the submodel size. Our simulations suggest that soft thresholding has competitive performance compared to hard thresholding. Moreover, soft thresholding provides an upper bound on the probability of more than r false discoveries. However, thresholding rules do not make a direct link to the false discovery rate. Doing so requires selecting the submodel by suitably determining the cutoff value for the ranking criterion based on its asymptotic distribution. This problem will be addressed in future research.
Similar to other existing screening procedures, RV-SIS relies on a marginal measure of association between each covariate and the response for ranking the predictors. Because of this, predictors that are influential jointly but not marginally will not be identified. [13] proposed a process of resuscitation in their partition method for identifying influential predictors that are not identified by marginal observable effects. Resuscitation can also be accomplished by extending the RV-SIS procedure to suitably obtained residuals. This will also be addressed in future research.
Appendix
A1. Some Lemmas
In all that follows, $f$ is a generic notation for any of the marginal densities $f_k$, $k = 1, \dots, p$. Lemmas 1, 2, 3, and 4 are used to prove Theorem 2.
Lemma 1. Let X be a random variable whose moment generating function $E\{\exp(tX)\}$ is finite for t in a neighborhood of the origin. Then the tail probabilities of X admit exponential bounds; in particular, if $E(X) = 0$, the probability that $|X|$ exceeds a fixed level is exponentially small.
Proof. It follows directly from Theorem 5.6.1.A of [14] (p. 201). □
Lemma 2. Let $\hat{f}$ be the kernel density estimator of $f$. Under conditions (C2) and (C3), and with the bandwidth of the order stated in Section 2.1, $\hat{f}$ converges to $f$ uniformly, with the probability of a deviation larger than any fixed $\epsilon > 0$ bounded by an exponentially small term.
Proof. It follows by writing $\hat{f}(x) - f(x) = \{\hat{f}(x) - E\hat{f}(x)\} + \{E\hat{f}(x) - f(x)\}$, using Theorem 5 of [15] to bound the first (stochastic) term uniformly, and bounding the second (bias) term by a direct calculation. □
Lemma 3. Let $W_{ni}(x)$ denote the weight function of the Nadaraya-Watson estimator. Then, under the same assumptions as in Lemma 2, the weights admit a corresponding uniform bound.
Proof. Noting that K is a symmetric kernel function, the result follows easily from Lemma 2. □
Lemma 4. Under conditions (C1)-(a), (C2), (C3), and (C4), and for any $\epsilon > 0$, there exist positive constants $c_1$ and $c_2$ such that the probability that the Nadaraya-Watson estimator $\hat{m}_k(X_{ik})$ deviates from $m_k(X_{ik})$ by more than $\epsilon$ is bounded by an exponentially small term, uniformly in i and k.
Proof. By adding and subtracting the smoothed true regression function, the deviation is bounded by the sum of two terms, A and B, where A involves the errors of the marginal regression and B involves the differences $m_k(X_{jk}) - m_k(X_{ik})$ over design points receiving positive weight; the dependence of A and B on i is suppressed for convenience. Consider first A. The errors are truncated at a level M, which is allowed to tend to infinity with n. Arguing conditionally on the covariates, and using Markov's inequality and Lemma 1, the contribution of the truncated errors is exponentially small, so the same bound also holds unconditionally for each i. For the tail part of A, the Cauchy-Schwarz and Markov inequalities together with condition (C1)-(a) and Lemma 2 show that its conditional expectation is bounded by a fraction of $\epsilon$ for n large enough; moreover, the event that some error exceeds M in absolute value has exponentially small probability by condition (C1)-(a), since otherwise all errors would be bounded by M. Choosing M to grow at a suitable rate with n yields the bound (9).
Consider now part B. By condition (C4), the Lipschitz continuity of $m_k$, together with the bounded support of the kernel, B is less than $\epsilon/2$ for n large enough, which gives (10).
Therefore, by (9) and (10), for all n large enough the deviation exceeds $\epsilon$ only on an event of exponentially small probability. It follows that, under conditions (C1)-(a), (C2), (C3), and (C4), and for any $\epsilon > 0$, there exist positive constants $c_1$ and $c_2$ such that the stated bound holds, after substituting the assumed order of the bandwidth for h. □
A2. Proof of Theorem 2
Part 1. Write $\hat{\sigma}_k^2 - \sigma_k^2$ as a sum of terms involving, on the one hand, the differences $\hat{m}_k(X_{ik}) - m_k(X_{ik})$ and, on the other, centered sample averages of functions of the data. For convenience in notation, we omit the subscript k from $\hat{m}_k$, $m_k$, $\hat{\sigma}_k^2$ and $\sigma_k^2$ for the rest of this proof. The terms involving $\hat{m}(X_i) - m(X_i)$ exceed a fixed fraction of $\epsilon$ only with exponentially small probability; the corresponding inequalities all follow by Lemma 4, and combining them gives (11).
Consider now the remaining term, and let $h(\cdot, \cdot)$ be the kernel of the corresponding U-statistic $U_n$. For a constant M, we decompose h into a part truncated at M and a tail part, and accordingly decompose $U_n$ as $U_n = U_{n1} + U_{n2}$; this yields the inequality (12). By Lemma 1, an exponential bound holds for the deviation probability of the truncated part, giving (13). Next, using the representation of a U-statistic as an average, over all permutations of the indices, of averages of $\lfloor n/2 \rfloor$ i.i.d. random variables (cf. Serfling [14], pp. 180-181), together with Jensen's inequality, and substituting the result in (13), we obtain the bound (14) for $U_{n1}$.
Consider now $U_{n2}$ given in (12). By the Cauchy-Schwarz and Markov inequalities, and by choosing M to grow at a suitable rate with n, condition (C1)-(b) implies that the expectation of the tail part is negligible for n sufficiently large. Thus, for n large enough, it remains to bound the probability that the tail part is nonzero, and by Markov's inequality and condition (C1)-(b) this probability is exponentially small, which gives (15). Combining (12), (14) and (15), we obtain (16) for some positive constants.
By (11) and (16), the probability that $|\hat{\sigma}^2 - \sigma^2|$ exceeds $\epsilon$ is bounded by an exponentially small term; the final form of the bound follows by choosing the truncation level M to grow at an appropriate rate with n.
Part 2. If $\mathcal{A} \not\subseteq \hat{\mathcal{A}}$, then there must exist some $k \in \mathcal{A}$ such that $\hat{\sigma}_k^2$ falls below the threshold in (6). It follows from condition (C5) that, for such a k, $|\hat{\sigma}_k^2 - \sigma_k^2|$ must exceed a positive quantity. Using part 1 of this theorem and a union bound over $k \in \mathcal{A}$, the probability of this event is bounded by $|\mathcal{A}|$ times an exponentially small term, which completes the proof. □