1. Introduction
Consider the linear regression model
$$y_i = x_i^\top \theta_0 + u_i, \qquad i = 1, \dots, n, \qquad (1)$$
where $y_i \in \mathbb{R}$ is the response variable, $x_i \in \mathbb{R}^p$ is a vector of covariates, $u_i$ is the error term, and $\theta_0 \in \mathbb{R}^p$ is the unknown parameter of interest. We focus on adaptive lasso estimators defined as the solution of
$$\hat\theta_n = \arg\min_{\theta \in \mathbb{R}^p} \left\{ \sum_{i=1}^n (y_i - x_i^\top \theta)^2 + \sum_{j=1}^p \frac{|\theta_j|}{|\tilde\theta_{n,j}|^\gamma} \right\}, \qquad (2)$$
where $\theta = (\theta_1, \dots, \theta_p)^\top$, $\gamma > 0$, and $\tilde\theta_n = (\tilde\theta_{n,1}, \dots, \tilde\theta_{n,p})^\top$ is a $\sqrt{n}$-consistent preliminary estimator of $\theta_0$, with $\tilde\theta_n - \theta_0 = O_p(n^{-1/2})$. Adaptive lasso estimators were introduced in [1] as refinements of the lasso approach proposed in [2]. Indeed, the adaptive lasso fulfils the oracle properties in the sense introduced in [3]. Note that in the seminal definition of the adaptive lasso, the penalization term in Equation (2) is multiplied by a tuning parameter $\lambda_n$. In our study, we simply set $\lambda_n = 1$, since our asymptotic results do not require that $\lambda_n \to \infty$.
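As a concrete illustration of the estimator defined in (2), the penalized least-squares problem can be solved by cyclic coordinate descent with soft thresholding. The sketch below is not from the paper: the data-generating design, the OLS pilot estimator, and the choice of exponent are illustrative assumptions.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator: sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def adaptive_lasso(X, y, theta_tilde, gamma, n_iter=200):
    """Coordinate descent for Equation (2):
    minimize sum_i (y_i - x_i' theta)^2 + sum_j |theta_j| / |theta_tilde_j|^gamma.
    """
    n, p = X.shape
    w = 1.0 / np.abs(theta_tilde) ** gamma      # adaptive penalty weights
    theta = theta_tilde.copy()
    col_ss = (X ** 2).sum(axis=0)               # sum_i x_{ij}^2
    for _ in range(n_iter):
        for j in range(p):
            # correlation of x_j with the partial residual (all other coordinates fixed)
            r_j = X[:, j] @ (y - X @ theta) + col_ss[j] * theta[j]
            theta[j] = soft_threshold(r_j, w[j] / 2.0) / col_ss[j]
    return theta

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 2))
theta_0 = np.array([2.0, 0.0])                  # second coefficient is exactly zero
y = X @ theta_0 + 0.5 * rng.standard_normal(n)

theta_tilde = np.linalg.lstsq(X, y, rcond=None)[0]   # sqrt(n)-consistent OLS pilot
theta_hat = adaptive_lasso(X, y, theta_tilde, gamma=2.0)
print(theta_hat)   # the zero coefficient is thresholded exactly to zero
```

Because the pilot estimate of the zero coefficient is small, its penalty weight is large and the corresponding adaptive lasso coordinate collapses exactly to zero, while the nonzero coefficient is essentially unpenalized.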
Since their introduction in [2], the statistical properties of lasso estimators, the solution of (2) with $\gamma = 0$, have been analyzed in several studies. As shown in [2], the lasso may combine estimation and variable selection. However, as pointed out in [4] and [5], among others, lasso estimators require certain irrepresentable conditions to achieve correct model selection. In general, the lasso selects more variables than the number of truly relevant variables. To overcome this problem, [1] introduces the adaptive lasso estimators defined in (2), and shows that they can achieve the oracle properties; see, e.g., [2], [6], and [7]. More precisely, adaptive lasso estimators may combine efficient parameter estimation and correct variable selection in one step. Following this intuition, several recent studies propose penalized estimators with these desirable properties in different contexts; see, e.g., [8], [9], [10], and [11]. We refer to [12] and [13] for a detailed discussion of lasso estimators.
In this paper, we extend the analysis of the statistical properties of lasso estimators by studying the asymptotic properties of adaptive lasso estimators when some components of the unknown parameter $\theta_0$ are strictly different from zero, while other components may be zero or may converge to zero at the rate $n^{-\delta}$, with $\delta \in (0, 1/2]$. To achieve this objective, we analyze the convergence/divergence rates of each term in the first-order conditions of adaptive lasso estimators. We derive conditions that allow selecting the tuning parameter $\gamma$ as a function of the $\sqrt{n}$-consistent estimator $\tilde\theta_n$, in order to ensure that the adaptive lasso estimates of the $n^{-\delta}$-coefficients of the unknown parameter $\theta_0$ indeed collapse to zero. Furthermore, in this case, we also derive the asymptotic properties of adaptive lasso estimators for the nonzero components. These results allow practitioners to exclude from their model the asymptotically negligible variables and make inferences on the asymptotically relevant variables.
2. Asymptotic Properties
Before presenting the main results, we formally introduce the following assumptions.
Assumption 1
1) The parameter $\theta_0 = (\alpha_0^\top, \beta_n^\top)^\top \in \mathbb{R}^p$, where $\alpha_0 = (\alpha_{0,1}, \dots, \alpha_{0,p_1})^\top$ with $\alpha_{0,j} \neq 0$ for $j = 1, \dots, p_1$, and $\beta_n = c\, n^{-\delta} \in \mathbb{R}^{p_2}$, with $c \in \mathbb{R}^{p_2}$ and $\delta \in (0, 1/2]$.
2) The preliminary estimator $\tilde\theta_n$ satisfies $n^{\kappa}(\tilde\theta_n - \theta_0) = O_p(1)$, for some $\kappa \geq 1/2$.
Assumption 1 1) implies that some components of the unknown parameter $\theta_0$ are strictly different from zero, while other components may be zero or may converge to zero at the rate $n^{-\delta}$. Assumption 1 2) simply implies that $\tilde\theta_n$ is a $\sqrt{n}$-consistent estimator of $\theta_0$.
Furthermore, consider also the following high-level assumptions.
Assumption 2
1) $\frac{1}{n}\sum_{i=1}^n x_i x_i^\top$ converges in probability to a positive definite matrix $D$, as $n \to \infty$.
2) $\frac{1}{\sqrt{n}}\sum_{i=1}^n x_i u_i$ converges weakly to a normal distribution with mean $0$ and variance $V$, as $n \to \infty$.
Assumption 2 provides a set of high-level conditions that are typically satisfied in the linear regression model (1) when $p$ is fixed and $n$ is large. Currently, we are not interested in analyzing high-dimensional models. We believe that, first, it is important to understand the asymptotic properties of adaptive lasso estimators of $n^{-\delta}$-coefficients in standard settings with standard (high-level) assumptions. The extension to high-dimensional linear regression models is left for future research.
To simplify the exposition of our results, we consider the linear regression model (1) with $p = 2$, $\theta_0 = (\alpha_0, \beta_n)^\top$, $\alpha_0 \neq 0$, $\beta_n = c\, n^{-\delta}$, and $\delta \in (0, 1/2]$. We introduce the function
$$Q_n(\theta) = \sum_{i=1}^n (y_i - x_i^\top \theta)^2 + \frac{|\theta_1|}{|\tilde\theta_{n,1}|^\gamma} + \frac{|\theta_2|}{|\tilde\theta_{n,2}|^\gamma}, \qquad (3)$$
where $\theta = (\theta_1, \theta_2)^\top$. For $\theta_j \neq 0$, $j = 1, 2$, let $\partial Q_n(\theta)/\partial \theta_j$ denote the partial derivative of $Q_n$ with respect to $\theta_j$. Then, we have,
$$\frac{\partial Q_n(\theta)}{\partial \theta_1} = -2\sum_{i=1}^n x_{i,1}(y_i - x_i^\top \theta) + \frac{\operatorname{sign}(\theta_1)}{|\tilde\theta_{n,1}|^\gamma}, \qquad (4)$$
$$\frac{\partial Q_n(\theta)}{\partial \theta_2} = -2\sum_{i=1}^n x_{i,2}(y_i - x_i^\top \theta) + \frac{\operatorname{sign}(\theta_2)}{|\tilde\theta_{n,2}|^\gamma}. \qquad (5)$$
Note that since $\tilde\theta_{n,2} = \beta_n + O_p(n^{-1/2}) = O_p(n^{-\delta})$, then $1/|\tilde\theta_{n,2}|^\gamma = O_p(n^{\delta\gamma})$. On the other hand, since $\beta_n = c\, n^{-\delta}$, then when $\theta = (\theta_1, 0)^\top$, with $\theta_1$ in a neighborhood of $\alpha_0$, $\sum_{i=1}^n x_{i,2}(y_i - x_i^\top \theta) = O_p(n^{1-\delta})$. It turns out that for large $n$, when
$$\delta\gamma > 1 - \delta, \qquad (6)$$
then the dominant term in Equation (5) is $\operatorname{sign}(\theta_2)/|\tilde\theta_{n,2}|^\gamma$. Furthermore, the sign of $\theta_2$ fully determines the sign of $\partial Q_n(\theta)/\partial \theta_2$. Indeed, $\partial Q_n(\theta)/\partial \theta_2 > 0$, when $\theta_2 > 0$, while $\partial Q_n(\theta)/\partial \theta_2 < 0$, when $\theta_2 < 0$. Therefore, $\hat\beta_n = 0$, where $\hat\beta_n$ is the adaptive lasso estimator of $\beta_n$.
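The rate comparison behind this collapse-to-zero argument can be checked numerically. The simulation below is an illustrative sketch (the design, sample size, noise level, and exponents are assumptions, not the paper's Monte Carlo study): it compares the magnitudes of the least-squares term and the penalty term of the first-order condition (5), evaluated with the small coefficient set to zero.

```python
import numpy as np

# Compare the two terms of the first-order condition (5) at theta = (theta_tilde_1, 0):
# the least-squares gradient |2 * sum_i x_{i,2} (y_i - x_{i,1} theta_1)| versus the
# penalty gradient 1 / |theta_tilde_2|^gamma. With a small coefficient decaying at
# rate n^(-1/2) (delta = 1/2), the penalty term should dominate for gamma > 1 and
# be dominated for gamma < 1.

def median_ratio(n, gamma, delta=0.5, c=1.0, reps=200, seed=0):
    rng = np.random.default_rng(seed)
    beta_n = c / n ** delta
    ratios = []
    for _ in range(reps):
        X = rng.standard_normal((n, 2))
        y = 2.0 * X[:, 0] + beta_n * X[:, 1] + 0.5 * rng.standard_normal(n)
        theta_tilde = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS pilot
        ls_grad = abs(2.0 * X[:, 1] @ (y - X[:, 0] * theta_tilde[0]))
        pen_grad = 1.0 / abs(theta_tilde[1]) ** gamma
        ratios.append(pen_grad / ls_grad)
    return float(np.median(ratios))

print(median_ratio(n=10000, gamma=2.0))   # penalty term dominates: estimate collapses
print(median_ratio(n=10000, gamma=0.5))   # least-squares term dominates: no collapse
```

In this design the median ratio is far above one for the larger exponent and far below one for the smaller exponent, in line with the rate comparison in the text.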
In the next step, we derive the asymptotic properties of the adaptive lasso estimator $\hat\alpha_n$ of $\alpha_0$, when $\hat\beta_n = 0$. We consider first the case $\beta_n = 0$, i.e., $c = 0$. Then, in Equation (4), besides $\operatorname{sign}(\theta_1)/|\tilde\theta_{n,1}|^\gamma = O_p(1)$, also $\beta_n \sum_{i=1}^n x_{i,1} x_{i,2} = 0$. Therefore, for large $n$, $\partial Q_n(\theta)/\partial \theta_1 \approx 0$, when $\sum_{i=1}^n x_{i,1}(y_i - x_{i,1}\theta_1) \approx 0$. It turns out that by Assumption 2, for the adaptive lasso estimator $\hat\alpha_n$ we obtain the usual $\sqrt{n}$-asymptotic normal distribution.
Finally, we consider the case $c \neq 0$. Note that in this case, when $\delta\gamma > 1 - \delta$, then $\hat\beta_n = 0$, although $\beta_n \neq 0$. It turns out that in Equation (4), for large $n$, $\partial Q_n(\theta)/\partial \theta_1 \approx 0$, when
$$\sum_{i=1}^n x_{i,1}(y_i - x_{i,1}\theta_1) \approx 0. \qquad (7)$$
Note that we can rearrange the left term of Equation (7) as follows,
$$\sum_{i=1}^n x_{i,1}(y_i - x_{i,1}\theta_1) = \sum_{i=1}^n x_{i,1} u_i + \frac{c}{n^{\delta}}\sum_{i=1}^n x_{i,1} x_{i,2} - (\theta_1 - \alpha_0)\sum_{i=1}^n x_{i,1}^2. \qquad (8)$$
Consequently, we have,
$$\sqrt{n}(\hat\alpha_n - \alpha_0) = n^{1/2-\delta}\, c\, \frac{\frac{1}{n}\sum_{i=1}^n x_{i,1} x_{i,2}}{\frac{1}{n}\sum_{i=1}^n x_{i,1}^2} + \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n x_{i,1} u_i}{\frac{1}{n}\sum_{i=1}^n x_{i,1}^2} + o_p(1). \qquad (9)$$
Under Assumption 2, Equation (9) establishes $n^{\delta}$-consistency combined with (biased) $\sqrt{n}$-asymptotic normality for the adaptive lasso estimator $\hat\alpha_n$ of $\alpha_0$.
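Since the small coefficient is estimated as exactly zero, the estimator of the nonzero coefficient behaves like least squares on the retained regressor alone. The Monte Carlo sketch below checks that the resulting drift matches the bias term implied by Equation (9); the correlated-regressor design, the decay rate, and the constant are all illustrative assumptions of this example.

```python
import numpy as np

# Monte Carlo check of the drift in Equation (9): with the small coefficient set to
# zero, the estimator of the nonzero coefficient is (approximately) the least-squares
# fit of y on x_1 alone, and n^delta * (alpha_hat - alpha_0) should concentrate
# around c * D_12 / D_11.

rng = np.random.default_rng(1)
n, delta, c, alpha_0 = 4000, 0.25, 1.0, 2.0
rho = 0.5                                      # correlation between the two regressors
beta_n = c / n ** delta

draws = []
for _ in range(500):
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    y = alpha_0 * x1 + beta_n * x2 + 0.5 * rng.standard_normal(n)
    alpha_hat = (x1 @ y) / (x1 @ x1)           # least squares on x_1 only (beta_hat = 0)
    draws.append(n ** delta * (alpha_hat - alpha_0))

drift = c * rho                                # c * D_12 / D_11, with D_11 = 1, D_12 = rho
print(np.mean(draws), drift)
```

The Monte Carlo average of $n^{\delta}(\hat\alpha_n - \alpha_0)$ is close to the predicted drift, while the $\sqrt{n}$-scale fluctuations around it are asymptotically negligible at this scaling.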
Obviously, this approach can also be applied to the general case $\theta_0 = (\alpha_0^\top, \beta_n^\top)^\top \in \mathbb{R}^p$, under Assumption 1. These results suggest the introduction of the following Extended Oracle Properties.
Extended Oracle Properties
Consider the linear regression model (1) with Assumption 1. Then, a statistical procedure fulfils the Extended Oracle Properties when it:
1) Identifies the (asymptotically) nonzero components $\alpha_0$.
2) For $\beta_n = 0$, ensures the optimal $\sqrt{n}$-asymptotic normality for the (asymptotically) nonzero components $\alpha_0$.
3) For $\beta_n = c\, n^{-\delta}$ with $c \neq 0$, ensures $n^{\delta}$-consistency combined with (biased) $\sqrt{n}$-asymptotic normality for the (asymptotically) nonzero components $\alpha_0$.
From a practical point of view, in Equation (6) we provide conditions that allow selecting the tuning exponent $\gamma$ as a function of the preliminary $\sqrt{n}$-consistent estimator, in order to ensure that the adaptive lasso estimates of the $n^{-\delta}$-components indeed collapse to zero. For instance, for $\delta = 1/2$, we simply have $\gamma > 1$. These conditions are quite intuitive. Indeed, when $\delta$ decreases (large neighborhoods of zero), then $\gamma$ increases (stronger penalization). These results may be useful for practitioners who would like to include in their model only the (asymptotically) relevant variables. From a theoretical point of view, to the best of our knowledge, we have never seen $n^{\delta}$-consistency combined with (biased) $\sqrt{n}$-asymptotic normality. Since $\delta$ is unknown, inference based on these results seems quite challenging. Also, the modified residual [14] and pairs [15] bootstrap seem not appropriate. Probably, multiplicative bootstrap procedures recently introduced in penalized regression models may help also in this case.
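The monotone trade-off between the decay rate and the penalization exponent can be tabulated directly; the small computation below assumes that the threshold in condition (6) takes the form $\gamma > (1-\delta)/\delta$ (an assumption of this sketch, consistent with the value $\gamma > 1$ at $\delta = 1/2$ quoted above).

```python
# Minimal admissible tuning exponent, assuming condition (6) takes the form
# delta * gamma > 1 - delta, i.e. gamma > (1 - delta) / delta.

def gamma_threshold(delta):
    return (1.0 - delta) / delta

for delta in (0.5, 0.25, 0.1):
    print(delta, gamma_threshold(delta))
# smaller delta (larger neighborhoods of zero) requires a larger exponent gamma
```

The threshold equals 1 at $\delta = 1/2$ and grows without bound as $\delta$ shrinks, which is the "stronger penalization for slower decay" intuition stated in the text.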
3. Conclusion
In this paper, we extend the analysis of the statistical properties of lasso estimators by studying the asymptotic properties of adaptive lasso estimators when some components of the unknown parameter $\theta_0$ are strictly different from zero, while other components may be zero or may converge to zero at the rate $n^{-\delta}$, with $\delta \in (0, 1/2]$. To achieve this objective, we analyze the convergence/divergence rates of each term in the first-order conditions of adaptive lasso estimators. First, we derive conditions that allow selecting the tuning parameters of adaptive lasso estimators in order to ensure that the adaptive lasso estimates of the $n^{-\delta}$-components of the unknown parameter of interest $\theta_0$ indeed collapse to zero when $\delta\gamma > 1 - \delta$. Second, in this case, we also derive the asymptotic distributions of adaptive lasso estimators for the strictly nonzero components. When $\beta_n = 0$, we obtain the usual $\sqrt{n}$-asymptotic normal distribution, while when $\beta_n = c\, n^{-\delta}$ with $c \neq 0$, we show $n^{\delta}$-consistency combined with (biased) $\sqrt{n}$-asymptotic normality for the nonzero components. Several Monte Carlo simulations about 1) the selection of the tuning exponent $\gamma$, and 2) the $n^{\delta}$-consistency combined with (biased) $\sqrt{n}$-asymptotic normality confirm the theoretical results and are available from the author. These results allow practitioners to exclude from their model the asymptotically negligible variables and make inferences on the asymptotically relevant variables.