Bayesian Regularized Quantile Regression Analysis Based on Asymmetric Laplace Distribution ()
1. Introduction
Since the pioneering work by Koenker and Bassett in 1978, quantile regression (QR) has been deeply studied and widely applied to descript the elaborate relationship between the dependent variable and predictors [1]. Compared with the traditional mean regression, quantile regression has more robustness to data with outliers. In 1999, Koenker and Machado connected the asymmetric Laplace distribution (ALD) to QR model and defined a goodness-of-fit criterion for quantile regression, which is the natural analog of R2 statistic of least squares regression [2]. In 2001, Yu and Moyeed first proposed a Bayesian quantile regression model that the error follows an asymmetric Laplace ( AL) distribution, and proved the maximization of likelihood-based inference used independently distributed asymmetric Laplace densities was equivalent to the minimization of the loss function [3]. In 2010, Hewson and Yu suggested quantile regression model for binary data within the Bayesian framework [4]. In 2011, Reich et al. introduced Bayesian spatial quantile regression model [5]. In 2013, Sriram et al. showed that the misspecified likelihood in the ALD approach still leads to consistent results [6]. In 2009, Kozumi and Kobayashi built a more efficient Gibbs sampler for fitted the quantile regression model based on a location-scale mixture of the asymmetric Laplace distribution to draw samples from the posterior distribution [7]. In 2012, Khare and Hobert proved that this new sampling algorithm converges at a geometric rate [8]. In 2015, Sriram proposed a correction to the MCMC iterations to construct asymptotically valid intervals [9].
In 2004, Koenker added the Lasso regularization method to the mixed-effect quantile regression model for the first time, and the Lasso penalty made the random effect shrink to zero [10]. In 2007, Wang et al. considered the least absolute deviance (LAD) estimate with adaptive Lasso penalty (LAD-lasso) and proved its oracle property [11]. In 2008, Li and Zhu considered quantile regression with the Lasso penalty and developed its piecewise linear solution path [12]. In 2009, Wu and Liu studied the quantile regression with the SCAD method and the adaptive Lasso method [13]. In 2008, Park and Casella studied the Lasso penalty from the Bayesian angle, and proposed that the hierarchical model can be effectively solved by the Gibbs sampler, thereby introducing the regularization method [14]. In 2010, Li et al. studied the regularization method in quantile regression from the perspective of Bayesian and proposed to set the prior distribution of parameters to Laplace prior, and use Gibbs sampler to sampling Bayesian Lasso quantile regression [15]. In 2012, Alhamzawi et al. proposed Bayesian adaptive Lasso quantile regression, by setting different penalty parameters for different variables, and setting the penalty parameter to inverse gamma distribution, and the inverse gamma priori Parameters are treated as unknowns and estimated along with other parameters [16]. In 2018, Adlouni et al. showed that a regularized quantile regression model with B-Splines based on five penalties (Lasso, Ridge, SCAD0, SCAD1 and SCAD2) in Bayesian framework [17].
Based on the existing literature, the Bayesian quantile regression is realized by expressing the asymmetric Laplace distribution as scale mixtures of the standard normal distribution and the standard exponential distribution, and the Gibbs sampler is used to simulate the distributed parameters. The regularized quantile regression under the Bayesian framework is compared with the non-Bayesian regularized quantile regression method. Finally, the prostate cancer data sets are used to illustrate the advantages and disadvantages of these two approaches.
2. Methods
2.1. Quantile Regression
Given data
, with covariate vector
and
is the response variable. The
quantile regression model for the response
given
takes the form of
(1)
where
is the inverse cumulative distribution function and
is the unknown coefficients vector that is dependent on the quantile
,
.
The regression parameter
can be estimated by minimizating the following objective function
(2)
where
is the loss function and
denotes the indicator function.
In 2001, Yu and Moyeed [3] argued that the minimization problem in equation (2) is equivalent to maximizing the likelihood function of
by assuming
’s are random variables from a skewed Laplace distribution with
and
. The density function of a skewed Laplace distribution is given by
(3)
where,
is the scale parameter,
is the location parameter and
is the asymmetrc parameter.
Then the likelihood function of the sample
can be expressed as
(4)
Tsionas [18], Kozumi and Kobayashi [7] have demonstrated that ALD can be viewed as a mixture of standard an exponential distribution,
and a standard normal distribution
. Assume that z and
represent a standard exponential distribution and a standard normal distribution, respectively.
For
,
, if
, random variable
, Therefore, it can be known that the independent variable
of the quantile regression is equivalent to
(5)
2.2. Bayesian Quantile Regression with Lasso and Adaptive Lasso Penalty
The Bayesian quantile regression parameter estimation model with Lasso penalty (Li and Zhu) [12] is
(6)
Li et al. [15] set the prior distribution of the parameter
to a Laplace prior
. It is also assumed that the error term
obeys the asymmetric Laplace distribution (ALD).
Bayesian quantile regression with adaptive Lasso penalty (BQR-AL) is based on different penalty parameters are applied to different regression coefficients. Therefore, the parameter estimation model of BQR-AL is
(7)
Alhamzawi and Yu [16] proposed different penalization parameters for different regression coefficients, and the prior of the penalty parameters is set to the inverse gamma distribution, and treat the hyperparameters of the inverse gamma prior as unknowns. Thus, the Laplace prior on
by
(8)
Andrews and Mallows [19] mentioned that
(9)
Let
, (8) can be written as
(10)
also equivalent to
(11)
so there are
(12)
The prior distribution of
is set to the inverse gamma prior, so the distribution density function of
is
(13)
where
and
are two hyperparameters. Yi and Xu [20] pointed out that the values of these two hyperparameters determine the degree of compression of the variables in (13), because small
large
will lead to greater compression, so in order to avoid the impact of special values on the estimation of regression coefficients, we consider
and
as unknown parameters.
In summary, the Bayesian quantile regression hierarchical model with adaptive Lasso penalty is
(14)
2.3. Gibbs Sampling
From the hierarchical model, the joint posterior density function of each parameter is
where
The full condition posterior distribution of each parameter is
(15)
Here
(16)
Since the full condition posterior distribution
of
does not have a closed form, it is a logarithmic convex function. Gilks [21] proposed using the adaptive rejection sampling algorithm to sample this distribution.
3. Simulation Studies
Based on the MCMC algorithm of Gibbs sampling, Bayesian estimation is carried out on the model. The simulation studies used to compare the regularized quantile regression under the Bayesian and the non-Bayesian framework. These methods include Bayesian quantile regression with adaptive Lasso penalty (BQR-AL), Bayesian quantile regression with Lasso penalty (BQR-L), quantile regression with Lasso penalty (QR-L), quantile regression with SCAD penalty (QR-SCAD) and quantile regression (QR).
3.1. Independent and Identically Distributed Random Errors
Here, we follow the same simulation strategy introduced by Li, Xi and Lin [15] in the simulation studies 1, 2 and 3 with different parameter values for the error distributions.
We consider a linear model
where
’s have the
quantile equal to zero.
For i.i.d. random errors, this paper will consider the following four forms of simulation
Simulation 1:
,
Simulation 2:
,
Simulation 3:
,
Simulation 4:
.
In the first three simulation studies, the rows of
is generated in a multivariate normal distribution
with
. In Simulation 4, we first generate
and
from
, then let
,
,
,
,
, where
, where
,
.
In each simulation, we consider the error distributions in our simulation follows
1) A normal distribution
with the
quantile equal to zero.
2) A Laplace distribution
with the
quantile equal to zero.
3) A t distribution with three degrees of freedom,
.
4) A
distribution with three degrees of freedom,
.
5) A asymmetric Laplace distribution
with the
quantile equal to zero.
The number of observations in one simulated sample is n = 200. The simulation is repeated 50 times for each error distribution. The evaluation index is the median mean absolute deviation (MMAD), i.e.
The quantile regression model for the quantile
is estimated separately. The simulation results are shown in Figures 1-4.
Figure 1 shows that under the condition of simulation 1, that is, sparse coefficients, the MMD values of BQR-AL and BQR-L are lower than those of the frequency school method, in which the MMAD value satisfying the error term compliance distribution
is smaller.
Figure 2 corresponds to the dense coefficient in simulation 2, and it can be seen that the MMAD values of BQR-AL and BQR-L are very small and close to each other.
Figure 3 corresponds to the case where the coefficient of the simulation 3 is sparse, and the same conclusion as the simulation 1 can still be obtained. And the effect of Bayesian regularized quantile regression is more obvious.
For simulation 4, since the number of variables is larger than the sample size, the design matrix is a singular matrix, so the QR and QR-SCAD methods cannot be run in this simulation, and the other methods can still operate normally. This also proves the advantages of the regularization method. The results are shown in Figure 4.
Figure 1. The panels represent the MMADs in simulation 1.
Figure 2. The panels represent the MMADs in simulation 2.
Figure 3. The panels represent the MMADs in simulation 3.
Figure 4. The panels represent the MMADs in simulation 4.
Figures 1-4 show that, in terms of the MMAD, BQR-AL and BQR-L method performs better than the other regularized quantile regression method. The results of the MMAD for simulation 1 - 4 are reported in Figures 1-4. From these simulation, we can learn the following results:
1) In the above simulation, the MMADs of BQR-AL and BQR-L tend to give lower MMAD compared with the other regularized quantile regression under non-Bayesian for all distributions under considerations. It is shown that the stability and repeatability of the Bayesian regularized quantile regression are better.
2) In the case of sparse and very sparse regression coefficient, the MMAD value of BQR-AL is the smallest. In the case of dense regression coefficient, the MMAD value of BQR-L is smaller. Moreover, the estimation effect of the two methods is similar.
3) The BQR-AL and BQR-L methods can achieve good results under all error term distributions. It is shown that the regularized Bayesian quantile regression method is robust to the assumption of the error term, and the two methods are satisfactory even if the error term deviates from ALD.
4) No matter what the distribution of the original data is, when the error distribution is ALD, the regularized quantile regression method under Bayesian framework has high accuracy, especially the BQR-AL method, and the estimated value of its parameters is the closest to the real value.
In addition to observing the MMADs of each method, this paper can also observe the estimation of its parameters. Due to the limited space, in this paper, the parameter estimation results of error obeying ALD distribution in simulation 1 are simulated:
It can be seen from the parameter estimates in Table 1, the QR class method generally gives less biased parameter estimates, but this does not guarantee a good quantile prediction, as implied by the MMADs in Figure 4.
3.2. Non-i.i.d. Random Errors
Consider the following model when the error term is subject to a non-i.i.d.
where
,
,
, where
and
. The remaining five noise variables
are generated from the independent standard normal distribution. The results are shown in Table 2 and are based on 50 repetitions, each with sample size n = 200.
It can be known from Table 2 that the BQR-L method have smaller MMAD values, indicating that the regularized quantile regression method under the Bayesian framework is also superior to the regularized quantile regression method under the non-Bayesian framework when the error obeys the non-i.i.d..
3.3. Prostate Cancer Data Set
This section mainly analyzes prostate cancer data in the “bayesQR” package [22]. The dataset was first proposed by Stamey et al. [23], including a medical
Table 1. Parameter estimates for simulation 1 in the case of i.i.d.
Table 2. MMADs of each method in the case of non-i.i.d.
record of 97 male patients undergoing radical prostatectomy, containing the level of prostate antigen y (lpsa) and eight influencing factors. These influencing factors were: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason) and percentage of Gleason score 4 or 5 (pgg 45). As with the numerical simulation of the second part, still consider
here.
In Table 3, we will compare three methods: QR, BQR-L, and BQR-AL. The QR method will use the “quantreg” package in R and the default rank method to get the confidence interval. Here, the 95% interval is considered and parameter estimation is performed. For the Bayesian method, the MCMC algorithm is used to perform 5000 simulations on the posterior distribution by default, and the
first 1000 samples are discarded. The results are shown in Table 3.
It can be seen from Table 3 that the parameter estimates of the BQR-L and BQR-AL methods are very close to the classical quantile regression and the credible intervals of the BQR-L and BQR-AL methods are narrower than the QR, in which the BQR-AL interval to be more accurate than BQR-L, from this point of view, the Bayesian quantile regression method with penalty is estimated to be more accurate than the non-Bayesian regularized quantile regression method.
Of course, we can also intuitively understand by drawing images. In this section, for a more intuitive observation, the estimated values of the various methods for
are plotted, and similar results are obtained for other quantiles. In order to be intuitive, the estimated values of each method will be translated, and the image is drawn as shown in Figure 5.
Table 3. Parameter estimation and 95% interval of prostate cancer data.
Figure 5. Regression estimates under five different methods and 95% credible intervals for BQR-AL and BQR-L.
It can be clearly seen from Figure 5 that the quantile regression with Bayesian can provide a credible interval, while the non-Bayesian quantile regression is not always available and the credible interval of BQR-AL is narrower than BQR-L. The Bayesian quantile regression with penalty is estimated to be more accurate than the non-Bayesian regularized quantile regression. The effect of BQR-AL is better.
4. Conclusion
Bayesian quantile regression with adaptive Lasso penalty is an extension and improvement of the Lasso method. Adaptive Lasso penalty is based on different penalty parameters are applied to different regression coefficients. This method can effectively eliminate the influence of noise variables and obtain more accurate parameter estimation. Through the Gibbs sampling algorithm, this paper systematically compares the regularized quantile regression under the non-Bayesian and Bayesian framework, and finds that when the error term obeys the independent identically distributed or heteroscedasticity distribution, both BQR-AL and BQR-L have higher accuracy and are superior to non-Bayesian methods. When the error obeys ALD, the BQR-AL method has the highest accuracy for the MMAD under the same quantile, and its parameter estimate is also the closest to the true value in general. In the real data set, we can also find the same conclusion. Therefore, we can say that the Bayesian penalty regression method can get a good effect under the condition that the coefficient is sparse or dense, and it can be described in full aspect at different quantile points, and it will occupy a very important position in the future high-dimensional data analysis.
Funding
This work was supported by the National Natural Science Foundation of China [grant numbers 61763008, 71762008]; Guangxi Science and Technology Plan Project [grant numbers 2018GXNSFAA294131, 2018GXNSFAA050005, 2016GXNSFAA380194].