BAYESIAN QUANTILE REGRESSION WITH SCALE MIXTURE OF UNIFORM PRIOR DISTRIBUTIONS

Abstract: In this paper, we propose a new Bayesian Lasso quantile regression method for variable selection that assigns independent scale mixtures of uniform distributions to the regression coefficients. A simple and efficient MCMC algorithm is then presented for posterior sampling. Simulation studies and a real data set are used to investigate the performance of the proposed method relative to several existing methods. Both the simulated and real data examples show that the proposed method performs quite well compared to the alternatives under a variety of scenarios.


Introduction
Quantile regression (QReg) models have become very common since the pioneering work of [16]. They have been employed in many different fields: microarray studies [29], agricultural economics [19], ecological studies [10], body mass index [32,33], growth charts [30], and so on. Compared with standard mean regression models, QReg models belong to a robust family [17]. Unlike standard mean regression, QReg does not require any assumption about the residual distribution, and it provides greater statistical efficiency than standard mean regression when the error is non-normal. The linear QReg model assumes that the outcome of interest y_i can be written as

y_i = x_i' β_τ + ε_i,  i = 1, ..., n,  (1)

where x_i' is a 1 × k vector of explanatory variables, β_τ is a k × 1 vector of unknown coefficients and τ ∈ (0, 1) is the quantile level. Here, ε_i is the residual term whose density is restricted to have its τth quantile equal to 0. As in standard mean regression, QReg aims at evaluating the conditional quantiles of the outcome of interest y_i given an explanatory vector x_i. It can be shown [16] that the QReg coefficients β_τ can be estimated by

β̂_τ = argmin_β Σ_{i=1}^n ρ_τ(y_i − x_i'β),  (2)

where ρ_τ(s) is the check function defined by ρ_τ(s) = s{τ − I(s < 0)} and I(·) is the indicator function. For simplicity of notation, we will omit τ from β_τ in the remainder of this paper.

One significant issue in building a regression model is the selection of the active regressors. The selection process aims to increase prediction accuracy and to improve interpretability [5]. There has been considerable recent attention on sparse methods that include all regressors in the model and use informative priors to shrink inactive regression coefficients toward zero. See, for example, the Lasso [27], the adaptive Lasso [36], the Dantzig selector [11], matrix completion [12], compressive sensing [7], Lasso QReg [22] and adaptive Lasso QReg [31]. A comprehensive account of these and other recent methods can be found in [28].
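For concreteness, the unpenalized estimator in (2) can be computed by direct numerical minimization of the summed check loss. The sketch below is our own illustration (the function names `check_loss` and `qreg_fit` are ours, not from the original study), using a generic derivative-free optimizer on simulated median-regression data:

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(s, tau):
    # rho_tau(s) = s * (tau - I(s < 0))
    return s * (tau - (s < 0))

def qreg_fit(X, y, tau):
    """Estimate beta_tau by minimizing the summed check loss (no penalty)."""
    beta0 = np.zeros(X.shape[1])
    obj = lambda b: np.sum(check_loss(y - X @ b, tau))
    # Nelder-Mead handles the nonsmooth objective for small k
    return minimize(obj, beta0, method="Nelder-Mead").x

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta_hat = qreg_fit(X, y, tau=0.5)  # median regression
```

In practice the linear-programming formulation (as in the R package quantreg) is preferred; the direct minimization above merely makes the definition in (2) concrete.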
Similarly, within the Bayesian framework, [25] proposed the Bayesian Lasso for linear regression models by specifying scale mixture of normal (SMN) prior distributions on the regression coefficients and independent exponential prior distributions on their variances. [26] developed the Bayesian adaptive Lasso by allowing different shrinkage parameters for different coefficients. Based on the latter approaches, [21] suggested Bayesian Lasso QReg and [5] proposed Bayesian adaptive Lasso QReg. Further extensions of Lasso QReg models have also been suggested by [8], [4] and [2], among others. Compared with their frequentist counterparts, Bayesian models usually offer a valid measure of standard error based on MCMC samples. They also offer a convenient way of incorporating the uncertainty in the regression coefficients into predictive inference. Moreover, the Bayesian formulation offers a flexible way of estimating the penalty parameter jointly with the regression coefficients.
Our objective is to develop a Bayesian formulation for regularization in linear QReg. Recently, for linear regression, [24] provided a different Lasso-based approach by employing the scale mixture of uniform (SMU) representation of the Laplace density. The performance of this method was illustrated via simulation studies and a real data set, and [24] showed that it performs quite well compared with several existing methods. In this paper, we propose a new formulation of Bayesian Lasso QReg by employing the scale mixture of uniform representation. We then develop a fully Bayesian treatment that leads to a simple and efficient Gibbs sampling algorithm with tractable full conditional posterior distributions.
The rest of this paper is organized as follows. In Section 2, we briefly review the Bayesian QReg method and propose a hierarchical model based on the scale mixture of uniform distribution. The Gibbs sampler is introduced in Section 3. We illustrate the performance of the proposed method by simulation studies in Section 4, followed by a real data example in Section 5. In Section 6, we conclude the paper.

Bayesian QReg
Within the Bayesian QReg formulation, a popular choice for the error distribution has been the skewed Laplace distribution (SLD); see [34]. The probability density function (pdf) of the SLD is

f(ε_i | σ, τ) = στ(1 − τ) exp{−σ ρ_τ(ε_i)},  (3)

where σ is the scale parameter and τ determines the quantile level. The joint distribution of ε = (ε_1, ..., ε_n)' is

f(ε | σ, τ) = σ^n τ^n (1 − τ)^n exp{−σ Σ_{i=1}^n ρ_τ(ε_i)}.  (4)

Following [18], the check loss function (2) is closely related to (4); in particular, maximizing (4) in β is equivalent to minimizing (2). This relationship can be employed to cast the QReg method in a likelihood framework. Recently, [20] provided a Bayesian approach to QReg by rewriting the SLD as a member of the SMN family. More specifically, if ε_i | m_i ∼ N((1 − 2τ)m_i, 2σ^{−1}m_i) and m_i ∼ Exp(τ(1 − τ)σ), then marginally ε_i follows the SLD (3). Under this representation, [20] proposed a geometrically ergodic MCMC algorithm relying on conditional conjugacy to draw the parameters from their posteriors. Rewriting the SLD for the errors as an SMN and introducing the exponential mixing density result in the following hierarchy:

y_i | β_τ, m_i, σ ∼ N(x_i'β_τ + (1 − 2τ)m_i, 2σ^{−1}m_i),  m_i | σ ∼ Exp(τ(1 − τ)σ),  i = 1, ..., n.  (5)

Under the above hierarchical formulation, the conditional posterior distribution p(β_τ | σ, m_1, ..., m_n, y) is multivariate normal.
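The SMN representation above can be verified by Monte Carlo: drawing m_i from the exponential mixing density and ε_i from the conditional normal should reproduce an SLD whose τth quantile is zero. The following check is our own illustration (parameter values are arbitrary, and note that NumPy parameterizes the exponential by scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(1)
tau, sigma = 0.25, 1.0
n = 200_000

# m_i ~ Exp(rate = tau*(1-tau)*sigma); numpy uses scale = 1/rate
m = rng.exponential(scale=1.0 / (tau * (1 - tau) * sigma), size=n)
# eps_i | m_i ~ N((1 - 2*tau)*m_i, 2*m_i/sigma)
eps = rng.normal(loc=(1 - 2 * tau) * m, scale=np.sqrt(2 * m / sigma))

# Under the SLD, the tau-th quantile of eps is 0
q = np.quantile(eps, tau)
print(q)  # close to 0
```

Completing the square in the mixture integral shows the marginal of ε_i is exactly στ(1 − τ)exp{−σρ_τ(ε_i)}, so the empirical τth quantile should be near zero up to Monte Carlo error.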

Bayesian QReg with the Lasso Penalty
The Lasso QReg [22] is given by

min_β Σ_{i=1}^n ρ_τ(y_i − x_i'β) + λ Σ_{j=1}^p |β_j|,  (6)

where λ (λ ≥ 0) is the shrinkage parameter. [21] proposed the Bayesian Lasso for the linear QReg model by putting a Laplace prior of the form p(β_j | λ, σ) = (σλ/2) exp{−σλ|β_j|} on the jth QReg coefficient. More specifically, they put scale mixture of normal prior distributions on β_τ and independent exponential prior distributions on the variance parameters. In this paper, we put a Laplace prior on β_j of the form p(β_j | λ) = (λ/2) exp{−λ|β_j|} and develop an alternative hierarchical Bayesian formulation of the Lasso model. Following [24] and [3], the Laplace prior on β_j can be written as a scale mixture of uniforms:

p(β_j | λ) = ∫_{u_j > |β_j|} (1/(2u_j)) λ² u_j e^{−λ u_j} du_j,  (7)

that is, β_j | u_j ∼ Uniform(−u_j, u_j) and u_j | λ ∼ Gamma(2, λ), where u_j is a mixing variable. We further put Gamma priors on σ and λ. Using (5) and (7), our Bayesian hierarchical model can be formulated as follows:

y_i | β, m_i, σ ∼ N(x_i'β + (1 − 2τ)m_i, 2σ^{−1}m_i),  i = 1, ..., n,
m_i | σ ∼ Exp(τ(1 − τ)σ),
β_j | u_j ∼ Uniform(−u_j, u_j),  j = 1, ..., p,
u_j | λ ∼ Gamma(2, λ),
σ ∼ Gamma(a, b),  λ ∼ Gamma(c, d).

Here, Exp(τ(1 − τ)σ) refers to the exponential distribution with rate parameter τ(1 − τ)σ, and Gamma(·, ·) is parameterized by shape and rate.
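The SMU identity (7) can be checked numerically: integrating the uniform conditional against the Gamma(2, λ) mixing density should return the Laplace density exactly. This short verification is our own illustration (arbitrary values of λ and β_j; note SciPy's gamma uses scale = 1/rate):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

lam, beta_j = 1.5, 0.7

# beta_j | u_j ~ Uniform(-u_j, u_j); u_j ~ Gamma(shape=2, rate=lam)
def integrand(u):
    return (1.0 / (2.0 * u)) * gamma.pdf(u, a=2, scale=1.0 / lam)

mixture, _ = quad(integrand, abs(beta_j), np.inf)
laplace = (lam / 2.0) * np.exp(-lam * abs(beta_j))
print(mixture, laplace)  # the two densities agree
```

The agreement is exact here because (1/(2u)) λ²u e^{−λu} = (λ²/2)e^{−λu}, whose integral from |β_j| to ∞ is (λ/2)e^{−λ|β_j|}.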
The Gibbs Sampler

Under the hierarchical model above, all full conditional distributions are of standard form, and the parameters are updated as follows.

• Updating β_j: the full conditional distribution of β_j is a normal distribution with mean β̄_j and variance S²_{β_j}, truncated to the interval (−u_j, u_j), where

S²_{β_j} = 2 / (σ Σ_{i=1}^n x_{ij}²/m_i)  and  β̄_j = (Σ_{i=1}^n x_{ij} r_{ij}/m_i) / (Σ_{i=1}^n x_{ij}²/m_i),

with r_{ij} = y_i − Σ_{k≠j} x_{ik}β_k − (1 − 2τ)m_i.

• Updating u_j: the full conditional distribution of u_j is a left-truncated exponential distribution,

p(u_j | β_j, λ) ∝ λ e^{−λ u_j} I(u_j > |β_j|).

Updating u_j can be done using the inversion method [24]: draw v_j ∼ Uniform(0, 1) and set u_j = |β_j| − log(1 − v_j)/λ.

• Updating m_i: the full conditional distribution of m_i^{−1} is inverse Gaussian (InvGa) with mean 1/|y_i − x_i'β| and shape parameter σ/2. Here, the pdf of the InvGa(μ′, λ′) distribution is given by [13]:

f(x | μ′, λ′) = √(λ′/(2πx³)) exp{−λ′(x − μ′)²/(2μ′²x)},  x > 0.

• Updating σ: the full conditional distribution of σ is Gamma with shape parameter a + 3n/2 and rate parameter b + Σ_{i=1}^n {(y_i − x_i'β − (1 − 2τ)m_i)²/(4m_i) + τ(1 − τ)m_i}.

• Updating λ: the full conditional distribution of λ is Gamma(c + 2p, d + Σ_{j=1}^p u_j).
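The updates above can be assembled into a working sampler. The following is a minimal illustrative sketch, not the authors' implementation: the function name `gibbs_lasso_qreg`, the hyperparameter defaults, and the OLS initialization are our own choices, and the full conditionals are those derived from the hierarchical model (SciPy's inverse Gaussian uses mu = mean/shape and scale = shape):

```python
import numpy as np
from scipy.stats import truncnorm, invgauss

def gibbs_lasso_qreg(X, y, tau, n_iter=2000, burn=500,
                     a=0.1, b=0.1, c=0.1, d=0.1, seed=0):
    """Sketch of the Gibbs sampler for Bayesian Lasso QReg with SMU priors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start
    u = np.abs(beta) + 1.0
    m = np.ones(n)
    sigma, lam = 1.0, 1.0
    draws = np.empty((n_iter - burn, p))
    for it in range(n_iter):
        # beta_j | rest: N(bar_beta_j, S2_j) truncated to (-u_j, u_j)
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j] - (1 - 2 * tau) * m
            sxx = np.sum(X[:, j] ** 2 / m)
            mean = np.sum(X[:, j] * r / m) / sxx
            sd = np.sqrt(2.0 / (sigma * sxx))
            lo, hi = (-u[j] - mean) / sd, (u[j] - mean) / sd
            beta[j] = truncnorm.rvs(lo, hi, loc=mean, scale=sd,
                                    random_state=rng)
        # u_j | rest: exponential(lam), left-truncated at |beta_j|
        u = np.abs(beta) + rng.exponential(scale=1.0 / lam, size=p)
        # 1/m_i | rest: InvGa(mean = 1/|res_i|, shape = sigma/2)
        res = np.maximum(np.abs(y - X @ beta), 1e-10)   # numerical guard
        inv_m = invgauss.rvs(mu=2.0 / (sigma * res), scale=sigma / 2.0,
                             random_state=rng)
        m = 1.0 / inv_m
        # sigma | rest: Gamma(a + 3n/2, b + sum of quadratic terms)
        quad = (y - X @ beta - (1 - 2 * tau) * m) ** 2 / (4.0 * m)
        rate = b + np.sum(quad + tau * (1 - tau) * m)
        sigma = rng.gamma(a + 1.5 * n, 1.0 / rate)
        # lam | rest: Gamma(c + 2p, d + sum of u_j)
        lam = rng.gamma(c + 2.0 * p, 1.0 / (d + np.sum(u)))
        if it >= burn:
            draws[it - burn] = beta
    return draws

# usage on simulated median-regression data
rng = np.random.default_rng(42)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, 0.0])
y = X @ beta_true + rng.normal(size=n)
post_mean = gibbs_lasso_qreg(X, y, tau=0.5).mean(axis=0)
```

On such data the posterior mean should recover the active coefficients while shrinking the inactive one toward zero.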

Simulation Studies
In this section, the performance of the proposed model is investigated by simulation. The proposed model is compared with the Bayesian Lasso quantile regression reported in [21]. The results of the classical frequentist approach using the R function rq() in the R package quantreg and the results of Bayesian quantile regression using the R function MCMCquantreg in the R package MCMCpack are also reported. We consider three choices of τ: 0.5, 0.75 and 0.95. For each simulation study, we generated the error term ε_i from three distributions: a normal distribution N(µ, 9) with µ adjusted so that the τth quantile is zero, a t distribution with three degrees of freedom (t_3), and a χ² distribution with three degrees of freedom (χ²_3). For each simulation study and each choice of the error distribution, we ran 100 replications. In each simulation study, our algorithm was run for 11000 iterations and the first 1000 were removed as burn-in. Methods are evaluated based on the median of mean absolute deviations (MMAD), where MMAD = median over replications of mean_i(|x_i'β̂ − x_i'β_true|).
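The MMAD criterion is straightforward to compute from the per-replication estimates. The helper below is our own illustration of the definition (the name `mmad` and the toy replication data are ours):

```python
import numpy as np

def mmad(beta_hats, beta_true, X):
    # per replication: mean_i |x_i' beta_hat - x_i' beta_true|
    mads = [np.mean(np.abs(X @ bh - X @ beta_true)) for bh in beta_hats]
    # MMAD: median of the mean absolute deviations over replications
    return np.median(mads)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([1.0, 2.0])
# hypothetical estimates from 100 replications, scattered around the truth
beta_hats = [beta_true + 0.1 * rng.normal(size=2) for _ in range(100)]
score = mmad(beta_hats, beta_true, X)
```

Smaller MMAD values indicate more accurate estimation of the conditional quantile surface.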
The MMADs and the standard deviations (SD) of the MMADs are reported in Table 1. It is readily observed that, for all the quantile levels and error distributions under consideration, the proposed method (LassoU) uniformly achieves smaller MMADs and SDs, indicating that LassoU estimates the coefficients more accurately. In general, the two Bayesian regularized QReg methods (LassoU and LassoN) perform better than the other two methods. Compared with the rq results, the Bayesian methods still perform well even when the error distribution assumption is violated. For convergence diagnosis, we present the multivariate potential scale reduction factor (MPSRF) of [9] for one particular run.
From Figure 1, we can observe that the MPSRF becomes stable and close to 1 after about 2000 iterations for each τ ∈ {0.50, 0.75, 0.95}, indicating that the posterior simulation for the proposed method converges quickly and mixes well.
From Table 2, it can be observed that our proposed method has the smallest MMADs in 8 out of 9 comparative experiments, which shows that the LassoU estimates are more accurate than those of the other methods. Although none of these three error distributions is assumed by the Bayesian QReg methods, the simulation results show that the performance of the Bayesian methods is quite robust to the error distribution assumption.

Simulation 3
For Simulation 3, the performance of the proposed approach is illustrated for a dense model. Here, we consider the true model (1) with β = (0.00, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85)', including the intercept term. From Table 3, it can be observed that the proposed method has the smallest MMADs in 8 out of 9 comparative experiments, which shows again that our estimates are more accurate than those of the other methods. Most noticeably, when τ = 0.95 the MMADs generated by the proposed method for the three error distributions are much smaller than those generated by the other methods.
Instead of looking at the MMAD and SD for Simulation studies 1, 2 and 3, we can also look at the estimation of β directly. Since the full simulation results would be too extensive to tabulate, we only report the case where τ = 0.95 and the error distribution is N(µ, 9) for illustration. From Table 4, we see that our method performs well in estimating the regression coefficients.

Air Pollution Data
In this section, we use QReg methods to analyze the air pollution data [23] previously analyzed in [1]. This dataset was collected by the Public Roads Administration in Norway and consists of 500 observations on 7 covariates and one response variable. The response variable is the logarithm of the hourly concentration of NO2, and the seven covariates are: the logarithm of the number of cars per hour (x_1), temperature (x_2), wind speed in meters per second (x_3), the temperature difference (x_4), wind direction (x_5), time of day in hours (x_6), and day number (x_7).
Similar to Section 4, in this section we compare four methods: rq, MCMCquantreg, LassoN, and our approach. The methods are evaluated based on the mean squared error (MSE) and 95% intervals for τ ∈ {0.50, 0.75, 0.95}. The MSE results are listed in Table 5. From Table 5, in general, we can see that our proposed method tends to produce lower MSE and standard deviations than the other methods. From Table 6, it can be seen that our estimates are very close to the rq estimates and our credible intervals are much narrower than the intervals given by the other three methods. Although our intervals are narrower, the estimates of rq and MCMCquantreg lie inside our intervals. Hence, the analysis shows strong support for the application of the proposed method to inference for QReg.

Conclusion and Discussion
In this paper, we propose a new hierarchical model for Bayesian Lasso quantile regression by utilizing the scale mixture of uniform representation of the Laplace density. This representation leads to a simple and efficient Gibbs sampling algorithm for Bayesian Lasso quantile regression with tractable full conditional posterior distributions. The proposed method does not put any distributional assumption on the response of interest, and thus is able to accommodate non-normal errors, which arise frequently in many areas. Simulation studies show that the proposed Gibbs sampler is effective in the shrinkage and estimation of the regression coefficients under a variety of scenarios. Our simulation scenarios also indicate that our method works well even when the true distribution of the error term is not the SLD. This phenomenon was also observed by [35], [21], [5], [14], [6] and [15], among others. In addition, the analysis of the air pollution data shows strong support for the use of our proposed method.