Quantile Regression in STATA: Advantages of the Model, with an Example

Traditional linear regression describes the relationship between the regressors and the response variable through the conditional mean. This gives only a partial picture, because it does not describe the relationship at other points of the distribution of y. Quantile regression addresses this drawback by modeling conditional quantiles, of which the conditional median is the best-known case, instead of the conditional mean. The median is the 50th percentile of the data, that is, the quantile q = 0.5. In general, the q-th quantile $y_q$ satisfies $F(y_q) = q$, so $y_q = F^{-1}(q)$.
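As a small illustration of the definition of a quantile, the sketch below computes sample quantiles in STATA. It uses the auto dataset that ships with STATA and its price variable purely for illustration; this is an assumption made for the example and is not the data analysed later in this article.

```stata
* Illustration only: the auto data ship with Stata and are not the article's data
sysuse auto, clear

* Sample quantiles y_q for q = 0.10, 0.50 and 0.90
_pctile price, percentiles(10 50 90)
display "q=0.10: " r(r1) "   q=0.50: " r(r2) "   q=0.90: " r(r3)

* The 50th percentile is the median reported by summarize, detail
quietly summarize price, detail
display "median = " r(p50)
```

The quantiles are non-decreasing in q, and the middle one is the median.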

Any regression method chooses its estimates by minimizing some function of the prediction errors. Unlike ordinary least squares (OLS), which minimizes the sum of squared errors, median regression, also known as least absolute deviations (LAD) regression, minimizes the sum of the absolute values of the prediction errors, i.e. $\sum_i |e_i|$.
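A minimal sketch of the link between the median and the sum of absolute errors: a median regression with no covariates estimates only a constant, and a constant that minimizes the sum of absolute deviations is a sample median. The auto dataset and the price variable are used only for illustration and are not part of the analysis in this article.

```stata
* Illustration only: the auto data ship with Stata
sysuse auto, clear

* Median regression on a constant only: _cons minimizes sum |price - c|
qreg price
display "qreg constant = " _b[_cons]

* Compare with the sample median
quietly summarize price, detail
display "sample median = " r(p50)
```

With an even number of observations, any value between the two middle observations minimizes the criterion, so the two numbers need not be identical.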

Quantile regression extends median regression to other quantiles by weighting over-prediction and under-prediction asymmetrically. That is, it minimizes a sum that penalizes each error by $(1-q)|e_i|$ when the model over-predicts and by $q|e_i|$ when it under-predicts. The main advantages of median and quantile regression are:

  • It is more robust, that is, less sensitive to outliers, than OLS.
  • It makes no assumptions about the distribution of the error term.
  • If the errors are non-normal, OLS may be inefficient, whereas QR is more robust to non-normal errors.
  • It allows us to consider the impact of the covariates on the entire distribution of y, not only on its conditional mean.
  • It is invariant to monotonic transformations of y, such as the logarithm.
  • It can be extended to count data, for which Poisson regression is the usual model.
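
The asymmetric weighting of errors mentioned before the list is usually written as a single "check" function. The symbol $\rho_q$ is the conventional notation from the quantile regression literature and is introduced here only for illustration; $e_i$ is the prediction error, actual minus predicted.

```latex
\rho_q(e_i) =
\begin{cases}
  q \, |e_i|       & \text{if } e_i \ge 0 \quad \text{(under-prediction)} \\
  (1-q) \, |e_i|   & \text{if } e_i < 0   \quad \text{(over-prediction)}
\end{cases}
```

For example, with q = 0.9 an error of +2 costs 0.9 × 2 = 1.8, while an error of -2 costs only 0.1 × 2 = 0.2, so the fitted line is pushed upward until roughly 90% of the observations lie below it. With q = 0.5 both errors cost 1.0, which is the LAD criterion up to a factor of one half.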

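As in the usual regression model, the quantile regression model can be expressed as

$$y_i = x_i'\beta_q + \epsilon_i$$

where $y_i$ is the outcome variable, $x_i$ is the vector of explanatory variables, $\beta_q$ is the vector of coefficients to be estimated for quantile q, and $\epsilon_i$ is the error term. The quantile regression estimator minimizes the following objective function:

$$Q(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q} q \, |y_i - x_i'\beta_q| \; + \sum_{i:\, y_i < x_i'\beta_q} (1-q) \, |y_i - x_i'\beta_q|$$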

Quantile regression is computed with linear programming methods, in contrast to the usual linear regression, which has a closed-form least-squares solution.
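
One common way to see why linear programming applies is to split each prediction error into its positive part and its negative part. The symbols $u_i$ and $v_i$ below are introduced here for that purpose and do not appear elsewhere in this article. This is a sketch of the standard textbook formulation, not a description of the exact algorithm that qreg runs internally.

```latex
\min_{\beta_q,\, u,\, v} \; \sum_{i=1}^{N} \bigl[ \, q \, u_i + (1-q) \, v_i \, \bigr]
\quad \text{subject to} \quad
y_i = x_i'\beta_q + u_i - v_i, \qquad u_i \ge 0, \quad v_i \ge 0
```

At the optimum at most one of $u_i$ and $v_i$ is nonzero for each observation, so $u_i$ is the size of an under-prediction and $v_i$ the size of an over-prediction. Both the objective and the constraints are linear in the unknowns, which is what makes this a linear program.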

Let me illustrate quantile regression with an analysis of medical expenditure data in STATA. The response variable is the total medical expenditure of the people surveyed, used in logs (ltotexp), and the independent variables are insurance (ins), health status (totchr), age, gender (female) and race (white). A few observations have missing expenditure and are dropped from the analysis.

. use mus03data, clear
. drop if mi(ltotexp)
. su ltotexp ins totchr age female white, sep(0)

The following table gives the descriptive summary of the variables under study.

Variable      Obs       Mean   Std. Dev.       Min        Max
ltotexp      2955     8.0598      1.3675     1.098    11.7409
ins          2955     0.5915      0.4916         0          1
totchr       2955     1.8087      1.2946         0          7
age          2955    74.2453      6.3759        65         90
female       2955     0.5840      0.4929         0          1
white        2955     0.9736      0.1603         0          1

Next, we plot the quantiles of ltotexp, the log of total expenditure, to check whether its distribution is roughly symmetric.

. qplot ltotexp, recast(line) ylab(,angle(0)) xlab(0(0.1)1) xline(0.5) xline(0.1) xline(0.9)
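
qplot is a user-written command, so it has to be installed before it will run. As a hedged alternative, the sketch below uses only commands that ship with STATA: quantile plots the ordered values against the fraction of the data, summarize with the detail option reports percentiles and skewness, and centile reports chosen quantiles. It assumes mus03data is already in memory, as in the commands above.

```stata
* Official-Stata alternative to the user-written qplot
quantile ltotexp

* Percentiles and skewness; skewness near 0 suggests rough symmetry
summarize ltotexp, detail
display "skewness = " r(skewness)

* The three quantiles marked on the plot (q = 0.1, 0.5, 0.9)
centile ltotexp, centile(10 50 90)
```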

The next step is to run the median regression with all covariates. In STATA, this is done with the qreg command.

. qreg ltotexp ins totchr age female white, nolog

The results are as follows:

Raw sum of deviations = 3110.961 (about 8.111928)      Number of obs = 2955
Min sum of deviations = 2796.983                       Pseudo R2     = 0.1009

ltotexp       Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
ins           0.277      0.0536     5.17    0.000     .1718924   .3820617
totchr       0.3943      0.0202    19.47    0.000     .3545663   .4339664
age          0.0149      0.0041     3.58    0.000     .0067335   .0229996
female       -0.088      0.0532    -1.66    0.098    -.1924109   .0162175
white        0.4987      0.1631     3.06    0.002     .1789474    .818544
_cons        5.6489      0.3412    16.56    0.000     4.979943   6.317838

From the p-values, all covariates except female are statistically significant at the 5% level in the median regression; female (p = 0.098) is significant only at the 10% level.
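
The standard errors reported by qreg rest on assumptions about the error distribution. A common robustness check is to bootstrap them with bsqreg, which ships with STATA. This is a sketch under stated assumptions: the seed value and the number of replications below are arbitrary choices for illustration.

```stata
* Bootstrap standard errors for the median regression
set seed 10101
bsqreg ltotexp ins totchr age female white, quantile(.5) reps(400)

* Wald test of a single coefficient after estimation
test female
```

If the bootstrap standard errors are close to the ones from qreg, conclusions about significance are unlikely to change.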

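Next, we calculate the marginal effects on the response variable. Because the model is fitted to the log of expenditure, the coefficients are in log units; to express them in dollars, each coefficient is multiplied by the sample average of exp(xb).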

. mat b = e(b)
. qui predict double xb
. qui gen double expxb = exp(xb)
. su expxb, mean
. mat b = r(mean) * b
. mat li b, ti("Marginal effects ($) on total medical expenditures")

b[1,6]: Marginal effects ($) on total medical expenditures
          ins     totchr        age     female      white      _cons
y1   1037.755  1477.2049     55.701  -330.0735    1868.65    21164.8

From these marginal effects, we infer that a one-unit increase in totchr is associated with an increase in median expenditure of about $1477.20, and each additional year of age with an increase of about $55.70.
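
The reasoning behind the commands above can be sketched as follows. Here $y$ denotes expenditure in dollars, so that ltotexp is $\ln y$; the symbols $Q_q(\cdot \mid x)$ for the conditional quantile and $\hat\beta_{qj}$ for the j-th estimated coefficient are notation introduced for this sketch.

```latex
Q_q(\ln y \mid x) = x'\beta_q
\;\Longrightarrow\;
Q_q(y \mid x) = \exp(x'\beta_q)
\;\Longrightarrow\;
\frac{\partial Q_q(y \mid x)}{\partial x_j} = \beta_{qj} \, \exp(x'\beta_q)

\text{average marginal effect of } x_j \;\approx\;
\hat\beta_{qj} \cdot \frac{1}{N} \sum_{i=1}^{N} \exp(x_i'\hat\beta_q)
```

The first step uses the invariance of quantiles to monotonic transformations. As a rough check using the rounded output, 1037.755 / 0.277 ≈ 3746 is the implied average of exp(xb), and 0.3943 × 3746 ≈ 1477 reproduces the totchr entry. For 0/1 variables such as ins, female and white, the derivative is only an approximation to the effect of switching from 0 to 1.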

To see how the quantile regression estimates differ from the usual OLS estimates, we now estimate the model at several quantiles and compare the results with OLS.

. eststo clear
. eststo, ti("OLS"): qui reg ltotexp ins totchr age female white, robust
(est1 stored)
. foreach q in 0.10 0.25 0.50 0.75 0.90 {
  2.   eststo, ti("Q(`q')"): qui qreg ltotexp ins totchr age female white, q(`q') nolog
  3. }
(est2 stored)
(est3 stored)
(est4 stored)
(est5 stored)
(est6 stored)
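
The stored estimates can then be displayed side by side. A minimal sketch, assuming the user-written estout package (which provides both eststo and esttab) is installed; the number format is an arbitrary choice.

```stata
* Coefficients with standard errors in parentheses, one column per stored model
esttab, b(%9.3f) se mtitles
```

Each column corresponds to one stored model, OLS first and then the five quantiles.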

The result of the comparative study is tabulated below.

From the above result, it is clear that insurance has a larger effect on total expenditure at the lower quantiles. The median regression results are similar to the OLS results. In the same way, we can use STATA to examine how the effect of each covariate varies across quantiles. The following graph shows how the coefficients differ across quantiles.
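
One way to test formally whether a coefficient differs across quantiles, and to draw a graph of this kind, is sketched below. sqreg ships with STATA. grqreg is a user-written command that has to be installed first, and the options shown are the ones I believe it accepts; treat them as assumptions and check its help file. The seed and the number of replications are arbitrary.

```stata
* Simultaneous quantile regression with bootstrap standard errors
set seed 10101
sqreg ltotexp ins totchr age female white, quantiles(.25 .50 .75) reps(100)

* Test whether the insurance coefficient is equal across the three quantiles
test [q25=q50=q75]: ins

* Plot coefficients across quantiles after a quantile regression
* (grqreg is user-written; the options are assumed, not verified here)
quietly qreg ltotexp ins totchr age female white
grqreg, cons ci ols olsci
```

A small p-value from the test suggests that the effect of insurance is not constant across the distribution of expenditure.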

In conclusion, quantile regression provides an alternative to OLS regression that is based on conditional quantiles, such as the conditional median, rather than on the conditional mean. It therefore describes the relationship between the variables at points of the distribution other than the mean. It is more appropriate than OLS when the data are not normally distributed or contain influential observations, because it is more robust to outliers.


