Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Many of the problems we encounter in Econometrics can be formulated as a linear or a quadratic problem. In this post, I want to approach two traditional problems: Quantile Regression and Ordinary Least Squares as convex problems and how to implement them in R using the package RMosek.

There are many convex optimizer solvers available for R, a survey can be found at the CRAN Task View: Optimization and Mathematical Programming. But here, I want to center my attention on Rmosek (Friberg 2014). Rmosek is an interface that makes available Mosek from R and it’s available from CRAN. Mosek is an “optimization software designed to solve large-scale mathematical optimization problems” (Mosek, 2011). Probably one of Mosek’s greatest features is that nonlinear separable convex optimization problems can be easily implemented and solved.

I begin this post by writing the Quantile and Ordinary Least Squares problems in their linear and quadratic forms respectively. Outlined the analytic form, I write functions that implement them in Rmosek. Finally, I test my Rmosek function to benchmark R functions in a simulation exercise, and in a replication exercise. Overall, once the analytic forms of the problems are outline its translation to Rmosek is quite easy and they perform fairly well for moderately large problems.

### Quantile Regression

The Quantile Regression problem solves the following problem

which can be written as a linear program. Let $X$ be a $n \times p$ matrix of regressors with a constant, and $y$ an $n\ \times 1$ vector of the dependent variable. Following Koenker (2005) we can write the quantile problem as a linear problem,

Recall that the primal problem in a linear program can be written as

To translate this problem into Rmosek it is useful to make the following correspondences:

where $e_n$ is an n-vector of ones.

furthermore, note that $x \in\mathbb{R}^{p+2n}$ and that in this case $l^c=u^c=y$.

The next step is then to write our own quantile regression function that wraps Rmosek. I call the function qr.mosek and give it as inputs the design X matrix (that should contain a column of ones if the model has a constant) and the dependent variable. Also, takes a variable verb that controls for the “verbosity” of the output’s log. The function begins by getting the relevant dimensions from the data. Then it defines the problem in Mosek. The problem definition starts with an empty list, which I called lo1. The next step is defining the objective goal, which in this case is minimization ("min"). Then we pass the variables following the above correspondences. The last line solves the problem. When defining the problem Rmosek Users Guide becomes a very handy tool, and I follow it closely.

<span class="n">qr.mosek</span><span class="o"><-</span><span class="k">function</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="p">,</span><span class="w"> </span><span class="n">verb</span><span class="o">=</span><span class="m">1</span><span class="p">){</span><span class="w">
</span><span class="n">n</span><span class="o"><-</span><span class="nf">length</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1">#number of observations
</span><span class="w">  </span><span class="n">p</span><span class="o"><-</span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="c1"># number of parameters of interest
</span><span class="w">
</span><span class="c1">#problem definition in Mosek
</span><span class="w">  </span><span class="n">lo1</span><span class="o"><-</span><span class="nf">list</span><span class="p">()</span><span class="w"> </span><span class="c1">#empty list that contains the LP problem
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">sense</span><span class="o"><-</span><span class="s2">"min"</span><span class="w"> </span><span class="c1">#problem sense </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">c</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">p</span><span class="p">),</span><span class="nf">rep</span><span class="p">(</span><span class="n">tau</span><span class="p">,</span><span class="n">n</span><span class="p">),</span><span class="nf">rep</span><span class="p">((</span><span class="m">1</span><span class="o">-</span><span class="n">tau</span><span class="p">),</span><span class="n">n</span><span class="p">))</span><span class="w">  </span><span class="c1">#objective coefficients
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">A</span><span class="o"><-</span><span class="n">Matrix</span><span class="p">(</span><span class="n">cbind</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">diag</span><span class="p">(</span><span class="n">n</span><span class="p">),</span><span class="o">-</span><span class="n">diag</span><span class="p">(</span><span class="n">n</span><span class="p">)),</span><span class="n">sparse</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="c1">#constraint matrix </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">bc</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blc</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="n">y</span><span class="p">),</span><span class="n">buc</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="n">y</span><span class="p">))</span><span class="w"> </span><span class="c1">#lower and upper constraint bounds
</span><span class="w">  </span><span class="n">blx</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="o">-</span><span class="kc">Inf</span><span class="p">,</span><span class="n">p</span><span class="p">),</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">n</span><span class="p">),</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="c1">#lower parameters bounds
</span><span class="w">  </span><span class="n">bux</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="kc">Inf</span><span class="p">,</span><span class="n">p</span><span class="p">),</span><span class="nf">rep</span><span class="p">(</span><span class="kc">Inf</span><span class="p">,</span><span class="n">n</span><span class="p">),</span><span class="nf">rep</span><span class="p">(</span><span class="kc">Inf</span><span class="p">,</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="c1">#upper parameters bounds
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">bx</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blx</span><span class="p">,</span><span class="n">bux</span><span class="p">)</span><span class="w"> </span><span class="c1">#bind the lower and upper paramenter bounds </span><span class="w"> </span><span class="n">r</span><span class="o"><-</span><span class="n">mosek</span><span class="p">(</span><span class="n">lo1</span><span class="p">,</span><span class="w"> </span><span class="n">opts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">verb</span><span class="p">))</span><span class="w"> </span><span class="c1">#call mosek solver </span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span> Mosek uses as default its interior point algorithm and returns an interior point solution and a vertex solution. The primal interior point solution is called itr and can be obtained via $sol$itr$xx. One must be careful when retrieving the solution because we introduced 2n slack variables in the problem. The option verbose on Mosek controls messages priorities, option 1 prints when errors are encountered. Setting it to 0 will silence Mosek completely. The option optimizer can also be specified and allows the user to let Mosek choose the optimizer, or choose an interior point algorithm, or a simplex method on the primal, among others.

The primal form gets “a bit unwieldy” (Koenker and Mizera, 2014) because of these 2n slack variables. However, the problem can be written in the corresponding dual form, which is more convenient for interior point implementations.

A function using Mosek will look like;

<span class="n">qr.mosek.dual</span><span class="o"><-</span><span class="k">function</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="p">,</span><span class="w"> </span><span class="n">verb</span><span class="o">=</span><span class="m">1</span><span class="p">){</span><span class="w">
</span><span class="n">n</span><span class="o"><-</span><span class="nf">length</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1">#number of observations
</span><span class="w">  </span><span class="n">p</span><span class="o"><-</span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="c1"># number of parameters of interest
</span><span class="w">
</span><span class="c1">#problem definition in Mosek
</span><span class="w">  </span><span class="n">lo1</span><span class="o"><-</span><span class="nf">list</span><span class="p">()</span><span class="w">  </span><span class="c1">#empty list that contains the LP problem
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">sense</span><span class="o"><-</span><span class="s2">"max"</span><span class="w"> </span><span class="c1">#problem sense </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">c</span><span class="o"><-</span><span class="n">as.vector</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1">#objective coefficients
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">A</span><span class="o"><-</span><span class="n">Matrix</span><span class="p">(</span><span class="n">t</span><span class="p">(</span><span class="n">X</span><span class="p">),</span><span class="n">sparse</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="c1">#constraint matrix </span><span class="w"> </span><span class="n">blc</span><span class="o"><-</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="c1">#lower constraint bounds </span><span class="w"> </span><span class="n">buc</span><span class="o"><-</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="c1">#upper constraint bounds </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">bc</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blc</span><span class="p">,</span><span class="n">buc</span><span class="p">)</span><span class="w"> </span><span class="c1">#bind the lower and upper constraint bounds
</span><span class="w">  </span><span class="n">blx</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="n">tau</span><span class="m">-1</span><span class="p">,</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="c1">#lower parameters bounds
</span><span class="w">  </span><span class="n">bux</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="n">tau</span><span class="p">,</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="c1">#upper parameters bounds
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">bx</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blx</span><span class="p">,</span><span class="n">bux</span><span class="p">)</span><span class="w"> </span><span class="c1">#bind the lower and upper parameter bounds </span><span class="w"> </span><span class="n">r</span><span class="o"><-</span><span class="n">mosek</span><span class="p">(</span><span class="n">lo1</span><span class="p">,</span><span class="w"> </span><span class="n">opts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">verb</span><span class="p">))</span><span class="w"> </span><span class="c1">#call mosek solver </span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span> Equivalently, the may be reparametrized as and its implementation in Rmosek <span class="n">qr.mosek.dual.rep</span><span class="o"><-</span><span class="k">function</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="p">,</span><span class="w"> </span><span class="n">verb</span><span class="o">=</span><span class="m">1</span><span class="p">){</span><span class="w"> </span><span class="n">n</span><span class="o"><-</span><span class="nf">length</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1">#number of observations </span><span class="w"> </span><span class="n">p</span><span class="o"><-</span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="c1"># number of parameters of interest </span><span class="w"> </span><span class="c1">#problem definition in Mosek </span><span class="w"> </span><span class="n">lo1</span><span class="o"><-</span><span class="nf">list</span><span class="p">()</span><span class="w"> </span><span class="c1">#empty list that contains the LP problem </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">sense</span><span class="o"><-</span><span class="s2">"max"</span><span class="w"> </span><span class="c1">#problem sense
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">c</span><span class="o"><-</span><span class="n">as.vector</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1">#objective coefficients </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">A</span><span class="o"><-</span><span class="n">Matrix</span><span class="p">(</span><span class="n">t</span><span class="p">(</span><span class="n">X</span><span class="p">),</span><span class="n">sparse</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">  </span><span class="c1">#constraint matrix
</span><span class="w">  </span><span class="n">e</span><span class="o"><-</span><span class="n">matrix</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="n">ncol</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="n">nrow</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="c1">#vector of ones
</span><span class="w">  </span><span class="n">blc</span><span class="o"><-</span><span class="p">(</span><span class="m">1</span><span class="o">-</span><span class="n">tau</span><span class="p">)</span><span class="o">*</span><span class="nf">c</span><span class="p">(</span><span class="n">crossprod</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">e</span><span class="p">))</span><span class="w"> </span><span class="c1">#lower constraint bounds
</span><span class="w">  </span><span class="n">buc</span><span class="o"><-</span><span class="p">(</span><span class="m">1</span><span class="o">-</span><span class="n">tau</span><span class="p">)</span><span class="o">*</span><span class="nf">c</span><span class="p">(</span><span class="n">crossprod</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">e</span><span class="p">))</span><span class="w"> </span><span class="c1">#upper constraint bounds
</span><span class="w">  </span><span class="n">lo1</span><span class="o">$</span><span class="n">bc</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blc</span><span class="p">,</span><span class="n">buc</span><span class="p">)</span><span class="w"> </span><span class="c1">#bind the lower and upper constraint bounds </span><span class="w"> </span><span class="n">blx</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="c1">#lower parameters bounds </span><span class="w"> </span><span class="n">bux</span><span class="o"><-</span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="c1">#upper parameters bounds </span><span class="w"> </span><span class="n">lo1</span><span class="o">$</span><span class="n">bx</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blx</span><span class="p">,</span><span class="n">bux</span><span class="p">)</span><span class="w"> </span><span class="c1">#bind the lower and upper parameter bounds
</span><span class="w">
</span><span class="n">r</span><span class="o"><-</span><span class="n">mosek</span><span class="p">(</span><span class="n">lo1</span><span class="p">,</span><span class="w"> </span><span class="n">opts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">verb</span><span class="p">))</span><span class="w"> </span><span class="c1">#call mosek solver
</span><span class="w">  </span><span class="nf">return</span><span class="p">(</span><span class="n">r</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>

Again, the solution to the interior point method is accessed via $sol$itr. In these cases, we must be careful because we want to retrieve the solution to the dual variables.

## OLS as a Quadratic Program

Now I turn to our more familiar Ordinary Least Squares. It solves

with a little use of algebra, we can write this problem as a Quadratic Program

recall that a general form of a Quadratic program is
$x = \underset{x \in \mathbb{R}^p}{arg\min}{( \frac{1}{2}xQx-x'c)}$

so, making the appropriate correspondences we have the Ordinary Least Squares problem. Note that the OLS case, is unconstrained, which will become handy when specifying the problem in the solver.

To implement the problem in Rmosek I proceed as before. First, I create a function that wraps the definition of the optimization problem and Mosek’s optimizer. The function is called solve.ols and has the same inputs and options as before. The first half of the function is devoted to setting the correspondences between OLS and the Quadratic Program, the second half we simple follow Rmosek Users Guide to define the minimization problem.

<span class="n">solve.ols</span><span class="o"><-</span><span class="k">function</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">verb</span><span class="o">=</span><span class="m">1</span><span class="p">){</span><span class="w">
</span><span class="n">p</span><span class="o"><-</span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="m">2</span><span class="p">]</span><span class="w">  </span><span class="c1"># number of parameters of interest
</span><span class="w">
</span><span class="c1">#correspondence between OLS and the Quadratic Program
</span><span class="w">  </span><span class="n">xx</span><span class="o"><-</span><span class="n">crossprod</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="w"> </span><span class="c1"># X'X=Q variable in the Quadratic program
</span><span class="w">  </span><span class="n">c</span><span class="o"><--</span><span class="n">crossprod</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="c1"># X'y=c variable in the Quadratic program
</span><span class="w">  </span><span class="n">xx2</span><span class="o"><-</span><span class="n">xx</span><span class="w">
</span><span class="n">xx2</span><span class="p">[</span><span class="n">upper.tri</span><span class="p">(</span><span class="n">xx</span><span class="p">)]</span><span class="o"><</span><span class="m">-0</span><span class="w"> </span><span class="c1">#mosek needs Q to be  triangular
</span><span class="w">  </span><span class="n">idx</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">which</span><span class="p">(</span><span class="n">xx2</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">arr.ind</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="c1">#index of the nonzero elements of Q
</span><span class="w">
</span><span class="c1">#problem definition in Mosek
</span><span class="w">  </span><span class="n">qo1</span><span class="o"><-</span><span class="nf">list</span><span class="p">()</span><span class="w"> </span><span class="c1">#empty list that contains the QP problem
</span><span class="w">  </span><span class="n">qo1</span><span class="o">$</span><span class="n">sense</span><span class="o"><-</span><span class="s2">"min"</span><span class="w"> </span><span class="c1">#problem sense </span><span class="w"> </span><span class="n">qo1</span><span class="o">$</span><span class="n">c</span><span class="o"><-</span><span class="n">as.vector</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="c1">#objective coefficients
</span><span class="w">  </span><span class="n">qo1</span><span class="o">$</span><span class="n">qobj</span><span class="o"><-</span><span class="nf">list</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">idx</span><span class="p">[,</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">idx</span><span class="p">[,</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xx2</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="c1">#the Q matrix is imputed by the row indexes i, the col indexes j and the values v that define the Q matrix </span><span class="w"> </span><span class="n">qo1</span><span class="o">$</span><span class="n">A</span><span class="o"><-</span><span class="n">Matrix</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">p</span><span class="p">),</span><span class="w"> </span><span class="n">nrow</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="n">byrow</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="n">sparse</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="c1">#constrain matrix A is a null matrix in this case
</span><span class="w">  </span><span class="n">qo1</span><span class="o">$</span><span class="n">bc</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blc</span><span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span><span class="w"> </span><span class="n">buc</span><span class="o">=</span><span class="w"> </span><span class="kc">Inf</span><span class="p">)</span><span class="w"> </span><span class="c1">#constraint bounds </span><span class="w"> </span><span class="n">qo1</span><span class="o">$</span><span class="n">bx</span><span class="o"><-</span><span class="n">rbind</span><span class="p">(</span><span class="n">blx</span><span class="o">=</span><span class="nf">rep</span><span class="p">(</span><span class="o">-</span><span class="kc">Inf</span><span class="p">,</span><span class="n">p</span><span class="p">),</span><span class="w"> </span><span class="n">bux</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="kc">Inf</span><span class="p">,</span><span class="n">p</span><span class="p">))</span><span class="w"> </span><span class="c1">#parameter bounds
</span><span class="w">
</span><span class="n">r</span><span class="o"><-</span><span class="n">mosek</span><span class="p">(</span><span class="n">qo1</span><span class="p">,</span><span class="w"> </span><span class="n">opts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">verb</span><span class="p">))</span><span class="w"> </span><span class="c1">#call mosek solver
</span><span class="w">  </span><span class="nf">return</span><span class="p">(</span><span class="n">r</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>

## Applications

### Simulation

We are now ready to test our function. I set a mock problem where the independent variable does not have an effect on the conditional mean but the effect is on the variance.

<span class="n">set.seed</span><span class="p">(</span><span class="m">1212</span><span class="p">)</span><span class="w">
</span><span class="n">x</span><span class="o"><-</span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="n">each</span><span class="o">=</span><span class="m">500</span><span class="p">)</span><span class="w">
</span><span class="n">n</span><span class="o"><-</span><span class="nf">length</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="n">cons</span><span class="o"><-</span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="n">n</span><span class="p">)</span><span class="w">
</span><span class="n">X</span><span class="o"><-</span><span class="n">cbind</span><span class="p">(</span><span class="n">cons</span><span class="p">,</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="o"><-</span><span class="n">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">mean</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="n">sd</span><span class="o">=</span><span class="n">x</span><span class="o">^</span><span class="m">.8</span><span class="p">)</span><span class="w">
</span>

Plotting the simulation clearly shows how that the effect of the independent variable on the mean is zero but the variance increases with it.
r plot(x,y) 

First, once we have installed Rmosek, we call it

<span class="n">require</span><span class="p">(</span><span class="n">Rmosek</span><span class="p">,</span><span class="w"> </span><span class="n">quietly</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">)</span><span class="w">
</span>

Suppose we want to assess the effect on the median (τ = .5)

<span class="n">tau</span><span class="o"><</span><span class="m">-0.50</span><span class="w">
</span><span class="n">z</span><span class="o"><-</span><span class="n">qr.mosek</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">)</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">z</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">xx</span><span class="p">,</span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="m">2</span><span class="p">])</span><span class="w"> </span> [1] -0.039657474 0.008174528 The parameter results are located under $sol$itr$xx. Note that the parameters of interest are only the first p parameters. The solution includes p × 2n results, p parameters and 2n slack variables, but only the first p are of interest. As expected the effect on the median is null.

Suppose now that we want to assess the effect on the 90t**h quantile, where we should expect a positive effect of the independent variable.

<span class="n">tau</span><span class="o"><</span><span class="m">-0.90</span><span class="w">
</span><span class="n">z</span><span class="o"><-</span><span class="n">qr.mosek</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">)</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">z</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">xx</span><span class="p">,</span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)[</span><span class="m">2</span><span class="p">])</span><span class="w"> </span> [1] 1.0165950 0.6772688 To compare Mosek performance we choose quantreg package (Koenker, 2016), which is the most used package to solve this type of problems in R. The default solver in the quantreg package is the function rq.fit.br. This function calls algorithm of Koenker and d’Orey (Koenker, 2016). <span class="n">require</span><span class="p">(</span><span class="n">quantreg</span><span class="p">)</span><span class="w"> </span> <span class="n">rq.fit.br</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">)</span><span class="o">$</span><span class="n">coefficients</span><span class="w">
</span>
     cons         x
1.0165950 0.6772688

Results are numerically identical under Mosek and the quantreg package. More interesting however is testing how Mosek solver compares in computation time to the quantreg package default. I test then the primal, the dual, and the reparametrized dual, against the function rq.fit.br

<span class="n">microbenchmark</span><span class="o">::</span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">qr.mosek</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">),</span><span class="w">
</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">),</span><span class="w">
</span><span class="n">qr.mosek.dual.rep</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">),</span><span class="w">
</span><span class="n">rq.fit.br</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span>
Unit: milliseconds
expr         min          lq        mean
qr.mosek(X, y, tau = tau) 2866.959432 3132.532709 3238.346004
qr.mosek.dual(X, y, tau = tau)   93.354619   95.710071   98.888875
qr.mosek.dual.rep(X, y, tau = tau)   93.863603   95.716660   98.375217
rq.fit.br(X, y, tau = tau)    8.280732    8.459964    8.892775
median          uq        max neval
3172.331852 3355.718077 3809.76471   100
96.661428   99.848480  128.10002   100
96.848799   99.052075  120.14091   100
8.582349    8.987302   12.99779   100

It’s clear now how unwieldy the primal problem. It takes considerably more time than the dual formulations or the rq.fit.br function. And the dual formulations in Mosek are considerable slower than the rq.fit.br function.

Next, I examine how the OLS solver works. First, I call the function I constructed before

<span class="n">solve.ols</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">xx</span><span class="w"> </span> [1] 0.020485604 -0.005660287 and I compare it to the traditional lm function in the base package <span class="n">summary</span><span class="p">(</span><span class="n">lm</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">X</span><span class="m">-1</span><span class="p">))</span><span class="o">$</span><span class="n">coef</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="w">
</span>
       Xcons           Xx
0.020485604 -0.005660287

note the −1 in the lm line because the X already includes a constant in X. Once again numerically they are identical. Comparing times, I find that Mosek is quite considerably faster than the base lm. I should test however this with more complex problems.

<span class="n">microbenchmark</span><span class="o">::</span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">lm</span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">X</span><span class="m">-1</span><span class="p">),</span><span class="w">
</span><span class="n">solve.ols</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span>
Unit: milliseconds
expr          min       lq      mean      median        uq
lm(y ~ X - 1)   6.063136 9.518182 10.313522   9.809023  10.261577
solve.ols(X, y) 1.007566 1.164319  2.313304   1.426241   1.547576
max      neval
60.70589    100
82.68025    100

### Replication: The Wage Distribution

Next, I turn to an application taken from Mostly Harmless Econometrics (Angrist & Pischke, 2008). The application replicates Angrist, et al (2006) Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure paper. All the relevant files to replicate the paper are publicly available on Angrist’s Data Archive

Files are in Stata format, so I use the foreign package to read the data set corresponding to the 2000 census. The data set is quite large, it contains 97397 observations and 5 variables: the logarithm of wage, education, race, experience and experience squared.

<span class="n">require</span><span class="p">(</span><span class="n">foreign</span><span class="p">)</span><span class="w">
</span>
Loading required package: foreign
<span class="n">dta</span><span class="o"><-</span><span class="n">read.dta</span><span class="p">(</span><span class="w"> </span><span class="s2">"~/Data/census00.dta"</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="o"><-</span><span class="n">dta</span><span class="o">$</span><span class="n">logwk</span><span class="w"> </span><span class="n">X</span><span class="o"><-</span><span class="n">cbind</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="nf">length</span><span class="p">(</span><span class="n">y</span><span class="p">)),</span><span class="n">dta</span><span class="o">$</span><span class="n">educ</span><span class="p">,</span><span class="n">dta</span><span class="o">$</span><span class="n">black</span><span class="p">,</span><span class="n">dta</span><span class="o">$</span><span class="n">exper</span><span class="p">,</span><span class="n">dta</span><span class="o">$</span><span class="n">exper2</span><span class="p">)</span><span class="w"> </span><span class="nf">dim</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="w"> </span> [1] 97397 5 We focus here on the dual problem, and focus only on the returns to schooling for five quantiles: .10, .25, .50, .75 and .90, and also the Mean Estimate using OLS <span class="o">-</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="m">.1</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">suc</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w">
</span>
[1] 0.09090798
<span class="o">-</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="m">.25</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">suc</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span> [1] 0.1033592 <span class="o">-</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="m">.5</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">suc</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w">
</span>
[1] 0.1110223
<span class="o">-</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="m">.75</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">suc</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span> [1] 0.1215066 <span class="o">-</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="m">.9</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">$</span><span class="n">suc</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w">
</span>
[1] 0.1549263
<span class="n">solve.ols</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">)</span><span class="o">$</span><span class="n">sol</span><span class="o">$</span><span class="n">itr</span><span class="o">\$</span><span class="n">xx</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w">
</span>
[1] 0.1148

It interesting how the average return is about 11 but when we focus on the quantiles a different pattern emerges. The mean and median returns are quite similar but larger returns to schooling appear as we move to the upper quantiles. I could not end this incursion into Mosek without comparing computation times for the application.

<span class="n">tau</span><span class="o"><</span><span class="m">-0.5</span><span class="w">
</span><span class="n">microbenchmark</span><span class="o">::</span><span class="n">microbenchmark</span><span class="p">(</span><span class="w">
</span><span class="n">qr.mosek.dual</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">),</span><span class="w">
</span><span class="n">rq.fit.br</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">tau</span><span class="o">=</span><span class="n">tau</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span>
Unit: milliseconds
expr       min        lq      mean    median
qr.mosek.dual(X, y, tau = tau)  420.2002  434.9739  473.0457  456.4372
rq.fit.br(X, y, tau = tau) 4642.0985 4761.6424 5034.9789 5052.8138
uq       max neval
472.2065  889.6079   100
5226.3060 5975.6663   100

Mosek now turns to be considerable faster than rq. Thus, for a moderately large problem and using the dual formulation of the quantile program Mosek performs quite well.

To finish, those interested in convex optimization with R I would recommend reading the paper by Koenker and Mizera (2014) Convex Optimization in R.

Comments and suggestions are always welcomed. You can send them to srmntbr2 at illinois.edu.

Angrist, J., Chernozhukov, V., & Fernández‐Val, I. (2006). Quantile regression under misspecification, with an application to the US wage structure. Econometrica, 74(2), 539-563.

Angrist, J. D., & Pischke, J. S. (2008). Mostly harmless econometrics: An empiricist’s companion. Princeton university press.

Friberg HA (2014). Rmosek: The R-to-MOSEK Optimization Interface. R package version 1.2.5.1, URL http://CRAN.R-project.org/package=Rmosek.

Koenker, R. (2005). Quantile regression. No. 38. Cambridge university press.

Koenker, R. (2016). quantreg: QuantileRegression. R package version 5.29. https://CRAN.R-project.org/package=quantreg

Koenker, R., & Hallock, K. (2001). Quantile regression: An introduction. Journal of Economic Perspectives, 15(4), 43-56.

Koenker, R., & Mizera, I (2014). “Convex optimization in R.” Journal of Statistical Software 60, no. 5: 1-23.

MOSEK ApS, Denmark (2011). The MOSEK Optimization Tools Manual. Version 6.0, http://www.mosek.com/.