The oem package for penalized regression is on CRAN

[This article was first published on Jared Huling, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The oem package has been on CRAN for some time now, but with the latest update I expect few structural changes to the user interface. oem is a package for the estimation of various penalized regression models using the oem algorithm of Xiong et al. (2016). The focus of oem is to provide high performance computation for big tall data. Many applications not only have a large number of variables, but a vast number of observations; oem is designed to perform well in these settings.

  • Fast computation for big tall data
  • Efficient computation for computation for multiple penalties simultaneously
  • Efficient cross-validation

In this post I’ll give a brief overview of what is in the oem package and how to use it.

Installation

The simplest way to install oem is via the CRAN repositories as the following:

<span class="n">install.packages</span><span class="p">(</span><span class="s2">"oem"</span><span class="p">)</span><span class="w">
</span>

To install the development version, first install the devtools package

<span class="n">install.packages</span><span class="p">(</span><span class="s2">"devtools"</span><span class="p">)</span><span class="w">
</span>

and then install oem via the install_github function

<span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"jaredhuling/oem"</span><span class="p">)</span><span class="w">
</span>

Quick Start

First load oem

<span class="n">library</span><span class="p">(</span><span class="n">oem</span><span class="p">)</span><span class="w">
</span>

Simulate data

<span class="n">nobs</span><span class="w">  </span><span class="o"><-</span><span class="w"> </span><span class="m">1e4</span><span class="w">
</span><span class="n">nvars</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobs</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nvars</span><span class="p">),</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nvars</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">drop</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">nvars</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">5</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobs</span><span class="p">,</span><span class="w"> </span><span class="n">sd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span>

Fit a penalized regression model using the oem function

<span class="n">fit1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">)</span><span class="w">
</span>

Plot the solution path

<span class="n">plot</span><span class="p">(</span><span class="n">fit1</span><span class="p">)</span><span class="w">
</span>

center

Key Features

Available functions

Function Name Functionality
oem() Main fitting function
predict.oemfit() Prediction for oem objects
plot.oemfit() Plotting for oem objects
logLik.oemfit() log Likelihood for oem objects
cv.oem() Cross-validation function
xval.oem() Accelerated cross-validation for linear models
predict.cv.oem() Prediction for cv.oem objects
plot.cv.oem() Plotting for cv.oem objects
logLik.cv.oem() log Likelihood for cv.oem objects

Available Penalties

Penalty Option Name Penalty Form
Lasso lasso $\lambda \sum_{j = 1}^pd_j\vert\beta_j\vert$
Elastic Net elastic.net $\lambda \sum_{j = 1}^p\alpha d_j\vert\beta_j\vert + (1 – \alpha)\lambda \sum_{j = 1}^pd_j\beta_j^2$
MCP mcp $\lambda \sum_{j = 1}^pd_j \int_0^{\beta_j}(1 – x/(\gamma\lambda d_j))_+\mathrm{d}x$
SCAD scad $\sum_{j = 1}^p p^{SCAD}_{\lambda d_j,\gamma}(\beta_j)$
Group Lasso grp.lasso $\lambda \sum_{k = 1}^Gd_k\sqrt{\sum_{j \in g_k}\beta_j^2}$
Group MCP grp.mcp $\sum_{k = 1}^G p_{\lambda d_k,\gamma}^{MCP}\left(\vert \vert \boldsymbol\beta_{g_k}\vert \vert_2\right)$
Group SCAD grp.scad $\sum_{k = 1}^G p_{\lambda d_k,\gamma}^{SCAD}\left(\vert \vert \boldsymbol\beta_{g_k}\vert \vert_2\right)$
Sparse Group Lasso sparse.grp.lasso $\lambda \alpha\sum_{j = 1}^pd_j\vert\beta_j\vert + \lambda (1-\alpha)\sum_{k = 1}^Gd_k\sqrt{\sum_{j \in g_k}\beta_j^2}$

where $\vert\vert\boldsymbol\beta_{g_k}\vert\vert_2 = \sqrt{\sum_{j \in g_k}\beta_j^2}$.

Any penalty with .net at the end of its name has a ridge term of $(1 – \alpha)\lambda \sum_{j = 1}^pd_j\beta_j^2$ added to it and the original penalty multiplied by $\alpha$. For example, grp.mcp.net is the penalty

Available Model Families

The following models are available currently.

Model Option Name Loss Form
Linear Regression gaussian $\frac{1}{2n}\sum_{i=1}^n(y_i – x_i^T\beta) ^ 2$
Logistic Regression binomial $-\frac{1}{n}\sum_{i=1}^n\left[y_i x_i^T\beta – \log (1 + \exp{ x_i^T\beta } ) \right]$

There are plans to include support for multiple responses, binomial models (not just logistic regression), Cox’s proportional hazards model, and more if requested.

Fitting multiple penalties at once

The oem algorithm is well-suited to quickly estimate a solution path for multiple penalties simultaneously if the number of variables is not too large. The oem algorithm is only efficient for multiple penalties for linear models.

For the group lasso penalty, the groups argument must be used. groups should be a vector which indicates the group number for each variable.

<span class="n">fit2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">),</span><span class="w">
            </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">))</span><span class="w">
</span>

Plot the solution paths for all models

<span class="n">layout</span><span class="p">(</span><span class="n">matrix</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">fit2</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">xvar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">fit2</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">xvar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">fit2</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">xvar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">fit2</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">,</span><span class="w"> </span><span class="n">xvar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda"</span><span class="p">)</span><span class="w">
</span>

center

Timing Comparison

The following is a demonstration of oem’s efficiency for computing solutions for tuning parameter paths for multiple
penalties at once.

Linear Regression

The efficiency oem for fitting multiple penalties at once only applies to linear models. However, for linear models it is quite efficient, even for a high number of tuning parameters for many different penalties.

<span class="n">nobs</span><span class="w">  </span><span class="o"><-</span><span class="w"> </span><span class="m">1e5</span><span class="w">
</span><span class="n">nvars</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobs</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nvars</span><span class="p">),</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nvars</span><span class="p">)</span><span class="w">
</span><span class="n">y</span><span class="m">2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">drop</span><span class="p">(</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">nvars</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">5</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobs</span><span class="p">,</span><span class="w"> </span><span class="n">sd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">

</span><span class="n">system.time</span><span class="p">(</span><span class="n">fit2a</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"grp.lasso"</span><span class="p">),</span><span class="w">
                         </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> </span><span class="n">nlambda</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100L</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    0.23    0.02    0.25
<span class="n">system.time</span><span class="p">(</span><span class="n">fit2b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="m">2</span><span class="p">,</span><span class="w"> 
                         </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> 
                                     </span><span class="s2">"scad"</span><span class="p">,</span><span class="w"> </span><span class="s2">"elastic.net"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">,</span><span class="w">
                                     </span><span class="s2">"grp.scad"</span><span class="p">,</span><span class="w"> </span><span class="s2">"sparse.grp.lasso"</span><span class="p">),</span><span class="w">
                         </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> </span><span class="n">nlambda</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100L</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    0.28    0.02    0.30
<span class="n">system.time</span><span class="p">(</span><span class="n">fit2c</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="m">2</span><span class="p">,</span><span class="w"> 
                         </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> 
                                     </span><span class="s2">"scad"</span><span class="p">,</span><span class="w"> </span><span class="s2">"elastic.net"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">,</span><span class="w">
                                     </span><span class="s2">"grp.scad"</span><span class="p">,</span><span class="w"> </span><span class="s2">"sparse.grp.lasso"</span><span class="p">),</span><span class="w">
                         </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> </span><span class="n">nlambda</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">500L</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    0.41    0.01    0.42

Logistic Regression

It is still more efficient to fit multiple penalties at once instead of individually for logistic regression, but the benefit is not as dramatic as for linear models.

<span class="n">nobs</span><span class="w">  </span><span class="o"><-</span><span class="w"> </span><span class="m">5e4</span><span class="w">
</span><span class="n">nvars</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobs</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nvars</span><span class="p">),</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nvars</span><span class="p">)</span><span class="w">

</span><span class="n">y</span><span class="m">2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbinom</span><span class="p">(</span><span class="n">nobs</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="n">drop</span><span class="p">(</span><span class="n">x</span><span class="m">2</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.15</span><span class="p">,</span><span class="w"> </span><span class="m">0.15</span><span class="p">,</span><span class="w"> </span><span class="m">-0.15</span><span class="p">,</span><span class="w"> </span><span class="m">-0.15</span><span class="p">,</span><span class="w"> </span><span class="m">0.25</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">nvars</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">5</span><span class="p">))))))</span><span class="w">


</span><span class="n">system.time</span><span class="p">(</span><span class="n">fit2a</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"grp.lasso"</span><span class="p">),</span><span class="w">
                         </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"binomial"</span><span class="p">,</span><span class="w">
                         </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> </span><span class="n">nlambda</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100L</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    2.15    0.00    2.17
<span class="n">system.time</span><span class="p">(</span><span class="n">fit2b</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> </span><span class="s2">"scad"</span><span class="p">,</span><span class="w"> </span><span class="s2">"elastic.net"</span><span class="p">),</span><span class="w">
                         </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"binomial"</span><span class="p">,</span><span class="w">
                         </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> </span><span class="n">nlambda</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100L</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##   11.23    0.06   11.42

Cross Validation

Here we use the nfolds argument to specify the number of folds for $k$-fold cross validation

<span class="n">system.time</span><span class="p">(</span><span class="n">cvfit1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cv.oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> 
                             </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> 
                                         </span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">),</span><span class="w"> 
                             </span><span class="n">gamma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w">
                             </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> 
                             </span><span class="n">nfolds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    1.72    0.11    1.83

Plot the cross validation mean squared error results for each model

<span class="n">layout</span><span class="p">(</span><span class="n">matrix</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">cvfit1</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">cvfit1</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">cvfit1</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">cvfit1</span><span class="p">,</span><span class="w"> </span><span class="n">which.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span>

center

Extremely Fast Cross Validation for Linear Models

The function xval.oem offers accelerated cross validation for penalized linear models. In many cases is is orders of magnitude faster than cv.oem. It is only recommended for scenarios where the number of observations is larger than the number of variables. In addition to the computational gains in single-core usage, it also benefits from parallelizaton using OpenMP (instead of using foreach, as used by cv.oem). For large enough problems, it has on a similar order of computation time as just fitting one OEM model.

<span class="n">nobsc</span><span class="w">  </span><span class="o"><-</span><span class="w"> </span><span class="m">1e5</span><span class="w">
</span><span class="n">nvarsc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">xc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobsc</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nvarsc</span><span class="p">),</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nvarsc</span><span class="p">)</span><span class="w">
</span><span class="n">yc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">drop</span><span class="p">(</span><span class="n">xc</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">nvarsc</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">5</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">nobsc</span><span class="p">,</span><span class="w"> </span><span class="n">sd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">

</span><span class="n">system.time</span><span class="p">(</span><span class="n">cvalfit1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cv.oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yc</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> 
                               </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> 
                               </span><span class="n">nfolds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    6.67    0.77    7.56
<span class="n">system.time</span><span class="p">(</span><span class="n">xvalfit1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">xval.oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yc</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w">
                                 </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> 
                                 </span><span class="n">nfolds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    0.99    0.05    1.05
<span class="n">system.time</span><span class="p">(</span><span class="n">xvalfit2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">xval.oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yc</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w">
                                 </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">),</span><span class="w"> 
                                 </span><span class="n">nfolds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">ncores</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">))</span><span class="w">
</span>
##    user  system elapsed 
##    1.39    0.04    0.94
<span class="n">system.time</span><span class="p">(</span><span class="n">ofit1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yc</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lasso"</span><span class="p">,</span><span class="w">
                         </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)))</span><span class="w">
</span>
##    user  system elapsed 
##    0.20    0.05    0.25

Evaluation Metrics

A variety of evaluation metrics can be used for cross validation. The available metrics can be found in the table below

Model Metric type.measure=
Linear Regression Mean squared error mse or deviance
  Mean absolute error mae
Logistic Regression Deviance deviance
  Area under the ROC curve auc
  Misclassification Rate class
  Mean squared error of fitted mean mse
  Mean absolute error of fitted mean mae

Consider a binary outcome setting with logistic regression.

<span class="n">nobs</span><span class="w">  </span><span class="o"><-</span><span class="w"> </span><span class="m">2e3</span><span class="w">
</span><span class="n">nvars</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">20</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="n">runif</span><span class="p">(</span><span class="n">nobs</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nvars</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nvars</span><span class="p">)</span><span class="w">

</span><span class="n">y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbinom</span><span class="p">(</span><span class="n">nobs</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">prob</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="n">drop</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.25</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.5</span><span class="p">,</span><span class="w"> </span><span class="m">-0.25</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">nvars</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">6</span><span class="p">))))))</span><span class="w">
</span>

Misclassification Rate

<span class="n">cvfit2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cv.oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> 
                                           </span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">),</span><span class="w"> 
                 </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"binomial"</span><span class="p">,</span><span class="w">
                 </span><span class="n">type.measure</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"class"</span><span class="p">,</span><span class="w">
                 </span><span class="n">gamma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w">
                 </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> 
                 </span><span class="n">nfolds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">standardize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span>

center

In this case, misclassification rate is not the best indicator of performance. The classes here are imbalanced:

<span class="n">mean</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="w">
</span>
## [1] 0.0685

Area Under the ROC Curve

Area under the ROC curve is an alternative classification metric to misclassification rate. It is available by setting type.measure = "auc".

<span class="n">cvfit2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cv.oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mcp"</span><span class="p">,</span><span class="w"> 
                                           </span><span class="s2">"grp.lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.mcp"</span><span class="p">),</span><span class="w"> 
                 </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"binomial"</span><span class="p">,</span><span class="w">
                 </span><span class="n">type.measure</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"auc"</span><span class="p">,</span><span class="w">
                 </span><span class="n">gamma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w">
                 </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> 
                 </span><span class="n">nfolds</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">standardize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span>

center

Methods for Very Large Scale Problems

OEM with Precomputed $X^TX$, $X^TY$ for Linear Models

With a very large dataset and computing cluster, the total size of a dataset may be very large, but if the number of variables is only moderately large (on the order of a few thousands) $X^TX$ and $X^TY$ may not be large and may already be available from other computations or can be computed trivially in parallel. The function oem.xtx computes penalized linear regression models using the OEM algorithm only using $X^TX$ and $X^TY$. Standardization can be achieved by providing a vector of scaling factors (usually the standard deviations of the columns of x). The function is used like the following:

<span class="n">xtx</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">crossprod</span><span class="p">(</span><span class="n">xc</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">xc</span><span class="p">)</span><span class="w">
</span><span class="n">xty</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">crossprod</span><span class="p">(</span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="n">yc</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">xc</span><span class="p">)</span><span class="w">


</span><span class="n">system.time</span><span class="p">(</span><span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yc</span><span class="p">,</span><span class="w"> 
                       </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.lasso"</span><span class="p">),</span><span class="w"> 
                       </span><span class="n">standardize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">intercept</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
                       </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">)))</span><span class="w">
</span>
##    user  system elapsed 
##    0.22    0.04    0.25
<span class="n">system.time</span><span class="p">(</span><span class="n">fit.xtx</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem.xtx</span><span class="p">(</span><span class="n">xtx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xtx</span><span class="p">,</span><span class="w"> </span><span class="n">xty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xty</span><span class="p">,</span><span class="w"> 
                               </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.lasso"</span><span class="p">),</span><span class="w"> 
                               </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">))</span><span class="w">  </span><span class="p">)</span><span class="w">  
</span>
##    user  system elapsed 
##    0.01    0.00    0.02
<span class="nf">max</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">fit</span><span class="o">$</span><span class="n">beta</span><span class="p">[[</span><span class="m">1</span><span class="p">]][</span><span class="m">-1</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">fit.xtx</span><span class="o">$</span><span class="n">beta</span><span class="p">[[</span><span class="m">1</span><span class="p">]]))</span><span class="w">
</span>
## [1] 1.454392e-14
<span class="nf">max</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">fit</span><span class="o">$</span><span class="n">beta</span><span class="p">[[</span><span class="m">2</span><span class="p">]][</span><span class="m">-1</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">fit.xtx</span><span class="o">$</span><span class="n">beta</span><span class="p">[[</span><span class="m">2</span><span class="p">]]))</span><span class="w"> 
</span>
## [1] 1.454392e-14
<span class="n">col.std</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">xc</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">sd</span><span class="p">)</span><span class="w">
</span><span class="n">fit.xtx.s</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">oem.xtx</span><span class="p">(</span><span class="n">xtx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xtx</span><span class="p">,</span><span class="w"> </span><span class="n">xty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">xty</span><span class="p">,</span><span class="w"> 
                     </span><span class="n">scale.factor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">col.std</span><span class="p">,</span><span class="w">
                     </span><span class="n">penalty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"lasso"</span><span class="p">,</span><span class="w"> </span><span class="s2">"grp.lasso"</span><span class="p">),</span><span class="w"> 
                     </span><span class="n">groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">))</span><span class="w">  
</span>

Out-of-memory Computation

The OEM package also provides functionality for on-disk computation with the big.oem function, allowing for fitting penalized regression models on datasets too large to fit in memory. The big.oem function uses the tools provided by the bigmemory package, so a big.matrix object must be used for the design matrix.

<span class="n">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span><span class="w">
</span><span class="n">nrows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">50000</span><span class="w">
</span><span class="n">ncols</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">100</span><span class="w">
</span><span class="n">bkFile</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"bigmat.bk"</span><span class="w">
</span><span class="n">descFile</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"bigmatk.desc"</span><span class="w">
</span><span class="n">bigmat</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">filebacked.big.matrix</span><span class="p">(</span><span class="n">nrow</span><span class="o">=</span><span class="n">nrows</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="o">=</span><span class="n">ncols</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="o">=</span><span class="s2">"double"</span><span class="p">,</span><span class="w">
                                </span><span class="n">backingfile</span><span class="o">=</span><span class="n">bkFile</span><span class="p">,</span><span class="w"> </span><span class="n">backingpath</span><span class="o">=</span><span class="s2">"."</span><span class="p">,</span><span class="w">
                                </span><span class="n">descriptorfile</span><span class="o">=</span><span class="n">descFile</span><span class="p">,</span><span class="w">
                ...

To leave a comment for the author, please follow the link and comment on their blog: Jared Huling.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)