A Speed Comparison Between Flexible Linear Regression Alternatives in R

Posted on March 25, 2015 by Rasmus Bååth in R bloggers

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers].

Everybody loves speed comparisons! Is R faster than Python? Is dplyr faster than data.table? Is Stan faster than JAGS? It has been said that speed comparisons are utterly meaningless, and in general I agree, especially when you are comparing apples and oranges, which is what I’m going to do here. I’m going to compare a couple of alternatives to lm() that can be used to run linear regressions in R, but that are more general than lm(). One reason for doing this was to see how much performance you’d lose if you used one of these tools to run a linear regression (even if you could have used lm()). But as speed comparisons are utterly meaningless, my main reason for blogging about this is just to highlight a couple of tools you can use when you have outgrown lm(). The speed comparison was just to lure you in. Let’s run!

The Contenders

Below are the seven different methods that I’m going to compare by using each method to run the same linear regression: lm(), glm(), bayesglm(), nls(), mle2(), optim() and Stan’s optimizing(). If you are only interested in the speed comparisons, just scroll to the bottom of the post. And if you are actually interested in running standard linear regressions as fast as possible in R, then Dirk Eddelbuettel has a nice post that covers just that.

`lm()`

This is the baseline, the “default” method for running linear regressions in R. If we have a data.frame d with the following layout:

    head(d)

    ##         y      x1      x2
    ## 1 -64.579 -1.8088 -1.9685
    ## 2 -19.907 -1.3988 -0.2482
    ## 3  -4.971  0.8366 -0.5930
    ## 4  19.425  1.3621  0.4180
    ## 5  -1.124 -0.7355  0.4770
    ## 6 -12.123 -0.9050 -0.1259
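Given a data frame with this layout, a linear regression of y on x1 and x2 could be run with lm() roughly as in the sketch below. The simulated data, the seed and the "true" coefficient values are assumptions made purely for illustration; they are not the data behind the output shown above.

```r
# Simulated stand-in for d (an assumption; the original data is not available)
set.seed(123)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 + 10 * d$x1 + 20 * d$x2 + rnorm(n, sd = 5)

# Ordinary least squares with an explicit intercept and two predictors
fit <- lm(y ~ 1 + x1 + x2, data = d)

# Estimated intercept and slopes
coef(fit)
```

The `1 +` in the formula only makes the intercept explicit; `y ~ x1 + x2` fits the same model.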

`mle2()`

    library(bbmle)
    inits <- list(log_sigma = rnorm(1), intercept = rnorm(1),
                  beta1 = rnorm(1), beta2 = rnorm(1))
    mle2(y ~ dnorm(mean = intercept + x1 * beta1 + x2 * beta2, sd = exp(log_sigma)),
         start = inits, data = d)

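`optim()`

    # Log-likelihood of a normal linear model; par = (log(sigma), intercept, beta1, beta2)
    log_like_fn <- function(par, d) {
      sigma <- exp(par[1])  # optimize on the log scale so that sigma stays positive
      intercept <- par[2]
      beta1 <- par[3]
      beta2 <- par[4]
      mu <- intercept + d$x1 * beta1 + d$x2 * beta2
      sum(dnorm(d$y, mu, sigma, log = TRUE))
    }

    inits <- rnorm(4)
    # fnscale = -1 makes optim() maximize instead of minimize
    optim(par = inits, fn = log_like_fn, control = list(fnscale = -1), d = d)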
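One way to sanity-check a homemade likelihood like the one above is to compare its estimates with those from lm() and to look at the convergence code that optim() returns. The sketch below does that on simulated data. The data, the fixed starting values and the switch to `method = "BFGS"` are my assumptions for the illustration; the code above uses random starting values and optim()'s default Nelder-Mead method.

```r
# Simulated data (an assumption, for illustration only)
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(n, sd = 1.5)

# Same log-likelihood as above: par = (log(sigma), intercept, beta1, beta2)
log_like_fn <- function(par, d) {
  sigma <- exp(par[1])
  mu <- par[2] + d$x1 * par[3] + d$x2 * par[4]
  sum(dnorm(d$y, mu, sigma, log = TRUE))
}

# Assumed settings: start at zero and use BFGS instead of the default method
fit_optim <- optim(par = rep(0, 4), fn = log_like_fn, method = "BFGS",
                   control = list(fnscale = -1), d = d)
fit_lm <- lm(y ~ x1 + x2, data = d)

# Coefficients side by side; they should agree closely if optim() did its job
rbind(optim = fit_optim$par[2:4], lm = coef(fit_lm))

# 0 means optim() reports successful convergence
fit_optim$convergence
```

Note that `exp(fit_optim$par[1])` is the maximum likelihood estimate of sigma, which divides by n and so will be slightly smaller than the residual standard error reported by `summary(fit_lm)`.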

An Utterly Meaningless Speed Comparison

So, just for fun, here is the speed comparison, first for running a linear regression with 1000 data points and 5 predictors:

This should be taken with a huge heap of salt (which is not too good for your health!). While all these methods produce a result equivalent to a linear regression, they do it in different ways, and not necessarily in equally good ways. For example, my homemade optim routine does not converge correctly when trying to fit a model with too many predictors. As I have used the standard settings, there is surely a multitude of ways in which any of these methods can be made faster. Anyway, here is what happens if we vary the number of predictors and the number of data points:

To make these speed comparisons I used the microbenchmark package; the full script replicating the plots above can be found here. This speed comparison was made on my laptop running R version 3.1.2 on 32-bit Ubuntu 12.04, with an average amount of RAM and a processor that is starting to get a bit tired.
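For readers who want to try something similar, here is a minimal sketch of how such a timing could be set up. It is not the script linked above: the helper `make_data()` is hypothetical, only lm() and glm() are timed, and the fallback to `system.time()` when microbenchmark is not installed is my addition.

```r
# Hypothetical helper: simulate n rows with p standard-normal predictors
make_data <- function(n, p) {
  X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
  beta <- rnorm(p)
  data.frame(y = as.vector(X %*% beta + rnorm(n)), X)
}

set.seed(2015)
d <- make_data(n = 1000, p = 5)

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Each expression is run 20 times, in random order
  timings <- microbenchmark::microbenchmark(
    lm  = lm(y ~ ., data = d),
    glm = glm(y ~ ., data = d),
    times = 20
  )
} else {
  # Base R fallback: total elapsed seconds for 20 repeated fits
  timings <- c(
    lm  = system.time(for (i in 1:20) lm(y ~ ., data = d))[["elapsed"]],
    glm = system.time(for (i in 1:20) glm(y ~ ., data = d))[["elapsed"]]
  )
}
print(timings)
```

Changing the `n` and `p` arguments of `make_data()` gives a rough picture of how timings scale with the number of data points and predictors.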
