if … else and ifelse

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers].

Let’s make this a quick and quite basic one. There is an incredibly useful function in R called ifelse(). It’s basically a vectorized version of the if … else control structure that every programming language has in one form or another. ifelse() has, in my view, two major advantages over if … else:

  1. It’s fast: in the benchmarks below, it runs roughly three times faster than an equivalent for loop.
  2. It’s more convenient to use.

The basic idea is that you have a vector of values, you test each value against some condition, and, depending on the outcome, you want a specific value at the corresponding position of another vector. An example follows below. First, let’s load the {rbenchmark} package to see the speed benefits.

library(rbenchmark)

Now, the toy example: I am creating a vector of half a million random normally distributed values. For each of these values, I want to know whether the value is below or above zero.

x <- rnorm(500000)

ifelse() is used as ifelse(<TEST>, <OUTCOME IF TRUE>, <OUTCOME IF FALSE>), so we need three arguments. My test is x < 0 and I want to have the string "negative" in y whenever the corresponding value in x is smaller than zero. If this is not the case, then y should have a "positive" in this position. ifelse() only needs one line of code for this.

benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative", "positive")
})$user.self
## [1] 5.88
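
To make the call above concrete, here is a minimal sketch on a small hand-picked vector. The name small_x and its values are made up for illustration and are not part of the benchmark.

```r
# Hypothetical toy input; the values are chosen only for illustration
small_x <- c(-1.5, 0.3, 2, -0.1)

# One vectorized call tests every element and picks the matching label
small_y <- ifelse(small_x < 0, "negative", "positive")
print(small_y)
## [1] "negative" "positive" "positive" "negative"
```

The result has the same length as the test, with one label per element of small_x.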

We could also solve this with a for loop. But, as you can see, this takes approx. 3 times as long.

benchmark(replications = 50, {
  y <- c()
  # Visit every value of x and append the matching label to y
  for (i in x) {
    if (i < 0) {
      y[length(y)+1] <- "negative"
    } else {
      y[length(y)+1] <- "positive"
    }
  }
})$user.self
## [1] 16.938
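
Growing y one element at a time, as the loop above does, can add overhead of its own. As a sketch of an alternative (this variant was not benchmarked in the post, so no timing is claimed), the result vector can be allocated once and then filled by index. The names x_demo and y_loop are made up for illustration.

```r
set.seed(1)            # arbitrary seed so the sketch is reproducible
x_demo <- rnorm(1000)  # smaller hypothetical input than the benchmark's x

# Allocate the full result once instead of appending in every iteration
y_loop <- character(length(x_demo))
for (i in seq_along(x_demo)) {
  if (x_demo[i] < 0) {
    y_loop[i] <- "negative"
  } else {
    y_loop[i] <- "positive"
  }
}

# The loop and the vectorized call should give the same labels
same_result <- identical(y_loop, ifelse(x_demo < 0, "negative", "positive"))
print(same_result)
## [1] TRUE
```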

The same is true for an sapply() version. sapply() even consistently takes a little longer than a for loop in this case - to my surprise.

benchmark(replications = 50, {
  y <- sapply(x, USE.NAMES = FALSE, FUN = function (i) {
    if (i < 0) {
      "negative"
    } else {
      "positive"
    }
  })
})$user.self
## [1] 20.423

It’s highly unlikely that rnorm() produces a value of exactly zero. But we could also check for this by nesting calls to ifelse(): you add another ifelse() in the “FALSE” part of the previous ifelse(), as I did below. In this little toy example, the nested test is still faster than the for or sapply() versions of the single test.

benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative",
              ifelse(x > 0, "positive", "exactly zero"))
})$user.self
## [1] 12.197
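
Since x almost certainly contains no exact zeros, the third branch is never visible in the benchmark. A minimal sketch on a hand-picked vector that does contain a zero (the name z is made up for illustration) shows all three outcomes:

```r
# Hypothetical toy input that includes an exact zero
z <- c(-1, 0, 2)

# The inner ifelse() only decides the label where z < 0 is FALSE
labels <- ifelse(z < 0, "negative",
                 ifelse(z > 0, "positive", "exactly zero"))
print(labels)
## [1] "negative"     "exactly zero" "positive"
```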
