if … else and ifelse
Let’s make this a quick and quite basic one. There is an incredibly useful function in R called ifelse(). It’s basically a vectorized version of the if … else control structure that every programming language has in one form or another. ifelse() has, in my view, two major advantages over if … else:
- It’s fast: in the benchmarks below it needs roughly a third of the time of an equivalent for loop.
- It’s more convenient to use.
The basic idea is that you have a vector of values and test each of them against some condition. Depending on the outcome of the test, you want a specific value at the same position in another vector. An example follows below. First, let’s load the {rbenchmark} package to see the speed benefits.
library(rbenchmark)
Now, the toy example: I am creating a vector of half a million random normally distributed values. For each of these values, I want to know whether the value is below or above zero.
x <- rnorm(500000)
ifelse() is used as ifelse(<TEST>, <OUTCOME IF TRUE>, <OUTCOME IF FALSE>), so we need three arguments. My test is x < 0 and I want to have the string "negative" in y whenever the corresponding value in x is smaller than zero. If this is not the case, then y should have a "positive" in this position. ifelse() only needs one line of code for this.
benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative", "positive")
})$user.self
## [1] 5.88
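To see the three arguments at work on something small enough to check by eye, here is a minimal sketch. The vector `v` and its values are made up for illustration and are not part of the benchmark.

```r
# Hypothetical toy vector, chosen so the result can be checked by eye
v <- c(-2.5, 0.3, -0.1, 4)

# The test v < 0 gives TRUE FALSE TRUE FALSE;
# ifelse() then picks "negative" or "positive" element by element
labels <- ifelse(v < 0, "negative", "positive")
print(labels)
# elements: "negative" "positive" "negative" "positive"
```

The result has the same length as the test, which is why a single call can label the whole vector.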
We could also solve this with a for loop. But, as you can see, this takes approx. 3 times as long.
benchmark(replications = 50, {
  y <- c()
  for (i in x) {
    if (i < 0) {
      # append the label for values below zero
      y[length(y) + 1] <- "negative"
    } else {
      # every other value is labelled "positive"
      y[length(y) + 1] <- "positive"
    }
  }
})$user.self
## [1] 16.938
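Part of the loop's cost probably comes from growing y by one element per iteration rather than from the loop itself. Below is a minimal sketch of a variant that allocates the result vector up front. It assumes a shorter, hypothetical vector `x_small` so that it runs quickly, and it is not benchmarked here, so treat any speed-up as an expectation rather than a measured result.

```r
# Hypothetical smaller input so the sketch runs quickly
x_small <- rnorm(1000)

# Allocate the full result vector once instead of appending to it
y_loop <- character(length(x_small))

for (k in seq_along(x_small)) {
  if (x_small[k] < 0) {
    y_loop[k] <- "negative"
  } else {
    y_loop[k] <- "positive"
  }
}

# The loop produces the same labels as the vectorized call
stopifnot(identical(y_loop, ifelse(x_small < 0, "negative", "positive")))
```

Even in this form the loop still needs several lines where ifelse() needs one.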
An sapply() version is also slower than ifelse(). To my surprise, it consistently takes a little longer than the for loop in this case.
benchmark(replications = 50, {
  y <- sapply(x, USE.NAMES = FALSE, FUN = function(i) {
    if (i < 0) {
      "negative"
    } else {
      "positive"
    }
  })
})$user.self
## [1] 20.423
It’s highly unlikely that rnorm() produces a value of exactly zero. But we could also check for this by simply nesting calls to ifelse(). If you want to do this, you simply add another ifelse() in the “FALSE” part of the previous ifelse() as I did below. In this little toy example, this nested test is still considerably faster than the for or sapply() versions of the single test.
benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative",
              ifelse(x > 0, "positive", "exactly zero"))
})$user.self
## [1] 12.197
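Because rnorm() will almost never return exactly zero, the third label is hard to observe in the benchmark data. A minimal sketch with a made-up vector `v` that does contain a zero shows the nested call hitting all three branches:

```r
# Hypothetical vector with one negative, one zero and one positive value
v <- c(-1, 0, 2)

# The inner ifelse() is only consulted where the outer test is FALSE
labels <- ifelse(v < 0, "negative",
                 ifelse(v > 0, "positive", "exactly zero"))
print(labels)
# elements: "negative", "exactly zero", "positive"
```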