Two common mistakes with the colon operator in R

[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers.]

R has a colon operator which makes it really easy to define a sequence of integers. For example, the code 1:10 generates a vector consisting of the integers from 1 to 10 (inclusive). However, using the colon operator is not without its pitfalls! I will highlight two common mistakes here.

First, imagine that you have a variable n which has value 5. What do you think the following code prints out?

for (i in 1:n+1) print(i)

My first instinct is that it should print out the numbers 1, 2, …, 6 (inclusive), with one number on each line. Wrong! Instead, this is the output we get:

[1] 2
[1] 3
[1] 4
[1] 5
[1] 6

What is going on here? The problem is one of operator precedence. Just as × and ÷ come before + and - in arithmetic, in R the : operator comes before +. Hence, the code written above is interpreted as

for (i in (1:n)+1) print(i)

which is why the numbers 2 to 6 are printed out instead of the numbers 1 to 6. If we want to print the numbers 1 to n+1 inclusive, we need parentheses to enforce the intended order of evaluation:

for (i in 1:(n+1)) print(i)
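
The precedence rule can be checked directly without a loop. The following is a small sketch, assuming n is 5 as above; the variable names implicit, explicit, intended and alt are only for illustration. It also shows seq_len() as one possible alternative, since it takes the upper bound as an ordinary function argument and so leaves no room for precedence surprises.

```r
n <- 5

# `:` binds more tightly than binary `+`, so 1:n+1 is parsed as (1:n) + 1.
implicit <- 1:n+1
explicit <- (1:n) + 1
print(identical(implicit, explicit))  # TRUE

# Parentheses around n + 1 give the sequence we actually wanted.
intended <- 1:(n + 1)
print(intended)  # [1] 1 2 3 4 5 6

# seq_len() receives the upper bound as a single argument.
alt <- seq_len(n + 1)
print(identical(alt, intended))  # TRUE
```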

Let’s move on to the second common mistake. Let’s say I have a vector vec and I want to print its elements one by one. The first instinct of most of us would be to write something like this:

for (i in 1:length(vec)) print(vec[i])

This works most of the time, but not all the time. Consider what happens when vec is an empty vector:

vec <- c()
for (i in 1:length(vec)) print(vec[i])

NULL
NULL

What happened here? The problem is that the colon operator can return a descending sequence of integers! In the code above, length(vec) has value 0, so 1:length(vec) is the same as c(1, 0). The loop therefore runs twice and prints vec[1] and vec[0], which are both NULL here because c() itself is NULL.

To avoid this problem, use the seq_along function instead:

for (i in seq_along(vec)) print(vec[i])
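
To see the difference concretely, here is a short sketch comparing the two index sequences for the empty vector. The names idx_colon, idx_along, visited and empty_num are made up for this illustration. The last part uses a typed empty vector, numeric(0), as an additional case that is not in the example above, to show that the stray iterations do not always produce NULL.

```r
vec <- c()  # c() with no arguments is NULL, which has length 0

idx_colon <- 1:length(vec)   # counts down from 1 to 0
idx_along <- seq_along(vec)  # empty integer vector

print(idx_colon)          # [1] 1 0
print(length(idx_along))  # [1] 0

# With seq_along() the loop body never runs for an empty vector.
visited <- 0
for (i in seq_along(vec)) visited <- visited + 1
print(visited)  # [1] 0

# For a typed empty vector, index 1 yields NA and index 0 yields
# another empty vector, so the 1:length() loop prints those instead.
empty_num <- numeric(0)
print(is.na(empty_num[1]))   # [1] TRUE
print(length(empty_num[0]))  # [1] 0
```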

You may think that this is not really a big problem; after all, it only fails when we have an empty vector, right? There are two responses to that. First, you never want your code to do anything unintended. In this case the mistake was easy to catch, but it is much harder to catch when it sits three levels deep in a codebase that is thousands of lines long. Second, this mistake crops up more easily when you don't start from the first element of the vector: for example, 2:length(vec) counts down as soon as vec has fewer than two elements.
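
The second point can be illustrated with a loop that is meant to skip the first element. This is only a sketch: tail_indices is a hypothetical helper written for this example, not a base R function, and dropping the first entry of seq_along() is just one of several ways to guard against the descending sequence.

```r
vec <- c(10)  # a vector with a single element

# Intended meaning: "indices 2 up to the end". With one element, the
# colon operator counts down from 2 to 1 instead of giving nothing.
bad <- 2:length(vec)
print(bad)  # [1] 2 1

# Hypothetical helper: all indices of x except the first.
# It returns an empty vector when x has zero or one element.
tail_indices <- function(x) seq_along(x)[-1]

print(length(tail_indices(vec)))      # [1] 0
print(tail_indices(c(10, 20, 30)))    # [1] 2 3
```

A loop written as for (i in tail_indices(vec)) then simply does nothing when there is no second element.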
