The difference between “letters[c(1,NA)]” and “letters[c(NA,NA)]”

[This article was first published on R-statistics blog » R, and kindly contributed to R-bloggers].

In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote:

Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.

So I did, and here’s the output:

> letters[c(12,NA)]
[1] "l" NA 
>  letters[c(NA,NA)] 
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>

Interesting, isn’t it? The first call returns two elements, while the second returns 26.
I had no clue why this happened, but luckily for us Barry posted a follow-up reply with an explanation. Here is what he wrote:

My example with ‘letters’ comes from a collision of three features:

  1. recycling of short subscripts
  2. silent coercion of types (boolean NA to numeric NA)
  3. and the existence of five different NA values that all print the same.

[…] to really understand that letters[c(1,NA)] is different from letters[c(NA,NA)] you have to see that:

  • in the first case, the NA is coerced to a numeric NA because it’s in a vector with a numeric ‘1’.
  • in the first case, you are selecting elements by supplying a vector of indexes
  • in the second case, your NAs are boolean (logical) NA values
  • hence your subscript is a logical vector
  • logical vectors are recycled
  • now your subscript is a vector of TRUE/FALSE values (which are all NA) of the same length as ‘letters’.
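
Each step of that explanation can be checked directly at the console. The following is a minimal sketch of how I would verify it; the variable names `num_idx` and `lgl_idx` are my own and do not come from Barry's comment.

```r
# Two subscripts that look alike but have different types.
num_idx <- c(1, NA)   # NA is coerced to numeric because it sits next to 1
lgl_idx <- c(NA, NA)  # a bare NA is logical, so this stays a logical vector

typeof(NA)       # "logical"
typeof(num_idx)  # "double"
typeof(lgl_idx)  # "logical"

# A numeric subscript selects by position: one result per index.
length(letters[num_idx])  # 2

# A logical subscript is recycled to the length of `letters`.
length(letters[lgl_idx])  # 26

# The five NA constants all print as NA but differ in type.
na_types <- sapply(
  list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_),
  typeof
)
na_types  # "logical" "integer" "double" "character" "complex"
```

The length difference (2 versus 26) is the whole surprise: the printed subscripts look almost the same, but only the logical one is stretched to cover all of `letters`.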

To make sure I understood Barry correctly, I tried the following code:

>  letters[c(T,NA)] 
 [1] "a" NA  "c" NA  "e" NA  "g" NA  "i" NA  "k" NA  "m" NA  "o" NA  "q" NA  "s" NA  "u" NA  "w" NA  "y" NA
Barry gave this example to illustrate how R violates the Zen of Python principle “Simple is better than complex”: in his view, subscript recycling makes it easy to shoot yourself in the foot.
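
To see the recycling itself, and one way to sidestep it, here is a small sketch. It is my own illustration rather than something from Barry's comment, and the names `short_idx`, `full_idx` and `int_idx` are made up for the example. It assumes that what you wanted from the NA subscript was positional indexing.

```r
# A logical subscript shorter than the vector is recycled to full length.
short_idx <- c(TRUE, NA)
full_idx  <- rep_len(short_idx, length(letters))  # the recycling, written out

same_result <- identical(letters[short_idx], letters[full_idx])
same_result  # TRUE

# Using a typed NA keeps the subscript numeric, so nothing is recycled.
int_idx <- c(NA_integer_, NA_integer_)
typeof(int_idx)   # "integer"
letters[int_idx]  # NA NA

# Converting an existing logical vector has the same effect.
converted <- letters[as.integer(c(NA, NA))]
length(converted)  # 2
```

Whether this counts as a fix or as one more thing to remember is, of course, exactly the point Barry was making.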
To follow up on that discussion, head over to Barry’s comment on the REvolution blog.
