The trick to understanding NAs (missing values) in R

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

Here's a little puzzle that might shed some light on some apparently confusing behaviour by missing values (NAs) in R:

What is NA^0 in R?

You can get the answer easily by typing at the R command line:

> NA^0
[1] 1

But the interesting question that arises is: why is it 1? Most people might expect that the answer would be NA, like most expressions that include NA. But here's the trick to understanding this outcome: think of NA not as a number, but as a placeholder for a number that exists, but whose value we don't know. 

Now think of all of the numbers that could replace NA in the expression NA^0. Any positive number to the power zero is 1. Same goes for any negative number. Even zero to the power zero is conventionally taken to be 1 (for reasons I'm not going to go into here), and that is what R returns for 0^0. So whatever number you substitute for NA in the expression NA^0, the answer will be 1. And so that's the answer R gives.
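To make the substitution argument concrete, here is a small sketch that raises a handful of stand-in values to the power zero and compares the result with NA^0. The vector name `candidates` and the particular values in it are my own choices for illustration. The last two lines show the contrast: when the answer does depend on the unknown value, NA propagates as usual.

```r
# A few numbers that NA could be standing in for (illustrative choices)
candidates <- c(-2, 0, 3.5, Inf)

# Every candidate gives the same answer...
candidates^0   # 1 1 1 1

# ...so R can give that answer without knowing the value
NA^0           # 1

# Contrast: here the result depends on the unknown value, so it stays missing
NA^1           # NA
NA * 0         # NA (not 0: the unknown number could be Inf, and Inf * 0 is NaN)
```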

There are a few other instances where using the indeterminate NA in an expression can lead to a specific non-NA result. Consider this example:

> NA || TRUE
[1] TRUE

Here, the NA is holding the place of a logical value (see footnote 1), so it could be representing only TRUE or FALSE. But whatever it represents, the answer will be the same:

> TRUE || TRUE
[1] TRUE
> FALSE || TRUE
[1] TRUE

By the same token, any(x) returns TRUE even if the logical vector x includes NAs, as long as x includes at least one TRUE value. Similarly, NA && FALSE is always FALSE.
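The logical cases can be checked side by side. In this sketch the first group gives a definite answer because the other operand settles the result, while the second group stays NA because the answer would differ depending on whether the NA stood for TRUE or FALSE. The all() lines are my own addition, included only as the mirror image of any().

```r
# Determined regardless of what NA stands for
NA || TRUE            # TRUE
NA && FALSE           # FALSE
any(c(NA, TRUE))      # TRUE
all(c(NA, FALSE))     # FALSE

# Not determined: the outcome depends on the missing value
NA || FALSE           # NA
NA && TRUE            # NA
any(c(NA, FALSE))     # NA
all(c(NA, TRUE))      # NA
```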

There are a few other examples as well (if you know some, share them in the comments). But always remember: if you're ever confused by the behaviour of NA in R, think about what values it might contain, and whether changing them changes the outcome. That might explain what's going on. For more on how R handles NAs, see the R Language Definition.
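That rule of thumb can be turned into a quick experiment. The helper below, `depends_on_na`, is hypothetical (it is not part of R): it applies an expression to each value the NA might stand for and reports whether the results differ. It only tests the candidates you hand it, so treat it as a thinking aid rather than a description of how R decides internally.

```r
# Hypothetical helper: does the result of f() change across the
# candidate values that an NA might be standing in for?
depends_on_na <- function(f, candidates) {
  results <- lapply(candidates, f)
  length(unique(results)) > 1
}

depends_on_na(function(x) x^0, list(-2, 0, 3.5))          # FALSE, so NA^0 is 1
depends_on_na(function(x) x || TRUE, list(TRUE, FALSE))   # FALSE, so NA || TRUE is TRUE
depends_on_na(function(x) x && TRUE, list(TRUE, FALSE))   # TRUE, so NA && TRUE is NA
```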

Footnote 1: I'm deliberately ignoring the storage mode of NA, which can come in logical, integer, double and character flavours. In all the examples above, it gets coerced to the type of the other elements in the expression.
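For the curious, the storage modes mentioned in the footnote can be inspected with typeof(). This is a brief sketch using R's built-in typed NA constants; the last two lines show the coercion at work in an arithmetic expression and in a character vector.

```r
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# The plain logical NA is coerced to match the rest of the expression
typeof(NA^0)           # "double"
typeof(c(NA, "a"))     # "character"
```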

 
