The %notin% operator

[This article was first published on woodpeckR, and kindly contributed to R-bloggers.]

Problem

I keep forgetting how to select all elements of an object except a few, by name. I get the ! operator confused with the - operator and I find both of them less than intuitive to use. How can I negate the %in% operator?

Context

I have a data frame called electrofishing that contains observations from a fish sampling survey. One column, stratum, gives the aquatic habitat type of the sampling site. I’d like to exclude observations sampled in the “Tailwater Zone” or “Impounded-Offshore” areas.

My instinct would be to do this:

electrofishing <- electrofishing[electrofishing$stratum !%in% c("Tailwater Zone", "Impounded-Offshore"),]

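But that doesn’t work: !%in% isn’t valid R syntax, so you can’t negate the %in% operator directly. Instead, you have to negate the entire %in% expression by putting ! in front of it, which returns the opposite of the original logical vector. Wrapping the expression in parentheses is optional, because %in% is evaluated before !, but it makes the intent easier to read.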
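Here is a minimal sketch of the version that does work. The small data frame is a hypothetical stand-in for my real electrofishing data; the fish_id column and the extra stratum values are invented for illustration.

```r
# Hypothetical stand-in for the electrofishing data; all values are invented.
electrofishing <- data.frame(
  fish_id = 1:5,
  stratum = c("Tailwater Zone", "Side Channel", "Impounded-Offshore",
              "Main Channel Border", "Side Channel"),
  stringsAsFactors = FALSE
)

excluded <- c("Tailwater Zone", "Impounded-Offshore")

# Negate the whole %in% expression; the parentheses are only for readability.
keep <- !(electrofishing$stratum %in% excluded)

# Same result without parentheses, because %in% is evaluated before !.
keep_no_parens <- !electrofishing$stratum %in% excluded

# Keep only the rows whose stratum is not in the excluded set.
filtered <- electrofishing[keep, ]
```

Both forms give the same logical vector, so the choice between them is a matter of readability.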

I’m not saying this doesn’t make sense, but I can never remember it. My English-speaking brain would much rather say “rows whose stratum is not included in c("Tailwater Zone", "Impounded-Offshore")” than “not rows whose stratum is included in c("Tailwater Zone", "Impounded-Offshore")”.

Solution

Luckily, it’s pretty easy to negate %in% and create a %notin% operator. Credit for this answer goes to user catastrophic-failure on the Stack Overflow question linked under Resources.

`%notin%` <- Negate(`%in%`)

I didn’t even know that the Negate function was a thing. The more you know.
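As far as I can tell, Negate() takes a function and hands back a new function that returns the logical opposite of the original. A quick sketch to check that %notin% really matches the ! version (the site names here are invented example values):

```r
# Negate() wraps a function so that its result is passed through !.
`%notin%` <- Negate(`%in%`)

sites <- c("Tailwater Zone", "Side Channel", "Impounded-Offshore")  # invented values
excluded <- c("Tailwater Zone", "Impounded-Offshore")

# The new operator...
not_in <- sites %notin% excluded

# ...and the manual negation it replaces.
via_bang <- !(sites %in% excluded)

# TRUE if both approaches produce the same logical vector.
same <- identical(not_in, via_bang)
```

Note that the operator has to be defined (or sourced) in each session before it can be used, since it isn’t part of base R.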

Outcome

I know there are lots of ways to negate selections in R. dplyr’s select() can drop columns with -c(), and filter() can drop rows with a negated condition. Or I could just learn to throw a ! in front of my %in% statements. But %notin% seems a little more intuitive.

Now it’s straightforward to exclude these rows from my data frame.

electrofishing <- electrofishing[electrofishing$stratum %notin% c("Tailwater Zone", "Impounded-Offshore"),]
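One thing worth checking before relying on this is how missing values behave. The sketch below uses an invented stratum vector with an NA in it; it assumes you may or may not want rows with a missing stratum to survive the filter.

```r
`%notin%` <- Negate(`%in%`)

stratum <- c("Tailwater Zone", NA, "Side Channel")  # invented values
excluded <- c("Tailwater Zone", "Impounded-Offshore")

# %in% returns FALSE (not NA) for a missing value that is absent from the
# right-hand side, so %notin% returns TRUE and the NA element is kept.
keep <- stratum %notin% excluded

# If missing values should be dropped too, exclude them explicitly.
keep_non_missing <- keep & !is.na(stratum)
```

In other words, %notin% keeps rows with a missing stratum unless NA is added to the excluded set or filtered out separately.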

Resources

https://stackoverflow.com/questions/38351820/negation-of-in-in-r

This one does a good job of explaining why !%in% doesn’t work:
http://r.789695.n4.nabble.com/in-operator-NOT-IN-td3506655.html
