A Step to the Right in R Assignments

February 4, 2015

(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

I received an out-of-band question on the use of %<>% in my CDC FluView post, and took the opportunity to address it in a broader, public fashion.

Anyone using R knows that the two most common methods of assignment are the venerable (and sensible) left arrow <- and its lesser cousin =. <- has an evil sibling, <<-, which is used when you want/need to have R search through parent environments for an existing definition of the variable being assigned (up to the global environment, where the variable is created if no existing definition is found).
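As a minimal sketch of the difference (the make_counter helper and its variable names are made up for illustration, not taken from the FluView post), here is a closure that relies on <<- to update a variable in its enclosing environment:

```r
# Hypothetical example: a counter built with a closure.
make_counter <- function() {
  count <- 0                 # `<-` creates `count` in this function's environment
  function() {
    count <<- count + 1      # `<<-` finds and updates the existing `count` one level up
    count
  }
}

counter <- make_counter()
first  <- counter()          # 1
second <- counter()          # 2

x = 5                        # `=` also assigns at the top level
```

Had the inner function used <- instead, each call would have created a fresh local count and returned 1 every time.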

Since the introduction of the “piping idiom” (%>%), made popular by magrittr, dplyr, ggvis and other packages, I have struggled with the use of <- in pipes. Since pipes flow data in a virtual forward motion, an LHS (left-hand side) assignment feels awkward. Furthermore, many times you are piping from an object with the intent to replace the contents of said object. For example:

iris$Sepal.Length <- 
  iris$Sepal.Length %>%
  sqrt

(which is from the magrittr documentation).

To avoid the repetition of the left-hand side immediately after the assignment operator, Bache & Wickham came up with the %<>% operator, which shortens the above to:

iris$Sepal.Length %<>% sqrt
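To check that the two forms really are equivalent, here is a small sketch that works on a copy of the first few rows of iris (the names df and expected are mine, chosen for illustration) so the built-in dataset is left alone:

```r
library(magrittr)

df <- head(iris)                       # a copy, so `iris` itself is untouched
expected <- sqrt(df$Sepal.Length)      # what the long-hand version would produce

df$Sepal.Length %<>% sqrt              # pipe into sqrt and assign back in one step

same <- identical(df$Sepal.Length, expected)
same                                   # TRUE
```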

Try as I may (including the CDC FluView blog post), that way of assigning variables still feels awkward, and is definitely confusing to new R users. But, what’s the alternative? I believe it’s R’s infrequently used -> RHS assignment operator.
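A quick sketch of how -> behaves at the end of a pipe (the numbers and the name total are made up for illustration). Because %>% binds more tightly than ->, the result of the whole pipeline is what gets assigned, and R's parser treats -> as a plain <- with the sides swapped:

```r
library(magrittr)

# The entire pipeline is evaluated first, then assigned to `total`.
c(1, 4, 9) %>% sqrt %>% sum -> total
total                                  # 6

# `->` is only syntax: the parsed call is an ordinary `<-` assignment.
parsed <- deparse(quote(a -> b))
parsed                                 # "b <- a"
```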

Let’s look at that in the context of the somewhat-long pipe in the CDC FluView example:

dat %>%
  mutate(REGION=factor(REGION,
                       levels=unique(REGION),
                       labels=c("Boston", "New York",
                                "Philadelphia", "Atlanta",
                                "Chicago", "Dallas",
                                "Kansas City", "Denver",
                                "San Francisco", "Seattle"),
                       ordered=TRUE)) %>%
  mutate(season_week=ifelse(WEEK>=40, WEEK-40, WEEK),
         season=ifelse(WEEK<40,
                       sprintf("%d-%d", YEAR-1, YEAR),
                       sprintf("%d-%d", YEAR, YEAR+1))) -> dat

That pipe flow says “take dat, change up some columns, make some new columns and reassign into dat.” It’s a very natural flow and reads well, too, since you’re following a process up to its final destination. It’s even more natural in pipes that actually transform the data into something else. For example, to get a vector of the number of US male births since 1880, we’d do:

library(magrittr)
library(rvest)
 
# read_html() replaces html(), which was deprecated in rvest 0.3.0
births <- read_html("http://www.ssa.gov/oact/babynames/numberUSbirths.html")
 
births %>%
  html_nodes("table") %>%
  extract2(2) %>%
  html_table %>%
  use_series(Male) %>%
  gsub(",", "", .) %>%
  as.numeric -> males

That’s very readable (one of the benefits of pipes) and the flow, again, makes sense. Compare that to its base R counterpart:

males <- as.numeric(gsub(",", "", html_table(html_nodes(births, "table")[[2]])$Male))

The base R version is short and the LHS assignment fits well as the values “pop out” of the function calls. But it’s also quickly readable only to veteran R folks. Since code needs to be readable, maintainable and (oftentimes) shared with folks on a team, I believe pipes help increase overall productivity and aid in documenting what that portion of an analysis is trying to achieve (especially when combined with dplyr idioms).
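For anyone who wants to compare the two styles without hitting the network, here is a sketch that swaps the scraped table for a tiny stand-in data frame (tbl and its values are invented for illustration, not real SSA figures) and confirms that both forms give the same result:

```r
library(magrittr)

# Stand-in for the scraped table: counts stored as comma-formatted strings.
tbl <- data.frame(Male = c("1,234", "56,789"), stringsAsFactors = FALSE)

# Pipe version with RHS assignment.
tbl %>%
  use_series(Male) %>%       # same as tbl$Male
  gsub(",", "", .) %>%       # strip the thousands separators
  as.numeric -> males_pipe

# Nested base R version with LHS assignment.
males_base <- as.numeric(gsub(",", "", tbl$Male))

identical(males_pipe, males_base)      # TRUE
```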

Pipes are here to stay and they are definitely a part of my data analysis workflows. Moving forward, so will RHS (->) assignments from pipes.
