I fell out with tapply and in love with dplyr

mikerspencer

3 years ago

This article was first published on R – scottishsnow and kindly contributed to R-bloggers.

A long time ago (5 years) I wrote a blog post on tapply. Back then I was just getting into programming and I thought the possibilities of tapply were amazing. So, it seems, do many others, as it has become one of my most viewed articles.

However, I never use tapply these days because the output is either a named vector or a matrix, and both need munging before I can use them in further analysis. Three months after I wrote my tapply post, a little package called dplyr was released. It took a while before it became integral to my workflow (I like to use as few packages as possible), but now I use it almost daily. The two biggest reasons are:

A data frame as output.
Readable code.
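
To make the point about output types concrete, here is a minimal base R sketch of what tapply returns and the munging needed to get back to a data frame. The toy data frame `snow` and the `*_df` object names are assumptions made up for this illustration; they are not from my original post.

```r
# Hypothetical toy data: six observations across two years and two months
snow <- data.frame(
  year       = c("1990", "1990", "1990", "1991", "1991", "1991"),
  month      = c("01", "01", "02", "01", "02", "02"),
  snow_lying = c(1, 0, 1, 1, 1, 1)
)

# One grouping variable: tapply returns a named vector (a 1-d array)
by_month <- tapply(snow$snow_lying, snow$month, sum)

# Two grouping variables: tapply returns a matrix (years in rows, months in columns)
by_year_month <- tapply(snow$snow_lying, list(snow$year, snow$month), sum)

# The munging: rebuild data frames from the names and dimnames
by_month_df <- data.frame(month = names(by_month),
                          snow_days = as.vector(by_month))

by_year_month_df <- as.data.frame(as.table(by_year_month),
                                  responseName = "snow_days")
names(by_year_month_df)[1:2] <- c("year", "month")
```

None of this is difficult, but it is an extra step after every summary, and the grouping variables only survive as names or dimnames rather than as columns.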

Now that we’re all living in the tidyverse, I’m a bit confused that so many folk still land on my blog looking for tapply. So this post updates and supersedes what I wrote previously. I’ve repeated the toy example I made before, this time with dplyr:

# Load dplyr for %>%, mutate, group_by and summarise
library(dplyr)

# Generate an example time series
df = data.frame(date=seq.Date(as.Date("1990-01-01"),
                              as.Date("2013-12-31"),
                              by="day"))

# Add some data (0s and 1s)
df = df %>%
   mutate(snow_lying=sample(c(0, 1), nrow(df), replace=TRUE))

# Get month and year from date
df = df %>%
   mutate(month=format(date, "%m"),
          year=format(date, "%Y"))

# Sum for each month
df %>%
   group_by(month) %>%
   summarise(snow_days=sum(snow_lying))

# Sum for each month, each year
df %>%
   group_by(year, month) %>%
   summarise(snow_days=sum(snow_lying))
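
One thing worth knowing about the two-variable summary: after grouping by more than one variable, summarise drops only the last grouping level, so the result is still grouped by year. The sketch below shows one way to get a plain ungrouped data frame and to cross-check it against tapply. It assumes dplyr 1.0.0 or later (where the `.groups` argument is available), and the small `snow` data frame is a made-up stand-in for the series above.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the daily series in the post
snow <- data.frame(
  year       = c("1990", "1990", "1990", "1991", "1991", "1991"),
  month      = c("01", "01", "02", "01", "02", "02"),
  snow_lying = c(1, 0, 1, 1, 1, 1)
)

# .groups = "drop" returns an ungrouped data frame
# (assumption: dplyr >= 1.0.0, which provides the .groups argument)
monthly <- snow %>%
  group_by(year, month) %>%
  summarise(snow_days = sum(snow_lying), .groups = "drop")

# Cross-check one cell against the equivalent tapply call
tapply_result <- tapply(snow$snow_lying, list(snow$year, snow$month), sum)
same_value <- monthly$snow_days[monthly$year == "1991" & monthly$month == "02"] ==
  tapply_result["1991", "02"]
```

Both approaches give the same numbers; the difference is that `monthly` already has year and month as ordinary columns, ready for plotting or joining.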

Read more about dplyr here: https://dplyr.tidyverse.org/
