A long time ago (5 years) I wrote a blog post on tapply. Back then I was just getting into programming, and I thought the possibilities of tapply were amazing. So, it seems, do many others, as it’s become one of my most viewed articles.
However, I never use tapply these days because the output is either a named vector or a matrix. Both of these require munging if I’m going to use the output. Three months after I wrote my tapply post, a little package called dplyr was released. It took a while before it became integral to my workflow (I like to use as few packages as possible), but now I use it almost daily. The two biggest reasons are:
- A data frame as output
- Readable code.
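To make the munging point concrete, here is a minimal base R sketch. The data frame `toy` and its values are made up purely for illustration; the conversion via `as.table()` is just one possible way to get back to a data frame, not the only one.

```r
# Tiny made-up data set (hypothetical values, for illustration only)
toy <- data.frame(
  year       = rep(c("1990", "1991"), each = 4),
  month      = rep(c("01", "02"), times = 4),
  snow_lying = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# One grouping variable: the result is a named, vector-like object
by_month <- tapply(toy$snow_lying, toy$month, sum)
names(by_month)

# Two grouping variables: the result is a matrix (years x months)
by_year_month <- tapply(toy$snow_lying, list(toy$year, toy$month), sum)
is.matrix(by_year_month)

# The munging step: reshape the matrix into a long data frame
long <- as.data.frame(as.table(by_year_month))
names(long) <- c("year", "month", "snow_days")
long
```

The last three lines are exactly the kind of extra work I would rather not repeat every time I summarise something.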
Now that we’re all living in the tidyverse, I’m a bit confused that so many folk still land on my blog looking for tapply. So this post updates/supersedes what I wrote previously. I’ve repeated the toy example I made before:
# Load dplyr for %>%, mutate, group_by and summarise
library(dplyr)
# Generate an example time series
df = data.frame(date=seq.Date(as.Date("1990-01-01"),
                              as.Date("2013-12-31"),
                              by=1))
# Add some data (0s and 1s)
df = df %>%
  mutate(snow_lying=sample(c(0, 1), nrow(df), replace=TRUE))
# Get month and year from date
df = df %>%
  mutate(month=format(date, "%m"),
         year=format(date, "%Y"))
# Sum for each month
df %>%
  group_by(month) %>%
  summarise(snow_days=sum(snow_lying))
# Sum for each month, each year
df %>%
  group_by(year, month) %>%
  summarise(snow_days=sum(snow_lying))
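If you want to convince yourself that the two approaches agree, the following is a rough check on a single year of data. The seed value and the names `one_year`, `via_dplyr` and `via_tapply` are my own choices for this sketch, not part of the example above.

```r
library(dplyr)

set.seed(42)  # arbitrary seed (an assumption) so the random 0s and 1s are reproducible

# One year of daily data, built the same way as the example above
one_year <- data.frame(date = seq.Date(as.Date("1990-01-01"),
                                       as.Date("1990-12-31"),
                                       by = 1)) %>%
  mutate(snow_lying = sample(c(0, 1), n(), replace = TRUE),
         month = format(date, "%m"))

# dplyr: returns a data frame with one row per month
via_dplyr <- one_year %>%
  group_by(month) %>%
  summarise(snow_days = sum(snow_lying))

# tapply: returns the same totals, but not as a data frame
via_tapply <- tapply(one_year$snow_lying, one_year$month, sum)

is.data.frame(via_dplyr)
all(via_dplyr$snow_days == as.vector(via_tapply))
```

Both give the same monthly totals; the difference is only in the shape of what comes back.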
Read more about dplyr here: https://dplyr.tidyverse.org/
