Installing dplyr 0.3 on Mac OS X (Mavericks)

[This article was first published on Data Driven Security, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

UPDATE Per the author, a devtools::install_github("hadley/devtools") should take care of everything you need prior to installing the latest dplyr (though I did not have postgres libs installed and suspect that might still be needed).

The R dplyr package just turned 0.3 and to get it working in my development environment (OS X Mavericks) I had to do the following:

  • brew install postgresql (you are using homebrew on Macs, right?)
  • install.packages("DBI", type="source")
  • install.packages("RPostgreSQL", type="source")
  • devtools::install_github("rstudio/rmarkdown")
  • devtools::install_github("hadley/lazyeval")
  • devtools::install_github("hadley/dplyr")

Such is the way of things when living on the cutting edge of the Hadleyverse.

Why go through the trouble of using the newest version of dplyr? Take a look at some of the new capabilities available:

  • between() vector function efficiently determines if numeric values fall
    in a range, and is translated to special form for SQL (#503).

  • count() makes it even easier to do (weighted) counts (#358).

  • data_frame() by @kevinushey is a nicer way of creating data frames.
    It never coerces column types (no more stringsAsFactors = FALSE!),
    never munges column names, and never adds row names. You can use previously
    defined columns to compute new columns (#376).

  • distinct() returns distinct (unique) rows of a tbl (#97). Supply
    additional variables to return the first row for each unique combination
    of variables.

  • Set operations, intersect(), union() and setdiff() now have methods
    for data frames, data tables and SQL database tables (#93). They pass their
    arguments down to the base functions, which will ensure they raise errors if
    you pass in two many arguments.

  • Joins (e.g. left_join(), inner_join(), semi_join(), anti_join())
    now allow you to join on different variables in x and y tables by
    supplying a named vector to by. For example, by = c("a" = "b") joins
    x.a to y.b.

  • n_groups() function tells you how many groups in a tbl. It returns
    1 for ungrouped data. (#477)

  • transmute() works like mutate() but drops all variables that you didn’t
    explicitly refer to (#302).

  • rename() makes it easy to rename variables – it works similarly to
    select() but it preserves columns that you didn’t otherwise touch.

  • slice() allows you to selecting rows by position (#226). It includes
    positive integers, drops negative integers and you can use expression like

Also, the lazyeval package looks pretty interesting.

To leave a comment for the author, please follow the link and comment on their blog: Data Driven Security. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)