Installing dplyr 0.3 on Mac OS X (Mavericks)

September 25, 2014

(This article was first published on Data Driven Security, and kindly contributed to R-bloggers)

UPDATE: Per the author, running devtools::install_github("hadley/devtools") should take care of everything you need before installing the latest dplyr (though I did not have the postgres libs installed and suspect they might still be needed).

The R dplyr package just turned 0.3, and to get it working in my development environment (OS X Mavericks) I had to do the following:

  • brew install postgresql (you are using homebrew on Macs, right?)
  • install.packages("DBI", type="source")
  • install.packages("RPostgreSQL", type="source")
  • devtools::install_github("rstudio/rmarkdown")
  • devtools::install_github("hadley/lazyeval")
  • devtools::install_github("hadley/dplyr")

Such is the way of things when living on the cutting edge of the Hadleyverse.

Why go through the trouble of using the newest version of dplyr? Take a look at some of the new capabilities available:

  • between() is a vector function that efficiently determines whether numeric
    values fall within a range, and is translated to a special form for SQL (#503).

  • count() makes it even easier to do (weighted) counts (#358).

  • data_frame() by @kevinushey is a nicer way of creating data frames.
    It never coerces column types (no more stringsAsFactors = FALSE!),
    never munges column names, and never adds row names. You can use previously
    defined columns to compute new columns (#376).

  • distinct() returns distinct (unique) rows of a tbl (#97). Supply
    additional variables to return the first row for each unique combination
    of variables.

  • Set operations intersect(), union() and setdiff() now have methods
    for data frames, data tables and SQL database tables (#93). They pass their
    arguments down to the base functions, which ensures they raise errors if
    you pass in too many arguments.

  • Joins (e.g. left_join(), inner_join(), semi_join(), anti_join())
    now allow you to join on different variables in x and y tables by
    supplying a named vector to by. For example, by = c("a" = "b") joins
    x.a to y.b.

  • n_groups() tells you how many groups there are in a tbl. It returns
    1 for ungrouped data (#477).

  • transmute() works like mutate() but drops all variables that you didn’t
    explicitly refer to (#302).

  • rename() makes it easy to rename variables – it works similarly to
    select() but it preserves columns that you didn’t otherwise touch.

  • slice() allows you to select rows by position (#226). It includes
    positive integers, drops negative integers, and you can use expressions like n().
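To get a feel for between(), count() and n_groups(), here is a small sketch. It is written against a current dplyr release rather than 0.3 itself, so printed output and some defaults may differ from what 0.3 produced; the traffic data and its column names are made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: bytes transferred per host
traffic <- data.frame(
  host  = c("a", "a", "b", "c", "c", "c"),
  bytes = c(100, 250, 40, 900, 10, 300),
  stringsAsFactors = FALSE
)

# between(x, left, right) is an inclusive range test: left <= x & x <= right
in_range <- between(traffic$bytes, 100, 300)
in_range

# count() tallies rows per group; wt = sums a weight column instead of counting
counts  <- count(traffic, host)
wcounts <- count(traffic, host, wt = bytes)
counts
wcounts

# n_groups() reports the number of groups; ungrouped data counts as one group
n_groups(group_by(traffic, host))
n_groups(traffic)
```

Here count() returns one row per host with an n column, and the weighted version puts the summed bytes in n instead of a row count.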
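The join, distinct() and set-operation items can be sketched the same way. Again this assumes a current dplyr release, and the tables and column names are hypothetical. One difference worth knowing: in recent dplyr releases, distinct() with variables supplied keeps only those variables unless you also pass .keep_all = TRUE, which is what reproduces the "first row for each unique combination" behaviour described above.

```r
library(dplyr)

# Hypothetical tables: the key is called "ip" in one and "addr" in the other
events <- data.frame(ip = c("10.0.0.1", "10.0.0.2", "10.0.0.3"),
                     hits = c(5, 2, 7), stringsAsFactors = FALSE)
owners <- data.frame(addr = c("10.0.0.1", "10.0.0.3"),
                     owner = c("alice", "bob"), stringsAsFactors = FALSE)

# by = c("ip" = "addr") matches events$ip against owners$addr
joined  <- left_join(events, owners, by = c("ip" = "addr"))
# anti_join() keeps the rows of x that have no match in y
unowned <- anti_join(events, owners, by = c("ip" = "addr"))

# distinct() with no variables drops fully duplicated rows
dupes <- data.frame(x = c(1, 1, 1, 2), y = c("a", "a", "b", "c"),
                    stringsAsFactors = FALSE)
uniq  <- distinct(dupes)
# First row for each unique x, keeping the other columns (current dplyr syntax)
first_per_x <- distinct(dupes, x, .keep_all = TRUE)

# Set operations on data frames with identical columns
a <- data.frame(k = 1:3)
b <- data.frame(k = 2:4)
both   <- intersect(a, b)
either <- union(a, b)
only_a <- setdiff(a, b)
```

Because library(dplyr) masks base intersect(), union() and setdiff(), the last three calls dispatch to dplyr's data frame methods and compare whole rows rather than columns.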
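Finally, a sketch of transmute(), rename() and slice(), under the same assumptions: a current dplyr release and a made-up connection log whose column names are hypothetical.

```r
library(dplyr)

# Hypothetical connection log
conns <- data.frame(src  = c("a", "b", "c", "d"),
                    sent = c(10, 20, 30, 40),
                    recv = c(1, 2, 3, 4),
                    stringsAsFactors = FALSE)

# mutate() keeps every column; transmute() keeps only the ones named in the call
ratios <- transmute(conns, src, ratio = sent / recv)
names(ratios)

# rename() takes new_name = old_name and leaves the other columns untouched
renamed <- rename(conns, source = src)
names(renamed)

# slice() selects rows by position
first_two <- slice(conns, 1:2)
last_row  <- slice(conns, n())   # n() is the number of rows, i.e. the last one
dropped   <- slice(conns, -1)    # negative positions drop rows
```

Note that rename() reads like an assignment (the new name goes on the left), which is the same convention select() uses when renaming.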

Also, the lazyeval package looks pretty interesting.
