UPDATE: Per the author, a `devtools::install_github("hadley/devtools")` should take care of everything you need prior to installing the latest `dplyr` (though I did not have postgres libs installed and suspect that might still be needed).
The R `dplyr` package just turned 0.3, and to get it working in my development environment (OS X Mavericks) I had to do the following:
- `brew install postgresql` (you are using homebrew on Macs, right?)
- `install.packages("DBI", type="source")`
- `install.packages("RPostgreSQL", type="source")`
- `devtools::install_github("rstudio/rmarkdown")`
- `devtools::install_github("hadley/lazyeval")`
- `devtools::install_github("hadley/dplyr")`
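Once the steps above have run, a quick sanity check is to confirm that each package is present and that `dplyr` really is at 0.3 or later. The snippet below is only a sketch: `has_min_version()` is a hypothetical helper I made up for illustration, not something from `devtools` or `dplyr`. It relies on base R's `requireNamespace()` and `packageVersion()`.

```r
# Hypothetical helper (not part of any package): TRUE when `pkg` is
# installed and its version is at least `min_version`.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    packageVersion(pkg) >= min_version
}

# Packages touched by the install steps; "0.0" only tests for presence.
pkgs <- c("DBI", "RPostgreSQL", "lazyeval")
installed_ok <- vapply(pkgs, has_min_version, logical(1), min_version = "0.0")
print(installed_ok)

# dplyr itself should report 0.3 or newer after the GitHub install.
print(has_min_version("dplyr", "0.3"))
```

Anything that prints `FALSE` points at the step worth re-running.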
Such is the way of things when living on the cutting edge of the Hadleyverse.
Why go through the trouble of using the newest version of dplyr? Take a look at some of the new capabilities available:
- `between()` vector function efficiently determines if numeric values fall in a range, and is translated to special form for SQL (#503).
- `count()` makes it even easier to do (weighted) counts (#358).
- `data_frame()` by @kevinushey is a nicer way of creating data frames. It never coerces column types (no more `stringsAsFactors = FALSE`!), never munges column names, and never adds row names. You can use previously defined columns to compute new columns (#376).
- `distinct()` returns distinct (unique) rows of a tbl (#97). Supply additional variables to return the first row for each unique combination of variables.
- Set operations, `intersect()`, `union()` and `setdiff()` now have methods for data frames, data tables and SQL database tables (#93). They pass their arguments down to the base functions, which will ensure they raise errors if you pass in too many arguments.
- Joins (e.g. `left_join()`, `inner_join()`, `semi_join()`, `anti_join()`) now allow you to join on different variables in `x` and `y` tables by supplying a named vector to `by`. For example, `by = c("a" = "b")` joins `x.a` to `y.b`.
- `n_groups()` function tells you how many groups are in a tbl. It returns 1 for ungrouped data. (#477)
- `transmute()` works like `mutate()` but drops all variables that you didn’t explicitly refer to (#302).
- `rename()` makes it easy to rename variables – it works similarly to `select()` but it preserves columns that you didn’t otherwise touch.
- `slice()` allows you to select rows by position (#226). It includes positive integers, drops negative integers, and you can use expressions like `n()`.
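To make the list above a bit more concrete, here is a small sketch that runs most of these verbs against toy data. The `sales` and `managers` tables and all their column names are invented for illustration; none of them come from the release notes. I build the inputs with base `data.frame()` rather than `data_frame()` so the snippet does not depend on which dplyr version is installed, and I call `distinct()` without extra variables because its behaviour with additional variables may vary between versions.

```r
library(dplyr)

# Toy data, made up for this example
sales <- data.frame(
  region = c("east", "east", "west", "west", "west"),
  units  = c(3, 7, 2, 7, 2),
  price  = c(10, 12, 9, 12, 9),
  stringsAsFactors = FALSE
)

# between(): TRUE where units lies in the closed range [3, 7]
in_range <- between(sales$units, 3, 7)

# count(): plain row counts, then a weighted count that sums units per region
rows_per_region  <- count(sales, region)
units_per_region <- count(sales, region, wt = units)

# distinct(): drop fully duplicated rows (west / 2 / 9 appears twice)
unique_sales <- distinct(sales)

# n_groups(): 1 for ungrouped data, one per region once grouped
groups_plain  <- n_groups(sales)
groups_region <- n_groups(group_by(sales, region))

# transmute(): like mutate(), but keeps only the columns it computes
revenue <- transmute(sales, revenue = units * price)

# rename(): new_name = old_name, all other columns are left alone
renamed <- rename(sales, quantity = units)

# slice(): pick rows by position; n() is the row count, negatives drop rows
last_row      <- slice(sales, n())
without_first <- slice(sales, -1)

# Join on differently named keys: sales$region is matched to managers$area
managers <- data.frame(
  area    = c("east", "west"),
  manager = c("Ann", "Bob"),
  stringsAsFactors = FALSE
)
joined <- left_join(sales, managers, by = c("region" = "area"))

# Set operations on data frames that share the same columns
a <- data.frame(id = 1:3)
b <- data.frame(id = 2:4)
common <- intersect(a, b)  # ids present in both a and b
either <- union(a, b)      # ids present in a or b, without duplicates
only_a <- setdiff(a, b)    # ids present in a but not in b
```

Printing any of the intermediate objects (for example `units_per_region` or `joined`) is the quickest way to see what each verb returns.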
Also, the lazyeval package looks pretty interesting.
