New cheat-sheet for the dplyrXdf package

August 8, 2016

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Hadley Wickham's dplyr package is an amazing tool for restructuring, filtering, and aggregating data sets using its elegant grammar of data manipulation. By default, it works on in-memory data frames, which means you're limited to the amount of data you can fit into R's memory. Hadley also provided an extension mechanism to make dplyr work with external data sources, and so Hong Ooi created the dplyrXdf package to work with Xdf data files. With dplyrXdf you can manipulate data files of virtually unlimited size using R, and even use the pipe operator %>% from the magrittr package.

To use the dplyrXdf package, you will need Microsoft R Client (a free download for Windows) or Microsoft R Server (on Windows, Linux, Hadoop or HDInsight with Spark). The Xdf files you create can then be used with the big-data functions of the included ScaleR package, enabling you to use R to perform statistical analysis of files hundreds of gigabytes in size.
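To give a feel for the grammar involved, here is a minimal sketch of a dplyr pipeline on the built-in in-memory `mtcars` data frame. The commented lines at the end are an assumption about how the same verbs might be applied to an on-disk Xdf file with dplyrXdf; the file name `mtcars.xdf` is hypothetical, and that part needs Microsoft R Client or R Server, so it is not run here.

```r
library(dplyr)  # also makes the %>% pipe from magrittr available

# In-memory example: standard dplyr verbs on a built-in data frame.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                          # keep cars above 100 horsepower
  group_by(cyl) %>%                             # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n_cars = n()) # aggregate within each group

print(mpg_by_cyl)

# Assumption: with Microsoft R and dplyrXdf installed, a similar pipeline
# could be pointed at an Xdf file instead (file name is hypothetical):
#   library(dplyrXdf)
#   cars_xdf <- rxDataStep(mtcars, outFile = "mtcars.xdf", overwrite = TRUE)
#   cars_xdf %>%
#     filter(hp > 100) %>%
#     group_by(cyl) %>%
#     summarise(mean_mpg = mean(mpg))
```

The point of the sketch is that the verbs stay the same; only the data source changes from a data frame to an Xdf file.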

dplyrXdf cheat sheet

To help you get started with the dplyrXdf package, Hong has created a new dplyrXdf cheat sheet (pdf). This handy, printable 2-page document explains how dplyrXdf:

  • Extends dplyr framework to large, on-disk data sets
  • Simplifies current interface to xdf functionality
  • Handles the task of file management for the user
  • Is transparent to other xdf-aware functions

It also includes some extended examples of working with big data with dplyrXdf and analyzing them with the ScaleR package. To download the cheat-sheet, click on the link below.

Microsoft Advanced Analytics: dplyrXdf cheat sheet
