Articles by John Mount

John Mount speaking on rquery and rqdatatable

July 11, 2018 | John Mount

rquery and rqdatatable are new R packages for data wrangling, either at scale (in databases, or big data systems such as Apache Spark) or in-memory. They speed up both execution (through optimizations) and development (through a good mental model and up-front error checking) for data wrangling tasks. Win-Vector LLC’s ...
[Read more...]

Speed up your R Work

July 8, 2018 | John Mount

In this note we will show how to speed up work in R by partitioning data and using process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages. For each of the ...
[Read more...]
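The post demonstrates the technique with rqdatatable, data.table, and dplyr; what follows is only a minimal base-R sketch of the same partition-then-parallelize idea. It assumes a made-up grouping column `g` and uses a per-group sum as stand-in work. It relies on `parallel::mclapply()`, which forks worker processes on Unix-alikes; forking is unavailable on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical example data: a grouping key g and a numeric column x.
set.seed(2018)
d <- data.frame(g = sample(c("a", "b", "c", "d"), 1000, replace = TRUE),
                x = runif(1000))

# Partition on the grouping key so no group is split across workers.
parts <- split(d, d$g)

# Forking is not available on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Process each partition in its own worker process.
res <- mclapply(parts,
                function(p) data.frame(g = p$g[1], total = sum(p$x)),
                mc.cores = n_cores)

# Re-assemble the per-partition results into one data.frame.
result <- do.call(rbind, res)
print(result)
```

Because the split is on the grouping key, each per-partition result is already final and the pieces can simply be stacked with `rbind()`.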

seplyr 0.5.8 Now Available on CRAN

July 2, 2018 | John Mount

We are pleased to announce that seplyr version 0.5.8 is now available on CRAN. seplyr is an R package that provides a thin wrapper around elements of the dplyr package and (now with version 0.5.8) the tidyr package. The intent is to give the part-time R user the ability to easily ... [Read more...]

wrapr 1.5.0 available on CRAN

June 13, 2018 | John Mount

The R package wrapr 1.5.0 is now available on CRAN. wrapr includes a lot of tools for writing better R code: let() (let block), %.>% (dot arrow pipe), build_frame() / draw_frame() (data.frame builders and formatters), qc() (quoting concatenate), := (named map builder), %?% (coalesce), NEW! %.|% (reduce/expand args), NEW! uniques() (safe unique() ... [Read more...]
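The teaser lists `%?%` as wrapr’s coalesce operator. To illustrate what "coalesce" means, here is a hypothetical base-R helper, `coalesce_na()`. It is not wrapr’s implementation or API: it keeps the left-hand value where it is present and substitutes the right-hand value where the left is `NA`.

```r
# Hypothetical helper, not part of wrapr: element-wise coalesce.
# Keeps a where it is not NA, otherwise uses b (b is recycled to length(a)).
coalesce_na <- function(a, b) {
  b <- rep_len(b, length(a))
  ifelse(is.na(a), b, a)
}

v <- coalesce_na(c(1, NA, 3, NA), 0)
print(v)  # 1 0 3 0
```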

R Tip: use isTRUE()

June 11, 2018 | John Mount

R Tip: use isTRUE(). A lot of R functions are type unstable, which means they return different types or classes depending on details of their values. For example, consider all.equal(): it returns the logical value TRUE when the items being compared are equal: all.equal(1:3, c(1, 2, 3)) # [1] TRUE. However, when ... [Read more...]
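A small sketch of the type instability the tip describes. When the items differ, all.equal() returns a character string describing the difference rather than FALSE. That string cannot be used directly as an if() condition, while wrapping it in isTRUE() always yields a single TRUE or FALSE.

```r
# all.equal() returns TRUE when the objects match ...
eq <- all.equal(1:3, c(1, 2, 3))
# ... but a character description (not FALSE) when they differ.
neq <- all.equal(1:3, c(1, 2, 4))
print(eq)
print(neq)

# Using the raw result in if() raises an error for the character case.
msg <- tryCatch(if (neq) "same" else "different",
                error = function(e) conditionMessage(e))
print(msg)

# isTRUE() always gives a single logical, so it is safe inside if().
safe <- if (isTRUE(neq)) "same" else "different"
print(safe)  # "different"
```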

rqdatatable: rquery Powered by data.table

June 3, 2018 | John Mount

rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package. rquery is already one of ...
[Read more...]

WVPlots now at version 1.0.0 on CRAN!

May 25, 2018 | John Mount

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN! The idea is: ...
[Read more...]

wrapr 1.4.1 now up on CRAN

May 18, 2018 | John Mount

wrapr 1.4.1 is now available on CRAN. wrapr is a really neat R package for organizing, meta-programming, and debugging R code. This update generalizes the dot-pipe feature’s dot S3 features. Please give it a try! wrapr is an R package that supplies powerful tools for writing and debugging R code. ...
[Read more...]

Ready Made Plots make Work Easier

May 16, 2018 | John Mount

A while back Simon Jackson and Kara Woo shared some great ideas and graphs on grouped bar charts and density plots (link). Win-Vector LLC’s Nina Zumel just added a graph of this type to the development version of WVPlots. Nina has, as usual, some great documentation here. More and ...
[Read more...]

rquery: SQL from R

May 10, 2018 | John Mount

My BARUG rquery talk went very well, thank you very much to the attendees for being an attentive and generous audience. (John teaching rquery at BARUG, photo credit: Timothy Liu) I am now looking for invitations to give a streamlined version of this talk privately to groups using R who ...
[Read more...]

Upcoming speaking engagements

April 19, 2018 | John Mount

I have a couple of public appearances coming up soon. The East Bay R Language Beginners Group: Preparing Datasets – The Ugly Truth & Some Solutions, Tuesday, May 1, 2018 at Robert Half Technologies, 1999 Harrison Street, Oakland, CA, 94612. Official May 2018 BARUG Meeting: rquery: a Query Generator for Working With SQL Data, Tuesday, ...
[Read more...]

R Tip: Use Slices

April 16, 2018 | John Mount

R tip: use slices. R has a very powerful array slicing ability that allows for some very slick data processing. Suppose we have a data.frame “d”, and for every row where d$n_observations < 5 we wish to “NA-out” some other columns (mark them as not yet reliably available). Using ...
[Read more...]
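A minimal sketch of that kind of slice assignment, assuming the rule is "fewer than 5 observations". Only `n_observations` comes from the teaser; the other column names and all values are made up. The logical row test and the column-name vector are combined into one assignment, so no explicit loop is needed.

```r
# Hypothetical data: only n_observations is named in the tip.
d <- data.frame(n_observations = c(2, 10, 4, 7),
                mean_value = c(1.5, 2.0, 2.5, 3.0),
                sd_value = c(0.1, 0.2, 0.3, 0.4))

# Columns to mark as not yet reliably available.
cols <- c("mean_value", "sd_value")

# One slice assignment: rows chosen by a logical test, columns by name.
d[d$n_observations < 5, cols] <- NA
print(d)
```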

cdata Update

April 12, 2018 | John Mount

The R package cdata now has version 0.7.0 available from CRAN. cdata is a data manipulation package that subsumes many higher order data manipulation operations including pivot/un-pivot, spread/gather, or cast/melt. The record to record transforms are specified by drawing a table that expresses the record structure (called the “...
[Read more...]
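cdata’s own interface is not shown in this teaser, so the sketch below swaps in base R’s `stats::reshape()` purely to illustrate what an un-pivot (wide to long, also called gather or melt) does to a small made-up table. Each (id, measurement) pair becomes its own row.

```r
# Made-up wide table: one row per id, one column per measurement.
wide <- data.frame(id = c(1, 2), x_a = c(1, 2), x_b = c(3, 4))

# Un-pivot with base R (not cdata): one row per (id, measure) pair.
long <- reshape(wide,
                direction = "long",
                idvar = "id",
                varying = c("x_a", "x_b"),
                v.names = "value",
                timevar = "measure",
                times = c("a", "b"))
print(long)
```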

Neglected R Super Functions

April 11, 2018 | John Mount

R has a lot of under-appreciated, super powerful functions. I list a few of our favorites below. (Atlas, carrying the sky. Royal Palace (Paleis op de Dam), Amsterdam. Photo: Dominik Bartsch, CC some rights reserved.) stats::approx(): approximate a curve/function. base::cumsum(): cumulative ordered sum. stats::ecdf(): estimate the ...
[Read more...]
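A quick taste of the three functions named in the teaser, on tiny made-up inputs. approx() interpolates linearly between known points, cumsum() gives a running total in the order given, and ecdf() returns a function reporting the fraction of observations at or below a value.

```r
# stats::approx(): linear interpolation between known points.
mid <- approx(x = c(0, 10), y = c(0, 100), xout = 5)$y
print(mid)  # 50

# base::cumsum(): running total in the order given.
running <- cumsum(c(1, 2, 3, 4))
print(running)  # 1 3 6 10

# stats::ecdf(): builds a function giving the fraction of data <= t.
F <- ecdf(c(1, 2, 2, 3))
print(F(2))  # 0.75
```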

magrittr and wrapr Pipes in R, an Examination

April 6, 2018 | John Mount

Let’s consider piping in R, using both the magrittr package and the wrapr package. magrittr pipelines: the magrittr pipe glyph “%>%” is the most popular piping symbol in R. The magrittr documentation describes %>% as follows. Basic piping: x %>% f is equivalent to f(x); x %>% f(y) is equivalent to ... [Read more...]
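For comparison only, base R (version 4.1.0 and later) has its own native pipe, `|>`, which is neither magrittr’s nor wrapr’s. It follows the same first-argument rule magrittr documents for `x %>% f(y)`, namely rewriting it to `f(x, y)`. One syntactic difference is that the right-hand side of `|>` must be written as a call, `f()`, not a bare `f`.

```r
x <- c(1, 4, 9)

# x |> f() is rewritten to f(x).
a <- x |> sqrt()
# x |> f(y) is rewritten to f(x, y): x becomes the first argument.
b <- x |> paste("obs")

print(a)  # 1 2 3
print(b)  # "1 obs" "4 obs" "9 obs"
```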

Four Years of Practical Data Science with R

April 4, 2018 | John Mount

Four years ago today we, authors Nina Zumel and John Mount, received our author’s copies of Practical Data Science with R! It has its imitators, but it remains the best “I have R, now what do I do with it?” book (as it works the user through non-trivial projects, analyses, ...
[Read more...]
