Articles by John Mount

Quoting Concatenate

December 16, 2018 | John Mount

In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more. My position on code-capturing interfaces (or non-standard-evaluation/NSE) is: if poorly handled, they can be a large interface price/... [Read more...]
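As a rough illustration of what a "quoting concatenate" does, here is a minimal base-R sketch, assuming the goal is to turn bare names into a character vector without the user typing quote marks. The function name `qc_sketch()` is hypothetical and is not wrapr's implementation.

```r
# qc_sketch() is a hypothetical illustration, not wrapr's implementation.
# It captures the unevaluated arguments and returns their source text.
qc_sketch <- function(...) {
  # substitute() returns the call list(...) holding the caller's expressions;
  # dropping the first element removes the `list` symbol itself
  exprs <- as.list(substitute(list(...)))[-1]
  # deparse each captured expression back into a string
  vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
}

qc_sketch(x, y, z)  # [1] "x" "y" "z"
```

Note that `x`, `y`, and `z` are never evaluated, so they do not need to exist as variables.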

Reusable Pipelines in R

December 13, 2018 | John Mount

Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr ... [Read more...]

Sharing Modeling Pipelines in R

December 11, 2018 | John Mount

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system. Our example task is to fit a model on some arbitrary ...
[Read more...]

Timing Grouped Mean Calculation in R

December 8, 2018 | John Mount

This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement. The original published timings were as follows: With performance metrics: measurements are marketing. So let’s dig into the above a bit. These timings are of the kind of small task large number of ...
[Read more...]

Very Non-Standard Calling in R

December 3, 2018 | John Mount

Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we ... [Read more...]

Quoting in R

November 15, 2018 | John Mount

Many R users appear to be big fans of "code capturing" or "non standard evaluation" (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R. The above terms are simply talking about interfaces where a name to be used is captured from the source code the ... [Read more...]
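To make the distinction concrete, here is a small base-R sketch contrasting a value-oriented interface (the column name is passed as a string) with a code-capturing one (the column name is taken from the caller's source text via `substitute()`). The helper names are hypothetical and only for illustration.

```r
# Hypothetical helpers for illustration only.

# Standard (value) interface: the column name arrives as a string value.
col_mean_std <- function(d, col) {
  mean(d[[col]])
}

# Code-capturing (NSE) interface: the column name is read from the
# caller's source code rather than from the argument's value.
col_mean_nse <- function(d, col) {
  col_name <- deparse(substitute(col))
  mean(d[[col_name]])
}

col_mean_std(mtcars, "mpg")   # value interface
col_mean_nse(mtcars, mpg)     # code-capturing interface, same result

# The price: a variable holding the name is not looked through.
v <- "mpg"
col_mean_std(mtcars, v)       # works, uses the value "mpg"
# col_mean_nse(mtcars, v) would look for a column literally named "v"
```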

More on Bias Corrected Standard Deviation Estimates

November 14, 2018 | John Mount

This note is just a quick follow-up to our last note on correcting the bias in estimated standard deviations for binomial experiments. For normal deviates there is, of course, a well-known scaling correction that returns an unbiased estimate for observed standard deviations. It (from the same source): … provides an ...
[Read more...]
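For reference, the standard correction for normally distributed samples (presumably the one the note refers to) divides the sample standard deviation s by the constant c4(n), where E[s] = c4(n) σ and c4(n) = sqrt(2/(n-1)) Γ(n/2) / Γ((n-1)/2). A minimal R sketch, computed on the log scale with lgamma() to avoid overflow for large n; the function names are our own:

```r
# c4(n): E[sd(x)] = c4(n) * sigma for an i.i.d. normal sample of size n.
# Computed with lgamma() so gamma() does not overflow for large n.
c4 <- function(n) {
  sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
}

# Hypothetical helper: bias-corrected standard deviation estimate
# (unbiased only under the normality assumption).
sd_unbiased <- function(x) {
  sd(x) / c4(length(x))
}

c4(c(2, 5, 10, 100))  # increases toward 1 as n grows
```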

How to de-Bias Standard Deviation Estimates

November 12, 2018 | John Mount

This note is about attempting to remove the bias brought in by using sample standard deviation estimates to estimate an unknown true standard deviation of a population. We establish there is a bias, concentrate on why it is not important to remove it for reasonably sized samples, and (despite that) ...
[Read more...]
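A quick simulation (our own illustration, not code from the note) makes the bias visible: averaging sd() over many small samples drawn from a standard normal distribution gives a value noticeably below the true standard deviation of 1.

```r
# Simulate many small normal samples (true sd = 1) and average sd().
set.seed(2018)
n <- 5
sims <- replicate(20000, sd(rnorm(n, mean = 0, sd = 1)))
mean(sims)  # comes out below 1: sd() is biased low for small n
```

Re-running with a larger n shrinks the gap, which is consistent with the point that the bias matters little for reasonably sized samples.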

coalesce with wrapr

November 3, 2018 | John Mount

coalesce is a classic, useful SQL operator that picks the first non-NULL value in a sequence of values. We thought we would share a nice version of it, wrapr::coalesce(), for picking non-NA values in R with a convenient infix operator notation. Here is a short example of it in action: library("wrapr") ... [Read more...]
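To show the idea, here is a minimal base-R sketch of an NA-coalescing infix operator. The operator name `%na_or%` and the function `coalesce_na()` are hypothetical and chosen for illustration; they are not wrapr's operator or implementation.

```r
# Hypothetical sketch, not wrapr's implementation:
# replace NA entries of x with the matching entries of y.
coalesce_na <- function(x, y) {
  y <- rep_len(y, length(x))   # recycle a scalar default to x's length
  missing_idx <- is.na(x)
  x[missing_idx] <- y[missing_idx]
  x
}

# Infix form; user-defined %op% operators group left to right,
# so chains pick the first non-NA value from left to right.
`%na_or%` <- coalesce_na

c(1, NA, 3, NA) %na_or% 0                    # [1] 1 0 3 0
c(NA, NA, 3) %na_or% c(NA, 2, 9) %na_or% 0   # [1] 0 2 3
```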

The blocks and rows theory of data shaping

November 1, 2018 | John Mount

We have our latest note on the theory of data wrangling up here. It discusses the roles of “block records” and “row records” in the cdata data transform tool. With that and the theory of how to design transforms, we think we have a pretty complete description of the system.
[Read more...]

Designing Transforms for Data Reshaping with cdata

October 25, 2018 | John Mount

Authors: John Mount and Nina Zumel, 2018-10-25. As a follow-up to our previous post, this post goes a bit deeper into reasoning about data transforms using the cdata package. The cdata package demonstrates the "coordinatized data" theory and includes an implementation of the "fluid data" methodology for general data ...
[Read more...]

Quasiquotation in R via bquote()

October 16, 2018 | John Mount

In August of 2003 Thomas Lumley added bquote() to R 1.8.1. This gave R and R users an explicit Lisp-style quasiquotation capability. bquote() and quasiquotation are actually quite powerful. Professor Thomas Lumley should get, and should continue to receive, a lot of credit and thanks for introducing the concept into R. In ... [Read more...]
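A small example of base R's bquote() (standard usage, shown here for illustration): `.()` marks the parts of an expression to evaluate immediately, and everything else stays quoted.

```r
# .(x) inside bquote() is replaced by the current value of x;
# everything else stays quoted.
x <- 5
e <- bquote(y + .(x))
e                       # y + 5
eval(e, list(y = 2))    # [1] 7

# Splicing in a name lets us build code that refers to a chosen column.
col <- as.name("mpg")
bquote(mean(.(col)))    # mean(mpg)
eval(bquote(mean(.(col))), mtcars)
```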

Piping into ggplot2

October 13, 2018 | John Mount

In our wrapr pipe RJournal article we used piping into ggplot2 layers/geoms/items as an example. Being able to use the same pipe operator for data processing steps and for ggplot2 layering is a question that comes up from time to time (for example: Why can’t ggplot2 use %>%?). ...
[Read more...]

Some R Guides: tidyverse and data.table Versions

October 10, 2018 | John Mount

Saghir Bashir of ilustat recently shared a nice getting-started guide for R and the tidyverse. In addition, they were generous enough to link to Dirk Eddelbuettel’s later adaptation of the guide to use data.table. This type of cooperation and user choice is what keeps the R community vital. ...
[Read more...]

Running the Same Task in Python and R

October 8, 2018 | John Mount

According to a KDD poll fewer respondents (by rate) used only R in 2017 than in 2018. At the same time more respondents (by rate) used only Python in 2017 than in 2016. Let’s take this as an excuse to take a quick look at what happens when we try a task in ...
[Read more...]

Quick Significance Calculations for A/B Tests in R

October 6, 2018 | John Mount

Let’s take a quick look at a very important and common experimental problem: checking if the difference in success rates of two Binomial experiments is statistically significant. This can arise in A/B testing situations such as online advertising, sales, and manufacturing. We already share a free video ... [Read more...]
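One standard way to run such a check in base R is a two-proportion test via prop.test(). The counts below are made up for illustration, and this is not necessarily the method used in the note. The sketch also computes the pooled two-proportion z-test by hand to show that the two agree when the continuity correction is turned off.

```r
# Hypothetical counts for two variants (not data from the note).
successes <- c(A = 200, B = 250)
trials    <- c(A = 10000, B = 10000)

# Base-R two-sample test of equal proportions (no continuity correction).
res <- prop.test(successes, trials, correct = FALSE)
res$p.value

# Same test by hand: pooled two-proportion z-test.
rates    <- successes / trials
p_pool   <- sum(successes) / sum(trials)
se       <- sqrt(p_pool * (1 - p_pool) * (1 / trials[["A"]] + 1 / trials[["B"]]))
z        <- (rates[["B"]] - rates[["A"]]) / se
p_manual <- 2 * pnorm(-abs(z))
p_manual
```

The uncorrected chi-squared statistic reported by prop.test() equals z squared here, which is why the two p-values match.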

Modeling multi-category Outcomes With vtreat

October 1, 2018 | John Mount

vtreat is a powerful R package for preparing messy real-world data for machine learning. We have further extended the package with a number of features, including rquery/rqdatatable integration (allowing vtreat application at scale on Apache Spark or data.table!). In addition, vtreat can now effectively prepare data for ... [Read more...]
