Articles by John Mount

replyr: Get a Grip on Big Data in R

March 5, 2017 | John Mount

replyr is an R package that contains extensions, adaptions, and work-arounds to make remote R dplyr data sources (including big data systems such as Spark) behave more like local data. This allows the analyst to more easily develop and debug procedures that simultaneously work on a variety of data services (...
[Read more...]

vtreat: prepare data

March 3, 2017 | John Mount

This article is on preparing data for modeling in R using vtreat. Our example: Suppose we wish to work with some data. Our example task is to train a classification model for credit approval using the ranger implementation of the random forests method. We will take our data from John ...
[Read more...]

wrapr: for sweet R code

March 1, 2017 | John Mount

This article is on writing sweet R code using the wrapr package. The problem: Consider the following R puzzle. You are given: a data.frame, the name of a column that you wish to find missing values (NA) in, and the name of a column to land the result. For ...
[Read more...]

Iteration and closures in R

February 26, 2017 | John Mount

I recently read an interesting thread on unexpected behavior in R when creating a list of functions in a loop or iteration. The issue is solved, but I am going to take the liberty of restating and slowing down the discussion of the problem (and fix) for clarity. ...
[Read more...]

The Zero Bug

February 21, 2017 | John Mount

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it. The zero bug: Here is the zero bug in a nutshell: common data aggregation tools often cannot “count to zero” ...
[Read more...]

Announcing the wrapr package for R

February 11, 2017 | John Mount

Recently Dirk Eddelbuettel pointed out that our R function debugging wrappers would be more convenient if they were available in a low-dependency micro package dedicated to little else. Dirk is a very smart person, and like most R users we are deeply in his debt; so we (Nina Zumel and ...
[Read more...]

Evolving R Tools and Practices

February 5, 2017 | John Mount

One of the distinctive features of the R platform is how explicit and user-controllable everything is. This allows the style of use of R to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion into your view ...
[Read more...]

Going to Strata / Hadoop World 2017 San Jose?

February 2, 2017 | John Mount

Are you attending or considering attending Strata / Hadoop World 2017 San Jose? Are you interested in learning to use R to work with Spark and h2o? Then please consider signing up for my 3 1/2 hour workshop soon. We are about half full now, but I really want to fill the room, ...
[Read more...]

Upcoming Win-Vector LLC public speaking engagements

January 26, 2017 | John Mount

I am happy to announce a couple of exciting upcoming Win-Vector LLC public speaking engagements. BARUG Meetup: Tuesday, February 7, 2017 ~7:50pm, Intuit, Building 20, 2600 Marine Way, Mountain View, CA. Win-Vector LLC’s John Mount will be giving a “lightning talk” (15 minutes) on R calling conventions (standard versus non-standard) and showing how ...
[Read more...]

Upgrading to macOS Sierra (nee OSX) for R users

January 26, 2017 | John Mount

A good fraction of R users use Apple computers. Apple machines historically have sat at a sweet spot of convenience, power, and utility: Convenience: Apple machines are available at retail stores, come with purchasable support, and can run a lot of common commercial software. Power: R packages such as parallel ...
[Read more...]

Why do Decision Trees Work?

January 6, 2017 | John Mount

In this article we will discuss the machine learning method called “decision trees”, moving quickly over the usual “how decision trees work” and spending time on “why decision trees work.” We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning ...
[Read more...]

A Theory of Nested Cross Simulation

January 1, 2017 | John Mount

[Reader’s Note. Some of our articles are applied and some of our articles are more theoretical. The following article is more theoretical, and requires fairly formal notation to even work through. However, it should be of interest as it touches on some of the fine points of cross-validation that ...
[Read more...]

Data Preparation, Long Form and tl;dr Form

December 26, 2016 | John Mount

Data preparation and cleaning are some of the most important steps of predictive analytic and data science tasks. They are laborious, are where most of the errors are made, are your last line of defense against wild data, and hold the biggest opportunities for outcome improvement. No matter how much time ...
[Read more...]

Does replyr::let work with data.table?

December 24, 2016 | John Mount

I’ve been asked if the adapter “let” from our R package replyr works with data.table. My answer is: it does work. I am not a data.table user so I am not the one to ask if data.table benefits from a non-standard evaluation to standard evaluation ...
[Read more...]

Comparative examples using replyr::let

December 22, 2016 | John Mount

Consider the problem of “parametric programming” in R. That is: simply writing correct code before knowing some details, such as the names of the columns your procedure will have to be applied to in the future. Our latest version of replyr::let makes such programming easier. Archie’s Mechanics #2 (1954) copyright ...
[Read more...]

help(let, package='replyr')

December 17, 2016 | John Mount

A bit more on our replyr R package. library("replyr") help(let, package='replyr') let {replyr} R Documentation: Prepare expr for execution with name substitutions specified in alias. Description: replyr::let implements a mapping from desired names (names used directly in the expr code) to names used in the data. ...
[Read more...]