Articles by John Mount

Vectorized Block ifelse in R

November 27, 2017 | John Mount

Win-Vector LLC has been working on porting some significant large scale production systems from SAS to R. From this experience we want to share how to simulate, in R with Apache Spark (via Sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure. When porting code from one language ... [Read more...]

Arbitrary Data Transforms Using cdata

November 22, 2017 | John Mount

We have been writing a lot on higher-order data transforms lately: "Coordinatized Data: A Fluid Data Specification", "Data Wrangling at Scale", "Fluid Data", and "Big Data Transforms". What I want to do now is "write a bit more, so I finally feel I have been concise." The cdata R package supplies ...
[Read more...]

RStudio Keyboard Shortcuts for Pipes

November 18, 2017 | John Mount

I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R. You can install the add-ins from here (which also includes both installation instructions and use instructions/examples).
[Read more...]

Update on coordinatized or fluid data

November 12, 2017 | John Mount

We have just released a major update of the cdata R package to CRAN. If you work with R and data, now is the time to check out the cdata package. Among the changes in the 0.5.* version of cdata package: All coordinatized data or fluid data operations are now in ...
[Read more...]

Let X=X in R

November 3, 2017 | John Mount

Our article "Let’s Have Some Sympathy For The Part-time R User" includes two points: Sometimes you have to write parameterized or re-usable code. The methods for doing this should be easy and legible. The first point feels abstract, until you find yourself wanting to re-use code on new projects. ...
[Read more...]

Big Data Transforms

October 29, 2017 | John Mount

As part of our consulting practice Win-Vector LLC has been helping a few clients stand up advanced analytics and machine learning stacks using R and substantial data stores (relational database variants such as PostgreSQL, or big data systems such as Spark). Often we come to a point where we ...
[Read more...]

Some Announcements

October 24, 2017 | John Mount

Some Announcements: Dr. Nina Zumel will be presenting “Myths of Data Science: Things you Should and Should Not Believe”, Sunday, October 29, 2017 10:00 AM to 12:30 PM at the She Talks Data Meetup (Bay Area). ODSC West 2017 is soon. It is our favorite conference and we will be giving both a workshop and … ... [Read more...]

Upcoming data preparation and modeling article series

September 23, 2017 | John Mount

I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user we strongly suggest you incorporate vtreat into your projects. vtreat handles, ...
[Read more...]

My advice on dplyr::mutate()

September 22, 2017 | John Mount

There are substantial differences between ad-hoc analyses (be they: machine learning research, data science contests, or other demonstrations) and production worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; ...
[Read more...]

It is Needlessly Difficult to Count Rows Using dplyr

September 3, 2017 | John Mount

Question: how hard is it to count rows using the R package dplyr? Answer: surprisingly difficult. When trying to count rows using dplyr or dplyr controlled data-structures (remote tbls such as Sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis. The task being to avoid dplyr corner-cases and ...
[Read more...]

Permutation Theory In Action

September 2, 2017 | John Mount

While working on a large client project using Sparklyr and multinomial regression we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets (please see here for some details). So to make things ... [Read more...]

Why to use the replyr R package

August 31, 2017 | John Mount

Recently I noticed that the R package sparklyr had the following odd behavior: suppressPackageStartupMessages(library("dplyr")) library("sparklyr") packageVersion("dplyr") #> [1] '0.7.2.9000' packageVersion("sparklyr") #> [1] '0.6.2' packageVersion("dbplyr") #> [1] '1.1.0.9000' sc * Using Spark: 2.1.0 d [1] NA ncol(d) #> [1] NA nrow(d) #> [1] NA …
[Read more...]

Neat New seplyr Feature: String Interpolation

August 28, 2017 | John Mount

The R package seplyr has a neat new feature: the function seplyr::expand_expr() which implements what we call “the string algebra” or string expression interpolation. The function takes an expression of mixed terms, including: variables referring to names, quoted strings, and general expression terms. It then “de-quotes” all of ...
[Read more...]

wrapr: R Code Sweeteners

August 25, 2017 | John Mount

wrapr is an R package that supplies powerful tools for writing and debugging R code. Primary wrapr services include: let(), %.>% (dot arrow pipe), := (named map builder), λ() (anonymous function builder), and DebugFnW(). let(): let() allows execution of arbitrary code with substituted variable names (note this is subtly different than binding values for ...
[Read more...]

Some Neat New R Notations

August 22, 2017 | John Mount

The R package seplyr supplies a few neat new coding notations. [Image: an abacus, which gives us the term “calculus.”] The first notation is an operator called the “named map builder”. This is a cute notation that essentially does the job of stats::setNames(). It allows for code such as the ...
[Read more...]

Is dplyr Easily Comprehensible?

August 19, 2017 | John Mount

dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible? dplyr makes sense to those of us who use it a lot. And we can teach part-time R users a lot of the common good use patterns. But, ...
[Read more...]

Thank You For The Very Nice Comment

August 16, 2017 | John Mount

Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course. Thanks for a wonderful course on DataCamp on XGBoost and Random forest. I was struggling with Xgboost earlier and Vtreat has made my life easy now :). Supervised Learning in ...
[Read more...]

Supervised Learning in R: Regression

August 13, 2017 | John Mount

We are very excited to announce a new (paid) Win-Vector LLC video training course: Supervised Learning in R: Regression, now available on DataCamp. The course is primarily authored by Dr. Nina Zumel (our chief of course design) with contributions from Dr. John Mount. This course will get you quickly up ...
[Read more...]

More on “The Part-Time R-User”

August 6, 2017 | John Mount

I have some more thoughts on the topic: “the part-time R-user.” I am thinking a bit more about the diversity of R users. It occurs to me that simply dividing R users into two groups, beginning and advanced, neglects a very important group: the part-time R user. This leaves us teachers and ... [Read more...]