Articles by John Mount

Running the Same Task in Python and R

October 8, 2018 | John Mount

According to a KDD poll fewer respondents (by rate) used only R in 2017 than in 2016. At the same time more respondents (by rate) used only Python in 2017 than in 2016. Let’s take this as an excuse to take a quick look at what happens when we try a task in ...
[Read more...]

Quick Significance Calculations for A/B Tests in R

October 6, 2018 | John Mount

Introduction: Let’s take a quick look at a very important and common experimental problem: checking if the difference in success rates of two Binomial experiments is statistically significant. This can arise in A/B testing situations such as online advertising, sales, and manufacturing. We already share a free video ... [Read more...]

Modeling multi-category Outcomes With vtreat

October 1, 2018 | John Mount

vtreat is a powerful R package for preparing messy real-world data for machine learning. We have further extended the package with a number of features including rquery/rqdatatable integration (allowing vtreat application at scale on Apache Spark or data.table!). In addition, vtreat can now effectively prepare data for ... [Read more...]

A Better Example of the Confused By The Environment Issue

September 25, 2018 | John Mount

Our interference from the environment issue was a bit subtle. But there are variations that can be a bit more insidious. Please consider the following. library("dplyr") # unrelated value that happens # to be in our environment z % select(-z) # x y # … Continue reading A Better Example of the Confused By ... [Read more...]

A Subtle Flaw in Some Popular R NSE Interfaces

September 23, 2018 | John Mount

It is no great secret: I like value oriented interfaces that preserve referential transparency. It is the side of the public debate I take in R programming. "One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish ... [Read more...]

Timing Column Indexing in R

September 21, 2018 | John Mount

I’ve ended up (almost accidentally) collecting a number of different solutions to the “use a column to choose values from other columns in R” problem. Please read on for a brief benchmark comparing these methods/solutions. What we did is: build a 1,000,000 row variation of the original example. In ...
[Read more...]

Using a Column as a Column Index

September 20, 2018 | John Mount

We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions. Let’s say we ... [Read more...]

Parameterizing with bquote

September 16, 2018 | John Mount

One thing that is sure to get lost in my long note on macros in R is just how concise and powerful macros are. The problem is macros are concise, but they do a lot for you. So you get bogged down when you explain the joke. Let’s try ... [Read more...]

On “Competition” in the R Ecosystem

September 15, 2018 | John Mount

I’ve been thinking a bit on “competition” in the R ecosystem. I guess the closest I can come to a fair and coherent view on “competition” in the R ecosystem is some variation of the following. I, of course, should not be treating things as a competition. We are ... [Read more...]

Better R Code with wrapr Dot Arrow

September 15, 2018 | John Mount

Our R package wrapr supplies a "piping operator" that we feel is a real improvement in R code piped-style coding. The idea is: with wrapr’s "dot arrow" pipe "%.>%" the expression "A %.>% B" is treated very much like "{. % B(.)" as a … Continue reading Better R Code with wrapr Dot Arrow [Read more...]

Announcing wrapr 1.6.2

September 12, 2018 | John Mount

wrapr 1.6.2 is now up on CRAN. We have some neat new features for R users to try (in addition to many earlier wrapr goodies). The first is the %in_block% alternate notation for let(). The wrapr let()-block allows easy replacement of names in name-capturing interfaces (such as transform()), as ...
[Read more...]

Practical Data Science with R2

September 12, 2018 | John Mount

The secret is out: Nina Zumel and I are busy working on Practical Data Science with R2, the second edition of our best selling book on learning data science using the R language. Our publisher, Manning, has a great slide deck describing the book (and a discount code!!!) here: We ...
[Read more...]

A Quick Appreciation of the R transform Function

September 10, 2018 | John Mount

R users who also use the dplyr package will be able to quickly understand the following code that adds an estimated area column to a data.frame. suppressPackageStartupMessages(library("dplyr")) iris %>% mutate( ., Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>% head(.) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.... [Read more...]

R Tip: Give data.table a Try

September 8, 2018 | John Mount

If your R or dplyr work is taking what you consider to be too long (seconds instead of instant, or minutes instead of seconds, or hours instead of minutes, or a day instead of an hour) then try data.table. For some tasks data.table is routinely faster than ... [Read more...]

R tip: How to Pass a formula to lm

September 1, 2018 | John Mount

R tip: how to pass a formula to lm(). Often when modeling in R one wants to build up a formula outside of the modeling call. This allows the set of columns being used to be passed around as a vector of strings, and treated as data. Being able to ... [Read more...]

Timings of a Grouped Rank Filter Task

August 23, 2018 | John Mount

Introduction: This note shares an experiment comparing the performance of a number of data processing systems available in R. Our notional or example problem is finding the top ranking item per group (group defined by three string columns, and order defined by a single numeric column). This is a common ...
[Read more...]

R Tip: Consider radix Sort

August 21, 2018 | John Mount

R tip: consider using radix sort. The “method = "radix"” option can greatly speed up sorting and ordering tables in R. For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus 3 tenths of a second). Below is an excerpt from an experiment sorting showing default … ...
[Read more...]

More Practical Data Science with R Book News

August 19, 2018 | John Mount

Some more Practical Data Science with R news. Practical Data Science with R is the book we wish we had when we started in data science. Practical Data Science with R, Second Edition is the revision of that book with the packages we wish had been available at that time (... [Read more...]