Articles by John Mount

A Better Example of the Confused By The Environment Issue

September 25, 2018 | John Mount

Our interference from the environment issue was a bit subtle. But there are variations that can be a bit more insidious. Please consider the following. library("dplyr") # unrelated value that happens # to be in our environment z % select(-z) # x y # … Continue reading A Better Example of the Confused By ... [Read more...]

A Subtle Flaw in Some Popular R NSE Interfaces

September 23, 2018 | John Mount

It is no great secret: I like value oriented interfaces that preserve referential transparency. It is the side of the public debate I take in R programming. "One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish ... [Read more...]

Timing Column Indexing in R

September 21, 2018 | John Mount

I’ve ended up (almost accidentally) collecting a number of different solutions to the “use a column to choose values from other columns in R” problem. Please read on for a brief benchmark comparing these methods/solutions. What we did is: build a 1,000,000 row variation of the original example. In ...
[Read more...]

Using a Column as a Column Index

September 20, 2018 | John Mount

We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions. Let’s say we ... [Read more...]

Parameterizing with bquote

September 16, 2018 | John Mount

One thing that is sure to get lost in my long note on macros in R is just how concise and powerful macros are. The problem is macros are concise, but they do a lot for you. So you get bogged down when you explain the joke. Let’s try ... [Read more...]

On “Competition” in the R Ecosystem

September 15, 2018 | John Mount

I’ve been thinking a bit on “competition” in the R ecosystem. I guess the closest I can come to a fair and coherent view on “competition” in the R ecosystem is some variation of the following. I, of course, should not be treating things as a competition. We are ... [Read more...]

Better R Code with wrapr Dot Arrow

September 15, 2018 | John Mount

Our R package wrapr supplies a "piping operator" that we feel is a real improvement in R piped-style coding. The idea is: with wrapr's "dot arrow" pipe "%.>%" the expression "A %.>% B" is treated very much like "{. <- A; B}" … Continue reading Better R Code with wrapr Dot Arrow [Read more...]

Announcing wrapr 1.6.2

September 12, 2018 | John Mount

wrapr 1.6.2 is now up on CRAN. We have some neat new features for R users to try (in addition to many earlier wrapr goodies). The first is the %in_block% alternate notation for let(). The wrapr let()-block allows easy replacement of names in name-capturing interfaces (such as transform()), as ...
[Read more...]

Practical Data Science with R2

September 12, 2018 | John Mount

The secret is out: Nina Zumel and I are busy working on Practical Data Science with R2, the second edition of our best-selling book on learning data science using the R language. Our publisher, Manning, has a great slide deck describing the book (and a discount code!!!) here: We ...
[Read more...]

A Quick Appreciation of the R transform Function

September 10, 2018 | John Mount

R users who also use the dplyr package will be able to quickly understand the following code that adds an estimated area column to a data.frame. suppressPackageStartupMessages(library("dplyr")) iris %>% mutate( ., Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>% head(.) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.... [Read more...]

R Tip: Give data.table a Try

September 8, 2018 | John Mount

If your R or dplyr work is taking what you consider to be too long (seconds instead of instant, or minutes instead of seconds, or hours instead of minutes, or a day instead of an hour) then try data.table. For some tasks data.table is routinely faster than ... [Read more...]

R tip: How to Pass a formula to lm

September 1, 2018 | John Mount

R tip: how to pass a formula to lm(). Often when modeling in R one wants to build up a formula outside of the modeling call. This allows the set of columns being used to be passed around as a vector of strings, and treated as data. Being able to ... [Read more...]

Timings of a Grouped Rank Filter Task

August 23, 2018 | John Mount

Introduction: This note shares an experiment comparing the performance of a number of data processing systems available in R. Our notional or example problem is finding the top ranking item per group (group defined by three string columns, and order defined by a single numeric column). This is a common ...
[Read more...]

R Tip: Consider radix Sort

August 21, 2018 | John Mount

R tip: consider using radix sort. The “method = "radix"” option can greatly speed up sorting and ordering tables in R. For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus 3 tenths of a second). Below is an excerpt from an experiment sorting showing default ...
[Read more...]

More Practical Data Science with R Book News

August 19, 2018 | John Mount

Some more Practical Data Science with R news. Practical Data Science with R is the book we wish we had when we started in data science. Practical Data Science with R, Second Edition is the revision of that book with the packages we wish had been available at that time (... [Read more...]

data.table is Really Good at Sorting

August 13, 2018 | John Mount

The data.table R package is really good at sorting. Below is a comparison of it versus dplyr for a range of problem sizes. The graph is using a log-log scale (so things are very compressed). But data.table is routinely 7 times faster than dplyr. The ratio of run times ...
[Read more...]

Meta-packages, nails in CRAN’s coffin

August 7, 2018 | John Mount

Derek Jones recently discussed a possible future for the R ecosystem in “StatsModels: the first nail in R’s coffin”. This got me thinking on the future of CRAN (which I consider vital to R, and vital in distributing our work) in the era of super-popular meta-packages. Meta-packages are convenient, ...
[Read more...]

Collecting Expressions in R

August 5, 2018 | John Mount

Not a full R article, but a quick note demonstrating by example the advantage of being able to collect many expressions and pack them into a single extend_se() node. This example may seem extreme or unnatural. However we have seen once you expose a system to enough users you ...
[Read more...]