Articles by John Mount

R summary() got better!

June 4, 2017 | John Mount

Here is a really nice feature found in the current 3.4.0 version of R: summary() has become a lot more reasonable.

    summary(15555)
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #   15555   15555   15555   15555   15555   15555

Please read on for some background. In older versions of R (say R 3.3.1) the above code … [Read more...]

In defense of wrapr::let()

June 1, 2017 | John Mount

Saw this the other day: In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say: let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (...
[Read more...]

Summarizing big data in R

May 30, 2017 | John Mount

Our next "R and big data tip" is: summarizing big data. We always say "if you are not looking at the data, you are not doing science", and for big data you are very dependent on summaries (as you can’t actually look at everything). Simple question: is there ... [Read more...]

Managing Spark data handles in R

May 26, 2017 | John Mount

When working with big data with R (say, using Spark and sparklyr) we have found it very convenient to keep data handles in a neat list or data_frame. Please read on for our handy hints on keeping your data handles neat. When using R to work over a big ...
[Read more...]

On indexing operators and composition

May 18, 2017 | John Mount

In this article I will discuss array indexing, operators, and composition in depth. If you work through this article you should end up with a very deep understanding of array indexing and the deep interpretation available when we realize indexing is an instance of function composition (or an example of ...
[Read more...]

dplyr in Context

May 6, 2017 | John Mount

Introduction Beginning R users often come to the false impression that the popular packages dplyr and tidyr are both all of R and sui generis inventions (in that they might be unprecedented and there might be no other reasonable way to get the same effects in R). These packages and their ...
[Read more...]

Why to use wrapr::let()

May 2, 2017 | John Mount

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers. Abstractions A common definition of an abstraction is (from the OSX dictionary): the process of considering something independently of its ...
[Read more...]

Programming over R

April 21, 2017 | John Mount

R is a very fluid language amenable to meta-programming, or alterations of the language itself. This has allowed the late user-driven introduction of a number of powerful features such as magrittr pipes, the foreach system, futures, data.table, and dplyr. Please read on for some small meta-programming effects we have ...
[Read more...]

Visualizing relational joins

April 4, 2017 | John Mount

I want to discuss a nice series of figures used to teach relational join semantics in R for Data Science by Garrett Grolemund and Hadley Wickham, O’Reilly 2016. Below is an example from their book illustrating an inner join: Please read on for my discussion of this diagram and teaching ...
[Read more...]

Coordinatized Data: A Fluid Data Specification

March 29, 2017 | John Mount

Authors: John Mount and Nina Zumel. Introduction It’s been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion to and from row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting). Boris Artzybasheff illustration Real trust and ...
[Read more...]

Datashader is a big deal

March 22, 2017 | John Mount

I recently got back from Strata West 2017 (where I ran a very well received workshop on R and Spark). One thing that really stood out for me at the exhibition hall was Bokeh plus datashader from Continuum Analytics. I had the privilege of having Peter Wang himself demonstrate datashader for ...
[Read more...]

Another R [Non-]Standard Evaluation Idea

March 17, 2017 | John Mount

Jonathan Carroll had an interesting R language idea: to use @-notation to request value substitution in a non-standard evaluation environment (inspired by MySQL User-Defined Variables). He even picked the right image: The idea is kind of the reverse of some Lisp ideas ("evaled unless ticked"), but an interesting possibility. We ...
[Read more...]

Some Win-Vector R packages

March 9, 2017 | John Mount

This post concludes our mini-series of Win-Vector open source R packages. We end with WVPlots, a collection of ready-made ggplot2 plots we find handy. Please read on for a list of some of the Win-Vector LLC open-source R packages that we are pleased to share. For each package we have prepared ...
[Read more...]

sigr: Simple Significance Reporting

March 7, 2017 | John Mount

sigr is a simple R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use. Model Example Let’s take as our example the following linear relation between x and y: ...
[Read more...]
