Blog Archives

Why to use wrapr::let()

May 2, 2017

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.

Abstractions

A common definition of an abstraction is (from the OSX dictionary): the process of considering something independently of its associations, attributes, or concrete accompaniments. In …

Programming over R

April 21, 2017

R is a very fluid language amenable to meta-programming, or alterations of the language itself. This has allowed the late user-driven introduction of a number of powerful features such as magrittr pipes, the foreach system, futures, data.table, and dplyr. Please read on for some small meta-programming effects we have been experimenting with.

Meta-Programming

Meta-programming is …

Encoding categorical variables: one-hot and beyond

April 15, 2017

(or: how to correctly use xgboost from R)

R has "one-hot" encoding hidden in most of its modeling paths. Asking an R user where one-hot encoding is used is like asking a fish where there is water; they can’t point to it as it is everywhere. For example we can see evidence of one-hot encoding …

Visualizing relational joins

April 4, 2017

I want to discuss a nice series of figures used to teach relational join semantics in R for Data Science by Garrett Grolemund and Hadley Wickham, O’Reilly 2016. Below is an example from their book illustrating an inner join:

Please read on for my discussion of this diagram and teaching joins.

Teaching joins

In the …

Coordinatized Data: A Fluid Data Specification

March 29, 2017

Authors: John Mount and Nina Zumel.

Introduction

It’s been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion between row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting).

Boris Artzybasheff illustration

Real trust and understanding of this concept doesn’t fully …

Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

March 25, 2017

This is a note on debugging magrittr pipelines in R using Bizarro Pipe and eager assignment.

Pipes in R

The magrittr R package supplies an operator called “pipe”, which is written as “%>%”. The pipe operator is famous partly because of its extensive use in dplyr and by dplyr users. The pipe operator is …

Datashader is a big deal

March 22, 2017

I recently got back from Strata West 2017 (where I ran a very well received workshop on R and Spark). One thing that really stood out for me at the exhibition hall was Bokeh plus datashader from Continuum Analytics. I had the privilege of having Peter Wang himself demonstrate datashader for me and answer a …

Practical Data Science with R: ACM SIGACT News Book Review and Discount!

March 19, 2017

Our book Practical Data Science with R has just been reviewed in Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory (ACM SIGACT) News by Dr. Allan M. Miller (U.C. Berkeley)! The book is half off at Manning from March 21st 2017 using the following code (please share/Tweet): Deal of the Day …

Another R [Non-]Standard Evaluation Idea

March 17, 2017

Jonathan Carroll had an interesting R language idea: to use @-notation to request value substitution in a non-standard evaluation environment (inspired by MySQL User-Defined Variables). He even picked the right image:

The idea is kind of the reverse of some Lisp ideas ("evaled unless ticked"), but an interesting possibility. We can play along with it …

New screencast: using R and RStudio to install and experiment with Apache Spark

March 15, 2017

I have a new short screencast up: using R and RStudio to install and experiment with Apache Spark. More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.
