Like Rob, I recently got back from ICOTS. What a great conference. Kudos to everyone who worked hard to organize and pull it off. In one of the sessions I attended, Amelia McNamara (@AmeliaMN) gave a nice presentation about how …

In this post, I will focus on the analysis of available R packages and the behavior of their users. In essence, this involves looking at the data in two different ways: (1) the relationships among the R packages available on CRAN and (2) the behavior of R users, tracked through download logs from CRAN mirrors. I will

It has been a month since the UseR! 2014 conference, and I'm probably the last one to write about it. UseR! is my favorite conference because it is technical and not too big. I have completely lost interest in big, broad conferences like JSM (to me, it has become the Joint Sightseeing Meetings). Karl has written two blog posts...

by Joseph Rickert

If I had to pick just one application to be the “killer app” for the digital computer, I would probably choose Agent Based Modeling (ABM). Imagine creating a world populated with hundreds or even thousands of agents, interacting with each other and with the environment according to their own simple rules. What kinds of patterns and...

This is a guest post by Randy Zwitch (@randyzwitch), a digital analytics and predictive modeling consultant in the Greater Philadelphia area. Randy blogs regularly about Data Science and related technologies at http://randyzwitch.com. He has blogged at Bad Hessian before. Those of you with WordPress blogs who have the Jetpack Stats module installed are intimately familiar…

tidyr is a new package that makes it easy to “tidy” your data. Tidy data is data that’s easy to work with: it’s easy to munge (with dplyr), visualise (with ggplot2 or ggvis) and model (with R’s hundreds of modelling packages). The two most important properties of tidy data are: Each column is a variable. Each

Statistics has many canonical data sets. For classification, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data. And for dotplots, we have the barley data, first popularized by Bill Cleveland in his landmark 1993 text Visualizing Data. Cleveland's innovations in data visualization were hugely influential...