Blog Archives

Kickin’ it with elastic net regression

With the kind of data that I usually work with, overfitting regression models can be a huge problem if I'm not careful. Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm…

Lessons learned in high-performance R

On this blog, I've had a long-running investigation/demonstration of how to make an "embarrassingly parallel" but computationally intractable (on commodity hardware, at least) R problem more performant by using parallel computation and Rcpp. The example problem is to find the…

I’m all about that bootstrap (’bout that bootstrap)

As some of my regular readers may know, I'm in the middle of writing a book on introductory data analysis with R. I'm at the point in the writing of the book now where I have to make some hard…

Playing around with #rstats twitter data

As a bit of weekend fun, I decided to briefly look into the #rstats twitter data that Stephen Turner collected and made available (thanks!). Essentially, this data set contains some basic information about over 100,000 tweets that contain the hashtag…

Assertive R programming in dplyr/magrittr pipelines

A lot of my job, and of my side projects for that matter, involves running R scripts on updates of open government data. While I’m infinitely grateful to have access to any interesting open datasets in the first place, I can’t ignore that dealing…

Why is my OS X Yosemite install taking so long?: an analysis

Why? Since the latest Mac OS X update, 10.10 "Yosemite", was released last Thursday, complaints have been springing up online about the progress bar woefully underestimating the actual time to complete installation. More specifically, it appeared as if, for a certain group of people (myself included), the installer would stall out at "two minutes…

Fun with .Rprofile and customizing R startup

Over the years, I've meticulously compiled (and version controlled) extensive configuration files for virtually all of my most used utilities, most notably vim, tmux, and zsh. In fact, one of the only configurable utilities for which I had no special configuration schema was R. This is extremely surprising, given that I use R every day. One…

Interactive visualization of non-linear logistic regression decision boundaries with Shiny

(skip to the Shiny app) Model building is very often an iterative process that involves multiple steps of choosing an algorithm and hyperparameters, evaluating that model / cross-validation, and optimizing the hyperparameters. I find a great aid in this process, for classification tasks, is not only to keep track of the accuracy across models,…

Squeezing more speed from R for nothing, Rcpp style

In a previous post we explored how you can greatly speed up certain types of long-running computations in R by parallelizing your code using the multicore package*. I also mentioned that there were a few other ways to speed up R code; the one I will be exploring in this post is using Rcpp to replace…

Take a look, it’s in a book: distribution of kindle e-book highlights

If you've ever started a book and not finished it, it may comfort you to know that you are not alone. It's hard to get accurate estimates of the percentage of books that are discontinued, but the rise of e-reading (and resulting circumvention of privacy) affords us the opportunity to answer related questions. The Kindle e-reading…
