Articles by [email protected]

The Bayesian approach to ridge regression

October 30, 2016 | 0 Comments

In a previous post, we demonstrated that ridge regression (a form of regularized linear regression that attempts to shrink the beta coefficients toward zero) can be very effective at combating overfitting and can lead to a much more generalizable model. This approach…

Kickin’ it with elastic net regression

August 19, 2015 | 0 Comments

With the kind of data that I usually work with, overfitting regression models can be a huge problem if I'm not careful. Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm…

Lessons learned in high-performance R

May 30, 2015 | 0 Comments

On this blog, I've had a long-running investigation/demonstration of how to make an "embarrassingly parallel" but computationally intractable (on commodity hardware, at least) R problem more performant by using parallel computation and Rcpp. The example problem is to find the…

Playing around with #rstats twitter data

February 28, 2015 | 0 Comments

As a bit of weekend fun, I decided to briefly look into the #rstats twitter data that Stephen Turner collected and made available (thanks!). Essentially, this data set contains some basic information about over 100,000 tweets that contain the hashtag…

Assertive R programming in dplyr/magrittr pipelines

January 23, 2015 | 0 Comments

A lot of my job (and side projects, for that matter) involves running R scripts on updates of open government data. While I’m infinitely grateful to have access to any interesting open datasets in the first place, I can’t ignore that dealing…

Why is my OS X Yosemite install taking so long?: an analysis

October 22, 2014 | 0 Comments

Why? Since the latest Mac OS X update, 10.10 "Yosemite", was released last Thursday, complaints have been springing up online about the progress bar woefully underestimating the actual time to complete installation. More specifically, it appeared as if, for a certain group of people (myself included), the installer would stall ...

Fun with .Rprofile and customizing R startup

September 17, 2014 | 0 Comments

Over the years, I've meticulously compiled (and version controlled) extensive configuration files for virtually all of my most used utilities, most notably vim, tmux, and zsh. In fact, one of the only configurable utilities for which I had no special configuration schema was R. This is extremely surprising, ...

Squeezing more speed from R for nothing, Rcpp style

June 27, 2014 | 0 Comments

In a previous post, we explored how you can greatly speed up certain types of long-running computations in R by parallelizing your code using the multicore package. I also mentioned that there were a few other ways to speed up R code; the one I will be exploring in this post ...

How dplyr replaced my most common R idioms

February 10, 2014 | 0 Comments

Having written a lot of R code over the last few years, I've developed a set of constructs for my most common tasks. Like an idiom in a natural language (e.g. "break a leg"), I automatically grasp their meaning without having to think about it. Because they allow me ...

Using Last.fm to data mine my music listening history

January 27, 2014 | 0 Comments

I've (passively) been keeping meticulous records of almost every song I've listened to since January of 2008. Since I opened my last.fm account 6 years ago, they've accumulated a massive, detailed dataset of the 107,222 songs I've listened to since then. The best thing is that they're willing to share this data ...

The performance gains from switching R’s linear algebra libraries

December 26, 2013 | 0 Comments

What is often forgotten in the so-called data analysis "language wars" is that, across most of these languages, many common computations are performed using outsourced, dynamically linked math libraries. For example, R, Python's NumPy, Julia, MATLAB, and Mathematica all make heavy use of the BLAS linear algebra API. As a ...

Compiling R from source and why you shouldn’t do it

November 22, 2013 | 0 Comments

I’ve always thought that it’s silly, in most cases, to compile software from source when it’s already available in binary form. To the end of making more binary packages available to Mac users, I just started contributing to a project that is creating a repository of 64-bit builds of pkgsrc’...

Parallel R (and air travel)

November 13, 2013 | 0 Comments

My heart sinks a little when I check on my laptop in the morning and the computation I started the night before still hasn’t finished. Even when the data I’m playing with isn’t particularly.... large... (I’m not going to say it), I have a knack for ...
