Blog Archives

Genre-based Music Recommendations Using Open Data (and the problem with recommender systems)

Genre-based Music Recommendations Using Open Data (and the problem with recommender systems)

After a long 12 months of pouring my soul into it, my book, Data Analysis with R, was finally published. After the requisite 2-4 day breather, I started thinking about how I was going to get back into the swing… Continue reading →

Read more »

Kickin’ it with elastic net regression

Kickin’ it with elastic net regression

With the kind of data that I usually work with, overfitting regression models can be a huge problem if I'm not careful. Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm… Continue reading →

Read more »

Lessons learned in high-performance R

Lessons learned in high-performance R

On this blog, I've had a long running investigation/demonstration of how to make a "embarrassingly-parallel" but computationally intractable (on commodity hardware, at least) R problem more performant by using parallel computation and Rcpp. The example problem is to find the… Continue reading →

Read more »

I’m all about that bootstrap (’bout that bootstrap)

I’m all about that bootstrap (’bout that bootstrap)

As some of my regular readers may know, I'm in the middle of writing a book on introductory data analysis with R. I'm at the point in the writing of the book now where I have to make some hard… Continue reading →

Read more »

Playing around with #rstats twitter data

Playing around with #rstats twitter data

As a bit of weekend fun, I decided to briefly look into the #rstats twitter data that Stephen Turner collected and made available (thanks!). Essentially, this data set contains some basic information about over 100,000 tweets that contain the hashtag… Continue reading →

Read more »

Assertive R programming in dplyr/magrittr pipelines

Assertive R programming in dplyr/magrittr pipelines

A lot of my job–and side projects, for that matter–involve running R scripts on updates of open government data. While I’m infinitely grateful to have access to any interesting open datasets in the first place, I can’t ignore that dealing… Continue reading →

Read more »

Why is my OS X Yosemite install taking so long?: an analysis

Why is my OS X Yosemite install taking so long?: an analysis

Why? Since the latest Mac OS X update, 10.10 "Yosemite", was released last Thursday, there have been complaints springing up online of the progress bar woefully underestimating the actual time to complete installation. More specifically, it appeared as if, for a certain group of people (myself included), the installer would stall out at "two minutes »more

Read more »

Fun with .Rprofile and customizing R startup

Fun with .Rprofile and customizing R startup

Over the years, I've meticulously compiled–and version controlled–massive and extensive configuration files for virtually all of my most used utilities, most notably vim, tmux, and zsh. In fact, one of the only configurable utilities for which I had no special configuration schema was R. This is extremely surprising, given that I use R everyday. One »more

Read more »

Interactive visualization of non-linear logistic regression decision boundaries with Shiny

Interactive visualization of non-linear logistic regression decision boundaries with Shiny

(skip to the shiny app) Model building is very often an iterative process that involves multiple steps of choosing an algorithm and hyperparameters, evaluating that model / cross validation, and optimizing the hyperparameters. I find a great aid in this process, for classification tasks, is not only to keep track of the accuracy across models, »more

Read more »

Squeezing more speed from R for nothing, Rcpp style

Squeezing more speed from R for nothing, Rcpp style

In a previous post we explored how you can greatly speed up certain types of long-running computations in R by parallelizing your code using multicore package*. I also mentioned that there were a few other ways to speed up R code; the one I will be exploring in this post is using Rcpp to replace »more

Read more »

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training

datasociety

http://www.eoda.de





ODSC

ODSC

CRC R books series





Six Sigma Online Training









Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)