I have been working on developing some interactive tools for demonstrating statistical principles. The advent of the shiny framework for R from the RStudio team has made this very easy to do. These demos are still in the development...

In two weeks (on January 24), Think Big Analytics' Jeffrey Breen will present a new webinar on using R with Hadoop. Here's the webinar description: R and Hadoop are changing the way organizations manage and utilize big data. Think Big Analytics and Revolution Analytics are helping clients plan, build, test and implement innovative solutions based on the two technologies...

One issue I continuously encounter when starting to work with a new dataset is that of the codebook. In general, I prefer to load a codebook into R like any other data source, specifically as a data frame. Ideally, one data frame provides the variable names with descriptions and any other metadata available, and a separate...

Factor Analysis of Baseball's Hall of Fame Voters
Recently, Nate Silver wrote a post that analyzed how Hall of Fame voters who voted for Barry Bonds differed from those who voted against him. Not surprisingly, those who voted for Bonds were more likely to vote for other suspected steroid users (like Roger Clemens). This got...

A trackback from Martin Hawksey’s recent post on Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet), which demonstrates how to pull WordPress stats into a Google Spreadsheet and generate charts and reports there, reminded me of the WordPress stats API. So here’s a quick function for pulling WordPress reports into R. (Code

In two previous posts, I have written about how you can speed up your R computations either by using strange notation and non-standard functions or by compiling your code. Last year my department bought a 64-core computational server, which allowed me ...

Students in any basic statistics class are taught linear regression, which is one of the simplest forms of a statistical model. The basic idea is that a ‘response’ variable can be mathematically related to one or any number of ‘explanatory’ variables through a linear equation and a normally distributed error term. With any statistical tool,
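Written out, the model described above takes the following standard form. The notation is chosen here for illustration and is not taken from the original post: $y_i$ is the response for observation $i$, $x_{i1}, \dots, x_{ip}$ are the $p$ explanatory variables, the $\beta_j$ are unknown coefficients (typically estimated by least squares), and the errors $\varepsilon_i$ are assumed independent and normally distributed with constant variance.

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n
$$

With a single explanatory variable ($p = 1$) this reduces to the familiar straight line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.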

This is a brief guide to using R in collaborative, social ways. R is a powerful open-source programming language for data analysis, statistics, and visualization, but much of its power derives from a large, engaged community of users. This is an introduction to tools for engaging the community to improve your R code and collaborate with others. (Am I...

A Brief Introduction to Metaprogramming in Julia
In contrast to my previous post, which described one way in which Julia allows (and expects) the programmer to write code that directly employs the atomic operations offered by computers, this post is meant to introduce newcomers to some of Julia’s higher-level functions for metaprogramming. To make