Monthly Archives: May 2012

Ack! Duplicates in the Data!

May 3, 2012

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …

R Tutorials and Learning Materials

May 3, 2012

We are getting ready to host an R bootcamp at work this summer, and I am looking at building on materials that already exist. I just wanted to list a few here while I figure out the best ways to incorporate them. Video Tutorials: This is a fairly ne...

Big Data Analytics with R and Hadoop

May 3, 2012

The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R and to run R within the nodes of the Hadoop cluster, essentially transforming Hadoop into a massively parallel statistical computing cluster based on R. In yesterday's webinar (the replay of which is embedded below), data scientist and RHadoop project lead Antonio Piccolboni...

What’s wrong with package comment?!

May 3, 2012

I spent most of Sunday afternoon trying to understand why defining did not have the same effect as writing the line, until I found that there was a clash with the comment package… The assuredly simple code produces an error message: This is quite an inconvenience, as I need to compile my solution manual

RegEx: Named Capture in R

May 3, 2012

I consider myself a decent RegEx user. References to famous quotes about RegEx aside, I find it intuitive, I like its speed, and it makes my code simple (more so than the alternatives, anyhow). Thus, I use RegEx wherever I can in the growing grab bag of languages I consider myself proficient in: *nix command line/shell scripts, JavaScript, PHP, Matlab, Python, and R. Now...
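The excerpt stops before any code. As a rough illustration of what "named capture" means (labeling a group in the pattern so a match can be read back by name rather than by position), here is a minimal sketch using Python's standard `re` module rather than R; the date pattern and the sample string are made-up examples, not taken from the post.

```python
import re

# (?P<name>...) is Python's syntax for a named capture group
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Made-up sample text for illustration
match = pattern.search("Posted on 2012-05-03")
if match:
    print(match.group("year"))  # read a single group by its name
    print(match.groupdict())    # all named groups as a dict
```

Perl-style engines such as PCRE, which R uses when `perl = TRUE`, also accept the `(?<name>...)` spelling for the same construct; Python's `re` needs the `(?P<name>...)` form shown here.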

Theme Elements in ggplot2

May 3, 2012

This website provides a simple summary of the theme elements that can be set within ggplot2. There should be sufficient information here to change the default settings for graphs within the ggplot2 package.

cumplyr: Extending the plyr Package to Handle Cross-Dependencies

May 3, 2012

For me, Hadley Wickham’s reshape and plyr packages are invaluable because they encapsulate omnipresent design patterns in statistical computing: reshape handles switching between the different possible representations of the same underlying data, while plyr automates what Hadley calls the Split-Apply-Combine strategy, in which you split up your data into several subsets, perform some computation...
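The Split-Apply-Combine strategy described above can be sketched in a few lines. This is not plyr or cumplyr code; it is a minimal Python illustration, assuming records stored as dictionaries, with a hypothetical helper `split_apply_combine` and made-up toy data.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, value, func):
    """Hypothetical helper: group rows by `key`, apply `func` to each
    group's `value` entries, and combine the results into one dict."""
    # Split: bucket each row's value under its group key
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Apply and combine: one summary value per group
    return {k: func(vals) for k, vals in groups.items()}


# Made-up toy data for illustration only
rows = [
    {"team": "a", "score": 3},
    {"team": "b", "score": 5},
    {"team": "a", "score": 7},
]
print(split_apply_combine(rows, "team", "score", mean))
```

Passing `func` in as an argument keeps the split and combine steps generic, so the same helper works with `max`, `len`, or any other per-group summary.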

Google Translate for code, and an R help-list bot

May 3, 2012

What we did in our Stan meeting yesterday: some discussion of revising the NUTS paper, some conversations about parameterizations of categorical-data models, plans for the R interface, blah blah blah. But also, I had two exciting new ideas! Google Translate for code: wouldn’t it be great if Google Translate could work on computer languages?

How to plot three categorical variables and one continuous variable using ggplot2

May 3, 2012

This post shows how to produce a plot involving three categorical variables and one continuous variable using ggplot2 in R. The following code is also available as a gist on GitHub. 1. Create Data. First, let's load ggplot2 and create some data to work...

An ivreg2 function for R

May 3, 2012

The ivreg2 command is one of the most popular routines in Stata. The reason for this popularity is its simplicity. A one-line ivreg2 command generates not only the instrumental variable regression coefficients and their standard errors, but also a number of other statistics of interest. I have come across a number of functions in R
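For readers unfamiliar with the estimator behind ivreg2: in the simplest just-identified case (one endogenous regressor `x`, one instrument `z`), the instrumental-variable slope is the ratio cov(z, y) / cov(z, x). The minimal Python sketch below, using a hypothetical `iv_slope` helper and made-up data, computes only that point estimate and the matching intercept. It is not a port of ivreg2 and produces none of the standard errors or diagnostics the command reports.

```python
from statistics import mean


def iv_slope(y, x, z):
    """Hypothetical helper: just-identified IV estimate of y = a + b*x,
    using z as the instrument for x. Returns (intercept, slope)."""
    zbar, xbar, ybar = mean(z), mean(x), mean(y)
    # Sums of cross-deviations; the 1/n factors of the covariances cancel in the ratio
    s_zy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    s_zx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    if s_zx == 0:
        raise ValueError("instrument z is uncorrelated with x in this sample")
    slope = s_zy / s_zx
    intercept = ybar - slope * xbar
    return intercept, slope


# Made-up toy data constructed so that y = 1 + 2x exactly
print(iv_slope(y=[5, 7, 11, 13], x=[2, 3, 5, 6], z=[1, 2, 3, 4]))  # prints (1.0, 2.0)
```

With more instruments than endogenous regressors, this simple ratio no longer applies and a full two-stage least squares fit is needed.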