Blog Archives

Computing kook density in R

September 24, 2012

Do you ever see strange lights in the sky? Do you wonder what really goes on in Area 51? Would you like to use your R hacking skills to get to the bottom of the whole UFO conspiracy? Of course you would! UFO data from infochimps is the focus of a dat...


OO in R

September 13, 2012

"Is there a package for obfuscating code in #rstats?", someone asked. "The S4 object system?!" came the snarky reply. If you're smiling right now, you know that it wouldn't be funny if it weren't at least a little bit true. Options: S3, S4 or R5? There can be little doubt that object oriented...


Linear regression by gradient descent

July 26, 2012

In Andrew Ng's Machine Learning class, the first section demonstrates gradient descent by applying it to a familiar problem: fitting a linear function to data. Let's start off by generating some bogus data with known characteristics. Let's make y just a noisy version of x. Let's also add 3 to give the intercept term something to...
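The setup described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the code from the full post: the seed, sample size, learning rate `alpha`, and iteration count are all assumptions chosen so the example converges.

```r
# Bogus data with known characteristics: y is a noisy version of x, plus 3.
set.seed(42)                 # assumed seed, only for reproducibility
n <- 1000                    # assumed sample size
x <- runif(n, -5, 5)
y <- x + rnorm(n) + 3

# Design matrix with a column of ones for the intercept term.
X <- cbind(1, x)

# Batch gradient descent on the mean squared-error cost.
alpha <- 0.01                # assumed learning rate
iterations <- 2000           # assumed number of steps
theta <- c(0, 0)             # start with intercept = 0, slope = 0
for (i in seq_len(iterations)) {
  gradient <- t(X) %*% (X %*% theta - y) / n
  theta <- theta - alpha * as.vector(gradient)
}

# Closed-form least-squares fit for comparison.
fit <- coef(lm(y ~ x))
```

After the loop, `theta` should sit close to the `lm()` coefficients, with an intercept near 3 and a slope near 1.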


Long-vector kludge in R

July 25, 2012

Just recently, I found out that R is limited to 32-bit integers, even on 64-bit hardware. Bummer, huh? As a consequence, the maximum length of a vector is 2^31-1 elements. To be fair, dealing with numeric types across machine architectures is hard. A fixed repr...
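The integer limit is easy to inspect from the console. The snippet below is only a sketch of the limit itself; falling back to doubles for large counts is an assumption for illustration, not necessarily the kludge the full post describes.

```r
# The largest value R's integer type can hold.
max_int <- .Machine$integer.max
max_int == 2^31 - 1                      # TRUE

# Integer arithmetic past the limit yields NA (with a warning).
overflowed <- suppressWarnings(max_int + 1L)
is.na(overflowed)                        # TRUE

# Doubles hold whole numbers exactly up to 2^53, so mixing in a double
# promotes the result and avoids the overflow.
big <- max_int + 1                       # 1 is a double, so the sum is a double
is.double(big)                           # TRUE
```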


Sage Bionetworks Synapse

April 27, 2012

Michael Kellen, Director of Technology at Sage Bionetworks, is trying to build a GitHub for science. It's called Synapse and Kellen described it in a talk at the Sage Bionetworks Commons Congress 2012, this past weekend: 'Synapse' Pilot for Building an...


International Open Data Hackathon

December 5, 2011

This past Saturday, I hung out at the Seattle branch of the International Open Data Hackathon. The event was hosted at the Pioneer Square office of Socrata, a small company that helps governments provide public open data. A pair of data analysts from Tableau were showing off a visualization for the Washington...


Hipster programming languages

September 26, 2011

If you look at the programming languages that are popular these days, a few patterns emerge. I'm not talking about languages that have the most hits on the job sites. I'm talking about what the cool kids are coding in - the folks that hang out on hacke...


String functions in R

August 25, 2011

Here's a quick cheat-sheet on string manipulation functions in R, mostly cribbed from Quick-R's list of String Functions with a few additional links.

substr(x, start=n1, stop=n2)
grep(pattern, x, value=FALSE, ignore.case=FALSE, fixed=FALSE)
gsub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
gregexpr(pattern, text, ignore.case=FALSE, perl=FALSE, fixed=FALSE)
strsplit(x, split)
paste(..., sep="", collapse=NULL)
sprintf(fmt, ...)
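A few quick calls show what each of these functions returns. The input strings and variable names are made up for illustration.

```r
fruits <- c("apple", "banana", "cherry")

# Extract a substring by start and stop position.
first_word <- substr("Hello, World", start = 1, stop = 5)   # "Hello"

# grep returns matching indices, or the matching elements with value = TRUE.
hit_index <- grep("an", fruits)                              # 2
hit_value <- grep("an", fruits, value = TRUE)                # "banana"

# gsub replaces every match of the pattern.
shouted <- gsub("a", "A", "banana")                          # "bAnAnA"

# gregexpr returns the start positions of all matches, one list entry per string.
positions <- as.vector(gregexpr("an", "banana")[[1]])        # 2 4

# strsplit returns a list, one character vector per input string.
pieces <- strsplit("a,b,c", split = ",")[[1]]                # "a" "b" "c"

# paste joins arguments with sep and collapses a vector with collapse.
joined <- paste("a", "b", sep = "-")                         # "a-b"
collapsed <- paste(pieces, collapse = "+")                   # "a+b+c"

# sprintf formats values C-style.
label <- sprintf("%s has %d items", "cart", 3L)              # "cart has 3 items"
```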


MySQL and R

August 15, 2011

Using MySQL with R is pretty easy, with RMySQL. Here are a few notes to keep me straight on a few things I always get snagged on. Typically, most folks are going to want to analyze data that's already in a MySQL database. Being a little bass-ackwards, I often want to go the other way. One reason to do...


Notes on Engineering Data Analysis (with R and ggplot2)

July 8, 2011

Hadley Wickham gave a Google Tech Talk a couple of weeks back titled Engineering Data Analysis (with R and ggplot2). These are my notes. The data analysis cycle is to iteratively transform, visualize and model. Leading into the cycle is data access an...
