Blog Archives

Quick Hit: which() and match() are not the same

April 11, 2012

What’s the difference between using which() and match() in R? For me, about 10 hours! Today I was doing some string matching in R. In my experience, performing any sort of regex or string manipulation in R is a mistake....

Data Science Undefined

April 4, 2012

One of the favorite bar room discussions of statisticians, machine learners, and computer scientists is – what is data science? (And I don’t care whether it happens in a bar or not, it’s a “bar room” discussion by virtue of...

How I Learned to Stop Worrying and Love Twitter

April 4, 2012

In honor of Twitter making the decision to come to Detroit, here’s a special post on how I became a Twitter user. … At 3:30pm my wife called me. There was a shooting where my brother-in-law works at UPMC Western...

Radical Education Reform? Think Bigger.

April 2, 2012

“My job is to teach you how to think.” –Hugh Young A few days ago John Naughton published an article summarizing his manifesto on how to reform computer science education. I agree computer science education is in need of drastic...

Missing Data Club

April 1, 2012

Welcome to Missing Data Club. There are only three rules. Rule #1 is: There is no missing data. Rule #2 is: THERE IS NO MISSING DATA! Rule #3: If you’ve never built a model using missing data – you must do it...

A Crash Course in git for Data Scientists

March 10, 2012

I really like git. It’s the first versioning tool I’ve ever used so I have nothing else to compare it to, but in the world of statistical model building where iteration is constant (and almost never a strict linear progression)...

github with Multiple Accounts: An Analyst Perspective

March 10, 2012

After using github for data mining competitions and a project on statistical language models, I found I enjoyed it so much I wanted to use it at work too. The trick is there’s a lot of overlap between what I...

R Meets Java: An Absolute Beginners’ Introduction

March 10, 2012

My guess is R is most commonly integrated with C/C++ to handle heavy-duty computing (thanks in no small part to the productivity of Dirk Eddelbuettel!). That said, if you’re like most statisticians and physical scientists and aren’t already a programming...

Get ROAuth to work on Windows 7

March 10, 2012

Jeff Gentry has created a couple of really fun and handy R packages for working with Twitter data called twitteR and ROAuth. He’s also written an easy-to-read vignette on how to get started. As of right now (March...

Thoughts on SPSS and R Integration

March 10, 2012

As part of considering SPSS as a platform for modeling, I wanted to test SPSS’ integration with R. What I found out is that getting SPSS to work with R isn’t embarrassingly obvious. What’s worse, I found it quite difficult to...
