What’s the difference between using which() and match() in R? For me - about 10 hours! Today I was doing some string matching in R. In my experience performing any sort of regex or string manipulation in R is a mistake....
One of the favorite bar room discussions of statisticians, machine learners, and computer scientists is – what is data science? (And I don’t care whether it happens in a bar or not, it’s a “bar room” discussion by virtue of...
In honor of Twitter making the decision to come to Detroit, here’s a special post on how I became a Twitter user. … At 3:30pm my wife called me. There was a shooting where my brother-in-law works at UPMC Western...
“My job is to teach you how to think.” –Hugh Young A few days ago John Naughton published an article summarizing his manifesto on how to reform computer science education. I agree computer science education is in need of drastic...
Welcome to Missing Data Club. There are only three rules. Rule #1 is: There is no missing data. Rule #2 is: THERE IS NO MISSING DATA! Rule #3: If you’ve never built a model using missing data – you must do it...
I really like git. It’s the first versioning tool I’ve ever used so I have nothing else to compare it to, but in the world of statistical model building where iteration is constant (and almost never a strict linear progression)...
After using GitHub for data mining competitions and a project on statistical language models I found I enjoyed it so much I wanted to use it at work too. The trick is there’s a lot of overlap between what I...
My guess is R is most commonly integrated with C/C++ to handle heavy-duty computing (thanks in no small part to the productivity of Dirk Eddelbuettel!). That said, if you’re like most statisticians and physical scientists and aren’t already a programming...
Jeff Gentry has created a couple of really fun and handy R packages for working with Twitter data, called twitteR and ROAuth. He’s also written an easy-to-read vignette on how to get started. As of right now (March...