## Varian on big data

June 15, 2014
Last week my research group discussed Hal Varian’s interesting new paper on “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28. It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them,...

## Example 2014.6: Comparing medians and the Wilcoxon rank-sum test

June 12, 2014
A colleague recently contacted us with the following question: "My outcome is skewed-- how can I compare medians across multiple categories?" What they were asking for was a generalization of the Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon test, among other monikers) to more than two groups. For the record, the answer...
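
The excerpt is cut off before it gives its answer. One standard rank-based generalization of the rank-sum test to more than two groups is the Kruskal-Wallis test, which base R provides as `kruskal.test`. The sketch below uses simulated data; the variable names and the exponential outcomes are assumptions for illustration, not taken from the post. Note that, strictly speaking, both tests compare distributions by rank and only amount to a comparison of medians under additional assumptions such as a pure location shift.

```r
set.seed(1)

# Simulated skewed outcome in three groups (hypothetical data).
d <- data.frame(
  outcome = c(rexp(20, rate = 1), rexp(20, rate = 0.5), rexp(20, rate = 0.25)),
  group   = factor(rep(c("A", "B", "C"), each = 20))
)

# Two groups: the Wilcoxon rank-sum (Mann-Whitney) test.
d2 <- droplevels(subset(d, group != "C"))
rs <- wilcox.test(outcome ~ group, data = d2)
print(rs)

# Three or more groups: the Kruskal-Wallis test, the rank-based analogue
# of one-way ANOVA.
kw <- kruskal.test(outcome ~ group, data = d)
print(kw)
```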

## Basketball Data Part III – BMI: Does it Matter?

June 11, 2014
For those of you who are just joining us, please refer back to the previous two posts referencing scraping XML data and length of NBA career by position. The next idea I wanted to explore was whether BMI had any …

## The Most Comprehensive Review of Comic Books Teaching Statistics

June 11, 2014
As I’m more or less an autodidact when it comes to statistics, I have a weak spot for books that try to introduce statistics in an accessible and pedagogical way. I have therefore collected what I believe are all books that introduce statistics using comics (at least those written in English). What follows are highly subjective reviews of those...

## R minitip: don’t use data.matrix when you mean model.matrix

June 10, 2014
A quick R mini-tip: don’t use data.matrix when you mean model.matrix. If you do so you may lose (without noticing) a lot of your model’s explanatory power (due to poor encoding). For some modeling tasks you end up having to prepare a special expanded data matrix before calling a given machine learning algorithm. For example...
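
To make the encoding point concrete, here is a minimal base R sketch on a made-up data frame (the column names `color` and `y` and all values are assumptions for illustration, not from the post). `data.matrix` replaces a factor with its internal integer codes, while `model.matrix` expands it into indicator columns.

```r
# Toy data: a three-level factor and a numeric outcome (hypothetical values).
d <- data.frame(
  color = factor(c("red", "green", "blue", "green")),
  y     = c(1.0, 2.5, 0.3, 2.7)
)

# data.matrix(): the factor becomes one column of integer codes, which
# imposes an arbitrary ordering (levels sort as blue = 1, green = 2, red = 3).
dm <- data.matrix(d["color"])

# model.matrix(): the factor becomes indicator (dummy) columns, one per
# non-reference level, plus an intercept column.
mm <- model.matrix(~ color, data = d)

print(dm)
print(mm)
```

A linear model fit on `dm` would be forced to treat "red" as three times "blue", which is the kind of poor encoding the tip warns about; the indicator columns in `mm` let each level have its own effect.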

## R style tip: prefer functions that return data frames

June 6, 2014
While following up on Nina Zumel’s excellent Trimming the Fat from glm() Models in R I got to thinking about code style in R. And I realized: you can make your code much prettier by designing more of your functions to return data.frames. That may seem needlessly heavy-weight, but it has a lot of down-stream...
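
As a rough illustration of the style (not the post's own code), the hypothetical helper `summarize_column` below returns a one-row data.frame. Because every call returns the same shape, results from several calls can be stacked with `rbind` and then filtered, sorted or merged like any other data.

```r
# Hypothetical helper: summarize one numeric vector as a one-row data.frame.
summarize_column <- function(name, x) {
  data.frame(
    column = name,
    n      = length(x),
    mean   = mean(x),
    sd     = sd(x),
    stringsAsFactors = FALSE
  )
}

# Made-up input: two numeric vectors of different lengths.
cols <- list(a = c(1, 2, 3), b = c(10, 20, 30, 40))

# Apply the helper to each vector and stack the one-row results.
summary_df <- do.call(rbind, Map(summarize_column, names(cols), cols))
rownames(summary_df) <- NULL

print(summary_df)
```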

## Female hurricanes reloaded – another reanalysis of Jung et al.

June 6, 2014
I blogged a few days ago about a study by Kiju Jung that suggested that implicit bias leads people to underestimate the danger of female-named hurricanes. The study used historical data to demonstrate a correlation between femininity and death toll, and subsequent experiments seemed to show that people indeed estimate hurricanes to be less…

## Introducing R for Big Data with PivotalR

June 4, 2014
Wouldn't it be great if there were a way to harness the familiarity and usability of a tool like R, and at the same time take advantage of the performance and scalability benefits of in-database/in-Hadoop computation? We're happy to announce that PivotalR, a package that translates R code into SQL for processing, is available to download from GitHub today.

## Geomorph 2.1 Now Available!

June 2, 2014
Geomorph users, We have uploaded version 2.1 to CRAN. The Windows and Mac binaries have been compiled and the tarball is available. Version 2.1 comes with some small changes and new features: Mike Collyer has now officially joined the geomorph ...

## A new gitbook – learnR

May 30, 2014
Gitbook is a relatively new concept on the web. It provides a user-friendly framework for authors to write and produce online books with beautiful illustrations and responsive interactions. It allows authors to write in Markdown syntax, which is very easy to learn and use, so that they can focus more on the contents they try to...