## Amanda Cox on How The New York Times Graphics Department Uses R

March 14, 2011
Last month, Amanda Cox from The New York Times Graphics Department gave a great talk to the NYC R Statistical Programming Meetup. I’ve just got around to uploading the video, which has been split into two parts. You can also view the videos embedded after the jump. Amanda made use of

## Language used by Academics with the Protection of Anonymity

March 14, 2011
Those in the political science discipline probably remember their first encounter with poliscijobrumors.com. Those outside it have probably never heard of this particular message board, and would have no reason to. As the URL suggests, the board specializes in rumor, gossip, backbiting, mudslinging, and the occasional lucid thread on the political science

## R 2.13.0 scheduled for April 13

March 14, 2011
As announced yesterday by the R Core Team, the next major update to R will be released on April 13. R 2.13.0 is the next major release of R, which gets major updates approximately every six months. This also indicates that R 2.12.2 is the last patch level of the R 2.12 series, and so the next version of...

## R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data

March 14, 2011
As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the tapply() function to generate a table for a single summary function. In repeated measures ANOVA, we used separate da...
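To make the excerpt concrete, here is a minimal sketch in base R. It uses the base `reshape()` function rather than the reshape package that the tutorial covers. The scores, the column names (`cond_a`, `cond_b`) and the subject IDs are hypothetical. The sketch converts repeated-measures data from wide to long format, builds a table for a single summary function with `tapply()`, and fits a repeated-measures ANOVA with `aov()`.

```r
# Hypothetical repeated-measures data in wide format:
# one row per subject, one column per condition.
wide <- data.frame(
  subject = 1:4,
  cond_a  = c(5, 6, 7, 5),
  cond_b  = c(7, 8, 9, 8)
)

# Convert to long format (one row per subject x condition),
# which is the layout aov() expects.
long <- reshape(wide,
                direction = "long",
                varying   = c("cond_a", "cond_b"),
                v.names   = "score",
                timevar   = "condition",
                times     = c("a", "b"),
                idvar     = "subject")
long$condition <- factor(long$condition)
long$subject   <- factor(long$subject)

# tapply(): a table of a single summary function (the mean) per condition.
cond_means <- tapply(long$score, long$condition, mean)
print(cond_means)

# Repeated-measures ANOVA with subject as the error stratum.
fit <- aov(score ~ condition + Error(subject / condition), data = long)
summary(fit)
```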

## Hacker News Analysis

March 13, 2011
I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings. Activity on the Site. My first question was: how has activity on the site increased over time? I …

## Piiikaaachuuuuuu vs. KHAAAAAN!

March 13, 2011
This is a fun image I found on Neil Kodner’s blog. But I’ve never actually watched any of the Star Trek movies, so I decided to recreate the graph with Pikachu instead. Here’s a smoothed version to better compare the counts …

## A Kernel Density Approach to Outlier Detection

March 13, 2011

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction. Suppose you’re searching online for the cheapest place to …
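As a rough illustration of the general idea, and not the author's actual model, the sketch below computes a leave-one-out Gaussian kernel density for each price and flags prices whose density is tiny compared with the densest price. The price vector is hypothetical. The bandwidth rule (`bw.nrd0`, the default used by `density()`) and the 1% cutoff are assumptions chosen for illustration.

```r
# Hypothetical prices for one item collected from several online shops.
prices <- c(19.99, 21.50, 20.75, 22.00, 19.49, 20.99, 21.25, 4.99, 59.00)

# Bandwidth from Silverman's rule of thumb (the default in density()).
bw <- bw.nrd0(prices)

# Leave-one-out Gaussian kernel density at each price: the density of
# prices[i] estimated from all the *other* prices, so an isolated price
# does not prop up its own density.
loo_density <- vapply(seq_along(prices), function(i) {
  mean(dnorm(prices[i], mean = prices[-i], sd = bw))
}, numeric(1))

# Assumed cutoff: flag prices whose density is below 1% of the maximum.
threshold <- 0.01 * max(loo_density)
outliers  <- prices[loo_density < threshold]
print(outliers)
```

Leaving each point out matters on small datasets. With the point included, every price gets at least its own kernel's contribution, and this can hide an isolated price.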

## Eigensheep

March 13, 2011
Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep …
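The eigenfaces idea can be sketched with `prcomp()`. Treat each drawing as a vector of pixel values, compute the principal components (the "eigensheep"), and order the grid by each image's score on the first component. This is a minimal sketch on random stand-in data. The 8x8 image size, the 12 images and the 3 x 4 grid are assumptions, not details from the post. Ordering by a single component is only one possible way to lay out the grid.

```r
set.seed(1)
# Hypothetical stand-in for the sheep drawings: 12 flattened 8x8
# grayscale "images", one per row (64 pixel columns).
n_img  <- 12
pixels <- matrix(runif(n_img * 64), nrow = n_img)

# Principal components of the centered pixel matrix; the columns of
# pca$rotation play the role of eigenfaces (here, eigensheep).
pca <- prcomp(pixels, center = TRUE, scale. = FALSE)

# Project each image onto the first principal component and use
# that score to order the images.
score1     <- pca$x[, 1]
grid_order <- order(score1)

# Lay the ordered image indices out as a 3 x 4 grid, row by row.
grid <- matrix(grid_order, nrow = 3, byrow = TRUE)
print(grid)
```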

## Counting Clusters

March 13, 2011
Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic. The gap statistic algorithm …
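Below is a from-scratch sketch of the gap statistic (Tibshirani, Walther and Hastie) using `kmeans()`. It is not the post's code. The procedure compares log W_k, the log within-cluster sum of squares, with its average over B reference datasets drawn uniformly from the data's bounding box. It then picks the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}. The simulated three-blob data, `k_max = 6` and `B = 20` are assumptions. The cluster package also ships a ready-made `clusGap()`.

```r
set.seed(42)

# log W_k: log of the total within-cluster sum of squares for k clusters.
log_wk <- function(x, k) {
  if (k == 1) {
    wk <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  } else {
    wk <- kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss
  }
  log(wk)
}

gap_statistic <- function(x, k_max = 6, B = 20) {
  x <- as.matrix(x)
  mins <- apply(x, 2, min)
  maxs <- apply(x, 2, max)
  obs  <- vapply(1:k_max, function(k) log_wk(x, k), numeric(1))

  # Reference log W_k from data drawn uniformly over the bounding box.
  ref <- matrix(NA_real_, nrow = B, ncol = k_max)
  for (b in 1:B) {
    xb <- sapply(seq_len(ncol(x)), function(j) runif(nrow(x), mins[j], maxs[j]))
    ref[b, ] <- vapply(1:k_max, function(k) log_wk(xb, k), numeric(1))
  }

  gap <- colMeans(ref) - obs
  s_k <- apply(ref, 2, sd) * sqrt(1 + 1 / B)  # simulation error term

  # Smallest k such that Gap(k) >= Gap(k + 1) - s_{k + 1}.
  k_hat <- k_max
  for (k in 1:(k_max - 1)) {
    if (gap[k] >= gap[k + 1] - s_k[k + 1]) {
      k_hat <- k
      break
    }
  }
  list(gap = gap, s = s_k, k = k_hat)
}

# Hypothetical 2-D data: three well-separated Gaussian blobs of 50 points.
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
res <- gap_statistic(x)
print(res$k)
```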

## RStudio 0.92.44 Release: Try It! You’ll Be Surprised!

March 13, 2011
I recently downloaded RStudio’s v0.92.44 release, and, I must say, it’s light! I think I could even run it on a netbook, which is great for analysis on-the-go. I’ll likely uninstall Eclipse-StatET at this point and go with RStudio. Not only is it...

## Code: LaTeX tables for lme4 models

March 13, 2011

I have recently discovered memisc, an extremely useful R package by Martin Elff (see his memisc page). The package contains any number of useful functions, and is particularly good at helping one manage and recode survey data. However, by far my …

## Using R for Introductory Statistics, The Geometric distribution

March 13, 2011
We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a fixed number of independent trials (for example, draws with replacement). The hypergeometric distribution describes the number of successes in a fixed number of draws without replacement from a finite population, so those draws are not independent. Chapter 6 of Using R introduces the geometric distribution - the time to...
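A short sketch of the geometric distribution in base R. Note that R's `dgeom()`, `pgeom()` and `rgeom()` count the number of *failures* before the first success, with P(X = x) = p(1 - p)^x. The number of trials up to and including the first success is therefore X + 1. The die-rolling probability p = 1/6 is just an illustrative assumption.

```r
p <- 1/6  # assumed example: probability of rolling a six on a fair die

# R counts failures X before the first success: P(X = x) = p * (1 - p)^x.
x <- 0:4
probs <- dgeom(x, prob = p)
stopifnot(isTRUE(all.equal(probs, p * (1 - p)^x)))

# Probability that the first six arrives within the first 3 rolls,
# i.e. at most 2 failures: 1 - (5/6)^3 = 91/216, about 0.421.
pgeom(2, prob = p)

# The mean number of failures is (1 - p) / p = 5; check by simulation.
set.seed(1)
sims <- rgeom(1e5, prob = p)
mean(sims)
```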

## Legendary Plots

March 12, 2011
I was recently pointed in the direction of a thermal comfort model by the engineering company Arup (p27–28 of this pdf). Figure 3 at the top of p28 caught my attention. It’s mostly a nice graph; there’s not too much junk in it. One thing that struck me was that there is an awful lot

## A new series of mishaps

March 12, 2011
Following the slight difficulties of last week, I had a hard week on the computer front: on Monday, I got my 2007 MacBook back from the repair shop with a new video card, courtesy of Apple. Unfortunately, this started a series of problems. First, the old MacBook stopped recognizing the NVIDIA video and, while it

## A quick look at #march11 / #saudi tweets

March 12, 2011
Well, so much for that #march11 #Saudi day of rage. Whether or not it was really the "tempest in a teacup" that Prince Al-Waleed suggested on CNBC, the oil complex and Saudi markets seem to have shrugged …

## sab-R-metrics: Multiple Regression and Interactions

March 12, 2011
Last time, I covered ordinary least squares with a single variable. This time, I'll extend this to using multiple predictor variables in a regression and interaction terms in R, and start thinking about using polynomials of certain terms in the regression (like Age and Age Squared). This should be a pretty straightforward tutorial, especially if you've got...
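A minimal sketch of the model formulas this tutorial is about, fitted on simulated data. It is not the tutorial's dataset. The variables `age` and `lefty` and the coefficients used to simulate `y` are hypothetical. `I(age^2)` adds the squared term. `age * lefty` expands to `age + lefty + age:lefty`.

```r
set.seed(2011)
n <- 200
# Hypothetical player data: age and a binary indicator (e.g. left-handed).
age   <- runif(n, 20, 38)
lefty <- rbinom(n, 1, 0.3)

# Simulated outcome with a peaked (quadratic) age effect and an interaction.
y <- 10 + 2 * age - 0.04 * age^2 + 1.5 * lefty + 0.1 * age * lefty +
  rnorm(n, sd = 1)
d <- data.frame(y, age, lefty)

# Multiple regression: I() protects ^ inside the formula, and the
# * operator adds both main effects plus their interaction.
fit <- lm(y ~ age + I(age^2) + age * lefty, data = d)
summary(fit)
coef(fit)
```

Wrapping the square in `I()` matters. Inside a formula, `age^2` on its own is interpreted as a crossing operator and does not produce a squared term.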

## How to Vectorize a Nested Loop in R?

Could any R expert here help me vectorize my for loop? Thanks in advance for your help. The problem is that the variables inside my for loop are updated on each iteration, which makes it difficult to use lapply, sapply, or similar functions. The simplified code is listed below: for (i in 1:N) { #N could be...
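The original code is truncated, so here is a sketch under an assumption. Suppose each iteration computes x[i] = a * x[i-1] + y[i]. A loop of that kind can often be replaced by `stats::filter()` with `method = "recursive"`. In the special case a = 1, it is just `cumsum()`. `Reduce(..., accumulate = TRUE)` handles more general update rules without an explicit loop, although it still runs sequentially. The recurrence, `a`, `N` and `y` are all hypothetical.

```r
set.seed(7)
N <- 10
y <- rnorm(N)
a <- 0.5

# Hypothetical loop in which each iteration depends on the previous result.
x_loop <- numeric(N)
prev <- 0
for (i in 1:N) {
  x_loop[i] <- a * prev + y[i]
  prev <- x_loop[i]
}

# Vectorised equivalent: a first-order recursive filter,
# out[i] = y[i] + a * out[i - 1], starting from 0.
x_filter <- as.numeric(stats::filter(y, filter = a, method = "recursive"))

# General fallback: Reduce() carries the state forward; drop the init value.
x_reduce <- Reduce(function(prev, yi) a * prev + yi, y,
                   init = 0, accumulate = TRUE)[-1]

# With a = 1 the recurrence is simply a running sum.
stopifnot(isTRUE(all.equal(cumsum(y),
                           as.numeric(stats::filter(y, 1, method = "recursive")))))
```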

## Things I wish I’d known before I started using R

March 12, 2011
I’ve been using R for a couple of years now. This post is aimed at me a couple of years ago, or at you if you’re just starting to use R and are pressed for time. Here are some things I wish I’d known in early 2009. Use a naming convention: read.csv is a great function, but

March 11, 2011
Conrad Sanderson continues an active release schedule for his wonderful Armadillo templated C++ library for linear algebra; release 1.1.8 just came out yesterday. So I made a new release 0.2.16 of RcppArmadillo, our Rcpp-based integration into R. No ...

## Survey: R used by more data miners than any other tool

March 11, 2011
According to respondents of the 2010 Rexer Analytics Data Miner Survey, open source R is the most commonly-used analysis tool amongst data miners: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also...

## Plotting Indifference Curves with R Contour Function

March 11, 2011
The post Constructing Indifference Curves - Part 3 from economics.about.com discusses indifference curves (though I actually think they are isoquants) and how to construct them. I think I have a grasp on how to do this in R if yo...
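A minimal sketch of the idea with base R's `contour()`. It evaluates a utility function on a grid with `outer()` and draws a few level sets, each of which is an indifference curve. The Cobb-Douglas form U = x^alpha * y^(1 - alpha), alpha = 0.5, the grid range and the chosen levels are assumptions, not taken from the about.com post. Read U as a production function and the same code draws isoquants.

```r
# Assumed Cobb-Douglas utility; reads as a production function for isoquants.
alpha <- 0.5
u <- function(x, y) x^alpha * y^(1 - alpha)

x <- seq(0.1, 10, length.out = 100)
y <- seq(0.1, 10, length.out = 100)

# Evaluate U on the grid: z[i, j] = u(x[i], y[j]), the layout contour() expects.
z <- outer(x, y, u)

# Each contour line joins bundles with equal utility (an indifference curve).
contour(x, y, z, levels = c(2, 4, 6),
        xlab = "Good X", ylab = "Good Y",
        main = "Indifference curves, U = x^0.5 * y^0.5")
```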

## Script for Geostatistics with R

March 11, 2011
I received requests for the script used during the tutorial. All the material is available on the main page of the course. However, to make the scripts easy to reach for all the viewers of this blog, I've put the link to download them...