Happy Pi Day, Now Go Estimate It!

March 14, 2011

As you may know, today is Pi Day, when all good nerds take a moment to thank the geeks of antiquity for their painstaking work in estimating this marvelous mathematical constant. It is also a great opportunity to thank contemporary geeks for the wonders of modern computing, which allow us to estimate pi to near

Read more »

R/Finance 2011 Registration Open

March 14, 2011

Registration for R/Finance 2011--which will take place April 29 and 30 in Chicago--is NOW OPEN! Building on the success of the two previous conferences in 2009 and 2010, we are expecting more than 250 attendees from around the world representing bot...

Read more »

Amanda Cox on How The New York Times Graphics Department Uses R

March 14, 2011

Last month, Amanda Cox from The New York Times Graphics Department gave a great talk to the NYC R Statistical Programming Meetup. I’ve just got around to uploading the video, which has been split into parts one and two. You can also view the videos embedded after the jump. Amanda made use of

Read more »

Language used by Academics with the Protection of Anonymity

March 14, 2011

Those in the political science discipline probably remember their first encounter with poliscijobrumors.com. Those outside it have probably never heard of this particular message board, and would have no reason to. As the URL suggests, the board specializes in rumor, gossip, backbiting, mudslinging, and the occasional lucid thread on the political science

Read more »

R 2.13.0 scheduled for April 13

March 14, 2011

As announced yesterday by the R Core Team, R 2.13.0, the next major update to R, will be released on April 13. R gets major updates approximately every six months. This also indicates that R 2.12.2 is the last patch level of the R 2.12 series, and so the next version of...

Read more »

R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data

March 14, 2011

As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the tapply() function to generate a table for a single summary function. In repeated measures ANOVA, we used separate da...

Read more »

Hacker News Analysis

March 13, 2011

I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings. Activity on the Site My first question was: how has activity on the site increased over time? I …

Read more »

Piiikaaachuuuuuu vs. KHAAAAAN!

March 13, 2011

This is a fun image I found on Neil Kodner’s blog: But I’ve never actually watched any of the Star Trek movies, so I decided to recreate the graph with Pikachu instead: Here’s a smoothed version to better compare the counts …

Read more »

A Kernel Density Approach to Outlier Detection

March 13, 2011

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction Suppose you’re searching online for the cheapest place to …

Read more »

Eigensheep

March 13, 2011

Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep …

Read more »

Counting Clusters

March 13, 2011

Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic The gap statistic algorithm …

Read more »

RStudio 0.92.44 Release: Try It! You’ll Be Surprised!

March 13, 2011

I recently downloaded RStudio’s v0.92.44 release, and, I must say, it’s light! I think I could even run it on a netbook, which is great for analysis on-the-go. I’ll likely uninstall Eclipse-StatET at this point and go with RStudio. Not only is it...

Read more »

Code: LaTeX tables for lme4 models

March 13, 2011

I have recently discovered memisc, an extremely useful R package by Martin Elff (see his memisc page here). The package contains any number of useful functions, and is particularly good at helping one manage and recode survey data. However, by far my …

Read more »

Using R for Introductory Statistics, The Geometric distribution

March 13, 2011

We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes the number of successes in a series of draws without replacement, where the draws are no longer independent. Chapter 6 of Using R introduces the geometric distribution - the time to...

Read more »

Legendary Plots

March 12, 2011

I was recently pointed in the direction of a thermal comfort model by the engineering company Arup (p27–28 of this pdf). Figure 3 at the top of p28 caught my attention. It’s mostly a nice graph; there’s not too much junk in it. One thing that struck me was that there is an awful lot

Read more »

A new series of mishaps

March 12, 2011

Following the slight difficulties of last week, I had a hard week on the computer front: indeed, on Monday, I received my 2007 macbook from the repair shop, with a new video card, courtesy of Apple. Unfortunately, this started a series of problems. First, the old macbook stopped recognizing the NVIDIA video and, while it

Read more »

A quick look at #march11 / #saudi tweets

March 12, 2011

Well, so much for that #march11 #Saudi day of rage. Whether it was really the "tempest in a teacup" that Prince Al-Waleed suggested on CNBC (video below, transcript here) or not, the oil complex and Saudi markets seem to have shrugged …

Read more »

Ask R not to create a local directory tree

March 12, 2011

I don't like R to create a local directory tree in my home directory because new packages will automatically be installed into that directory. The way to do this is to modify the "/usr/local/lib64/R/etc/Renviron" and mark the line "R_LIBS_USER=${R_LIBS...

Read more »

sab-R-metrics: Multiple Regression and Interactions

March 12, 2011

Last time, I covered ordinary least squares with a single variable. This time, I'll extend this to using multiple predictor variables in a regression, interaction terms in R, and start thinking about using polynomials of certain terms in the regression (like Age and Age Squared). This should be a pretty straightforward tutorial, especially if you've got...

Read more »

How to Vectorize Nested Loop in R?

Could any R expert here help me vectorize my for loop? Thanks in advance. The trouble is that the variables inside my "for" loop are updated after each iteration, which makes it difficult to use lapply, sapply, or similar. Simplified code is listed below: for (i in 1:N) { #N could be...

Read more »

Things I wish I’d known before I started using R

March 12, 2011

I’ve been using R for a couple of years now. This post is aimed at me a couple of years ago, or you if you’re just starting to use R and are pressed for time. Here are some things I wish I’d known in early 2009. Use a naming convention read.csv is a great function, but

Read more »

RcppArmadillo 0.2.16

March 11, 2011

Conrad Sanderson continues an active release schedule for his wonderful Armadillo templated C++ library for linear algebra; release 1.1.8 just came out yesterday. So I made a new release 0.2.16 of RcppArmadillo, our Rcpp-based integration into R. No ...

Read more »

Survey: R used by more data miners than any other tool

March 11, 2011

According to respondents of the 2010 Rexer Analytics Data Miner Survey, open source R is the most commonly-used analysis tool amongst data miners: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also...

Read more »