## Visualizing Growth of a Retail Chain

March 15, 2011
I am a regular reader of the FlowingData blog by Nathan Yau. It is an excellent reference for anyone interested in statistical visualization of data. One of his posts that caught my attention was a visualization of the growth of Walmart in the US. Given my research interests in retail, it was a fascinating insight

## More pi plus 1 (or plus 0.01) day fun

March 15, 2011
Since I just didn’t get enough this morning, I spent some more time fooling around with estimating pi. Since I was basically counting the number of random x,y pairs inside a quarter circle and computing a sample average for more … Continue reading →
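The approach described in this teaser, counting random x,y pairs that fall inside a quarter circle and averaging, can be sketched in a few lines. This is a minimal illustration, not the post's own code: the function name `estimate_pi`, the fixed seed, and the sample size are all assumptions made for the example.

```python
import random


def estimate_pi(n_pairs, seed=0):
    """Monte Carlo estimate of pi from uniform points in the unit square."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    inside = 0
    for _ in range(n_pairs):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle of radius 1
        # when its distance from the origin is at most 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The fraction inside approximates the quarter-circle area, pi / 4.
    return 4.0 * inside / n_pairs


estimate = estimate_pi(100_000)
print(estimate)
```

The fraction of points inside is a sample average of a 0/1 indicator, so its error shrinks roughly with the square root of the number of pairs.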

## RStudio: My thoughts

March 15, 2011
Let me get this out of the way: I just love RStudio. Created by a team led by JJ Allaire, a name that should ring a bell if you were involved in web development during the Clinton administration, RStudio is an R IDE that is actually designed for R from...

## Webinar on integrating R with applications, March 16

March 15, 2011
A quick reminder that Revolution Analytics' CTO David Champagne will be hosting a live webinar tomorrow (March 16) on Integrating R into 3rd Party and Web Applications Using RevoDeployR. Designed for application developers, this webinar will cover publishing R scripts to the RevoDeployR server, and integrating their results into Web applications, Microsoft Excel, JasperReports Server and more. Complete details...

## New R User Group in Orange County, CA

March 15, 2011
The Orange County R User Group was formed to bring local R users together in a friendly, business-oriented environment. This is the fifth R user group in California. Founder Ray DiGiacomo, Jr. says, "I feel this group is necessary because the current Los Angeles and San Diego R User Groups are quite far from Orange County. Also, Orange County...

## Example 8.30: Compare Poisson and negative binomial count models

March 15, 2011
How similar can a negative binomial distribution get to a Poisson distribution? When confronted with modeling count data, our first instinct is to use Poisson regression. But in practice, count data is often overdispersed. We can fit the overdispersio...
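One way to get a feel for the question in this teaser is to compare the two probability mass functions directly. The sketch below is not the method used in the original post. It assumes the mean/size parameterization of the negative binomial (variance `mu + mu**2 / size`), and the helper names and the choice of total variation distance are illustrative assumptions.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of count k with mean mu (computed on the log scale)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def negbin_pmf(k, mu, size):
    """Negative binomial probability of count k with mean mu and dispersion size."""
    p = size / (size + mu)  # success probability implied by mean and size
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def total_variation(mu, size, k_max=200):
    """Half the summed absolute pmf difference over counts 0..k_max."""
    return 0.5 * sum(
        abs(poisson_pmf(k, mu) - negbin_pmf(k, mu, size))
        for k in range(k_max + 1)
    )


# Larger size means less overdispersion, so the distance should shrink.
for size in (1, 10, 100, 1000):
    print(size, total_variation(5.0, size))
```

As `size` grows, the extra variance term `mu**2 / size` vanishes and the negative binomial approaches the Poisson with the same mean.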

## Want to say one thing and the exact opposite with strong confidence?

March 15, 2011
No need to go into politics. Just take a statistics course. And I am not talking about misinterpretation of statistics, but about the mathematical foundations of statistical tests. Consider the following parametric test, with a one-dimensional para...

## Chemometrics with R

March 15, 2011
I just heard that my supervisor's book Chemometrics with R was released, and I immediately asked our library to get a copy. Ron introduced me to R at a time when most in our department were still using Matlab. In fact, I had been maintaining Matlab s...

## I’m late for π day

March 15, 2011
It is officially no longer pi day, but I didn’t see this Drew Conway post about estimating pi until just a few minutes ago. Because Google Reader doesn’t show github embeds, I also got to try it without seeing Drew’s … Continue reading →

## How to backtest a strategy in Excel

March 14, 2011
(This is a guest post by Damian from Skill Analytics and ETF Prophet) Let me start by saying that I’m not an expert in backtesting in Excel – there are a load of very smart bloggers out there that have, as I would say, “mad skillz” at working with Excel including (but not limited to) Michael Stokes over...

## UAH Temperature Anomalies Following Predictable Pattern

March 14, 2011
In this post I show one simple and 2 multiple regression models to assess the role of time, El Nino – La Nina SSTA and volcanic activity (SATO) on UAH global temperature anomaly trends. The 3rd model provides a reasonable … Continue reading →

## Parallel computation [revised]

March 14, 2011
We have now completed our revision of the parallel computation paper and hope to send it to JCGS within a few days. As seen on the arXiv version, and given the very positive reviews we received, the changes are minor, mostly focusing on the explanation of the principle and on the argument that it comes

## Statistical tests for variable selection

March 14, 2011
I received an email today with the following comment: I’m using ARIMA with Intervention detection and was planning to use your package to identify my initial ARIMA model for later iteration, however I found that sometimes the auto.arima function returns a model where AR/MA coefficients are not significant. So my question is: Is there a

## Happy Pi Day, Now Go Estimate It!

March 14, 2011
As you may know, today is Pi Day, when all good nerds take a moment to thank the geeks of antiquity for their painstaking work in estimating this marvelous mathematical constant. It is also a great opportunity to thank contemporary geeks for the wonders of modern computing, which allow us to estimate pi to near

## R/Finance 2011 Registration Open

March 14, 2011
The registration for R/Finance 2011--which will take place April 29 and 30 in Chicago--is NOW OPEN! Building on the success of the two previous conferences in 2009 and 2010, we are expecting more than 250 attendees from around the world representing bot...

## Amanda Cox on How The New York Times Graphics Department Uses R

March 14, 2011
Last month, Amanda Cox from The New York Times Graphic Department gave a great talk to the NYC R Statistical Programming Meetup. I’ve just got around to uploading the video, which has been broken into a part one and part two. You can also view the videos embedded after the jump. Amanda made use of

## Language used by Academics with the Protection of Anonymity

March 14, 2011
Those in the political science discipline probably remember their first encounter with poliscijobrumors.com. For those outside, you have probably never heard of this particular message board, and you would have no reason to. As the URL suggests, the board specializes in rumor, gossip, backbiting, mudslinging, and the occasional lucid thread on the political science

## R 2.13.0 scheduled for April 13

March 14, 2011
As announced yesterday by the R Core Team, the next major update to R will be released on April 13. R 2.13.0 is the next major release of R, which gets major updates approximately every six months. This also indicates that R 2.12.2 is the last patch level of the R 2.12 series, and so the next version of...

## R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data

March 14, 2011
As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the tapply() function to generate a table for a single summary function. In repeated measures ANOVA, we used separate da...

## Hacker News Analysis

March 13, 2011
I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings. Activity on the Site My first question was: how has activity on the site increased over time? I … Continue reading →

## Piiikaaachuuuuuu vs. KHAAAAAN!

March 13, 2011
This is a fun image I found on Neil Kodner’s blog: But I’ve never actually watched any of the Star Trek movies, so I decided to recreate the graph with Pikachu instead: Here’s a smoothed version to better compare the counts … Continue reading →

## A Kernel Density Approach to Outlier Detection

March 13, 2011
I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction Suppose you’re searching online for the cheapest place to … Continue reading →

## Eigensheep

March 13, 2011
Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep … Continue reading →

## Counting Clusters

March 13, 2011
Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic The gap statistic algorithm … Continue reading →