Visualizing Growth of a Retail Chain

March 15, 2011

I am a regular reader of the FlowingData blog by Nathan Yau. It is an excellent reference for anyone interested in statistical visualization of data. One of his posts that caught my attention was a visualization of the growth of Walmart in the US. Given my research interests in retail, it was a fascinating insight

Read more »

More pi plus 1 (or plus 0.01) day fun

March 15, 2011

Since I just didn’t get enough this morning, I spent some more time fooling around with estimating pi. Since I was basically counting the number of random x,y pairs inside a quarter circle and computing a sample average for more … Continue reading →

Read more »
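
The estimate described in this teaser can be sketched roughly as follows. This is a minimal illustration in Python, not the post's own code (which is not reproduced here); the function name `estimate_pi` and the `seed` parameter are assumptions made for the example. It draws uniform random x,y pairs in the unit square, counts the fraction that fall inside the quarter circle of radius 1, and multiplies that sample average by 4.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi from points in the unit square.

    Hypothetical helper for illustration only; the name and the
    seed argument are not taken from the original post.
    """
    rng = random.Random(seed)  # local generator so runs are repeatable
    inside = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4, so scale the hit fraction by 4.
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

The error of this kind of estimate shrinks only in proportion to one over the square root of the number of samples, so each extra correct digit costs about a hundred times more draws.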

RStudio: My thoughts

March 15, 2011

Let me get this out of the way: I just love RStudio. Created by a team led by JJ Allaire, a name that should ring a bell if you were involved in web development during the Clinton administration, RStudio is an R IDE that is actually designed for R from...

Read more »

Webinar on integrating R with applications, March 16

March 15, 2011

A quick reminder that Revolution Analytics' CTO David Champagne will be hosting a live webinar tomorrow (March 16) on Integrating R into 3rd Party and Web Applications Using RevoDeployR. Designed for application developers, this webinar will cover publishing R scripts to the RevoDeployR server, and integrating their results into Web applications, Microsoft Excel, JasperReports Server and more. Complete details...

Read more »

New R User Group in Orange County, CA

March 15, 2011

The Orange County R User Group was formed to bring local R users together in a friendly, business-oriented environment. This is the fifth R user group in California. Founder Ray DiGiacomo, Jr. says, "I feel this group is necessary because the current Los Angeles and San Diego R User Groups are quite far from Orange County. Also, Orange County...

Read more »

Example 8.30: Compare Poisson and negative binomial count models

March 15, 2011

How similar can a negative binomial distribution get to a Poisson distribution? When confronted with modeling count data, our first instinct is to use Poisson regression. But in practice, count data is often overdispersed. We can fit the overdispersio...

Read more »

Want to say one thing and the exact opposite with strong confidence?

March 15, 2011

No need to go into politics. Just take a statistics course. I am not talking about misinterpretation of statistics, but about the mathematical foundations of statistical tests. Consider the following parametric test, with a one-dimensional para...

Read more »

Chemometrics with R

March 15, 2011

I just heard that my supervisor's book Chemometrics with R was released, and I immediately asked our library to get a copy. Ron introduced me to R at a time when most in our department were still using Matlab. In fact, I had been maintaining Matlab s...

Read more »

I’m late for π day

March 15, 2011

It is officially no longer pi day, but I didn’t see this Drew Conway post about estimating pi until just a few minutes ago. Because Google Reader doesn’t show github embeds, I also got to try it without seeing Drew’s … Continue reading →

Read more »

How to backtest a strategy in Excel

March 14, 2011

(This is a guest post by Damian from Skill Analytics and ETF Prophet) Let me start by saying that I’m not an expert in backtesting in Excel – there are a load of very smart bloggers out there that have, as I would say, “mad skillz” at working with Excel including (but not limited to) Michael Stokes over...

Read more »

UAH Temperature Anomalies Following Predictable Pattern

March 14, 2011

In this post I show one simple and two multiple regression models to assess the role of time, El Nino – La Nina SSTA and volcanic activity (SATO) on UAH global temperature anomaly trends. The third model provides a reasonable … Continue reading →

Read more »

Parallel computation [revised]

March 14, 2011

We have now completed our revision of the parallel computation paper and hope to send it to JCGS within a few days. As seen on the arXiv version, and given the very positive reviews we received, the changes are minor, mostly focusing on the explanation of the principle and on the argument that it comes

Read more »

Statistical tests for variable selection

March 14, 2011

I received an email today with the following comment: I’m using ARIMA with Intervention detection and was planning to use your package to identify my initial ARIMA model for later iteration, however I found that sometimes the auto.arima function returns a model where AR/MA coefficients are not significant. So my question is: Is there a

Read more »

Happy Pi Day, Now Go Estimate It!

March 14, 2011

As you may know, today is Pi Day, when all good nerds take a moment to thank the geeks of antiquity for their painstaking work in estimating this marvelous mathematical constant. It is also a great opportunity to thank contemporary geeks for the wonders of modern computing, which allow us to estimate pi to near

Read more »

R/Finance 2011 Registration Open

March 14, 2011

Registration for R/Finance 2011, which will take place April 29 and 30 in Chicago, is NOW OPEN! Building on the success of the two previous conferences in 2009 and 2010, we are expecting more than 250 attendees from around the world representing bot...

Read more »

Amanda Cox on How The New York Times Graphics Department Uses R

March 14, 2011

Last month, Amanda Cox from The New York Times Graphics Department gave a great talk to the NYC R Statistical Programming Meetup. I’ve just got around to uploading the video, which has been broken into a part one and part two. You can also view the videos embedded after the jump. Amanda made use of

Read more »

Language used by Academics with the Protection of Anonymity

March 14, 2011

Those in the political science discipline probably remember their first encounter with poliscijobrumors.com. Those outside it have probably never heard of this particular message board, and would have no reason to. As the URL suggests, the board specializes in rumor, gossip, backbiting, mudslinging, and the occasional lucid thread on the political science

Read more »

R 2.13.0 scheduled for April 13

March 14, 2011

As announced yesterday by the R Core Team, R 2.13.0, the next major update to R, will be released on April 13. R gets major updates approximately every six months. This also indicates that R 2.12.2 is the last patch level of the R 2.12 series, and so the next version of...

Read more »

R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data

March 14, 2011

As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the tapply() function to generate a table for a single summary function. In repeated measures ANOVA, we used separate da...

Read more »

Hacker News Analysis

March 13, 2011

I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings. Activity on the Site My first question was: how has activity on the site increased over time? I … Continue reading →

Read more »

Piiikaaachuuuuuu vs. KHAAAAAN!

March 13, 2011

This is a fun image I found on Neil Kodner’s blog: But I’ve never actually watched any of the Star Trek movies, so I decided to recreate the graph with Pikachu instead: Here’s a smoothed version to better compare the counts … Continue reading →

Read more »

A Kernel Density Approach to Outlier Detection

March 13, 2011

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction Suppose you’re searching online for the cheapest place to … Continue reading →

Read more »

Eigensheep

March 13, 2011

Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep … Continue reading →

Read more »

Counting Clusters

March 13, 2011

Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic The gap statistic algorithm … Continue reading →

Read more »

RStudio 0.92.44 Release: Try It! You’ll Be Surprised!

March 13, 2011

I recently downloaded RStudio’s v0.92.44 release, and, I must say, it’s light! I think I could even run it on a netbook, which is great for analysis on-the-go. I’ll likely uninstall Eclipse-StatET at this point and go with RStudio. Not only is it...

Read more »