Below is a chart of the top 20 offensive players based on FanGraphs WAR for the 2011 season. The various features and their corresponding metrics are clear in the image. I’ve also included the leader and last place for each … Continue reading →

Mirai Solutions GmbH (http://www.mirai-solutions.com) is pleased to announce the availability of XLConnect 0.1-7. This release includes a number of improvements and new features: performance improvements when writing large xlsx files; new workbook data extraction & replacement operators [, [<-, [[, … Continue reading →

Bill Bolstad's response to Xi'an's review of his book Understanding Computational Bayesian Statistics included the following comment, which I found interesting: Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension. Bayesian credible intervals are constructed in the parameter dimension using a probability distribution in the parameter

For those who naturally compute portfolio returns correctly, here are some lessons in how to do it wrong. The data: random portfolios were generated from constituents of the S&P 500 with constraints: long-only; exactly 20 assets in the portfolio; no more than 10% weight for any asset; (just for fun) the sum of the 5 … Continue reading...
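
The three constraints that are fully stated above (long-only, exactly 20 assets, at most 10% weight per asset) can be illustrated with a small rejection sampler. This is only a sketch in Python, not the tool or method used in the post: the function name `random_portfolio`, the placeholder universe of `ASSET000`-style names, and the uniform draw of raw weights are all assumptions made for illustration, and the sampler does not claim to draw uniformly from the set of feasible portfolios.

```python
import random

def random_portfolio(universe, n_assets=20, max_weight=0.10, rng=None):
    """Draw one long-only portfolio with exactly n_assets names and capped weights.

    Hypothetical helper: raw weights are drawn uniformly, normalized to sum
    to 1, and the draw is rejected if any weight exceeds max_weight.
    """
    rng = rng or random.Random()
    while True:
        names = rng.sample(universe, n_assets)           # exactly n_assets distinct assets
        raw = [rng.uniform(0.5, 1.5) for _ in names]     # strictly positive, so long-only
        total = sum(raw)
        weights = [r / total for r in raw]               # weights sum to 1
        if max(weights) <= max_weight:                   # enforce the weight cap
            return dict(zip(names, weights))

# Placeholder universe standing in for the S&P 500 constituents (assumption).
universe = [f"ASSET{i:03d}" for i in range(500)]
portfolio = random_portfolio(universe, rng=random.Random(42))
```

Passing a seeded `random.Random` makes the draw reproducible, which helps when comparing different ways of computing returns on the same set of portfolios.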

Exercise 4 required implementing Logistic Regression using Newton's Method. The dataset covers 80 students and their grades on 2 exams; 40 students were admitted to college and the other 40 were not. We need to implement a binary classification model to estimate college admission based on the student's scores on...
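
For readers who want to see the shape of the technique, here is a minimal pure-Python sketch of logistic regression fitted with Newton's method. It is not the post's own code, and the ten (exam 1, exam 2, admitted) rows are made-up toy values rather than the 80-student dataset. Each iteration applies the update theta := theta - H^{-1} g, where g is the gradient and H the Hessian of the average negative log-likelihood.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def newton_logistic(X, y, iterations=10):
    """Fit logistic regression (with intercept) by Newton's method.

    X holds feature rows without the intercept column; y holds 0/1 labels.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    m, n = len(rows), len(rows[0])
    theta = [0.0] * n
    for _ in range(iterations):
        grad = [0.0] * n
        hess = [[0.0] * n for _ in range(n)]
        for xi, yi in zip(rows, y):
            h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j in range(n):
                grad[j] += (h - yi) * xi[j] / m
                for k in range(n):
                    hess[j][k] += h * (1.0 - h) * xi[j] * xi[k] / m
        step = solve(hess, grad)                      # H^{-1} g without forming the inverse
        theta = [t - s for t, s in zip(theta, step)]
    return theta

# Toy data (assumption): two exam scores per student and a 0/1 admission label.
scores = [(30, 30), (35, 45), (45, 35), (50, 50), (80, 80),
          (40, 40), (70, 70), (75, 65), (65, 75), (85, 85)]
admitted = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
theta = newton_logistic(scores, admitted)
```

The toy labels deliberately overlap, so the maximum-likelihood estimate is finite; on perfectly separable data the coefficients would grow without bound and the Hessian would become ill-conditioned.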

As a follow-up to my isarithmic maps of county electoral data, I have attempted to experiment with extending the technique in two ways. First, where the electoral maps are based on data aggregated to the county level, I have sought to generalize the method to accept individual responses for which only zip code data is … Continue reading →

Last week a question came up on Stack Overflow about determining whether a variable is distributed normally. Some of the answers reminded me of a common and pervasive misconception about how to apply tests against normality. I felt the topic was general enough to reproduce my comments here (with minor edits). Misconception: If your statistical analysis requires normality, it is

How was the Netflix Prize won? I went through a lot of the Netflix Prize papers a couple of years ago, so I’ll try to give an overview of the techniques that went into the winning solution here. Normalization of Global Effects: Suppose Alice rates Inception 4 stars. We can think of this rating as composed of...

Bill Bolstad wrote a reply to my review of his book Understanding computational Bayesian statistics last week and here it is, unedited except for the first paragraph where he thanks me for the opportunity to respond, “so readers will see that the book has some good features beyond having a “nice cover”.” (!) I simply processed

In my last few posts, I have been discussing some of the consequences of the slow decay rate of the tail of the Pareto type I distribution, along with some other, closely related notions, all in the context of continuously distributed data. Today’s post considers the Zipf distribution for discrete data, which has come to be extremely popular as...

My previous post talked about how we can employ PCA on the data for multiple stock returns to reduce the number of variables in explaining the variance of the underlying data. But the idea was greeted with skepticism by many. A caveat to the applicatio...

This binary package supports R 2.13.x (32-bit/64-bit) and MySQL 5.5.16 (32-bit/64-bit). RMySQL 0.8-0 for MySQL 5.5.16

Rose Hoffmann, AP Statistics teacher at Catholic Memorial High School in Waukesha, WI, sent the following note to the Revolution Analytics team: In August 2010, my husband who is a statistician attended the American Statistical convention. Your company gave out the flying monkey with a black cape ... He gave me the monkey since it was my first year...

There are several blog posts, websites (and even books) explaining the transition from using another statistical system (e.g. SAS, SPSS, Stata, etc) to relying on R. Most of that material treats the topic from the point of view of i- … Continue reading →

(Hadley Wickham, author of ggplot2 and several other R packages, guest blogs today about forthcoming big-data improvements to his R graphics package -- ed.) Hi! I'm Hadley Wickham and I'm guest posting on the Revolutions blog to give you a taste of some of the visualisation work that my research team and I worked on this summer. This work...

The financial market is an interesting place: you find people taking positions (buying/selling) based on their expectations of what security prices will be, and they are rewarded/penalized according to the accuracy of those expectations. The beauty of financia...

I stumbled upon this chart in the R Graph Gallery, which got me thinking someone could come up with a Volume by Price chart using R. Such charts can be useful to determine support and resistance levels, as they illustrate the amount of volume for different price ranges. Below is my first attempt at this. Note
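
The calculation behind a Volume by Price chart is a binned sum: split the price range into equal-width bands and total the traded volume whose price falls in each band. The post builds its chart in R; the sketch below is Python and covers only the aggregation step, not the plotting. The function name `volume_by_price`, the default of five bins, and the sample prices and volumes are illustrative assumptions.

```python
def volume_by_price(prices, volumes, n_bins=5):
    """Total volume per equal-width price band.

    Returns (edges, totals), where edges has n_bins + 1 boundaries and
    totals[i] is the volume traded at prices in [edges[i], edges[i + 1]).
    The highest price is counted in the last band.
    """
    lo, hi = min(prices), max(prices)
    width = (hi - lo) / n_bins or 1.0        # avoid a zero width if all prices are equal
    totals = [0] * n_bins
    for price, volume in zip(prices, volumes):
        idx = min(int((price - lo) / width), n_bins - 1)
        totals[idx] += volume
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, totals

# Made-up closing prices and volumes (assumption), binned into four bands.
prices = [10.0, 10.5, 11.0, 12.0, 14.0, 13.9, 10.2]
volumes = [100, 200, 150, 300, 50, 120, 80]
edges, totals = volume_by_price(prices, volumes, n_bins=4)
```

Bands with large totals are the price ranges where most trading took place, which is what makes them candidates for support and resistance levels.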

If you haven't yet taken a look at Revolution R Enterprise but wanted to know what it adds to open-source R, the slides below from yesterday's webinar will give you a quick overview: A recorded replay with audio of me giving the presentation is also available at the link below. Revolution Analytics Webinars: Revolution R Enterprise: 100% R...