At A Glance View of the 2012 Olympics Heptathlon Performances

August 4, 2012
I spent most of today, err, yesterday, failing to hold back the tears as the medal performances from the Team GB Olympians kept rolling in… So to celebrate one of those wonderful performances, here are a couple of quick sketches of how Jessica Ennis made her medal in the Heptathlon. (The data is cut and

And Now I Blog Again

August 4, 2012
One of my goals for 2012 has been to blog more. Much more. When I first set this goal, I had great aspirations of posting frequently. However, I had a Ph.D. to complete, and quite frankly, it demanded much higher priority. Now that I have submitted my ...

Getting Started Using R, Part 1: RStudio

August 4, 2012
Despite my preference for SAS over R, there are some add-ons to “basic” R that I’ve found that have made my learning process way easier. While I’m still in my infancy in learning R, I feel like once I found … Getting Started Using R, Part 1: RStudio is an article from randyzwitch.com,...

Discriminating Between Iris Species

August 4, 2012
The Iris data set is famous for its use in comparing unsupervised classifiers. The goal is to use information about flower characteristics to accurately classify the 3 species of Iris. We can look at scatter plots of the 4 variables in the data set and see that neither a single variable nor a bivariate combination can achieve this. One approach to improve the separation
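The scatter plots mentioned in this excerpt can be sketched roughly as follows. This is a minimal illustration using the `iris` data frame that ships with base R; the colour choices and the name `species_cols` are assumptions made here, not taken from the original post.

```r
# Pairwise scatter plots of the four iris measurements, coloured by species.
# `iris` is the built-in data set; colours are an arbitrary choice for this sketch.
data(iris)
species_cols <- c("red", "green3", "blue")[as.integer(iris$Species)]

pairs(iris[, 1:4],
      col  = species_cols,
      pch  = 19,
      main = "Iris measurements by species")

# Per-species range of a single variable, to inspect how much the groups overlap.
petal_ranges <- tapply(iris$Petal.Length, iris$Species, range)
print(petal_ranges)
```

Looking at each panel in turn is one way to judge how well any pair of variables separates the three species.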

Transformation of axes in R

August 4, 2012
As a general rule, you should not transform your data to try to fit a linear model. But proportions can be tricky. If the proportion data do not arise from a binomial process (e.g., proportion of a leaf consumed by a caterpillar), then transformation is still the best option. In an excellent paper, David Warton*
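The excerpt is cut off before it names a transformation, so the following is only a sketch under an assumption: it uses the logit, one common choice for proportions, and draws the axis on the transformed scale while labelling it in the original units. The values in `prop_eaten` and `leaf_size` are hypothetical and are not data from the post.

```r
# Hypothetical proportions (e.g. fraction of a leaf consumed); illustration only.
leaf_size  <- c(1, 2, 3, 4, 5)
prop_eaten <- c(0.05, 0.20, 0.50, 0.80, 0.95)

# Logit transform: log(p / (1 - p)). Exact 0 or 1 would map to -Inf / Inf,
# so such values need adjusting before this step.
logit_prop <- qlogis(prop_eaten)

# Plot on the logit scale but label the y axis with the original proportions.
tick_props <- c(0.05, 0.25, 0.50, 0.75, 0.95)
plot(leaf_size, logit_prop, yaxt = "n",
     xlab = "Leaf size (hypothetical units)", ylab = "Proportion consumed")
axis(2, at = qlogis(tick_props), labels = tick_props)

# Back-transform to check that the round trip recovers the input.
back_transformed <- plogis(logit_prop)
```

Labelling the transformed axis with untransformed values keeps the plot readable in the units the data were measured in.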

Surveys continue to rank R #1 for Data Mining

August 3, 2012
KDnuggets recently posted its annual poll on data mining software, and the R language retains its #1 ranking as the most commonly-used software for data mining: R is now used by 52.5% of poll respondents, compared with 45% last year. Donnie Berkholz provides an analysis of the year-on-year trends for Redmonk. He provides the chart below, and notes "the...

Horizon Plots in Base Graphics

August 3, 2012
For background, please see the prior posts More on Horizon Charts, Application of Horizon Plots, Horizon Plot Already Available, and Cubism Horizon Charts in R. There are three primary graphics routes in R (base graphics, lattice, and ggplot2), and each have...

2012 Olympics Swimming – 100m Butterfly Men Finals prediction

August 3, 2012
Author: Matt Malin
Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the 100m Butterfly Men’s medal winning results.
Code setup:
library(XML)
library(ggplot2)
swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"
swimming_data <- readHTMLTable( readLines(swimming_path), which = 3, stringsAsFactors...
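The two modelling approaches named in this excerpt (linear and loess) can be sketched as below. Because the post's scraping code is truncated, this sketch does not use the swimming results; it substitutes R's built-in `cars` data set purely as a stand-in, and the names `lin_fit`, `loess_fit` and `new_speed` are assumptions made for illustration.

```r
# Stand-in data: the built-in `cars` data set (speed vs. stopping distance),
# NOT the Olympic swimming results from the post.
data(cars)

# Straight-line fit and a local-regression (loess) fit of the same relationship.
lin_fit   <- lm(dist ~ speed, data = cars)
loess_fit <- loess(dist ~ speed, data = cars)

# Predict at a value inside the observed range of `speed`.
new_speed  <- data.frame(speed = 21)
lin_pred   <- predict(lin_fit, newdata = new_speed)
loess_pred <- predict(loess_fit, newdata = new_speed)

print(c(linear = unname(lin_pred), loess = unname(loess_pred)))
```

With default settings, `loess` returns `NA` for points outside the range of the fitted data, so predicting a future value would likely need `control = loess.control(surface = "direct")` or a different model.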

R training: Visualization, Big Data, Data Mining, and Marketing Analytics

August 2, 2012
Revolution Analytics is hosting several live and online courses over the next couple of months that will be of interest to R users looking to hone their skills: Visualization in R with ggplot2. Garrett Grolemund and Winston Chang instruct how to use the ggplot2 package to make, format, label and adjust graphs using R. (August 28, Redwood City, CA.)...