Monthly Archives: August 2012

Discriminating Between Iris Species

August 4, 2012

The Iris data set is famous for its use in comparing unsupervised classifiers. The goal is to use information about flower characteristics to accurately classify the 3 species of Iris. We can look at scatter plots of the 4 variables in the data set and see that no single variable or bivariate combination can achieve this. One approach to improve the separation

Read more »
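
For the Iris entry above, here is a minimal R sketch of the scatter-plot check the excerpt describes, using the iris data frame that ships with base R. The helper name ranges_overlap is hypothetical (it is not from the original post), and comparing value ranges is only one rough way to see that a single variable does not cleanly split versicolor from virginica.

```r
# Scatter-plot matrix of the four measurements, colored by species.
pairs(iris[, 1:4], col = as.integer(iris$Species), pch = 19)

# Hypothetical helper: TRUE when the value ranges of groups a and b overlap,
# i.e. no single cut-off on x separates the two groups.
ranges_overlap <- function(x, g, a, b) {
  ra <- range(x[g == a])
  rb <- range(x[g == b])
  ra[1] <= rb[2] && rb[1] <= ra[2]
}

# Apply the check to each of the four measurement columns.
overlap <- sapply(iris[, 1:4], ranges_overlap,
                  g = iris$Species, a = "versicolor", b = "virginica")
print(overlap)
```

The same helper can be pointed at any other pair of species by changing the a and b arguments.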

Feature Comparison of Sweave (R+LaTeX) Tools: TeXmaker vs RStudio

August 4, 2012

link to the document

Read more »

Transformation of axes in R

August 4, 2012

As a general rule, you should not transform your data to try to fit a linear model. But proportions can be tricky. If the proportion data do not arise from a binomial process (e.g., proportion of a leaf consumed by a caterpillar), then transformation is still the best option. In an excellent paper, David Warton*

Read more »
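
For the axis-transformation entry above, here is a rough R sketch of plotting transformed proportions while labelling the axis in the original units. The excerpt is cut off before it names a transformation, so the logit used here is an assumption, and the data values are made up for illustration.

```r
# Made-up proportions for illustration (strictly between 0 and 1).
p <- c(0.05, 0.25, 0.50, 0.75, 0.95)
x <- seq_along(p)

# Logit transform: log(p / (1 - p)); qlogis() is the base R equivalent.
logit_p <- qlogis(p)

# Plot on the logit scale, but label the y-axis in the original units.
ticks <- c(0.05, 0.25, 0.50, 0.75, 0.95)
plot(x, logit_p, yaxt = "n", xlab = "x", ylab = "proportion (logit scale)")
axis(2, at = qlogis(ticks), labels = ticks)
```

Note that proportions of exactly 0 or 1 map to infinite values under the logit, so they would need separate handling.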

Surveys continue to rank R #1 for Data Mining

August 3, 2012

KDnuggets recently posted its annual poll on data mining software, and the R language retains its #1 ranking as the most commonly-used software for data mining: R is now used by 52.5% of poll respondents, compared with 45% last year. Donnie Berkholz provides an analysis of the year-on-year trends for Redmonk. He provides the chart below, and notes "the...

Read more »

Horizon Plots in Base Graphics

August 3, 2012

For background, please see the prior posts More on Horizon Charts, Application of Horizon Plots, Horizon Plot Already Available, and Cubism Horizon Charts in R. There are three primary graphics routes in R (base graphics, lattice, and ggplot2), and each have...

Read more »

2012 Olympics Swimming – 100m Butterfly Men Finals prediction

August 3, 2012

Author: Matt Malin

Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the 100m Butterfly Men’s medal winning results.

Code setup

    library(XML)
    library(ggplot2)
    swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"
    swimming_data <- readHTMLTable( readLines(swimming_path), which = 3, stringsAsFactors...

Read more »

R training: Visualization, Big Data, Data Mining, and Marketing Analytics

August 2, 2012

Revolution Analytics is hosting several live and online courses over the next couple of months that will be of interest to R users looking to hone their skills: Visualization in R with ggplot2. Garrett Grolemund and Winston Chang instruct how to use the ggplot2 package to make, format, label and adjust graphs using R. (August 28, Redwood City, CA.)...

Read more »

plotting raster data in R: adjusting the labels and colors of a classified raster

August 2, 2012

Thanks to Andrej, who wrote this comment: “Is it possible to color the resulting 12 clusters within your original image to get a feel for visual separation?” You can do so. But how do you get values at a location? You will need these values to determine whether the defined class is representing a water

Read more »

Who wants to maintain pgfSweave?

August 2, 2012

So the time has come for me to face the fact that I have no time to maintain pgfSweave. It was recently archived because I didn’t make necessary changes to comply with some CRAN policies. So, I need someone to step up to the plate to make some tweaks, put it back up on CRAN

Read more »

Spacing of multi-panel figures in R

August 2, 2012

In a previous post, I showed how to keep text and symbols at the same size across figures that have different numbers of panels. The figures in that post were ugly because they used the default panel spacing associated with the mfrow argument of the par() function. Below I will walk through how to

Read more »
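
For the multi-panel spacing entry above, here is a minimal base-graphics sketch of tightening the default mfrow spacing. The excerpt is cut off before it shows the author's approach, so this is only one plausible way to do it: the mar and oma values are illustrative assumptions, and the data are random.

```r
# Illustrative values only: shrink inner margins (mar) and reserve an
# outer margin (oma) for shared axis labels. par() returns the old settings.
op <- par(mfrow = c(2, 2), mar = c(2, 2, 1, 1), oma = c(3, 3, 1, 1))

set.seed(1)  # reproducible made-up data
for (i in 1:4) {
  plot(rnorm(20), rnorm(20), xlab = "", ylab = "")
}

# Shared labels drawn once in the outer margin.
mtext("x", side = 1, outer = TRUE, line = 1)
mtext("y", side = 2, outer = TRUE, line = 1)

par(op)  # restore the previous graphics settings
```

Saving the value returned by par() and restoring it afterwards keeps the tighter margins from leaking into later plots.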