## At A Glance View of the 2012 Olympics Heptathlon Performances

August 4, 2012
I spent most of today, err, yesterday, failing to hold back the tears as the medal performances from the Team GB Olympians kept rolling in… So to celebrate one of those wonderful performances, here are a couple of quick sketches of how Jessica Ennis made her medal in the Heptathlon. (The data is cut and

## And Now I Blog Again

August 4, 2012
One of my goals for 2012 has been to blog more. Much more. When I first set this goal, I had great aspirations of posting frequently. However, I had a Ph.D. to complete, and quite frankly, it demanded much higher priority. Now that I have submitted my ...

## Getting Started Using R, Part 1: RStudio

August 4, 2012
Despite my preference for SAS over R, there are some add-ons to “basic” R that I’ve found that have made my learning process way easier. While I’m still in my infancy in learning R, I feel like once I found … Getting Started Using R, Part 1: RStudio is an article from randyzwitch.com,...

## Discriminating Between Iris Species

August 4, 2012
The Iris data set is famous for its use in comparing unsupervised classifiers. The goal is to use information about flower characteristics to accurately classify the 3 species of Iris. We can look at scatter plots of the 4 variables in the data set and see that no single variable nor bivariate combination can achieve this. One approach to improve the separation
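
The scatter plots mentioned in the excerpt can be sketched with base R's `pairs()` on the built-in `iris` data. This is only an illustration, not the post's own code; the temporary output file is an assumption made so the sketch runs without an interactive graphics device.

```r
# Draw all pairwise scatter plots of the four measurements, coloured by species.
# A temporary PDF is used (an assumption) so the sketch runs without a screen device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
pairs(iris[, 1:4], col = as.integer(iris$Species), pch = 19)
dev.off()
```

Opening the resulting file shows each pair of variables in its own panel, with one colour per species.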

## Transformation of axes in R

August 4, 2012
As a general rule, you should not transform your data to try to fit a linear model. But proportions can be tricky. If the proportion data do not arise from a binomial process (e.g., proportion of a leaf consumed by a caterpillar), then transformation is still the best option. In an excellent paper, David Warton*
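
The excerpt is cut off before it names a transformation. As a hedged illustration only, the logit is one commonly used transformation for proportions; the values below are hypothetical and are not taken from the post.

```r
# Hypothetical proportions of leaf area consumed (illustrative values only)
p <- c(0.02, 0.15, 0.40, 0.75, 0.98)

# Logit transform: log(p / (1 - p)); qlogis() is the base R equivalent
logit_p <- log(p / (1 - p))

# Back-transform with the inverse logit, plogis()
p_back <- plogis(logit_p)
```

Note that the logit is undefined at exactly 0 or 1, so such values need separate handling before transforming.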

## Surveys continue to rank R #1 for Data Mining

August 3, 2012
KDnuggets recently posted its annual poll on data mining software, and the R language retains its #1 ranking as the most commonly-used software for data mining: R is now used by 52.5% of poll respondents, compared with 45% last year. Donnie Berkholz provides an analysis of the year-on-year trends for Redmonk. He provides the chart below, and notes "the...

## Horizon Plots in Base Graphics

August 3, 2012
For background, please see prior posts More on Horizon Charts, Application of Horizon Plots, Horizon Plot Already Available, and Cubism Horizon Charts in R. There are three primary graphics routes in R (base graphics, lattice, and ggplot2), and each have...

## 2012 Olympics Swimming – 100m Butterfly Men Finals prediction

August 3, 2012
Author: Matt Malin. Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic medal-winning results for the Men’s 100m Butterfly. Code setup: `library(XML)` `library(ggplot2)` `swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"` `swimming_data <- readHTMLTable( readLines(swimming_path), which = 3, stringsAsFactors...`

## R training: Visualization, Big Data, Data Mining, and Marketing Analytics

August 2, 2012
Revolution Analytics is hosting several live and online courses over the next couple of months that will be of interest to R users looking to hone their skills: Visualization in R with ggplot2. Garrett Grolemund and Winston Chang instruct how to use the ggplot2 package to make, format, label and adjust graphs using R. (August 28, Redwood City, CA.)...

## plotting raster data in R: adjusting the labels and colors of a classified raster

August 2, 2012
Thanks to Andrej who wrote this comment: “Is it possible to color the resulting 12 clusters within your original image to get a feel for visual separation?” You can do so: But how to get values at a location? You will need these values to determine whether the defined class is representing a water

## Who wants to maintain pgfSweave?

August 2, 2012
So the time has come for me to face the fact that I have no time to maintain pgfSweave. It was recently archived because I didn’t make necessary changes to comply with some CRAN policies. So, I need someone to step up to the plate to make some tweaks, put it back up on CRAN

## Spacing of multi-panel figures in R

August 2, 2012
In a previous post, I showed how to keep text and symbols at the same size across figures that have different numbers of panels. The figures in that post were ugly because they used the default panel spacing associated with the mfrow argument of the par( ) function. Below I will walk through how to
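
As a rough sketch of the kind of adjustment described, the inner margins (`mar`) and outer margins (`oma`) can be set alongside `mfrow` in `par()`. The margin values and the temporary PDF device are assumptions for illustration, not the post's settings.

```r
# Write to a temporary PDF (an assumption) so the sketch runs without a screen device
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# 2 x 2 panels with tight inner margins and room in the outer margins for shared labels
old_par <- par(mfrow = c(2, 2), mar = c(2, 2, 1, 1), oma = c(3, 3, 1, 1))
for (i in 1:4) {
  plot(1:10, (1:10)^i, xlab = "", ylab = "")
}

# Shared axis labels drawn once in the outer margins
mtext("x", side = 1, outer = TRUE, line = 1)
mtext("y", side = 2, outer = TRUE, line = 1)

par(old_par)
dev.off()
```

Shrinking `mar` pulls the panels together, while `oma` reserves space around the whole figure.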

## How do you say “We Will Do Whatever It Takes” in Thai?

August 2, 2012
As the market has already started to poke holes in Draghi’s promise, I thought it would be good to continue the series of posts that I began with the British version “We Will Do Whatever it Takes” with my favorite article written during the Asia ...

## Data Parallelism Using Oracle R Enterprise

August 2, 2012
Modern computer processors are adequately optimized for many statistical calculations, but large data operations may require hours or days to return a result.  Oracle R Enterprise (ORE), a set of R packages designed to process large data computations in Oracle Database, can run many R operations in parallel, significantly reducing processing time. ORE supports parallelism through the transparency layer,...

## Multivariate Data Analysis Work Flow

August 2, 2012
Here is an example of a data analysis work flow supported in imDEV. This network visualization was made using CmapTools.

August 2, 2012
Handling meta-data is not natural in R, or in any traditional rectangular data storage system. There are several tricks and packages which attempt to solve this problem, with Hmisc using the attribute feature and the IRanges package having its o...
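
A minimal sketch of the attribute mechanism in base R follows. The data frame, attribute names, and values are hypothetical, and this is not how Hmisc itself is implemented.

```r
# Hypothetical data frame with one measured column
df <- data.frame(height = c(1.70, 1.82))

# Attach meta-data to a column and to the data frame as attributes
attr(df$height, "units") <- "m"
attr(df, "source") <- "hypothetical survey"

# Read the meta-data back
col_units <- attr(df$height, "units")

# Caveat: ordinary subsetting of a vector drops most attributes
subset_units <- attr(df$height[1], "units")  # NULL
```

The last line illustrates why attributes alone are a fragile way to carry meta-data through an analysis.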

## CFP: AusDM 2012, deadline extended to 31 August 2012

August 2, 2012
The Tenth Australasian Data Mining Conference (AusDM 2012) Sydney, Australia 5-7 December 2012 http://ausdm12.togaware.com/ Deadline extended to 31 August 2012 The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data …

## unsupervised classification of a Landsat image in R: the whole story or part two

August 1, 2012
The main question when using remotely sensed raster data, as we do, is how to treat NaN values. Many R functions accept an option such as na.rm = TRUE to handle these missing values. In our case, the kmeans function in R has no such parameter. After reading the tif-files and creating
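
One possible workaround, sketched here under stated assumptions, is to cluster only the complete rows and then map the labels back to the full set of pixels. The pixel matrix, the number of bands, and the number of clusters are hypothetical, and this is not necessarily the approach the post takes.

```r
set.seed(42)

# Hypothetical pixel values: 100 pixels x 3 bands, with three missing pixels
pixels <- matrix(runif(300), ncol = 3)
pixels[c(5, 17, 60), ] <- NaN

# Cluster only the rows without missing values
ok <- complete.cases(pixels)
fit <- kmeans(pixels[ok, ], centers = 4)

# Map cluster labels back to all pixels, leaving NA where data were missing
classes <- rep(NA_integer_, nrow(pixels))
classes[ok] <- fit$cluster
```

Because `classes` has one entry per original pixel, it can be written back into a raster of the same dimensions.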

## More on Horizon Charts

August 1, 2012
For background, please see prior posts Application of Horizon Plots, Horizon Plot Already Available, and Cubism Horizon Charts in R. Some feedback has led me to think that I might have been a little ambitious with my last post on horizon charts. I though...

## Genetic algorithms: a simple R example

August 1, 2012
A genetic algorithm (GA) is a search heuristic. GAs can generate a vast number of possible model solutions and use these to evolve towards an approximation of the best solution of the model. In doing so, they mimic evolution in nature. A GA generates a population; the individuals in this population (often called chromosomes) have …
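
To make the idea concrete, here is a small base R sketch of a genetic algorithm that maximises the number of 1s in a bit string. The toy problem, the parameter values, and the use of tournament selection, one-point crossover, and elitism are all assumptions for illustration; the post may use a different problem or package.

```r
set.seed(1)
n_bits   <- 20    # length of each chromosome
pop_size <- 30    # individuals per generation
n_gen    <- 50    # number of generations
p_mut    <- 0.02  # per-bit mutation probability

# Toy fitness: count of 1s in the chromosome
fitness <- function(chrom) sum(chrom)

# Random initial population, one chromosome per row
pop <- matrix(sample(0:1, pop_size * n_bits, replace = TRUE), nrow = pop_size)
best_initial <- max(apply(pop, 1, fitness))

for (g in seq_len(n_gen)) {
  fit <- apply(pop, 1, fitness)
  elite <- pop[which.max(fit), ]  # elitism: keep the best individual unchanged

  # Tournament selection: the fitter of two random individuals becomes a parent
  pick <- function() {
    idx <- sample(pop_size, 2)
    pop[idx[which.max(fit[idx])], ]
  }

  children <- t(replicate(pop_size - 1, {
    a <- pick()
    b <- pick()
    cut_pt <- sample(n_bits - 1, 1)                      # one-point crossover
    child <- c(a[1:cut_pt], b[(cut_pt + 1):n_bits])
    flip <- runif(n_bits) < p_mut                        # random bit-flip mutation
    child[flip] <- 1 - child[flip]
    child
  }))

  pop <- rbind(elite, children)
}

best_final <- max(apply(pop, 1, fitness))
```

Because the best individual is always carried over, `best_final` can never be lower than `best_initial`.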

## Analytics for Marketing online training 25 – 28 September 2012

August 1, 2012
I am excited to be giving the Analytics for Marketing online training course on 25-28 September 2012. Sign up before 25 August 2012 for the early bird discount. Our friends at Revolution Analytics will provide the infrastructure to host the event. Update: For clarification, this is an online, instructor-led training course. We are...

## Bio7 1.6 for Windows and Linux released!

August 1, 2012
01.08.2012 Finally I released a new version of Bio7 with many improvements and new features. Updated tutorials are available, too. The new Bio7 1.6 release can be downloaded here. Please also download the examples *.zip file from the sourceforge website which contains new examples for Bio7 1.6 (e.g. an example to cluster an image folder with

August 1, 2012
If you haven't taken the plunge yet to making R graphics with Hadley Wickham's ggplot2 package, his "ggplot2 basics" slides (from the recent Introduction to Data Visualization and Analysis course at JSM) are a good place to start. Once you get the hang of the "grammar of graphics" notation, you'll be building beautiful data visualizations like this or this...

## Creating a text grob that automatically adjusts to viewport size

August 1, 2012
I recently wanted to construct a dashboard widget that contains some text and other elements using the grid graphics system. The size available for the widget will vary. When the sizes for the elements of the grobs in the widget are specified as Normalised Parent Coordinates the size adjustments happen automatically. Text does not automatically adjust though. The

## Olympic body match and 1:1 BMI

August 1, 2012
In my morning attempt to read the whole internet before beginning work, I came across a program on the BBC website which allows you to see which Olympic athletes are your body doubles. Or rather, which athletes share your height and weight, and therefore your body mass index. Being a Canadian, I exist in an
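
For reference, body mass index is weight in kilograms divided by the square of height in metres. A minimal sketch follows; the function name and the example values are hypothetical.

```r
# BMI = weight (kg) / height (m)^2
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Hypothetical example: 70 kg at 1.75 m
example_bmi <- bmi(70, 1.75)
```

Two people with the same height and weight therefore always share the same BMI, which is what the body-match comparison relies on.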

## Building a presentation, report or paper in R

August 1, 2012
If you need to build a presentation, obviously you have the following options: a PowerPoint-like presentation, online engines, or LaTeX. The first two are beloved by business people and the third one is widely used in academia. The objective of the first group is shiny presentation, contrary to the second where asceticism and demand for automation are

## Examples of profiling R code

August 1, 2012
by Yanchang Zhao, RDataMining.com. Below are simple examples of profiling R code, which help to find out which steps or functions are most time-consuming. This is very useful for improving the efficiency of R code. # profiling of running time …
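
The post's own examples are cut off, so the following is only a sketch of two standard base R tools for this job: `system.time()` for timing a single expression, and `Rprof()` with `summaryRprof()` for a sampling profile. The function `slow_sum` and the input sizes are hypothetical.

```r
# A deliberately slow, hypothetical function: summing square roots in a loop
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

# Time one call; the result has user, system and elapsed components
timing <- system.time(result <- slow_sum(1e6))

# Sampling profiler: record where time is spent, then summarise by function
prof_file <- tempfile()
Rprof(prof_file, interval = 0.01)
invisible(slow_sum(5e6))
Rprof(NULL)
prof <- summaryRprof(prof_file)
```

`prof$by.self` lists the functions ordered by the time spent in their own code, which is usually the first place to look for a bottleneck.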

## Trying Julia

August 1, 2012
In my previous post I tried building Williams designs in R. Since that code was running a bit slow, this was an ideal test for Julia. Big enough to be at least slightly realistic, small enough that it is doable. I am very impressed. Almost twenty fold s...