**OUseful.Info, the blog... » Rstats**, and kindly contributed to R-bloggers)

I spent most of today, err, yesterday, failing to hold back the tears as the medal performances from the Team GB Olympians kept rolling in… So to celebrate one of those wonderful performances, here are a couple of quick sketches of how Jessica Ennis made her medal in the Heptathlon. (The data is cut and pasted from the BBC website and available here: data; the script used to generate the image is pasted below.)

And a zoom in above 600 points…

Another view of the over 600 point scores, by athlete:

I think the original data I grabbed included info about whether scores were season bests or personal bests, which could also be used to add richness to the chart, for example, using colour or different symbols to denote SB or PB. I also considered a colouring based on track vs field, or track vs throwing vs jumping, to see whether or not we could identify athletes with strong preferences in any of those areas, but it’s getting a bit late/early hours and I need to get some sleep!

For a proper infographic based on the macroscopic view presented by the top two charts, it would probably make sense to use icons rather than text to identify each event, as well as denoting PB/SB; if you peer at the top two charts closely, you’ll notice there’s a dot marking the point score for each Athlete in each event. If we let icons float a little to avoid collisions, we could use an arrow or other connective pointing device to associate event icons with the corresponding Points registration point.

As and when results come in from the Olympics relating to the medal winning performances in the pure events (Women’s Javelin, 800m etc) it could be quite interesting comparing those to the heptathlon performances? A comparison of where the three heptathlon medallists would have finished in each of the pure events might also be interesting (would any of them have made any of the finals, or semi-finals, for example?)

Here’s the script I used to generate the plots…

hd <- read.csv("~/Downloads/Heptathlon - Sheet2.csv") #Generate a summary stats table overall=aggregate(hd$Points, by=list(Athlete=hd$Athlete), FUN=sum) #And order it by result overall[order(overall$x,decreasing=T),] #Mix in overal points to the original data hd2=merge(hd,overall,by='Athlete') #and then use the overall points data to reorder Athlete factors accordingly hd2= transform(hd2, Athlete=reorder(Athlete, x) ) #Now generate the plot (this was the first sketch that came to mind...) require(ggplot2) ggplot(hd2)+geom_point(aes(x=Athlete,y=Points,col=Event),size=1) + geom_text(aes(x=Athlete,y=Points,label=Event,col=Event),size=2,,angle=45) + opts(axis.text.x=theme_text(angle=90,size=5),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) #for over 600 points: +ylim(600,1200) #Faceted plot of high points achieving events by athlete hd2$Event = with(hd2, factor(Event, levels = c('100m Hurdles','High Jump','Shot Put','200m','Long Jump','Javelin','800m'))) ggplot(hd2) + geom_point(aes(Event,Points)) + facet_wrap( ~ Athlete) + ylim(500,1200) + opts(axis.text.x=theme_text(angle=90,size=5))

Please let me know via the comments if you come up with any other interesting views over this data…:-)

PS It suddenly struck me that there may be variability in the range of points awarded in each discipline, so I threw a quick chart together to explore that:

Additional code is:

`ggplot(hd2) + geom_point(aes(x=Event,y=Points),size=1) + geom_text(aes(x=Event,y=Points,label=Athlete),size=2,,angle=45) + opts(axis.text.x=theme_text(angle=90,size=5),title="2012 Olympics Heptathlon") + scale_x_discrete(expand = c(0.05,0)) + ylim(600,1200)`

I guess I should do some distribution plots too? And maybe figure in personal performances relative to eg median scores in each event?

**leave a comment**for the author, please follow the link and comment on his blog:

**OUseful.Info, the blog... » Rstats**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...