[This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers].

Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The cricinfo website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats.

As an initial example we will consider the English legend Sir Ian Botham, who played 102 Test matches for England between his debut in 1977 and his final game in 1992.

The first obvious breakdown is to consider how Botham performed against each of the six countries he faced during his Test career. A summary of his statistics is shown here:

```
Opposition  Matches Bat Inns Runs NO Bowl Inns Wicket Catch
Australia        36       49 1673  2        66    148    57
India            14       16 1201  0        23     59    14
New Zealand      15       22  846  2        28     64    14
Pakistan         14       20  647  1        18     40    14
Sri Lanka         3        3   41  0         6     11     2
West Indies      20       37  792  1        27     61    19
```
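To follow along, the summary table can be entered by hand as a data frame. The sketch below is one way to do it; the name itb.opp matches the plotting code used in this post, while the column names BatInns and BowlInns are assumed spellings, since the printed header shows them with a space.

```r
# Botham's Test record by opposition, typed in from the table above.
# BatInns and BowlInns are assumed column names
# (printed as "Bat Inns" and "Bowl Inns" in the listing).
itb.opp <- data.frame(
  Opposition = c("Australia", "India", "New Zealand",
                 "Pakistan", "Sri Lanka", "West Indies"),
  Matches  = c(36, 14, 15, 14, 3, 20),
  BatInns  = c(49, 16, 22, 20, 3, 37),
  Runs     = c(1673, 1201, 846, 647, 41, 792),
  NO       = c(2, 0, 2, 1, 0, 1),
  BowlInns = c(66, 23, 28, 18, 6, 27),
  Wicket   = c(148, 59, 64, 40, 11, 61),
  Catch    = c(57, 14, 14, 14, 2, 19)
)

# Quick sanity checks against the career totals quoted in the text
sum(itb.opp$Matches)  # 102 Test matches
sum(itb.opp$Catch)    # 120 catches
```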

Botham only played three matches against Sri Lanka so it is difficult to properly assess his performance against them. If the above table is stored in a data frame itb.opp then we can create a bar chart of the total runs (or wickets) by opposition country:

```
library(ggplot2)
# geom_col() draws each bar at the Runs value given in the data
ggplot(itb.opp, aes(Opposition, Runs)) + geom_col() + xlab("Country") +
  ylab("Total Runs")
```

This code produces the following graph:

IT Botham Total Runs by Opposition

The total wickets graph is produced by the following code:

```
# geom_col() draws each bar at the Wicket value given in the data
ggplot(itb.opp, aes(Opposition, Wicket)) + geom_col() + xlab("Country") +
  ylab("Total Wickets")
```

IT Botham Total Wickets by Opposition

We may now want to delve deeper into the performance against different nations to take into account the number of games or innings where Botham batted or bowled. The traditional way to assess performance is to calculate batting and bowling averages and we can do this by opposition which provides the following data frame:

```
> itb.opp.sum
Opposition  Discipline  Average
Australia   Batting    29.35088
India       Batting    70.64706
New Zealand Batting    42.30000
Pakistan    Batting    32.35000
Sri Lanka   Batting    13.66667
West Indies Batting    21.40541
Australia   Bowling    27.65541
India       Bowling    26.40678
New Zealand Bowling    23.43750
Pakistan    Bowling    31.77500
Sri Lanka   Bowling    28.18182
West Indies Bowling    35.18033
```
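These figures follow the usual cricket definitions: a batting average is runs scored divided by the number of dismissals (innings minus not outs), and a bowling average is runs conceded divided by wickets taken. The sketch below spells that out. The helper names are illustrative and not from the original analysis, and the runs-conceded figure is back-calculated from the listing because the earlier summary table does not include it.

```r
# Illustrative helpers (names are assumptions, not from the original post)
batting_average <- function(runs, innings, not_outs) {
  runs / (innings - not_outs)   # runs per dismissal
}
bowling_average <- function(runs_conceded, wickets) {
  runs_conceded / wickets       # runs conceded per wicket
}

# Sri Lanka row of the summary table: 41 runs, 3 innings, 0 not outs
batting_average(41, 3, 0)       # 13.66667, as in itb.opp.sum

# Assumption: 310 runs conceded is implied by an average of 28.18182
# over 11 wickets; the figure itself is not given in the summary table.
bowling_average(310, 11)        # 28.18182
```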

The itb.opp.sum data frame can be converted into a dot plot so we can see whether Botham had a higher batting average than bowling average, which is often taken to be one of the signs of an all-rounder.

```
ggplot(itb.opp.sum, aes(Average, Opposition, colour = Discipline)) +
  geom_point() + xlab("Average") + ylab("")
```

The graph is shown here:

IT Botham Batting and Bowling Averages by Opposition

We can see the differences in performance based on the opposition. Botham’s performance against the West Indies, by far the strongest team during most of his international career, was worse than against the other countries. However, his averages were far from embarrassing when compared to other players at the time. The graph also shows that Botham enjoyed both batting and bowling against India.
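The all-rounder comparison can also be made numerically by putting the two averages side by side. This is a minimal sketch in base R that re-enters itb.opp.sum from the listing above; the names opp, cmp and Difference are introduced here for illustration.

```r
opp <- c("Australia", "India", "New Zealand",
         "Pakistan", "Sri Lanka", "West Indies")
itb.opp.sum <- data.frame(
  Opposition = rep(opp, times = 2),
  Discipline = rep(c("Batting", "Bowling"), each = 6),
  Average = c(29.35088, 70.64706, 42.30000, 32.35000, 13.66667, 21.40541,
              27.65541, 26.40678, 23.43750, 31.77500, 28.18182, 35.18033)
)

# One row per opposition, with batting and bowling averages in separate columns
cmp <- merge(subset(itb.opp.sum, Discipline == "Batting"),
             subset(itb.opp.sum, Discipline == "Bowling"),
             by = "Opposition", suffixes = c(".bat", ".bowl"))

# Positive values mean the batting average exceeds the bowling average
cmp$Difference <- cmp$Average.bat - cmp$Average.bowl
cmp[order(-cmp$Difference),
    c("Opposition", "Average.bat", "Average.bowl", "Difference")]
```

On these figures the batting average is the higher of the two against Australia, India, New Zealand and Pakistan, and the lower of the two against Sri Lanka and the West Indies.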

We can divide this data further based on whether the matches were played in England or outside of England and this data is shown here:

```
> itb.opp.ha.sum
Opposition  Venue Discipline  Average
Australia   Away  Batting    30.22581
India       Away  Batting    61.55556
New Zealand Away  Batting    50.44444
Pakistan    Away  Batting    16.00000
Sri Lanka   Away  Batting    13.00000
West Indies Away  Batting    14.17647
Australia   Home  Batting    28.30769
India       Home  Batting    80.87500
New Zealand Home  Batting    35.63636
Pakistan    Home  Batting    34.16667
Sri Lanka   Home  Batting    14.00000
West Indies Home  Batting    27.55000
Australia   Away  Bowling    28.44928
India       Away  Bowling    25.53333
New Zealand Away  Bowling    27.44444
Pakistan    Away  Bowling    45.00000
Sri Lanka   Away  Bowling    21.66667
West Indies Away  Bowling    39.50000
Australia   Home  Bowling    26.96203
India       Home  Bowling    27.31034
New Zealand Home  Bowling    20.51351
Pakistan    Home  Bowling    31.07895
Sri Lanka   Home  Bowling    30.62500
West Indies Home  Bowling    31.97143
```

A dot plot is created from this data with a separate panel for each of the six opposition countries and the averages divided into batting and bowling performances. The coloured dots in the graph indicate whether the average is for matches at home or away.

```
# The x axis carries both batting and bowling averages, so label it "Average"
ggplot(itb.opp.ha.sum, aes(Average, Discipline, colour = Venue)) +
  geom_point() + facet_wrap( ~ Opposition) +
  xlab("Average") + ylab("")
```

This graph is shown below:

IT Botham Batting and Bowling Averages by Country and Home/Away

We can see that the difference between home and away performance is, in general, not very large for bowling averages but in some cases there is a noticeable difference in batting averages. When looking at Botham’s performances against the West Indies his statistics at home are much better than his away performance, suggesting that his main struggles against the strong West Indies team were in the Caribbean. This might be due to his swing bowling being better suited to English conditions than to pitches in the West Indies.
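The home and away gaps discussed above can be tabulated directly. The following sketch, again in base R, re-enters itb.opp.ha.sum from the listing and subtracts the away average from the home average; the names opp, ha and HomeMinusAway are introduced here for illustration.

```r
opp <- c("Australia", "India", "New Zealand",
         "Pakistan", "Sri Lanka", "West Indies")
itb.opp.ha.sum <- data.frame(
  Opposition = rep(opp, times = 4),
  Venue      = rep(rep(c("Away", "Home"), each = 6), times = 2),
  Discipline = rep(c("Batting", "Bowling"), each = 12),
  Average = c(30.22581, 61.55556, 50.44444, 16.00000, 13.00000, 14.17647,
              28.30769, 80.87500, 35.63636, 34.16667, 14.00000, 27.55000,
              28.44928, 25.53333, 27.44444, 45.00000, 21.66667, 39.50000,
              26.96203, 27.31034, 20.51351, 31.07895, 30.62500, 31.97143)
)

# Pair each home average with the matching away average
ha <- merge(subset(itb.opp.ha.sum, Venue == "Home"),
            subset(itb.opp.ha.sum, Venue == "Away"),
            by = c("Opposition", "Discipline"),
            suffixes = c(".home", ".away"))

# Batting: positive means better at home. Bowling: negative means better at home.
ha$HomeMinusAway <- ha$Average.home - ha$Average.away
ha[, c("Opposition", "Discipline", "Average.home", "Average.away", "HomeMinusAway")]
```

For the West Indies the batting gap is about 13.4 runs in favour of home matches and the bowling average is about 7.5 runs lower at home, which matches the pattern described above.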

To round off this brief look at the career of IT Botham let us consider some other important statistics, in particular games where he performed with both bat and ball.

• Overall Botham scored 14 hundreds and 22 fifties out of 161 innings, so he passed fifty roughly once every four and a half innings.
• He also took 27 five-wicket hauls and 17 four-wicket hauls, so he took four or more wickets every four innings or so.
• He took 120 catches.
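The rates quoted in the list above are simple ratios, which can be checked in a few lines. In this sketch the count of bowling innings is taken as the sum of the Bowl Inns column in the first table; the variable names are illustrative.

```r
# Batting: hundreds plus fifties, over total innings batted
fifty_plus <- 14 + 22
161 / fifty_plus               # about 4.5 innings per score of fifty or more

# Bowling: five- and four-wicket hauls, over innings bowled
four_plus <- 27 + 17
bowl_innings <- sum(c(66, 23, 28, 18, 6, 27))   # Bowl Inns column, 168 in total
bowl_innings / four_plus       # about 3.8 innings per haul of four or more
```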

Individual matches of excellence include five games with a century and at least five wickets:

```
Year Opposition  Ground       Venue Runs Wicket
1978 New Zealand Christchurch Away   133      8
1978 Pakistan    Lord's       Home   108      8
1980 India       Mumbai       Away   114     13
1981 Australia   Leeds        Home   199      7
1984 New Zealand Wellington   Away   138      6
```

These performances and others show why Botham was considered such a great player as he produced some sustained periods of excellent all-round cricket rather than having one discipline more dominant for a long period of time.