(This article was first published on **Giga thoughts ... » R**, and kindly contributed to R-bloggers)

# Introduction

This should be the last in the series of posts based on my R package **cricketr**. That is, unless some bright idea comes trotting along and light bulbs go on around my head.

In this post cricketr adapts to the Twenty20 International format. cricketr can now handle stats from ESPN Cricinfo for all 3 formats of the game, namely Test matches, ODIs and Twenty20 Internationals. You should be able to install the package from GitHub and use many of the functions available in the package.

Please be mindful of the ESPN Cricinfo Terms of Use

You can also read this post at Rpubs as twenty20-cricketr. Download this report as a PDF file from twenty20-cricketr.pdf

I have chosen the top 4 batsmen and top 4 bowlers based on ICC rankings and/or number of matches played.

**Batsmen**

- Virat Kohli (Ind)
- Faf du Plessis (SA)
- A J Finch (Aus)
- Brendon McCullum (NZ)

**Bowlers**

- Samuel Badree (WI)
- Sunil Narine (WI)
- Ravichander Ashwin (Ind)
- Ajantha Mendis (SL)

I have explained the plots and added my own observations. Please feel free to draw your conclusions!

The package can be installed directly from GitHub with devtools and then loaded:

```
library(devtools)
install_github("tvganesh/cricketr")
library(cricketr)
```

The data for a particular player can be obtained with the getPlayerDataTT() function. To do this, you will need to go to ESPN CricInfo Player and type in the name of the player, e.g. Virat Kohli, Sunil Narine etc. This will bring up a page which has the profile number for the player, e.g. for Virat Kohli this would be http://www.espncricinfo.com/india/content/player/253802.html. Hence, Kohli’s profile number is 253802. This can be used to get the data for Virat Kohli as shown below.

`kohli <- getPlayerDataTT(253802,dir="..",file="kohli.csv",type="batting")`
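If you would rather not copy the profile number by hand, it can be pulled out of the player URL. The snippet below is only a sketch in base R: `profileFromUrl` is a hypothetical helper of my own and not part of cricketr, and it assumes the URL ends in `<number>.html` as in the example above.

```r
# Hypothetical helper (NOT part of cricketr): extract the numeric profile
# number from an ESPN Cricinfo player URL of the form .../player/<number>.html
profileFromUrl <- function(url) {
  # Keep only the digits between the last "/" and the ".html" suffix
  id <- sub("^.*/([0-9]+)\\.html$", "\\1", url)
  as.integer(id)
}

url <- "http://www.espncricinfo.com/india/content/player/253802.html"
kohliProfile <- profileFromUrl(url)
print(kohliProfile)
```

The resulting number can then be passed as the first argument to getPlayerDataTT().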

The analysis is included below

## Analyses of Batsmen

The following plots give the analysis of the 4 Twenty20 batsmen

- Virat Kohli (Ind) – Innings-26, Runs-972, Average-46.28, Strike Rate-131.70
- Faf du Plessis (SA) – Innings-24, Runs-805, Average-42.36, Strike Rate-135.75
- A J Finch (Aus) – Innings-22, Runs-756, Average-39.78, Strike Rate-152.41
- Brendon McCullum (NZ) – Innings-70, Runs-2140, Average-35.66, Strike Rate-136.21
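The career figures above hang together under the usual definitions: batting average = runs / dismissals, and strike rate = 100 × runs / balls faced. The sketch below works backwards from the numbers in the list to the implied dismissals, not-outs and balls faced. The derived columns are my own arithmetic and were not downloaded from Cricinfo.

```r
# Career figures copied from the list above
batsmen <- data.frame(
  name    = c("Kohli", "Du Plessis", "Finch", "McCullum"),
  innings = c(26, 24, 22, 70),
  runs    = c(972, 805, 756, 2140),
  average = c(46.28, 42.36, 39.78, 35.66),
  sr      = c(131.70, 135.75, 152.41, 136.21)
)

# average = runs / dismissals  =>  dismissals = runs / average
batsmen$dismissals <- round(batsmen$runs / batsmen$average)
batsmen$notOuts    <- batsmen$innings - batsmen$dismissals

# strike rate = 100 * runs / balls  =>  balls = 100 * runs / strike rate
batsmen$ballsFaced <- round(100 * batsmen$runs / batsmen$sr)

print(batsmen[, c("name", "dismissals", "notOuts", "ballsFaced")])
```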

## Plot of 4s, 6s and the scoring rate in Twenty20 Internationals

The 3 charts below plot

- 4s vs Runs scored
- 6s vs Runs scored
- Balls faced vs Runs scored

A regression line is fitted in each of these plots for each of the Twenty20 batsmen.

A. Virat Kohli

- The 1st plot shows that Kohli hits about five 4s on his way to a score of 50.
- The 2nd plot is a box plot of the number of 6s against runs. It shows the range of runs when Kohli hit 1, 2 or 4 6s. The dark line in each box marks the median runs for that number of 6s, so in the innings where he hit one 6 his typical score was about 45.
- The 3rd plot shows the runs scored against the balls faced. It can be seen that when Kohli faced 50 balls he had scored around 70 runs.

```
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./kohli.csv","Kohli")
batsman6s("./kohli.csv","Kohli")
batsmanScoringRateODTT("./kohli.csv","Kohli")
```

`dev.off()`

```
## null device
## 1
```

B. Faf du Plessis

```
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./plessis.csv","Du Plessis")
batsman6s("./plessis.csv","Du Plessis")
batsmanScoringRateODTT("./plessis.csv","Du Plessis")
```

`dev.off()`

```
## null device
## 1
```

C. A J Finch

```
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./finch.csv","A J Finch")
batsman6s("./finch.csv","A J Finch")
batsmanScoringRateODTT("./finch.csv","A J Finch")
```

`dev.off()`

```
## null device
## 1
```

D. Brendon McCullum

```
par(mfrow=c(1,3))
par(mar=c(4,4,2,2))
batsman4s("./mccullum.csv","McCullum")
batsman6s("./mccullum.csv","McCullum")
batsmanScoringRateODTT("./mccullum.csv","McCullum")
```

`dev.off()`

```
## null device
## 1
```
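For readers curious about what the regression line in the scoring-rate plots amounts to, here is a minimal base R sketch. The innings data are made up for illustration and are not taken from kohli.csv, and a plain `lm()` fit is my assumption; the package may fit its line differently.

```r
# Hypothetical innings (balls faced, runs scored); NOT real data
innings <- data.frame(
  BF   = c(5, 12, 18, 25, 31, 40, 47, 52),
  Runs = c(4, 15, 21, 33, 38, 55, 64, 72)
)

# Straight-line fit of runs against balls faced
fit <- lm(Runs ~ BF, data = innings)
print(coef(fit))

# Runs the fitted line expects after 50 balls
runsAt50 <- predict(fit, newdata = data.frame(BF = 50))
print(round(runsAt50))
```

The slope of the line is the expected runs per ball, so multiplying it by 100 gives a rough strike rate. Calling `abline(fit)` after a scatter plot of the same data overlays the line.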

## Relative Mean Strike Rate

This plot shows the Mean Strike Rate of each batsman in each run range. It can be seen that A J Finch has the best strike rate, followed by B McCullum.

```
par(mar=c(4,4,2,2))
frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeBatsmanSRODTT(frames,names)
```
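The idea of a mean strike rate per run range can be sketched in base R as follows. The innings are hypothetical and the 10-run buckets are an assumption of mine; the buckets used inside the package may differ.

```r
# Hypothetical innings; NOT real data
innings <- data.frame(
  Runs = c(3, 8, 14, 22, 27, 35, 41, 48, 9, 16),
  BF   = c(5, 9, 12, 18, 20, 24, 30, 31, 10, 11)
)

# Strike rate of each innings = 100 * runs / balls faced
innings$SR <- 100 * innings$Runs / innings$BF

# Bucket the innings into run ranges [0,10), [10,20), ... (assumed width)
innings$range <- cut(innings$Runs, breaks = seq(0, 50, by = 10), right = FALSE)

# Mean strike rate within each run range
meanSR <- tapply(innings$SR, innings$range, mean)
print(round(meanSR, 1))
```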

## Relative Runs Frequency Percentage

The plot below provides the relative frequency, as a percentage, of scores in each run range 0-5, 5-10, 10-15 etc. Clearly Kohli has the highest frequency in most of the run ranges. This is also evident in the fact that Kohli has the highest average. He is followed by McCullum.

```
frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
relativeRunsFreqPerfODTT(frames,names)
```
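A frequency percentage of this kind can be computed with `cut()`, `table()` and `prop.table()`. The scores below are invented for illustration, and the 5-run buckets simply follow the ranges mentioned above.

```r
# Hypothetical scores; NOT real data
runs <- c(0, 4, 7, 12, 13, 18, 3, 9, 11, 2)

# Bucket into [0,5), [5,10), [10,15), [15,20)
ranges <- cut(runs, breaks = seq(0, 20, by = 5), right = FALSE)

# Share of innings falling in each run range, as a percentage
freqPercent <- 100 * prop.table(table(ranges))
print(freqPercent)
```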

## Percent 4’s,6’s in total runs scored

The plot below shows the percentage of runs scored by way of 4s and 6s for each batsman. Kohli has the highest percentage of 4s (27.78%) and McCullum has the highest percentage of 6s (15.69%). Finch has the highest combined percentage of 4s and 6s: 25.37 + 15.64 = 41.01%

```
frames <- list("./kohli.csv","./plessis.csv","finch.csv","mccullum.csv")
names <- list("Kohli","Du Plessis","Finch","McCullum")
runs4s6s <-batsman4s6s(frames,names)
```

`print(runs4s6s)`

```
##                Kohli Du Plessis Finch McCullum
## Runs(1s,2s,3s) 64.29      64.55 58.99    61.45
## 4s             27.78      24.38 25.37    22.87
## 6s              7.94      11.07 15.64    15.69
```
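The comparison can be checked directly from the printed table. The sketch below simply re-enters the printed percentages by hand (it does not call batsman4s6s()) and adds up the share of runs from boundaries.

```r
# Percentages copied from the printed output above
pct <- rbind(
  "Runs(1s,2s,3s)" = c(64.29, 64.55, 58.99, 61.45),
  "4s"             = c(27.78, 24.38, 25.37, 22.87),
  "6s"             = c(7.94, 11.07, 15.64, 15.69)
)
colnames(pct) <- c("Kohli", "Du Plessis", "Finch", "McCullum")

# Share of runs scored in boundaries (4s + 6s) for each batsman
boundaryPct <- pct["4s", ] + pct["6s", ]
print(boundaryPct)

# Who leads each category
print(names(which.max(pct["4s", ])))
print(names(which.max(pct["6s", ])))
print(names(which.max(boundaryPct)))
```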

## 3D plot of Runs vs Balls Faced and Minutes at Crease

The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is then fitted based on the Balls Faced and Minutes at Crease to give the runs scored

```
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./kohli.csv","Kohli")
battingPerf3d("./plessis.csv","Du Plessis")
```

`dev.off()`

```
## null device
## 1
```

```
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
battingPerf3d("./finch.csv","A J Finch")
battingPerf3d("./mccullum.csv","McCullum")
```

`dev.off()`

```
## null device
## 1
```
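A prediction plane of this kind is a linear model with two predictors. The sketch below uses made-up innings and assumes an ordinary least-squares fit of Runs on Balls Faced and Minutes at Crease; I have not checked that this matches what battingPerf3d() does internally.

```r
# Hypothetical innings; NOT real data
innings <- data.frame(
  BF   = c(6, 10, 15, 22, 28, 33, 41, 50),
  Mins = c(9, 12, 22, 27, 40, 41, 58, 63),
  Runs = c(5, 14, 17, 30, 34, 47, 52, 71)
)

# Plane: Runs = b0 + b1 * BF + b2 * Mins
plane <- lm(Runs ~ BF + Mins, data = innings)
print(coef(plane))

# Predicted runs for two hypothetical (BF, Mins) combinations
newDF <- data.frame(BF = c(20, 40), Mins = c(25, 50))
predicted <- predict(plane, newdata = newDF)
print(round(predicted))
```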

## Predicting Runs given Balls Faced and Minutes at Crease

A hypothetical Balls faced and Minutes at Crease is used to predict the runs scored by each batsman based on the computed prediction plane

```
BF <- seq( 5, 70,length=10)
Mins <- seq(5,70,length=10)
newDF <- data.frame(BF,Mins)
kohli <- batsmanRunsPredict("./kohli.csv","Kohli",newdataframe=newDF)
plessis <- batsmanRunsPredict("./plessis.csv","Du Plessis",newdataframe=newDF)
finch <- batsmanRunsPredict("./finch.csv","A J Finch",newdataframe=newDF)
mccullum <- batsmanRunsPredict("./mccullum.csv","McCullum",newdataframe=newDF)
```

The predicted runs are displayed below. As can be seen, Finch has the best overall strike rate, followed by McCullum.

```
batsmen <-cbind(round(kohli$Runs),round(plessis$Runs),round(finch$Runs),round(mccullum$Runs))
colnames(batsmen) <- c("Kohli","Du Plessis","Finch","McCullum")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
```

```
##    BallsFaced MinsAtCrease Kohli Du Plessis Finch McCullum
## 1           5            5     2          1     5        3
## 2          12           12    12         10    22       16
## 3          19           19    22         19    40       28
## 4          27           27    31         28    57       41
## 5          34           34    41         37    74       54
## 6          41           41    51         47    91       66
## 7          48           48    60         56   108       79
## 8          56           56    70         65   125       91
## 9          63           63    79         74   142      104
## 10         70           70    89         84   159      117
```
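To make the strike-rate comparison concrete, the last row of the table (70 balls faced) can be turned into an implied strike rate of 100 × predicted runs / balls faced. The numbers are copied from the output above; the calculation is just arithmetic on that row.

```r
# Predicted runs at 70 balls faced, copied from the last row of the table
predictedAt70 <- c(Kohli = 89, "Du Plessis" = 84, Finch = 159, McCullum = 117)

# Implied strike rate = 100 * runs / balls
impliedSR <- round(100 * predictedAt70 / 70, 1)
print(sort(impliedSR, decreasing = TRUE))
```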

## Highest runs likelihood

The plots below give the runs likelihood of each batsman, computed using K-Means. Kohli has the highest likelihood of a big score: he is 34.62% likely to score 66 runs. Du Plessis has a 25% likelihood of scoring 53 runs.

A. Virat Kohli

`batsmanRunsLikelihood("./kohli.csv","Kohli")`

```
## Summary of Kohli 's runs scoring likelihood
## **************************************************
##
## There is a 23.08 % likelihood that Kohli will make 10 Runs in 10 balls over 13 Minutes
## There is a 42.31 % likelihood that Kohli will make 29 Runs in 23 balls over 30 Minutes
## There is a 34.62 % likelihood that Kohli will make 66 Runs in 47 balls over 63 Minutes
```

B. Faf Du Plessis

`batsmanRunsLikelihood("./plessis.csv","Du Plessis")`

```
## Summary of Du Plessis 's runs scoring likelihood
## **************************************************
##
## There is a 62.5 % likelihood that Du Plessis will make 14 Runs in 11 balls over 19 Minutes
## There is a 25 % likelihood that Du Plessis will make 53 Runs in 40 balls over 50 Minutes
## There is a 12.5 % likelihood that Du Plessis will make 94 Runs in 61 balls over 90 Minutes
```

C. A J Finch

`batsmanRunsLikelihood("./finch.csv","A J Finch")`

```
## Summary of A J Finch 's runs scoring likelihood
## **************************************************
##
## There is a 20 % likelihood that A J Finch will make 95 Runs in 54 balls over 70 Minutes
## There is a 25 % likelihood that A J Finch will make 42 Runs in 27 balls over 35 Minutes
## There is a 55 % likelihood that A J Finch will make 8 Runs in 8 balls over 12 Minutes
```

D. Brendon McCullum

`batsmanRunsLikelihood("./mccullum.csv","McCullum")`

```
## Summary of McCullum 's runs scoring likelihood
## **************************************************
##
## There is a 50.72 % likelihood that McCullum will make 11 Runs in 10 balls over 13 Minutes
## There is a 28.99 % likelihood that McCullum will make 36 Runs in 27 balls over 37 Minutes
## There is a 20.29 % likelihood that McCullum will make 74 Runs in 48 balls over 70 Minutes
```
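The likelihoods printed above are consistent with cluster sizes divided by the number of innings; for example Kohli's 34.62% is 9 out of 26 innings. The sketch below shows the general idea with base R's `kmeans()` on invented data. It is an illustration under that assumption and not the package's own code.

```r
set.seed(42)  # kmeans starts from random centres

# Hypothetical innings; NOT real data
innings <- data.frame(
  BF   = c(8, 10, 12, 9, 11, 22, 25, 24, 27, 23, 45, 48, 50, 46),
  Mins = c(12, 14, 15, 11, 13, 30, 33, 29, 35, 31, 60, 64, 66, 62),
  Runs = c(7, 11, 12, 9, 10, 27, 31, 28, 33, 29, 62, 68, 70, 65)
)

# Group the innings into 3 clusters of similar (BF, Mins, Runs)
clusters <- kmeans(innings, centers = 3, nstart = 20)

# Likelihood of each cluster = share of innings that fall in it
likelihood <- round(100 * clusters$size / nrow(innings), 2)

# Cluster centres give the "typical" innings for each likelihood
result <- data.frame(round(clusters$centers), likelihood)
print(result)
```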

## Moving Average of runs over career

The moving averages for the 4 batsmen indicate the following. It must be noted that there is not yet sufficient data on Twenty20 Internationals: Kohli, Du Plessis and Finch have played only 22 to 26 innings each, while McCullum has 70. So the moving average, while indicative, will regress towards the mean over time.

- The moving average of Kohli and Du Plessis is on the way up.
- McCullum has a consistent performance while Finch had a brief burst in 2013-2014

```
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("./kohli.csv","Kohli")
batsmanMovingAverage("./plessis.csv","Du Plessis")
batsmanMovingAverage("./finch.csv","A J Finch")
batsmanMovingAverage("./mccullum.csv","McCullum")
```

`dev.off()`

```
## null device
## 1
```
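As a rough illustration of what a moving average over a career looks like, here is a simple trailing average in base R. The scores are invented, and the 3-innings window and the use of `stats::filter()` are my assumptions; the package may well use a different smoother.

```r
# Hypothetical sequence of scores in career order; NOT real data
runs <- c(10, 35, 22, 48, 5, 60, 41, 27)

# Trailing moving average over the last 3 innings (assumed window)
window <- 3
movAvg <- stats::filter(runs, rep(1 / window, window), sides = 1)

# The first (window - 1) values are NA because the window is incomplete
print(round(as.numeric(movAvg), 2))
```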

## Analysis of bowlers

- Samuel Badree (WI) – Innings-22, Runs-464, Wickets-31, Econ Rate-5.39
- Sunil Narine (WI) – Innings-31, Runs-666, Wickets-38, Econ Rate-5.70
- Ravichander Ashwin (Ind) – Innings-26, Runs-732, Wickets-25, Econ Rate-7.32
- Ajantha Mendis (SL) – Innings-39, Runs-952, Wickets-66, Econ Rate-6.45
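A few more career measures follow from the figures above using the standard definitions: bowling average = runs conceded / wickets, and economy rate = runs conceded / overs bowled. The sketch below derives them; note that `overs` here is a decimal number of overs implied by the rounded economy rate, not cricket's overs.balls notation.

```r
# Career figures copied from the list above
bowlers <- data.frame(
  name    = c("Badree", "Narine", "Ashwin", "Mendis"),
  runs    = c(464, 666, 732, 952),
  wickets = c(31, 38, 25, 66),
  econ    = c(5.39, 5.70, 7.32, 6.45)
)

# Bowling average = runs conceded per wicket
bowlers$average <- round(bowlers$runs / bowlers$wickets, 2)

# econ = runs / overs  =>  overs = runs / econ (decimal overs)
bowlers$overs <- round(bowlers$runs / bowlers$econ, 1)

# Career balls per wicket = 6 * overs / wickets
bowlers$ballsPerWkt <- round(6 * bowlers$overs / bowlers$wickets, 1)

print(bowlers)
```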

The plot shows the frequency with which the bowlers have taken 1, 2, 3 etc. wickets. The most wickets taken in an innings is 6, by Ajantha Mendis.

## Wicket Frequency percentage

This plot gives the percentage frequency of each number of wickets taken (1, 2, 3 etc.)

```
par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("./badree.csv","Badree")
bowlerWktsFreqPercent("./mendis.csv","Mendis")
bowlerWktsFreqPercent("./narine.csv","Narine")
bowlerWktsFreqPercent("./ashwin.csv","Ashwin")
```

`dev.off()`

```
## null device
## 1
```

## Wickets Runs plot

The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers. The ends of the box indicate the 25th and 75th percentiles of runs conceded for the wickets taken, and the dark black line is the median runs conceded.

```
par(mfrow=c(1,4))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("./badree.csv","Badree")
bowlerWktsRunsPlot("./mendis.csv","Mendis")
bowlerWktsRunsPlot("./narine.csv","Narine")
bowlerWktsRunsPlot("./ashwin.csv","Ashwin")
```

`dev.off()`

```
## null device
## 1
```
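To see what the boxes are built from, the statistics behind a base R boxplot can be extracted without drawing anything. The spells below are invented, and I am assuming a standard `boxplot()` is used for the plot; the third row of `stats` is the line drawn inside each box, and it matches the group median.

```r
# Hypothetical bowling spells (wickets taken, runs conceded); NOT real data
spells <- data.frame(
  Wkts = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  Runs = c(18, 25, 31, 15, 20, 22, 30, 12, 19, 21)
)

# Compute the boxplot statistics only (plot = FALSE draws nothing)
bp <- boxplot(Runs ~ Wkts, data = spells, plot = FALSE)

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)

# The line inside each box (row 3) is the median of that group
print(tapply(spells$Runs, spells$Wkts, median))
```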

The plots below show the average number of deliveries needed by each bowler to take the wickets (1, 2, 3 etc.)

```
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./badree.csv","Badree")
bowlerWktRateTT("./mendis.csv","Mendis")
```

`dev.off()`

```
## null device
## 1
```

```
par(mfrow=c(1,2))
par(mar=c(4,4,2,2))
bowlerWktRateTT("./narine.csv","Narine")
bowlerWktRateTT("./ashwin.csv","Ashwin")
```

`dev.off()`

```
## null device
## 1
```

## Relative bowling performance

The plot below shows that Narine has the most wickets in the 2-4 wickets range, followed by Mendis.

```
frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingPerf(frames,names)
```

## Relative Economy Rate against wickets taken

The economy rate can be deduced as follows from the plot below. Narine has a good economy rate around 1 & 4 wickets, Ashwin around 2 wickets and Badree around 3 wickets.

```
frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeBowlingERODTT(frames,names)
```

## Relative Wicket Rate

The relative wicket rate plots the mean number of deliveries needed to take a given number of wickets (1, 2, 3, 4). For example, Narine needed an average of 22 deliveries to take 1 wicket and 22.5, 23.2 and 24 deliveries to take 2, 3 & 4 wickets respectively.

```
frames <- list("./badree.csv","./mendis.csv","narine.csv","ashwin.csv")
names <- list("Badree","Mendis","Narine","Ashwin")
relativeWktRateTT(frames,names)
```

## Moving average of wickets over career

```
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerMovingAverage("./badree.csv","Badree")
bowlerMovingAverage("./mendis.csv","Mendis")
bowlerMovingAverage("./narine.csv","Narine")
bowlerMovingAverage("./ashwin.csv","Ashwin")
```

`dev.off()`

```
## null device
## 1
```

# Key findings

Here are some key conclusions

**Twenty 20 batsmen**

- Kohli has a very consistent performance, scoring high runs in the different run ranges. Kohli also has a 34.62% likelihood of scoring 66 runs. He is followed by McCullum for consistent performance
- Finch has the best strike rate followed by McCullum.
- Kohli has the highest percentage of 4s and McCullum has the highest percentage of 6s. Finch is superior in the combined percentage of runs scored in 4s and 6s
- For a hypothetical balls faced and minutes at crease, Finch does best followed by McCullum
- The Twenty20 careers of Kohli & Du Plessis are on an upswing. Can they maintain the momentum? McCullum is consistent

**Twenty20 bowlers**

- Narine has the highest wickets percentage for different wickets taken followed by Mendis
- Mendis has taken 1, 2, 3, 4 and 6 wickets within his 24 deliveries
- Narine has the lowest economy rate for 1 & 4 wickets, Ashwin for 2 wickets and Badree for 3 wickets. Mendis is comparatively expensive
- Narine needed the least deliveries to get 1 (22.5) & 2 (23.2) wickets, Mendis needed 20.5 deliveries and Ashwin 19 deliveries for 4 wickets

**Key takeaways**

If all the above batsmen and bowlers were in the same team, we would expect

- Finch would be most useful when the run rate has to be greatly accelerated followed by McCullum
- If the need is to consolidate, then Kohli is the best man for the job followed by McCullum
- Overall McCullum is the best bet for Twenty20
- When it comes to bowling, Narine wins hands down as he has the most wickets, a good economy rate and a very good attack rate. So Narine is a great bet for providing a vital breakthrough.

Also see my other posts in R

- Introducing cricketr! : An R package to analyze performances of cricketers
- cricketr plays the ODIs!
- A peek into literacy in India: Statistical Learning with R
- A crime map of India in R – Crimes against women
- Analyzing cricket’s batting legends – Through the mirage with R
- Mirror, mirror . the best batsman of them all?

You may also like

- A closer look at “Robot Horse on a Trot” in Android
- What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
- Bend it like Bluemix, MongoDB with autoscaling – Part 2
- Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
- TWS-4: Gossip protocol: Epidemics and rumors to the rescue
- Deblurring with OpenCV:Weiner filter reloaded
- Architecting a cloud based IP Multimedia System (IMS)
