# Cricketr analyzes Ind-Aus faceoff in WTC 2023!!

This article was first published on **R – Giga thoughts …** and kindly contributed to R-bloggers.

*“The unexamined life is not worth living.”* – Socrates

*“There is no easy way from the earth to the stars.”* – Seneca

*“If you want to go fast, go alone. If you want to go far, go together.”* – African Proverb

# 1. Introduction

In this post, I use my R package cricketr to analyze the Indian and Australian squads ahead of the World Test Championship (WTC) 2023 final. My R package **cricketr** had its birth on Jul 4, 2015. Cricketr uses data from Cricinfo.

### Indian squad

Rohit Sharma (Captain), Shubman Gill, Cheteshwar Pujara, Virat Kohli, Ajinkya Rahane, Ravindra Jadeja, Shardul Thakur, Mohd. Shami, Mohd. Siraj, Ishan Kishan (wk).

In my opinion, Ishan Kishan has more experience than KS Bharat, though Rishabh Pant would have been the ideal wicket keeper/left-handed batsman. I think Shardul Thakur would be a handful in English conditions. For a spinner, it is either Ashwin or Jadeja. Maybe the balance shifts in favor of Jadeja.

### Australian squad

Pat Cummins (capt), Alex Carey (wk), Cameron Green, Josh Hazlewood, Usman Khawaja, Marnus Labuschagne, Nathan Lyon, Todd Murphy, Steven Smith (vice-capt), Mitchell Starc, David Warner.

Not sure if Scott Boland would fill in, instead of Todd Murphy.

The post below is organized into the following parts:

- Analysis of Indian WTC batsmen from Jan 2016 – May 2023
- Analysis of Indian WTC batsmen against Australia from Jan 2016 – May 2023
- Analysis of Australian WTC batsmen from Jan 2016 – May 2023
- Analysis of Australian WTC batsmen against India from Jan 2016 – May 2023
- Analysis of Indian WTC bowlers from Jan 2016 – May 2023
- Analysis of Indian WTC bowlers against Australia from Jan 2016 – May 2023
- Analysis of Australian WTC bowlers from Jan 2016 – May 2023
- Analysis of Australian WTC bowlers against India from Jan 2016 – May 2023
- Team analysis of India and Australia

All the above analyses use data from ESPN Statsguru and my R package cricketr.

The data for the different players have been obtained using calls such as the ones below.

    # Get Shubman Gill's batting data
    #shubman <- getPlayerData(1070173,dir=".",file="shubman.csv",type="batting",homeOrAway=c(1,2), result=c(1,2,4))
    #shubmansp <- getPlayerDataSp(1070173,tdir=".",tfile="shubmansp.csv",ttype="batting")

    # Get Shubman Gill's data from Jan 2016 - May 2023
    #df <- getPlayerDataHA(1070173,tfile="shubman1.csv",type="batting", matchType="Test")
    #df1 = getPlayerDataOppnHA(infile="shubman1.csv",outfile="shubmanTest.csv",startDate="2016-01-01",endDate="2023-05-01")

    # Get Shubman Gill's data from Jan 2016 - May 2023, against Australia
    #df <- getPlayerDataHA(1070173,tfile="shubman1.csv",type="batting", matchType="Test")
    #df1 = getPlayerDataOppnHA(infile="shubman1.csv",outfile="shubmanTestAus.csv",opposition="Australia",startDate="2016-01-01",endDate="2023-05-01")
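
The calls above narrow the innings to a date window and, optionally, to one opposition. As a rough illustration of that kind of filter, here is a minimal base-R sketch on a small made-up data frame. The column names `Opposition` and `StartDate`, the helper `filterInnings()` and every row are assumptions for illustration only; they are not cricketr's file layout or API.

```r
# Hypothetical innings table; all values are invented for illustration
innings <- data.frame(
  Runs       = c(45, 12, 91, 7, 110),
  Opposition = c("v Australia", "v England", "v Australia", "v Australia", "v New Zealand"),
  StartDate  = as.Date(c("2015-12-26", "2018-08-01", "2021-01-15", "2023-02-09", "2023-06-07")),
  stringsAsFactors = FALSE
)

# Keep innings inside [startDate, endDate], optionally against one opposition
filterInnings <- function(df, startDate, endDate, opposition = NULL) {
  keep <- df$StartDate >= as.Date(startDate) & df$StartDate <= as.Date(endDate)
  if (!is.null(opposition)) {
    keep <- keep & grepl(opposition, df$Opposition, fixed = TRUE)
  }
  df[keep, , drop = FALSE]
}

allTeams <- filterInnings(innings, "2016-01-01", "2023-05-01")
vsAus    <- filterInnings(innings, "2016-01-01", "2023-05-01", opposition = "Australia")
print(vsAus)
```

Here `allTeams` keeps the three innings inside the window and `vsAus` keeps the two of those played against Australia.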

**Note**: To get data for bowlers, use the bowler’s profile number and set type="bowling". Details are in my posts below.

To do similar analysis please go through the following posts

- Re-introducing cricketr! : An R package to analyze performances of cricketers
- Cricketr learns new tricks : Performs fine-grained analysis of players
- Cricketr adds team analytics to its repertoire!!!

**Note 1:** I will not be analysing each and every chart as the charts are quite self-explanatory

**Note 2:** I have had to tile charts together, otherwise this would become a very, very long post. You are free to use my R package cricketr and check things out for yourself.

**Findings**

- Kohli has the best average of 48+. India has won when Rohit and Rahane played well.
- Kohli tops the list in cumulative average runs, followed by Pujara, with Rohit 3rd. Gill is on the upswing.
- Against Australia, Pujara has the best cumulative average runs record, followed by Rahane, with Gill in hot pursuit. In the strike rate department Gill tops, followed by Rohit and Rahane.
- Since 2016, Smith and Labuschagne have averaged 53+!! Warner & Khawaja are at ~46.
- Australia has won matches when Smith, Warner and Khawaja have played well.
- Labuschagne, Smith and C Green have good records against India. Indian bowlers will need to contain them.
- Ashwin has the most wickets against all teams, followed by Jadeja. Ashwin’s performance has dropped over the years, while Siraj has been getting better.
- Jadeja has the best economy rate, followed by Ashwin.
- Against Australia specifically, Jadeja has the best record, followed by Ashwin. Jadeja has the best economy against Australia, followed by Siraj, then Ashwin.
- Cummins, Starc and Lyon are the best performers for Australia. Hazlewood and Cummins have the best economy against all opposition.
- Against India, Lyon, Cummins and Hazlewood have performed well.
- Hazlewood and Lyon have a good economy rate against India.
- Against Australia, India has won 17 times, lost 60 and drawn 22 in Australia. At home, India has won 42, tied 2, lost 28 and drawn 24.
- At the Oval, where the World Test Championship is going to be held, India has won 4, lost 10 and drawn 10.

**Note 3:** You can also read this post at Rpubs at ind-aus-WTC!! The formatting will be nicer!

**Note 4:** You can download this post as PDF to read at your leisure ind-aus-WTC.pdf

## 2. Install the cricketr package

    if (!require("cricketr")) {
      # Installs into the default library; pass lib = "<path>" for a custom one
      install.packages("cricketr")
    }
    library(cricketr)

## 3. Analysis of India WTC batsmen from Jan 2016 – May 2023

### 3a. Basic analysis

The analyses below include:

- Runs frequency plot
- Mean strike rate
- Run ranges

Kohli’s strike rate increases with increasing runs, while Gill’s seems to drop. So it is with Pujara & Rahane.

    par(mfrow=c(3,3))
    par(mar=c(4,4,2,2))
    batsmanRunsFreqPerf("kohliTest.csv","Kohli")
    batsmanMeanStrikeRate("kohliTest.csv","Kohli")
    batsmanRunsRanges("kohliTest.csv","Kohli")
    batsmanRunsFreqPerf("rohitTest.csv","Rohit")
    batsmanMeanStrikeRate("rohitTest.csv","Rohit")
    batsmanRunsRanges("rohitTest.csv","Rohit")
    batsmanRunsFreqPerf("shubmanTest.csv","S Gill")
    batsmanMeanStrikeRate("shubmanTest.csv","S Gill")
    batsmanRunsRanges("shubmanTest.csv","S Gill")

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    batsmanRunsFreqPerf("rahaneTest.csv","Rahane")
    batsmanMeanStrikeRate("rahaneTest.csv","Rahane")
    batsmanRunsRanges("rahaneTest.csv","Rahane")
    batsmanRunsFreqPerf("pujaraTest.csv","Pujara")
    batsmanMeanStrikeRate("pujaraTest.csv","Pujara")
    batsmanRunsRanges("pujaraTest.csv","Pujara")
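
To make the "mean strike rate per run range" idea concrete, here is a small base-R sketch that buckets innings into run ranges of 10 and averages the strike rate within each bucket. The runs and balls are invented, and this is only an assumed reading of what such a plot summarizes, not cricketr's internal code.

```r
# Invented innings (runs scored, balls faced); not real player data
runs  <- c(4, 8, 17, 23, 27, 35, 48, 52, 66, 104)
balls <- c(12, 20, 40, 51, 45, 70, 95, 98, 120, 170)

# Strike rate = runs per 100 balls
strikeRate <- 100 * runs / balls

# Bucket the innings into run ranges of 10: [0,10), [10,20), ...
breaks <- seq(0, max(runs) + 10, by = 10)
bucket <- cut(runs, breaks = breaks, right = FALSE)

# Mean strike rate within each non-empty bucket
meanSR <- tapply(strikeRate, droplevels(bucket), mean)
print(round(meanSR, 1))
```

A rising sequence of bucket means corresponds to a batsman who accelerates as the innings grows; a falling one to a batsman who slows down.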

### 3b. More analyses

Kohli hits roughly five 4s in a score of 50, whereas Gill and Pujara manage about six.

    par(mfrow=c(3,3))
    par(mar=c(4,4,2,2))
    batsman4s("kohliTest.csv","Kohli")
    batsman6s("kohliTest.csv","Kohli")
    batsmanMeanStrikeRate("kohliTest.csv","Kohli")
    batsman4s("rohitTest.csv","Rohit")
    batsman6s("rohitTest.csv","Rohit")
    batsmanMeanStrikeRate("rohitTest.csv","Rohit")
    batsman4s("shubmanTest.csv","S Gill")
    batsman6s("shubmanTest.csv","S Gill")
    batsmanMeanStrikeRate("shubmanTest.csv","S Gill")

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    batsman4s("rahaneTest.csv","Rahane")
    batsman6s("rahaneTest.csv","Rahane")
    batsmanMeanStrikeRate("rahaneTest.csv","Rahane")
    batsman4s("pujaraTest.csv","Pujara")
    batsman6s("pujaraTest.csv","Pujara")
    batsmanMeanStrikeRate("pujaraTest.csv","Pujara")

### 3c. Boxplot histogram plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency. Kohli’s average is 48, while Rohit and Pujara are at 40, with Rahane and Gill around 33.

    batsmanPerfBoxHist("kohliTest.csv","Kohli")
    batsmanPerfBoxHist("rohitTest.csv","Rohit")
    batsmanPerfBoxHist("shubmanTest.csv","S Gill")
    batsmanPerfBoxHist("rahaneTest.csv","Rahane")
    batsmanPerfBoxHist("pujaraTest.csv","Pujara")

### 3d. Contribution to won and lost matches

For the functions below you will have to use the getPlayerDataSp() function. When Rohit Sharma and Pujara have played well, India have tended to win more often.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanContributionWonLost("kohlisp.csv","Kohli")
    batsmanContributionWonLost("rohitsp.csv","Rohit")
    batsmanContributionWonLost("rahanesp.csv","Rahane")
    batsmanContributionWonLost("pujarasp.csv","Pujara")
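
For readers who want to see what a won-versus-lost comparison boils down to, the sketch below splits a batsman's innings by match result and summarizes each group. The data frame, its `Result` column and its values are made up for illustration; the real files produced by getPlayerDataSp() may be laid out differently.

```r
# Made-up innings with the match result attached (illustrative only)
innings <- data.frame(
  Runs   = c(120, 8, 55, 31, 0, 74, 15, 62),
  Result = c("won", "lost", "won", "draw", "lost", "won", "lost", "draw"),
  stringsAsFactors = FALSE
)

# Only won and lost matches are compared
wl <- innings[innings$Result %in% c("won", "lost"), ]

# Mean and median runs for each outcome
meanByResult   <- tapply(wl$Runs, wl$Result, mean)
medianByResult <- tapply(wl$Runs, wl$Result, median)
print(rbind(mean = meanByResult, median = medianByResult))

# A side-by-side boxplot shows the same comparison graphically:
# boxplot(Runs ~ Result, data = wl)
```

A clear gap between the two groups suggests the team's results track that batsman's scores.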

### 3e. Performance at home and overseas

This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanPerfHomeAway("kohlisp.csv","Kohli")
    batsmanPerfHomeAway("rohitsp.csv","Rohit")
    batsmanPerfHomeAway("rahanesp.csv","Rahane")
    batsmanPerfHomeAway("pujarasp.csv","Pujara")

### 3f. Batsman average at different venues

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanAvgRunsGround("kohliTest.csv","Kohli")
    batsmanAvgRunsGround("rohitTest.csv","Rohit")
    batsmanAvgRunsGround("rahaneTest.csv","Rahane")
    batsmanAvgRunsGround("pujaraTest.csv","Pujara")

### 3g. Batsman average against different opposition

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanAvgRunsOpposition("kohliTest.csv","Kohli")
    batsmanAvgRunsOpposition("rohitTest.csv","Rohit")
    batsmanAvgRunsOpposition("rahaneTest.csv","Rahane")
    batsmanAvgRunsOpposition("pujaraTest.csv","Pujara")
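
A grouped average like the ones charted in the last two sections can be sketched with `aggregate()`. The rows below are invented, and the figure computed is the plain mean of runs per innings, which is an assumption of this sketch (a conventional batting average divides by dismissals instead). Swapping the grouping column for a ground name gives the per-venue version.

```r
# Invented innings; the Opposition column name is an assumption
innings <- data.frame(
  Runs       = c(45, 12, 91, 7, 110, 33),
  Opposition = c("Australia", "England", "Australia", "England", "New Zealand", "Australia"),
  stringsAsFactors = FALSE
)

# Mean runs per innings against each opposition, highest first
avgByOpp <- aggregate(Runs ~ Opposition, data = innings, FUN = mean)
avgByOpp <- avgByOpp[order(-avgByOpp$Runs), ]
print(avgByOpp)
```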

### 3h. Runs Likelihood of batsman

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanRunsLikelihood("kohli.csv","Kohli")
    ## Summary of Kohli 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 52.91 % likelihood that Kohli will make 12 Runs in 26 balls over 35 Minutes
    ## There is a 30.81 % likelihood that Kohli will make 52 Runs in 100 balls over 139 Minutes
    ## There is a 16.28 % likelihood that Kohli will make 142 Runs in 237 balls over 335 Minutes
    batsmanRunsLikelihood("rohit.csv","Rohit")
    ## Summary of Rohit 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 43.24 % likelihood that Rohit will make 10 Runs in 21 balls over 32 Minutes
    ## There is a 45.95 % likelihood that Rohit will make 46 Runs in 85 balls over 124 Minutes
    ## There is a 10.81 % likelihood that Rohit will make 110 Runs in 199 balls over 282 Minutes
    batsmanRunsLikelihood("rahane.csv","Rahane")
    ## Summary of Rahane 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 7.75 % likelihood that Rahane will make 124 Runs in 224 balls over 318 Minutes
    ## There is a 62.02 % likelihood that Rahane will make 12 Runs in 26 balls over 37 Minutes
    ## There is a 30.23 % likelihood that Rahane will make 55 Runs in 113 balls over 162 Minutes
    batsmanRunsLikelihood("pujara.csv","Pujara")
    ## Summary of Pujara 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 60.49 % likelihood that Pujara will make 15 Runs in 38 balls over 55 Minutes
    ## There is a 31.48 % likelihood that Pujara will make 62 Runs in 142 balls over 204 Minutes
    ## There is a 8.02 % likelihood that Pujara will make 153 Runs in 319 balls over 445 Minutes
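
The three percentages in each summary add up to 100, which is what you get when the innings are split into three groups and each group's share and centre are reported. One way to produce a summary of that shape is k-means clustering on (Runs, BF, Mins). That choice, along with the synthetic innings below, is an assumption of this sketch and not a statement about how cricketr computes it.

```r
set.seed(42)

# Synthetic innings: balls faced, with runs and minutes loosely tied to them
BF   <- round(rexp(120, rate = 1 / 70)) + 1
Runs <- round(BF * runif(120, 0.3, 0.7))
Mins <- round(BF * runif(120, 1.3, 1.6))
innings <- data.frame(Runs, BF, Mins)

# Three clusters: roughly short, medium and long innings
fit <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, and the rounded cluster centres
share   <- 100 * fit$size / nrow(innings)
centers <- round(fit$centers)

# Print the clusters from shortest to longest innings
for (i in order(centers[, "Runs"])) {
  cat(sprintf("There is a %.2f %% likelihood of about %d Runs in %d balls over %d Minutes\n",
              share[i], centers[i, "Runs"], centers[i, "BF"], centers[i, "Mins"]))
}
```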

### 3h1. Moving average of batsman

Kohli’s moving average in Tests seems to have dropped after a peak in 2017 and 2018. So it is with Rahane.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanMovingAverage("kohli.csv","Kohli")
    batsmanMovingAverage("rohit.csv","Rohit")
    batsmanMovingAverage("rahane.csv","Rahane")
    batsmanMovingAverage("pujara.csv","Pujara")
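
As a reminder of what a moving average does to a runs series, here is a tiny trailing-window example in base R. The scores and the window of 3 innings are arbitrary illustrative choices, and cricketr may well smooth the series differently.

```r
# Invented scores in chronological order
runs   <- c(10, 50, 30, 70, 0, 120, 40, 60, 20, 90)
window <- 3

# Trailing moving average: mean of the current and previous (window - 1) innings.
# The first (window - 1) values are NA because the window is not yet full.
movAvg <- as.numeric(stats::filter(runs, rep(1 / window, window), sides = 1))
print(round(movAvg, 2))
```

A longer window gives a smoother curve but reacts more slowly to a change in form.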

### 3i. Cumulative Average runs of batsman in career

Kohli’s cumulative average settles at ~48. Shubman Gill’s cumulative average is on the rise.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanCumulativeAverageRuns("kohliTest.csv","Kohli")
    batsmanCumulativeAverageRuns("rohitTest.csv","Rohit")
    batsmanCumulativeAverageRuns("rahaneTest.csv","Rahane")
    batsmanCumulativeAverageRuns("pujaraTest.csv","Pujara")
    batsmanCumulativeAverageRuns("shubmanTest.csv","S Gill")
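
A cumulative average is simply the running mean up to each innings, and a cumulative strike rate is the running ratio of runs to balls. The sketch below shows both on invented numbers. It uses mean runs per innings, which is an assumption here (a conventional batting average divides by dismissals).

```r
# Invented scores and balls faced, in chronological order
runs  <- c(10, 50, 30, 70, 0, 120)
balls <- c(20, 80, 60, 100, 5, 200)

# Running mean of runs after each innings
cumAvg <- cumsum(runs) / seq_along(runs)

# Running strike rate: total runs per 100 balls so far
cumSR <- 100 * cumsum(runs) / cumsum(balls)

print(round(data.frame(innings = seq_along(runs), cumAvg, cumSR), 2))
```

Early values swing widely because few innings are averaged; the curve settles as the career lengthens, which is why a young player's line looks more volatile.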

### 3j. Cumulative Average strike rate of batsman in career

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanCumulativeStrikeRate("kohliTest.csv","Kohli")
    batsmanCumulativeStrikeRate("rohitTest.csv","Rohit")
    batsmanCumulativeStrikeRate("rahaneTest.csv","Rahane")
    batsmanCumulativeStrikeRate("pujaraTest.csv","Pujara")
    batsmanCumulativeStrikeRate("shubmanTest.csv","S Gill")

### 3k. Future Runs forecast

Here are plots that forecast how the batsmen will perform in future. In this case 90% of the career runs trend is used as the training set; the remaining 10% is the test set.

A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast and the actual runs match.

Take a look at the runs forecast for the batsmen below.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanPerfForecast("kohli.csv","Kohli")
    batsmanPerfForecast("rohit.csv","Rohit")
    batsmanPerfForecast("rahane.csv","Rahane")
    batsmanPerfForecast("pujara.csv","Pujara")
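
The 90/10 split and forecast described above can be tried out with base R's `HoltWinters()`. This is a minimal sketch on a synthetic runs series, and it uses the simplest non-seasonal variant (`beta = FALSE`, `gamma = FALSE`, i.e. exponential smoothing with no trend term). Both the data and that model choice are assumptions of the sketch, not necessarily what batsmanPerfForecast() does.

```r
set.seed(7)

# Synthetic runs-per-innings series (made up for illustration)
runs <- pmax(0, round(40 + cumsum(rnorm(100, 0, 3)) + rnorm(100, 0, 20)))

# First 90% of the series is the training set, last 10% the test set
n      <- length(runs)
nTrain <- floor(0.9 * n)
train  <- ts(runs[1:nTrain])
test   <- runs[(nTrain + 1):n]

# Non-seasonal exponential smoothing fitted on the training set
fit  <- HoltWinters(train, beta = FALSE, gamma = FALSE)
pred <- as.numeric(predict(fit, n.ahead = length(test)))

# Mean absolute error between the forecast and the held-out innings
mae <- mean(abs(pred - test))
cat("Held-out innings:", length(test), " MAE:", round(mae, 1), "\n")
```

Individual scores are very noisy, so a forecast like this is best read as an expected level rather than a prediction for any single innings.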

### 3l. Relative Mean Strike Rate plot

The plot below compares the mean strike rate of the batsmen for each runs range of 10.

    frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
    names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
    relativeBatsmanSR(frames,names)

### 3m. Relative Runs Frequency plot

The plot below gives the relative runs frequency percentages for each 10-run bucket.

    frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
    names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
    relativeRunsFreqPerf(frames,names)

### 3n. Relative cumulative average runs in career

Kohli tops the list, followed by Pujara, with Rohit 3rd. Gill is on the upswing. Hope he performs well.

    frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
    names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
    relativeBatsmanCumulativeAvgRuns(frames,names)

### 3o. Relative cumulative average strike rate in career

Rohit has the best strike rate, followed by Kohli, with Shubman Gill catching up fast.

    frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
    names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
    relativeBatsmanCumulativeStrikeRate(frames,names)

### 3p. Check Batsman In-Form or Out-of-Form

The computation below uses null hypothesis testing and the p-value to determine if the batsman is in-form or out-of-form. For this, 90% of the career runs is chosen as the population and its mean computed. The last 10% is chosen to be the sample set, and the sample mean and the sample standard deviation are calculated.

The Null Hypothesis (H0) assumes that the batsman continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the batsman is out of form, where the sample mean is beyond the 95% confidence interval of the population mean.

A significance level of 0.05 is chosen and the p-value is computed:

- If p-value >= 0.05 – Batsman In-Form
- If p-value < 0.05 – Batsman Out-of-Form

**Note** Ideally the p-value should be computed for a population that follows the Normal Distribution. But the runs population is usually right skewed (most scores are low, with a long tail of big scores). So some correction may be needed. I will revisit this later.

This is done for the five batsmen below.

    checkBatsmanInForm("kohli.csv","Kohli")
    ## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 154 Mean of population: 47.03 \n Sample size: 18 Mean of sample: 32.22 SD of sample: 42.45 \n\n Null hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population average\n\n Kohli 's Form Status: In-Form because the p value: 0.078058 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
    checkBatsmanInForm("rohit.csv","Rohit")
    ## [1] "**************************** Form status of Rohit ****************************\n\n Population size: 66 Mean of population: 37.03 \n Sample size: 8 Mean of sample: 37.88 SD of sample: 35.38 \n\n Null hypothesis H0 : Rohit 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Rohit 's sample average is below the 95% confidence interval of population average\n\n Rohit 's Form Status: In-Form because the p value: 0.526254 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
    checkBatsmanInForm("rahane.csv","Rahane")
    ## [1] "**************************** Form status of Rahane ****************************\n\n Population size: 116 Mean of population: 34.78 \n Sample size: 13 Mean of sample: 21.38 SD of sample: 21.96 \n\n Null hypothesis H0 : Rahane 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Rahane 's sample average is below the 95% confidence interval of population average\n\n Rahane 's Form Status: Out-of-Form because the p value: 0.023244 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
    checkBatsmanInForm("pujara.csv","Pujara")
    ## [1] "**************************** Form status of Pujara ****************************\n\n Population size: 145 Mean of population: 41.93 \n Sample size: 17 Mean of sample: 33.24 SD of sample: 31.74 \n\n Null hypothesis H0 : Pujara 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Pujara 's sample average is below the 95% confidence interval of population average\n\n Pujara 's Form Status: In-Form because the p value: 0.137319 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
    checkBatsmanInForm("shubman.csv","S Gill")
    ## [1] "**************************** Form status of S Gill ****************************\n\n Population size: 23 Mean of population: 30.43 \n Sample size: 3 Mean of sample: 51.33 SD of sample: 66.88 \n\n Null hypothesis H0 : S Gill 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : S Gill 's sample average is below the 95% confidence interval of population average\n\n S Gill 's Form Status: In-Form because the p value: 0.687033 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
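
The summary figures printed above are enough to redo the test by hand. The sketch below is a textbook lower-tailed one-sample t-test with n - 1 degrees of freedom; the helper `formPValue()` is a name made up for this example, and the exact degrees of freedom cricketr uses are not stated in this post, so expect values close to, but not necessarily identical with, the printed ones.

```r
# Lower-tailed one-sample t-test from summary statistics.
# formPValue is a hypothetical helper for this sketch, not a cricketr function.
formPValue <- function(popMean, sampleMean, sampleSD, n) {
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  pt(tStat, df = n - 1)
}

# Population mean, sample mean, sample SD and sample size copied from the output above
pKohli  <- formPValue(47.03, 32.22, 42.45, 18)
pRahane <- formPValue(34.78, 21.38, 21.96, 13)

alpha <- 0.05
cat("Kohli :", round(pKohli, 3),  ifelse(pKohli  >= alpha, "In-Form", "Out-of-Form"), "\n")
cat("Rahane:", round(pRahane, 3), ifelse(pRahane >= alpha, "In-Form", "Out-of-Form"), "\n")
```

Note how small the recent samples are (18 and 13 innings here, only 3 for Gill), so one big or small score can flip the verdict.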

### 3q. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at Crease.

    BF <- seq( 10, 400,length=15)
    Mins <- seq(30,600,length=15)
    newDF <- data.frame(BF,Mins)
    kohli1 <- batsmanRunsPredict("kohli.csv","Kohli",newdataframe=newDF)
    rohit1 <- batsmanRunsPredict("rohit.csv","Rohit",newdataframe=newDF)
    pujara1 <- batsmanRunsPredict("pujara.csv","Pujara",newdataframe=newDF)
    rahane1 <- batsmanRunsPredict("rahane.csv","Rahane",newdataframe=newDF)
    sgill1 <- batsmanRunsPredict("shubman.csv","S Gill",newdataframe=newDF)
    batsmen <- cbind(round(kohli1$Runs),round(rohit1$Runs),round(pujara1$Runs),round(rahane1$Runs),round(sgill1$Runs))
    colnames(batsmen) <- c("Kohli","Rohit","Pujara","Rahane","S Gill")
    newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
    colnames(newDF) <- c("BallsFaced","MinsAtCrease")
    predictedRuns <- cbind(newDF,batsmen)
    predictedRuns
    ##    BallsFaced MinsAtCrease Kohli Rohit Pujara Rahane S Gill
    ## 1          10           30     6     3      3      2      7
    ## 2          38           71    24    19     16     17     24
    ## 3          66          111    41    35     29     31     40
    ## 4          94          152    58    51     42     45     56
    ## 5         121          193    76    66     55     59     73
    ## 6         149          234    93    82     68     74     89
    ## 7         177          274   110    98     80     88    106
    ## 8         205          315   128   114     93    102    122
    ## 9         233          356   145   129    106    116    139
    ## 10        261          396   163   145    119    130    155
    ## 11        289          437   180   161    132    145    171
    ## 12        316          478   197   177    144    159    188
    ## 13        344          519   215   192    157    173    204
    ## 14        372          559   232   208    170    187    221
    ## 15        400          600   249   224    183    202    237
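
For readers curious what fitting such a plane looks like outside the package, here is a minimal `lm()` sketch with the same prediction grid. The innings are synthetic, and the code only illustrates the general technique (Runs regressed on BF and Mins); it is not a reproduction of batsmanRunsPredict().

```r
set.seed(11)

# Synthetic innings: runs depend on balls faced and minutes, plus noise
BF   <- round(runif(80, 5, 300))
Mins <- round(BF * runif(80, 1.3, 1.6))
Runs <- pmax(0, round(0.45 * BF + 0.05 * Mins + rnorm(80, 0, 8)))
innings <- data.frame(Runs, BF, Mins)

# Fit the plane Runs = b0 + b1 * BF + b2 * Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict over the same grid of balls faced and minutes used in the post
newDF <- data.frame(BF   = seq(10, 400, length.out = 15),
                    Mins = seq(30, 600, length.out = 15))
predicted <- cbind(round(newDF), Runs = round(predict(fit, newdata = newDF)))
print(predicted)
```

Balls faced and minutes at the crease are strongly correlated, so the individual coefficients are unstable even when the predictions are reasonable. Rows beyond the longest innings in the data are extrapolations.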

## 4. Analysis of India WTC batsmen from Jan 2016 – May 2023 against Australia

### 4a. Relative cumulative average

Against Australia specifically, between 2016 and 2023, Pujara has the best record, followed by Rahane, with Gill in hot pursuit. Kohli and Rohit trail behind.

    frames <- list("kohliTestAus.csv","rohitTestAus.csv","pujaraTestAus.csv","rahaneTestAus.csv","shubmanTestAus.csv")
    names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
    relativeBatsmanCumulativeAvgRuns(frames,names)

### 4b. Relative cumulative average strike rate in career

In the strike rate department Gill tops, followed by Rohit and Rahane.

    frames <- list("kohliTestAus.csv","rohitTestAus.csv","pujaraTestAus.csv","rahaneTestAus.csv","shubmanTestAus.csv")
    names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
    relativeBatsmanCumulativeStrikeRate(frames,names)

## 5. Analysis of Australia WTC batsmen from Jan 2016 – May 2023

### 5a. Basic analyses

    par(mfrow=c(3,3))
    par(mar=c(4,4,2,2))
    batsmanRunsFreqPerf("stevesmithTest.csv","S Smith")
    batsmanMeanStrikeRate("stevesmithTest.csv","S Smith")
    batsmanRunsRanges("stevesmithTest.csv","S Smith")
    batsmanRunsFreqPerf("warnerTest.csv","Warner")
    batsmanMeanStrikeRate("warnerTest.csv","Warner")
    batsmanRunsRanges("warnerTest.csv","Warner")
    batsmanRunsFreqPerf("labuschagneTest.csv","M Labuschagne")
    batsmanMeanStrikeRate("labuschagneTest.csv","M Labuschagne")
    batsmanRunsRanges("labuschagneTest.csv","M Labuschagne")

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    batsmanRunsFreqPerf("cgreenTest.csv","C Green")
    batsmanMeanStrikeRate("cgreenTest.csv","C Green")
    batsmanRunsRanges("cgreenTest.csv","C Green")
    batsmanRunsFreqPerf("khwajaTest.csv","Khwaja")
    batsmanMeanStrikeRate("khwajaTest.csv","Khwaja")
    batsmanRunsRanges("khwajaTest.csv","Khwaja")

### 5b. More analyses

    par(mfrow=c(3,3))
    par(mar=c(4,4,2,2))
    batsman4s("stevesmithTest.csv","S Smith")
    batsman6s("stevesmithTest.csv","S Smith")
    batsmanMeanStrikeRate("stevesmithTest.csv","S Smith")
    batsman4s("warnerTest.csv","Warner")
    batsman6s("warnerTest.csv","Warner")
    batsmanMeanStrikeRate("warnerTest.csv","Warner")
    batsman4s("labuschagneTest.csv","M Labuschagne")
    batsman6s("labuschagneTest.csv","M Labuschagne")
    batsmanMeanStrikeRate("labuschagneTest.csv","M Labuschagne")

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    batsman4s("cgreenTest.csv","C Green")
    batsman6s("cgreenTest.csv","C Green")
    batsmanMeanStrikeRate("cgreenTest.csv","C Green")
    batsman4s("khwajaTest.csv","Khwaja")
    batsman6s("khwajaTest.csv","Khwaja")
    batsmanMeanStrikeRate("khwajaTest.csv","Khwaja")

### 5c. Boxplot histogram plot

This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency.

Smith and Labuschagne have averaged 53+ since 2016!! Warner & Khawaja are at ~46.

    batsmanPerfBoxHist("stevesmithTest.csv","S Smith")
    batsmanPerfBoxHist("warnerTest.csv","Warner")
    batsmanPerfBoxHist("labuschagneTest.csv","M Labuschagne")
    batsmanPerfBoxHist("cgreenTest.csv","C Green")
    batsmanPerfBoxHist("khwajaTest.csv","Khwaja")

### 5d. Contribution to won and lost matches

For the 2 functions below you will have to use the getPlayerDataSp() function. Australia has won matches when Smith, Warner and Khawaja have played well.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanContributionWonLost("stevesmithsp.csv","S Smith")
    batsmanContributionWonLost("warnersp.csv","Warner")
    batsmanContributionWonLost("labuschagnesp.csv","M Labuschagne")
    batsmanContributionWonLost("cgreensp.csv","C Green")
    batsmanContributionWonLost("khwajasp.csv","Khwaja")

### 5e. Performance at home and overseas

This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanPerfHomeAway("stevesmithsp.csv","S Smith")
    batsmanPerfHomeAway("warnersp.csv","Warner")
    batsmanPerfHomeAway("labuschagnesp.csv","M Labuschagne")
    batsmanPerfHomeAway("cgreensp.csv","C Green")
    batsmanPerfHomeAway("khwajasp.csv","Khwaja")

### 5f. Batsman average at different venues

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanAvgRunsGround("stevesmithTest.csv","S Smith")
    batsmanAvgRunsGround("warnerTest.csv","Warner")
    batsmanAvgRunsGround("labuschagneTest.csv","M Labuschagne")
    batsmanAvgRunsGround("cgreenTest.csv","C Green")
    batsmanAvgRunsGround("khwajaTest.csv","Khwaja")

### 5g. Batsman average against different opposition

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanAvgRunsOpposition("stevesmithTest.csv","S Smith")
    batsmanAvgRunsOpposition("warnerTest.csv","Warner")
    batsmanAvgRunsOpposition("labuschagneTest.csv","M Labuschagne")
    batsmanAvgRunsOpposition("khwajaTest.csv","Khwaja")

### 5h. Runs Likelihood of batsman

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanRunsLikelihood("stevesmithTest.csv","S Smith")
    ## Summary of S Smith 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 58.76 % likelihood that S Smith will make 21 Runs in 38 balls over 56 Minutes
    ## There is a 24.74 % likelihood that S Smith will make 70 Runs in 148 balls over 210 Minutes
    ## There is a 16.49 % likelihood that S Smith will make 148 Runs in 268 balls over 398 Minutes
    batsmanRunsLikelihood("warnerTest.csv","Warner")
    ## Summary of Warner 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 7.22 % likelihood that Warner will make 155 Runs in 253 balls over 372 Minutes
    ## There is a 62.89 % likelihood that Warner will make 14 Runs in 21 balls over 32 Minutes
    ## There is a 29.9 % likelihood that Warner will make 65 Runs in 94 balls over 135 Minutes
    batsmanRunsLikelihood("labuschagneTest.csv","M Labuschagne")
    ## Summary of M Labuschagne 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 32.76 % likelihood that M Labuschagne will make 74 Runs in 144 balls over 206 Minutes
    ## There is a 55.17 % likelihood that M Labuschagne will make 22 Runs in 37 balls over 54 Minutes
    ## There is a 12.07 % likelihood that M Labuschagne will make 168 Runs in 297 balls over 420 Minutes
    batsmanRunsLikelihood("khwajaTest.csv","Khwaja")
    ## Summary of Khwaja 's runs scoring likelihood
    ## **************************************************
    ##
    ## There is a 64.94 % likelihood that Khwaja will make 14 Runs in 29 balls over 42 Minutes
    ## There is a 27.27 % likelihood that Khwaja will make 79 Runs in 148 balls over 210 Minutes
    ## There is a 7.79 % likelihood that Khwaja will make 165 Runs in 351 balls over 515 Minutes

### 5i. Moving average of batsman

Smith’s and Warner’s moving averages have been on a downward trend lately. Khawaja is playing well.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanMovingAverage("stevesmith.csv","S Smith")
    batsmanMovingAverage("warner.csv","Warner")
    batsmanMovingAverage("labuschagne.csv","M Labuschagne")
    batsmanMovingAverage("khwaja.csv","Khwaja")

### 5j. Cumulative Average runs of batsman in career

Labuschagne, Smith and Warner have very good cumulative averages.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanCumulativeAverageRuns("stevesmithTest.csv","S Smith")
    batsmanCumulativeAverageRuns("warnerTest.csv","Warner")
    batsmanCumulativeAverageRuns("labuschagneTest.csv","M Labuschagne")
    batsmanCumulativeAverageRuns("khwajaTest.csv","Khwaja")

### 5k. Cumulative Average strike rate of batsman in career

Warner towers over the others in the cumulative strike rate, followed by Labuschagne and Smith.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanCumulativeStrikeRate("stevesmithTest.csv","S Smith")
    batsmanCumulativeStrikeRate("warnerTest.csv","Warner")
    batsmanCumulativeStrikeRate("labuschagneTest.csv","M Labuschagne")
    batsmanCumulativeStrikeRate("khwajaTest.csv","Khwaja")

### 5l. Future Runs forecast

Here are plots that forecast how the batsmen will perform in future. In this case 90% of the career runs trend is used as the training set; the remaining 10% is the test set.

A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast and the actual runs match.

Take a look at the runs forecast for the batsmen below.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    batsmanPerfForecast("stevesmithTest.csv","S Smith")
    batsmanPerfForecast("warnerTest.csv","Warner")
    batsmanPerfForecast("labuschagneTest.csv","M Labuschagne")
    batsmanPerfForecast("khwajaTest.csv","Khwaja")

### 5m. Relative Mean Strike Rate plot

The plot below compares the mean strike rate of the batsmen for each runs range of 10.

    frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
    names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
    relativeBatsmanSR(frames,names)

### 5n. Relative Runs Frequency plot

The plot below gives the relative runs frequency percentages for each 10-run bucket.

    frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
    names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
    relativeRunsFreqPerf(frames,names)

### 5o. Relative cumulative average runs in career

    frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
    names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
    relativeBatsmanCumulativeAvgRuns(frames,names)

### 5p. Relative cumulative average strike rate in career

    frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
    names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
    relativeBatsmanCumulativeStrikeRate(frames,names)

### 5q. Check Batsman In-Form or Out-of-Form

The computation below uses null hypothesis testing and the p-value to determine if the batsman is in-form or out-of-form. For this, 90% of the career runs is chosen as the population and its mean computed. The last 10% is chosen to be the sample set, and the sample mean and the sample standard deviation are calculated.

The Null Hypothesis (H0) assumes that the batsman continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the batsman is out of form, where the sample mean is beyond the 95% confidence interval of the population mean.

A significance level of 0.05 is chosen and the p-value is computed:

- If p-value >= 0.05 – Batsman In-Form
- If p-value < 0.05 – Batsman Out-of-Form

**Note** Ideally the p-value should be computed for a population that follows the Normal Distribution. But the runs population is usually right skewed (most scores are low, with a long tail of big scores). So some correction may be needed. I will revisit this later.

This is done for the top 4 batsmen.

    checkBatsmanInForm("stevesmith.csv","S Smith")
    ## [1] "**************************** Form status of S Smith ****************************\n\n Population size: 144 Mean of population: 53.76 \n Sample size: 17 Mean of sample: 45.65 SD of sample: 56.4 \n\n Null hypothesis H0 : S Smith 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : S Smith 's sample average is below the 95% confidence interval of population average\n\n S Smith 's Form Status: In-Form because the p value: 0.280533 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBatsmanInForm("warner.csv","Warner")
    ## [1] "**************************** Form status of Warner ****************************\n\n Population size: 164 Mean of population: 45.2 \n Sample size: 19 Mean of sample: 26.63 SD of sample: 44.62 \n\n Null hypothesis H0 : Warner 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Warner 's sample average is below the 95% confidence interval of population average\n\n Warner 's Form Status: Out-of-Form because the p value: 0.042744 is less than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBatsmanInForm("labuschagne.csv","M Labuschagne")
    ## [1] "**************************** Form status of M Labuschagne ****************************\n\n Population size: 52 Mean of population: 59.56 \n Sample size: 6 Mean of sample: 29.67 SD of sample: 19.96 \n\n Null hypothesis H0 : M Labuschagne 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : M Labuschagne 's sample average is below the 95% confidence interval of population average\n\n M Labuschagne 's Form Status: Out-of-Form because the p value: 0.005239 is less than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBatsmanInForm("khwaja.csv","Khwaja")
    ## [1] "**************************** Form status of Khwaja ****************************\n\n Population size: 89 Mean of population: 41.62 \n Sample size: 10 Mean of sample: 53.1 SD of sample: 76.34 \n\n Null hypothesis H0 : Khwaja 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Khwaja 's sample average is below the 95% confidence interval of population average\n\n Khwaja 's Form Status: In-Form because the p value: 0.677691 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"
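
Since the note above says the runs distribution is skewed and a correction may be needed, here is a minimal sketch of one possible alternative: a resampling check that does not assume normality. This is not what cricketr does. The scores are synthetic, the 90%/10% split is assumed to mirror the form check, and all variable names are hypothetical.

```r
set.seed(1)

# Hypothetical career scores: many low scores and a long tail of big ones
career <- round(rexp(160, rate = 1 / 45))

n_total  <- length(career)
n_sample <- ceiling(0.1 * n_total)                 # most recent 10% of innings
population <- career[1:(n_total - n_sample)]       # first 90% of the career
recent     <- career[(n_total - n_sample + 1):n_total]

# Draw many samples of the same size from the population and record how
# often their mean is at least as low as the mean of the recent innings
boot_means <- replicate(5000, mean(sample(population, n_sample, replace = TRUE)))
p_value <- mean(boot_means <= mean(recent))

status <- if (p_value >= 0.05) "In-Form" else "Out-of-Form"
cat("Resampling p-value:", round(p_value, 3), "->", status, "\n")
```

The decision rule is the same as in the form check (alpha = 0.05); only the way the p-value is obtained differs.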

### 5r. Predicting Runs given Balls Faced and Minutes at Crease

A multivariate regression plane is fitted with Runs as the response and Balls Faced and Minutes at Crease as the predictors.

    # Grid of balls faced and minutes at crease to predict on
    BF <- seq( 10, 400,length=15)
    Mins <- seq(30,600,length=15)
    newDF <- data.frame(BF,Mins)

    ssmith1 <- batsmanRunsPredict("stevesmith.csv","S Smith",newdataframe=newDF)
    warner1 <- batsmanRunsPredict("warner.csv","Warner",newdataframe=newDF)
    khwaja1 <- batsmanRunsPredict("khwaja.csv","Khwaja",newdataframe=newDF)
    labuschagne1 <- batsmanRunsPredict("labuschagne.csv","Labuschagne",newdataframe=newDF)
    cgreen1 <- batsmanRunsPredict("cgreen.csv","C Green",newdataframe=newDF)

    # Combine the rounded predictions into one table
    batsmen <-cbind(round(ssmith1$Runs),round(warner1$Runs),round(khwaja1$Runs),round(labuschagne1$Runs),round(cgreen1$Runs))
    colnames(batsmen) <- c("S Smith","Warner","Khwaja","Labuschagne","C Green")
    newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
    colnames(newDF) <- c("BallsFaced","MinsAtCrease")
    predictedRuns <- cbind(newDF,batsmen)
    predictedRuns
    ##    BallsFaced MinsAtCrease S Smith Warner Khwaja Labuschagne C Green
    ## 1          10           30       7     10     10           9      13
    ## 2          38           71      23     30     24          24      29
    ## 3          66          111      38     50     38          40      44
    ## 4          94          152      53     70     53          55      60
    ## 5         121          193      69     90     67          70      75
    ## 6         149          234      84    110     81          85      91
    ## 7         177          274     100    130     95         100     106
    ## 8         205          315     115    150    109         116     122
    ## 9         233          356     130    170    123         131     137
    ## 10        261          396     146    190    137         146     153
    ## 11        289          437     161    210    151         161     168
    ## 12        316          478     177    230    165         176     184
    ## 13        344          519     192    250    179         192     199
    ## 14        372          559     207    270    193         207     215
    ## 15        400          600     223    290    207         222     230
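
To make the regression idea above concrete, here is a minimal base R sketch that fits a plane with `lm()` on synthetic innings data and predicts on the same grid. The data and coefficients are made up for illustration, and that `batsmanRunsPredict` works this way internally is an assumption.

```r
set.seed(42)

# Synthetic innings (hypothetical values, not taken from any player's CSV)
n    <- 60
BF   <- round(runif(n, 5, 300))                          # balls faced
Mins <- round(BF * 1.5 + rnorm(n, 0, 15))                # minutes at crease
Runs <- round(0.5 * BF + 0.05 * Mins + rnorm(n, 0, 8))   # runs scored
innings <- data.frame(Runs, BF, Mins)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict on the same grid of balls faced and minutes used in the post
newDF <- data.frame(BF   = seq(10, 400, length.out = 15),
                    Mins = seq(30, 600, length.out = 15))
predicted <- cbind(round(newDF), Runs = round(predict(fit, newdata = newDF)))
print(predicted)
```

Because balls faced and minutes at crease are strongly correlated, the individual coefficients are hard to interpret; the predictions along the grid are the more useful output.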

## 6. Analysis of Australia WTC batsmen from Jan 2016 – May 2023 against India

### 6a. Relative cumulative average runs in career

Labuschagne, Smith and C Green have good records against India

    frames <- list("stevesmithTestInd.csv","warnerTestInd.csv","khwajaTestInd.csv","labuschagneTestInd.csv","cgreenTestInd.csv")
    names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
    relativeBatsmanCumulativeAvgRuns(frames,names)

### 6b. Relative cumulative average strike rate in career

Warner, Labuschagne and Smith have a good strike rate against India

    frames <- list("stevesmithTestInd.csv","warnerTestInd.csv","khwajaTestInd.csv","labuschagneTestInd.csv","cgreenTestInd.csv")
    names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
    relativeBatsmanCumulativeStrikeRate(frames,names)

## 7. Analysis of India WTC bowlers from Jan 2016 – May 2023

### 7a Wickets frequency chart

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerWktsFreqPercent("shamiTest.csv","Shami")
    bowlerWktsFreqPercent("sirajTest.csv","Siraj")
    bowlerWktsFreqPercent("ashwinTest.csv","Ashwin")
    bowlerWktsFreqPercent("jadejaTest.csv","Jadeja")
    bowlerWktsFreqPercent("shardulTest.csv","Shardul")

### 7b Wickets Runs chart

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerWktsRunsPlot("shamiTest.csv","Shami")
    bowlerWktsRunsPlot("sirajTest.csv","Siraj")
    bowlerWktsRunsPlot("ashwinTest.csv","Ashwin")
    bowlerWktsRunsPlot("jadejaTest.csv","Jadeja")
    bowlerWktsRunsPlot("shardulTest.csv","Shardul")

### 7c. Average wickets at different venues

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerAvgWktsGround("shamiTest.csv","Shami")
    bowlerAvgWktsGround("sirajTest.csv","Siraj")
    bowlerAvgWktsGround("ashwinTest.csv","Ashwin")
    bowlerAvgWktsGround("jadejaTest.csv","Jadeja")
    bowlerAvgWktsGround("shardulTest.csv","Shardul")

### 7d Average wickets against different opposition

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerAvgWktsOpposition("shamiTest.csv","Shami")
    bowlerAvgWktsOpposition("sirajTest.csv","Siraj")
    bowlerAvgWktsOpposition("ashwinTest.csv","Ashwin")
    bowlerAvgWktsOpposition("jadejaTest.csv","Jadeja")
    bowlerAvgWktsOpposition("shardulTest.csv","Shardul")

### 7e Cumulative average wickets taken

Ashwin’s performance has dropped over the years, while Siraj has been improving.

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    bowlerCumulativeAvgWickets("shamiTest.csv","Shami")
    bowlerCumulativeAvgWickets("sirajTest.csv","Siraj")
    bowlerCumulativeAvgWickets("ashwinTest.csv","Ashwin")
    bowlerCumulativeAvgWickets("jadejaTest.csv","Jadeja")
    bowlerCumulativeAvgWickets("shardulTest.csv","Shardul")
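
For readers unfamiliar with the term, a cumulative average is the running mean of wickets per innings up to each point in a career. The sketch below shows the idea in base R on made-up numbers; that cricketr computes its curve in exactly this way is an assumption.

```r
# Hypothetical wickets per innings, in chronological order
wickets <- c(2, 0, 3, 1, 5, 2, 0, 4, 1, 3)

# Running total divided by the number of innings bowled so far
cum_avg <- cumsum(wickets) / seq_along(wickets)
print(round(cum_avg, 2))
```

A rising curve means recent innings are above the career average so far, and a falling curve means they are below it.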

### 7g Cumulative average economy rate

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerCumulativeAvgEconRate("shamiTest.csv","Shami")
    bowlerCumulativeAvgEconRate("sirajTest.csv","Siraj")
    bowlerCumulativeAvgEconRate("ashwinTest.csv","Ashwin")
    bowlerCumulativeAvgEconRate("jadejaTest.csv","Jadeja")
    bowlerCumulativeAvgEconRate("shardulTest.csv","Shardul")

### 7h Wicket forecast

Here are plots that forecast how the bowlers will perform in the future. In this case, 90% of the career wickets trend is used as the training set and the remaining 10% is the test set.

A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast matches the actual values.

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerPerfForecast("shamiTest.csv","Shami")
    #bowlerPerfForecast("sirajTest.csv","Siraj")
    bowlerPerfForecast("ashwinTest.csv","Ashwin")
    bowlerPerfForecast("jadejaTest.csv","Jadeja")
    bowlerPerfForecast("shardulTest.csv","Shardul")
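
The 90%/10% train-test idea described above can be sketched with base R's `HoltWinters()`. The wickets series is synthetic, and the choice of a non-seasonal, no-trend fit is an assumption made to keep the example small; it is not necessarily the configuration `bowlerPerfForecast` uses.

```r
set.seed(7)

# Hypothetical wickets-per-innings series (not real player data)
wkts <- pmax(0, round(2 + 0.01 * (1:80) + rnorm(80, 0, 1.2)))

# First 90% of the career is the training set, last 10% the test set
n_train <- floor(0.9 * length(wkts))
train   <- ts(wkts[1:n_train])
test    <- wkts[(n_train + 1):length(wkts)]

# gamma = FALSE drops the seasonal term and beta = FALSE drops the trend term,
# which leaves simple exponential smoothing with an estimated alpha
fit <- HoltWinters(train, gamma = FALSE, beta = FALSE)
fc  <- predict(fit, n.ahead = length(test))

# Put the forecast next to the held-out test values
comparison <- data.frame(actual = test, forecast = round(as.numeric(fc), 2))
print(comparison)
```

Comparing the two columns gives a rough sense of how far the forecast strays from what actually happened in the held-out innings.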

### 7i Relative Wickets Frequency Percentage

    frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
    names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
    relativeBowlingPerf(frames,names)

### 7j Relative Economy Rate against wickets taken

    frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
    names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
    relativeBowlingER(frames,names)

### 7k Relative cumulative average wickets of bowlers in career

Against all teams, Ashwin has the highest cumulative average wickets, followed by Jadeja.

    frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
    names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
    relativeBowlerCumulativeAvgWickets(frames,names)

### 7l Relative cumulative average economy rate of bowlers

Jadeja has the best economy rate followed by Ashwin

    frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
    names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
    relativeBowlerCumulativeAvgEconRate(frames,names)

### 7m Check for bowler in-form/out-of-form

The computation below uses null hypothesis testing and a p-value to determine whether the bowler is in-form or out-of-form. For this, 90% of the career wickets is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and sample standard deviation are calculated.

The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the bowler is out-of-form, where the sample mean is below the 95% confidence interval of the population mean.

A significance level of 0.05 is chosen and the p-value is computed.

- If p-value >= 0.05 – Bowler In-Form
- If p-value < 0.05 – Bowler Out-of-Form

**Note** Ideally the p-value should be computed for a population that follows the Normal distribution. However, the wickets distribution is usually skewed, so some correction may be needed. I will revisit this later.

**Note:** The check for the form status of the bowlers indicates that Shami, Ashwin and Jadeja are in-form, while Siraj and Shardul are out-of-form.

    checkBowlerInForm("shami.csv","Shami")
    ## [1] "**************************** Form status of Shami ****************************\n\n Population size: 106 Mean of population: 1.93 \n Sample size: 12 Mean of sample: 1.33 SD of sample: 1.23 \n\n Null hypothesis H0 : Shami 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Shami 's sample average is below the 95% confidence\n interval of population average\n\n Shami 's Form Status: In-Form because the p value: 0.058427 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("siraj.csv","Siraj")
    ## [1] "**************************** Form status of Siraj ****************************\n\n Population size: 29 Mean of population: 1.59 \n Sample size: 4 Mean of sample: 0.25 SD of sample: 0.5 \n\n Null hypothesis H0 : Siraj 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Siraj 's sample average is below the 95% confidence\n interval of population average\n\n Siraj 's Form Status: Out-of-Form because the p value: 0.002923 is less than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("ashwin.csv","Ashwin")
    ## [1] "**************************** Form status of Ashwin ****************************\n\n Population size: 154 Mean of population: 2.77 \n Sample size: 18 Mean of sample: 2.44 SD of sample: 1.76 \n\n Null hypothesis H0 : Ashwin 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Ashwin 's sample average is below the 95% confidence\n interval of population average\n\n Ashwin 's Form Status: In-Form because the p value: 0.218345 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("jadeja.csv","Jadeja")
    ## [1] "**************************** Form status of Jadeja ****************************\n\n Population size: 108 Mean of population: 2.22 \n Sample size: 12 Mean of sample: 1.92 SD of sample: 2.35 \n\n Null hypothesis H0 : Jadeja 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Jadeja 's sample average is below the 95% confidence\n interval of population average\n\n Jadeja 's Form Status: In-Form because the p value: 0.333095 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("shardul.csv","Shardul")
    ## [1] "**************************** Form status of Shardul ****************************\n\n Population size: 13 Mean of population: 2 \n Sample size: 2 Mean of sample: 0.5 SD of sample: 0.71 \n\n Null hypothesis H0 : Shardul 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Shardul 's sample average is below the 95% confidence\n interval of population average\n\n Shardul 's Form Status: Out-of-Form because the p value: 0.04807 is less than alpha= 0.05 \n *******************************************************************************************\n\n"
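
The summary statistics printed above are enough to sketch the flavour of this test in a few lines of base R. The version below is a textbook one-sided, one-sample t-test with n - 1 degrees of freedom; the helper names `form_p_value` and `form_status` are hypothetical, and whether cricketr computes its p-value in exactly this way is an assumption, so the p-values need not match the package output to the last digit.

```r
# Lower-tail p-value for H0: the sample mean is not below the population mean
form_p_value <- function(pop_mean, sample_mean, sample_sd, n) {
  t_stat <- (sample_mean - pop_mean) / (sample_sd / sqrt(n))
  pt(t_stat, df = n - 1)
}

# Decision rule from the post: p >= alpha means in-form
form_status <- function(p, alpha = 0.05) {
  if (p >= alpha) "In-Form" else "Out-of-Form"
}

# Summary statistics copied from the Shami and Siraj outputs above
p_shami <- form_p_value(pop_mean = 1.93, sample_mean = 1.33, sample_sd = 1.23, n = 12)
p_siraj <- form_p_value(pop_mean = 1.59, sample_mean = 0.25, sample_sd = 0.50, n = 4)

cat("Shami:", form_status(p_shami), "\n")
cat("Siraj:", form_status(p_siraj), "\n")
```

With these inputs the sketch reaches the same verdicts as the package output, Shami in-form and Siraj out-of-form. Note how small the recent samples are (4 innings for Siraj, 2 for Shardul), which is one reason to treat the out-of-form labels with caution.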

## 8. Analysis of India WTC bowlers from Jan 2016 – May 2023 against Australia

### 8a Relative cumulative average wickets of bowlers in career

Against Australia specifically Jadeja has the best record followed by Ashwin

    frames <- list("shamiTestAus.csv","sirajTestAus.csv","ashwinTestAus.csv","jadejaTestAus.csv","shardulTestAus.csv")
    names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
    relativeBowlerCumulativeAvgWickets(frames,names)

### 8b Relative cumulative average economy rate of bowlers

Jadeja has the best economy followed by Siraj, then Ashwin

    frames <- list("shamiTestAus.csv","sirajTestAus.csv","ashwinTestAus.csv","jadejaTestAus.csv","shardulTestAus.csv")
    names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
    relativeBowlerCumulativeAvgEconRate(frames,names)

## 8. Analysis of Australia WTC bowlers from Jan 2016 – May 2023

### 8a. Wickets frequency chart

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerWktsFreqPercent("cumminsTest.csv","Cummins")
    bowlerWktsFreqPercent("starcTest.csv","Starc")
    bowlerWktsFreqPercent("hazzlewoodTest.csv","Hazzlewood")
    bowlerWktsFreqPercent("todd.csv","Todd")
    bowlerWktsFreqPercent("lyonTest.csv","N Lyon")

### 8b. Wickets Runs chart

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerWktsRunsPlot("cumminsTest.csv","Cummins")
    bowlerWktsRunsPlot("starcTest.csv","Starc")
    bowlerWktsRunsPlot("hazzlewoodTest.csv","Hazzlewood")
    bowlerWktsRunsPlot("todd.csv","Todd")
    bowlerWktsRunsPlot("lyonTest.csv","N Lyon")

### 8c. Average wickets at different venues

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerAvgWktsGround("cumminsTest.csv","Cummins")
    bowlerAvgWktsGround("starcTest.csv","Starc")
    bowlerAvgWktsGround("hazzlewoodTest.csv","Hazzlewood")
    bowlerAvgWktsGround("todd.csv","Todd")
    bowlerAvgWktsGround("lyonTest.csv","N Lyon")

### 8d Average wickets against different opposition

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerAvgWktsOpposition("cumminsTest.csv","Cummins")
    bowlerAvgWktsOpposition("starcTest.csv","Starc")
    bowlerAvgWktsOpposition("hazzlewoodTest.csv","Hazzlewood")
    bowlerAvgWktsOpposition("todd.csv","Todd")
    bowlerAvgWktsOpposition("lyonTest.csv","N Lyon")

### 8e Cumulative average wickets taken

    par(mfrow=c(2,2))
    par(mar=c(4,4,2,2))
    bowlerCumulativeAvgWickets("cumminsTest.csv","Cummins")
    bowlerCumulativeAvgWickets("starcTest.csv","Starc")
    bowlerCumulativeAvgWickets("hazzlewoodTest.csv","Hazzlewood")
    bowlerCumulativeAvgWickets("todd.csv","Todd")
    bowlerCumulativeAvgWickets("lyonTest.csv","N Lyon")

### 8g Cumulative average economy rate

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerCumulativeAvgEconRate("cumminsTest.csv","Cummins")
    bowlerCumulativeAvgEconRate("starcTest.csv","Starc")
    bowlerCumulativeAvgEconRate("hazzlewoodTest.csv","Hazzlewood")
    bowlerCumulativeAvgEconRate("todd.csv","Todd")
    bowlerCumulativeAvgEconRate("lyonTest.csv","N Lyon")

### 8f. Future Wickets forecast

Here are plots that forecast how the bowlers will perform in the future. In this case, 90% of the career wickets trend is used as the training set and the remaining 10% is the test set.

A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast matches the actual values.

    par(mfrow=c(2,3))
    par(mar=c(4,4,2,2))
    bowlerPerfForecast("cumminsTest.csv","Cummins")
    bowlerPerfForecast("starcTest.csv","Starc")
    bowlerPerfForecast("hazzlewoodTest.csv","Hazzlewood")
    bowlerPerfForecast("lyonTest.csv","N Lyon")

### 8i. Relative Wickets Frequency Percentage

    frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
    names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
    relativeBowlingPerf(frames,names)

### 8j Relative Economy Rate against wickets taken

    frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
    names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
    relativeBowlingER(frames,names)

### 8k Relative cumulative average wickets of bowlers in career

Cummins, Starc and Lyon are the best performers.

    frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
    names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
    relativeBowlerCumulativeAvgWickets(frames,names)

### 8l Relative cumulative average economy rate of bowlers

Hazlewood and Cummins have the best economy rates against all opposition.

    frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
    names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
    relativeBowlerCumulativeAvgEconRate(frames,names)

### 8o Check for bowler in-form/out-of-form

The computation below uses null hypothesis testing and a p-value to determine whether the bowler is in-form or out-of-form. For this, 90% of the career wickets is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and sample standard deviation are calculated.

The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, where the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the bowler is out-of-form, where the sample mean is below the 95% confidence interval of the population mean.

**Note** Ideally the p-value should be computed for a population that follows the Normal distribution. However, the wickets distribution is usually skewed, so some correction may be needed. I will revisit this later.

**Note:** The check for the form status of the bowlers indicates that Cummins, Starc, Hazlewood and Lyon are all in-form.

    checkBowlerInForm("cummins.csv","Cummins")
    ## [1] "**************************** Form status of Cummins ****************************\n\n Population size: 81 Mean of population: 2.46 \n Sample size: 9 Mean of sample: 2 SD of sample: 1.5 \n\n Null hypothesis H0 : Cummins 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Cummins 's sample average is below the 95% confidence\n interval of population average\n\n Cummins 's Form Status: In-Form because the p value: 0.190785 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("starc.csv","Starc")
    ## [1] "**************************** Form status of Starc ****************************\n\n Population size: 126 Mean of population: 2.18 \n Sample size: 15 Mean of sample: 1.67 SD of sample: 1.18 \n\n Null hypothesis H0 : Starc 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Starc 's sample average is below the 95% confidence\n interval of population average\n\n Starc 's Form Status: In-Form because the p value: 0.057433 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("hazzlewood.csv","Hazzlewood")
    ## [1] "**************************** Form status of Hazzlewood ****************************\n\n Population size: 99 Mean of population: 2.04 \n Sample size: 12 Mean of sample: 1.67 SD of sample: 1.5 \n\n Null hypothesis H0 : Hazzlewood 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : Hazzlewood 's sample average is below the 95% confidence\n interval of population average\n\n Hazzlewood 's Form Status: In-Form because the p value: 0.204787 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

    checkBowlerInForm("lyon.csv","N Lyon")
    ## [1] "**************************** Form status of N Lyon ****************************\n\n Population size: 193 Mean of population: 2.08 \n Sample size: 22 Mean of sample: 2.95 SD of sample: 1.96 \n\n Null hypothesis H0 : N Lyon 's sample average is within 95% confidence interval \n of population average\n Alternative hypothesis Ha : N Lyon 's sample average is below the 95% confidence\n interval of population average\n\n N Lyon 's Form Status: In-Form because the p value: 0.975407 is greater than alpha= 0.05 \n *******************************************************************************************\n\n"

## 9. Analysis of Australia WTC bowlers from Jan 2016 – May 2023 against India

### 9a Relative cumulative average wickets of bowlers in career

Against India, Lyon, Cummins and Hazlewood have performed well.

    frames <- list("cumminsTestInd.csv","starcTestInd.csv","hazzlewoodTestInd.csv","lyonTestInd.csv")
    names <- list("Cummins","Starc","Hazzlewood","N Lyon")
    relativeBowlerCumulativeAvgWickets(frames,names)

### 9b Relative cumulative average economy rate of bowlers

Hazlewood and Lyon have good economy rates against India.

    frames <- list("cumminsTestInd.csv","starcTestInd.csv","hazzlewoodTestInd.csv","lyonTestInd.csv")
    names <- list("Cummins","Starc","Hazzlewood","N Lyon")
    relativeBowlerCumulativeAvgEconRate(frames,names)

## 10 Analysis of teams – India, Australia

    # The data for the India & Australia teams were obtained with the following calls
    #indiaTest <- getTeamDataHomeAway(dir=".",teamView="bat",matchType="Test",file="indiaTest.csv",save=TRUE,teamName="India")
    #australiaTest <- getTeamDataHomeAway(matchType="Test",file="australiaTest.csv",save=TRUE,teamName="Australia")

### 10a. Win-loss of India against all oppositions in Test cricket

Against Australia India has won 17 times, lost 60 and drawn 22 in Australia. At home India won 42, tied 2, lost 28 and drawn 24

teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

### 10b. Win-loss of Australia against all oppositions in Test cricket

teamWinLossStatusVsOpposition("australiaTest.csv",teamName="Australia",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

### 10c. Win-loss of India against Australia in Test cricket

Against Australia India has won 17 times, lost 60 and drawn 22 in Australia. At home India won 42, tied 2, lost 28 and drawn 24

teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("Australia"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

### 10d. Win-loss of India at all away venues

At the Oval, where the WTC final is going to be held, India has won 4, lost 10 and drawn 10.

teamWinLossStatusAtGrounds("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("away"),matchType="Test",plot=TRUE)

### 10e. Timeline of win-loss of India against Australia in Test cricket

plotTimelineofWinsLosses("indiaTest.csv",team="India",opposition=c("Australia"), homeOrAway=c("away","neutral"), startDate="2016-01-01",endDate="2023-05-01")

## 11. Conclusion

The above analysis examines India and Australia in home and away matches. While we know how the players have performed in India or Australia, we cannot judge how the match will progress in the neutral, swinging conditions of the Oval. Let us hope for a good match!

Feel free to try out your own analysis with cricketr. Have fun with cricketr!!

Also see

- GooglyPlusPlus: Win Probability using Deep Learning and player embeddings
- The common alphabet of programming languages
- Practical Machine Learning with R and Python – Part 5
- Deep Learning from first principles in Python, R and Octave – Part 4
- Big Data-4: Webserver log analysis with RDDs, Pyspark, SparkR and SparklyR
- Cricpy takes guard for the Twenty20s
- Using Reinforcement Learning to solve Gridworld
- Exploring Quantum Gate operations with QCSimulator

To see all posts click Index of posts
