(This article was first published on **Quantifying Memory**, and kindly contributed to R-bloggers)

*R and Python have different strengths. There’s little you can do in R that you absolutely can’t do in Python and vice versa, but there’s a lot of stuff that’s really annoying in one and nice and simple in the other. I’m sure simulations can be run in R, but it seems frightfully tricky. Recently I wrote a simple tennis simulator in Python, which implements all the rules of tennis and allows player skill to be entered. It prints running scores as the game goes or, if asked to, runs a large number of matches and calculates win percentages. I quickly found that the structure of tennis is such that marginal gains are really valuable, as only a small increase in skill translates into a large increase in the number of matches won. How about mapping this? What does the relationship between skill and tennis matches won look like? Where exactly is the cut-off point of skill, below which winning is not just lucky, but impossible? Does increasing the ‘serve bonus’, meaning service holds are very likely, improve or reduce the odds for the underdog?*
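To see why marginal gains matter so much, it helps to look at a single game. The sketch below is not taken from the simulator: it assumes every point is won independently with the same probability `p` (a simplification, and not the simulator's skill parameter), and the function name `game_win_probability` is made up for illustration. Under that assumption the chance of winning a game has a closed form.

```python
def game_win_probability(p):
    """Chance of winning one game, assuming each point is won
    independently with probability p (an illustrative simplification)."""
    q = 1.0 - p
    # Win before deuce: take 4 points while the opponent takes 0, 1 or 2.
    # The last point always goes to the winner, hence C(3,0)=1, C(4,1)=4, C(5,2)=10.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce at three points each: C(6,3) = 20 orderings.
    reach_deuce = 20 * p**3 * q**3
    # From deuce, P = p^2 + 2pq * P, so P = p^2 / (1 - 2pq).
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


if __name__ == "__main__":
    for p in (0.50, 0.52, 0.55, 0.60):
        print(p, round(game_win_probability(p), 3))
```

With these assumptions a 55% chance per point already becomes roughly a 62% chance per game, and the same amplification repeats at the set and match level.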

To answer these questions I decided to run the Python simulator from within R and collect the output of simulations under different conditions. The first step was to get the Python script running through R, which meant making it executable from the command line. The simulator I used is the one I posted here previously. To it I added only the following code, which takes the arguments from the command prompt and maps them to variables that Python then passes to the simulator's runApp function:

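    import sys  # needed for sys.argv and sys.exit, if the simulator does not already import it

    def main(argv=None):
        # Fall back to the command-line arguments when none are passed in
        if argv is None:
            argv = sys.argv
        # Exit quietly if the script was called without arguments
        if not argv[1:]:
            sys.exit()
        number = int(argv[1])        # number of matches to simulate
        player1 = argv[2]
        skill1 = int(argv[3])
        player2 = argv[4]
        skill2 = int(argv[5])
        serveBonus = float(argv[6])
        # runApp is defined in the simulator script itself
        runApp(number, player1, skill1, player2, skill2, serveBonus)

    if __name__ == "__main__":
        sys.exit(main())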
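An alternative to reading `sys.argv` by position is the standard library's `argparse`, which adds type checking and a usage message for free. This is only a sketch of how the same six arguments could be declared; the function name `parse_args` and the attribute name `serve_bonus` are my own choices for illustration, not part of the simulator.

```python
import argparse


def parse_args(argv=None):
    """Parse the six positional arguments the simulator expects.

    With argv=None, argparse reads sys.argv[1:] itself.
    """
    parser = argparse.ArgumentParser(description="Run tennis match simulations")
    parser.add_argument("number", type=int, help="number of matches to simulate")
    parser.add_argument("player1", help="name of the first player")
    parser.add_argument("skill1", type=int, help="skill of the first player")
    parser.add_argument("player2", help="name of the second player")
    parser.add_argument("skill2", type=int, help="skill of the second player")
    parser.add_argument("serve_bonus", type=float, help="bonus applied to the server")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # In the simulator script these values would then be handed to runApp.
    print(args)
```

Calling the script with too few arguments then prints a usage message rather than exiting silently.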

[Figure: my output from the simulator]

    results = NULL
    minSkill = 0
    maxSkill = 200
    for (i in minSkill:maxSkill) {
      # run 100 matches at each skill level and capture the number printed by the script
      results = c(results, as.numeric(system(paste0("python tennis.py 100 murray ",
        i, " djokovic 100 0.5"), intern = TRUE)))
    }
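For anyone who would rather run the sweep from Python and only plot in R, the loop above could be sketched with `subprocess`. Like the R version, this assumes `tennis.py` prints a single number and nothing else; the helper names `build_command`, `run_once` and `sweep` are hypothetical.

```python
import subprocess


def build_command(skill, script="tennis.py", matches=100, serve_bonus=0.5):
    """Build the same command line the R loop pastes together."""
    return ["python", script, str(matches), "murray", str(skill),
            "djokovic", "100", str(serve_bonus)]


def run_once(command):
    """Run one simulation and parse its printed result as a number.

    Assumes the script writes exactly one numeric value to stdout.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return float(completed.stdout.strip())


def sweep(min_skill, max_skill, runner=run_once):
    """Collect (skill, result) pairs for every skill level in the range."""
    return [(skill, runner(build_command(skill)))
            for skill in range(min_skill, max_skill + 1)]


if __name__ == "__main__":
    # Requires tennis.py in the working directory.
    for skill, wins in sweep(0, 200):
        print(skill, wins)
```

The `runner` argument is there only so the loop can be tried out without the simulator present.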

    library(ggplot2)
    df = data.frame(results)
    df$skill = minSkill:maxSkill
    ggplot(df, aes(skill, results)) +
      geom_hline(yintercept = 50, colour = "red") +  # add the reference point of 50% matches won
      geom_point() +  # show individual points
      geom_smooth(span = .5) +  # trend line
      ylab("n matches won (of 100)") +
      ggtitle("Number of matches won by Murray v Djokovic (skill=100)")

    library(ggplot2)
    ggplot(df[df$skill > 90 & df$skill < 110, ], aes(skill, results)) +
      geom_hline(yintercept = 50, colour = "red") +  # add the reference point of 50% matches won
      geom_point() +  # show individual points
      geom_smooth(method = "lm") +  # trend line
      ylab("n matches won (of 100)") +
      ggtitle("Number of matches won by Murray v Djokovic (skill=100)")

    results1 = NULL
    minSkill = 30
    maxSkill = 180
    for (i in minSkill:maxSkill) {
      results1 = c(results1, as.numeric(system(paste0("python tennis.py 100 murray ", i, " djokovic 100 0"), intern = TRUE)))
    }

    results2 = NULL
    minSkill = 30
    maxSkill = 180
    for (i in minSkill:maxSkill) {
      results2 = c(results2, as.numeric(system(paste0("python tennis2.py 100 murray ", i, " djokovic 100 2"), intern = TRUE)))
    }

    df1 = data.frame(results1)
    df1$skill = minSkill:maxSkill
    df1$sets = "three"
    colnames(df1)[1] <- "results"

    df2 = data.frame(results2)
    df2$skill = minSkill:maxSkill
    df2$sets = "two"
    colnames(df2)[1] <- "results"

    df = rbind(df1, df2)
    ggplot(df[df$skill > 50 & df$skill < 150, ], aes(skill, results, group = sets, colour = sets)) +
      geom_hline(yintercept = 50, colour = "red") +  # add the reference point of 50% matches won
      geom_point() +  # show individual points
      geom_smooth() +  # trend line
      ylab("n matches won (of 100)") +
      ggtitle("Number of matches won by Murray v Djokovic (skill=100)")

Clearly, playing three sets favours the stronger player. In a two-set match, an increase in skill buys a smaller increase in the chance of winning. The women’s side is often seen as weaker than the men’s side because of more surprises and new names reaching the late stages of tournaments. However, at least part of the reason for this must be that luck plays a noticeably larger role in two-set matches than in three-set matches.
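The same effect can be checked without simulation. The sketch below rests on two assumptions of mine: that "two" and "three" mean the number of sets needed to win the match, and that every set is won independently with a fixed probability `p_set`. The function name `match_win_probability` is invented for illustration.

```python
from math import comb


def match_win_probability(p_set, sets_to_win):
    """Chance of winning a match that ends when one player has sets_to_win sets.

    Assumes every set is won independently with probability p_set.
    """
    q = 1.0 - p_set
    # The winner always takes the final set; the loser takes k of the
    # sets played before it, where k runs from 0 to sets_to_win - 1.
    return sum(comb(sets_to_win - 1 + k, k) * p_set**sets_to_win * q**k
               for k in range(sets_to_win))


if __name__ == "__main__":
    for p in (0.55, 0.60, 0.70):
        print(p,
              round(match_win_probability(p, 2), 3),
              round(match_win_probability(p, 3), 3))
```

For any `p_set` above 0.5 the first-to-three figure is higher than the first-to-two figure, which is the pattern the plot shows: the longer format leaves less room for luck.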
