**Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers)

On http://www.bakadesuyo.com, there was recently an interesting discussion about infidelity, the key question being “*at what ages are men and women most likely to have affairs?*” The discussion is based on some graphs, e.g.

The source is a paper by Donald Cox. Based on a sample of 36 men and 22 women. And to be honest, I have been surprised by the shape of the curves. Especially for men… In order to compare, it is possible to use a dataset that can be found in R,

> library(Ecdat) > data(Fair) > tail(Fair) sex age ym child religious education occupation 596 male 47 15.0 yes 3 16 4 597 male 22 1.5 yes 1 12 2 598 female 32 10.0 yes 2 18 5 599 male 32 10.0 yes 2 17 6 600 male 22 7.0 yes 3 18 6 601 female 32 15.0 yes 3 14 1 rate nbaffairs 596 2 7 597 5 1 598 4 7 599 5 2 600 2 2 601 5 1

with 601 observations (from Fair (1977)). It is possible to run a Poisson regression to describe the number of affairs in the past year. E.g for men

> library(splines) > regM=glm(nbaffairs~bs(age),family=poisson, + data=Fair[Fair$sex=="male",]) > a=seq(20,60) > N=predict(regM,newdata=data.frame(age=a),type="response") > plot(a,N,type="l",lwd=2,col="red")

or for women,

> regF=glm(nbaffairs~bs(age),family=poisson, + data=Fair[Fair$sex=="female",]) > N=predict(regF,newdata=data.frame(age=a),type="response") > plot(a,N,type="l",lwd=2,col="red",lty=2)

On that (larger) dataset, we obtain curves that are more intuitive… But maybe the Poisson distribution is not an appropriate model. For instance, having *no affairs *do not mean that the person did not want to… So perhaps, a more interesting model would be a Poisson model with a zero-inflation, i.e. some people are honest and do not want to have affairs (and appear as 0), while some do want to have some affairs, and the number of affairs is Poisson distributed (and can take the value 0). If we focus on people wo do not want to have affairs, the model (and the prediction) is the following, where we plot the probability of not being interested in having an affair,

> library(pscl) > regM0=zeroinfl(nbaffairs~bs(age)|bs(age),family=poisson, + link="logit",data=Fair[Fair$sex=="male",]) > N0=predict(regM0,newdata=data.frame(age=a),type="zero") > plot(a,N0,type="l",lwd=2,col="blue")

For those willing to have an affair, here is the parameter of the Poisson distribution of the number of affairs,

> Nc=predict(regM0,newdata=data.frame(age=a),type="count") > plot(a,Nc,type="l",lwd=2,col="purple")

The same can be done for women, with the probability of no-willing to have an affair,

and to Poisson rate for women willing to have an affair,

If we focus on people willing to have an affair, the curves are the following,

i.e. men below 40 have more interested, but after 40, the probability drops, while women are still more and more likely to be willing to have an affair. On the other hand, young women having affairs might be less, but they usually have much more affairs than men…

**leave a comment**for the author, please follow the link and comment on his blog:

**Freakonometrics - Tag - R-english**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...