# Infidelity and econometrics

January 17, 2012
By

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

On http://www.bakadesuyo.com, there was recently an interesting discussion about infidelity, the key question being "at what ages are men and women most likely to have affairs?" The discussion is based on some graphs, e.g. The source is a paper by Donald Cox. Based on a sample of 36 men and 22 women. And to be honest, I have been surprised by the shape of the curves. Especially for men... In order to compare, it is possible to use a dataset that can be found in R,
> library(Ecdat)
> data(Fair)
> tail(Fair)
sex age   ym child religious education occupation
596   male  47 15.0   yes         3        16          4
597   male  22  1.5   yes         1        12          2
598 female  32 10.0   yes         2        18          5
599   male  32 10.0   yes         2        17          6
600   male  22  7.0   yes         3        18          6
601 female  32 15.0   yes         3        14          1
rate nbaffairs
596    2         7
597    5         1
598    4         7
599    5         2
600    2         2
601    5         1
with 601 observations (from Fair (1977)). It is possible to run a Poisson regression to describe the number of affairs in the past year. E.g for men
> library(splines)
> regM=glm(nbaffairs~bs(age),family=poisson,
+ data=Fair[Fair$sex=="male",]) > a=seq(20,60) > N=predict(regM,newdata=data.frame(age=a),type="response") > plot(a,N,type="l",lwd=2,col="red") or for women, > regF=glm(nbaffairs~bs(age),family=poisson, + data=Fair[Fair$sex=="female",])
> N=predict(regF,newdata=data.frame(age=a),type="response")
> plot(a,N,type="l",lwd=2,col="red",lty=2)
On that (larger) dataset, we obtain curves that are more intuitive... But maybe the Poisson distribution is not an appropriate model. For instance, having no affairs do not mean that the person did not want to... So perhaps, a more interesting model would be a Poisson model with a zero-inflation, i.e. some people are honest and do not want to have affairs (and appear as 0), while some do want to have some affairs, and the number of affairs is Poisson distributed (and can take the value 0). If we focus on people wo do not want to have affairs, the model (and the prediction) is the following, where we plot the probability of not being interested in having an affair,
> library(pscl)
> regM0=zeroinfl(nbaffairs~bs(age)|bs(age),family=poisson,
> N0=predict(regM0,newdata=data.frame(age=a),type="zero")
> plot(a,N0,type="l",lwd=2,col="blue")
For those willing to have an affair, here is the parameter of the Poisson distribution of the number of affairs,
> Nc=predict(regM0,newdata=data.frame(age=a),type="count")
> plot(a,Nc,type="l",lwd=2,col="purple")
The same can be done for women, with the probability of no-willing to have an affair, and to Poisson rate for women willing to have an affair, If we focus on people willing to have an affair, the curves are the following, i.e. men below 40 have more interested, but after 40, the probability drops, while women are still more and more likely to be willing to have an affair. On the other hand, young women having affairs might be less, but they usually have much more affairs than men...

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...