# Where hiding if you don’t want to get wet ?

April 5, 2012
By

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Following the previous post, two additional remarks. Following a comment by @cosi, I have investigated quickly a binomial fit to the distribution of the number of people not getting wet, with a fixed number of players on the field. It looks like it should be a binomial distribution with a fixed probability (2/3) and with size parameter affine in the number of players. @guigui suggested some connexion with with "birds on a wire" problem (see e.g. http://www.cut-the-knot.org/)

n=p=rep(NA,20)
for(i in 1:20){
NSim=10000
N=Vectorize(NOTWET)(n=rep(3+2*i,NSim))
n[i]=mean(N)/(1-var(N)/mean(N))
p[i]=1-var(N)/mean(N)
}
plot(seq(5,43,by=2),n,col="red",type="b")

for the implied size parameter above, and below the implied probability parameter.

plot(seq(5,43,by=2),p,col="blue",type="b")

(as functions of the number of players). I'd be glad to get more details on that 2/3 probability.

Now, let us investigate another question sent by email: "Where should you hide if you don't want to get wet ?" A first idea could be the following: given that some players are already on the field, where should I go if I do not want to get wet ? Below are some simulations for 7 or 25 players (already on the field). The red area is the area so that I will become someone's target (perhaps even the target of two players...). The green area is the safe zone.

(with 7 players above, and 25 below)

It looks like, on the border, it might be safer than in the middle of the field. But we have to confirm that intuition... or at least see if that intuition is valid.

Based on what was done the other day, it is possible to look where people that got wet were located (instead of counting dry players as done in the previous function). So here, we simply look where non wet players were standing

NOTWET=function(n,p){
x=runif(n)
y=runif(n)
(d=as.matrix(dist(cbind(x,y), method = "euclidean",upper=TRUE)))
diag(d)=999999
dmin=apply(d,2,which.min)
whonotwet=( (1:n) %notin% names(table(dmin)) )
#plot(x[-whonotwet],y[-whonotwet],pch=19,col="blue",type="p")
#points(x[whonotwet],y[whonotwet],pch=19,col="red")
M=matrix(NA,p,p);u=seq(0,1,by=1/p)
for(i in 1:p){
for(j in 1:p){
M[i,j]=sum((x[whonotwet]>=u[i])&(x[whonotwet]<u[i+1])&
(y[whonotwet]>=u[j])&(y[whonotwet]<u[j+1]))
}}
return(M)}

based on function

"%notin%" <- function(x, y) x[!x %in% y]

On a given grid, we count people playing the game that ended dry (with might avoid boundary bias on nonparametric smooth estimator of distribution, as we'll see later on). For instance with 11 players,

M11=matrix(0,25,25);
for(s in 1:100000){
M11=M11+NOTWET(11,25)
}

Then we can plot the distribution, on the field,

COL=rev(heat.colors(101)); p=25
u=seq(0,1,by=1/p)
plot(0:1,0:1,col="white",xlab="",ylab="")
for(i in 1:p){
for(j in 1:p){
polygon(c(u[i],u[i],u[i+1],u[i+1]), c(u[j],u[j+1],u[j+1],u[j]),border=NA,
col=COL[trunc(M11[i,j])/max(M11)*100+1])
}}

Red means a lot of non-wet people (i.e safer zones). Graphs below are with 7 and 11 players respectively (from the left to the right)

with the following distribution on the diagonal: corners are almost 4 times safer than the middle of the field, with 7 players,
Below are plotted distributions of locations of non-wet players when the total number of players was either 25 (on the left) and 101 (on the right)
with again on the diagonal
Hence, the border is rather safe, but next to the border, it is no safe any longer: is someone is standing right on the border, he will probably shoot at you: there is no one behind him ! This explains the stange behavior on the borders (and corners, thanks JP for the intuitive explanation).
But would it be completely different with a field shaped as a disk ? using the previous technique of working on a fixed grid (or correcting for boundary bias, since the disk might cover only a fraction of the grid-square), or keeping coordinates of non-wet players, and using standard kernel-based estimator of the distribution (the light yellow circle outside the disk is simply due to bias of the kernel estimator on the border)
NOTWET=function(n){
x=(runif(n*20)*2-1)*1
y=(runif(n*20)*2-1)*1
I=which((x^2+y^2<1))
x=x[I];y=y[I]
x=x[1:n];y=y[1:n]
(d=as.matrix(dist(cbind(x,y),
method = "euclidean",upper=TRUE)))
diag(d)=999999
dmin=apply(d,2,which.min)
whonotwet=( (1:n) %notin% names(table(dmin)) )
return(cbind(x[whonotwet],y[whonotwet]))
}

M=t(c(0,0))
for(s in 1:10000){
M=rbind(M11,NOTWET(25))
}
M=M[-1,]

library(ks)
HP=matrix(c(.001,0,0,.001),2,2)
K=kde(x=M11, H=HP)
image(K$eval.points[[1]],K$eval.points[[2]],K\$estimate2,
col=rev(heat.colors(101)),xlim=c(-1,1),ylim=c(-1,1))

And note that the distribution of the number of players ending the game dry is the same, for a square field, or a disk,
NOTWET2=function(n){
x=(runif(n*20)*2-1)*1
y=(runif(n*20)*2-1)*1
I=which((x^2+y^2<1))
x=x[I];y=y[I]
x=x[1:n];y=y[1:n]
(d=as.matrix(dist(cbind(x,y), method = "euclidean",upper=TRUE)))
diag(d)=999999
dmin=apply(d,2,which.min)
notwet=n-length(table(dmin))
return(notwet)}

NOTWET=function(n){
x=runif(n)
y=runif(n)
(d=as.matrix(dist(cbind(x,y), method = "euclidean",upper=TRUE)))
diag(d)=999999
dmin=apply(d,2,which.min)
notwet=n-length(table(dmin))
return(notwet)}

NSim=100000
Nsquare=Vectorize(NOTWET)(n=rep(25,NSim))
Ndisk=Vectorize(NOTWET2)(n=rep(25,NSim))
Tsq=table(Nsquare)
Tdk=table(Ndisk)
plot(as.numeric(names(Tsq)),Tsq/NSim,type="b",col="red")
lines(as.numeric(names(Tdk)),Tdk/NSim,type="b",pch=4,col="blue")

But so far, it was still simple... I wonder what it might become if we consider a non-convex place, with walls, where player might hide.... Next time, a post on indoor paint-ball !

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...