
For Data Science positions that require some knowledge of statistics and programming, it is common to be asked questions like those below.

Question 1

Suppose an urn contains 40 red, 25 green and 35 blue balls. Balls are drawn from the urn one by one, at random and without replacement. Let $$N$$ denote the draw at which the first blue ball appears, and $$S$$ denote the number of green balls drawn up to the $$N$$th draw (i.e. until the first blue ball appears). Estimate $$E[N|S=2]$$ by generating $$10000$$ i.i.d. copies of $$(S,N)$$.

Solution 1

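urn <- c(rep("red", 40), rep("green", 25), rep("blue", 35))

v <- c()  # values of N from the simulations in which S == 2

for (i in 1:10000) {

  s <- sample(urn, 100, replace = FALSE)  # one full random ordering of the urn

  blue_ball <- min(which(s == "blue"))  # N: draw of the first blue ball
  green_balls <- sum(s[1:blue_ball] == "green")  # S: green balls drawn before it

  if (green_balls == 2) {
    v <- c(v, blue_ball)
  }

}

mean(v)

The estimate varies from run to run; it should come out close to the exact value $$303/61 \approx 4.97$$.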
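A simulated estimate of $$E[N|S=2]$$ can be checked against an exact value. The draws occupied by the 60 non-red balls form a uniformly random subset of the 100 draws, independent of the green/blue order among them. When $$S=2$$ the first blue ball is the third non-red ball, whose expected position is $$3 \cdot 101/61$$. The sketch below is one way to run that comparison; the helper name `draw_SN`, the matrix `sims` and the seed are illustrative choices, not part of the original solution.

```r
# One draw of (S, N): S = greens before the first blue, N = draw of the first blue.
draw_SN <- function(urn) {
  s <- sample(urn)                 # random permutation = drawing without replacement
  n <- match("blue", s)            # position of the first blue ball
  c(S = sum(s[seq_len(n)] == "green"), N = n)
}

set.seed(42)                       # arbitrary seed, only for reproducibility
urn  <- rep(c("red", "green", "blue"), times = c(40, 25, 35))
sims <- t(replicate(10000, draw_SN(urn)))   # 10000 x 2 matrix, columns S and N

estimate <- mean(sims[sims[, "S"] == 2, "N"])   # Monte Carlo estimate of E[N | S = 2]
exact    <- 3 * (100 + 1) / (60 + 1)            # expected position of the 3rd non-red ball
c(estimate = estimate, exact = exact)
```

Only about one simulation in ten has $$S=2$$, so the estimate rests on roughly a thousand draws and a gap of a few hundredths from the exact value is normal.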


Question 2

Suppose that claims are made to an insurance company according to a Poisson process with rate 10 per day. The amount of a claim is a random variable that has an exponential distribution with mean $$\$1000$$. The insurance company receives payments continuously in time at a constant rate of $$\$11000$$ per day. Starting with an initial capital of $$\$25000$$, use $$10000$$ simulations to estimate the probability that the firm’s capital is always positive throughout its first $$365$$ days.
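In symbols, the setup can be written as follows. The notation $$X(t)$$, $$N(t)$$ and $$C_i$$ is introduced here for illustration and does not appear in the original question: $$N(t)$$ is the number of claims up to day $$t$$ and $$C_i$$ is the amount of the $$i$$th claim.

$$X(t) = 25000 + 11000\,t - \sum_{i=1}^{N(t)} C_i, \qquad p = P\big(X(t) > 0 \ \text{for all}\ 0 \le t \le 365\big)$$

Because $$X(t)$$ rises steadily between claims and only drops at the instant a claim arrives, it is enough to check the capital just after each claim.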

Solution 2

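output <- logical(10000)  # TRUE when the capital stays positive for all 365 days
for (i in 1:10000) {

  initial_capital <- 25000
  R <- 11000  # premium income per day

  P <- rpois(1, 10 * 365)  # number of claims during the 365 days
  claim_times <- sort(runif(P, 0, 365))  # given P, the arrival times are sorted uniforms
  C <- rexp(P, 1 / 1000)  # claim amounts with mean 1000

  sums <- initial_capital + R * claim_times - cumsum(C)  # capital just after each claim
  output[i] <- all(sums > 0)  # capital only drops at claims, so these are its low points

}
mean(output)

The printed value is the estimated survival probability; it varies slightly from run to run.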
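A rough plausibility check on any simulated estimate: in this model (Poisson claims, exponential claim sizes, constant premium rate) the probability of ever being ruined has a known closed form, the classical Cramér-Lundberg result. Surviving 365 days is at least as likely as surviving forever, so the estimate should not fall below one minus that ruin probability. The function name `ruin_prob_infinite` and its argument names are hypothetical, chosen for this sketch.

```r
# Infinite-horizon ruin probability for the classical risk model with
# exponential claims (hypothetical helper, not part of the original solution):
#   psi(u) = rho * exp(-(1 / mean_claim - lambda / premium) * u),
#   rho    = lambda * mean_claim / premium
ruin_prob_infinite <- function(u, lambda, mean_claim, premium) {
  rho <- lambda * mean_claim / premium   # expected claims per unit of premium income
  stopifnot(rho < 1)                     # the formula needs income to exceed expected claims
  rho * exp(-(1 / mean_claim - lambda / premium) * u)
}

# Lower bound on the 365-day survival probability for the parameters in the question
lower_bound <- 1 - ruin_prob_infinite(u = 25000, lambda = 10,
                                      mean_claim = 1000, premium = 11000)
lower_bound   # roughly 0.906
```

This is only a bound, not the answer: the finite-horizon probability asked for in the question still has to come from the simulation.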