# Rejection Sampling

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**Playing with R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

An interesting sampling method that was covered briefly in my Bayesian statistics course was rejection sampling. Since I have nothing better to do, I thought it would be fun to make an acceptance-rejection algorithm using R. FUN!

The Rejection Sampling method is usually used to simulate data from an unknown distribution. To do this one samples from a distribution that covers the suport of the unknown distribution and use certain criteria for accepting/rejecting the sampled values. One way to do this is as follows (Rice, p 92).

**Step 1:**Generate T with density m.

**Step 2:**Generate U, uniform on [0,1] and independent of T. If M(T)*U ≤ ƒ(T), then let X = T (accept T). Otherwise, go to Step 1 (Reject T).

Where M(x) is a function such that M(x) ≥ ƒ(x) on [a,b].

To keep things simple for myself I will be simulating a Beta distribution with parameters 6 and 3 (ƒ). To do this I will sample T’s from a scaled uniform distribution (M), and reject sampled values where M(T)*U ≥ ƒ(T).

In a plot of the beta distribution with parameters 6 and 3 we can see that the ƒ(x) never goes above 3. For this reason I chose to scale the uniform distribution M by multiplying it by 3.

Here is the R code to implement rejection sampling for 100,000 observations in this example.

```
sample.x = runif(100000,0,1)
accept = c()
for(i in 1:length(sample.x)){
U = runif(1, 0, 1)
if(dunif(sample.x[i], 0, 1)*3*U <= dbeta(sample.x[i], 6, 3)) {
```

` accept[i] = 'Yes' `

` } `

```
else if(dunif(sample.x[i],0,1)*3*U > dbeta(sample.x[i], 6, 3)) {
accept[i] = 'No'
}
}
T = data.frame(sample.x, accept = factor(accept, levels= c('Yes','No')))
```

We can plot the results along with the true distribution with the following code.

```
hist(T[,1][T$accept=='Yes'], breaks = seq(0,1,0.01), freq = FALSE, main = 'Histogram of X', xlab = 'X')
lines(x, dbeta(x,6,3))
```

With 100,000 observations sampled, the data fit very well.

We can look at the densities of both the accepted and rejected values to get an idea of what's going on.

```
library(ggplot2)
print(qplot(sample.x, data = T, geom = 'density', color = accept))
```

Looking at a stacked histogram of all the sampled values together we can really see how much wasted data there are in this example.

```
print(qplot(sample.x, data = T, geom = 'histogram', fill = accept, binwidth=0.01))
```

In fact, when I ran this example I got 33,114 accepted values and 66,886 rejected values. I probably could have chosen a better value than 3 to scale the uniform distribution, but ideally rejection sampling uses a known distribution that is only slightly different from the unknown distribution we're trying to estimate.

To

**leave a comment**for the author, please follow the link and comment on their blog:**Playing with R**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.