Generate Quasi-Poisson Distribution Variable

[This article was first published on Category: R | Huidong Tian, and kindly contributed to R-bloggers.]

Most regression methods assume that the response variable follows a distribution from the exponential family, e.g. Gaussian, Poisson, or Gamma. However, real-world data frequently violate this assumption, for example through zero-inflated or overdispersed counts. A number of methods have been developed to deal with such problems; among them, Quasi-Poisson and Negative Binomial models are the most popular, perhaps because the major statistical software packages implement them.

Unlike the Negative Binomial distribution, there is no function in R for generating Quasi-Poisson distributed random variables. In this post, I will show how to generate a Quasi-Poisson distributed variable using the Negative Binomial distribution.

Let the variable $Y_{qp}$ follow a Quasi-Poisson distribution with mean $\mu$. Then the variance of $Y_{qp}$ has a linear relationship with its mean:

$$Var(Y_{qp}) = \theta \times \mu$$

where $\theta$ is called the dispersion parameter; for overdispersed variables, $\theta$ should be greater than 1.

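If the variable $Y_{nb}$ follows a Negative Binomial distribution with mean $\mu$, the variance of $Y_{nb}$ has a quadratic relationship with its mean. In the mean/$size$ parameterization used by R:

$$Var(Y_{nb}) = \mu + \frac{\mu^2}{size}$$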
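To make the contrast between the two mean-variance relationships concrete, here is a small arithmetic sketch. It assumes the Quasi-Poisson variance $\theta \times \mu$ and the Negative Binomial variance $\mu + \mu^2/size$ in R's mean/size parameterization; the values `theta = 2` and `size = 8` are illustrative choices, not taken from the post.

```r
# Theoretical variances only; no random numbers are drawn here.
mu    <- c(1, 2, 5, 10)   # a few example means
theta <- 2                # illustrative dispersion parameter (assumption)
size  <- 8                # illustrative Negative Binomial size (assumption)

var_qp <- theta * mu          # linear in mu:    2, 4, 10, 20
var_nb <- mu + mu^2 / size    # quadratic in mu: 1.125, 2.5, 8.125, 22.5

# The variance-to-mean ratio is constant for Quasi-Poisson
# but grows with mu for a Negative Binomial with fixed size.
data.frame(mu, var_qp, var_nb,
           ratio_qp = var_qp / mu,
           ratio_nb = var_nb / mu)
```

The last two columns show why a single fixed `size` cannot reproduce a Quasi-Poisson variable across different means: `size` has to change with $\mu$.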

Negative Binomial random variables can be generated in R with the function rnbinom:

> x <- rnbinom(n = 10000, size = 8, mu = 5)
> mean(x)
[1] 4.9674
> var(x)
[1] 7.874925
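As a rough check of the sample variance above, the following sketch compares it with the theoretical value, assuming R's mean/size parameterization in which the variance is `mu + mu^2 / size`. The seed is an arbitrary choice for reproducibility and is not the one behind the output shown above.

```r
set.seed(1)   # arbitrary seed (assumption), only for reproducibility
mu   <- 5
size <- 8
x <- rnbinom(n = 10000, size = size, mu = mu)

theoretical_var <- mu + mu^2 / size   # 5 + 25/8 = 8.125

# Sample and theoretical variance side by side; they should be close.
c(sample = var(x), theoretical = theoretical_var)
```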

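If we can find the relationship between $\theta$ and $size$, then we can use the Negative Binomial distribution to generate Quasi-Poisson distributed random variables. Setting the Negative Binomial variance $\mu + \mu^2/size$ equal to the Quasi-Poisson variance $\theta \times \mu$ gives:

$$\theta \times \mu = \mu + \frac{\mu^2}{size} \;\Rightarrow\; \theta = 1 + \frac{\mu}{size} \;\Rightarrow\; size = \frac{\mu}{\theta - 1}$$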

So, we can define such a function in R:

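# Draw n variates with mean mu and variance theta * mu (requires theta > 1)
rqpois <- function(n, mu, theta) {
  # size = mu / (theta - 1) makes the Negative Binomial variance equal theta * mu
  rnbinom(n = n, mu = mu, size = mu/(theta-1))
}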
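The mapping `size = mu / (theta - 1)` only yields a valid Negative Binomial `size` when `theta` is greater than 1. A minimal sketch of a more defensive variant is shown below; the name `rqpois_checked` and the fallback to `rpois` for `theta == 1` are my own assumptions, not part of the original function.

```r
# Hypothetical wrapper (not from the original post): rqpois with argument checks.
rqpois_checked <- function(n, mu, theta) {
  stopifnot(is.numeric(mu), all(mu > 0))
  stopifnot(is.numeric(theta), length(theta) == 1, theta >= 1)
  if (theta == 1) {
    # No overdispersion: the variance equals the mean, i.e. plain Poisson.
    return(rpois(n = n, lambda = mu))
  }
  # size = mu / (theta - 1) gives Var = mu + mu^2 / size = theta * mu.
  rnbinom(n = n, mu = mu, size = mu / (theta - 1))
}

# Worked check for mu = 3, theta = 5: size = 3 / 4 = 0.75
size <- 3 / (5 - 1)
3 + 3^2 / size   # 15, i.e. theta * mu
```

With `theta < 1` the wrapper stops with an error instead of passing a negative `size` to `rnbinom`.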

As a check on the function above, take $\mu = 3$ and $\theta = 5$. According to the relationship $Var(Y_{qp}) = \theta \times \mu$, the generated variable should have a variance of around 15.

> set.seed(0)
> x <- rqpois(n = 10000, mu = 3, theta = 5)
> mean(x)
[1] 2.9718
> var(x)
[1] 14.66027

The sample mean is close to 3 and the sample variance is close to 15. So, it works!
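Another way to look at the result is to recover $\theta$ from the simulated data. The sketch below fits an intercept-only `glm` with the `quasipoisson` family and reads the estimated dispersion from `summary()`; for this simple model the estimate should coincide with the ratio `var(x) / mean(x)`. The function is redefined here only so that the snippet runs on its own.

```r
# Same definition as in the post, repeated so the snippet is self-contained.
rqpois <- function(n, mu, theta) {
  rnbinom(n = n, mu = mu, size = mu / (theta - 1))
}

set.seed(0)
x <- rqpois(n = 10000, mu = 3, theta = 5)

# Intercept-only quasi-Poisson fit; summary() reports the Pearson-based
# dispersion estimate, which should be near theta = 5.
fit <- glm(x ~ 1, family = quasipoisson)
dispersion_hat <- summary(fit)$dispersion

c(dispersion = dispersion_hat, var_over_mean = var(x) / mean(x))
```

Note that this only checks the first two moments. A variable generated this way is still Negative Binomial; it merely has the mean-variance relationship that a Quasi-Poisson model assumes.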
