
Fitting Distribution X to Data From Distribution Y

[This article was first published on Cerebral Mastication » R, and kindly contributed to R-bloggers.]

Someone asked me about fitting a beta distribution to data drawn from a gamma distribution, and how well it would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process, and then we changed the parameters of the gamma distribution to see how that affected the fit. An exercise like this is what I call building a “toy model”, and I think it’s invaluable as a method for building intuition and a visceral understanding of data.
Here’s some example code which we played with:

set.seed(3)
x <- rgamma(1e5, 2, .2)
plot(density(x))
 
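# rescale the gamma draws so they lie strictly between 0 and 1,
# which is the support of the beta distribution
# .0001 is added so the largest value stays just below 1;
# a value of exactly 1 makes the likelihood fit fail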
xt <- x / ( max( x ) + .0001 )
 
# fit a beta distribution to xt
library( MASS )
fit.beta <- fitdistr( xt, "beta", start = list( shape1=2, shape2=5 ) )
 
x.beta <- rbeta(1e5,fit.beta$estimate[[1]],fit.beta$estimate[[2]])
 
## plot the pdfs on top of each other
plot(density(xt))
lines(density(x.beta), col="red" )
 
## plot the qqplots
qqplot(xt, x.beta)
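
To give a flavor of the "change the gamma parameters and see what happens" step, here is a minimal sketch that repeats the fit for a few gamma shapes. The helper name `fit_beta_to_gamma`, the particular shapes tried, the smaller sample size, and the use of the Kolmogorov-Smirnov statistic as a rough measure of fit are all my assumptions for illustration, not necessarily what we did at the time.

```r
library(MASS)

# Hypothetical helper: draw from a gamma, rescale into (0, 1),
# fit a beta by maximum likelihood, and report how close the fit is.
fit_beta_to_gamma <- function(shape, rate, n = 1e4) {
  x  <- rgamma(n, shape, rate)
  xt <- x / (max(x) + .0001)            # same rescaling as above

  # the optimizer may wander into invalid parameter values and
  # emit "NaNs produced" warnings along the way; those are silenced here
  fit <- suppressWarnings(
    fitdistr(xt, "beta", start = list(shape1 = 2, shape2 = 5))
  )
  est <- fit$estimate

  # Kolmogorov-Smirnov statistic: largest gap between the empirical CDF
  # of the data and the CDF of the fitted beta (smaller is a closer fit)
  ks <- suppressWarnings(
    ks.test(xt, "pbeta", est[["shape1"]], est[["shape2"]])
  )

  c(shape  = shape,
    shape1 = est[["shape1"]],
    shape2 = est[["shape2"]],
    ks     = unname(ks$statistic))
}

set.seed(3)
results <- t(sapply(c(1, 2, 5, 10), fit_beta_to_gamma, rate = 0.2))
print(round(results, 3))
```

Each row of `results` is one gamma shape, with the fitted beta parameters and the KS statistic alongside it. The statistic is only a single-number summary, so it is still worth looking at the density and Q-Q plots for any case that looks interesting.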


It’s not illustrated above, but it’s probably useful to transform the simulated data (x.beta) back into the pre-normalized space by multiplying by max( x ) + .0001. (I swore I’d never say this, but I lied.) I’ll leave that as an exercise for the reader.
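
For readers who would rather check their answer, one possible version of that exercise might look like the sketch below. It repeats the fit from above so that it runs on its own; the names `scale.factor` and `x.back` are mine, and the reference line on the Q-Q plot is an extra I added for reading it.

```r
library(MASS)

set.seed(3)
x <- rgamma(1e5, 2, .2)

# rescale into (0, 1) and fit the beta, as in the code above;
# the scaling constant is kept so it can be undone later
scale.factor <- max(x) + .0001
xt <- x / scale.factor
fit.beta <- fitdistr(xt, "beta", start = list(shape1 = 2, shape2 = 5))

x.beta <- rbeta(1e5,
                fit.beta$estimate[["shape1"]],
                fit.beta$estimate[["shape2"]])

# undo the rescaling so the simulated values are on the original gamma scale
x.back <- x.beta * scale.factor

# compare the original gamma draws with the back-transformed beta draws
plot(density(x))
lines(density(x.back), col = "red")

qqplot(x, x.back)
abline(0, 1, lty = 2)   # points on this line mean the quantiles agree
```

Because the back-transform is just multiplication by a constant, the shape of the comparison is the same as in the normalized plots. The difference is that the axes are now in the units of the original data, which tends to be easier to reason about.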

Another very useful tool in building a mental road map of distributions is the graphical chart of distribution relationships that John Cook introduced me to.

