Example 9.13: Negative binomial regression with proc mcmc

November 8, 2011
By

(This article was first published on SAS and R, and kindly contributed to R-bloggers)


In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support for Bayesian methods in R can be found in the Bayesian Task View.

SAS

As noted in example 8.30, the SAS rand function lacks the option to input the mean directly, instead using the basic parameters of the probability of success and the number of successes k. (Though note the negative binomial has several formulations, which can cause problems when using multiple software systems.) As developed in that example, we use the the proc fcmp function to instead work with the mean.

proc fcmp outlib=sasuser.funcs.test;
function poismean_nb(mean, size);
return(size/(mean+size));
endsub;
run;

options cmplib=sasuser.funcs;
run;

With that preparation out of the way, we simulate some data--here an intercept of 0 and a slope of 1.

data test;
do i = 1 to 10000;
x = normal(0);
mu = exp(0 + x);
k = 2;
y = rand("NEGBINOMIAL", poismean_nb(mu, k),k);
output;
end;
run;

The proc mcmc code presents a slight difficulty: the k successes before the random number of failures ought to be an integer, and proc mcmc appears to lack an integer-valued distribution. The model will run with continuous values of k, but its behavior is strange. Instead, we put a prior on a new parameter, kstar and take k as the rounded value (section 1.8.4) of kstar; since the values must be > 0, we also add 1 to the rounded value.

proc mcmc data=test nmc=1000 thin=1 seed=10061966;
parms beta0 1 beta1 1 kstar 10;

prior b: ~ normal(0, var = 10000);
prior kstar ~ igamma(.01, scale=0.01);

k=round(kstar+1, 1);
mu = exp(beta0 + beta1 * x);

model y ~ negbin(k, poismean_nb(mu, k));
run;

The way the kstar and k business works is that SAS actually processes the programming statements in each iteration of the chain. Posterior summaries just below, sample diagnostic plot above.

Posterior Summaries

Parameter N Mean Standard Percentiles
Deviation 25% 50% 75%
beta0 10000 0.00712 0.0131 -0.00171 0.00721 0.0156
beta1 10000 0.9818 0.0128 0.9732 0.9814 0.9905
kstar 10000 0.9648 0.2855 0.7112 0.9481 1.1974

Posterior Intervals
Parameter Alpha Equal-Tail Interval HPD Interval
beta0 0.050 -0.0195 0.0321 -0.0182 0.0328
beta1 0.050 0.9569 1.0074 0.9562 1.0063
kstar 0.050 0.5208 1.4709 0.5001 1.4348

If a simple model like the one shown here is all you need, proc genmod's bayes statement can work for you. But the formulation demonstrated above would be useful for a generalized linear mixed model, for example.

To leave a comment for the author, please follow the link and comment on his blog: SAS and R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , ,

Comments are closed.