SAS PROC MCMC example in R; Poisson Regression

November 16, 2014
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

In this post I will try to copy the calculations of SAS’s PROC MCMC example 61.5 (Poisson Regression) into the various R solutions. In this post Jags, RStan, MCMCpack, LaplacesDemon solutions are shown. Compared to the first post in this series, rcppbugs and mcmc are not used. Rcppbugs has no poisson distribution and while I know how to go around that (ones trick, second post in series) I just don’t think that should be the approach for a standard distribution such as poisson. Mcmc has a weird summary which I dislike.

Data and model

Data are insurance claims. To quote SAS manual ‘The variable n represents the number of insurance policy holders and the variable c represents the number of insurance claims. The variable car is the type of car involved (classified into three groups), and it is coded into two levels. The variable age is the age group of a policy holder (classified into two groups)‘.
There is a nice trick in the model, again to quote; ‘The logarithm of the variable n is used as an offset—that is, a regression variable with a constant coefficient of 1 for each observation. Having the offset constant in the model is equivalent to fitting an expanded data set with 3000 observations, each with response variable y observed on an individual level. The log link relates the mean and the factors car and age‘.
There was an nice SAS trick in the data. The raw data has car as one variable and a design matrix is needed, for which PROC TRANSREG is used. In R using model.matrix() is at least a bit more transparant.
insure <- read.table(text=’
n c car  age
500  42 small 0
1200 37 medium 0
100  1 large 0
400 101 small 1
500  73 medium 1
300  14 large 1′,
skip=1,header=TRUE)

insure$ln=log(insure$n)
insure$car  <- relevel(insure$car,’small’)
input_insure <-  model.matrix(~ car + age,data=insure)

Model

The models are all straightforward implementations of the SAS code.

MCMCpack

Probably the shortest code is for MCMCpack.
library(MCMCpack)
MCMCfun <- function(parm) {
    mu <- input_insure %*% parm
    LL <- sum(dnorm(parm,0,1e3,log=TRUE))
    yhat <- exp(mu+insure$ln)
    LP <- LL+sum(dpois(insure$c,yhat,log=TRUE))
    return(LP)
}
mcmcout <- MCMCmetrop1R(MCMCfun,rep(0,4))
summary(mcmcout)

LaplacesDemon

LaplacesDemon has a bit more code around essentially the same likelihood code as MCMCpack. In addition one needs to chose the sampling regime. I chose RWM with a first run providing values for parameter variances/covariances as a input for the final run. Of the four R based methods,  LaplacesDemon() is the function which creates most output.
library(‘LaplacesDemon’)
mon.names <- “LP”
parm.names <- c(‘alpha’,’beta_car1′,’beta_car2′,’beta_age’)
PGF <- function(Data) c(rnorm(4,0,1))

MyData <- list(mon.names=mon.names,
    parm.names=parm.names,
    PGF=PGF,
    y=insure$c,
    x=input_insure,
    ln=insure$ln)
Model <- function(parm, Data) {
    mu <- Data$x %*% parm
    LL <- sum(dnorm(parm,0,1e3,log=TRUE))
    yhat <- exp(mu+Data$ln)
    LP <- LL+sum(dpois(Data$y,yhat,log=TRUE))
    Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
        yhat=yhat,
        parm=parm)
    return(Modelout)
}
Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Fit1 <- LaplacesDemon(Model,
    Data=MyData,
    Algorithm=’RWM’,
    Covar=rep(.1,4),
    Initial.Values = Initial.Values
)
Fit2 <- LaplacesDemon(Model,
    Data=MyData,
    Algorithm=’RWM’,
    Covar=var(Fit1$Posterior2),
    Initial.Values = apply(Fit1$Posterior2,2,median)
)
Fit2

JAGS

The code is so simple, no comments are needed.

library(R2jags)
datain <- list(
    car1=input_insure[,2],
    car2=input_insure[,3],
    agev=insure$age,
    ln=insure$ln,
    c=insure$c,
    n=nrow(insure))
parameters <- c(‘a0′,’c1′,’c2′,’age’)

jmodel <- function() {
    for (i in 1:n) {       
        mu[i] <- a0+car1[i]*c1+car2[i]*c2+agev[i]*age
        y[i] <- exp(mu[i]+ln[i])
        c[i] ~ dpois(y[i])
    }
    a0 ~ dnorm(0,.0001)
    c1 ~ dnorm(0,.0001)
    c2 ~ dnorm(0,.0001)
    age ~ dnorm(0,.0001)
}

jj <- jags(model.file=jmodel,
    data=datain,
    parameters=parameters)
jj

Stan

The code is again straightforward. As in previous posts, datain and parameters from Jags are re-used. Since I am still struggling a bit with Rstan I am using dplyr to combine model and simulation in one function. Stan is the slowest of all R based methods, the overhead of compiling is relative large for such a simple model 
library(dplyr)
library(rstan)
fit <- ‘
    data {
    int n;
    int c[n];
    vector[n] agev;
    vector[n] ln;
    vector[n] car1;
    vector[n] car2;
    }
    parameters {
    real a0;
    real c1;
    real c2;
    real age;
    }
    transformed parameters {
    vector[n] y;
    vector[n] mu;
    mu <- a0+car1*c1+car2*c2+agev*age;
    y <- exp(mu+ln);
    }
    model {
    c ~ poisson(y);
    a0 ~ normal(0,100);
    c1 ~ normal(0,100);
    c2 ~ normal(0,100);
    age ~ normal(0,100);
    }
    ‘ %>%
    stan(model_code = .,
        data = datain,
        pars=parameters)
fit

Results

MCMCpack

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The Metropolis acceptance rate was 0.37195
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> summary(mcmcout)

Iterations = 501:20500
Thinning interval = 1
Number of chains = 1
Sample size per chain = 20000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

       Mean     SD  Naive SE Time-series SE
[1,] -2.643 0.1304 0.0009223       0.003448
[2,] -1.787 0.2736 0.0019350       0.007368
[3,] -0.696 0.1273 0.0008999       0.003315
[4,]  1.327 0.1335 0.0009440       0.003497

2. Quantiles for each variable:

        2.5%     25%     50%     75%   97.5%
var1 -2.9082 -2.7309 -2.6388 -2.5552 -2.3911
var2 -2.3606 -1.9639 -1.7772 -1.5944 -1.2809
var3 -0.9456 -0.7789 -0.6962 -0.6091 -0.4504
var4  1.0712  1.2363  1.3268  1.4188  1.5914

LaplacesDemon

Call:
LaplacesDemon(Model = Model, Data = MyData, Initial.Values = apply(Fit1$Posterior2, 2, median), Covar = var(Fit1$Posterior2), Algorithm = “RWM”)

Acceptance Rate: 0.3294
Algorithm: Random-Walk Metropolis
Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)
     alpha  beta_car1  beta_car2   beta_age
0.02421653 0.09052011 0.01687834 0.02480825

Covariance (Diagonal) History: (NOT SHOWN HERE)
Deviance Information Criterion (DIC):
        All Stationary
Dbar 62.614     62.614
pD    0.000      0.000
DIC  62.614     62.614
Initial Values:
     alpha  beta_car1  beta_car2   beta_age
-2.6482937 -1.7698730 -0.6848875  1.3175961

Iterations: 10000
Log(Marginal Likelihood): -33.58429
Minutes of run-time: 0.03
Model: (NOT SHOWN HERE)
Monitor: (NOT SHOWN HERE)
Parameters (Number of): 4
Posterior1: (NOT SHOWN HERE)
Posterior2: (NOT SHOWN HERE)
Recommended Burn-In of Thinned Samples: 0
Recommended Burn-In of Un-thinned Samples: 0
Recommended Thinning: 30
Specs: (NOT SHOWN HERE)
Status is displayed every 100 iterations
Summary1: (SHOWN BELOW)
Summary2: (SHOWN BELOW)
Thinned Samples: 1000
Thinning: 10

Summary of All Samples
                 Mean           SD         MCSE      ESS          LB     Median
alpha      -2.6488552 1.342421e-01 6.565916e-03 582.9301  -2.9160940  -2.650964
beta_car1  -1.8137509 2.822791e-01 1.474688e-02 508.9384  -2.3852812  -1.815035
beta_car2  -0.6856638 1.283016e-01 6.234377e-03 701.7594  -0.9401333  -0.684194
beta_age    1.3241569 1.353041e-01 6.180607e-03 752.9705   1.0582629   1.324364
Deviance   62.6135632 1.379071e-06 6.585127e-08 643.3371  62.6135607  62.613563
LP        -49.8707670 1.445016e+00 6.295406e-02 648.5900 -53.5791639 -49.554267
                   UB
alpha      -2.4029950
beta_car1  -1.2906070
beta_car2  -0.4358381
beta_age    1.5855214
Deviance   62.6135661
LP        -48.0070303

Summary of Stationary Samples
                 Mean           SD         MCSE      ESS          LB     Median
alpha      -2.6488552 1.342421e-01 6.565916e-03 582.9301  -2.9160940  -2.650964
beta_car1  -1.8137509 2.822791e-01 1.474688e-02 508.9384  -2.3852812  -1.815035
beta_car2  -0.6856638 1.283016e-01 6.234377e-03 701.7594  -0.9401333  -0.684194
beta_age    1.3241569 1.353041e-01 6.180607e-03 752.9705   1.0582629   1.324364
Deviance   62.6135632 1.379071e-06 6.585127e-08 643.3371  62.6135607  62.613563
LP        -49.8707670 1.445016e+00 6.295406e-02 648.5900 -53.5791639 -49.554267
                   UB
alpha      -2.4029950
beta_car1  -1.2906070
beta_car2  -0.4358381
beta_age    1.5855214
Deviance   62.6135661
LP        -48.0070303

Jags

Inference for Bugs model at “/tmp/RtmpgHy1qo/modelc8f275c94f1.txt”, fit using jags,
 3 chains, each with 2000 iterations (first 1000 discarded)
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
a0        -2.649   0.131 -2.908 -2.737 -2.647 -2.559 -2.394 1.001  2700
age        1.327   0.136  1.063  1.235  1.323  1.419  1.588 1.001  3000
c1        -1.795   0.280 -2.393 -1.976 -1.787 -1.595 -1.295 1.003   990
c2        -0.696   0.128 -0.948 -0.783 -0.695 -0.610 -0.436 1.002  1400
deviance  36.929   2.817 33.407 34.867 36.332 38.308 43.995 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.0 and DIC = 40.9
DIC is an estimate of expected predictive error (lower deviance is better).

Stan

Inference for Stan model: __LHS.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
a0    -2.65    0.00 0.13  -2.93  -2.74  -2.64  -2.56  -2.40  1248    1
c1    -1.79    0.01 0.28  -2.34  -1.98  -1.78  -1.60  -1.27  1706    1
c2    -0.69    0.00 0.13  -0.94  -0.78  -0.69  -0.61  -0.44  1859    1
age    1.33    0.00 0.14   1.06   1.23   1.32   1.42   1.60  1411    1
lp__ 835.44    0.04 1.43 831.88 834.75 835.76 836.50 837.20  1294    1

Samples were drawn using NUTS(diag_e) at Sat Nov 15 14:15:39 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)