Converting a JAGS model to STAN

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For my first experience with STAN I wanted to convert my last JAGS program into STAN. This was a bit more difficult than I expected. The JAGS program was Fe concentration in rainwater including values below detection level.

Data

Data has been explained before. The only difference is that by now I am reading the data using read.xls from the gdata package. The new code is at the bottom of this post.

Converting the code

Initially I thought the conversion would be simple. I made some simplification and the code worked. However, it appeared my initial code was not very fast. Then I added the out of range data and that resulted in a problem seen before, out of range initialized data and hence no MCMC samples. After muddying along for a bit my second step was to make a simple regression model and get to know STAN a bit better. This knowledge set in a simple model. Extended it till the model had the features which I wanted except the out of range data. To add the out of range data I decided to initialize the more complex model using estimates from a more simple model. This worked well, the first code also serves as a kind of warm-up too. There are some warnings the way I set it up, but that is only in the first model, which is only run for a few cycles. Final running time was a disappointing close to two hours. This could probably be reduced a bit using less but longer chains.

learned

  • STAN is not so similar to JAGS that you can drop in code, it is better to start a program fresh than try a quick convert 
  • Using normal(0,1) and a multiplication factor gives faster code than using normal(0,f). It says so in the manual and yes it does make a difference
  • Having aliased parameters slows things down
  • STAN’s compilation step makes for slower development than JAGS but it is still best to start small and extend the model
    • at some point I just worked on code with few samples and two chains, which helped a bit.
  • the STAN code does not feel faster than JAGS code

Run results

output during fitting first model

TRANSLATING MODEL ‘Fe_code1’ FROM Stan CODE TO C++ CODE NOW.
DIAGNOSTIC(S) FROM PARSER:

Warning (non-fatal): sampling statement (~) contains a transformed parameter or local variable. 
You must increment lp__ with the log absolute determinant of the Jacobian of the transform. 
Sampling Statement left-hand-side expression:
          LAmountC ~ normal_log(…)

COMPILING THE C++ CODE FOR MODEL ‘Fe_code1’ NOW.
SAMPLING FOR MODEL ‘Fe_code1’ NOW (CHAIN 1).

Informational Message: The current Metropolis proposal is about to be rejected becuase of the following issue:
Error in function stan::prob::normal_log(N4stan5agrad3varE): Scale parameter[0]  is 0:0, but must be > 0!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:  1 / 75 [  1%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected becuase of the following issue:
Error in function stan::prob::normal_log(N4stan5agrad3varE): Location parameter[0]  is inf:0, but must be finite!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:  7 / 75 [  9%]  (Warmup)
Iteration: 14 / 75 [ 18%]  (Warmup)
Iteration: 21 / 75 [ 28%]  (Warmup)
Iteration: 28 / 75 [ 37%]  (Warmup)
Iteration: 35 / 75 [ 46%]  (Warmup)
Iteration: 42 / 75 [ 56%]  (Warmup)
Iteration: 49 / 75 [ 65%]  (Warmup)
Iteration: 56 / 75 [ 74%]  (Sampling)
Iteration: 63 / 75 [ 84%]  (Sampling)
Iteration: 70 / 75 [ 93%]  (Sampling)
Iteration: 75 / 75 [100%]  (Sampling)
Elapsed Time: 55.0709 seconds (Warm-up)
              56.8683 seconds (Sampling)
              111.939 seconds (Total)

[Etc chains 2 to 4]

fit1

Inference for Stan model: Fe_code1.
4 chains, each with iter=75; warmup=50; thin=1;
post-warmup draws per chain=25, total post-warmup draws=100.

             mean se_mean  sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
intercept     0.0     0.0 0.1   -0.1    0.0    0.0    0.1    0.1    46  1.1
rate         -0.4     0.0 0.0   -0.5   -0.4   -0.4   -0.4   -0.3    45  1.1
FLoc_r[1]    -0.2     0.0 0.3   -0.8   -0.5   -0.2    0.0    0.4    58  1.1
FLoc_r[2]     1.4     0.1 0.4    0.8    1.1    1.3    1.6    2.2    25  1.1
FLoc_r[3]    -0.3     0.1 0.3   -0.9   -0.5   -0.3   -0.1    0.2    37  1.1
.
…output lines deleted
.
lp__       4194.4     0.9 4.4 4186.6 4190.8 4194.9 4197.5 4202.4    26  1.2

Samples were drawn using NUTS(diag_e) at Thu Jan  9 19:54:38 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Running second model

TRANSLATING MODEL ‘Fe_code2’ FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL ‘Fe_code2’ NOW.
SAMPLING FOR MODEL ‘Fe_code2’ NOW (CHAIN 1).

Iteration:   1 / 500 [  0%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected becuase of the following issue:
Error in function stan::prob::normal_log(N4stan5agrad3varE): Location parameter[0]  is inf:0, but must be finite!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Sampling)
Iteration: 200 / 500 [ 40%]  (Sampling)
Iteration: 250 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)
Elapsed Time: 322.58 seconds (Warm-up)
              1212.62 seconds (Sampling)
              1535.2 seconds (Total)

[etc chains 2 to 4]

fit2

Inference for Stan model: Fe_code2.
4 chains, each with iter=500; warmup=100; thin=1;
post-warmup draws per chain=400, total post-warmup draws=1600.

             mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff
intercept     0.2     0.0  0.1     0.1     0.2     0.2     0.3     0.4   562
yearlydec     0.9     0.0  0.0     0.9     0.9     0.9     0.9     0.9   601
sigmaF[1]     0.5     0.0  0.0     0.5     0.5     0.5     0.5     0.5   394
.
…output lines deleted
.
lp__      -1836.8     3.6 56.4 -1944.1 -1876.3 -1830.9 -1802.3 -1723.9   245
          Rhat
intercept  1.0
yearlydec  1.0
sigmaF[1]  1.0
.
.output lines deleted
.
lp__       1.0

Samples were drawn using NUTS(diag_e) at Thu Jan  9 21:32:53 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Code

Reading data

# http://www.lml.rivm.nl/data/gevalideerd/data/lmre_1992-2005.xls
# https://www.r-bloggers.com/read-excel-files-from-r/
library(gdata)

readsheet <- function(cursheet,fname) {
df = read.xls(fname, 
sheet = cursheet , 
blank.lines.skip=FALSE, 
colClasses = “character”)
topline <- 8
test <- which(df[6,]=='C')+1
numin <- df[topline:nrow(df),test]
names(numin) <- make.names( gsub(' ','',df[6,test]))
for(i in 1:ncol(numin)) numin[,i] <- as.numeric(gsub(',','.',numin[,i]))
status = df[topline:nrow(df),test-1]
names(status) <- paste('C',make.names( gsub(' ','',df[6,test])),sep='')
numin$StartDate <- as.Date(substr(df[topline:nrow(df),1],1,10))
numin$EndDate <-  as.Date(substr(df[topline:nrow(df),2],1,10))
numin <- cbind(numin,status)
numin <- numin[!is.na(numin$StartDate),]
tall <- reshape(numin,
varying=list(
                            make.names(gsub(‘ ‘,”,df[6,test])),
                            paste(‘C’,
                               make.names(gsub(‘ ‘,”,df[6,test])),sep=”)),
v.names=c(‘Amount’,’Status’),
timevar=’Compound’,
idvar=c(‘StartDate’,’EndDate’),
times=paste(df[6,test],'(‘,df[5,test],’)’,sep=”),
direction=’long’)
tall$Compound <- factor(gsub(' )',')',gsub(' +',' ',tall$Compound)))
row.names(tall)<- NULL
tall$Location <- cursheet
tall
}

readfile <- function(fname) {
sheets <- sheetNames(fname)[-1]
tt <- lapply(sheets,readsheet,fname=fname)
tt2 <- do.call(rbind,tt) 
tt2$Location <- factor(tt2$Location)
tt2$fname <- fname
tt2
}

fnames <- dir(pattern='*.xls') #.\\. ?
rf <- lapply(fnames,readfile)
rf2 <- do.call(rbind,rf)

rf2$machine <- factor(ifelse(rf2$fname=="lmre_1992-2005.xls",
‘Van Essen/ECN vanger’,
‘ Eigenbrodt NSA 181 KHT’))

Select Fe data and prepare 

Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
Fe$LAmount <- log10(Fe$Amount)
Fe$LAmount[Fe$Amount<0.4] <- NA

Fe2 <- Fe[!is.na(Fe$Status),]
Fe3 <- Fe2[!is.na(Fe2$LAmount),]
Fe4 <- Fe2[is.na(Fe2$LAmount),]

library(rstan) 
datain <- list(
LAmount = Fe3$LAmount,
Machine=(Fe3$machine==’Van Essen/ECN vanger’)+1,
Location=(1:nlevels(Fe3$Location))[Fe3$Location],
time=as.numeric(Fe3$StartDate),
nobs =nrow(Fe3),
nloc=nlevels(Fe3$Location),
MachineC=(Fe4$machine==’Van Essen/ECN vanger’)+1,
LocationC=(1:nlevels(Fe4$Location))[Fe4$Location],
timeC=as.numeric(Fe4$StartDate),
nobsC =nrow(Fe4),
L=log10(0.399)
)

First model

parameters1 <- c('intercept','rate','FLoc_r',
                    ‘sigmaF’,’sfmach’,’sfloc’,’FMach_r’ )

Fe_code1 <- ' 
data {
int nloc; 
      int nobs;
        vector[nobs] LAmount;
        int Machine[nobs];
        int Location[nobs];
        vector[nobs] time;
        int nobsC;
        real L;
        int MachineC[nobsC];
        int LocationC[nobsC];
        vector[nobsC] timeC;
  }
parameters {
        real intercept;
        real rate;  
        vector [nloc] FLoc_r;
        vector [2] sigmaF;
        real sfmach;
        real sfloc;
        vector[2] FMach_r;
        }
    transformed parameters {
vector[nobs] ExpLAmount;
        vector[nobs] sigma;
        real mean_FLoc;
        real mean_FMach;
        vector[nloc] FLoc;
        vector[2] FMach;
        vector[nobsC] ExpLAmountC;
        vector [nobsC] sigmaC;
        vector[nobsC] LAmountC;
        mean_FLoc <- mean(FLoc_r);
        FLoc <- (FLoc_r-mean_FLoc)*sfloc;
        mean_FMach <- mean(FMach_r);
        FMach <- (FMach_r-mean_FMach)*sfmach;
        for (i in 1:nobs) {
            ExpLAmount[i] <- intercept  
+ .0001*rate*time[i] 
+ FMach[Machine[i]] + 
+ FLoc[Location[i]];
            sigma[i] <- sigmaF[Machine[i]];
        }
        for (i in 1:nobsC) {
            ExpLAmountC[i] <- intercept  
+ .0001*rate*timeC[i] 
+ FMach[MachineC[i]] + 
+ FLoc[LocationC[i]];
            sigmaC[i] <- sigmaF[MachineC[i]];
            LAmountC[i] <- log10(.2); 
        }
     }  
     model {        
        sfmach ~ cauchy(0,4);
        sfloc ~ cauchy(0,4);
        FMach_r ~ normal(0,1);
        FLoc_r ~ normal(0,1);
        sigmaF ~ cauchy(0,4);
LAmount ~ normal(ExpLAmount,sigma);
        LAmountC ~ normal(ExpLAmountC,sigmaC);
        rate ~ normal(0,1);
   }
     generated quantities {
        real dailydec;
        real yearlydec; 
      dailydec <- pow(10,rate*.0001);
        yearlydec <- pow(dailydec,365);
        }

fit1 <- stan(model_code = Fe_code1, 
data = datain, 
pars=parameters1,
warmup=50,
iter = 75, 
chains = 4)
fit1

Second model

samples <- as.array(fit1)
lastsample <- dim(samples)[1]
nchain <- dim(samples)[2]
init2 <- lapply(1:nchain,function(i) {
   fin <- samples[lastsample,i,]
   list(
intercept=fin[1],
rate=fin[2],
FLoc_r=fin[3:(3+16)],
sigmaF=fin[20:21],
sfmach=fin[22],
sfloc=fin[23],
FMach_r=fin[24:25],
LAmountC=rep(log10(.2),datain$nobsC)
)
})

parameters2 <- c('intercept','yearlydec',
‘sigmaF’,’FMach’,’sfmach’,’sfloc’,
‘FLoc’   ,’rate’ )

Fe_code2 <- ' 
data {
int nloc; 
int nobs;
vector[nobs] LAmount;
int Machine[nobs];
int Location[nobs];
vector[nobs] time;
int nobsC;
real L;
int MachineC[nobsC];
int LocationC[nobsC];
vector[nobsC] timeC;
}
parameters {
real intercept;
real rate;  
vector [nloc] FLoc_r;
vector [2] sigmaF;
real sfmach;
real sfloc;
vector[2] FMach_r;
vector[nobsC] LAmountC;
        
}
transformed parameters {
vector[nobs] ExpLAmount;
vector[nobs] sigma;
real mean_FLoc;
real mean_FMach;
vector[nloc] FLoc;
vector[2] FMach;
vector[nobsC] ExpLAmountC;
vector [nobsC] sigmaC;
mean_FLoc <- mean(FLoc_r);
FLoc <- (FLoc_r-mean_FLoc)*sfloc;
mean_FMach <- mean(FMach_r);
FMach <- (FMach_r-mean_FMach)*sfmach;
for (i in 1:nobs) {
ExpLAmount[i] <- intercept  
  + .0001*rate*time[i] 
+ FMach[Machine[i]] + 
+ FLoc[Location[i]];
sigma[i] <- sigmaF[Machine[i]];
}
for (i in 1:nobsC) {
ExpLAmountC[i] <- intercept  
+ .0001*rate*timeC[i] 
+ FMach[MachineC[i]] + 
+ FLoc[LocationC[i]];
sigmaC[i] <- sigmaF[MachineC[i]];
}
}  
model {        
sfmach ~ cauchy(0,4);
sfloc ~ cauchy(0,4);
FMach_r ~ normal(0,1);
FLoc_r ~ normal(0,1);
sigmaF ~ cauchy(0,4);
LAmount ~ normal(ExpLAmount,sigma);
LAmountC ~ normal(ExpLAmountC,sigmaC);
rate ~ normal(0,1);
}
generated quantities {
real dailydec;
real yearlydec; 
dailydec <- pow(10,rate*.0001);
yearlydec <- pow(dailydec,365);
}

fit2 <- stan(model_code = Fe_code2, 
data = datain, 
pars=parameters2,
init=init2,
warmup=100,
iter = 500, 
chains = 4)

fit2

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)