Enhanced meboot package, simulating regression standard errors

August 11, 2013
By

In my June 25 post I described R- (i) code to change scale without changing the mean, and (ii) code to make a probability distribution symmetric by modifying order statistics.  Both are commonly encountered problems by R programmers.  My coauthor Javier Lopez-de-Lacalle of Spain has incorporated an efficient version of my code inside the maximum entropy bootstrap (meboot) package in R See the package vignette at http://www.jstatsoft.org/v29/i05/.  You can appreciate Javier’s compact code by typing library(meboot); meboot

Guest post by Hrishikesh D. Vinod. Professor of Economics, Fordham University, New York, E-mail: [email protected]. June 25, 2013

Complete simulation using R available at: http://ssrn.com/abstract=2295723

 

Note that one sets “sym=TRUE” in using the meboot function to  symmetrize the maximum entropy (ME) density. Most students of quantitative sciences are familiar with the Durbin-Watson tests, needed because traditional standard errors (SEs) of regression coefficients are unreliable in the presence of autoregressive, AR(1), regression errors.  However they may not know how unreliable!

Recall that ME bootstrap is a computer intensive tool for time series statistical inference problems where traditional confidence intervals can be unreliable. Vinod (textbook ch.9http://www.worldscibooks.com/economics/6895.html) explains. The Wiener-Kolmogorov-Khintchine (WKK) theory developed long before we had powerful computers relies on the ensemble Ω of time series with the stationarity assumption.

A simulation can be devised where the true SEs are known (from known error AR(1) parameters). It shows that confidence intervals for SEs using the traditional Chi-square density often has nearzero coverage of the true SE.  The simulation also allows us to compare meboot with the moving block bootstrap (MBB), designed for m-dependent series.  A short paper Maximum Entropy Bootstrap Simulations for Variance Estimation (July 18, 2013) available at: http://ssrn.com/abstract=2295723 reports details about my simulation (requiring several days of number crunching) which shows that both traditional SEs and those from MBB perform poorly compared to the meboot. Unfortunately the meboot coverage probability does not exactly equal the nominal coverage 0.95, suggesting that further enhancements may be needed.  If we compare “sym=TRUE” with “sym=FALSE,” the former has a mild advantage according to the simulation.

Each simulation takes about 13 hours of number crunching on my PC.  The following illustrative code is for an abridged version.  The results on so few observations and simulations are not expected to be anywhere close to the reliable bootstrap simulations.  The reader might want to copy my code which produces an output table similar to the tables in  http://ssrn.com/abstract=2295723  almost ready for Latex in a research paper without having to copy and paste individual results. It is not claimed to be efficient, but should be self explanatory.

 

#Lahiri2Sym
 
#jasa page 240 march 2012 Lahiri
 
#integer block length b, hk sequence=a'Qkna quad from
 
#set.seed(235)
 
#author  H. D. Vinod, Fordham Univ. NY
 
 
 
rm(list=ls()) 
 
options(prompt = " ", continue = "  ", width = 68,
 
useFancyQuotes = FALSE)
 
 
 
require(meboot)
 
 
 
 
 
 
 
###function theta to get one parameter SE of interest here
 
theta=function(yi,ui,coef.no=2){
 
reg1=lm(yi~ui)
 
su=summary(reg1) #second col. contains standard errors of reg. coeff
 
return(su$coef[coef.no,2])
 
} #returns standard error where coef.no=2 is slope, but can be changed
 
###function end
 
 
 
 
 
###function begin  CI for std deviation Chi-sq based
 
sd.interval = function(data, conf.level = 0.95, df) {
 
chilower = qchisq((1 - conf.level)/2, df)
 
chiupper = qchisq((1 - conf.level)/2, df, lower.tail = FALSE)
 
v = var(data)
 
low= sqrt(df * v/chiupper)
 
upper= sqrt(df * v/chilower)
 
list(low=low, upper=upper)
 
}
 
###function sd.interval ends
 
 
 
 
 
###bootstrap matrix create function needs "coef.no" argument for theta
 
bstar.meboot <- function(yt, theta, ui,
 
    level = 0.95, bigJ = iter, coef.no=2) { 
 
  semy <- meboot(x = yt, reps = bigJ,
 
    expand.sd=FALSE, sym=TRUE)$ensemble
 
  n <- NROW(yt)
 
    m <- length(theta(yi=yt,ui=ui,coef.no=coef.no))
 
    if(m!=1) stop("too many parameters in theta")
 
    bb <- matrix(NA, bigJ)
 
    for(j in 1:bigJ) {
 
      yy <- semy[,j]   #select j-th column yt
 
      bb[j] <- theta(yi=yy, ui=ui, coef.no=coef.no)
 
    }
 
    return(bb)
 
  }
 
 
 
### end function bstar.meboot  which finds J=1000? versions
 
### of the standard errors (the parameter of interest here)
 
 
 
#begin simulation code for moving block boot
 
require(boot)
 
###function bstar.MBB is similar to bstar.meboot
 
bstar.MBB <- function(yt, theta, ui,
 
  level = 0.95, bigJ = iter, blocklen=5, coef.no=coef.no) {
 
#sim="fixed" gives Moving block bootstrap in the call below
 
out=tsboot(yt,statistic=theta, R=bigJ, sim="fixed", l=blocklen,
 
ui=ui, coef.no=coef.no)
 
bs=out$t
 
list(out=out,bs=bs)
 
} ##end of bstar.MBB function
 
 
 
###function covg.prob to determine coverage prob. and CI width
 
covg.prob=function(xblow, xbup,true.se){
 
#author  H. D. Vinod, Fordham Univ. NY
 
#function to compute coverage prob. and width
 
covg=0    #initialize
 
wid=xbup-xblow #CI widths
 
if (wid<0)
 
print(c("Error lower limit xblow> upper", xblow, xbup))
 
if (true.se>= xblow & true.se <=xbup) covg=1
 
list(wid=wid, covg=covg)
 
}
 
###function checks how well true value is covered by CI
 
 
 
 
 
###function fixed part of Lahiri specification
 
fixed.part=function(bigT,smalli=3){
 
zi=rnorm(bigT)
 
ui=(2+zi)*sqrt(smalli)
 
return(ui)
 
}
 
###function end
 
 
 
###function dgp sets up the DGP for each simulation
 
dgp=function(ui, bet=c(1,-1), ei){
 
bigT=length(ui)
 
if (length(ei) != bigT) stop("dgp ui, ei wrong lengths")
 
yi=bet[1]+bet[2]*ui+ei
 
return(yi)
 
}
 
### dgp function ends
 
 
 
#author  H. D. Vinod, Fordham Univ. NY
 
# the following code not only runs simulation but
 
#collects the results in a nice table ready for publication
 
#only after all above functions compile, code to simulate
 
date()
 
bet=c(1,-1)   #true beta values chosen for simulation
 
print("Lahiri2Sym.R  is the name of the program")
 
print(c("true regression intercept=1, slope=-1"))
 
blocklen=5  #block length for MBB
 
print(c("block length of MBB=", blocklen))
 
expan=1 #this is a relic of old meboot versions
 
#by setting it at 1 it ignores the stuff
 
print(c("expansion number used for meboot=", expan))
 
iter =20  #change this to a larger value   #########################
 
#iter <- 1000  #notation is J in the text
 
nsim=20 #change this  to a larger value   #########################
 
#nsim <- 1000#500#20#500  #notation is N in the text
 
#Tseq=c(50,75,100,150)
 
Tseq=20  #only one T is computed in test run
 
#Tseq=c(100, 300)   #change this    #########################
 
len.Tseq=length(Tseq)
 
ar1thet=c(0.3,0.6,0.9) #define models 1 to 3
 
#b=c(3, 5, 10, 20, 30, 50)#block sizes in Lahiri
 
#ar1thet=c(0.3) #define models 1 to 3
 
 
 
set.seed(132)
 
 #### begin coef.no LOOP
 
for (coef.no in 1:1){
 
if (coef.no==1){
 
print(c(" focus is on Intercept coefficient number ", coef.no))}
 
if (coef.no==2){
 
print(c(" focus is on slope coefficient number ", coef.no))}
 
 
 
 
 
####begin ar1theta Loop
 
for (ithet in 1:length(ar1thet)){
 
ar1theta=ar1thet[ithet]
 
print(c("theta for AR(1)errors=",ar1theta))
 
 
 
out=matrix(NA,ncol=4, nrow=4*len.Tseq) #4 sets for each experiment
 
 
 
 
 
#### begin loop over bigT
 
for (jT in 1:len.Tseq){
 
bigT=Tseq[jT]
 
 
 
print(c("sample size=", bigT))
 
 
 
seei=1/(1-ar1theta^2)# infinite sum 1+ thet^2 + thet^4  etc
 
 
 
ui=fixed.part(bigT)  #define the right hand regressor variable
 
#since regressors are randomly created (fixed)
 
#set.seed is needed above this call
 
 
 
ane=rep(1,bigT)#column of ones
 
bigx=cbind(ane,ui)#matrix of regressors
 
xtxinv=solve(t(bigx)%*% bigx)
 
#print(xtxinv)
 
true.se0=sqrt(xtxinv[1,1]*seei)
 
true.se1=sqrt(xtxinv[2,2]*seei)
 
 
 
if (jT==1){
 
print(c("true SE intercept/ slope",true.se0,true.se1))
 
} #end if
 
true.SE= sqrt(xtxinv[coef.no,coef.no]*seei)
 
#need to initialize counters for each model
 
 
 
counter.iid <- counter.MBB <- counter.meboot <- 0
 
wid.iid <-  wid.meboot <- wid.MBB <- rep(NA, nsim)
 
var.iid <-  var.meboot <- var.MBB <- rep(NA, nsim)
 
eyT.iid <-  eyT.meboot <- eyT.MBB <- rep(NA, nsim)
 
 
 
outw=matrix(NA,ncol=3, nrow=nsim)
 
outv=matrix(NA,ncol=3, nrow=nsim)
 
oute=matrix(NA,ncol=3, nrow=nsim)
 
 
 
 
 
### begin i loop with new regression errors for each i
 
for (i in seq(1, nsim)){
 
ei <- arima.sim(list(order = c(1,0,0), ar = ar1theta), n = bigT)
 
 
 
#Note that var(ei) is chi-sq r.v. with T-2 df
 
#95 percent CI are (n-1) s^2/ chisq(alp/2, n-1)
 
 
 
#### call sd.interval = function(data, conf.level = 0.95, df)
 
 
 
se.b0=sqrt(xtxinv[1,1]*var(ei))
 
se.b1=sqrt(xtxinv[2,2]*var(ei))
 
if (i==1) print(c("seb0,seb1=",se.b0,se.b1))
 
 
 
sd.iid=sd.interval(data=ei, df=(bigT-2))
 
 
 
obs.SE= sqrt(xtxinv[coef.no,coef.no]*var(ei))
 
#### call seq covg.prob=function(xblow, xbup,true.se)  #call details
 
xblow=sd.iid$low
 
xbup=sd.iid$upper
 
if (xblow > xbup) print(c("Error, xblow=", xblow, xbup, "i=",i))
 
covg.iid=covg.prob(xblow=sd.iid$low, xbup=sd.iid$upper,  true.se=true.SE)
 
wid.iid[i]=covg.iid$wid
 
eyT.iid[i]= obs.SE
 
var.iid[i]= (obs.SE-true.SE)^2
 
counter.iid=counter.iid+covg.iid$covg
 
 
 
#### call dgp=function(ui, bet=c(1,-1), ei)  ## the call details`
 
yt=dgp(ui=ui, ei=ei, bet=bet)
 
 
 
bsMBB=bstar.MBB(yt,theta, bigT, bigJ=iter,
 
blocklen=blocklen, coef.no=coef.no, ui=ui)$bs
 
qnMBB <- quantile(bsMBB, c(0.025, 0.975), type = 8)
 
covg.MBB=covg.prob(xblow=qnMBB[1], xbup=qnMBB[2],  true.se=true.SE)
 
wid.MBB[i]=covg.MBB$wid
 
eyT.MBB[i]=mean(bsMBB)
 
var.MBB[i]=var(bsMBB)
 
counter.MBB=counter.MBB+covg.MBB$covg
 
 
 
bsme0= bstar.meboot(yt=yt,theta, bigT,
 
bigJ=iter, coef.no=coef.no, ui=ui)
 
bsme=bsme0 +expan*(bsme0-mean(bsme0))
 
#these are (2x-xbar) when expan=1. Following for 95% CI
 
qnmeboot <- quantile(bsme, c(0.025, 0.975), type = 8)
 
#qnmeboot <- quantile(bsme, c(0.00005, 0.99995), type = 8)
 
covg.meboot=covg.prob(xblow=qnmeboot[1], xbup=qnmeboot[2],
 
  true.se=true.SE)
 
wid.meboot[i]=covg.meboot$wid  #store width for i-th simulation
 
eyT.meboot[i]=mean(bsme)#store average of SEs of coeff
 
var.meboot[i]=var(bsme)
 
counter.meboot=counter.meboot+covg.meboot$covg
 
 
 
if(i<4){
 
print(c(counter.iid, counter.MBB,counter.meboot))
 
print(c(wid.iid[i], wid.MBB[i] ,wid.meboot[i]))
 
print(c(eyT.iid[i],eyT.MBB[i] ,eyT.meboot[i]))
 
print(c(var.iid[i],var.MBB[i] ,var.meboot[i]))
 
 }
 
outw[i,1:3]=cbind(wid.iid[i], wid.MBB[i], wid.meboot[i])
 
oute[i,1:3]=cbind(eyT.iid[i], eyT.MBB[i], eyT.meboot[i])
 
outv[i,1:3]=cbind(var.iid[i], var.MBB[i], var.meboot[i])
 
}#end nsim loop over i
 
outp=cbind(counter.iid, counter.MBB, counter.meboot)/nsim
 
#apply(outc,MARGIN=2,mean)
 
print(c("coverage prob", outp))
 
 
 
 
 
outw.mean=apply(outw,MARGIN=2,mean)
 
oute.mean=apply(oute,MARGIN=2,mean)
 
outv.mean=apply(outv,MARGIN=2,mean)
 
print(date())
 
#inter-leaf the output, first row has probabilities and second has widths
 
out1=rbind(outp, outw.mean,oute.mean,outv.mean)
 
print(out1)
 
print(date())
 
out[(4*jT-3),1]=nsim
 
out[(4*jT-2),1]=bigT
 
out[(4*jT-1),1]=ar1theta
 
out[4*jT,1]=5
 
out[(4*jT-3),2:4]=outp
 
out[(4*jT-2),2:4]=outw.mean
 
out[(4*jT-1),2:4]=oute.mean
 
out[4*jT,2:4]=outv.mean
 
print(c("above results have T= ", bigT))
 
} #end of bigT loop
 
 
 
print(out)
 
 
 
require(xtable)
 
print(xtable(out, digits=4))
 
 
 
print(c("Above ar1theta=", ar1theta))
 
} #end of ar1theta loop
 
print(c("coefficient number above", coef.no))
 
} #end of coef.no loop
 
print("Symmetric ME density, sym=TRUE, expand.sd=FALSE")
 
print(date())
 
 


If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.