Guest post by Hrishikesh D. Vinod*

(*Professor of Economics, Fordham University, New York, E-mail: [email protected]. June 25, 2013)

### Complete paper with R code freely available at: http://ssrn.com/abstract=2285041

This post presents R code for two tasks: changing the scale of a distribution without changing its mean, and making a probability distribution symmetric. R programmers commonly encounter both problems. We provide code for both tasks in the context of the maximum entropy bootstrap (meboot) package in R.

Why study the bootstrap? It is a vital computer-intensive tool for statistical inference (not estimation). It is particularly suited to complicated nonlinear problems where traditional (asymptotic) confidence intervals tend to be too wide or are difficult to derive. Vinod (textbook ch.9 http://www.worldscibooks.com/

The meboot algorithm, available as an **R package** also called **meboot**, offers computer-intensive construction of Ω, the ensemble of resampled time series. See the package vignette at http://www.jstatsoft.org/v29/

The maximum entropy (ME) density is maximally noncommittal about unavailable information regarding its functional form. It is constructed from the order statistics x(t) of the time series x_{t}. The construction defines exactly T intervals, each of which contains exactly one x(t). Each bootstrap resample draws one observation from each such interval, and each interval carries probability 1/T. This is called the mass-preserving constraint. The ME density also imposes a mean-preserving constraint.

For a toy example with five observations in x, the following R code creates J=4 resamples x(t,j) in a T×J matrix representing the ensemble. In a realistic ensemble used for inference, T is much larger and J>999. The aim, of course, is to see what might happen to the shape of the time series in a large population of time series.

```r
require(meboot)
set.seed(234)
x <- c(4, 12, 36, 20, 8)
xtj <- meboot(x, reps = 4)$ensemble
xtj
```
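The interval construction described above can be sketched in base R without calling the package. This is only an illustration under stated assumptions: the variable names `zz` and `m` are labels chosen here, the outer limits use an untrimmed mean of absolute differences (the `findKapa` function in this post uses a trimmed mean), and the weights 0.25, 0.5, 0.25 are copied from that function.

```r
# Toy data from the post
x  <- c(4, 12, 36, 20, 8)
n  <- length(x)
xx <- sort(x)                          # order statistics x(t)

# Interior interval limits: midpoints of adjacent order statistics
z <- (xx[-1] + xx[-n]) / 2

# Outer limits: extend the range by the mean absolute first difference
# (assumption: no trimming, unlike findKapa's trim = 0.1)
dv <- mean(abs(diff(x)))
zz <- c(xx[1] - dv, z, xx[n] + dv)     # T + 1 limits define T intervals

# Each interval contains exactly one order statistic
stopifnot(all(zz[-(n + 1)] < xx), all(xx < zz[-1]))

# Desired interval means under the mean-preserving constraint
m <- c(0.75 * xx[1] + 0.25 * xx[2],
       0.25 * xx[1:(n - 2)] + 0.5 * xx[2:(n - 1)] + 0.25 * xx[3:n],
       0.25 * xx[n - 1] + 0.75 * xx[n])

# With mass 1/T on every interval, the density mean equals the sample mean
mean(m)
mean(x)
```

The two printed means agree because every order statistic receives a total weight of one across the interval means.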

The overall variance of the ME density is smaller than that of the original data. The following enhancement equates the population variance of the ME density to that of the data. The basic idea is a linear transformation: add to each resampled value kappa times its deviation from the population mean, so that every deviation is multiplied by the common factor 1 + kappa, where kappa is a suitably chosen constant.

```r
findKapa <- function(x, trim = 0.1) {
  # find kappa by which to multiply sd of each ensemble
  n <- length(x)
  xx <- sort(x)
  # ordxx <- order(x)
  z <- rowMeans(embed(xx, 2))
  dv <- abs(diff(as.numeric(x)))
  dvtrim <- mean(dv, trim = trim)
  xmin <- xx[1] - dvtrim
  xmax <- xx[n] + dvtrim
  aux <- colSums(t(embed(xx, 3)) * c(0.25, 0.5, 0.25))
  # following ensures mean preserving constraint
  desintxb <- c(0.75 * xx[1] + 0.25 * xx[2], aux,
                0.25 * xx[n - 1] + 0.75 * xx[n])  # desired means
  zz <- c(xmin, z, xmax)  # extended list of z values
  v <- rep(NA, n)         # storing within variances
  for (i in 2:(n + 1)) {
    v[i - 1] <- ((zz[i] - zz[i - 1])^2) / 12
  }
  xb <- mean(x)
  s1 <- sum((desintxb - xb)^2)
  uv <- (s1 + sum(v)) / n  # ME density variance
  desired.sd <- sd(x)
  actualME.sd <- sqrt(uv)
  if (actualME.sd <= 0) print("actualME.sd<=0 Error")
  out <- desired.sd / actualME.sd
  return(out - 1)
}

# The paper shows how the 'findKapa' function works on a toy example.
require(meboot)
set.seed(234)
x <- c(4, 12, 36, 20, 8)
kap <- findKapa(x); kap
xtj <- meboot(x, reps = 4, expand.sd = FALSE)$ensemble
xbar <- mean(x); xbar
ytj <- xtj + kap * (xtj - xbar)
apply(ytj, 2, sd)  # report sd for y(t,j)
apply(ytj, 2, mean) / apply(xtj, 2, mean)
apply(ytj, 2, sd) / apply(xtj, 2, sd)
```
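Reading the function line by line, the quantity it computes can be summarized as follows. The symbols are labels introduced here, not notation from the paper: m_t stands for the desired interval means (`desintxb`), z_0, ..., z_T for the extended interval limits (`zz`), and σ_x for `sd(x)`. The last step assumes the ensemble population mean equals x̄, as the mean-preserving constraint implies.

```latex
\sigma_{ME}^{2} = \frac{1}{T}\sum_{t=1}^{T}
  \left[ (m_t - \bar{x})^{2} + \frac{(z_t - z_{t-1})^{2}}{12} \right],
\qquad
\kappa = \frac{\sigma_x}{\sigma_{ME}} - 1 ,

y_{t,j} = x_{t,j} + \kappa\,(x_{t,j} - \bar{x})
        = \bar{x} + (1+\kappa)\,(x_{t,j} - \bar{x})
\;\Longrightarrow\;
\operatorname{Var}(y) = (1+\kappa)^{2}\,\sigma_{ME}^{2} = \sigma_x^{2} .
```

The first term inside the sum is the between-interval variance and the second is the variance of a uniform density on an interval of width z_t - z_{t-1}.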

For the toy example, κ = 0.184718. The transformation changes only the population variance. The sample standard deviation of y(t,j) for any particular j-th resample (column of ytj) need not equal σ_{x} = 12.64911. The last line of the code verifies that the standard deviations of the transformed data are multiplied by the common factor 1.184718.
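The effect of the linear transformation can be checked without the package. The sketch below uses a simulated matrix as a hypothetical stand-in for a meboot ensemble and an arbitrary illustrative value of kappa (neither comes from the paper); only the transformation itself is taken from the code above.

```r
set.seed(1)
xbar  <- 16     # mean of the toy series in the post
kappa <- 0.25   # hypothetical value, chosen only for illustration

# Hypothetical 5 x 4 ensemble standing in for meboot output
xtj <- matrix(rnorm(20, mean = xbar, sd = 10), nrow = 5)

# Same transformation as in the post
ytj <- xtj + kappa * (xtj - xbar)

# Every column's standard deviation is scaled by 1 + kappa
sd_ratio <- apply(ytj, 2, sd) / apply(xtj, 2, sd)
sd_ratio

# Column means move away from xbar by the same factor, so individual
# column means are not preserved; only the centre xbar stays fixed
dev_ratio <- (colMeans(ytj) - xbar) / (colMeans(xtj) - xbar)
dev_ratio
```

This suggests why the ratio of column means printed by the post's code need not be exactly one, while the ratio of column standard deviations is constant across columns.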

What is the motivation behind this scale adjustment? Given a time series x_{t}, the unadjusted meboot constructs a large number J of similar time series x(t,j), forming an ensemble that represents the population of time series under the ME density. Our scale adjustment from x(t,j) to y(t,j) ensures that the population variance of the transformed series equals σ_{x}^{2}. This is intuitively desirable.

Since many sample statistics have asymptotically Normal distributions (based on central limit theorem type arguments), it may be desirable to have symmetric sampling distributions. This motivation leads to the next enhancement, symmetrizing. Theil (1980) first considered this problem for a version of the ME density having exponential tails. He suggested a symmetrizing transform that yields new order statistics y(t). The adjustment needs to get into the guts of the meboot algorithm. The R code for symmetrizing is too long to describe briefly here. It is provided with detailed descriptions at: http://ssrn.com/abstract=