
What happens if we forget a trivial assumption?

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers].

Last week, @dmonniaux published an interesting post entitled "l’erreur n’a rien d’original" on his blog. He was asking the following question: let $a$, $b$ and $c$ denote three real-valued coefficients; under which assumption on those three coefficients does $ax^2+bx+c=0$ have a real-valued root?

Everyone answered $b^2-4ac\geq 0$, but no one mentioned that it is necessary to have a proper quadratic equation first. For instance, if both $a$ and $b$ are null (and $c$ is not), there are no roots, even though $b^2-4ac=0$.
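To make the degenerate cases concrete, here is a minimal sketch in R. The function name `has_real_root` is hypothetical (it is not from the original post); it simply treats the quadratic, linear and constant cases separately.

```r
# TRUE if a*x^2 + b*x + c = 0 has at least one real root
# (hypothetical helper, for illustration only)
has_real_root <- function(a, b, c) {
  if (a != 0) return(b^2 - 4 * a * c >= 0)  # proper quadratic: discriminant test
  if (b != 0) return(TRUE)                  # linear equation: x = -c/b
  c == 0                                    # constant equation: solvable only if c = 0
}

has_real_root(1, 0, -1)  # x^2 - 1 = 0
has_real_root(1, 0, 1)   # x^2 + 1 = 0
has_real_root(0, 0, 1)   # 1 = 0: the discriminant is 0, yet there is no root
```

The last call is the case everyone forgot: the discriminant condition holds, but the equation is not a quadratic one.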

It reminds me of all my time series courses, when I define $ARMA(p,q)$ processes, i.e.

$$\Phi(L)X_t=\Theta(L)\varepsilon_t$$

To have a proper $ARMA(p,q)$ process, $\Phi$ has to be a polynomial of order $p$, and $\Theta$ has to be a polynomial of order $q$. But that is not enough! The roots of $\Phi$ and $\Theta$ have to be different! If they have one root in common, then we do not deal with an $ARMA(p,q)$ process.
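The distinct-roots condition is easy to check numerically. The sketch below is only an illustration: `has_common_root` is a hypothetical helper, the tolerance is an arbitrary choice, and the coefficients are assumed to follow R's sign convention (autoregressive polynomial 1 - ar1 z - ar2 z^2 - ..., moving-average polynomial 1 + ma1 z + ...).

```r
# Hypothetical helper: do the AR and MA polynomials share a root?
# Assumes R's convention: Phi(z)   = 1 - ar[1] z - ar[2] z^2 - ...
#                         Theta(z) = 1 + ma[1] z + ma[2] z^2 + ...
has_common_root <- function(ar, ma, tol = 1e-8) {
  r_ar <- polyroot(c(1, -ar))  # coefficients in increasing order of power
  r_ma <- polyroot(c(1, ma))
  # compare every AR root with every MA root
  any(outer(r_ar, r_ma, function(u, v) Mod(u - v)) < tol)
}

# (1 - 0.7 z)(1 - 0.5 z) = 1 - 1.2 z + 0.35 z^2, and Theta(z) = 1 - 0.5 z
has_common_root(ar = c(1.2, -0.35), ma = -0.5)
# an ARMA(1,1) with distinct roots
has_common_root(ar = 0.7, ma = 0.4)
```

In the first call both polynomials vanish at z = 2, so the model is really an AR(1) in disguise; in the second call the roots differ.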

It sounds like something trivial, but most of the time, everyone forgets about it. Just like the assumption that $a$ and $b$ should not both be null in @dmonniaux’s problem.

And most of the time, those theoretical problems are extremely important in practice! I mean, assume that you have an $AR(1)$ time series,

$$(1-\phi L)X_t=\varepsilon_t$$

but you don’t know it is an $AR(1)$, and you fit an $ARMA(2,1)$,

$$(1-\phi L)(1-\theta L)X_t=(1-\theta L)\varepsilon_t$$

Most of the time, we do not look at the roots of the polynomials, we just mention the coefficients of the polynomials,

$$(1-\phi_1L-\phi_2L^2)X_t=(1-\theta L)\varepsilon_t,\qquad \phi_1=\phi+\theta,\quad \phi_2=-\phi\theta$$

The statistical interpretation is that the model is misspecified, and we have a non-identifiable parameter here. Is our inference procedure clever enough to understand that $\theta$ should be null? What kind of coefficients $\phi_1$ and $\phi_2$ do we get? Is the first one close to $\phi$ and the second one close to $0$? Because that is the true model, somehow...
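The non-identifiability can be checked numerically. Here is a minimal sketch, assuming R's sign convention (autoregressive polynomial 1 - ar1 z - ar2 z^2, moving-average polynomial 1 + ma1 z): multiplying both sides of an AR(1) with coefficient 0.7 by a common factor (1 - th L) gives an ARMA(2,1) with ar = (0.7 + th, -0.7 th) and ma = -th. The values of `th` below are arbitrary illustrations, not taken from the simulations in this post.

```r
phi <- 0.7
ths <- c(-0.5, 0.2, 0.9)  # arbitrary values for the redundant common factor

# autocorrelation function of the true AR(1)
acf_ar1 <- ARMAacf(ar = phi, lag.max = 10)

# largest gap between the AR(1) autocorrelations and those of each ARMA(2,1)
gaps <- sapply(ths, function(th) {
  acf_arma <- ARMAacf(ar = c(phi + th, -phi * th), ma = -th, lag.max = 10)
  max(abs(acf_arma - acf_ar1))
})
print(gaps)
```

The gaps should be zero up to rounding error: all those ARMA(2,1) models have the same autocorrelation function as the AR(1), so the data cannot distinguish them.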

Let us run some Monte Carlo simulations to get some hints

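> ns=1000
> # one row per simulation: ar1, ar2, ma1 (NA if the fit failed)
> fit2=matrix(NA,ns,3)
> for(s in 1:ns){
+ # simulate an AR(1) with coefficient 0.7 and unit-variance noise
+ # (sd is an argument of arima.sim, not an element of the model list)
+ X=arima.sim(n = 240, list(ar=0.7), sd=1)
+ # fit an over-parameterized ARMA(2,1) and keep ar1, ar2, ma1
+ fit=try( arima(X,order=c(2,0,1))$coef[1:3] )
+ if(!inherits(fit, "try-error")) fit2[s,]=fit
+ }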

If we just focus on the estimations that did run well, we get

> library(ks)
> H=diag(c(.01,.01))
> U=as.data.frame(fit2[,1:2])  # keep (ar1, ar2) only: the density is bivariate
> U=U[!is.na(U[,1]),]
> fat=kde(U,H,xmin=c(-2.05,-1.05),xmax=c(2.05,1.05))
> z=fat$estimate
> library(RColorBrewer)
> reds=colorRampPalette(brewer.pal(9,"Reds"))(100)
> image(seq(-2.05,2.05,length=151),
+ seq(-1.05,1.05,length=151),
+ z,col=reds)

The black dot is where we expect to be: $\phi_1$ close to $0.7$ and $\phi_2$ close to $0$ (the stationarity triangle for $AR(2)$ time series was added on the graph). But the numerical output is far away from what we were expecting.

So yes, the theoretical assumption to have distinct roots is very important, even if everyone forgets about it! From a numerical point of view, we can get almost anything if we forget about that trivial assumption! Actually, I still wonder which kind of “anything” we have... When we look at the distribution of $\hat\theta$, it is clearly not “uniform”

> hist(fit2[,3],col="light blue",probability=TRUE)

And actually, there is a priori no reason to have $\hat\theta\in[-1,1]$. But that’s what we observe here

> range(fit2[!is.na(fit2[,3]),3])
[1] -1  1

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.
