# Functional regression, INLA, and how to build the World’s Slowest Fourier Transform

March 29, 2012
By

(This article was first published on dahtah » R, and kindly contributed to R-bloggers)

[This was originally going to be about multilevel modelling using INLA, but to do that flexibly you need INLA's observation matrix mechanism, and this one deserves an explanation in its own right, so I'm introducing it by way of an easier example (for which you don't actually need the observation matrix mechanism, but it's clearer that way and it generalises better). For more on INLA, see the INLA website or this previous post.]

In Ramsay and Silverman’s Functional Data Analysis there’s a chapter devoted to doing regression when what you are trying to predict is a scalar quantity and what you have as covariate (input) is a function. For example, you might have measured someone’s heart rate during an effort test (which can be thought of as ${h(t)}$, a function of time), and you want to predict their performance in a marathon. The model they suggest is a functional linear regression, which is simply:

$\displaystyle y_{i}=\int\beta(t)h_{i}(t)\mbox{d}t+\epsilon_{i}$

with the usual assumption that ${\epsilon_{i}\sim\mathcal{N}\left(0,\sigma^{2}\right)}$. You also need to constrain ${\beta(t)}$ in some way (otherwise you are guaranteed to always get a perfect fit), and they choose to force ${\beta(t)}$ to be smooth, which makes sense in a lot of contexts. In the marathon example, maybe what matters is not heart rate at the beginning of the effort test, but heart rate after a few minutes when subjects have reached a steady-state of sorts, so that ${|\beta(t)|}$ would be small for ${t}$ around 0 and gradually grow, but we don’t expect it to suddenly jump at some ${t}$.

The linear functional regression problem actually comes up quite a bit: in neuroscience and psychology, reverse correlation and classification images problems can be viewed as more or less exactly the same thing (trying to predict a neuron’s linear response from an audio signal is exactly the marathon example, with a different likelihood function). In other fields, it might be known as a signal reconstruction problem, or as an integral equation problem. In signal reconstruction we know think of ${\beta(t)}$ as a signal to be reconstructed from measurements, and the ${h_{i}(t)}$‘s are measurement functions. If ${h_{i}(t)=\delta\left(t-t_{i}\right)}$, then we have sampled ${\beta(t)}$ at locations ${t_{1},\ldots,t_{n}}$. If ${h_{i}(t)=\frac{1}{\left(b_{i}-a_{i}\right)}\mathbb{I}\left(t\in[a_{i},b_{i}]\right)}$, and ${\mathbb{I}}$ is an indicator function, we have measured the average value of ${\beta\left(t\right)}$ over a sequence of intervals ${[a_{1},b_{1}]\ldots[a_{n},b_{n}]}$. If the measurement functions are Fourier polynomials, we are measuring ${\beta\left(t\right)}$ in Fourier space (apparently this happens in some real physical systems).

So suppose we have measured a bunch of Fourier coefficients. If the measurements are noisy we can’t just use the inverse Fourier transform, there is some uncertainty involved. The Bayesian way would be to set a prior on ${\beta(t)}$, and look at ${p\left(\beta(t)|\mathbf{y}\right)}$. If the prior can be expressed as a Gauss-Markov random field, which is the case for smoothness priors, then we can do inference using INLA. It’ll be useful to describe the framework in full generality first.

In latent Gaussian models you have, on the one hand, a set of ${m}$ latent variables, which we’ll call ${\mathbf{z}}$, and on the other a set of observations, ${\mathbf{y}}$. The latent variables z have a prior distribution that is Gaussian, hence the name. INLA has the further constraint that the ${y_{i}}$‘s are independently distributed given a specific linear combination of the ${z_{i}}$‘s. The whole thing can be summarised as:

$\displaystyle \mathbf{z}\sim\mathcal{N}\left(\mu,\mathbf{Q}^{-1}\right)$

(the prior)

and:

$\displaystyle y_{i}\sim l(\mathbf{a}_{i}^{t}\mathbf{z})$

(the likelihood)

where ${l}$ is some distribution parameterised by ${\eta_{i}=\mathbf{a}_{i}^{t}\mathbf{z}}$, for example a Gaussian with mean ${\eta_{i}}$. Most often ${\mathbf{a}_{i}}$ will define what sort of observation we are making on datapoint ${i}$, and it is useful to stack all those vectors in an observation matrix ${\mathbf{A}}$, and to define:

$\displaystyle \eta=\mathbf{Az}$

Usually ${\mathbf{A}}$ is the identity matrix, and that is indeed the default in INLA, but modifying ${\mathbf{A}}$ greatly expands the class of models that can be fitted. In our case we can take the latent field to be the values of ${\beta(t)}$ at different time points, ${t_{1}\ldots t_{k}}$, arranged on a regular grid. Each of our measurements corresponds to a noisy version of ${\int\beta(t)h_{i}(t)\mbox{d}t}$, which we can approximate as:

$\displaystyle \Delta_{t}\sum_{j=1}^{m}\beta(t_{j})h_{i}\left(t_{j}\right)$

so that ${\mathbf{a}_{i}=\left[\begin{array}{cccc} h_{i}\left(t_{1}\right) & h_{i}\left(t_{2}\right) & \ldots & h_{i}\left(t_{m}\right)\end{array}\right]}$ (that’s not the optimal way of doing quadrature, but for fine grids it’s good enough).

Here’s a concrete example. We take ${\beta(t)}$ to be a relatively complicated function over ${[0,1]}$:

Our measurement functions will be cosines at various frequencies ${k_{i}}$ and phases ${\phi_{i}}$

$\displaystyle h_{i}\left(t\right)=\cos\left(2\pi k_{i}\left(t+\phi_{i}\right)\right)$

To get the equivalent of a discrete Fourier transform we can restrict the ${k_{i}}$‘s to be integers, and the ${\phi_{i}}$ to be either 0 (cosine phase) or ${\pi/2}$ (sine phase).

We construct the matrix ${\mathbf{A}}$ by stacking ${h_{1}\ldots h_{n}}$ row-wise, so that A looks something like that when plotted as a heatmap:

(This image makes my eyes water)

R code to generate data looks like this:

gendat   {
nMes     compute.dotprod     mes     data.frame(k=ks,y = mes + noise.sigma*rnorm(length(ks)))
}


The INLA call I use is:

res

This bit uses a “rw2” model to enforce smoothness, and a variable “ind” which corresponds to the time index (it corresponds to ${j}$ in the expressions above). There are two notable changes compared to normal INLA calls. First, data, which is usually a data.frame, is now a list, because ${\mathbf{y}}$ has a different length from the index variable. Second, the interpretation of the formula changes. The left hand-side defines the response as usual, but the right-hand side now defines the latent field ${\mathbf{z}}$. This makes the notation ${y\sim...}$ a tad inadequate, because the terms are not in one-to-one correspondence anymore. It probably should be ${y\sim A(\ldots)}$, or something similar. I’m putting this out there in case INLA developers happen to read this post (*).

We take measurements for a sequence of (non-integer) ${k}$ from 0 to 15, and a noise level ${\sigma=0.05}$, and run the INLA command shown above. Below, the measurement values (blue dots), along with the posterior expectations and the usual 2.5% and 97.5% quantiles.

And here is the estimated signal

If you remove measurement noise, you’ll have the world’s slowest discrete Fourier transform! It’s actually interesting to play around with this example a bit, because there are plenty of ways to make it go completely wrong (for example when removing sine measurements).

(*) Not only would this notation be clearer, but it would also make it possible to have things like

${y~A(foo)+B(bar)}$

that’d define two latent fields with different dimensions. On the other hand, I imagine it would be a lot harder to parse.

Here’s the full R code:


##An example of functional regression in INLA
##Author: Simon Barthelmé, TU Berlin, 2012
##--------------------------------------------

##Source the file, then run
##---
##res ##plot.measurements(res)
##plot.functions(res)
##---
##to generate the figures in the blog post (give or take some noise)

require(INLA)
require(ggplot2)
require(plyr)

#The "signal" we're measuring
fun
#The measurement functions: k = freq., phi = phase
#phi = pi/2 corresponds to a sine function
mfun
#Generate random measurement data
gendat {
nMes compute.dotprod mes data.frame(k=ks,y = mes + noise.sigma*rnorm(length(ks)))
}

#Generate a discrete measurement matrix
measurement.matrix {
t(sapply(1:length(ks),function(ind) mfun(ts,ks[ind],phis[ind])/length(ts)))
}

#Generate some data and run INLA on them
run.example {
nMes
#A is the measurement matrix
ts A
dat
res res$ks res$phis res$ts res } plot.measurements { nMes df.measure sine=factor((res$phis==pi/2),lab=c('cosine phase','sine phase')),
fitted.mean=res$summary.linear.predictor[(1:nMes),1], fitted.low=res$summary.linear.predictor[(1:nMes),3],
fitted.up=res$summary.linear.predictor[(1:nMes),5], measured=res$data$y ) ggplot(df.measure,aes(x=k))+geom_ribbon(aes(ymin=fitted.low,ymax=fitted.up),alpha=.1)+geom_line(aes(y=fitted.mean))+geom_point(aes(y=measured),col="darkblue")+theme_bw()+labs(y="Measurements",x="Frequency (Hz)")+facet_wrap(~ sine) } plot.functions { nMes #In function space df.function fitted.mean=res$summary.linear.predictor[-(1:nMes),1],
fitted.low=res$summary.linear.predictor[-(1:nMes),3], fitted.up=res$summary.linear.predictor[-(1:nMes),5],
true=fun(res\$ts)
)
ggplot(df.function,aes(x=t))+geom_ribbon(aes(ymin=fitted.low,ymax=fitted.up),alpha=.1)+geom_line(aes(y=fitted.mean))+geom_line(aes(y=true),col="darkred")+theme_bw()+labs(x="Time (sec.)",y="f(t)")
}