Identifying periods of change in time series with GAMs
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
In previous posts (here and here) I looked at how generalized additive models (GAMs) can be used to model nonlinear trends in time series data. In my previous post I extended the modelling approach to deal with seasonal data where we model both the within year (seasonal) and between year (trend) variation with separate smooth functions. One of the complications of time series modelling with smoothers is how to summarize the fitted model; you have a nonlinear trend which may be statistically significant but it may not be increasing or decreasing everywhere. How do we identify where in the series that the data are changing? That’s the topic of this post, in which I’ll use the method of finite differences to estimate the rate of change (slope) in the fitted smoother and, through some mgcv magic, use the information recorded in the fitted model to identify periods of statistically significant change in the time series.
Catching up
First off, if you haven’t already done so, go read my post on modelling seasonal data with GAMs as it provides the background info on the data and the model that I’ll be looking at here but also explains how we fitted the model etc. Don’t worry; you don’t need to run all that code to follow this post as I have put the relevant data processing parts in a Github gist that we’ll download and source()
shortly.
OK, now that you’ve read the previous post we can begin…
To bring you up to speed, I put the bits of code from the previous post that we need here in a gist on Github. To run this code you can simply do the following
<span class="o">></span> <span class="c1">## Load the CET data and process as per other blog post</span>
<span class="o">></span> tmpf <span class="o"><</span> tempfile<span class="p">()</span>
<span class="o">></span> download.file<span class="p">(</span><span class="s">"https://gist.github.com/gavinsimpson/b52f6d375f57d539818b/raw/2978362d97ee5cc9e7696d2f36f94762554eefdf/loadprocesscetmonthly.R"</span><span class="p">,</span>
<span class="o">+</span> tmpf<span class="p">,</span> method <span class="o">=</span> <span class="s">"wget"</span><span class="p">)</span>
<span class="o">></span> source<span class="p">(</span>tmpf<span class="p">)</span>
<span class="o">></span> ls<span class="p">()</span>
[1] "annCET" "bs" "cet" "CET" "coefs" "ctrl" "dat"
[8] "fx" "knots" "m" "m1" "m2" "m3" "op"
[15] "p" "p1" "p2" "p3" "pdat" "res" "rn"
[22] "spl" "tmpf" "want" "Years" "ylab" "ylim"
The gist contains code to download and process the monthly Central England Temperature (CET) time series so that it is ready for analysis via gamm()
. Next we fit an additive model with seasonal and trend smooth and an AR(2) process for the residuals; the code predicts from the model at 200 locations over the entire time series and generates a pointwise, approximate 95% confidence interval on the trend spline.
<span class="o">></span> ctrl <span class="o"><</span> list<span class="p">(</span>niterEM <span class="o">=</span> <span class="m">0</span><span class="p">,</span> msVerbose <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> optimMethod<span class="o">=</span><span class="s">"LBFGSB"</span><span class="p">)</span>
<span class="o">></span> m2 <span class="o"><</span> gamm<span class="p">(</span>Temperature <span class="o">~</span> s<span class="p">(</span>nMonth<span class="p">,</span> bs <span class="o">=</span> <span class="s">"cc"</span><span class="p">,</span> k <span class="o">=</span> <span class="m">12</span><span class="p">)</span> <span class="o">+</span> s<span class="p">(</span>Time<span class="p">,</span> k <span class="o">=</span> <span class="m">20</span><span class="p">),</span>
<span class="o">+</span> data <span class="o">=</span> cet<span class="p">,</span> correlation <span class="o">=</span> corARMA<span class="p">(</span>form <span class="o">=</span> <span class="o">~</span> <span class="m">1</span><span class="o"></span>Year<span class="p">,</span> p <span class="o">=</span> <span class="m">2</span><span class="p">),</span>
<span class="o">+</span> control <span class="o">=</span> ctrl<span class="p">)</span>
<span class="o">></span>
<span class="o">></span> want <span class="o"><</span> seq<span class="p">(</span><span class="m">1</span><span class="p">,</span> nrow<span class="p">(</span>cet<span class="p">),</span> length.out <span class="o">=</span> <span class="m">200</span><span class="p">)</span>
<span class="o">></span> pdat <span class="o"><</span> with<span class="p">(</span>cet<span class="p">,</span>
<span class="o">+</span> data.frame<span class="p">(</span>Time <span class="o">=</span> Time<span class="p">[</span>want<span class="p">],</span> Date <span class="o">=</span> Date<span class="p">[</span>want<span class="p">],</span>
<span class="o">+</span> nMonth <span class="o">=</span> nMonth<span class="p">[</span>want<span class="p">]))</span>
<span class="o">></span> p2 <span class="o"><</span> predict<span class="p">(</span>m2<span class="o">$</span>gam<span class="p">,</span> newdata <span class="o">=</span> pdat<span class="p">,</span> type <span class="o">=</span> <span class="s">"terms"</span><span class="p">,</span> se.fit <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
<span class="o">></span> pdat <span class="o"><</span> transform<span class="p">(</span>pdat<span class="p">,</span> p2 <span class="o">=</span> p2<span class="o">$</span>fit<span class="p">[,</span><span class="m">2</span><span class="p">],</span> se2 <span class="o">=</span> p2<span class="o">$</span>se.fit<span class="p">[,</span><span class="m">2</span><span class="p">])</span>
<span class="o">></span>
<span class="o">></span> df.res <span class="o"><</span> df.residual<span class="p">(</span>m2<span class="o">$</span>gam<span class="p">)</span>
<span class="o">></span> crit.t <span class="o"><</span> qt<span class="p">(</span><span class="m">0.025</span><span class="p">,</span> df.res<span class="p">,</span> lower.tail <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span>
<span class="o">></span> pdat <span class="o"><</span> transform<span class="p">(</span>pdat<span class="p">,</span>
<span class="o">+</span> upper <span class="o">=</span> p2 <span class="o">+</span> <span class="p">(</span>crit.t <span class="o">*</span> se2<span class="p">),</span>
<span class="o">+</span> lower <span class="o">=</span> p2 <span class="o"></span> <span class="p">(</span>crit.t <span class="o">*</span> se2<span class="p">))</span>
Note that I didn’t compute the confidence interval last time out, but I’ll use it here to augment the plots of the trend produced later.
First derivatives and finite differences
One measure of change in a system is that the rate of change of the system is nonzero. If we had a simple linear regression model for a trend, then the estimated rate of change would be ( ), the slope of the regression line. Then because we’re being all statistical we can ask questions such as whether the nonzero estimate we might obtain for the slope is distinguishable from zero given the uncertainty in the estimate. This slope or the estimate ( ) is the first derivative of the regression line. Technically, this is the instantaneous rate of change of the function that defines the line, an idea can be extended to any function, even one as potentially complex as a fitted spline function.
The problem we have though is that in general we don’t have an equation for the spline from which we can derive the derivatives. So how do we estimate the derivative of a spline function in our additive model? One solution is to use the method of finite differences, the essentials of which are displayed in the plots below
If you have heard of the trapezoidal rule for approximating the integral of a function, finite differences will be very familiar; instead of being interested in the area under the function we’re interested in estimating the slope of the function at any point. Consider the situation in the left hand plot above. The thick curve is the fitted spline for the trend component of the additive model m2
. Superimposed on this curve are five points, between pairs of which we can approximate the first derivative of the function. We know the distance along the x axis (time) between each pair of points and we can evaluate the value of the function on the y axis by predicting from m2
for the times at the indicated time points. It is then trivial to compute the slope ( m ) of the lines connecting the points as the change in the y direction divided by the change in the x direction, or
[ m = ]
As you can see from the left hand plot, in some parts of the function the derivative is reasonably well approximated by this crude method using just five points, but in most places this approach either under or over estimates the derivative of the function. The solution to this problem is to evaluate the slope at more points that are located closer together on the function. The right hand plot above shows the finite difference method for 20 points, which for all intents and purposes produces an acceptable estimate of the first derivative of the function. We can increase the accuracy of our derivatives estimates by using points that are closer and closer together. At some point the points would be infinitely close and we’d know the first derivative exactly, except with a computer we can’t get to that point using the finite difference method.
To sum up, we can approximate the first derivative of a fitted spline by choosing a set of points (p) on the function and another set of points (p ) positioned a very small distance (say +10^{5}) from the first set. Using the predict()
method with type = “lpmatrix”
plus some other mgcv magic we can evaluate the fitted trend spline at the locations (p) and (p ) and compute the change in the function between the pairs of points and voilà we have our estimate of the first derivative of the fitted spline.
Confidence intervals on derivatives…?
Earlier, when discussing the simple linear regression line, I touched on the issue of error or uncertainty in the estimate of the slope parameter () and how we allow for this in deciding whether the estimated rate of change was different from zero (technically: whether zero was a likely value for the slope given the model). The same issue concerns us with the first derivatives of the fitted spline; we still want to know where on the function the function is changing sufficiently that we can distinguish this change from nochange given our uncertainty knowledge of the function.
Thankfully, with a little bit of magic from mgcv we can compute the uncertainty in the estimates of the first derivative of the spline. I’ve encapsulated this in some functions that are currently only available as a gist on Github. I intend to package these up at some point with other functions for working with GAM(M) time series models. I’m also grateful to Simon Wood, author of mgcv, who provided the initial code to compute the derivatives and which I have generalized somewhat to the user to specify particular terms and to identify the correct terms from the model without having to count columns in the smoother prediction matrix.
To load the functions into R from Github use
<span class="o">></span> <span class="c1">## download the derivatives gist</span>
<span class="o">></span> tmpf <span class="o"><</span> tempfile<span class="p">()</span>
<span class="o">></span> download.file<span class="p">(</span><span class="s">"https://gist.github.com/gavinsimpson/e73f011fdaaab4bb5a30/raw/82118ee30c9ef1254795d2ec6d356a664cc138ab/Deriv.R"</span><span class="p">,</span>
<span class="o">+</span> tmpf<span class="p">,</span> method <span class="o">=</span> <span class="s">"wget"</span><span class="p">)</span>
<span class="o">></span> source<span class="p">(</span>tmpf<span class="p">)</span>
<span class="o">></span> ls<span class="p">()</span>
[1] "annCET" "bs" "cet" "CET"
[5] "coefs" "confint.Deriv" "crit.t" "ctrl"
[9] "dat" "Deriv" "df.res" "fx"
[13] "knots" "m" "m1" "m2"
[17] "m3" "op" "p" "p1"
[21] "p2" "p3" "pdat" "plot.Deriv"
[25] "res" "rn" "signifD" "spl"
[29] "take" "take2" "tmpf" "want"
[33] "Years" "ylab" "ylim"
I don’t intend to explain all the code behind those functions, but the salient parts of the derivative computation are
X0 <span class="o"><</span> predict<span class="p">(</span>mod<span class="p">,</span> newDF<span class="p">,</span> type <span class="o">=</span> <span class="s">"lpmatrix"</span><span class="p">)</span>
newD <span class="o"><</span> newD <span class="o">+</span> eps
X1 <span class="o"><</span> predict<span class="p">(</span>mod<span class="p">,</span> newDF<span class="p">,</span> type <span class="o">=</span> <span class="s">"lpmatrix"</span><span class="p">)</span>
Xp <span class="o"><</span> <span class="p">(</span>X1 <span class="o"></span> X0<span class="p">)</span> <span class="o">/</span> eps
Here, newDF
is a data frame of points along x at which we wish to evaluate the derivative and eps
is the distance along x we nudge the points to give us locations (p ). type = “lpmatrix”
forces predict()
to return a matrix which, when multiplied by the vector of model coefficients, returns values fror the linear predictor of the model. The useful thing about this representation is that we can derive information on the fits or confidence intervals on quantities derived from the fitted model. Here we subtract one lpmatrix
from another and divide these values by eps
and proceed to do inference on those “slopes” directly.
The other critical bit of code is
<span class="kr">for</span><span class="p">(</span>i <span class="kr">in</span> seq_len<span class="p">(</span>nt<span class="p">))</span> <span class="p">{</span>
Xi <span class="o"><</span> Xp <span class="o">*</span> <span class="m">0</span>
want <span class="o"><</span> grep<span class="p">(</span>t.labs<span class="p">[</span>i<span class="p">],</span> colnames<span class="p">(</span>X1<span class="p">))</span>
Xi<span class="p">[,</span> want<span class="p">]</span> <span class="o"><</span> Xp<span class="p">[,</span> want<span class="p">]</span>
df <span class="o"><</span> Xi <span class="o">%*%</span> coef<span class="p">(</span>mod<span class="p">)</span> <span class="c1"># derivatives</span>
df.sd <span class="o"><</span> rowSums<span class="p">(</span>Xi <span class="o">%*%</span> mod<span class="o">$</span>Vp <span class="o">*</span> Xi<span class="p">)</span><span class="o">^</span><span class="m">.5</span>
lD<span class="p">[[</span>i<span class="p">]]</span> <span class="o"><</span> list<span class="p">(</span>deriv <span class="o">=</span> df<span class="p">,</span> se.deriv <span class="o">=</span> df.sd<span class="p">)</span>
<span class="p">}</span>
This is where we compute the actual derivative and their standard errors. This is done in a loop as we have to work separately with the columns of the lpmatrix
that relate to each spline. The first three lines of code within the loop are housekeeping that handle this part, with Xi
being the matrix into which we insert the relevant values from the lpmatrix
for the current term and contains zeroes elsewhere. Xi %*% coef(mod)
gives us values of the derivative by a matrix multiplication with the model coefficients. Because irrelevant columns are 0 we don’t need to subset the coefficient vector, but we could make this more efficient by avoiding the copies of Xp
by doing some further subsetting steps if we wished to.
The variances for the terms relating to current spline are computed using Xi %*% mod\(Vp * Xi
where mod\)Vp
is the covariance matrix of the (fixed effects) parameters of the GAM(M). These are summed and their square root take to give the standard error for the entire spline and not for each of the basis functions that comprise the spline. The last line of the loop just stores the computed derivatives and standard errors in a list that will form part of the object returned by the Deriv()
function.
With that, most of the hard work is done. The confint.Deriv()
method will compute confidence intervals for the derivatives from their standard errors and some information on the fitted GAM(M), the coverage of these intervals is given by ( 1 – ) with ( ) commonly set to 0.05, but this can be controlled via argument alpha
. It is worth noting that these confidence intervals are of the usual pointwise flavour; they would be OK if you only looked at the interval for a single point in isolation but they can’t be correct when we look at the entire spline. The reason they can’t be correct for the entire spline is related to the issue of multiple comparisons and in effect, by looking at the entire spline we are making a lot of multiple comparisons of tests. We could compute simultaneous intervals but I haven’t written the functions to do it just yet; it’s not difficult as we can simulate from the fitted model and derive simultaneous intervals that way. That will have to wait for another post at a future date though!
There are two further trivial functions contained in the gist:

signifD()
plows through the points at which we evaluated the derivative and looks for locations where the pointwise ( 1 – ) confidence interval doesn’t include zero. Where zero is contained within the confidence interval the function returns anNA
and for points where zero isn’t included the value of the function (or the derivative, depending on what you supplied tosignifD()
) is returned.The function gives you back a list with two components
incr
anddecr
which contain the locations where the estimated derivative is positive or negative, respectively, and zero is not contained in the confidence interval. As you’ll see soon, we can use these two separate lists to colour the increasing a decreasing parts of the fitted spline. 
plot.Deriv()
is an S3 method forplot()
which will plot the estimated derivatives and associated confidence intervals. You can do this for selected terms via argumentterm
. The confidence interval can be displayed as lines or a solid polygon via argumentpolygon
. Other graphical parameters can be passed along via…
.The one interesting argument is
sizer
, which if set toTRUE
will colour increasing parts of the spline in blue and in red for the decreasing parts (where zero is not contained in the confidence interval). These colours come from the siZer method.
Back to the CET example
With that taken care of, we can use the functions to compute derivatives, confidence intervals, and ancillary information for the trend spline in model m2
with a few simple lines of code
<span class="o">></span> Term <span class="o"><</span> <span class="s">"Time"</span>
<span class="o">></span> m2.d <span class="o"><</span> Deriv<span class="p">(</span>m2<span class="p">)</span>
<span class="o">></span> m2.dci <span class="o"><</span> confint<span class="p">(</span>m2.d<span class="p">,</span> term <span class="o">=</span> Term<span class="p">)</span>
<span class="o">></span> m2.dsig <span class="o"><</span> signifD<span class="p">(</span>pdat<span class="o">$</span>p2<span class="p">,</span> d <span class="o">=</span> m2.d<span class="p">[[</span>Term<span class="p">]]</span><span class="o">$</span>deriv<span class="p">,</span>
<span class="o">+</span> m2.dci<span class="p">[[</span>Term<span class="p">]]</span><span class="o">$</span>upper<span class="p">,</span> m2.dci<span class="p">[[</span>Term<span class="p">]]</span><span class="o">$</span>lower<span class="p">)</span>
Most of that is selfexplanatory, but the signifD()
call, in lieu of any documentation yet, deserves some explanation. The first argument is the the thing you want to return in the incr
and decr
components. In this case I use the contribution to the fitted values for the trend spline alone pdat$p2
. The argument d
is the vector of derivatives for a single term in the model. The two remaining arguments^{1} are upper
and lower
which need to be supplied with the upper and lower bounds on the confidence interval respectively. Extracting all of that is a bit redundant when we could pass in the object returned by confint.Deriv()
but I was thinking more simply when I wrote signifD()
in a few dark days of number crunching when people were waiting on me to deliver some results.
A quick plot of the first derivative of the spline is achieved using
From the plot it is clear that there are two periods of (statistically) significant change, both increases, as shown by the blue indicators. However, with a plot like the one above you really have to dial in your brain to thinking about an increasing (decreasing) trend in the data when the derivative line is above (below) the zero line, even if the derivative is decreasing (increasing). As such, I have found an alternative display that combines a plot of the estimated trend spline with periods of significant change indicated by a thicker line with or without the siZer colours. You can see examples of such plots in a previous post.
That post used the lattice and ggplot2 packages to draw the plots. I still find it easier to just bash out a base graphics plot for such things unless I need the faceting features offered by those packages. Below I take such an approach and build a base graphics plot up from a series of plotting calls that successively augment the plot, starting with the estimated trend spline and pointwise confidence interval, then two calls to superimpose the periods of significant increase and decrease (although the latter doesn’t actually draw anything in this plot)
When looking at a plot like this, it is always important to have in the back of your mind a picture of the original data and how variance is decomposed between the seasonal, trend and residual terms, as well as the effect size associated with the trend and seasonal splines. You get a very different impression of the magnitude of the trend from the plot shown below, which contains the original data as well as the fitted trend spline.
Yet, whilst small compared to the seasonal amplitude in temperature, the CET exhibits about a 1° increase in mean temperature over the past 150 or so years. Not something to dismiss as trivial.
Summing up
In this post I’ve attempted to explain the method of finite differences for estimating the derivatives of a function such as the splines of an additive model. This was illustrated using custom functions to extract information from the fitted model and make use of features of the mgcv package that allowed the estimation of standard errors for quantities derived from the fitted model, such as the first derivatives used in the post.
So far we’ve looked at fitting an additive model to seasonal data employing two smoothers in an additive model, the use of cyclic splines to represent cyclical features of the data, and estimation of an appropriate ARMA correlation structure to account for serial dependence in the data. Here we’ve seen how to identify periods of significant change in the trend term using the first derivative of the fitted spline. Where do we go from here?
One important improvement is to turn the pointwise confidence interval into a simultaneous one. There is one other feature potentially present in the data that we have thus far failed to address; we might reasonably expect that the seasonal pattern in temperature has changed with the increase in the level of the series over time. To fit a model that includes this possibility we’ll need to investigate smooth functions of two (or more) variables as well as look in more detail at how to build more complex GAM(M) models with mgcv. I’ll see about addressing these two issues in future posts.

There’s a further argument,
eval
, which contains the value you wish to test for inclusion in the coverage of the confidence interval. By default this is set to0
.↩
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.