Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve spent a lot of time thinking about and using R’s spectrum() function and the Fast Fourier Transform (FFT) in the last 5+ years. Lately, they’ve begun to remind me a little of a Theremin: simple to use, difficult to master.

While prepping a figure for the R Graph Gallery (which is awesome, by the way), I ran across this curious example — its striking visual appearance was definitely not what I was expecting.

I decided to use the logistic map to generate an ensemble of time series with a range of frequencies. The logistic map goes through characteristic period doubling, and then exhibits broad-spectrum “noise” at the onset of chaos. And it’s really easy to compute. So, I can tune my time series to have a range of frequency distributions.

In this example, I’m in the periodic regime (r=3.5, period 4) . So why am I getting this Batman motif?

Actually, it’s weirdly simple. This is an artifact of the (default) tapering that R’s spectrum() function applies to the series before the FFT is computed. In theory, the Fourier Transform assumes an infinite-length sequence, while in practice the FFT assumes the series is circular — in essence, gluing the beginning and end of the series together to make a loop. If the series is not perfectly periodic, though, this gluing introduces a sharp discontinuity. Tapering is typically applied to reduce or control this discontinuity, so that both ends gently decline to zero. When the series is genuinely periodic, though, this does some weird effects. In essence, the taper transfers power from the fundamental frequencies in a curious way. Note that the fundamental period is 4, as seen by the peak at frequency 1/4, with a strong harmonic at period 2, frequency 1/2.

Zero-padding has a similar effect. If your series is zero (or relatively small) at both ends, then you could glue it together without introducing discontinuities, but peaks near the beginning and the end will still be seen as near by, resulting in spurious peaks. In any case, R pads your series by default (fast=T) unless the length of the series is highly composite (i.e. it can be factored into many small divisors). Below, it turns out that a series of length 1e5 is composite enough that R doesn’t pad it. 1e5? Totally different results.

If I understand this correctly, both padding and tapering are attempts to reduce or control spectral leakage. We’ve constructed an example where no spectral leakage is expected from the original time series, which closely fits the assumptions of the FFT without modification.

The moral? Know thy functions (and their default parameter values, and their effects)!

## using R --
## logistic map function:
logmap = function(n, r, x0=0.5) {
ret = rep(x0,n)
for (ii in 2:n) {
ret[ii] = r*ret[ii-1]*(1-ret[ii-1])
}
return(ret)
}

## compare:
spectrum(logmap(1e5, 3.5), main='R tapers to 0.1 by default')
spectrum(logmap(1e5, 3.5), taper=0, main='No taper')
spectrum(logmap(1e5+1, 3.5), taper=0, main='Non-composite length, fast=T pads by default')
spectrum(logmap(1e5, 3.5), taper=0.5, sub='Lots of padding approximately recovers "correct" results')


R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.