**me nugget**, and kindly contributed to R-bloggers)

In the natural sciences, it is common to have incomplete or unevenly sampled time series for a given variable. Determining cycles in such series is not directly possible with methods such as Fast Fourier Transform (FFT) and may require some degree of interpolation to fill in gaps. An alternative is the Lomb-Scargle method (or least-squares spectral analysis, LSSA), which estimates a frequency spectrum based on a least squares fit of sinusoid.

The above figure shows a Lomb-Scargle periodogram of a time series of sunspot activity (1749-1997) with 50% of monthly values missing. As expected (link1, link2), the periodogram displays a a highly significant maximum peak at a frequency of ~11 years.

The function comes from a nice set of functions that I found here: http://research.stowers-institute.org/efg/2005/LombScargle/R/index.htm. An accompanying paper focusing on its application to time series of gene expression can be found here.

Below is a comparison to an FFT of the full time series. For another great resource on spectral analysis, and time series-related R methods in general, see the following website: http://zoonek2.free.fr/UNIX/48_R/15.html.

#Required function # from: http://research.stowers-institute.org/efg/2005/LombScargle/R/index.htm

source("LombScargle.R")

source("R-Mnemonics.R")

#Required data

data(sunspot.month) # Historical sunspots, class(sunspot.month) = "ts"

plot(sunspot.month)

#Create vector version of data

nspot <- as.vector(sunspot.month)

time <- seq(1749+0.5/12, 1997+11.5/12, by=1/12)

plot(nspot ~ time, t="l") # same as above

#make gaps in data to create "observed" data series

set.seed(1)

obs <- sample(seq(nspot), 0.5*length(nspot)) # 50% gaps

nspot.obs <- nspot[obs]

time.obs <- time[obs]

plot(nspot.obs ~ time.obs, pch=".", cex=2)

# Create test frequencies corresponding to

# periods from 1 to 50 years

M <- 4*length(time.obs) # often 2 or 4

TestFrequencies <- (1/50) + (1 - 1/50) * (1:M / M)

# Use Horne & Baliunas' estimate of independent frequencies

Nindependent <- NHorneBaliunas(length(nspot.obs))

#Compute

RES <- ComputeAndPlotLombScargle(time.obs, nspot.obs,

TestFrequencies, Nindependent,

"Sunspots")

#Output consists of a single figure with four plots showing:

#1. time series. The peak of the loess-smoothed curve (dotted line) is used for phase.

#2. Histogram of time interval variability

#3. Lomb-Scargle Periodogram with peak corresponding to a 11 year period #RES$PeakPeriod

#4. Peak Significance Curve: p-value of Lomb-Scargle peak is quite significant: 1 - (1-exp(-RES$SpectralPowerDensity[which(1/RES$Frequency == RES$PeakPeriod)]))^Nindependent

#Alternate plots

p01 <- -log(1-((1-0.01)^(1/Nindependent))) #P=0.01

plot(RES$Frequency, RES$SpectralPowerDensity, t="n", xlab="Frequency [1/years]", ylab="Normalized Power Spectral Density")

abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)

abline(v=seq(par()$xaxp[1], par()$xaxp[2],, par()$xaxp[3]+1), col=8, lty=3)

lines(RES$Frequency, RES$SpectralPowerDensity, lwd=1)

mtext("Lomb-Scargle Periodogram", side=3, line=1)

abline(h=p01, col=2, lty=2)

text(0.9, p01, label="p = 0.01", pos=3, col=2)

plot(1/RES$Frequency, RES$SpectralPowerDensity, type="n", log = "x", xlab="Frequency [years]", ylab="Normalized Power Spectral Density")

abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)

abline(v=c(1,2,5,10,20,50), col=8, lty=3)

lines(1/RES$Frequency, RES$SpectralPowerDensity, lwd=1)

mtext("Lomb-Scargle Periodogram", side=3, line=1)

abline(h=p01, col=2, lty=2)

text(2, p01, label="p = 0.01", pos=3, col=2)

png("GappySunspots_LombScargle.png", width=5, height=7, units="in", res=400)

#x11(width=5, height=7)

op <- par(mfcol=c(2,1), mar=c(5,5,2,1), ps=10)

plot(nspot.obs ~ time.obs, t="n", xlab="", ylab=expression(paste("No. sunspots [", "month"^-1, "]")))

grid()

points(nspot.obs ~ time.obs, pch=".", cex=2)

mtext("50% missing values", side=3, line=0.5)

plot(1/RES$Frequency, RES$SpectralPowerDensity, type="n", log = "x", xlab="Frequency [years]", ylab="Normalized Power Spectral Density")

abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)

abline(v=c(1,2,5,10,20,50), col=8, lty=3)

lines(1/RES$Frequency, RES$SpectralPowerDensity, lwd=1)

mtext("Lomb-Scargle Periodogram", side=3, line=0.5)

abline(h=p01, col=2, lty=2)

text(2, p01, label="p = 0.01", pos=3, col=2)

par(op)

dev.off()

##############################

###Compare to other methods###

##############################

plot(sunspots)

spectrum(sunspot.month)

abline(v=1:10, lty=3)

z <- spectrum(sunspots, spans=2) #smoothed spectrum

png("LombScargle_vs_DFT.png", width=5, height=4, res=400, units="in")

XLIM <- c(2,20)

op <- par(mar=c(5,5,2,5), ps=10)

plot(1/RES$Frequency, RES$SpectralPowerDensity, type="n", log = "x", xlab="Frequency [years]", ylab="Normalized Power Spectral Density", xlim=XLIM)

abline(h=seq(par()$yaxp[1], par()$yaxp[2],, par()$yaxp[3]+1), col=8, lty=3)

abline(v=c(1,2,5,10,20,50), col=8, lty=3)

lines(1/RES$Frequency, RES$SpectralPowerDensity, lwd=1)

par(new=TRUE)

plot(1/z$freq, z$spec, t="l", col=3, log = "x", xlim=XLIM, xlab="", ylab="", xaxt="n", yaxt="n")

axis(4)

mtext("Spectral Density", line=3, side=4, col=3)

legend("topleft", legend=c("Lomb-Scargle", "Discrete Fourier\nTransform"), bg="white", col=c(1,3), lty=1)

par(op)

dev.off()

**leave a comment**for the author, please follow the link and comment on their blog:

**me nugget**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...