# The spectrum of my personal accountancy

**PabRod - R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Since year 2012, I keep and maintain a *.csv* file containing
information about my bank accounts. The most relevant data is the time
series of the total amount of money vs. date. The table below shows a
small subset of this data:

Date | Total |
---|---|

2012-09-26 | 641.52 |

2012-09-24 | 703.52 |

2012-09-22 | 723.52 |

2012-09-19 | 755.52 |

2012-09-18 | 765.52 |

When plotted, the data looks as noisy as the example on the figure below. The most noticeable regularity is the periodic increment at the end of each month, corresponding with the **pay day**… followed by a big expenditure corresponding to paying the **rent**. Notice also the higher pay in late June, corresponding to the summer extra salary that is customarily applied in several countries. The timeseries looks irregular, but there is some clear periodical component on it.

I know that the Fourier transform of a continuous timeseries is the perfect tool for spotting underlying periodicities. But my timeseries is uneven and discontinous. How can we fix this?

## First step: Interpolation

The first problem, unevenness of data, can be solved by good old interpolation methods. The only tricky part here is that our *x* coordinates are dates, so it is advisable to use *lubridate*.

I used the following:

library(dplyr) library(lubridate) first_day <- min(df$Date) last_day <- max(df$Date) # Add a column with relative day as an integer df <- mutate(df, Day_index = as.numeric(df$Date - first_day)) # Create the time vector used for interpolation all_days <- seq(first_day, last_day, by = 1) all_indices <- seq(1, length(all_days), by = 1) daily_total <- approx(x = df$Day_index, y = df$Total, xout = all_indices, yright = head(df$Total, 1))

## Second step: Fourier transform

For discrete, evenly spaced timeseries as the one we have after interpolation, fast Fourier transform can be used to estimate dominant frequencies. The fast Fourier transform or *FFT* is defined as a function of an integer parameter *k* and a set of *N* complex numbers {*x*_{0}, *x*_{1}, ...*x*_{N − 1}}.

The definition goes as follows:

$$ X_k = FFT [ k; \lbrace x_n \rbrace] \equiv \sum_{n=0}^{N-1} x_n\cdot e^{-i 2 \pi k \frac{n}{N}} $$

and the inverse:

$$ x_n = FFT^{-1} [ n; \lbrace X_k \rbrace] \equiv \frac{1}{N}\sum_{k=0}^{N-1} X_k \cdot e^{i 2 \pi k \frac{n}{N}} $$

The *FFT* can be used to estimate the spectrum of a signal. While doing this, it is important to note that *n* and *k* are unit-less vector indices. In order to link those indices with physical quantities as time and frequency, we could notice that:

$$ x_n = x(n \cdot \Delta t) \equiv x(t_n) $$

where *Δ**t* is the sampling period. If we take a number *N* of samples, the final sample time should be:

$$ t_{end} = N \cdot \Delta t \equiv t_N $$

Going to our first equation with all this in mind, we have:

$$ X_k \equiv \\ \sum_{n=0}^{N-1} x_n \cdot e^{-i 2 \pi k \frac{n}{N}} = \\ \sum_{n=0}^{N-1} x(n \cdot \Delta t) \cdot e^{-i 2 \pi k \frac{n \cdot \Delta t}{t_{end}}} = \\ \sum_{t_n=0}^{t_{end}-\Delta t} x(t_n) \cdot e^{-i 2 \pi k \frac{t_n}{t_{end}}} = \\ \sum_{t_n=0}^{t_{end}-\Delta t} x(t_n) \cdot e^{-i \frac{2 \pi k}{t_{end}} t_n} = \\ \sum_{t_n=0}^{t_{end}-\Delta t} x(t_n) \cdot e^{-i \omega_k t_n} $$

so, the link between the indices *k* and the corresponding physical frequencies *ω*_{k} is given by:

$$ \omega_k = \frac{2 \pi}{t_{end}} k $$

We just added units (and thus, physical) significance to the indexes *k* and *n*!

The (quite primitive) code I used for doing this is:

fft_spectrum <- function(ts, ys, as.freq = TRUE) { # Translate k indices into frequencies tEnd <- tail(ts, 1) fs <- seq(0, length(ys)-1) / tEnd # In T^-1 units (typically Hz, if the time is in seconds) #Perform fast Fourier transform Xk <- Mod(fft(ys)) if(as.freq) { # Use frequencies list(f = fs, Xk = Xk) } else { # Use periods list(T = 1/fs, Xk = Xk) } }

## Results

After applying the two algorithms described above I find the following distribution of the relative lengths of each period:

Note that the strengths corresponding to periods of *30* and *31* days are higher than their neighbors, corresponding to pay-days and monthly payments. Even my very basic algorithm could see it!

*PS: I removed several details, plus trimmed both vertical and horizontal scales. I did this in order to keep my privacy. After all, the FFT is an invertible operation!*

This entry appears in R-bloggers.com

**leave a comment**for the author, please follow the link and comment on their blog:

**PabRod - R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.