The spectrum of my personal accountancy

[This article was first published on PabRod - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Since year 2012, I keep and maintain a .csv file containing information about my bank accounts. The most relevant data is the time series of the total amount of money vs. date. The table below shows a small subset of this data:

Date Total
2012-09-26 641.52
2012-09-24 703.52
2012-09-22 723.52
2012-09-19 755.52
2012-09-18 765.52

When plotted, the data looks as noisy as the example on the figure below. The most noticeable regularity is the periodic increment at the end of each month, corresponding with the pay day… followed by a big expenditure corresponding to paying the rent. Notice also the higher pay in late June, corresponding to the summer extra salary that is customarily applied in several countries. The timeseries looks irregular, but there is some clear periodical component on it.

a

I know that the Fourier transform of a continuous timeseries is the perfect tool for spotting underlying periodicities. But my timeseries is uneven and discontinous. How can we fix this?

First step: Interpolation

The first problem, unevenness of data, can be solved by good old interpolation methods. The only tricky part here is that our x coordinates are dates, so it is advisable to use lubridate.

I used the following:

library(dplyr)
library(lubridate)

first_day <- min(df$Date)
last_day <- max(df$Date)

# Add a column with relative day as an integer
df <- mutate(df, Day_index = as.numeric(df$Date - first_day))

# Create the time vector used for interpolation
all_days <- seq(first_day, last_day, by = 1)
all_indices <- seq(1, length(all_days), by = 1)

daily_total <- approx(x = df$Day_index, y = df$Total, xout = all_indices, yright = head(df$Total, 1))

Second step: Fourier transform

For discrete, evenly spaced timeseries as the one we have after interpolation, fast Fourier transform can be used to estimate dominant frequencies. The fast Fourier transform or FFT is defined as a function of an integer parameter k and a set of N complex numbers {x0, x1, ...xN − 1}.

The definition goes as follows:

$$ X_k = FFT [ k; \lbrace x_n \rbrace] \equiv \sum_{n=0}^{N-1} x_n\cdot e^{-i 2 \pi k \frac{n}{N}} $$

and the inverse:

$$ x_n = FFT^{-1} [ n; \lbrace X_k \rbrace] \equiv \frac{1}{N}\sum_{k=0}^{N-1} X_k \cdot e^{i 2 \pi k \frac{n}{N}} $$

The FFT can be used to estimate the spectrum of a signal. While doing this, it is important to note that n and k are unit-less vector indices. In order to link those indices with physical quantities as time and frequency, we could notice that:

$$ x_n = x(n \cdot \Delta t) \equiv x(t_n) $$

where Δt is the sampling period. If we take a number N of samples, the final sample time should be:

$$ t_{end} = N \cdot \Delta t \equiv t_N $$

Going to our first equation with all this in mind, we have:

$$ X_k \equiv \\ \sum_{n=0}^{N-1} x_n \cdot e^{-i 2 \pi k \frac{n}{N}} = \\ \sum_{n=0}^{N-1} x(n \cdot \Delta t) \cdot e^{-i 2 \pi k \frac{n \cdot \Delta t}{t_{end}}} = \\ \sum_{t_n=0}^{t_{end}-\Delta t} x(t_n) \cdot e^{-i 2 \pi k \frac{t_n}{t_{end}}} = \\ \sum_{t_n=0}^{t_{end}-\Delta t} x(t_n) \cdot e^{-i \frac{2 \pi k}{t_{end}} t_n} = \\ \sum_{t_n=0}^{t_{end}-\Delta t} x(t_n) \cdot e^{-i \omega_k t_n} $$

so, the link between the indices k and the corresponding physical frequencies ωk is given by:

$$ \omega_k = \frac{2 \pi}{t_{end}} k $$

We just added units (and thus, physical) significance to the indexes k and n!

The (quite primitive) code I used for doing this is:

    fft_spectrum <- function(ts, ys, as.freq = TRUE) {
      # Translate k indices into frequencies
      tEnd <- tail(ts, 1)
      fs <- seq(0, length(ys)-1) / tEnd # In T^-1 units (typically Hz, if the time is in seconds)

      #Perform fast Fourier transform
      Xk <- Mod(fft(ys))

      if(as.freq) { # Use frequencies
        list(f = fs, Xk = Xk)
      } else { # Use periods
        list(T = 1/fs, Xk = Xk)
      }
    }

Results

After applying the two algorithms described above I find the following distribution of the relative lengths of each period:

b

Note that the strengths corresponding to periods of 30 and 31 days are higher than their neighbors, corresponding to pay-days and monthly payments. Even my very basic algorithm could see it!

PS: I removed several details, plus trimmed both vertical and horizontal scales. I did this in order to keep my privacy. After all, the FFT is an invertible operation!

This entry appears in R-bloggers.com

To leave a comment for the author, please follow the link and comment on their blog: PabRod - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)