Testing the Correlation between Time Series Variables

Posted on March 17, 2020 by Selcuk Disci in R bloggers

[This article was first published on DataGeeek, and kindly contributed to R-bloggers.]

In the previous article, we examined trends and seasonality in gasoline prices in Turkey. This time we will examine whether gasoline prices are related to the variables that Turkish people widely believe affect gasoline prices the most. One is the price of Brent crude oil, averaged monthly and in US dollars; the other is the dollar exchange rate in Turkish lira (TL), also averaged monthly. These variables appear as brent and dollar, respectively, in the dataset below. As in the previous article, the dataset covers 2013 to 2020.

head(df)
#        date gasoline  brent dollar
#1 2013-01-01     4.67 115.55 1.7589
#2 2013-02-01     4.85 111.38 1.7985
#3 2013-03-01     4.75 110.02 1.8090
#4 2013-04-01     4.61 102.37 1.7930
#5 2013-05-01     4.64 100.39 1.8756
#6 2013-06-01     4.72 102.16 1.9288

A t-test is used to examine whether the population correlation coefficient is zero. It assumes that the sample comes from a normal distribution. When this assumption is violated, an alternative non-parametric test is needed, and Spearman's rank correlation test fills that role. Because profit and price data generally do not follow a normal distribution, the Pearson correlation coefficient test is not appropriate for our dataset.
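For reference, the t-statistic behind the Pearson test is $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom. The sketch below computes it by hand, checks it against base R's cor.test(), and runs shapiro.test() (the Shapiro-Wilk test) as one common way to check the normality assumption. It uses only the six rows printed by head(df) above, which are far too few for a real analysis; the name first_rows is an illustration-only assumption.

```r
# Sketch only: the six rows shown by head(df), NOT the full 2013-2020 dataset
first_rows <- data.frame(
  gasoline = c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72),
  brent    = c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)
)

n <- nrow(first_rows)
r <- cor(first_rows$gasoline, first_rows$brent)  # Pearson r (the default method)

# t-statistic for H0: rho = 0, with n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Base R's Pearson test reports the same statistic
pearson_test <- cor.test(first_rows$gasoline, first_rows$brent, method = "pearson")
print(c(manual = t_manual, cor_test = unname(pearson_test$statistic)))

# Shapiro-Wilk normality test for each series (a small p-value suggests non-normality)
print(shapiro.test(first_rows$gasoline)$p.value)
print(shapiro.test(first_rows$brent)$p.value)
```

With only six observations the normality check has very little power, so on the real data it should be run on all 84 months.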

Spearman's rank correlation test measures the correlation between two variables using their ranks rather than their raw values. Like the Pearson correlation coefficient, the Spearman coefficient $\rho_s$ takes values between -1 and +1. The two-sided hypothesis test is stated as:

$H_0: \rho_s=0$

$H_A: \rho_s \ne 0$

To run the test, we first calculate the sample Spearman rank correlation coefficient $r_s$, which takes a few steps.

Gasoline prices are ranked from smallest to largest; tied observations each receive the average of the ranks they occupy, and ranking continues from where it left off. The same process is applied to Brent prices.
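R's rank() follows this rule by default, since its ties.method argument defaults to "average". A quick illustration with a made-up vector in which the second and third values are tied:

```r
prices <- c(4.61, 4.67, 4.67, 4.85)  # illustrative values with one tie
rank(prices)                          # the tie occupies ranks 2 and 3, so each gets 2.5
#[1] 1.0 2.5 2.5 4.0
```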

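library(dplyr)

# Rank each series (ties get the average rank), then compute
# the rank differences d and their squares
df_spearman <- df %>%
  mutate(
    rank_gasoline = rank(gasoline),
    rank_brent    = rank(brent),
    d             = rank_gasoline - rank_brent,
    d_square      = d^2
  ) %>%
  select(-dollar)  # the exchange rate is not needed for this test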

head(df_spearman)

#        date gasoline  brent rank_gasoline rank_brent   d d_square
#1 2013-01-01     4.67 115.55            23         84 -61     3721
#2 2013-02-01     4.85 111.38            34         81 -47     2209
#3 2013-03-01     4.75 110.02            27         79 -52     2704
#4 2013-04-01     4.61 102.37            20         67 -47     2209
#5 2013-05-01     4.64 100.39            21         65 -44     1936
#6 2013-06-01     4.72 102.16            26         66 -40     1600

The rank differences $d_i$ of the paired observations sum to zero: $\Sigma d_i=0$ .

sum(df_spearman$d)
#[1] 0
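The zero sum is not a coincidence. Ranking $n$ observations always yields ranks that add up to $1 + 2 + \dots + n$ (averaging tied ranks leaves this total unchanged), so for the ranks $R(x_i)$ and $R(y_i)$ of the two series:

```latex
\sum_{i=1}^{n} d_i = \sum_{i=1}^{n} R(x_i) - \sum_{i=1}^{n} R(y_i)
                   = \frac{n(n+1)}{2} - \frac{n(n+1)}{2} = 0
```

This makes $\Sigma d_i = 0$ a handy sanity check on the ranking step rather than a result in itself.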

Next, the squared differences are summed: $\Sigma d_i^2=69107$

d_square_sum <- sum(df_spearman$d_square) 

d_square_sum
#[1] 69107

The Spearman rank correlation coefficient $r_s$ is given by the following formula (exact when there are no tied ranks):

$r_s=1- \frac {6\Sigma d_i^2} {n(n^2-1)}$
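Plugging in this dataset's values, $n = 84$ and $\Sigma d_i^2 = 69107$, the arithmetic works out as:

```latex
r_s = 1 - \frac{6 \times 69107}{84\,(84^2 - 1)}
    = 1 - \frac{414642}{592620}
    \approx 1 - 0.6997 \approx 0.30
```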

n <- nrow(df)
# Spearman's formula; d_square_sum is already a single number
r_s <- round(1 - (6 * d_square_sum) / (n * (n^2 - 1)), 2)

r_s
#[1] 0.3
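As a cross-check, the sketch below applies the same formula to the six rows printed by head(df) and compares it with base R's cor(..., method = "spearman"). The two agree exactly here because those six values contain no ties; the result differs from the full-sample 0.3 because it covers only six months. The helper name spearman_formula is hypothetical.

```r
# Hypothetical helper: Spearman's shortcut formula (exact only without ties)
spearman_formula <- function(x, y) {
  d <- rank(x) - rank(y)   # rank differences
  n <- length(x)
  1 - (6 * sum(d^2)) / (n * (n^2 - 1))
}

# The six rows shown by head(df); not the full dataset
gasoline <- c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72)
brent    <- c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)

spearman_formula(gasoline, brent)
#[1] 0.4285714
cor(gasoline, brent, method = "spearman")
#[1] 0.4285714
```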

This result indicates a weak positive relationship between gasoline and Brent prices. Let's test whether it is significant at the 5% level, i.e., whether the null hypothesis can be rejected in favor of the alternative.

The critical value to look at is highlighted in the chart above for $\alpha=0.05$ and $n=84$. Because $r_s=0.3 \geq 0.215$, the null hypothesis ( $H_0: \rho_s=0$ ) is rejected, and at the 5% significance level we can say that there is a positive, albeit weak, relationship between gasoline and Brent prices.
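For large samples the tabulated critical value can be approximated: under $H_0$, $r_s\sqrt{n-1}$ is approximately standard normal, so the two-sided 5% critical value is roughly $1.96/\sqrt{n-1}$. This large-sample normal approximation is a substitute for the exact table, not the method used above, but for n = 84 it reproduces the 0.215 from the table:

```r
n     <- 84     # number of monthly observations
r_s   <- 0.3    # sample Spearman coefficient computed above
alpha <- 0.05

# Large-sample approximation: r_s * sqrt(n - 1) is roughly N(0, 1) under H0
critical <- qnorm(1 - alpha / 2) / sqrt(n - 1)
round(critical, 3)
#[1] 0.215

r_s >= critical   # TRUE means H0 is rejected at the 5% level
#[1] TRUE
```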

Let's check the results another way, using the ggscatter function from the ggpubr package.

library("ggpubr")

ggscatter(df, x = "brent", y = "gasoline",
          color = "blue", cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "Brent (USD)", ylab = "Gasoline (TL)")

As the chart above shows, Spearman's rank correlation coefficient (R = 0.3) matches the value we calculated by hand, and the p-value (0.0055) is below the 0.05 significance level, so we reject the null hypothesis in favor of the alternative ( $H_A: \rho_s \ne 0$ ).
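A comparable test can be run without a plot using base R's cor.test() with method = "spearman"; for the full dataset that would be cor.test(df$brent, df$gasoline, method = "spearman"). If the data contain tied values, cor.test() warns that it cannot compute an exact p-value, and exact = FALSE requests the large-sample approximation instead. The self-contained sketch below uses only the six rows printed by head(df), so its numbers will not match the full-sample R = 0.3 and p = 0.0055.

```r
# The six rows shown by head(df); not the full 2013-2020 dataset
gasoline <- c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72)
brent    <- c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)

# Two-sided Spearman test of H0: rho_s = 0 (no ties here, so an exact p-value is used)
spearman_test <- cor.test(brent, gasoline, method = "spearman",
                          alternative = "two.sided")

spearman_test$estimate   # sample coefficient r_s (named "rho")
spearman_test$p.value    # compare with alpha = 0.05
```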

Finally, we will examine the relationship between gasoline prices and the dollar exchange rate (USD/TRY).

ggscatter(df, x = "dollar", y = "gasoline",
          color = "red", cor.coef = TRUE, 
          cor.method = "spearman",
          xlab = "USD/TRY", ylab = "Gasoline (TL)")

The chart above shows a strong positive relationship between gasoline prices and the dollar exchange rate. The p-value is below 0.05, so the result is significant and the null hypothesis is rejected once again.

References

Sanjiv Jaggia, Alison Kelly (2013). Business Statistics: Communicating with Numbers.

STHDA: Correlation Test Between Two Variables in R
