I took the stock returns data for INFOSYS (INFY on NSE) for the past year and tried to see if I could make this data confess its underlying linear/non-linear generating process. I started with a rather simple, straightforward and easy-to-interpret test, the runs test. It is a non-parametric statistical test of the null hypothesis that the underlying series is independent and identically distributed (i.i.d.). For those who are not too familiar with statistical parlance, non-parametric in simple terms means that we do not have to assume a particular distribution for the underlying data. There has been a huge surge in applications of non-parametric statistics, because the biggest deterrent to conducting these kinds of tests, i.e. the computational cost, is no longer a problem in this generation of rapid computation. Empirical analysis is about theorizing a null hypothesis and then trying your best to bring it down using empirical evidence (analogous to Karl Popper's idea of falsification: you hang on to a theory so long as it has not betrayed you yet).
## Doing runs test on INFY daily returns
> library(tseries) ## Assumption: runs.test() is the one from the tseries package, whose output format matches the result shown below
> infy <- read.csv("01-10-2010-TO-01-10-2011INFYEQN.csv") ## Reading the stock price data
> infy_ret <- 100*diff(log(infy[,2])) ## The second column in the data has the stock prices, so I have used [log(Pt) - log(Pt-1)]*100 as the returns.
> runs.test(factor(infy_ret > 0)) ## This creates a two-level categorical variable that is TRUE if infy_ret > 0 and FALSE otherwise.
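What this does is tell me whether the runs in the returns are predictable. Say I represent a positive return by + and a negative return by -; then my series of returns would look something like +, +, -, +, -, -, -, +, …
A run is an unbroken stretch of the same sign, and far too few or far too many runs compared with what chance would produce is evidence against randomness. In other words, the test checks whether I can predict if the next day will have a + or a -.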
data: factor(infy_ret > 0)
Standard Normal = 0.1308, p-value = 0.8959 ## High p-value means you cannot trash your null hypothesis.
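The "Standard Normal" figure in the output is a z-statistic. As a sketch, the usual large-sample form of the runs test is given below. I am assuming this is what `runs.test` computes, and the symbols are my own notation: $n_1$ is the number of + days, $n_2$ the number of - days, and $R$ the observed number of runs.

```latex
\begin{aligned}
\mathbb{E}[R] &= \frac{2 n_1 n_2}{n_1 + n_2} + 1, \\
\operatorname{Var}[R] &= \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}, \\
Z &= \frac{R - \mathbb{E}[R]}{\sqrt{\operatorname{Var}[R]}} .
\end{aligned}
```

Under the null hypothesis $Z$ is approximately standard normal, and the two-sided p-value is $2\,\Phi(-|Z|)$. A $Z$ of 0.1308 says the observed number of runs is almost exactly what a random sequence of signs would produce.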
For those not familiar with statistics, the p-value is the probability, computed assuming the null hypothesis is true, of seeing a test statistic at least as extreme as the one we actually got. It is not the probability that the null hypothesis is true, and it is not quite the probability of wrongly rejecting a correct null either; that error rate is the significance level (say 5%) that you fix in advance and compare the p-value against. (Be very careful with the interpretation of the p-value; people often misunderstand it, and many a time even I have fallen prey to this.) Here, a p-value of 0.8959 means that a statistic this far from zero would turn up about 90% of the time even if the signs of the returns were purely random, so you just don't have enough evidence to reject the null hypothesis. The data are therefore consistent with the series being a random walk (you can understand this in the literal English sense, but the definition is not so trivial in time series parlance).
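To make the mechanics concrete, here is a minimal base-R sketch of the standard large-sample runs test, done by hand on the toy sign series from earlier. The helper name `manual_runs_test` is hypothetical (it is not part of any package), and I am assuming the plain normal approximation with no continuity correction.

```r
# Hand-rolled runs test (illustrative sketch, not a package function).
# Input: a logical vector, TRUE for a positive return, FALSE otherwise.
manual_runs_test <- function(signs) {
  signs <- as.logical(signs)
  n1 <- sum(signs)                      # number of "+" days
  n2 <- sum(!signs)                     # number of "-" days
  runs <- length(rle(signs)$lengths)    # number of unbroken same-sign stretches

  # Mean and variance of the number of runs under the null of randomness
  expected <- 2 * n1 * n2 / (n1 + n2) + 1
  variance <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
    ((n1 + n2)^2 * (n1 + n2 - 1))

  z <- (runs - expected) / sqrt(variance)   # approximately N(0, 1) under the null
  p_value <- 2 * pnorm(-abs(z))             # two-sided p-value

  list(runs = runs, n_pos = n1, n_neg = n2,
       expected = expected, statistic = z, p.value = p_value)
}

# Toy series from the text: +, +, -, +, -, -, -, +
toy <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
result <- manual_runs_test(toy)
print(result)
```

The toy series has four + days, four - days and five runs, which is exactly the expected number, so the statistic is 0 and the p-value is 1. On the real data the call would be `manual_runs_test(infy_ret > 0)`; note that, as in the original code, a return of exactly zero is counted on the - side.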
P.S. In case you want to replicate this exercise, the data can be obtained from here.