**We think therefore we R**, and kindly contributed to R-bloggers)

*everyone* to come in with their respective expectations and allows them to interact and exchange securities. I emphasize *everyone* because this includes an auto-rickshaw driver, a clerk, and also sophisticated econometricians and analysts. An obvious point then is that if your expectations are consistently correct, i.e. you can predict price movements before they happen on the exchange, you are a rich man. Assuming for all practical purposes that there is no *oracle* in our universe who can make these predictions with 100% accuracy, the job of prediction rests upon the econometrician/statistician. Let's see if they can do a good job too.

I took the stock returns data for INFOSYS (INFY on NSE) for the past one year and tried to see if I could make this data confess its underlying linear/non-linear generating process. I started by employing a rather simple, straightforward and easy-to-interpret **runs test**. It's a non-parametric statistical test of the null hypothesis that the underlying series is independently and identically distributed. For those who are not too familiar with statistical parlance, non-parametric in simple terms means that we make no assumptions about the distribution the underlying data come from. There has been a huge surge in applications of non-parametric statistics to explain various processes, because the biggest deterrent to conducting these kinds of tests, i.e. the computational cost, is no longer a problem in this generation of rapid computation. The idea of empirical analysis is to theorize a null hypothesis and then try your best to bring it down using empirical evidence (analogous to Karl Popper's idea of falsification of a theory: you hang on to a theory so long as it has not betrayed you yet).

## Doing runs test on INFY daily returns

> library(tseries) ## Assumed source of runs.test(); its output format matches the one shown under "Output"

> infy <- read.csv("01-10-2010-TO-01-10-2011INFYEQN.csv") ## Reading the stock price data

> infy_ret <- 100 * diff(log(infy[, 2])) ## The second column holds the stock prices; returns are [log(Pt) - log(Pt-1)] * 100

> runs.test(factor(infy_ret > 0)) ## Creates a two-level factor: TRUE where infy_ret > 0 and FALSE otherwise

What this does is tell me whether the runs of the returns are predictable. Say I represent a positive return by + and a negative return by -; then my series of returns would probably look like +, +, -, +, -, -, -, +, … A run is an unbroken stretch of the same sign, and the test checks whether the number of runs is what we would expect from a random sequence. In other words: can I predict whether the next day will have a + or a -?
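To make the idea of counting runs concrete, here is a minimal sketch of the runs test computed by hand using the usual normal approximation. The helper name `manual_runs_test` is hypothetical (it is not part of any package), and the toy input is simply the illustrative sign sequence from the text, not the INFY data. The result should broadly agree with a packaged `runs.test()`, but treat it as an illustration rather than a replacement.

```r
# Hand-rolled runs test on a sequence of signs (illustrative sketch only).
# manual_runs_test is a hypothetical helper, not a function from any package.
manual_runs_test <- function(signs) {
  signs <- as.logical(signs)
  n1 <- sum(signs)     # number of "+" observations
  n2 <- sum(!signs)    # number of "-" observations
  n  <- n1 + n2

  # A new run starts every time the sign changes from one day to the next.
  runs <- 1 + sum(signs[-1] != signs[-n])

  # Mean and variance of the number of runs if the sequence is random.
  expected <- 2 * n1 * n2 / n + 1
  variance <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))

  z <- (runs - expected) / sqrt(variance)   # standardised test statistic
  p <- 2 * pnorm(-abs(z))                   # two-sided p-value under N(0, 1)

  list(runs = runs, expected = expected, statistic = z, p.value = p)
}

# Toy sequence from the text: +, +, -, +, -, -, -, +
toy <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
res <- manual_runs_test(toy)
print(res)
```

For this toy sequence there are 5 observed runs against 5 expected, so the statistic is 0 and the p-value is 1: nothing in the pattern of signs looks non-random. With only eight observations the normal approximation is rough; the real series has about a year of daily returns.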

Output:

Runs Test

data: factor(infy_ret > 0)

Standard Normal = 0.1308, p-value = 0.8959 ## High p-value means you cannot trash your null hypothesis.

For those not familiar with statistics, the p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one we actually got. It is not the probability that the null hypothesis is true. (Be very careful with the interpretation of the p-value; people often misunderstand it, and I have fallen prey to this many times myself.) With a p-value of 0.8959, rejecting the null would mean accepting a significance level of about 90%, i.e. a roughly 90% chance of wrongly rejecting a correct hypothesis; you just don't have enough evidence. Therefore the signs of the returns are consistent with a random walk in prices (you can understand this in the literal English sense, but the definition is not so trivial in time series parlance).
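As a quick check on the output above, the reported p-value can be reproduced from the test statistic alone. This sketch assumes the statistic is compared against a standard normal distribution with a two-sided alternative, which is what the "Standard Normal" label in the output suggests.

```r
z <- 0.1308                     # test statistic reported in the output
p_value <- 2 * pnorm(-abs(z))   # probability of a |z| at least this large under N(0, 1)
print(round(p_value, 4))
```

This should come out at roughly 0.8959, matching the output: if the signs really were random, a statistic this far from zero or further would turn up about 90% of the time.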

P.S. In case you want to replicate this exercise, the data can be obtained from here.
