# Modelling returns using PCA : Evidence from Indian equity market

**We think therefore we R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

__Methodology:__

- Companies that have been actively traded on NSE stock exchange for the past 10 years (
**218**companies) were selected and their daily stock returns data for these 10 years was taken from PROWESS. - Using PCA, first 10 components from the returns data of the 218 companies was extracted. More on PCA in my previous post, here.
- These components were then separately regressed first on NIFTY returns (first regression)
- Then these components were regressed on NIFTY returns, MIBOR rate changes, and INR/USD exchange rate changes (second regression).
- The explanatory power of the 2 regressions were compared using a F-statistic. (refer to pg. 10 in the paper attached in the end of the post)

__Findings and R codes:__

We start with calculating the PCA of the returns on the 218 companies daily return data, then employing the 2 regressions, then comparing the 2 regressions using a F-statistic. F-stat tells us if there is any additional explanation offered when we include macroeconomic variables (viz. MIBOR, INR/USD) in our equation.

The results that I obtained pose an interesting observation. We find that the F-stat is significant at 5% for 7 out of the 10 regressions, meaning that out of the 10 regressions (each regression with a separate component) we find *statistically significant addition in the explanatory power of the model after adding the macroeconomic variables*. Therefore, on **statistical ground **I can argue that a multi factor model (APT) is preferable over a single factor model (CAPM) for modelling stock returns in the case of Indian equity market. This assertion, if holds true, can have reaching implications for asset pricing for Indian securities. Let me explain why. The principal components (that are the dependent variables in the model) are essentially the common factor across all the companies stock returns with the idiosyncratic effects discounted, so any variables that explains this common component would be the systematic risk (think why!). Now we can relate it to the debate between the CAPM and APT guys. If the CAPM guys were correct, I would obtain no additional explanation in my model after adding the macroeconomic variables i.e their assertion that the market risk (market beta) capture the entire systematic risk holds true.

The results, however, suggest that in **7 out of 10** regressions there is statistically additional explanation offered by the macroeconomic variables. Well, so we can out-rightly reject the applicability (of the much prevalent) CAPM in the case of Indian equities. Or is there something amiss? Now if I closely look at the absolute increase in the explanatory power by looking at the Adjusted-R-squared values before and after the addition of the macroeconomic variables, the absolute increase in all the cases is < 1% (refer to pg. 11 in the paper at the end of this post). Therefore, *although we obtain statistical efficiency after the addition of the variables, the economic efficiency (intuition) is called to question*. Is it worth while to complicate our model with additional macroeconomic variables, when we can simply have the market rate used as a reasonable proxy for all the variables? And all this just to prove a point that we have macro-variables that can provide 0.5% additional explanation in our model? This takes us back to the eternal debate of statistical vs economic efficiency, what is more important? Is the above result robust enough (on economic intuition) to question the much used, simple and powerful CAPM? Is there a threshold even in statistical efficiency to ensure economic efficiency? These are some questions that still linger on in my mind.

If we view the above result with this caveat of economic efficiency then there is reason for us to believe that a single factor model would be a preferable way to model stock returns. There are, however, evidences in the literature to suggest that multi factor (APT) is a superior way of modelling returns, but the identification of these “multi factors” remains a contentious issue among the researchers. In some desperate attempts to refute CAPM, researcher extracted principal components from a number of macroeconomic variables as the input to the PCA. This resulted in factors that had no economic intuition at all, that were then used as independent variables in explaining the returns. The APT (Arbitrage pricing theory) is a ‘theory’, whereas CAPM is a ‘model’ that approximates reality. So even if in reality there are multiple factors that give rise to the returns signals as we see them, the identification of these factors is not a trivial exercise as we have seen above. Statistically we managed to overturn the CAPM in the context of Indian equity markets but in term of economic intuition the results do not seem to be that promising. Therefore, the above exercise tells us exactly why people still stick to the evergreen CAPM as an asset pricing model.

In case you wish to replicate the exercise the data can be obtained from here: Returns_CNX_500, Nifty_returns, MIBOR, Exchange_rates.

Here is the full text of my paper. Feedback are welcome.

**leave a comment**for the author, please follow the link and comment on their blog:

**We think therefore we R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.