Modelling returns using PCA: Evidence from the Indian equity market

December 26, 2011

(This article was first published on We think therefore we R, and kindly contributed to R-bloggers)

For my finance term paper, I tried to identify macroeconomic variables that explain the returns on equities. Much debate has already taken place on this topic, and it has given rise to two competing theories of asset pricing, viz. CAPM (the capital asset pricing model, a single-factor model) and APT (arbitrage pricing theory, a multi-factor model). There is a brief discussion of the two in my previous post. In this post I would like to discuss my approach to answering this question in the context of the Indian stock market.


  • Companies that have been actively traded on the NSE for the past 10 years (218 companies) were selected, and their daily stock returns for these 10 years were taken from PROWESS.
  • Using PCA, the first 10 components were extracted from the returns data of the 218 companies. More on PCA in my previous post.
  • Each of these components was then regressed on NIFTY returns alone (first regression).
  • Each component was then regressed on NIFTY returns, MIBOR rate changes, and INR/USD exchange rate changes (second regression).
  • The explanatory power of the 2 regressions was compared using an F-statistic (refer to pg. 10 in the paper attached at the end of the post).
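
The steps above can be sketched in R roughly as follows. This is a minimal illustration on simulated data, not the code used for the paper: the sample sizes, the variable names (`nifty`, `mibor`, `fx`, `returns`) and the choice to centre and scale before the PCA are all assumptions made for the example.

```r
# Illustrative sketch only: simulated series stand in for the PROWESS data.
set.seed(1)
n_days  <- 500   # hypothetical number of trading days
n_firms <- 30    # hypothetical; the study used 218 companies

nifty <- rnorm(n_days, sd = 0.01)    # stand-in for NIFTY returns
mibor <- rnorm(n_days, sd = 0.001)   # stand-in for MIBOR rate changes
fx    <- rnorm(n_days, sd = 0.005)   # stand-in for INR/USD rate changes

# Each simulated firm loads on the market and, weakly, on the macro variables
returns <- outer(nifty, runif(n_firms, 0.5, 1.5)) +
  outer(mibor, rnorm(n_firms)) +
  outer(fx, rnorm(n_firms, sd = 0.2)) +
  matrix(rnorm(n_days * n_firms, sd = 0.01), n_days, n_firms)

# Extract the first 10 principal components (scores) of the returns matrix
# (centring and scaling are assumptions, not taken from the paper)
pca <- prcomp(returns, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:10]

# First regression: each component on the market return alone
fit_single <- lapply(1:10, function(k) lm(pcs[, k] ~ nifty))

# Second regression: each component on the market return plus macro variables
fit_multi <- lapply(1:10, function(k) lm(pcs[, k] ~ nifty + mibor + fx))
```

Each element of `fit_single` and `fit_multi` is an ordinary `lm` fit, so `summary(fit_multi[[1]])` shows the coefficients for the first component.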

Findings and R code:
We start by running the PCA on the daily returns of the 218 companies, then fit the 2 regressions, and finally compare them using an F-statistic. The F-statistic tells us whether including the macroeconomic variables (viz. MIBOR, INR/USD) in the equation offers any additional explanation.
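
For readers who want to see the comparison in code, here is a hedged sketch of the standard F-test for nested linear models, which may differ in detail from the statistic on pg. 10 of the paper. The data are simulated and the names (`pc1`, `restricted`, `unrestricted`) are hypothetical.

```r
# Illustrative sketch only: simulated data, hypothetical variable names.
set.seed(2)
n     <- 400
nifty <- rnorm(n)
mibor <- rnorm(n)
fx    <- rnorm(n)
pc1   <- 2 * nifty + 0.2 * mibor + rnorm(n)   # stand-in for one component

restricted   <- lm(pc1 ~ nifty)               # market return only
unrestricted <- lm(pc1 ~ nifty + mibor + fx)  # market plus macro variables

# anova() on two nested lm fits reports the F-test of the extra regressors
cmp    <- anova(restricted, unrestricted)
f_stat <- cmp$F[2]
p_val  <- cmp$`Pr(>F)`[2]

# The same statistic by hand: F = ((RSS_r - RSS_u) / q) / (RSS_u / df_u)
rss_r    <- sum(resid(restricted)^2)
rss_u    <- sum(resid(unrestricted)^2)
q        <- 2                                 # number of added regressors
f_manual <- ((rss_r - rss_u) / q) / (rss_u / df.residual(unrestricted))
```

A `p_val` below 0.05 corresponds to "significant at 5%" in the discussion that follows.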

The results I obtained pose an interesting question. The F-stat is significant at 5% for 7 of the 10 regressions (one regression per component), meaning that in those 7 cases adding the macroeconomic variables gives a statistically significant increase in the explanatory power of the model. On statistical grounds, therefore, I can argue that a multi-factor model (APT) is preferable to a single-factor model (CAPM) for modelling stock returns in the Indian equity market. This assertion, if it holds true, can have far-reaching implications for asset pricing of Indian securities. Let me explain why. The principal components (the dependent variables in the model) are essentially the common factors across all the companies' stock returns with the idiosyncratic effects discounted, so any variable that explains this common component represents systematic risk (think why!). Now we can relate this to the debate between the CAPM and APT guys. If the CAPM guys were correct, i.e. if their assertion that market risk (the market beta) captures the entire systematic risk held true, I would obtain no additional explanation in my model after adding the macroeconomic variables.

The results, however, suggest that in 7 out of 10 regressions the macroeconomic variables offer statistically significant additional explanation. So can we reject outright the applicability of the (much prevalent) CAPM in the case of Indian equities? Or is there something amiss? If I look closely at the Adjusted-R-squared values before and after the addition of the macroeconomic variables, the absolute increase in explanatory power is < 1% in all the cases (refer to pg. 11 in the paper at the end of this post). Therefore, although we obtain statistical efficiency after adding the variables, the economic efficiency (intuition) is called into question. Is it worthwhile to complicate our model with additional macroeconomic variables when we can simply use the market return as a reasonable proxy for all of them? And all this just to prove the point that there are macro-variables that can provide 0.5% additional explanation in our model? This takes us back to the eternal debate of statistical vs economic efficiency: which is more important? Is the above result robust enough (on economic intuition) to question the much used, simple and powerful CAPM? Is there a threshold in statistical efficiency that ensures economic efficiency? These are some questions that still linger in my mind.
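
The gap between statistical and economic significance can be illustrated with a small simulation. This is only a sketch under assumed numbers (the sample size, the coefficients and the names `pc`, `single`, `multi` are hypothetical, not taken from the paper): with a long daily sample, macro variables with very small coefficients can still be tested against the market-only model, while the change in adjusted R-squared is read off separately.

```r
# Illustrative sketch only: simulated data, hypothetical coefficients.
set.seed(3)
n     <- 2500   # assumed: roughly ten years of trading days
nifty <- rnorm(n)
mibor <- rnorm(n)
fx    <- rnorm(n)

# The market dominates; the macro variables have only a small effect
pc <- 3 * nifty + 0.08 * mibor + 0.08 * fx + rnorm(n)

single <- lm(pc ~ nifty)
multi  <- lm(pc ~ nifty + mibor + fx)

# Statistical side: p-value of the F-test for the added variables
p_val <- anova(single, multi)$`Pr(>F)`[2]

# Economic side: absolute change in adjusted R-squared
adj_single <- summary(single)$adj.r.squared
adj_multi  <- summary(multi)$adj.r.squared
gain       <- adj_multi - adj_single

cat("p-value of F-test:", p_val, "\n")
cat("increase in adjusted R-squared:", gain, "\n")
```

Comparing `p_val` with `gain` side by side is the same check as reading pg. 10 and pg. 11 of the paper together.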

If we view the above result with this caveat of economic efficiency, then there is reason to believe that a single-factor model is a preferable way to model stock returns. There is, however, evidence in the literature to suggest that the multi-factor approach (APT) is a superior way of modelling returns, but the identification of these “multi factors” remains a contentious issue among researchers. In some desperate attempts to refute CAPM, researchers have run PCA on a number of macroeconomic variables and used the extracted principal components as independent variables in explaining returns. The resulting factors had no economic intuition at all. The APT (arbitrage pricing theory) is a ‘theory’, whereas CAPM is a ‘model’ that approximates reality. So even if in reality there are multiple factors that give rise to the return signals as we see them, identifying these factors is not a trivial exercise, as we have seen above. Statistically we managed to overturn the CAPM in the context of Indian equity markets, but in terms of economic intuition the results do not seem that promising. The above exercise therefore tells us exactly why people still stick to the evergreen CAPM as an asset pricing model.

In case you wish to replicate the exercise, the data can be obtained from here: Returns_CNX_500, Nifty_returns, MIBOR, Exchange_rates.

Here is the full text of my paper. Feedback is welcome.
