Direction of Change Forecasting using a Dynamic Binary Model

September 12, 2013

(This article was first published on unstarched» R, and kindly contributed to R-bloggers)

While it is generally accepted that the returns of financial assets are almost impossible to forecast with any degree of accuracy which would provide meaningful profit [1], there is evidence that the sign of the returns is much more forecastable. Theoretically, Christoffersen and Diebold (2006) have shown how the forecastability of the sign is related to the presence of heteroscedasticity, with a number of further papers investigating the dynamics of the sign including Lee and Yang (2006), Anatolyev and Gospodinov (2010) and Nyberg (2011), among others. In a very nicely written paper, Kauppi and Saikkonen (2008) (KS2008 henceforth) put forward a parsimonious and feasible model for modelling a binary response by extending the standard probit/logit regression models to include autoregressive dynamics in both the response variable and the motion dynamics. This model has been successfully applied by Nyberg (2010, 2011, 2012) in a number of forecast studies of the S&P 500 and the NBER recession turning points. This blog article discusses this model and its value in an empirical application for forecasting the S&P 500 monthly direction.

[This is an updated version of an earlier article which has been expanded].

Logistic Regression and Autoregressive Dynamics

Binary response models have a very rich history in the statistical research literature, with diverse applications in many fields where the dependent variable takes on a dichotomous value. Early work on these models included the tetrachoric correlation analysis of Pearson (1900) and the standard biometric textbook of Finney (1971), with more recent advances dealing with such issues as full or partial separation in Zorn (2005), small sample bias reduction in Firth (1993) and Heinze (2002) and heteroscedasticity (see for instance Keele (2006)). In econometrics, Amemiya (1981) provided an early review of applications, but more recent research has focused on direction of change forecasts for stock market returns and on recession forecasting, which is also the motivation of this article.
Consider the stochastic process \( y_t \), which is binary valued, and the vector of \( k \) explanatory variables, \( x_{i,t} \) for \( i=1,\ldots,k \). Let \( \Im_t \) be the information set available at time \( t \); then, conditional on \( \Im_{t-1} \), \( y_t \) has a Bernoulli distribution with probability \( p_t \):
\[ y_t \mid \Im_{t-1} \sim B\left( p_t \right) \]
The objective is to model \( p_t \) through the CDF transformed dynamics of a linear process \( \pi_t \) such that \( p_t = \Phi\left( \pi_t \right) \). Formally,
\[ E_{t-1}\left( y_t \right) = P_{t-1}\left( y_t = 1 \right) = \Phi\left( \pi_t \right) = p_t \]
The CDF \( \Phi \), also called the link function, can be one of any number of distributions but has typically been either the Gaussian (probit model), Logistic (logit model) or some skewed variation (scobit model) such as the Generalized Logistic distribution which nests the Logistic.
In the model of KS2008, the dynamics of \( \pi_t \) take on the following form:
\[ \pi_t = \omega + \sum_{i=1}^{k} \beta_i x_{i,t-l_i} + \sum_{i=1}^{q} \delta_i y_{t-i} + \sum_{i=1}^{p} \alpha_i \pi_{t-i} \]
where \( \delta_i \) represents the coefficient on the \( q \) autoregressive terms of the binary variable \( y_t \), \( \beta_i \) the coefficient on the \( i^{th} \) (of \( k \)) explanatory variable \( x_{i,t} \) with lag \( l_i \), and \( \alpha_i \) the coefficient on the \( p \) autoregressive terms of the dynamics \( \pi_t \). The specification without the latter term has already been examined elsewhere, and some results with regards to its asymptotic properties can be found in de Jong (2011). Related literature on general binomial ARMA type models can be found, among others, in Al-Osh (1991) and more recently, with financial/econometric applications, in Rydberg (2003) and Startz (2008). Nyberg (2011) introduced a restriction to the dynamics equation by setting \( \delta_1=1-\alpha_1 \), leading to a type of error correction model, with strong persistence in the autoregressive parameter \( \alpha_1 \) usually observed.
The log-likelihood \( l\left( \theta \right) \) takes the following form:
\[ l\left( \theta \right) = \sum_{t=1}^{T} \left[ y_t \log \Phi\left( \pi_t\left( \theta \right) \right) + \left( 1 - y_t \right) \log\left( 1 - \Phi\left( \pi_t\left( \theta \right) \right) \right) \right] \]
where \( \Phi \) is the distribution function, with the logistic or normal distributions typically used and giving rise to the logit and probit models respectively. In order to avoid over-fitting in systems with many parameters or generally ill posed problems, regularization, originally proposed by Tikhonov (1974), may be employed.
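As a rough illustration of the recursion and log-likelihood above, the following Python sketch evaluates \( \pi_t \) and \( l(\theta) \) for the simplest case \( p = q = 1 \) with a logistic link. The function names, the start-up values `pi0` and `y0`, and the convention that `x[t]` already holds the appropriately lagged regressors are assumptions made for the example and are not taken from KS2008.

```python
import math

def logistic_cdf(z):
    """Logistic link Phi(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dynamic_index(y, x, omega, beta, delta, alpha, pi0=0.0, y0=0):
    """pi_t = omega + sum_i beta_i * x[t][i] + delta * y_{t-1} + alpha * pi_{t-1}.

    x[t] is assumed to hold the already-lagged regressors for time t.
    pi0 and y0 are hypothetical start-up values for the recursion.
    """
    pis = []
    pi_prev, y_prev = pi0, y0
    for t in range(len(y)):
        xb = sum(b * v for b, v in zip(beta, x[t]))
        pi_t = omega + xb + delta * y_prev + alpha * pi_prev
        pis.append(pi_t)
        pi_prev, y_prev = pi_t, y[t]
    return pis

def log_likelihood(y, pis):
    """Bernoulli log-likelihood with p_t = Phi(pi_t), clipped away from 0 and 1."""
    eps = 1e-12
    ll = 0.0
    for yt, pi_t in zip(y, pis):
        p = min(max(logistic_cdf(pi_t), eps), 1.0 - eps)
        ll += yt * math.log(p) + (1 - yt) * math.log(1.0 - p)
    return ll

# Made-up data, purely to show the call pattern.
y = [1, 0, 1, 1]
x = [[0.5], [-0.2], [0.1], [0.3]]
pis = dynamic_index(y, x, omega=0.1, beta=[0.4], delta=0.3, alpha=0.8)
ll = log_likelihood(y, pis)
```

In practice the negative of `log_likelihood` would be handed to a numerical optimizer, restarted from several starting points as discussed in the estimation section.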
While there are a number of ways to achieve this, one method, based on the \( L_2 \) norm which is rotationally invariant, adjusts the log-likelihood in the following manner:
\[ l\left( \theta \right) = \sum_{t=1}^{T} \left[ y_t \log \Phi\left( \pi_t\left( \theta \right) \right) + \left( 1 - y_t \right) \log\left( 1 - \Phi\left( \pi_t\left( \theta \right) \right) \right) \right] - \frac{C}{2}\sum_{j=1}^{m} \theta_j^2 \]
where it is clear how higher values of the \( m \) parameters are penalized, with the cost \( C \) determined by the user. The objective function of the logistic regression model with regularization is closely related to that of support vector regression (svr), which uses either an \( L_1 \) or \( L_2 \) loss function. The differences, however, do extend beyond this, since svr usually makes use of a rich choice of nonlinear basis functions to maximize the margin of the decision variables, though it is just as feasible to use any number of basis functions in the binary response model or any other regression modelling context (see for example Kohn (2001)). One such interesting approach is to use hinge basis functions via the multivariate adaptive regression splines (MARS) model of Friedman (1991), within a logistic regression setting (something which is easy to do thanks to the excellent earth package). This approach is left for a follow-up article.
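To make the effect of the cost \( C \) concrete, here is a minimal Python sketch of the \( L_2 \) penalized objective. It takes the fitted probabilities as given and, as in the formula, penalizes all \( m \) parameters alike; the function name and the numbers are hypothetical.

```python
import math

def penalized_loglik(y, probs, theta, cost):
    """Bernoulli log-likelihood minus (C / 2) * sum_j theta_j^2."""
    ll = sum(yt * math.log(p) + (1 - yt) * math.log(1.0 - p)
             for yt, p in zip(y, probs))
    penalty = 0.5 * cost * sum(th * th for th in theta)
    return ll - penalty

# Made-up outcomes, fitted probabilities and parameter vector.
y = [1, 0, 1]
probs = [0.7, 0.4, 0.6]
theta = [0.1, 0.4, 0.8]

unpenalized = penalized_loglik(y, probs, theta, cost=0.0)
penalized = penalized_loglik(y, probs, theta, cost=2.0)
# The gap between the two is the penalty, (C / 2) * sum(theta_j^2).
gap = unpenalized - penalized
```

A larger `cost` shrinks the parameters towards zero at the optimum, trading in-sample fit for stability.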

Estimation and Forecasting

Because of the presence of the autoregressive parameter in the dynamics, the model is highly nonlinear, necessitating a more considered estimation strategy. Restarting the estimation a few times from different sets of starting points makes it more likely that local optima are avoided. For the backtesting application described in the next section, which uses a moving window, each model is estimated conditioning on the value of the previous window's likelihood, since the two should be close together. Additionally, in order to increase confidence in the numerical solution and reduce estimation time, analytic derivatives have been utilized throughout.
Unlike other nonlinear models, the binary nature of the model allows explicit multi-period iterated forecasts by enumeration of all the possible binary paths. Following from the Appendix of Kauppi and Saikkonen (2008), and adjusting the notation to use forward times, the h-period ahead forecast (written here for a single autoregressive coefficient \( \alpha \) and a single coefficient \( \delta \), i.e. \( p=q=1 \)) can be represented as follows:
\[ E_t\left( y_{t+h} \right) = E_t \Phi\left( \alpha^h \pi_t + \sum_{j=1}^{h} \left[ \alpha^{j-1}\left( \omega + \delta y_{t+h-j} + \sum_{i=1,\,(h-l_i)<j}^{k} \beta_i x_{i,t+(h-l_i)+1-j} \right) \right] \right) \]
\[ = \sum_{y_{t+1}^{t+h-1} \in B_{h-1}} P_t\left( y_{t+1}^{t+h-1} \right) \Phi\left( \alpha^h \pi_t + \sum_{j=1}^{h} \left[ \alpha^{j-1}\left( \omega + \delta y_{t+h-j} + \sum_{i=1,\,(h-l_i)<j}^{k} \beta_i x_{i,t+(h-l_i)+1-j} \right) \right] \right) \]
where \( y_{t+1}^{t+h-1} \in B_{h-1} \) indicates the evaluation of all possible binary paths for \( y \) from time \( t+1 \) up to time \( t+h-1 \), \( l_i \) is the lag of each explanatory variable \( x_i, i=1,\ldots,k \), and
\[ P_t\left( y_{t+1}^{t+h-1} \right) = \prod_{n=1}^{h-1} \left( p_{t+n} \right)^{y_{t+n}} \left( 1 - p_{t+n} \right)^{\left( 1 - y_{t+n} \right)} \]
\[ p_{t+n} = \Phi\left( \alpha^n \pi_t + \sum_{j=1}^{n} \alpha^{j-1}\left( \omega + \delta y_{t+n-j} + \sum_{i=1,\,(n-l_i)<j}^{k} \beta_i x_{i,t+(n-l_i)+1-j} \right) \right) \]
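The path enumeration can be sketched in a few lines of Python. The example below is a simplified illustration for \( p = q = 1 \) with a logistic link: it steps the recursion for \( \pi \) forward one period at a time (equivalent to the closed form in powers of \( \alpha \)) and weights each of the \( 2^{h-1} \) binary paths by its probability. The argument `xb`, holding the known regressor contribution to each future \( \pi_{t+n} \), is a hypothetical convenience and defaults to zero.

```python
import itertools
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def forecast_h(pi_t, y_t, h, omega, delta, alpha, xb=None):
    """P_t(y_{t+h} = 1) by enumerating all 2^(h-1) paths of y_{t+1}, ..., y_{t+h-1}.

    xb[n-1] is the (assumed known) term sum_i beta_i * x_i entering pi_{t+n}.
    """
    xb = xb if xb is not None else [0.0] * h
    total = 0.0
    for path in itertools.product((0, 1), repeat=h - 1):
        pi_prev, y_prev, path_prob = pi_t, y_t, 1.0
        for n, y_next in enumerate(path):
            # One-step update of the index, then the probability of this step's outcome.
            pi_n = omega + xb[n] + delta * y_prev + alpha * pi_prev
            p_n = logistic_cdf(pi_n)
            path_prob *= p_n if y_next == 1 else 1.0 - p_n
            pi_prev, y_prev = pi_n, y_next
        # Final step: probability of an up move at t+h given this path.
        pi_h = omega + xb[h - 1] + delta * y_prev + alpha * pi_prev
        total += path_prob * logistic_cdf(pi_h)
    return total

# Made-up parameter values, purely to show the call pattern.
prob_up_3m = forecast_h(pi_t=0.2, y_t=1, h=3, omega=0.1, delta=0.3, alpha=0.8)
```

The cost grows as \( 2^{h-1} \), so this brute-force enumeration is only practical for short horizons.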

Predicting the S&P 500 Monthly Direction

The objective of the application is the monthly directional forecast of the S&P 500 excess return. As such, a number of economic, fundamental and technical series, believed to influence or capture investor expectations, are used. The forecast period covered is 1979-2013. The dataset consists of the following:

  • S&P 500 from Yahoo Finance (GSPC). The 1 Month Treasury (as a proxy for ‘riskfree’) from the Fama-French dataset is subtracted from the monthly S&P 500 returns. The direction (0,1) of the excess returns then forms the binary dependent variable (exspdir).
  • The spread of the 10Y Constant Maturity Treasury and 3 Month Treasury Bill from FRED (yspread). The data is based on the H15 point in time historical release weekly data which is available every Monday, unless it falls on one of the 10 public holidays in which case it has been manually lagged (according to Federal Reserve guidelines) so that there is no look ahead bias. This is then aligned to the end of month value of the S&P 500 closing prices using the last known release.
  • The dividend yield of the S&P 500 (dy). While this is available quarterly, it is released with a variable lag, so for prudence a 3 month lag is assumed for the release (e.g. the Q1 dividend is not available until the end of June).
  • The monthly change in the 3 Month Treasury Bill (tb3ch).
  • The 12 month rate of change in the Consumer Price Index (cpich). The data is based on the ALFRED point in time vintage series CPIAUCNS (non seasonally adjusted) after adjusting for a number of index re-bases over the years, and aligned to the end of month S&P 500 closing prices using the last known release.
  • The 12 month rate of change in Industrial Production (ipch). The data is based on the ALFRED point in time vintage series INDPRO (seasonally adjusted) after adjusting for a number of index re-bases over the years, and aligned to the end of month S&P 500 closing prices using the last known release.
  • The median of the rolling difference between an EMA of length 25 and 250 on the S&P 500 closing price, aligned to the end of month S&P 500 closing prices.
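For concreteness, the construction of the binary dependent variable and the lagging of a predictor can be sketched as follows in Python. The helper names and the numbers are made up for illustration, and treating a zero excess return as a down move is an assumption, since the text does not say how ties are handled.

```python
def excess_direction(index_returns, riskfree_returns):
    """1 when the excess return is strictly positive, else 0.

    Counting an excess return of exactly zero as 0 is an assumption.
    """
    return [1 if r - rf > 0 else 0
            for r, rf in zip(index_returns, riskfree_returns)]

def lag(series, k):
    """Shift a series back by k periods, padding the start with None."""
    return ([None] * k + list(series))[:len(series)]

# Hypothetical monthly index returns and 1-month Treasury returns.
sp_returns = [0.021, -0.013, 0.004]
rf_returns = [0.003, 0.003, 0.005]
exspdir = excess_direction(sp_returns, rf_returns)

# Hypothetical dividend yield series, made available with a 3 month lag.
dy = [2.1, 2.0, 2.2, 2.3, 2.4]
dy_lagged = lag(dy, 3)
```

Lagging each predictor by its release delay before alignment is what keeps look-ahead bias out of the backtest.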

Unlike previous studies in this area, I am NOT going to use a recursive window for the walk forward testing. Instead, I am going to use a moving window of length 200, in order to capture changes in relationships which cannot possibly have been constant over the entire dataset, as well as other possible structural breaks. The size of 200 was chosen so as to provide a compromise between parameter constancy and consistency of the estimates. It is also possible to test different window sizes for each estimation window and compare the average likelihood, but this is not undertaken here. For the current application, I have also not undertaken to find the best possible model among all possible combinations of explanatory variables for each window, as was done in Pesaran and Timmermann (1995).
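The difference between the two walk forward schemes comes down to how the estimation sample is indexed. A minimal Python sketch, with hypothetical function names, where each tuple gives the half-open estimation range and the index of the observation being forecast:

```python
def moving_windows(n_obs, window=200):
    """Fixed-length window: estimate on [start, end), forecast observation end."""
    for end in range(window, n_obs):
        yield end - window, end, end

def recursive_windows(n_obs, initial=200):
    """Expanding window: the estimation sample always starts at observation 0."""
    for end in range(initial, n_obs):
        yield 0, end, end

# With 203 observations there are three one-step-ahead forecasts.
moving = list(moving_windows(203))
recursive = list(recursive_windows(203))
```

The moving scheme drops the oldest observation at every step, so the estimates can adapt to changing relationships at the cost of a fixed, smaller sample.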

Table 1 provides a set of summary statistics for the performance of a portfolio (DBM) which goes long when the forecast probability of a positive return is greater than 50%, and into the 1-Month Treasury otherwise. For comparison, a popular strategy based on the crossover of a 10-Month Exponential Moving Average (EMA) is also included [2], as is the benchmark Buy and Hold portfolio (B&H). For completeness, the table also includes the performance of the model when using the SPY ETF as a proxy for the total return index (which starts in 1993), since the GSPC does not include dividends but has a longer history (which was key in running the backtest).
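The trading rule itself is simple enough to sketch in Python. The example below applies the 50% threshold to a series of forecast probabilities and computes two of the summary measures; the exact definitions behind Table 1 are not stated in the article, so the CAGR and maximum drawdown formulas here are common textbook versions used as an assumption, and all numbers are invented.

```python
def timing_returns(prob_up, index_returns, tbill_returns, threshold=0.5):
    """Hold the index when the forecast probability exceeds the threshold,
    otherwise hold the 1-month Treasury."""
    return [r if p > threshold else rf
            for p, r, rf in zip(prob_up, index_returns, tbill_returns)]

def cagr(monthly_returns):
    """Compound annual growth rate from simple monthly returns."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    years = len(monthly_returns) / 12.0
    return growth ** (1.0 / years) - 1.0

def max_drawdown(monthly_returns):
    """Largest peak-to-trough loss of the cumulative wealth curve."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst

# Invented forecasts and returns, purely to show the call pattern.
prob_up = [0.6, 0.4, 0.5]
index_returns = [0.02, -0.03, 0.01]
tbill_returns = [0.001, 0.001, 0.001]
dbm_returns = timing_returns(prob_up, index_returns, tbill_returns)
```

Note that a forecast of exactly 50% keeps the portfolio in the Treasury, since the rule requires the probability to be strictly greater than the threshold.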

The results are very clear, using any of the measures provided. The Dynamic Binary portfolio provides superior and actionable signals which may be used to actively time the index. A visual inspection of Figure 1 confirms the significant outperformance of an investor holding this portfolio over the period 1979-2013, with a higher Sharpe, lower drawdowns and positive skewness. Figure 2 shows the directional probability forecast and buy zones for the S&P 500 based on this model.
An examination of the parameter coefficients over the rolling windows also reveals an interesting story. As Figure 3 shows, not all coefficients were significant over the entire period, alternating between periods of significance and non-significance. The CPI and Technical indicator appear to have been significant most of the time, but more importantly, the autoregressive parameter which is the innovation of the approach adopted was significant throughout the period, and close to unity indicating very persistent dynamics.

Table 1 (a dash marks a value not reported)

1979-2013     DBM (SPY)        DBM        EMA        B&H
CAGR                  -      12.89       9.07       6.63
Vol(Ann)              -      10.45      11.58      14.95
%Up                   -      78.71      72.03      58.77
MaxDraw               -      17.03       24.5      52.56
CAPM(alpha)           -       5.71       1.93          -
CAPM(beta)            -     0.4769     0.5744          -
Timing                -     6.3426     0.9882          -
Sharpe                -     0.7295     0.3424     0.0956
Information           -     0.7595     0.3572     0.1009
Calmar                -     0.7569       0.37     0.1262
Kurt(ex)              -     2.3054     6.4895     1.8692
Skew                  -     0.4646    -0.8709    -0.4357

2010-2013     DBM (SPY)        DBM        EMA        B&H
CAGR              19.43      17.46       7.29      12.22
Vol(Ann)          12.26      12.23      11.06      14.49
%Up               76.74      72.09      67.44      62.79
MaxDraw           16.23      17.03      12.52      17.03
CAPM(alpha)        9.19       7.55       0.47          -
CAPM(beta)       0.7458     0.7418     0.5682          -
Timing           3.2494     3.3475     0.1619          -
Sharpe           1.5796     1.4222     0.6535     0.8393
Information      1.5810     1.4235     0.6537     0.8398
Calmar           1.1972     1.0255     0.5826     0.7179
Kurt(ex)         0.5478      0.625     0.4377     -0.092
Skew             0.2126     0.1897    -0.6333    -0.2002

1993-2013     DBM (SPY)        DBM        EMA        B&H
CAGR              13.81      12.38       9.77       6.79
Vol(Ann)          10.44      10.39      10.13      15.01
%Up               77.24      76.83      74.80      62.60
MaxDraw           16.23      17.03      15.57      52.56
CAPM(alpha)        8.25       6.98       4.77          -
CAPM(beta)       0.4989     0.4963     0.4597          -
Timing          11.7045    12.1168     2.1630          -
Sharpe           1.0172     0.8879     0.6596     0.2517
Information      1.0441     0.9115     0.6788     0.2587
Calmar           0.8512     0.7271     0.6272     0.1291
Kurt(ex)         0.8682     0.9920     2.9494     1.1578
Skew             0.2891     0.2257    -0.5237    -0.6921


Figure 1


Figure 2


Figure 3

Concluding Remarks

While the elusive art of return forecasting may safely be left to crystal gazers, the generation of actionable and accurate sign forecasts is within reach. Using a number of lagged economic and technical predictors in a dynamic binary model, the forecast of the monthly direction of the S&P 500 from 1979-2013 yielded results which were superior, on all measures considered, to those of the Buy and Hold passive index strategy and of a popular trend following strategy. This indicates that there is clearly value in this model for market timing and active indexing.

Possible extensions would likely consider the reason for the asymmetric results when considering negative returns, possibly nonlinearities giving rise to a skewed distribution, as well as the use of different basis functions for the predictors.

[1] Despite Pesaran and Timmermann (1995), whose results I have been unable to duplicate successfully using vintage series…but this may be the fault of my own setup/understanding of their backtest procedure.
[2] The strategy goes long the index when the price is above its 10-Month EMA, and into the 1-Month Treasury otherwise.


Al-Osh, M. A., & Alzaid, A. A. (1991). Binomial autoregressive moving average models. Stochastic Models, 7(2), 261-282.

Amemiya, T. (1975). Qualitative response models. In Annals of Economic and Social Measurement, Volume 4, number 3 (pp. 363-372). NBER.

Anatolyev, S., & Gospodinov, N. (2010). Modeling financial return dynamics via decomposition. Journal of Business & Economic Statistics, 28(2).

Christoffersen, P. F., & Diebold, F. X. (2006). Financial asset returns, direction-of-change forecasting, and volatility dynamics. Management Science, 52(8), 1273-1287.

de Jong, R. M., & Woutersen, T. (2011). Dynamic time series binary choice. Econometric Theory, 27(04), 673-702.

Finney, D. J. (1947). Probit analysis; a statistical treatment of the sigmoid response curve.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27-38.

Friedman, J. H. (1991). Multivariate adaptive regression splines. The annals of statistics, 1-67.

Heinze, G., & Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in medicine, 21(16), 2409-2419.

Kauppi, H., & Saikkonen, P. (2008). Predicting US recessions with dynamic binary response models. The Review of Economics and Statistics, 90(4), 777-791.

Kohn, R., Smith, M., & Chan, D. (2001). Nonparametric regression using linear combinations of basis functions. Statistics and Computing, 11(4), 313-322.

Lee, T. H., & Yang, Y. (2006). Bagging binary and quantile predictors for time series. Journal of econometrics, 135(1), 465-497.

Nyberg, H. (2010). Dynamic probit models and financial variables in recession forecasting. Journal of Forecasting, 29(1‐2), 215-230.

Nyberg, H. (2011). Forecasting the direction of the US stock market with dynamic binary probit models. International Journal of Forecasting, 27(2), 561-578.

Nyberg, H. (2012). Risk-return tradeoff in US stock returns over the business cycle. Journal of Financial and Quantitative Analysis, 47(01), 137-158.

Nyberg, H. (forthcoming). A Bivariate Autoregressive Probit Model: Business Cycle Linkages and Transmission of Recession Probabilities. Macroeconomic Dynamics.

Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1-405.

Pesaran, M. H., & Timmermann, A. (1995). Predictability of stock returns: Robustness and economic significance. The Journal of Finance, 50(4), 1201-1228.

Rydberg, T. H., & Shephard, N. (2003). Dynamics of trade-by-trade price movements: decomposition and models. Journal of Financial Econometrics, 1(1), 2-25.

Startz, R. (2008). Binomial autoregressive moving average models with an application to US recessions. Journal of business & economic statistics, 26(1).

Tikhonov, A. N., & Arsenin, V. Y. (1974). Methods of solving incorrect problems. Science, Moscow.

Zorn, C. (2005). A solution to separation in binary response models. Political Analysis, 13(2), 157-170.
