(This article was first published on **Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers)

Back in 2013 I wrote a post that was titled, "Forecasting From Log-Linear Regressions". The basis for that post was the well-known result that if you estimate a linear regression model with the (natural) logarithm of y as the dependent variable, but you're actually interested in forecasting y itself, you *don't* just report the exponentials of the original forecasts. You need to add an adjustment that takes account of the connection between a Normal random variable and a log-Normal random variable, and the relationship between their means.

Today, I received a query from a blog-reader who asked how the results in that post would change if the dependent variable was the *square root* of y, but we wanted to forecast y itself. I'm not sure why this particular transformation was of interest, but let's take a look at the question.

In this case we can exploit the relationship between a (standard) Normal distribution and a Chi-Square distribution in order to answer the question.

I’m going to “borrow” heavily from my earlier post.

Suppose that we're working with a regression model of the form

√(y_t) = β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt} + ε_t ;  t = 1, 2, …., n.

The explanatory variables are assumed to be non-random, so that rules out models with lagged values of the dependent variable appearing as regressors. Let's also assume that the errors have a zero mean, and are serially independent and homoskedastic (with a variance of σ²). Crucially, we'll also assume that they are *Normally distributed*.

Once we estimate the model, we have the "fitted" values for the dependent variable. That is, we have values of [√(y_t)]* = b_1 + b_2x_{2t} + ….. + b_kx_{kt}, where b_i is the estimated value of β_i (i = 1, 2, ……, k). These estimates might be obtained by OLS, but that's not crucial to the following argument.

Now, we're likely to be more interested in fitted values expressed in terms of the *original data* – that is, in terms of y itself, rather than √(y).

You might think that all we have to do is to apply the inverse of the square root transformation, and the fitted values of interest will be y_t* = {[√(y_t)]*}². Unfortunately, just squaring the original forecasts will result in an unnecessary distortion. Let's see why.
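Before working through the algebra, the distortion is easy to see numerically. Here's a quick simulation sketch (the values of μ and σ below are purely illustrative, not taken from any model in the post): if √(y) is Normal with mean μ and variance σ², the mean of y itself exceeds the naive "squared mean" μ² by exactly σ².

```r
set.seed(123)
mu    <- 5     # illustrative mean of sqrt(y)
sigma <- 2     # illustrative std. deviation of the errors

sqrt_y <- mu + sigma * rnorm(1e6)   # sqrt(y) ~ N(mu, sigma^2)
y      <- sqrt_y^2                  # y itself

mean(y)   # close to mu^2 + sigma^2 = 29
mu^2      # the naive squared mean, 25 -- too small
```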

If ε_t is Normally distributed, then √(y_t) is also Normally distributed, with a mean of (β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt}) and a variance of σ². Another way of describing √(y_t) is that it is σZ + (β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt}), where Z is a Standard Normal variate.

Now consider y_t itself. We get this by squaring the √(y_t) random variable, so we can write:

y_t = σ²Z² + (β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt})² + 2σZ(β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt}).

Taking expectations,

E[y_t] = σ²E[Z²] + (β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt})²,

because the mean of Z is zero.

Recalling that Z² is a Chi-Square variate with one degree of freedom, and that the mean of a Chi-Square variate is its degrees of freedom, we immediately see that

E[y_t] = (β_1 + β_2x_{2t} + β_3x_{3t} + …… + β_kx_{kt})² + σ².

So, if we want to obtain forecasts for y, we should square the forecasts of √(y), but then we need to add σ².

Of course, σ² is unobservable, but we can replace it with its unbiased estimator, s². The latter is the sum of squared residuals (*from the original regression, with √(y_t) as the dependent variable*), divided by the degrees of freedom, (n – k).

Failure to add this extra term will result in point forecasts that are distorted in a *downwards* direction. Of course, the magnitude of this distortion will depend on both the scale of measurement for our y data, and the signal-to-noise ratio in our regression model.

In my earlier post relating to this issue in the context of log-linear regressions, I illustrated the results using both EViews and R. Specifically, I looked at a crude model for the "Airplane Passengers" (AP) time-series, based on the analysis of Cowpertwait and Metcalfe (2009, pp. 109-118). Here, I'll just use R to illustrate the results of the current post. The R script is available on this blog's **code page**, and it can be opened with any text editor. The square root of AP is regressed against a quadratic time trend and various Sine and Cosine terms of the form SIN(2πit) and COS(2πit); i = 1, 2, …, 5:
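The script itself isn't reproduced in this excerpt, but a sketch along the lines described might look as follows. I'm assuming the built-in AirPassengers series stands in for AP, and a straightforward construction of the trend and harmonic regressors; the exact specification in the post's own script may differ, so its reported numbers (e.g., s = 0.5098) need not be matched exactly.

```r
AP <- AirPassengers            # monthly airline passenger totals, 1949-1960
t  <- seq_along(AP) / 12       # time index, in years

# Sine and Cosine terms, SIN(2*pi*i*t) and COS(2*pi*i*t), for i = 1, ..., 5
harmonics <- do.call(cbind, lapply(1:5, function(i)
  cbind(sin(2 * pi * i * t), cos(2 * pi * i * t))))

# Regress sqrt(AP) on a quadratic time trend and the harmonic terms
fit <- lm(sqrt(AP) ~ t + I(t^2) + harmonics)

# s^2: the unbiased estimator of sigma^2 from the sqrt(AP) regression
s2 <- sum(resid(fit)^2) / fit$df.residual

APF   <- fitted(fit)^2         # naive predictions: just square the fitted values
APFAD <- fitted(fit)^2 + s2    # adjusted predictions: add s^2, as derived above
```

Note that the adjustment adds the same amount, s², to every squared fitted value, which is exactly the correction implied by the expectation result above.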

The time-series "APF" is the series of naive within-sample predictions, obtained by simply squaring the fitted values for √(AP). The time-series "APFAD" incorporates the adjustment term discussed above. In this particular case, s = 0.5098, so there's not a huge difference between APF and APFAD.

However, the sum of the squared (within-sample) prediction errors, based on AP and APF is 42318.42, while that based on AP and APFAD is 42310.45. So, there’s a bit of improvement in overall forecast performance when we take the adjustment term into account.
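That the adjustment improves matters on average, and not just for this one dataset, can be checked by simulation. The data-generating process below is purely hypothetical (my own choice of coefficients, not anything from the post); it simply satisfies the assumptions of the model set out earlier.

```r
set.seed(42)
n <- 100
x <- seq(0, 1, length.out = n)

mse <- replicate(500, {
  sqrt_y <- 2 + 3 * x + rnorm(n, sd = 0.5)   # sqrt(y) generated from the model
  y      <- sqrt_y^2
  fit    <- lm(sqrt_y ~ x)
  s2     <- sum(resid(fit)^2) / fit$df.residual
  naive  <- fitted(fit)^2
  c(naive    = mean((y - naive)^2),            # squared errors: just squaring
    adjusted = mean((y - (naive + s2))^2))     # squared errors: with s^2 added
})

rowMeans(mse)   # the adjusted predictions should show the smaller average
```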

A final comment is in order. We've been able to see how to "back out" appropriate forecasts of y when the original regression has a dependent variable that is either log(y) or √(y), but we were able to do this only because, when the inverses of these particular transformations are applied to a Normal random variable, the resulting new random variable has a known distribution. This little "trick" is not going to work – at least, not easily – for arbitrary non-linear transformations of y.

**Reference**

Cowpertwait, P. S. P. and A. V. Metcalfe, 2009. *Introductory Time Series with R*. Springer, Dordrecht.
