Forecasting From a Regression with a Square Root Dependent Variable

[This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers.]

Back in 2013 I wrote a post titled “Forecasting From Log-Linear Regressions”. The basis for that post was the well-known result that if you estimate a linear regression model with the (natural) logarithm of y as the dependent variable, but you’re actually interested in forecasting y itself, you don’t just report the exponentials of the original forecasts. You need to add an adjustment that takes account of the connection between a Normal random variable and a log-Normal random variable, and the relationship between their means.
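
For reference, the standard log-Normal result behind that adjustment can be written as follows. Here $\mu_t$ is notation assumed for this note (it denotes the mean of $\log y_t$), and the second line is how the adjustment is usually applied in practice, with $s^2$ the usual unbiased estimator of the error variance:

$$\log y_t \sim N(\mu_t, \sigma^2) \quad \Rightarrow \quad E[y_t] = \exp\left(\mu_t + \tfrac{1}{2}\sigma^2\right),$$

$$y_t^* = \exp\left([\log y_t]^* + \tfrac{1}{2}s^2\right).$$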

Today, I received a query from a blog-reader who asked how the results in that post would change if the dependent variable were the square root of y, but we wanted to forecast y itself. I’m not sure why this particular transformation was of interest, but let’s take a look at the question.

In this case we can exploit the relationship between a (standard) Normal distribution and a Chi-Square distribution in order to answer the question.

I’m going to “borrow” heavily from my earlier post.

Suppose that we’re working with a regression model of the form

$$\sqrt{y_t} = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + \varepsilon_t ; \quad t = 1, 2, \dots, n.$$

The explanatory variables are assumed to be non-random, which rules out models with lagged values of the dependent variable appearing as regressors. Let’s also assume that the errors have a zero mean, and are serially independent and homoskedastic (with a variance of $\sigma^2$). Crucially, we’ll also assume that they are Normally distributed.

Once we estimate the model, we have the “fitted” values for the dependent variable. That is, we have values of $[\sqrt{y_t}]^* = b_1 + b_2 x_{2t} + \dots + b_k x_{kt}$, where $b_i$ is the estimated value of $\beta_i$ ($i = 1, 2, \dots, k$). These estimates might be obtained by OLS, but that’s not crucial to the following argument.

Now, we’re likely to be more interested in fitted values expressed in terms of the original data, that is, in terms of $y$ itself rather than $\sqrt{y}$.

You might think that all we have to do is to apply the inverse of the square root transformation, and the fitted values of interest will be $y_t^* = \{[\sqrt{y_t}]^*\}^2$. Unfortunately, just squaring the original forecasts will result in an unnecessary distortion.

Let’s see why.

If $\varepsilon_t$ is Normally distributed, then $\sqrt{y_t}$ is also Normally distributed, with a mean of $(\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt})$, and a variance of $\sigma^2$. Another way of describing $\sqrt{y_t}$ is that it is $\sigma Z + (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt})$, where $Z$ is a Standard Normal variate.

Now consider $y_t$ itself. We get this by squaring the $\sqrt{y_t}$ random variable, so we can write:

$$y_t = \sigma^2 Z^2 + (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt})^2 + 2\sigma Z(\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt}).$$

Taking expectations,

$$E[y_t] = \sigma^2 E[Z^2] + (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt})^2,$$

because the mean of Z is zero.

Recalling that $Z^2$ is a Chi-Square variate with one degree of freedom, and that the mean of a Chi-Square variate is its degrees of freedom, we immediately see that

$$E[y_t] = (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt})^2 + \sigma^2.$$

So, if we want to obtain forecasts for $y$, we should square the forecasts of $\sqrt{y}$, but then we need to add $\sigma^2$.
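
A quick simulation can make this result concrete. The following is a minimal Python sketch, not the R script referred to later in the post: the values of `mu` (standing in for the linear predictor) and `sigma` are hypothetical, and the function name is invented for illustration.

```python
import random
import statistics


def simulate_mean_of_y(mu, sigma, n_draws=200_000, seed=12345):
    """Average of y = (mu + sigma * Z)^2 over simulated Standard Normal draws Z."""
    rng = random.Random(seed)
    draws = ((mu + sigma * rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_draws))
    return statistics.fmean(draws)


# Hypothetical values for the linear predictor and the error standard deviation.
mu, sigma = 3.0, 1.5

simulated = simulate_mean_of_y(mu, sigma)
naive = mu ** 2                   # just squaring the forecast of sqrt(y)
adjusted = mu ** 2 + sigma ** 2   # squaring, then adding sigma^2

print(f"simulated E[y]: {simulated:.3f}")
print(f"naive forecast: {naive:.3f}")
print(f"adjusted forecast: {adjusted:.3f}")
```

The simulated mean should sit close to the adjusted value (11.25 here) and well above the naive value (9.0).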

Of course, $\sigma^2$ is unobservable but we can replace it with its unbiased estimator, $s^2$. The latter is the sum of squared residuals (from the original regression, with $\sqrt{y_t}$ as the dependent variable), divided by the degrees of freedom, $(n - k)$.
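
As an illustration of the mechanics, here is a minimal Python sketch for the simplest case of one regressor plus an intercept (so k = 2), using synthetic data. The function names, the data-generating values and the seed are all assumptions made for this example; it is not the model or the script used in the post.

```python
import math
import random


def fit_sqrt_regression(x, y):
    """OLS of sqrt(y) on a constant and x. Returns (b1, b2, s2)."""
    n, k = len(x), 2
    w = [math.sqrt(v) for v in y]          # transformed dependent variable
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    b2 = sxw / sxx
    b1 = w_bar - b2 * x_bar
    residuals = [wi - (b1 + b2 * xi) for xi, wi in zip(x, w)]
    s2 = sum(e * e for e in residuals) / (n - k)   # unbiased estimator of sigma^2
    return b1, b2, s2


def forecast_y(x_new, b1, b2, s2):
    """Return (naive, adjusted) forecasts of y at x_new."""
    naive = (b1 + b2 * x_new) ** 2         # squared forecast of sqrt(y)
    return naive, naive + s2               # adjusted forecast adds s^2


# Synthetic data (assumed): sqrt(y) = 5 + 0.5 x + error, with sigma = 1.
# The intercept is large enough that sqrt(y) is essentially never negative.
rng = random.Random(2019)
x = [i / 10 for i in range(1, 201)]
y = [(5.0 + 0.5 * xi + rng.gauss(0.0, 1.0)) ** 2 for xi in x]

b1, b2, s2 = fit_sqrt_regression(x, y)
naive, adjusted = forecast_y(10.0, b1, b2, s2)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, s2 = {s2:.3f}")
print(f"naive = {naive:.3f}, adjusted = {adjusted:.3f}")
```

The two forecasts differ by exactly `s2`, whatever value of `x_new` is used.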

Failure to add this extra term will result in point forecasts that are distorted in a downwards direction. Of course, the magnitude of this distortion will depend on both the scale of measurement for our y data, and the signal-to-noise ratio in our regression model.
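
To put a rough size on this, treat the parameters as known (ignoring estimation error) and write $\mu_t$ for the linear predictor $\beta_1 + \beta_2 x_{2t} + \dots + \beta_k x_{kt}$; the symbol $\mu_t$ is shorthand introduced here. The naive forecast is $\mu_t^2$, so its shortfall is $\sigma^2$ in the units of $y$, and as a proportion of the true mean it is

$$\frac{E[y_t] - \mu_t^2}{E[y_t]} = \frac{\sigma^2}{\mu_t^2 + \sigma^2} = \frac{1}{1 + (\mu_t / \sigma)^2}.$$

The absolute shortfall therefore changes with the units in which $y$ is measured, while the proportional shortfall depends only on the ratio $\mu_t / \sigma$ and shrinks as that ratio grows.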

In my earlier post relating to this issue in the context of log-linear regressions, I illustrated the results using both EViews and R. Specifically, I looked at a crude model for the “Airplane Passengers” (AP) time-series, based on the analysis of Cowpertwait and Metcalfe (2009, pp. 109-118). Here, I’ll just use R to illustrate the results of the current post. The R script is available on this blog’s code page, and it can be opened with any text editor. The square root of AP is regressed against a quadratic time trend and various sine and cosine terms of the form $\sin(2\pi i t)$ and $\cos(2\pi i t)$, for $i = 1, 2, \dots, 5$.
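
For readers who want to see what such a set of regressors looks like, here is a small Python sketch that builds one row of the design matrix per observation. Several things are assumptions rather than details taken from the R script: that an intercept is included, that t is measured in years (so monthly data step by 1/12), that the sample is 144 months starting in 1949, and the ordering of the columns. The name `design_row` is invented for this example.

```python
import math


def design_row(t, n_harmonics=5):
    """Regressors at time t: constant, t, t^2, then sin and cos of 2*pi*i*t."""
    row = [1.0, t, t ** 2]
    for i in range(1, n_harmonics + 1):
        row.append(math.sin(2 * math.pi * i * t))
        row.append(math.cos(2 * math.pi * i * t))
    return row


# Assumed time index: 144 monthly observations, with t in years.
times = [1949 + m / 12 for m in range(144)]
X = [design_row(t) for t in times]

print(len(X), len(X[0]))   # number of observations, number of regressors
```

Under these assumptions there are k = 13 regressors: the constant, two trend terms and ten trigonometric terms.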

The time series “APF” is the series of naive within-sample predictions, obtained by simply squaring the fitted values for $\sqrt{\mathrm{AP}}$. The time series “APFAD” incorporates the adjustment term discussed above. In this particular case, $s = 0.5098$ (so $s^2 \approx 0.26$), and there’s not a huge difference between APF and APFAD.


However, the sum of the squared (within-sample) prediction errors, based on AP and APF is 42318.42, while that based on AP and APFAD is 42310.45. So, there’s a bit of improvement in overall forecast performance when we take the adjustment term into account.
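
The size of that improvement can be checked by hand. The following is a sketch that assumes the coefficients were estimated by OLS, with $n = 144$ monthly observations and $k = 13$ coefficients (an intercept, two trend terms and ten trigonometric terms); neither count is stated explicitly above. The symbols $\hat{w}_t$ and $e_t$ are notation introduced here for the fitted values and residuals of the square-root regression, so that $\sqrt{y_t} = \hat{w}_t + e_t$.

$$y_t - \hat{w}_t^2 = 2\hat{w}_t e_t + e_t^2, \qquad \sum_t \hat{w}_t e_t = 0, \qquad \sum_t e_t^2 = (n - k)s^2,$$

$$\sum_t (y_t - \hat{w}_t^2 - s^2)^2 = \sum_t (y_t - \hat{w}_t^2)^2 - 2(n - k)s^4 + n s^4 = \sum_t (y_t - \hat{w}_t^2)^2 - (n - 2k)s^4.$$

With those values, $(n - 2k)s^4 = 118 \times 0.5098^4 \approx 7.97$, which agrees with the reported difference $42318.42 - 42310.45 = 7.97$.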

A final comment is in order. We’ve been able to see how to “back out” appropriate forecasts of $y$ when the original regression has a dependent variable that is either $\log(y)$ or $\sqrt{y}$. However, we were able to do this only because, when the inverses of these particular transformations are applied to a Normal random variable, the resulting new random variable has a known distribution. This little “trick” is not going to work (at least, not easily) for arbitrary non-linear transformations of $y$.


Cowpertwait, P. S. P. and A. V. Metcalfe, 2009. Introductory Time Series With R. Springer, Dordrecht.

© 2019, David E. Giles
