If εt is Normally distributed, then √(yt) is also Normally distributed, with a mean of (β1 + β2x2t + β3x3t + ……+ βkxkt), and a variance of σ2. Another way of describing √(yt) is that it is σZ + (β1 + β2x2t + β3x3t + ……+ βkxkt), where Z is a Standard Normal variate.
Now consider yt itself. We get this by squaring the√(yt) random variable, so we can write:
yt = σ2Z2 + (β1 + β2x2t + β3x3t + ……+ βkxkt)2 + 2σZ(β1 + β2x2t + β3x3t + ……+ βkxkt).
E[yt] = σ2 E[Z2] + (β1 + β2x2t + β3x3t + ……+ βkxkt)2,
because the mean of Z is zero.
Recalling that Z2 is a Chi-Square variate with one degree of freedom, and that the mean of a Chi-Square variate is its degrees of freedom, we immediately see that
E[yt] = (β1 + β2x2t + β3x3t + ……+ βkxkt)2 + σ2.
So, if we want to obtain forecasts for y, we should square the forecasts of √(y), but then we need to add σ2.
Of course, σ2 is unobservable but we can replace it with its unbiased estimator, s2. The latter is the sum of squared residuals (from the original regression, with √(yt) as the dependent variable), divided by the degrees of freedom, (n – k).
Failure to add this extra term will result in point forecasts that are distorted in a downwards direction. Of course, the magnitude of this distortion will depend on both the scale of measurement for our y data, and the signal-to-noise ratio in our regression model.
In my earlier post relating to this issue in the context of log-linear regressions, I illustrated the results using both EViews and R. Specifically, I looked at a crude model for the “Airplane Passengers” (AP) time-series, based on the analysis of Cowperwait and Metcalf (2009, pp. 109-118). Here, I’ll just use R to illustrate the results of the current post. The R script is available on this blog’s code page, and it can be opened with any text editor. The square root of AP is regressed against a quadratic time trend and various Sine and Cosine terms of the form SIN(2πit) and COS(2πit); i = 1, 2, …, 5:
The time series “APF” is the series of naive within-sample predictions, obtained by simply squaring the fitted values for √(AP). The time-series “APFAD” incorporates the adjustment term discussed above. In this particular case, s = 0.5098, so there’s not a huge difference between APF and APFAD:
However, the sum of the squared (within-sample) prediction errors, based on AP and APF is 42318.42, while that based on AP and APFAD is 42310.45. So, there’s a bit of improvement in overall forecast performance when we take the adjustment term into account.
A final comment is in order. Although we’ve been able to see how to “back out” appropriate forecasts of y when the original regression has a dependent variable that is either log(y) or √(y). We were able to do this only because when the inverses of these particular transformations are applied to a Normal random variable, the resulting new random variable has a known distribution. This little “trick” is not going to work – at least, not easily – for arbitrary non-linear transformations of y.