# Forecasting From a Regression with a Square Root Dependent Variable

[This article was first published on **Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers.]


In an earlier post I discussed forecasting from a log-linear regression. The key point there was that you *don't* just report the exponentials of the original forecasts. You need to add an adjustment that takes account of the connection between a Normal random variable and a log-Normal random variable, and the relationship between their means.

The question that prompted this post concerned a regression whose dependent variable is the *square root* of y, when what we want to forecast is y itself. I'm not sure why this particular transformation was of interest, but let's take a look at the question.

Suppose that we’re working with a regression model of the form

$$\sqrt{y_t} = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + \varepsilon_t \, ; \quad t = 1, 2, \dots, n.$$

We'll assume that the errors are independently and identically distributed, with a zero mean and a variance of $\sigma^2$. Crucially, we'll also assume that they are *Normally distributed*.

When we estimate the model, the fitted values for the dependent variable are

$$[\sqrt{y_t}]^* = b_1 + b_2 x_{2t} + \cdots + b_k x_{kt},$$

where $b_i$ is the estimated value of $\beta_i$ $(i = 1, 2, \dots, k)$. These estimates might be obtained by OLS, but that's not crucial to the following argument.

Now, we're likely to be more interested in fitted values expressed in terms of the *original data* – that is, in terms of $y$ itself, rather than $\sqrt{y}$.

An obvious way to proceed is to construct the forecasts $y_t^* = \{[\sqrt{y_t}]^*\}^2$. Unfortunately, just squaring the original forecasts will result in an unnecessary distortion.

Notice that if $\varepsilon_t$ is Normally distributed, then $\sqrt{y_t}$ is also Normally distributed, with a mean of $(\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt})$, and a variance of $\sigma^2$. Another way of describing $\sqrt{y_t}$ is that it is $\sigma Z + (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt})$, where $Z$ is a Standard Normal variate.

However, what we want to forecast is $y_t$ itself. We get this by squaring the $\sqrt{y_t}$ random variable, so we can write:

$$y_t = \sigma^2 Z^2 + (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt})^2 + 2\sigma Z(\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt}).$$

Taking expectations, and noting that $E[Z] = 0$ (so the cross-product term vanishes), we have:

$$E[y_t] = \sigma^2 E[Z^2] + (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt})^2,$$

Recalling that $Z^2$ is a Chi-Square variate with one degree of freedom, and that the mean of a Chi-Square variate is its degrees of freedom, we immediately see that

$$E[y_t] = (\beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt})^2 + \sigma^2.$$

So, to forecast $y_t$ we should square the fitted value for $\sqrt{y_t}$, and then add $\sigma^2$. Of course, $\sigma^2$ is unobservable, but we can replace it with its unbiased estimator, $s^2$. The latter is the sum of squared residuals (*from the original regression, with* $\sqrt{y_t}$ *as the dependent variable*), divided by the degrees of freedom, $(n - k)$.

Failing to make this adjustment will distort the forecasts of y in a *downwards* direction. Of course, the magnitude of this distortion will depend on both the scale of measurement for our y data, and the signal-to-noise ratio in our regression model.
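To see the size of the distortion, here is a small simulation sketch. The data-generating process is hypothetical, chosen only for illustration: with a large sample, the naively squared fitted values under-predict y by roughly $\sigma^2$, and adding $s^2$ removes that bias.

```r
# Simulation sketch: the DGP below is hypothetical, for illustration only.
set.seed(123)
n     <- 100000
sigma <- 0.3
x     <- runif(n, 0, 4)
sqrt_y <- 2 + 0.5 * x + rnorm(n, sd = sigma)  # sqrt(y) follows the linear model
y      <- sqrt_y^2
fit <- lm(sqrt_y ~ x)
s2  <- sum(resid(fit)^2) / fit$df.residual    # unbiased estimator of sigma^2
mean(y - fitted(fit)^2)          # roughly sigma^2 = 0.09: naive forecasts are too low
mean(y - (fitted(fit)^2 + s2))   # roughly 0: the adjustment removes the bias
```

Note that the average shortfall of the naive forecasts matches $\sigma^2$, exactly as the derivation above predicts.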

In my earlier post relating to this issue in the context of log-linear regressions, I illustrated the results using both EViews and R. Specifically, I looked at a crude model for the "Airline Passengers" (AP) time-series, based on the analysis of Cowpertwait and Metcalfe (2009, pp. 109-118). Here, I'll just use R to illustrate the results of the current post. The R script is available on this blog's **code page**, and it can be opened with any text editor. The square root of AP is regressed against a quadratic time trend and various Sine and Cosine terms of the form SIN(2πit) and COS(2πit); i = 1, 2, …, 5:

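I won't reproduce the blog's script here, but a sketch along the lines just described, using R's built-in `AirPassengers` series, might look as follows. The exact trend and harmonic specification are my assumptions, not the blog's actual script:

```r
# Sketch of the described regression, using R's built-in AirPassengers data.
# The exact specification is an assumption, not the blog's actual script.
AP <- AirPassengers
t  <- as.numeric(time(AP))           # time index in fractional years
SIN <- COS <- matrix(NA_real_, nrow = length(t), ncol = 5)
for (i in 1:5) {                     # harmonic terms SIN(2*pi*i*t), COS(2*pi*i*t)
  SIN[, i] <- sin(2 * pi * i * t)
  COS[, i] <- cos(2 * pi * i * t)
}
fit <- lm(sqrt(AP) ~ t + I(t^2) + SIN + COS)  # quadratic trend + harmonics
s2  <- sum(resid(fit)^2) / fit$df.residual    # unbiased estimator of sigma^2
AP_naive <- fitted(fit)^2        # naive fitted values: biased downwards
AP_adj   <- fitted(fit)^2 + s2   # bias-adjusted fitted values for AP itself
```

By construction, every adjusted fitted value exceeds its naive counterpart by exactly $s^2$.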

**Reference**

Cowpertwait, P. S. P. and A. V. Metcalfe, 2009. *Introductory Time Series with R*. Springer, Dordrecht.
