More on logging the outcome

[This article was first published on Matt's Stats n stuff » R, and kindly contributed to R-bloggers.]

This one does my head in. I do it fairly regularly, as lots of people do, but every time it comes to interpreting the results I have to slow right down and go step by step.

Answer: When you log the outcome, then on the original scale, with all else held constant, Y will be exp(b1*d) times higher. Or,

Y, your outcome on the original scale, will change by (exp(b1*d)-1)*100 % for a d unit change in x1.

Where b1 is the coefficient for the variable of interest and d is the difference in that variable you are interested in.


Here’s how we got there:

We have

Y = a + b1x1 + b2x2 + e

right, simple. Then with logging the outcome we have,

log(Y) = a + b1x1 + b2x2 + e

right, our model is still linear.
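
As a minimal sketch of what fitting such a model might look like in R, assuming simulated data (the sample size, variable names and true coefficients below are made up for illustration and do not come from any real data set), log(y) can be handed straight to lm():

```r
# Simulated data; every value here is an illustrative assumption.
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
e  <- rnorm(n, sd = 0.1)

# Outcome generated so that log(y) is linear in x1 and x2.
y <- exp(1 + 0.68 * x1 - 0.3 * x2 + e)

# Log the outcome inside the formula; this is still an ordinary linear model.
fit <- lm(log(y) ~ x1 + x2)

# Estimated coefficient for x1, on the log scale.
b1_hat <- coef(fit)[["x1"]]
b1_hat
```

The coefficient this returns is on the log scale, which is exactly why the interpretation needs the steps that follow.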

So how do we interpret the b1 coefficient on the original scale of Y? Take the exponential of both sides (assuming we used a natural log). Then we have,

Y = exp(a + b1x1 + b2x2 + e)

which with our log/exponent rules is the equivalent of

Y = exp(a)*exp(b1x1)*exp(b2x2)*exp(e)
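
If that rule feels a bit too convenient, a quick numerical check may help. The values of a, b1, b2, x1, x2 and e below are arbitrary assumptions, chosen only to exercise the identity:

```r
# Arbitrary illustrative values (assumptions, not from a fitted model).
a  <- 1.5
b1 <- 0.68
b2 <- -0.3
x1 <- 0.2
x2 <- 4
e  <- 0.05

# Exponential of the sum versus product of the exponentials.
lhs <- exp(a + b1 * x1 + b2 * x2 + e)
rhs <- exp(a) * exp(b1 * x1) * exp(b2 * x2) * exp(e)

# The two agree up to floating point tolerance.
isTRUE(all.equal(lhs, rhs))
```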

If we home in on variable 1, the one of interest, then we fix the levels of the other variables ('all else constant'). So we reduce to,

Y = c*exp(b1x1)

where c is a constant that rolls up everything we are holding fixed: exp(a)*exp(b2x2)*exp(e).
For an increase in x1 of 10, say, we have

Y = c*exp(b1(x1+10))

which is

Y = c*exp(b1x1+b1*10)

right, and as before this becomes

Y = c*exp(b1x1)*exp(b1*10)

Now we’re getting somewhere! Comparing x1 with x1+10 (10 units higher) means comparing c*exp(b1x1) with c*exp(b1x1)*exp(b1*10). So what is the difference? Well, the latter is the former multiplied by exp(b1*10)! It goes without saying that this holds regardless of what difference you are looking at, be it 10 or 20 or 100.

So that leaves us where? With still a bit more to go. b1 is constant and fixed, right, and for the sake of a comparison so is x1, so let’s roll that term in with our constant, and let’s not look at 10 but look at d for difference.

For our base,

Y = c’

where c’ is c*exp(b1x1), and our change is to

Y = c’*exp(b1*d)

Regardless of our choice of x1, if it changes by d (our difference) the outcome on the original scale will be exp(b1*d) times higher. Done.
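
A small sketch to convince yourself of that: the ratio of the two outcomes depends only on b1 and d, not on the starting x1 or on the constant. The grid of x1 values, the constants and the choice of d below are arbitrary assumptions:

```r
b1 <- 0.68
d  <- 10

# Several starting points for x1 and several "everything else" constants.
grid <- expand.grid(x1 = c(-2, 0, 0.2, 5), c = c(0.5, 8, 100))

# Outcome at the base value of x1, and after x1 has moved by d.
y_base    <- grid$c * exp(b1 * grid$x1)
y_changed <- grid$c * exp(b1 * (grid$x1 + d))

# Every ratio should be the same multiplier, exp(b1 * d).
ratios <- y_changed / y_base
all.equal(ratios, rep(exp(b1 * d), nrow(grid)))
```

Swapping d for 20 or 100 changes the multiplier but not the fact that it is the same for every row.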

An example with numbers:

b1 <- 0.68
x1 <- 0.2
d <- 0.1
y1 <- 8 * exp(b1 * x1)
y2 <- 8 * exp(b1 * (x1+d))
> y1
[1] 9.165455
> y2
[1] 9.810385
> y2 - y1
[1] 0.6449301
> (y2 - y1)/y1
[1] 0.07036531
> exp(b1*d)-1
[1] 0.07036531

In this example our c (the constant with everything else rolled up) is 8 (arbitrarily chosen). The last two results show that y2 is 7% (100*0.070) higher than y1, and that this matches exp(b1*d)-1.

And there lies the key!

Y, your outcome on the original scale, will change by (exp(b1*d)-1)*100 % for a d unit change in x1.

When b1 is positive, it is an increase; when b1 is negative, it is a decrease.
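
To save redoing this by hand each time, the formula could be wrapped in a small helper. This is only a sketch: pct_change is a made-up name, not a function from any package, and it assumes the outcome was logged with the natural log.

```r
# Percent change in Y (original scale) for a d-unit change in x1,
# given the coefficient b1 from a model with log(Y) as the outcome.
# Assumes a natural log was used.
pct_change <- function(b1, d = 1) {
  (exp(b1 * d) - 1) * 100
}

pct_change(0.68, 0.1)    # positive coefficient: an increase
pct_change(-0.68, 0.1)   # negative coefficient: a decrease
```

Note that the increase and the decrease are not mirror images of each other, because the change is multiplicative rather than additive.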

per <- (y2-y1)/y1
> y1*(1+per)
[1] 9.810385
> y2
[1] 9.810385
