LogRatio Regression – A Simple Way to Model Compositional Data

Posted on April 15, 2018 by statcompute in R bloggers | 0 Comments

[This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The compositional data are proportionals of mutually exclusive groups that would be summed up to the unity. Statistical models for compositional data have been applicable in a number of areas, e.g. the product or channel mix in the marketing research and asset allocations of a investment portfolio.

In the example below, I will show how to model compositional outcomes with a simple LogRatio regression. The underlying idea is very simple. With the D-dimension outcome [p_1, p_2…p_D], we can derive a [D-1]-dimension outcome [log(p_2 / p_1)…log(p_D / p_1)] and then estimate a multivariate regression based on the new outcome.

df = get("ArcticLake", envir = asNamespace('DirichletReg'))

#   sand  silt  clay depth
#1 0.775 0.195 0.030  10.4
#2 0.719 0.249 0.032  11.7
#3 0.507 0.361 0.132  12.8

lm(cbind(log(silt / sand), log(clay / sand)) ~ depth, data = df)

#Response log(silt/sand):
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.649656   0.236733  -2.744   0.0093 **
#depth        0.037522   0.004269   8.790 1.36e-10 ***
#
#Response log(clay/sand) :
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -2.614897   0.421383  -6.206 3.31e-07 ***
#depth        0.062181   0.007598   8.184 8.00e-10 ***

Since log(x / y) = log(x) – log(y), we can also estimate the model with log(sand) as an offset term.


lm(cbind(log(silt), log(clay)) ~ depth + offset(log(sand)), data = df)

#Response log(silt) :
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.649656   0.236733  -2.744   0.0093 **
#depth        0.037522   0.004269   8.790 1.36e-10 ***
#
#Response log(clay) :
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -2.614897   0.421383  -6.206 3.31e-07 ***
#depth        0.062181   0.007598   8.184 8.00e-10 ***

Alternatively, we can also use the comp.reg function in the Compositional package.


Compositional::comp.reg(as.matrix(df[, 1:3]), df[, 4])

#$be
#                   [,1]        [,2]
#(Intercept) -0.64965598 -2.61489731
#x            0.03752186  0.06218069
#
#$seb
#                   [,1]        [,2]
#(Intercept) 0.236733203 0.421382652
#x           0.004268588 0.007598043

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

LogRatio Regression – A Simple Way to Model Compositional Data

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)