Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The compositional data are proportionals of mutually exclusive groups that would be summed up to the unity. Statistical models for compositional data have been applicable in a number of areas, e.g. the product or channel mix in the marketing research and asset allocations of a investment portfolio.

In the example below, I will show how to model compositional outcomes with a simple LogRatio regression. The underlying idea is very simple. With the D-dimension outcome [p_1, p_2…p_D], we can derive a [D-1]-dimension outcome [log(p_2 / p_1)…log(p_D / p_1)] and then estimate a multivariate regression based on the new outcome.

df = get("ArcticLake", envir = asNamespace('DirichletReg'))

#   sand  silt  clay depth
#1 0.775 0.195 0.030  10.4
#2 0.719 0.249 0.032  11.7
#3 0.507 0.361 0.132  12.8

lm(cbind(log(silt / sand), log(clay / sand)) ~ depth, data = df)

#Response log(silt/sand):
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.649656   0.236733  -2.744   0.0093 **
#depth        0.037522   0.004269   8.790 1.36e-10 ***
#
#Response log(clay/sand) :
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -2.614897   0.421383  -6.206 3.31e-07 ***
#depth        0.062181   0.007598   8.184 8.00e-10 ***



Since log(x / y) = log(x) – log(y), we can also estimate the model with log(sand) as an offset term.

lm(cbind(log(silt), log(clay)) ~ depth + offset(log(sand)), data = df)

#Response log(silt) :
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.649656   0.236733  -2.744   0.0093 **
#depth        0.037522   0.004269   8.790 1.36e-10 ***
#
#Response log(clay) :
#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
#(Intercept) -2.614897   0.421383  -6.206 3.31e-07 ***
#depth        0.062181   0.007598   8.184 8.00e-10 ***



Alternatively, we can also use the comp.reg function in the Compositional package.

Compositional::comp.reg(as.matrix(df[, 1:3]), df[, 4])

#$be # [,1] [,2] #(Intercept) -0.64965598 -2.61489731 #x 0.03752186 0.06218069 # #$seb
#                   [,1]        [,2]
#(Intercept) 0.236733203 0.421382652
#x           0.004268588 0.007598043