[This article was first published on Strenge Jacke! » R, and kindly contributed to R-bloggers].

# Beautiful HTML tables of linear models

In this blog post I’d like to show some (old and) new features of the sjt.lm function from my sjPlot package. These features are currently only available in the development snapshot on GitHub; a package update is planned to be submitted to CRAN soon.

I have added two major new features to this function: comparing models with different predictors (e.g. from stepwise regression) and automatic grouping of categorical predictors. The examples below demonstrate both features.

The sjt.lm function prints results and summaries of linear models as HTML tables. These tables can be viewed in the RStudio viewer pane or a web browser, or easily exported to office applications. See also my former posts on the table-printing functions of my package here and here.
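As a minimal sketch of the export workflow (assuming sjt.lm's `file` argument for saving the generated HTML; the file name here is just an example), with two models fitted as shown below:

```r
# Sketch: write the table to an HTML file instead of (only)
# displaying it in the viewer. The resulting file can be opened
# in a browser or imported into office applications.
sjt.lm(fit1, fit2, file = "model_table.html")
```
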

Please note: the following tables may look a bit cluttered – this is because I pasted the HTML code created by knitr directly into this blog post, so style sheets may interfere. The original online manual for this function can be found here.

All following tables can be reproduced with the sjPlot package and the sample data set from this package.

## Linear model summaries as HTML table

The sjt.lm function prints summaries of linear models (fitted with the lm function) as nicely formatted HTML tables.

Before starting, sample data is loaded and sample models are fitted:

```r
# load sample data
data(efc)
# set variable labels
efc <- set_var_labels(efc, get_var_labels(efc))
# fit first model
fit1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + c172code, data = efc)
# Note that both models share the same predictors and only differ
# in their dependent variable. See examples of stepwise models below.
```

The simplest way of producing the table output is by passing the fitted models as parameters. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are named Model 1 and Model 2.

```r
sjt.lm(fit1, fit2)
```

|  | Model 1 |  |  | Model 2 |  |  |
|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p |
| (Intercept) | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001 |
| carer’s age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206 |
| average number of hours of care for the elder in a week | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001 |
| carer’s gender | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061 |
| carer’s level of education: recoding of variable c172edu1 | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034 |
| Observations | 821 |  |  | 832 |  |  |
| R² / adj. R² | 0.270 / 0.266 |  |  | 0.079 / 0.075 |  |  |

### Custom labels

You can specify the model labels via the labelDependentVariables parameter:

```r
sjt.lm(fit1, fit2,
       labelDependentVariables = c("Barthel-Index",
                                   "Negative Impact"))
```

|  | Barthel-Index |  |  | Negative Impact |  |  |
|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p |
| (Intercept) | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001 |
| carer’s age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206 |
| average number of hours of care for the elder in a week | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001 |
| carer’s gender | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061 |
| carer’s level of education: recoding of variable c172edu1 | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034 |
| Observations | 821 |  |  | 832 |  |  |
| R² / adj. R² | 0.270 / 0.266 |  |  | 0.079 / 0.075 |  |  |

### More custom labels

Here is an example of how to change the other labels. Note that showHeaderStrings = TRUE makes the two labels on top and in the top left corner appear in the table.

```r
sjt.lm(fit1, fit2,
       showHeaderStrings = TRUE,
       stringB = "Estimate",
       stringCI = "Conf. Int.",
       stringP = "p-value",
       stringDependentVariables = "Response",
       stringPredictors = "Coefficients",
       stringIntercept = "Konstante",
       labelDependentVariables = c("Barthel-Index",
                                   "Negative Impact"))
```

|  | Response |  |  |  |  |  |
|---|---|---|---|---|---|---|
| **Coefficients** | Barthel-Index |  |  | Negative Impact |  |  |
|  | Estimate | Conf. Int. | p-value | Estimate | Conf. Int. | p-value |
| Konstante | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001 |
| carer’s age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206 |
| average number of hours of care for the elder in a week | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001 |
| carer’s gender | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061 |
| carer’s level of education: recoding of variable c172edu1 | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034 |
| Observations | 821 |  |  | 832 |  |  |
| R² / adj. R² | 0.270 / 0.266 |  |  | 0.079 / 0.075 |  |  |

## Changing summary style and content

You can change the table style with specific parameters, e.g. to include the CI in the same table cell as the estimates, print asterisks instead of numeric p-values, and so on.

```r
sjt.lm(fit1, fit2,
       separateConfColumn = FALSE, # CI in same cell as estimates
       showStdBeta = TRUE,         # also show standardized beta values
       pvaluesAsNumbers = FALSE)   # "*" instead of numeric values
```

|  | Model 1 |  | Model 2 |  |
|---|---|---|---|---|
|  | B (CI) | std. Beta (CI) | B (CI) | std. Beta (CI) |
| (Intercept) | 90.06 (77.95 – 102.18) *** |  | 8.46 (6.67 – 10.24) *** |  |
| carer’s age | -0.22 (-0.36 – -0.08) ** | -0.10 (-0.16 – -0.04) | 0.01 (-0.01 – 0.03) | 0.05 (-0.03 – 0.12) |
| average number of hours of care for the elder in a week | -0.28 (-0.31 – -0.24) *** | -0.48 (-0.54 – -0.41) | 0.02 (0.01 – 0.02) *** | 0.25 (0.18 – 0.32) |
| carer’s gender | -0.26 (-4.36 – 3.83) | -0.00 (-0.06 – 0.06) | 0.57 (-0.03 – 1.17) | 0.06 (-0.00 – 0.13) |
| carer’s level of education: recoding of variable c172edu1 | -0.76 (-3.55 – 2.02) | -0.02 (-0.08 – 0.04) | 0.44 (0.03 – 0.86) * | 0.07 (0.01 – 0.14) |
| Observations | 821 |  | 832 |  |
| R² / adj. R² | 0.270 / 0.266 |  | 0.079 / 0.075 |  |
| Notes | * p<0.05   ** p<0.01   *** p<0.001 |  |  |  |

## Custom variable labels

In the above example, the original variable labels are long and not very pretty. You can change variable labels either with set_var_labels (see this page for more details), which will affect all future plots and tables, or pass your own labels via labelPredictors.

```r
sjt.lm(fit1, fit2,
       labelPredictors = c("Carer's Age",
                           "Hours of Care",
                           "Carer's Sex",
                           "Educational Status"))
```

|  | Model 1 |  |  | Model 2 |  |  |
|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p |
| (Intercept) | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001 |
| Carer’s Age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206 |
| Hours of Care | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001 |
| Carer’s Sex | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061 |
| Educational Status | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034 |
| Observations | 821 |  |  | 832 |  |  |
| R² / adj. R² | 0.270 / 0.266 |  |  | 0.079 / 0.075 |  |  |

## Compare models with different predictors

In some cases, for instance with stepwise regressions, you have different predictors for the same response. The proper matching of predictors, i.e. rows, across models is done automatically.

First, let’s fit some example models.

```r
# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data = efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data = efc)
# fit third model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data = efc)
```

Note that when printing tables of fitted models with different predictors, variable labels are not detected automatically (maybe this will be implemented in a future package version).

```r
sjt.lm(fit1, fit2, fit3,
       separateConfColumn = FALSE,
       showAIC = TRUE,
       showFStat = TRUE)
```

|  | Model 1 |  | Model 2 |  | Model 3 |  |
|---|---|---|---|---|---|---|
|  | B (CI) | p | B (CI) | p | B (CI) | p |
| (Intercept) | 7.82 (6.00 – 9.65) | < 0.001 | 8.46 (6.67 – 10.24) | < 0.001 | 6.23 (4.76 – 7.69) | < 0.001 |
| c160age | 0.04 (0.02 – 0.06) | < 0.001 | 0.01 (-0.01 – 0.03) | 0.206 | 0.01 (-0.01 – 0.03) | 0.271 |
| c172code | 0.39 (-0.03 – 0.81) | 0.071 | 0.44 (0.03 – 0.86) | 0.034 | 0.24 (-0.15 – 0.64) | 0.230 |
| c161sex | 0.69 (0.07 – 1.31) | 0.028 | 0.57 (-0.03 – 1.17) | 0.061 |  |  |
| c12hour |  |  | 0.02 (0.01 – 0.02) | < 0.001 |  |  |
| e42dep |  |  |  |  | 1.50 (1.23 – 1.77) | < 0.001 |
| tot_sc_e |  |  |  |  | 0.21 (0.01 – 0.41) | 0.038 |
| Observations | 832 |  | 832 |  | 833 |  |
| R² / adj. R² | 0.025 / 0.022 |  | 0.079 / 0.075 |  | 0.153 / 0.148 |  |
| F-statistics | 7.107*** |  | 17.730*** |  | 37.250*** |  |
| AIC | 4611.921 |  | 4566.622 |  | 4502.333 |  |

## Automatic grouping of categorical predictors

In case you have categorical variables with more than two factor levels, the sjt.lm function automatically groups the category levels to give a better overview of predictors in the table.

By default, automatic grouping is activated. To disable this feature, use group.pred = FALSE as parameter.

To demonstrate this feature, we first convert two predictors to factors (which they actually are). To do this, we use the to_fac function, which converts numeric variables into factors but does not remove the variable and value label attributes.

```r
# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make dependency categorical
efc$e42dep <- to_fac(efc$e42dep)
# fit first model again (with c172code and e42dep as factors)
fit1 <- lm(barthtot ~ c160age + c12hour + c172code + c161sex + e42dep, data = efc)
# fit second model again (with c172code and e42dep as factors)
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c172code + c161sex + e42dep, data = efc)
```

Now we can print the table.

```r
sjt.lm(fit1, fit2)
```

|  | Model 1 |  |  | Model 2 |  |  |
|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p |
| (Intercept) | 97.17 | 88.37 – 105.97 | < 0.001 | 7.76 | 5.97 – 9.55 | < 0.001 |
| carer’s age | -0.06 | -0.16 – 0.03 | 0.203 | 0.00 | -0.02 – 0.02 | 0.683 |
| average number of hours of care for the elder in a week | -0.07 | -0.10 – -0.04 | < 0.001 | 0.01 | 0.00 – 0.01 | 0.015 |
| **carer’s level of education: recoding of variable c172edu1** |  |  |  |  |  |  |
| intermediate level of education | 1.50 | -1.60 – 4.60 | 0.343 | 0.13 | -0.50 – 0.76 | 0.689 |
| high level of education | 0.66 | -3.20 – 4.52 | 0.738 | 0.72 | -0.07 – 1.51 | 0.074 |
| carer’s gender | 0.09 | -2.74 – 2.93 | 0.949 | 0.56 | -0.02 – 1.13 | 0.058 |
| **how dependent is the elder? – subjective perception of carer** |  |  |  |  |  |  |
| slightly dependent | -7.85 | -12.86 – -2.83 | 0.002 | 1.11 | 0.09 – 2.13 | 0.033 |
| moderately dependent | -19.49 | -24.42 – -14.57 | < 0.001 | 2.37 | 1.37 – 3.37 | < 0.001 |
| severely dependent | -56.87 | -62.12 – -51.63 | < 0.001 | 3.92 | 2.86 – 4.99 | < 0.001 |
| Observations | 821 |  |  | 832 |  |  |
| R² / adj. R² | 0.653 / 0.650 |  |  | 0.160 / 0.152 |  |  |
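If you prefer the flat layout, the grouping can be switched off again; a minimal sketch:

```r
# Sketch: disable automatic grouping of factor levels with
# group.pred = FALSE; each dummy-coded level is then listed
# as a plain, ungrouped table row.
sjt.lm(fit1, fit2, group.pred = FALSE)
```
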

## Removing estimates from the output

With remove.estimates, specific estimates can be removed from the table output. This may make sense when you have stepwise regression models and only want to compare the varying predictors, but not the controls. remove.estimates accepts either the row indices of the table rows that should be removed, or the coefficients’ names.

When using numeric indices, the estimates’ index numbers follow the same order as coef(fit). Note that currently the intercept cannot be removed from the model output!
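To see which index maps to which estimate, you can inspect the coefficients directly; a quick sketch using base R:

```r
# The numeric indices for remove.estimates follow the order of
# coef(fit). Listing the coefficient names of a fitted model shows
# which index refers to which term: index 1 is the intercept,
# index 2 the first predictor, and so on.
names(coef(fit1))
```

With several models, the position of an estimate in the combined table is determined by the merged row order, so checking the complete table output (as in Example 1 below) is the safest way to pick indices.
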

```r
data(efc)
# attach variable labels to each variable of the data
# frame - useful for automatic label detection
efc <- set_var_labels(efc, get_var_labels(efc))
# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make dependency categorical
efc$e42dep <- to_fac(efc$e42dep)
# make prettier variable labels
efc$c172code <- set_var_labels(efc$c172code, "Education")
efc$e42dep <- set_var_labels(efc$e42dep, "Dependency")
# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data = efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data = efc)
# fit third model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data = efc)
```

### Example 1: Complete table output

Here is the complete table output. It helps you identify the row index numbers: especially when you have multiple models with different predictors, an estimate’s position in a particular model may differ from its position in the combined table output.

```r
sjt.lm(fit1, fit2, fit3)
```

|  | Model 1 |  |  | Model 2 |  |  | Model 3 |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p | B | CI | p |
| (Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001 |
| c160age | 0.04 | 0.02 – 0.06 | < 0.001 | 0.01 | -0.01 – 0.03 | 0.306 | 0.01 | -0.01 – 0.03 | 0.384 |
| **Education** |  |  |  |  |  |  |  |  |  |
| c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806 |
| c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203 |
| c161sex | 0.70 | 0.09 – 1.32 | 0.025 | 0.59 | -0.01 – 1.19 | 0.053 |  |  |  |
| c12hour |  |  |  | 0.02 | 0.01 – 0.02 | < 0.001 |  |  |  |
| **Dependency** |  |  |  |  |  |  |  |  |  |
| e42dep2 |  |  |  |  |  |  | 1.18 | 0.16 – 2.20 | 0.024 |
| e42dep3 |  |  |  |  |  |  | 2.53 | 1.53 – 3.52 | < 0.001 |
| e42dep4 |  |  |  |  |  |  | 4.32 | 3.31 – 5.33 | < 0.001 |
| tot_sc_e |  |  |  |  |  |  | 0.21 | 0.01 – 0.41 | 0.042 |
| Observations | 832 |  |  | 832 |  |  | 833 |  |  |
| R² / adj. R² | 0.026 / 0.021 |  |  | 0.081 / 0.075 |  |  | 0.154 / 0.147 |  |  |

### Example 2: Remove first coefficient (after intercept)

```r
sjt.lm(fit1, fit2, fit3,
       remove.estimates = 2)
```

|  | Model 1 |  |  | Model 2 |  |  | Model 3 |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p | B | CI | p |
| (Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001 |
| **Education** |  |  |  |  |  |  |  |  |  |
| c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806 |
| c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203 |
| c161sex | 0.70 | 0.09 – 1.32 | 0.025 | 0.59 | -0.01 – 1.19 | 0.053 |  |  |  |
| c12hour |  |  |  | 0.02 | 0.01 – 0.02 | < 0.001 |  |  |  |
| **Dependency** |  |  |  |  |  |  |  |  |  |
| e42dep2 |  |  |  |  |  |  | 1.18 | 0.16 – 2.20 | 0.024 |
| e42dep3 |  |  |  |  |  |  | 2.53 | 1.53 – 3.52 | < 0.001 |
| e42dep4 |  |  |  |  |  |  | 4.32 | 3.31 – 5.33 | < 0.001 |
| tot_sc_e |  |  |  |  |  |  | 0.21 | 0.01 – 0.41 | 0.042 |
| Observations | 832 |  |  | 832 |  |  | 833 |  |  |
| R² / adj. R² | 0.026 / 0.021 |  |  | 0.081 / 0.075 |  |  | 0.154 / 0.147 |  |  |

### Example 3: Remove age and sex

```r
sjt.lm(fit1, fit2, fit3,
       remove.estimates = c("c160age", "c161sex"))
```

|  | Model 1 |  |  | Model 2 |  |  | Model 3 |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p | B | CI | p |
| (Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001 |
| **Education** |  |  |  |  |  |  |  |  |  |
| c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806 |
| c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203 |
| c12hour |  |  |  | 0.02 | 0.01 – 0.02 | < 0.001 |  |  |  |
| **Dependency** |  |  |  |  |  |  |  |  |  |
| e42dep2 |  |  |  |  |  |  | 1.18 | 0.16 – 2.20 | 0.024 |
| e42dep3 |  |  |  |  |  |  | 2.53 | 1.53 – 3.52 | < 0.001 |
| e42dep4 |  |  |  |  |  |  | 4.32 | 3.31 – 5.33 | < 0.001 |
| tot_sc_e |  |  |  |  |  |  | 0.21 | 0.01 – 0.41 | 0.042 |
| Observations | 832 |  |  | 832 |  |  | 833 |  |  |
| R² / adj. R² | 0.026 / 0.021 |  |  | 0.081 / 0.075 |  |  | 0.154 / 0.147 |  |  |

### Example 4: Remove many estimates

```r
sjt.lm(fit1, fit2, fit3,
       remove.estimates = c(2, 5, 6, 10))
```

|  | Model 1 |  |  | Model 2 |  |  | Model 3 |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p | B | CI | p |
| (Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001 |
| **Education** |  |  |  |  |  |  |  |  |  |
| c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806 |
| c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203 |
| **Dependency** |  |  |  |  |  |  |  |  |  |
| e42dep2 |  |  |  |  |  |  | 1.18 | 0.16 – 2.20 | 0.024 |
| e42dep3 |  |  |  |  |  |  | 2.53 | 1.53 – 3.52 | < 0.001 |
| e42dep4 |  |  |  |  |  |  | 4.32 | 3.31 – 5.33 | < 0.001 |
| Observations | 832 |  |  | 832 |  |  | 833 |  |  |
| R² / adj. R² | 0.026 / 0.021 |  |  | 0.081 / 0.075 |  |  | 0.154 / 0.147 |  |  |

### Example 5: Custom predictor labels

In most cases you need to define your own labels when removing estimates, especially when you have grouped categorical predictors, because automatic label detection is quite tricky in such situations. If you provide your own labels, please note that the grouped predictors’ headings (the variable name of the grouped, categorical variable) are still set automatically by the sjt.lm function (variable labels are used, so use set_var_labels for those categorical predictors). For each data row in the table, i.e. for each coefficient appearing in any model, you need to specify a label string.

In the next example, we have seven table rows with data (excluding the intercept): mid and hi education (categories of the variable Education), Hours of Care, slight, moderate and severe dependency (categories of the variable Dependency), and Service Usage. Each of these rows needs a label.

```r
sjt.lm(fit1, fit2, fit3,
       labelPredictors = c("mid education",
                           "hi education",
                           "Hours of Care",
                           "slight dependency",
                           "moderate dependency",
                           "severe dependency",
                           "Service Usage"),
       remove.estimates = c("c160age", "c161sex"))
```

|  | Model 1 |  |  | Model 2 |  |  | Model 3 |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | B | CI | p | B | CI | p | B | CI | p |
| (Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001 |
| **Education** |  |  |  |  |  |  |  |  |  |
| mid education | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806 |
| hi education | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203 |
| Hours of Care |  |  |  | 0.02 | 0.01 – 0.02 | < 0.001 |  |  |  |
| **Dependency** |  |  |  |  |  |  |  |  |  |
| slight dependency |  |  |  |  |  |  | 1.18 | 0.16 – 2.20 | 0.024 |
| moderate dependency |  |  |  |  |  |  | 2.53 | 1.53 – 3.52 | < 0.001 |
| severe dependency |  |  |  |  |  |  | 4.32 | 3.31 – 5.33 | < 0.001 |
| Service Usage |  |  |  |  |  |  | 0.21 | 0.01 – 0.41 | 0.042 |
| Observations | 832 |  |  | 832 |  |  | 833 |  |  |
| R² / adj. R² | 0.026 / 0.021 |  |  | 0.081 / 0.075 |  |  | 0.154 / 0.147 |  |  |
