Beautiful tables for linear model summaries #rstats
Beautiful HTML tables of linear models
In this blog post I’d like to show some (old and) new features of the sjt.lm function from my sjPlot package. These features are currently only available in the development snapshot on GitHub; a package update will be submitted to CRAN soon.
There are two major new features I added to this function: comparing models with different predictors (e.g. stepwise regressions) and automatic grouping of categorical predictors. The examples below demonstrate both features.
The sjt.lm function prints results and summaries of linear models as an HTML table. These tables can be viewed in the RStudio viewer pane or in a web browser, and can easily be exported to office applications. See also my former posts on the table-printing functions of my package here and here.
Please note: the following tables may look a bit cluttered. This is because I pasted the HTML code created by knitr directly into this blog post, so style sheets may interfere. The original online manual for this function can be found here.
All of the following tables can be reproduced with the sjPlot package and its sample data set.
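If you want to follow along, installing and loading the package is all you need. A minimal sketch; the GitHub repository name in the comment below is an assumption on my part, so please check the package website for the current installation command.

# stable release from CRAN
# install.packages("sjPlot")
# development snapshot from GitHub (repository name assumed)
# devtools::install_github("strengejacke/sjPlot")
library(sjPlot)
# sample data set shipped with the package
data(efc)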
Linear model summaries as HTML table
The sjt.lm function prints summaries of linear models (fitted with the lm function) as nicely formatted HTML tables.
Before starting, sample data is loaded and sample models are fitted:
# sample data
data(efc)
# set variable labels
efc <- set_var_labels(efc, get_var_labels(efc))
# fit first model
fit1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + c172code, data = efc)
# Note that both models share the same predictors and only differ
# in their dependent variable. See examples of stepwise models later...
The simplest way to produce the table output is to pass the fitted models as parameters. By default, estimates (B), confidence intervals (CI) and p-values (p) are reported. The models are labelled Model 1 and Model 2.
sjt.lm(fit1, fit2)
  | Model 1 | Model 2
  | B | CI | p | B | CI | p
(Intercept) | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001
carer’ age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206
average number of hours of care for the elder in a week | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001
carer’s gender | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061
carer’s level of education: recoding of variable c172edu1 | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034
Observations | 821 | 832
R2 / adj. R2 | 0.270 / 0.266 | 0.079 / 0.075
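By the way, if you need the table outside of R, the sjt functions can also write their output to an HTML file, which you can open in a browser or import into office applications. A small sketch; I am quoting the file and useViewer parameter names from memory, so treat them as assumptions and check the function's documentation.

# write the table to an HTML file ('file' is an assumed parameter name)
sjt.lm(fit1, fit2, file = "model_summaries.html")
# open the table in the default web browser instead of the viewer pane
# ('useViewer' is an assumed parameter name)
sjt.lm(fit1, fit2, useViewer = FALSE)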
Custom labels
You can specify the ‘model’ labels via the labelDependentVariables parameter:
sjt.lm(fit1, fit2, labelDependentVariables = c("Barthel-Index", "Negative Impact"))
  | Barthel-Index | Negative Impact
  | B | CI | p | B | CI | p
(Intercept) | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001
carer’ age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206
average number of hours of care for the elder in a week | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001
carer’s gender | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061
carer’s level of education: recoding of variable c172edu1 | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034
Observations | 821 | 832
R2 / adj. R2 | 0.270 / 0.266 | 0.079 / 0.075
More custom labels
Here is an example of how to change the other labels. Note that showHeaderStrings makes the two header strings (‘Coefficients’ in the top left corner and ‘Response’ on top) appear in the table.
sjt.lm(fit1, fit2,
       showHeaderStrings = TRUE,
       stringB = "Estimate",
       stringCI = "Conf. Int.",
       stringP = "p-value",
       stringDependentVariables = "Response",
       stringPredictors = "Coefficients",
       stringIntercept = "Konstante",
       labelDependentVariables = c("Barthel-Index", "Negative Impact"))
Coefficients | Response
  | Barthel-Index | Negative Impact
  | Estimate | Conf. Int. | p-value | Estimate | Conf. Int. | p-value
Konstante | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001
carer’ age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206
average number of hours of care for the elder in a week | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001
carer’s gender | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061
carer’s level of education: recoding of variable c172edu1 | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034
Observations | 821 | 832
R2 / adj. R2 | 0.270 / 0.266 | 0.079 / 0.075
Changing summary style and content
You can change the table style with specific parameters, e.g. to include the CI in the same table cell as the estimates, to print asterisks instead of numeric p-values, etc.
sjt.lm(fit1, fit2,
       separateConfColumn = FALSE,  # CI in same cell as estimates
       showStdBeta = TRUE,          # also show standardized beta values
       pvaluesAsNumbers = FALSE)    # "*" instead of numeric values
  | Model 1 | Model 2
  | B (CI) | std. Beta (CI) | B (CI) | std. Beta (CI)
(Intercept) | 90.06 (77.95 – 102.18) *** | | 8.46 (6.67 – 10.24) *** |
carer’ age | -0.22 (-0.36 – -0.08) ** | -0.10 (-0.16 – -0.04) | 0.01 (-0.01 – 0.03) | 0.05 (-0.03 – 0.12)
average number of hours of care for the elder in a week | -0.28 (-0.31 – -0.24) *** | -0.48 (-0.54 – -0.41) | 0.02 (0.01 – 0.02) *** | 0.25 (0.18 – 0.32)
carer’s gender | -0.26 (-4.36 – 3.83) | -0.00 (-0.06 – 0.06) | 0.57 (-0.03 – 1.17) | 0.06 (-0.00 – 0.13)
carer’s level of education: recoding of variable c172edu1 | -0.76 (-3.55 – 2.02) | -0.02 (-0.08 – 0.04) | 0.44 (0.03 – 0.86) * | 0.07 (0.01 – 0.14)
Observations | 821 | 832
R2 / adj. R2 | 0.270 / 0.266 | 0.079 / 0.075
Notes | * p<0.05 ** p<0.01 *** p<0.001
Custom variable labels
In the above example, the original variable labels are long and not very pretty. You can change the variable labels either with set_var_labels (see this page for more details), which will affect all future plots and tables, or pass your own labels via labelPredictors.
sjt.lm(fit1, fit2, labelPredictors = c("Carer's Age", "Hours of Care", "Carer's Sex", "Educational Status"))
  | Model 1 | Model 2
  | B | CI | p | B | CI | p
(Intercept) | 90.06 | 77.95 – 102.18 | < 0.001 | 8.46 | 6.67 – 10.24 | < 0.001
Carer’s Age | -0.22 | -0.36 – -0.08 | 0.002 | 0.01 | -0.01 – 0.03 | 0.206
Hours of Care | -0.28 | -0.31 – -0.24 | < 0.001 | 0.02 | 0.01 – 0.02 | < 0.001
Carer’s Sex | -0.26 | -4.36 – 3.83 | 0.900 | 0.57 | -0.03 – 1.17 | 0.061
Educational Status | -0.76 | -3.55 – 2.02 | 0.592 | 0.44 | 0.03 – 0.86 | 0.034
Observations | 821 | 832
R2 / adj. R2 | 0.270 / 0.266 | 0.079 / 0.075
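Alternatively, you can attach the prettier labels to the variables themselves with set_var_labels (the same function used for the sample data above); every following table and plot will then pick them up automatically. A minimal sketch; I refit the models afterwards, assuming sjt.lm reads the labels from the data stored with the model.

# attach prettier labels directly to the variables
efc$c160age <- set_var_labels(efc$c160age, "Carer's Age")
efc$c12hour <- set_var_labels(efc$c12hour, "Hours of Care")
efc$c161sex <- set_var_labels(efc$c161sex, "Carer's Sex")
efc$c172code <- set_var_labels(efc$c172code, "Educational Status")
# refit the models so they carry the new labels
fit1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + c172code, data = efc)
# same table as above, without needing labelPredictors
sjt.lm(fit1, fit2)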
Compare models with different predictors
In some cases, for instance with stepwise regressions, you have different predictors on the same response. The proper grouping of predictors, i.e. rows, is done automatically.
First, let’s fit some example models.
# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data = efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data = efc)
# fit third model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data = efc)
Note that when printing tables of fitted models with different predictors, variable labels are not detected automatically (maybe this will be implemented in a future package version).
sjt.lm(fit1, fit2, fit3,
       separateConfColumn = FALSE,
       showAIC = TRUE,
       showFStat = TRUE)
  | Model 1 | Model 2 | Model 3
  | B (CI) | p | B (CI) | p | B (CI) | p
(Intercept) | 7.82 (6.00 – 9.65) | < 0.001 | 8.46 (6.67 – 10.24) | < 0.001 | 6.23 (4.76 – 7.69) | < 0.001
c160age | 0.04 (0.02 – 0.06) | < 0.001 | 0.01 (-0.01 – 0.03) | 0.206 | 0.01 (-0.01 – 0.03) | 0.271
c172code | 0.39 (-0.03 – 0.81) | 0.071 | 0.44 (0.03 – 0.86) | 0.034 | 0.24 (-0.15 – 0.64) | 0.230
c161sex | 0.69 (0.07 – 1.31) | 0.028 | 0.57 (-0.03 – 1.17) | 0.061 | |
c12hour | | | 0.02 (0.01 – 0.02) | < 0.001 | |
e42dep | | | | | 1.50 (1.23 – 1.77) | < 0.001
tot_sc_e | | | | | 0.21 (0.01 – 0.41) | 0.038
Observations | 832 | 832 | 833
R2 / adj. R2 | 0.025 / 0.022 | 0.079 / 0.075 | 0.153 / 0.148
F-statistics | 7.107*** | 17.730*** | 37.250***
AIC | 4611.921 | 4566.622 | 4502.333
Automatic grouping of categorical predictors
In case you have categorical variables with more than two factor levels, the sjt.lm function automatically groups the category levels to give a better overview of the predictors in the table.
By default, automatic grouping is activated. To disable this feature, use group.pred = FALSE as parameter (see the one-liner after the table below).
To demonstrate this feature, we first convert two predictors to factors (which is what they actually are). To do this, we use the to_fac function, which converts numeric variables into factors, but does not remove the variable and value label attributes.
# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make dependency categorical
efc$e42dep <- to_fac(efc$e42dep)
# fit first model again (with c172code as factor)
fit1 <- lm(barthtot ~ c160age + c12hour + c172code + c161sex + e42dep, data = efc)
# fit second model again (with c172code as factor)
fit2 <- lm(neg_c_7 ~ c160age + c12hour + c172code + c161sex + e42dep, data = efc)
Now we can print the table.
sjt.lm(fit1, fit2)
  | Model 1 | Model 2
  | B | CI | p | B | CI | p
(Intercept) | 97.17 | 88.37 – 105.97 | < 0.001 | 7.76 | 5.97 – 9.55 | < 0.001
carer’ age | -0.06 | -0.16 – 0.03 | 0.203 | 0.00 | -0.02 – 0.02 | 0.683
average number of hours of care for the elder in a week | -0.07 | -0.10 – -0.04 | < 0.001 | 0.01 | 0.00 – 0.01 | 0.015
carer’s level of education: recoding of variable c172edu1
intermediate level of education | 1.50 | -1.60 – 4.60 | 0.343 | 0.13 | -0.50 – 0.76 | 0.689
high level of education | 0.66 | -3.20 – 4.52 | 0.738 | 0.72 | -0.07 – 1.51 | 0.074
carer’s gender | 0.09 | -2.74 – 2.93 | 0.949 | 0.56 | -0.02 – 1.13 | 0.058
how dependent is the elder? – subjective perception of carer
slightly dependent | -7.85 | -12.86 – -2.83 | 0.002 | 1.11 | 0.09 – 2.13 | 0.033
moderately dependent | -19.49 | -24.42 – -14.57 | < 0.001 | 2.37 | 1.37 – 3.37 | < 0.001
severely dependent | -56.87 | -62.12 – -51.63 | < 0.001 | 3.92 | 2.86 – 4.99 | < 0.001
Observations | 821 | 832
R2 / adj. R2 | 0.653 / 0.650 | 0.160 / 0.152
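And if you prefer the ungrouped layout, this is the one-liner mentioned above; each factor level then appears as a plain row without the category headings:

# same models, but without grouped category headings
sjt.lm(fit1, fit2, group.pred = FALSE)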
Removing estimates from the output
With remove.estimates, specific estimates can be removed from the table output. This may make sense when you have stepwise regression models and only want to compare the varying predictors, but not the controls. remove.estimates accepts either the row indices of the table rows that should be removed, or the coefficients’ names.
When using numeric indices, the index numbers relate to the same order as coef(fit). Note that currently the intercept cannot be removed from the model output!
data(efc)
# attach variable labels to each variable of the data
# frame - useful for automatic label detection
efc <- set_var_labels(efc, get_var_labels(efc))
# make education categorical
efc$c172code <- to_fac(efc$c172code)
# make dependency categorical
efc$e42dep <- to_fac(efc$e42dep)
# make prettier variable labels
efc$c172code <- set_var_labels(efc$c172code, "Education")
efc$e42dep <- set_var_labels(efc$e42dep, "Dependency")
# fit first model
fit1 <- lm(neg_c_7 ~ c160age + c172code + c161sex, data = efc)
# fit second model
fit2 <- lm(neg_c_7 ~ c160age + c172code + c161sex + c12hour, data = efc)
# fit third model
fit3 <- lm(neg_c_7 ~ c160age + c172code + e42dep + tot_sc_e, data = efc)
Example 1: Complete table output
Here you have the complete table output. This helps you identify the row index numbers, which is especially useful when you have multiple models with different predictors: an estimate’s position within a single model may differ from that estimate’s position in the table output.
sjt.lm(fit1, fit2, fit3)
  | Model 1 | Model 2 | Model 3
  | B | CI | p | B | CI | p | B | CI | p
(Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001
c160age | 0.04 | 0.02 – 0.06 | < 0.001 | 0.01 | -0.01 – 0.03 | 0.306 | 0.01 | -0.01 – 0.03 | 0.384
Education
c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806
c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203
c161sex | 0.70 | 0.09 – 1.32 | 0.025 | 0.59 | -0.01 – 1.19 | 0.053 | | |
c12hour | | | | 0.02 | 0.01 – 0.02 | < 0.001 | | |
Dependency
e42dep2 | | | | | | | 1.18 | 0.16 – 2.20 | 0.024
e42dep3 | | | | | | | 2.53 | 1.53 – 3.52 | < 0.001
e42dep4 | | | | | | | 4.32 | 3.31 – 5.33 | < 0.001
tot_sc_e | | | | | | | 0.21 | 0.01 – 0.41 | 0.042
Observations | 832 | 832 | 833
R2 / adj. R2 | 0.026 / 0.021 | 0.081 / 0.075 | 0.154 / 0.147
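To see how the numeric row indices relate to a single model’s coefficient order, it helps to look at coef() directly. The commented output below is what I would expect given the factor levels used here, not verbatim console output:

names(coef(fit3))
# [1] "(Intercept)" "c160age"   "c172code2" "c172code3"
# [5] "e42dep2"     "e42dep3"   "e42dep4"   "tot_sc_e"
# tot_sc_e is coefficient no. 8 within fit3, but row no. 10 in the
# combined table, because c161sex and c12hour from the other models
# occupy rows 5 and 6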
Example 2: Remove first coefficient (after intercept)
sjt.lm(fit1, fit2, fit3, remove.estimates = 2)
  | Model 1 | Model 2 | Model 3
  | B | CI | p | B | CI | p | B | CI | p
(Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001
Education
c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806
c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203
c161sex | 0.70 | 0.09 – 1.32 | 0.025 | 0.59 | -0.01 – 1.19 | 0.053 | | |
c12hour | | | | 0.02 | 0.01 – 0.02 | < 0.001 | | |
Dependency
e42dep2 | | | | | | | 1.18 | 0.16 – 2.20 | 0.024
e42dep3 | | | | | | | 2.53 | 1.53 – 3.52 | < 0.001
e42dep4 | | | | | | | 4.32 | 3.31 – 5.33 | < 0.001
tot_sc_e | | | | | | | 0.21 | 0.01 – 0.41 | 0.042
Observations | 832 | 832 | 833
R2 / adj. R2 | 0.026 / 0.021 | 0.081 / 0.075 | 0.154 / 0.147
Example 3: Remove age and sex
sjt.lm(fit1, fit2, fit3, remove.estimates = c("c160age", "c161sex"))
  | Model 1 | Model 2 | Model 3
  | B | CI | p | B | CI | p | B | CI | p
(Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001
Education
c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806
c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203
c12hour | | | | 0.02 | 0.01 – 0.02 | < 0.001 | | |
Dependency
e42dep2 | | | | | | | 1.18 | 0.16 – 2.20 | 0.024
e42dep3 | | | | | | | 2.53 | 1.53 – 3.52 | < 0.001
e42dep4 | | | | | | | 4.32 | 3.31 – 5.33 | < 0.001
tot_sc_e | | | | | | | 0.21 | 0.01 – 0.41 | 0.042
Observations | 832 | 832 | 833
R2 / adj. R2 | 0.026 / 0.021 | 0.081 / 0.075 | 0.154 / 0.147
Example 4: Remove many estimates
sjt.lm(fit1, fit2, fit3, remove.estimates = c(2,5,6,10))
  | Model 1 | Model 2 | Model 3
  | B | CI | p | B | CI | p | B | CI | p
(Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001
Education
c172code2 | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806
c172code3 | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203
Dependency
e42dep2 | | | | | | | 1.18 | 0.16 – 2.20 | 0.024
e42dep3 | | | | | | | 2.53 | 1.53 – 3.52 | < 0.001
e42dep4 | | | | | | | 4.32 | 3.31 – 5.33 | < 0.001
Observations | 832 | 832 | 833
R2 / adj. R2 | 0.026 / 0.021 | 0.081 / 0.075 | 0.154 / 0.147
Example 5: Custom predictor labels
In most cases you need to define your own labels when removing estimates, especially when you have grouped categorical predictors, because automatic label detection is quite tricky in such situations. If you provide your own labels, please note that the grouped predictors’ headings (the variable label of the grouped categorical variable) are still set automatically by the sjt.lm function (variable labels are used, so use set_var_labels for those categorical predictors). You need to specify a label string for each data row in the table, i.e. for each coefficient appearing in any of the models.
In the next example, we have seven data rows in the table (excluding the intercept): mid and hi education (categories of the variable Education), Hours of Care, slight, moderate and severe dependency (categories of the variable Dependency), and Service Usage. These rows need to be labelled.
sjt.lm(fit1, fit2, fit3,
       labelPredictors = c("mid education", "hi education", "Hours of Care",
                           "slight dependency", "moderate dependency",
                           "severe dependency", "Service Usage"),
       remove.estimates = c("c160age", "c161sex"))
  | Model 1 | Model 2 | Model 3
  | B | CI | p | B | CI | p | B | CI | p
(Intercept) | 8.40 | 6.72 – 10.08 | < 0.001 | 9.18 | 7.53 – 10.83 | < 0.001 | 8.48 | 6.99 – 9.97 | < 0.001
Education
mid education | 0.16 | -0.52 – 0.83 | 0.652 | 0.12 | -0.54 – 0.78 | 0.728 | 0.08 | -0.56 – 0.72 | 0.806
hi education | 0.79 | -0.05 – 1.64 | 0.066 | 0.91 | 0.09 – 1.74 | 0.030 | 0.52 | -0.28 – 1.32 | 0.203
Hours of Care | | | | 0.02 | 0.01 – 0.02 | < 0.001 | | |
Dependency
slight dependency | | | | | | | 1.18 | 0.16 – 2.20 | 0.024
moderate dependency | | | | | | | 2.53 | 1.53 – 3.52 | < 0.001
severe dependency | | | | | | | 4.32 | 3.31 – 5.33 | < 0.001
Service Usage | | | | | | | 0.21 | 0.01 – 0.41 | 0.042
Observations | 832 | 832 | 833
R2 / adj. R2 | 0.026 / 0.021 | 0.081 / 0.075 | 0.154 / 0.147