I am pleased to announce the olsrr package, a set of tools for improved
output from linear regression models, designed with beginner and
intermediate R users in mind. The package includes:
 comprehensive regression output
 variable selection procedures
 heteroskedasticity, collinearity diagnostics and measures of influence
 various plots and underlying data
If you know how to build models using lm(), you will find olsrr very
useful. Most of the functions take an object of class lm as input, so you
just need to build a model using lm() and then pass it on to the functions in
olsrr. Once you have picked up enough knowledge of R, you can move on to
the more intuitive approach offered by tidymodels and similar packages,
which offer more flexibility than olsrr.
Installation
# Install release version from CRAN
install.packages("olsrr")
# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/olsrr")
Shiny App
olsrr includes a shiny app which can be launched using
ols_launch_app()
or try the live version here.
Read on to learn more about the features of olsrr, or see the olsrr website
for detailed documentation on using the package.
Regression Output
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_regress(model)
## Model Summary
## 
## R 0.914 RMSE 2.622
## RSquared 0.835 Coef. Var 13.051
## Adj. RSquared 0.811 MSE 6.875
## Pred RSquared 0.771 MAE 1.858
## 
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## 
## Sum of
## Squares DF Mean Square F Sig.
## 
## Regression 940.412 4 235.103 34.195 0.0000
## Residual 185.635 27 6.875
## Total 1126.047 31
## 
##
## Parameter Estimates
## 
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper
## 
## (Intercept)    27.330         8.639                  3.164    0.004     9.604    45.055
##        disp     0.003         0.011        0.055     0.248    0.806    -0.019     0.025
##          hp    -0.019         0.016       -0.212    -1.196    0.242    -0.051     0.013
##          wt    -4.609         1.266       -0.748    -3.641    0.001    -7.206    -2.012
##        qsec     0.544         0.466        0.161     1.166    0.254    -0.413     1.501
## 
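The figures in the Model Summary block can be recomputed with base R, which helps when reading the output for the first time. The sketch below assumes the usual textbook definitions (RMSE as the residual standard error, predicted R-squared from the PRESS statistic); it is not taken from the olsrr source, so treat it as an illustration rather than the package's implementation.

```r
# Refit the model from the example above.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

y   <- mtcars$mpg
res <- residuals(model)
sst <- sum((y - mean(y))^2)          # total sum of squares

r2     <- 1 - sum(res^2) / sst       # R-Squared
adj_r2 <- summary(model)$adj.r.squared
rmse   <- summary(model)$sigma       # residual standard error, sqrt(MSE)
cv     <- 100 * rmse / mean(y)       # coefficient of variation, in percent
mae    <- mean(abs(res))             # mean absolute error

# PRESS: leave-one-out residuals equal res / (1 - leverage).
press   <- sum((res / (1 - hatvalues(model)))^2)
pred_r2 <- 1 - press / sst           # predicted R-Squared

round(c(r2 = r2, adj_r2 = adj_r2, pred_r2 = pred_r2,
        rmse = rmse, cv = cv, mae = mae), 3)
```

The rounded values can be compared against the Model Summary printed by ols_regress().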
In the presence of interaction terms in the model, the predictors are scaled
and centered before computing the standardized betas. ols_regress() will
detect interaction terms automatically, but if you have created a new
variable instead of using the inline function, you can indicate the presence
of interaction terms by setting iterm to TRUE.
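To make the idea of standardized betas concrete, here is a minimal base R sketch for the model without interactions. It computes them two ways: by rescaling the raw slopes, and by refitting on centered and scaled columns. The variable names (std_beta, scaled, model_scaled) are illustrative only, and this is an assumption about the general technique, not a copy of what ols_regress() does internally.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
predictors <- c("disp", "hp", "wt", "qsec")

# Route 1: rescale each raw slope by sd(x) / sd(y).
std_beta <- coef(model)[predictors] *
  sapply(mtcars[predictors], sd) / sd(mtcars$mpg)

# Route 2: center and scale every column first, then refit.
scaled <- as.data.frame(scale(mtcars[c("mpg", predictors)]))
model_scaled <- lm(mpg ~ disp + hp + wt + qsec, data = scaled)

# Without interaction terms the two routes agree.
all.equal(unname(std_beta), unname(coef(model_scaled)[predictors]))
round(std_beta, 3)
```

Once a product term such as disp * wt enters the model, the two routes generally no longer give the same coefficients, because the product of two scaled columns is not the same as the scaled product. That is the situation in which scaling and centering the predictors first matters.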
Residual Diagnostics
olsrr offers tools for detecting violations of standard regression assumptions:
 Residual QQ plot
 Residual normality test
 Residual vs Fitted plot
 Residual histogram
ols_plot_resid_qq(model)
See Residual Diagnostics for more details.
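For readers who want to see what these checks look at, the sketch below builds rough base R equivalents of the normality test, the residual vs fitted plot and the residual histogram. It uses only the stats and graphics packages; the Shapiro-Wilk test is one common choice of normality test and is an assumption here, not necessarily the one olsrr reports.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
res <- residuals(model)

# Normality test on the residuals (Shapiro-Wilk, from the stats package).
sw <- shapiro.test(res)
sw$p.value

# Residual vs fitted plot: look for curvature or a funnel shape.
plot(fitted(model), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Residual histogram: look for strong skew or heavy tails.
hist(res, main = "Residuals", xlab = "Residual")
```

The olsrr functions wrap checks of this kind with nicer output; the Residual Diagnostics article lists them.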
Heteroskedasticity
olsrr provides the following 4 tests for detecting heteroskedasticity:
 Bartlett Test
 Breusch Pagan Test
 Score Test
 F Test
ols_test_breusch_pagan(model)
##
## Breusch Pagan Test for Heteroskedasticity
## 
## Ho: the variance is constant
## Ha: the variance is not constant
##
## Data
## 
## Response : mpg
## Variables: fitted values of mpg
##
## Test Summary
## 
## DF = 1
## Chi2 = 0.5884673
## Prob > Chi2 = 0.4430124
See Heteroskedasticity for more details.
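The test statistic can be sketched by hand in base R. The version below is the original (non-studentized) Breusch Pagan score test using the fitted values as the only auxiliary variable, which matches the "fitted values of mpg" and DF = 1 lines in the output above. Whether it reproduces the printed Chi2 digit for digit is an assumption I have not verified against the olsrr source.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Squared residuals, scaled by the ML variance estimate RSS / n.
e2 <- residuals(model)^2
g  <- e2 / mean(e2)

# Auxiliary regression of the scaled squared residuals on the fitted values.
aux <- lm(g ~ fitted(model))

# Score statistic: half the explained sum of squares of the auxiliary fit.
ess  <- sum((fitted(aux) - mean(g))^2)
chi2 <- ess / 2

# One auxiliary regressor, so one degree of freedom.
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(chi2 = chi2, p_value = p_value)
```

A large p-value means there is no evidence against the null hypothesis of constant variance.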
Collinearity Diagnostics
olsrr provides VIF, tolerance and condition indices to detect collinearity,
and plots for assessing model fit and contributions of variables.
ols_coll_diag(model)
## Tolerance and Variance Inflation Factor
## 
## # A tibble: 4 x 3
## Variables Tolerance VIF
##
## 1 disp 0.125 7.99
## 2 hp 0.194 5.17
## 3 wt 0.145 6.92
## 4 qsec 0.319 3.13
##
##
## Eigenvalue and Condition Index
## 
## Eigenvalue Condition Index intercept disp hp
## 1 4.721487187 1.000000 0.000123237 0.001132468 0.001413094
## 2 0.216562203 4.669260 0.002617424 0.036811051 0.027751289
## 3 0.050416837 9.677242 0.001656551 0.120881424 0.392366164
## 4 0.010104757 21.616057 0.025805998 0.777260487 0.059594623
## 5 0.001429017 57.480524 0.969796790 0.063914571 0.518874831
## wt qsec
## 1 0.0005253393 0.0001277169
## 2 0.0002096014 0.0046789491
## 3 0.0377028008 0.0001952599
## 4 0.7017528428 0.0024577686
## 5 0.2598094157 0.9925403056
See Collinearity Diagnostics for more details.
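Tolerance and VIF have simple definitions, and computing them by hand shows what the table reports: each predictor is regressed on all the others, tolerance is 1 minus the R-squared of that regression, and VIF is its reciprocal. A minimal base R sketch, with illustrative variable names:

```r
predictors <- c("disp", "hp", "wt", "qsec")

# For each predictor, regress it on the remaining predictors.
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

tolerance <- 1 / vif
round(cbind(Tolerance = tolerance, VIF = vif), 3)
```

A VIF well above 1 means the predictor is largely explained by the others, which inflates the standard error of its coefficient.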
Measures of Influence
olsrr offers the following tools to detect influential observations:
 Cook’s D Bar Plot
 Cook’s D Chart
 DFBETAs Panel
 DFFITs Plot
 Studentized Residual Plot
 Standardized Residual Chart
 Studentized Residuals vs Leverage Plot
 Deleted Studentized Residual vs Fitted Values Plot
 Hadi Plot
 Potential Residual Plot
ols_plot_resid_lev(model)
See Measures of Influence for more details.
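The quantities behind several of these plots are available in base R, which is handy when you want the numbers rather than the picture. The sketch below collects them in one data frame and flags observations whose Cook's distance exceeds 4 / n. That cutoff is a common rule of thumb and an assumption here, not necessarily the threshold olsrr draws on its charts.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
n <- nobs(model)

influence_tbl <- data.frame(
  cooks_d  = cooks.distance(model),
  leverage = hatvalues(model),
  rstudent = rstudent(model),   # deleted studentized residuals
  dffits   = dffits(model)
)

# Rule-of-thumb flag: Cook's distance above 4 / n.
influence_tbl[influence_tbl$cooks_d > 4 / n, ]
```

The row names are the car names from mtcars, so flagged observations are easy to identify.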
Variable Selection
olsrr supports different variable selection procedures such as all possible
regression, best subset regression, stepwise regression, stepwise forward
regression and stepwise backward regression.
model <- lm(y ~ ., data = stepdata)
ols_step_both_aic(model)
## Stepwise Selection Method
## 
##
## Candidate Terms:
##
## 1 . x1
## 2 . x2
## 3 . x3
## 4 . x4
## 5 . x5
## 6 . x6
##
##
## Variables Entered/Removed:
##
##  x6 added
##  x1 added
##  x3 added
##  x2 added
##  x6 removed
##  x4 added
##
## No more variables to be added or removed.
##
##
## Stepwise Summary
## 
## Variable Method AIC RSS Sum Sq RSq Adj. RSq
## 
## x6 addition 33473.297 6241.497 13986.736 0.69145 0.69143
## x1 addition 32931.758 6074.156 14154.076 0.69972 0.69969
## x3 addition 31912.722 5771.842 14456.391 0.71466 0.71462
## x2 addition 29304.296 5065.587 15162.646 0.74958 0.74953
## x6 removal 29302.317 5065.592 15162.641 0.74958 0.74954
## x4 addition 29300.814 5064.705 15163.528 0.74962 0.74957
## 
See Variable Selection for more details.
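For comparison, the same idea of adding and removing terms by AIC can be sketched with step() from base R. This is a different implementation from ols_step_both_aic(), so the path and the reported AIC values need not match olsrr's output. The example uses mtcars so that it runs without the stepdata data set.

```r
# Start from the intercept-only model.
null_model <- lm(mpg ~ 1, data = mtcars)

# Let step() add or drop terms, up to the full candidate formula.
selected <- step(
  null_model,
  scope     = mpg ~ disp + hp + wt + qsec,
  direction = "both",
  trace     = 0          # set to 1 to print each step
)

formula(selected)
AIC(selected)
```

Setting trace = 1 prints each addition and removal, similar in spirit to the "Variables Entered/Removed" block above.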
Learning More
The olsrr website includes comprehensive documentation on using the package,
including the following articles that cover various aspects of using olsrr:

Variable Selection – Different variable selection procedures such as all possible regression, best subset regression, stepwise regression, stepwise forward regression and stepwise backward regression.

Residual Diagnostics – Includes plots to examine residuals to validate OLS assumptions.

Heteroskedasticity – Tests for heteroskedasticity include the Bartlett test, Breusch Pagan test, score test and F test.

Collinearity Diagnostics – VIF, tolerance and condition indices to detect collinearity, and plots for assessing model fit and contributions of variables.

Measures of Influence – Includes 10 different plots to detect and identify influential observations.
Feedback
olsrr has been on CRAN for more than a year while we were fixing bugs and
making the API stable. All feedback is welcome. Issues (bugs and feature
requests) can be posted to the GitHub tracker.
For help with code or other related questions, feel free to reach me at [email protected].