Using the update function during variable selection

May 9, 2010

(This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers)

When fitting statistical models to data where there are multiple variables we are often interested in adding or removing terms from our model and in cases where there are a large number of terms it can be quicker to use the update function to start with a formula from a model that we have already fitted and to specify the terms that we want to add or remove as opposed to a copy and paste and manually editing the formula to our needs.

Consider the oil-bearing rocks data set that is available with the R software which is used extensively as an example by many authors. One model that can be used as a starting point is a linear model with additive terms for the three variables:

> rock.mod1 = lm(log(perm) ~ area + peri + shape, data = rock)
> summary(rock.mod1)
lm(formula = log(perm) ~ area + peri + shape, data = rock)
    Min      1Q  Median      3Q     Max 
-1.8092 -0.5413  0.1735  0.6493  1.4788 
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.333e+00  5.487e-01   9.720 1.59e-12 ***
area         4.850e-04  8.657e-05   5.602 1.29e-06 ***
peri        -1.527e-03  1.770e-04  -8.623 5.24e-11 ***
shape        1.757e+00  1.756e+00   1.000    0.323    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 0.8521 on 44 degrees of freedom
Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7311 
F-statistic:  43.6 on 3 and 44 DF,  p-value: 3.094e-13

Given this model, saved as an object rock.mod1, we might be interested in considering adding an interaction term between the area and perimeter measurements. The update function has various options and the simplest case is to specfiy a model object and a new formula. The new formula can use the period as short hand for keep everything on either the left or right hand side of the formula and the plus or minus sign used to add or remove terms to the model. In the case of adding an interaction term our call would be:

> rock.mod2 = update(rock.mod1, . ~ . + area:peri)

The first function argument is the name of the model we fitted previously and the periods indicate that we want to use the same response variable and to start with the whole formula but add an interaction term between area and perimeter – the colon is used to specify an interaction term by itself. This fitted model is now:

> summary(rock.mod2)
lm(formula = log(perm) ~ area + peri + shape + area:peri, data = rock)
    Min      1Q  Median      3Q     Max 
-1.7255 -0.4760  0.1256  0.6539  1.4269 
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.567e+00  8.533e-01   7.696 1.28e-09 ***
area         3.769e-04  1.025e-04   3.678  0.00065 ***
peri        -2.141e-03  3.734e-04  -5.733 8.94e-07 ***
shape        4.022e-01  1.859e+00   0.216  0.82974    
area:peri    6.641e-08  3.583e-08   1.854  0.07065 .  
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 0.8295 on 43 degrees of freedom
Multiple R-squared: 0.7669,     Adjusted R-squared: 0.7452 
F-statistic: 35.37 on 4 and 43 DF,  p-value: 4.404e-13

The update function can also be used to change other aspects of the linear model or in fact many other types of model are set up to repsond sensibly to using this function.

Related posts:

To leave a comment for the author, please follow the link and comment on their blog: Software for Exploratory Data Analysis and Statistical Modelling. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , ,

Comments are closed.


Mango solutions

plotly webpage

dominolab webpage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training




CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)