Updating Call Arguments

November 9, 2016
By

(This article was first published on Peter E. DeWitt, and kindly contributed to R-bloggers)

The stats::update function is one of my favorite tools in R. Using this
function saves a lot of time and effort when needing to modify an object.
However, this function has its limits. The following is an example of how to
extend the use of stats::update.

This post is an extension of the lightening talk I gave to the Denver R Users
Group
in March of 2016. You can get the
slides from that talk from my
github page.

Basics of the stats::update function

The
documentation
for the stats::update function is a good starting place. Of course, examples
are even better. We’ll use the diamonds data set from within the ggplot2 package for the examples. By
default, the cut, color, and clarity elements of the diamonds data set
are ordered factors. I’m going to remove the order and level these variables as
just factors.

data("diamonds", package = "ggplot2")

diamonds$cut     <- factor(diamonds$cut, ordered = FALSE)
diamonds$color   <- factor(diamonds$color, ordered = FALSE)
diamonds$clarity <- factor(diamonds$clarity, ordered = FALSE)

dplyr::glimpse(diamonds) 
## Observations: 53,940
## Variables: 10
## $ carat    0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, ...
## $ cut      Ideal, Premium, Good, Premium, Good, Very Good, Very ...
## $ color    E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J,...
## $ clarity  SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, S...
## $ depth    61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, ...
## $ table    55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54...
## $ price    326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339,...
## $ x        3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, ...
## $ y        3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, ...
## $ z        2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, ...

A simple regression model, a lm object, will be used for our examples. Let’s
regress the price of diamonds as a function of carat, cut, color, and
clarity.

original_fit <- lm(price ~ carat + cut + color + clarity, data = diamonds)

Updating a call

The stats::update function works by calling getCall which calls
getElement. While that is nice to know, the take away message is that if you
have an object with an element called call, then stat::update gives you
assess to modify the call.

Let’s update the regression formula. Say instead of the four predictors we
started with we want to regress price on the depth and table of the
diamonds

updated_fit_1 <- update(original_fit, formula = . ~ depth + table)

In the code chunk above, the update call took the original_fit object as its
first argument and then we provided the formula = . ~ depth + table argent to
tell the interpreter that we want to modify the formula. The . is shorthand
for “the current,” i.e., use the current left hand side of the formula, and
replace the right hand side with the depth + table.

Using getCall we can see that the two objects have different calls, and the
calls we would expect them to have.

getCall(original_fit)
## lm(formula = price ~ carat + cut + color + clarity, data = diamonds)
getCall(updated_fit_1)
## lm(formula = price ~ depth + table, data = diamonds)

One more sanity check: the regression coefficients for these two models are, at
least in name, as expected:

coef(original_fit)
##  (Intercept)        carat      cutGood cutVery Good   cutPremium 
##   -7362.8022    8886.1289     655.7674     848.7169     869.3959 
##     cutIdeal       colorE       colorF       colorG       colorH 
##     998.2544    -211.6825    -303.3100    -506.1995    -978.6977 
##       colorI       colorJ   claritySI2   claritySI1   clarityVS2 
##   -1440.3019   -2325.2224    2625.9500    3573.6880    4217.8291 
##   clarityVS1  clarityVVS2  clarityVVS1    clarityIF 
##    4534.8790    4967.1994    5072.0276    5419.6468
coef(updated_fit_1)
##  (Intercept)        depth        table 
## -15084.96675     82.26159    242.58346

We can update more than just the formula in the call. Perhaps you want to fit
the same regression model on a subset of the data, for example, use the same
regression formula as in original_fit but subset the data to only diamonds
with a carat weight under 2.

updated_fit_2 <- update(original_fit, data = dplyr::filter(diamonds, carat < 2))
getCall(updated_fit_2)
## lm(formula = price ~ carat + cut + color + clarity, data = dplyr::filter(diamonds, 
##     carat < 2))

coef(original_fit)
##  (Intercept)        carat      cutGood cutVery Good   cutPremium 
##   -7362.8022    8886.1289     655.7674     848.7169     869.3959 
##     cutIdeal       colorE       colorF       colorG       colorH 
##     998.2544    -211.6825    -303.3100    -506.1995    -978.6977 
##       colorI       colorJ   claritySI2   claritySI1   clarityVS2 
##   -1440.3019   -2325.2224    2625.9500    3573.6880    4217.8291 
##   clarityVS1  clarityVVS2  clarityVVS1    clarityIF 
##    4534.8790    4967.1994    5072.0276    5419.6468
coef(updated_fit_2)
##  (Intercept)        carat      cutGood cutVery Good   cutPremium 
##   -6162.1964    8724.8278     527.6717     712.9909     744.0746 
##     cutIdeal       colorE       colorF       colorG       colorH 
##     861.5481    -223.8747    -313.4734    -508.2788    -996.4275 
##       colorI       colorJ   claritySI2   claritySI1   clarityVS2 
##   -1492.3535   -2210.0376    1635.1460    2580.1405    3240.4271 
##   clarityVS1  clarityVVS2  clarityVVS1    clarityIF 
##    3568.3900    3999.7614    4094.5453    4437.4743

A second way to achieve the same results as seen with updated_fit_2 is to use
the update function to add to the call:

updated_fit_3 <- update(original_fit, subset = carat < 2)
getCall(updated_fit_3)
## lm(formula = price ~ carat + cut + color + clarity, data = diamonds, 
##     subset = carat < 2)

coef(original_fit)
##  (Intercept)        carat      cutGood cutVery Good   cutPremium 
##   -7362.8022    8886.1289     655.7674     848.7169     869.3959 
##     cutIdeal       colorE       colorF       colorG       colorH 
##     998.2544    -211.6825    -303.3100    -506.1995    -978.6977 
##       colorI       colorJ   claritySI2   claritySI1   clarityVS2 
##   -1440.3019   -2325.2224    2625.9500    3573.6880    4217.8291 
##   clarityVS1  clarityVVS2  clarityVVS1    clarityIF 
##    4534.8790    4967.1994    5072.0276    5419.6468
all.equal(coef(updated_fit_3), coef(updated_fit_2))
## [1] TRUE

Lastly, before we move onto more interesting examples, you can always update
more than one part of a call with one update call

updated_fit_4 <- update(original_fit, 
                        formula = . ~ depth, 
                        data    = dplyr::filter(diamonds, carat < 2), 
                        subset  = cut %in% c("Good", "Very Good"))
getCall(updated_fit_4)
## lm(formula = price ~ depth, data = dplyr::filter(diamonds, carat < 
##     2), subset = cut %in% c("Good", "Very Good"))
summary(updated_fit_4)
## 
## Call:
## lm(formula = price ~ depth, data = dplyr::filter(diamonds, carat < 
##     2), subset = cut %in% c("Good", "Very Good"))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3316.1 -2596.6  -954.8  1364.0 15272.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5027.60     934.21   5.382 7.48e-08 ***
## depth         -24.33      15.07  -1.615    0.106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3187 on 16321 degrees of freedom
## Multiple R-squared:  0.0001598,	Adjusted R-squared:  9.852e-05 
## F-statistic: 2.608 on 1 and 16321 DF,  p-value: 0.1063

Modifying a Variable on the Right Hand Side

Let’s work with the original_fit object again and modify the right hand side
of the regression formula to use a centered and scaled version of carat. (You
can quickly center and scale a variable by calling scale. The default
behavior is to center and scale by subtracting off the mean and dividing by the
standard deviation.) If
you attempt to update the formula as . ~ . + scale(carat) where the . is a
reuse operator, the result will be nonsensical.

scaled_fit_1 <- update(original_fit, formula = . ~ . + scale(carat))
getCall(scaled_fit_1)
## lm(formula = price ~ carat + cut + color + clarity + scale(carat), 
##     data = diamonds)
coef(scaled_fit_1)
##  (Intercept)        carat      cutGood cutVery Good   cutPremium 
##   -7362.8022    8886.1289     655.7674     848.7169     869.3959 
##     cutIdeal       colorE       colorF       colorG       colorH 
##     998.2544    -211.6825    -303.3100    -506.1995    -978.6977 
##       colorI       colorJ   claritySI2   claritySI1   clarityVS2 
##   -1440.3019   -2325.2224    2625.9500    3573.6880    4217.8291 
##   clarityVS1  clarityVVS2  clarityVVS1    clarityIF scale(carat) 
##    4534.8790    4967.1994    5072.0276    5419.6468           NA

Note that carat appears twice in the right hand side of the formula. Further,
the regression coefficient for scale(carat) is NA as this vector in the
design matrix is a linear function of carat. We’ve created a regression model
with a rank deficient design matrix. Oops.

Obviously, we need to omit carat and replace with scale(carat) in the
updated formula. Two ways to do this.

  1. Don’t use the . on the right hand side
    and write out the full right hand side yourself. This option requires too much
    effort and would suck to maintain.

  2. Continue to use the . on the right hand side and omit carat via -

scaled_fit_2 <- update(original_fit, formula = . ~ . - carat + scale(carat))
getCall(scaled_fit_2)
## lm(formula = price ~ cut + color + clarity + scale(carat), data = diamonds)
coef(scaled_fit_2)
##  (Intercept)      cutGood cutVery Good   cutPremium     cutIdeal 
##    -272.2067     655.7674     848.7169     869.3959     998.2544 
##       colorE       colorF       colorG       colorH       colorI 
##    -211.6825    -303.3100    -506.1995    -978.6977   -1440.3019 
##       colorJ   claritySI2   claritySI1   clarityVS2   clarityVS1 
##   -2325.2224    2625.9500    3573.6880    4217.8291    4534.8790 
##  clarityVVS2  clarityVVS1    clarityIF scale(carat) 
##    4967.1994    5072.0276    5419.6468    4212.1250

Cool. That worked well.

Now, what if we wanted to only center carat instead of centering and scaling?
This would require adding scale(carat, scale = FALSE) to the right hand side
of the formula. Starting with the scaled_fit_2 object we find that this task
can be difficult as the full text of scale(carat) needs to be omitted. In the
following chunk you’ll see that scaled_fit_3 does not have the desired formula
whereas scaled_fit_4 does.

scaled_fit_3 <-
  update(scaled_fit_2, formula = . ~ . + scale(carat, scale = FALSE))
scaled_fit_4 <-
  update(scaled_fit_2, formula = . ~ . - scale(carat) + scale(carat, scale = FALSE))

getCall(scaled_fit_3)
## lm(formula = price ~ cut + color + clarity + scale(carat) + scale(carat, 
##     scale = FALSE), data = diamonds)
getCall(scaled_fit_4)
## lm(formula = price ~ cut + color + clarity + scale(carat, scale = FALSE), 
##     data = diamonds)

Okay, one more problem. Let’s start with scaled_fit_4 and scale, but not
center carat. In the chunk below, scaled_fit_5 does not have the desired
result, but scaled_fit_6 does.

scaled_fit_5 <-
  update(scaled_fit_4,
         formula = . ~ . - scale(carat) + scale(carat, center = TRUE))
scaled_fit_6 <-p
## Error in eval(expr, envir, enclos): object 'p' not found
  update(scaled_fit_4,
         formula = . ~ . - scale(carat, scale = FALSE) + scale(carat, center = TRUE))
## 
## Call:
## lm(formula = price ~ cut + color + clarity + scale(carat, center = TRUE), 
##     data = diamonds)
## 
## Coefficients:
##                 (Intercept)                      cutGood  
##                      -272.2                        655.8  
##                cutVery Good                   cutPremium  
##                       848.7                        869.4  
##                    cutIdeal                       colorE  
##                       998.3                       -211.7  
##                      colorF                       colorG  
##                      -303.3                       -506.2  
##                      colorH                       colorI  
##                      -978.7                      -1440.3  
##                      colorJ                   claritySI2  
##                     -2325.2                       2625.9  
##                  claritySI1                   clarityVS2  
##                      3573.7                       4217.8  
##                  clarityVS1                  clarityVVS2  
##                      4534.9                       4967.2  
##                 clarityVVS1                    clarityIF  
##                      5072.0                       5419.6  
## scale(carat, center = TRUE)  
##                      4212.1
                                                                           
getCall(scaled_fit_5)
## lm(formula = price ~ cut + color + clarity + scale(carat, scale = FALSE) + 
##     scale(carat, center = TRUE), data = diamonds)
getCall(scaled_fit_6)
## Error in getCall(scaled_fit_6): object 'scaled_fit_6' not found

So, what do you think? The update function is great, but is has some
limitations. Imaging if you had a function of a variable with several options,
or just one option with a very long value. Update might not be that useful.
For example, instead of scaling carat, let’s move it into bins using the
cut() function. (The fact that there is a variable and a function both called
cut on the right hand side could be confusing. I selected this data set for
this example specifically because it had a meaningful variable name of cut.
We will see why this is important later.)

cut_fit_1 <-
  update(original_fit,
         formula = . ~ . - carat + 
                       cut(carat, 
                           breaks = seq(0, 5.5, by = 0.5),
                           right = FALSE)
         )
names(coef(cut_fit_1))
##  [1] "(Intercept)"                                                     
##  [2] "cutGood"                                                         
##  [3] "cutVery Good"                                                    
##  [4] "cutPremium"                                                      
##  [5] "cutIdeal"                                                        
##  [6] "colorE"                                                          
##  [7] "colorF"                                                          
##  [8] "colorG"                                                          
##  [9] "colorH"                                                          
## [10] "colorI"                                                          
## [11] "colorJ"                                                          
## [12] "claritySI2"                                                      
## [13] "claritySI1"                                                      
## [14] "clarityVS2"                                                      
## [15] "clarityVS1"                                                      
## [16] "clarityVVS2"                                                     
## [17] "clarityVVS1"                                                     
## [18] "clarityIF"                                                       
## [19] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[0.5,1)"
## [20] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[1,1.5)"
## [21] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[1.5,2)"
## [22] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[2,2.5)"
## [23] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[2.5,3)"
## [24] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[3,3.5)"
## [25] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[3.5,4)"
## [26] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[4,4.5)"
## [27] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[4.5,5)"
## [28] "cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)[5,5.5)"

Now, we have a regression model for price with the cut, color, and clarity
accounted for, and a 11 level factor for carat, the interval [0, 0.5) is the
reference level here.

If you only had the cut_fit_1 object to start with, using the update
function to modify the options passed to cut() would be a pain. Having to
type out the old cut() call exactly as provide and then replace with a new
cut() call. This is too much work and a pain to maintain. Just start with a
fresh call to lm. Or, let’s be clever and build some new tools to do this
work.

Modifying a call within a formula

First, let’s look at the structure of a formula. We’ll use the object f,
defined below, as the primary object in this example.

f <- price ~ color + cut + clarity + cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE) 
f
## price ~ color + cut + clarity + cut(carat, breaks = seq(0, 5.5, 
##     by = 0.5), right = FALSE)
str(f)
## Class 'formula'  language price ~ color + cut + clarity + cut(carat, breaks = seq(0, 5.5, by = 0.5),      right = FALSE)
##   ..- attr(*, ".Environment")=
is.list(f)
## [1] FALSE
is.recursive(f)
## [1] TRUE

We have a language object with it an
environment attribute. This object
is not a list, but it is recursive (“is.recursive(x) returns TRUE if x
has a recursive (list-like) structure and FALSE otherwise). So, if we
recursively apply as.list to f object we see a controlled deconstruction of
the formula object.

as.list(f)
## [[1]]
## `~`
## 
## [[2]]
## price
## 
## [[3]]
## color + cut + clarity + cut(carat, breaks = seq(0, 5.5, by = 0.5), 
##     right = FALSE)
lapply(as.list(f), as.list) 
## [[1]]
## [[1]][[1]]
## `~`
## 
## 
## [[2]]
## [[2]][[1]]
## price
## 
## 
## [[3]]
## [[3]][[1]]
## `+`
## 
## [[3]][[2]]
## color + cut + clarity
## 
## [[3]][[3]]
## cut(carat, breaks = seq(0, 5.5, by = 0.5), right = FALSE)

What happens when we apply as.list to the third element of the .Last.value?

lapply(lapply(as.list(f), as.list)[[3]], as.list)
## [[1]]
## [[1]][[1]]
## `+`
## 
## 
## [[2]]
## [[2]][[1]]
## `+`
## 
## [[2]][[2]]
## color + cut
## 
## [[2]][[3]]
## clarity
## 
## 
## [[3]]
## [[3]][[1]]
## cut
## 
## [[3]][[2]]
## carat
## 
## [[3]]$breaks
## seq(0, 5.5, by = 0.5)
## 
## [[3]]$right
## [1] FALSE

Look at the elements of the third element, the cut() call is the first
sub-element followed by the arguments x (implicitly) and breaks (explicitly).

This is great! If a formula is reconstructed recursively then we can access the
arguments of calls within the formula.

Let’s start building a function to fully reconstruct a formula object into
it’s parts.

decon <- function(x) {
  if (is.recursive(x)) {
    lapply(as.list(x), decon)
  } else { 
    x
  }
}

The decon function rips apart any recursive object until only the
non-recursive elements remain. Passing f to decon yields:

decon(f) 
## [[1]]
## `~`
## 
## [[2]]
## price
## 
## [[3]]
## [[3]][[1]]
## `+`
## 
## [[3]][[2]]
## [[3]][[2]][[1]]
## `+`
## 
## [[3]][[2]][[2]]
## [[3]][[2]][[2]][[1]]
## `+`
## 
## [[3]][[2]][[2]][[2]]
## color
## 
## [[3]][[2]][[2]][[3]]
## cut
## 
## 
## [[3]][[2]][[3]]
## clarity
## 
## 
## [[3]][[3]]
## [[3]][[3]][[1]]
## cut
## 
## [[3]][[3]][[2]]
## carat
## 
## [[3]][[3]]$breaks
## [[3]][[3]]$breaks[[1]]
## seq
## 
## [[3]][[3]]$breaks[[2]]
## [1] 0
## 
## [[3]][[3]]$breaks[[3]]
## [1] 5.5
## 
## [[3]][[3]]$breaks$by
## [1] 0.5
## 
## 
## [[3]][[3]]$right
## [1] FALSE

where we can see a hierarchy for each element and sub-element. Take careful
notice of element [[3]][[3]][[1]]. When the deconstruction of the formula
runs into the cut() call, the first element is the name of the call itself,
followed by the arguments thereto. This is illustrated again with respect to
the seq call.

Now, before we try to modify any arguments, we need decon to return a
formual object. After all, the point of this is to modify a formula. By
wrapping the lapply in a as.call if x is recursive we gain the desired
behavior.

decon <- function(x) {
  if (is.recursive(x)) {
    as.call(lapply(as.list(x), decon))
  } else { 
    x
  }
}
decon(f)
## price ~ color + cut + clarity + cut(carat, breaks = seq(0, 5.5, 
##     by = 0.5), right = FALSE)
all.equal(decon(f), f)
## [1] TRUE

Why? Well, each deconstruction is a list with an operator, a named call, as the
first element followed by two arguments. A simple example with addition:

as.call(list(`+`, 1, 2))
## .Primitive("+")(1, 2)

eval(as.call(list(`+`, 1, 2))) 
## [1] 3

By wrapping the lapply in the as.call within the decon function we
preserve the unevaluated calls until the end of the recursion when the call is
implicitly evaluated.

The next step in our journey is to modify decon such that arguments to the
cut call can be updated. Specifically, we want to change the value of the
breaks argument. Just to be clear, the stats::update function can’t do
this.

update(cut(carat, breaks = c(1, 1)), breaks = c(3, 4))
## Error in cut(carat, breaks = c(1, 1)): object 'carat' not found

We are looking for a call named cut, and to modify the breaks argument.
Using is.call will differentiate between the cut call and the cut
variable.

decon <- function(x, nb) {
  if (is.call(x) && grepl("cut", deparse(x[[1]]))) {
    x$breaks <- nb
    x
  } else if (is.recursive(x)) {
    as.call(lapply(as.list(x), decon, nb))
  } else {
    x
  }
}
f
## price ~ color + cut + clarity + cut(carat, breaks = seq(0, 5.5, 
##     by = 0.5), right = FALSE)
decon(f, c(0, 3))
## price ~ color + cut + clarity + cut(carat, breaks = c(0, 3), 
##     right = FALSE)

We’re almost there. The return from decon is a formula. However, we have
not dealt with the environments. Let’s place
decon within another function, call it newbreaks and then handle
environments and calls. While not necessary, to make it clear which functions
are being called we will give use the name local_decon within the newbreaks
function.

newbreaks <- function(form, nb) {
  local_decon <- function(x, nb) {
    if (is.call(x) && grepl("cut", deparse(x[[1]]))) {
      x$breaks <- nb
      x
    } else if (is.recursive(x)) {
      as.call(lapply(as.list(x), local_decon, nb))
    } else {
      x
    }
  }

  out <- lapply(as.list(form), local_decon, nb)
  out <- eval(as.call(out))
  environment(out) <- environment(form)
  out
}

newbreaks(f, c(0, 3)) 
## price ~ color + cut + clarity + cut(carat, breaks = c(0, 3), 
##     right = FALSE)

This is great! Now we are able to modify the breaks within a cut call
within a formula! In practice we could do the following:

new_fit_1 <-
  update(cut_fit_1, formula = newbreaks(formula(cut_fit_1), c(0, 1, 2)))
new_fit_2 <-
  update(cut_fit_1, formula = newbreaks(formula(cut_fit_1), c(0, 1, 3, 5)))
new_fit_3 <-
  update(cut_fit_1, formula = newbreaks(formula(cut_fit_1), seq(0, 5.5, by = 1.25)))

names(coef(new_fit_1))
##  [1] "(Intercept)"                                        
##  [2] "cutGood"                                            
##  [3] "cutVery Good"                                       
##  [4] "cutPremium"                                         
##  [5] "cutIdeal"                                           
##  [6] "colorE"                                             
##  [7] "colorF"                                             
##  [8] "colorG"                                             
##  [9] "colorH"                                             
## [10] "colorI"                                             
## [11] "colorJ"                                             
## [12] "claritySI2"                                         
## [13] "claritySI1"                                         
## [14] "clarityVS2"                                         
## [15] "clarityVS1"                                         
## [16] "clarityVVS2"                                        
## [17] "clarityVVS1"                                        
## [18] "clarityIF"                                          
## [19] "cut(carat, breaks = c(0, 1, 2), right = FALSE)[1,2)"
names(coef(new_fit_2))
##  [1] "(Intercept)"                                           
##  [2] "cutGood"                                               
##  [3] "cutVery Good"                                          
##  [4] "cutPremium"                                            
##  [5] "cutIdeal"                                              
##  [6] "colorE"                                                
##  [7] "colorF"                                                
##  [8] "colorG"                                                
##  [9] "colorH"                                                
## [10] "colorI"                                                
## [11] "colorJ"                                                
## [12] "claritySI2"                                            
## [13] "claritySI1"                                            
## [14] "clarityVS2"                                            
## [15] "clarityVS1"                                            
## [16] "clarityVVS2"                                           
## [17] "clarityVVS1"                                           
## [18] "clarityIF"                                             
## [19] "cut(carat, breaks = c(0, 1, 3, 5), right = FALSE)[1,3)"
## [20] "cut(carat, breaks = c(0, 1, 3, 5), right = FALSE)[3,5)"
names(coef(new_fit_3))
##  [1] "(Intercept)"                                                           
##  [2] "cutGood"                                                               
##  [3] "cutVery Good"                                                          
##  [4] "cutPremium"                                                            
##  [5] "cutIdeal"                                                              
##  [6] "colorE"                                                                
##  [7] "colorF"                                                                
##  [8] "colorG"                                                                
##  [9] "colorH"                                                                
## [10] "colorI"                                                                
## [11] "colorJ"                                                                
## [12] "claritySI2"                                                            
## [13] "claritySI1"                                                            
## [14] "clarityVS2"                                                            
## [15] "clarityVS1"                                                            
## [16] "clarityVVS2"                                                           
## [17] "clarityVVS1"                                                           
## [18] "clarityIF"                                                             
## [19] "cut(carat, breaks = c(0, 1.25, 2.5, 3.75, 5), right = FALSE)[1.25,2.5)"
## [20] "cut(carat, breaks = c(0, 1.25, 2.5, 3.75, 5), right = FALSE)[2.5,3.75)"
## [21] "cut(carat, breaks = c(0, 1.25, 2.5, 3.75, 5), right = FALSE)[3.75,5)"

Why, Why would you ever need this?

I hope the above examples would have answered this question. If not, here was
my motivation. I have been working with B-splines, a lot. My Ph.D.
dissertation focuses on B-spline regression models. I needed to be able to
update a regression object with a new formula differing only by the internal
knot locations within a spline. Originally, this meant using the splines::bs
call and adjusting the knots argument while preserving the values passed to
the degree, intercept and Boundary.knots arguments. If you are familiar
with the splines::bs call then you’ll know that the there are default
arguments to each of these after mentioned arguments.

Further, the objects I really needed to update where more complex than just an
lm object and, as new software goes, had an ever changing API. Construction of
calls was a lot of overhead. When the only thing that needed to be updated was
one argument in one call within a formula it seemed reasonable to find a
solution to do exactly what I needed and no more.

Once I publicly release my cpr package you’ll find, if you dig into the source
code, functions cpr:::newknots and cpr:::newdfs to be critical functions in
the implementation.

Acknowledgements:
I didn’t figure out how to do this completely on my own. I had posed a question
on stackoverflow which was the
basis for this post and extensions.

To leave a comment for the author, please follow the link and comment on their blog: Peter E. DeWitt.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)