Very Non-Standard Calling in R

December 3, 2018

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Our group has done a lot of work with non-standard calling conventions in R.

Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). Even so, we are still surprised by some of the side effects and ill consequences of the overuse of non-standard calling conventions in R.

Please read on for a recent example.

Consider the following calls to stats::lm(), and notice that the third example fails (throws an error).

# works
lm("y ~ x", 
   data = data.frame(
     x=1:5, 
     y = c(1, 1, 2, 2, 2)), 
   weights = numeric(5)+1)
#> 
#> Call:
#> lm(formula = "y ~ x", data = data.frame(x = 1:5, y = c(1, 1, 
#>     2, 2, 2)), weights = numeric(5) + 1)
#> 
#> Coefficients:
#> (Intercept)            x  
#>         0.7          0.3


# works
f1 <- function(w = NULL) {
  lm(as.formula("y ~ x"), 
     data = data.frame(
       x=1:5, 
       y = c(1, 1, 2, 2, 2)), 
     weights = w)
}
f1(numeric(5)+1)
#> 
#> Call:
#> lm(formula = as.formula("y ~ x"), data = data.frame(x = 1:5, 
#>     y = c(1, 1, 2, 2, 2)), weights = w)
#> 
#> Coefficients:
#> (Intercept)            x  
#>         0.7          0.3


# fails
f2 <- function(w = NULL) {
  lm("y ~ x", 
     data = data.frame(
       x=1:5, 
       y = c(1, 1, 2, 2, 2)), 
     weights = w)
}
f2(numeric(5)+1)
#> Error in eval(extras, data, env): object 'w' not found

According to the stats::lm() documentation (help(lm)), the first argument must be:

an object of class “formula” (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

A string appears to be coercible into a formula, so all three examples should work. However, typing “print(lm)” reveals the issue: stats::lm() doesn’t take the “weights” argument in the standard way (as the value of a function parameter). Instead it retrieves it through a sequence of match.call() and eval() steps. This is a complicated way to get the value, and it works until it does not. Somehow, passing in the formula as a string interferes with how the value of weights is found. I think we can now see the benefits of isolation and independence of concerns in code.
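
One way to read the failure, going by the help(lm) statement that weights is evaluated first in data and then in the environment of formula: a formula object remembers the environment where it was created, while a string has no environment until something coerces it. The sketch below is only an illustration of that reading, not a description of the internals; make_formula() and fit_with_string() are hypothetical names invented here.

```r
# A formula carries the environment in which it was created.
# make_formula() and fit_with_string() are hypothetical helpers for this sketch.
make_formula <- function(formula_string) {
  w <- "a local variable"           # exists only in this call's frame
  f <- as.formula(formula_string)   # coerced here, so f captures this frame
  # Is w reachable from the formula's own environment?
  exists("w", envir = environment(f), inherits = FALSE)
}
w_visible <- make_formula("y ~ x")
print(w_visible)

# Coercing the string inside the calling function keeps w reachable,
# which is the same pattern as f1 above.
fit_with_string <- function(formula_string, w = NULL) {
  d <- data.frame(x = 1:5, y = c(1, 1, 2, 2, 2))
  f <- as.formula(formula_string)   # environment(f) is this function's frame
  lm(f, data = d, weights = w)
}
fit <- fit_with_string("y ~ x", numeric(5) + 1)
print(coef(fit))
```

Under this reading, f2 fails because the string is coerced later, somewhere the caller's w is not in scope.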

This overuse of direct environment copying and manipulation is what leads to a great many data leaks in stats::lm() and stats::glm(). This is in addition to their weird habit of keeping a copy of all of the training data in the fitted model (which loses quite a few of the merits of these methods). Our group dealt with these issues a long time ago, so we are somewhat familiar with stats::lm() and stats::glm().
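
Both effects can be observed on a small fit. This is a minimal sketch, assuming the default model = TRUE setting of lm(); fit_model() and unrelated_local are hypothetical names used only for the demonstration.

```r
# fit_model() is a hypothetical wrapper; unrelated_local plays no role in the fit.
fit_model <- function() {
  unrelated_local <- numeric(10)
  d <- data.frame(x = 1:5, y = c(1, 1, 2, 2, 2))
  lm(y ~ x, data = d)
}
fitted_obj <- fit_model()

# The fitted object carries the model frame, i.e. the training rows it used.
kept_rows <- nrow(fitted_obj$model)

# The model's terms still reference the frame of fit_model(),
# so locals from that call stay reachable through the returned object.
leaks_local <- exists("unrelated_local",
                      envir = environment(terms(fitted_obj)),
                      inherits = FALSE)

# Asking lm() not to keep the model frame removes that one copy.
no_frame <- is.null(lm(y ~ x,
                       data = data.frame(x = 1:5, y = c(1, 1, 2, 2, 2)),
                       model = FALSE)$model)

print(kept_rows)
print(leaks_local)
print(no_frame)
```

Setting model = FALSE only drops the stored model frame; it does not by itself detach the environment referenced by the terms.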

Of course, one could (as the stats::lm() documentation mentions) call stats::lm.fit(). However, stats::lm.fit() does not seem to accept weights, and its own documentation (help(lm.fit)) starts ominously:

These are the basic computing engines called by lm used to fit linear models. These should usually not be used directly unless by experienced users.
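
For readers who do want the direct route, the same help page also lists stats::lm.wfit(), which takes the weights as an ordinary argument w. A minimal sketch, assuming the design matrix is built by hand with model.matrix(); the variable names are illustrative only.

```r
# Weighted fit through the low-level fitter; the weights are a plain argument.
d <- data.frame(x = 1:5, y = c(1, 1, 2, 2, 2))
w <- numeric(5) + 1

# Build the design matrix explicitly (intercept column plus x).
X <- model.matrix(~ x, data = d)

wfit <- lm.wfit(x = X, y = d$y, w = w)
print(wfit$coefficients)
```

The result is a plain list rather than an "lm" object, so conveniences such as summary() and predict() are not available on it, which is part of why the documentation steers most users back to lm().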

Having just finished teaching a four-day intensive course covering data science in Python, I can’t help but remark that users of sklearn.linear_model.LinearRegression() don’t need to worry about issues such as the above. Some of the notational flair of R comes at the cost of significant opportunities for user confusion.
