RStudio has improved the power of auto-completion in R and generally increased usability. However, there remains room to improve discoverability and usability further. There are also coding practices that R package authors can adopt, both to work better with auto-complete and to make the features of their packages more discoverable. Having used and taught R for the last ten years, I outline in this post what I see as the major areas for potential improvement.
R has a reputation for being efficient once you know how it works, but difficult to learn. Auto-completion can help with this in several ways:
- Users don’t have to memorise the precise spelling of the name of every function, argument name, data frame variable, and argument value. It also helps to resolve the issue of the wide range of coding conventions in R (camelCase, dot.names, under_score_names, etc.).
- It means that users can focus more on coding and less on looking up help files for the precise phrasing of some low level feature, or constantly typing dput(names(mydata)) to get lists of variable names.
- New users may also know what they are looking for, but not know how to obtain it. Auto-completion can facilitate this.
Auto-completion of arguments that take a character variable
Example 1: If I have missing data when computing a correlation matrix, I use the “use” argument to specify how the missing data should be handled. It would be good if code completion operated on the available options. That said, at least RStudio automatically shows the argument description, which lists the options.
Example 2: The options for the useNA and exclude arguments of table.
Example 3: The rotation argument for factanal does not list the available rotations. The help only states that the default is “varimax” and that there are other rotations in some other packages, although the help file does show “promax” as another option.
- Package authors should ensure that the help files list all argument options in the “arguments” section of the help file. If using “see details”, at least list the permissible option names in the arguments section and use “see details” for actually putting the details of what each of the arguments means. RStudio displays the argument information in auto-complete. Often a user just wants to be reminded of the precise spelling for the argument option or wishes to get an overview of the choices.
- It should be possible to enable auto-completion on the available options. I imagine this would involve the specification of additional language features in R which would then be detected by IDEs like RStudio.
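One convention already available to package authors is to list the permissible values as the default of the argument and validate the choice with match.arg, so that the options appear in the function signature. A minimal sketch follows; rotate_loadings and its options are hypothetical names used only for illustration, and whether a given IDE offers completion on these values is an assumption rather than something R guarantees.

```r
# Hypothetical function: the permissible options are spelled out in the
# signature, so they are visible wherever the signature is displayed.
rotate_loadings <- function(x, rotation = c("varimax", "promax", "none")) {
  # match.arg() picks the first option when none is given and rejects
  # anything that is not in the list (partial matching is allowed).
  rotation <- match.arg(rotation)
  rotation
}

# The options can be recovered programmatically from the signature.
rotation_choices <- eval(formals(rotate_loadings)$rotation)
print(rotation_choices)

# Base R follows the same convention for some arguments, e.g. useNA in table().
table_useNA_choices <- eval(formals(table)$useNA)
print(table_useNA_choices)

default_choice <- rotate_loadings(matrix(0, 2, 2))
partial_choice <- rotate_loadings(matrix(0, 2, 2), rotation = "pro")
```

Because the choices live in the formals, any tool that can read a function signature could in principle offer them as completions.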
Auto-completion for nested ellipsis arguments
Example 1: I’m running a factor analysis
fit <- factanal(matrix(rnorm(1000), ncol = 10), 2)
The code for printing the loadings has several arguments, including “sort” and “cutoff”, i.e.,
print(fit, sort = TRUE, cutoff = .5)
But auto-complete doesn’t see these arguments, even though RStudio generally does a pretty good job of finding arguments. It seems that these arguments belong to “print.loadings” as opposed to “print.factanal”. Thus, if you go:
loads <- fit$loadings
Then, pressing tab inside a print call on loads will show the cutoff and sort arguments.
However, it seems that RStudio is only able to go one layer deep.
I imagine that this is a hard one to solve.
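To see where the arguments actually live, the formal arguments of the two S3 methods can be compared. This is only a sketch for inspecting the methods that ship with the stats package; it says nothing about how RStudio resolves completions internally.

```r
# Formal arguments of the two S3 methods involved in print(fit, ...)
factanal_print_args <- names(formals(getS3method("print", "factanal")))
loadings_print_args <- names(formals(getS3method("print", "loadings")))

print(factanal_print_args)  # sort and cutoff are not declared here; they travel via ...
print(loadings_print_args)  # sort and cutoff are declared here
```

An IDE would need to follow the ... from print.factanal into print.loadings to offer these arguments, which is the extra layer described above.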
Auto-completion of variable names in data frames
- RStudio should also auto-complete variable names after mydata[, c(“ (i.e., after an opening quotation mark). Presumably that is how the user would be selecting variables when they realise that they can’t remember the precise spelling and need to tab complete.
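Until completion works inside quotation marks, one workaround is to search the variable names by a remembered fragment. The sketch below uses the built-in iris data in place of mydata; the pattern is just an example.

```r
# Find column names by a remembered fragment, using the built-in iris data.
sepal_vars <- grep("^Sepal", names(iris), value = TRUE)
print(sepal_vars)

# The matched names can then be used to select the columns.
sepal_subset <- iris[, sepal_vars]
head(sepal_subset)
```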
Auto-completion on formulas
Auto-completion in the Hadleyverse (e.g., ggplot2) and other functions where a data frame is one argument and variable names are another
Auto-completion for function arguments that take lists
- nls(…, control = list(…))
- ProjectTemplate::load.project(override.config = list(…))
- Package authors: The list of permissible argument names should be included in the arguments section of the help file so that auto-completion software could quickly show this information.
- R language: There should be a way to specify the permissible arguments, which could then be incorporated into some form of auto-complete in RStudio.
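Some functions already come with a constructor for their control list, which makes the permissible names discoverable. For nls, that constructor is nls.control. The sketch below only inspects it; the exact set of names depends on the R version, so none are listed here as definitive.

```r
# nls.control() is a constructor for the list passed to nls(control = ...).
# Its formal arguments are the permissible element names.
control_names <- names(formals(nls.control))
print(control_names)

# Calling the constructor fills in defaults for everything not specified.
ctrl <- nls.control(maxiter = 100)
str(ctrl)
```

Writing control = nls.control( and pressing tab is then an ordinary function-argument completion, which is one reason a constructor function can be friendlier than a bare list(…).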
Some other issues
Make more model fit information accessible from the fit object
Avoid printing output to the screen that cannot easily be extracted
R generally makes reproducible analysis easier to perform. A common use case is to take the output of a function and use that output in a subsequent function. This can be as simple as creating a table that combines different elements (e.g., coefficients from multiple models along with fit statistics).
However, some functions print the statistics you want to the screen without making those numbers readily available. In general, this means that the print function is performing the calculations and printing them to the screen, without ever storing the results in an object.
Example 1: The print method for factanal prints proportion variance explained for each factor. This is calculated in the print function but is not accessible. If you didn’t know how to calculate this yourself, you would have to know that getAnywhere(print.factanal) is the incantation for seeing how R calculates it, and then you’d have to extract the code that does it.
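For reference, the proportion of variance can be recomputed from the loadings. The sketch below assumes the usual definition (the sum of squared loadings for each factor divided by the number of variables), which, as far as I can tell, is what the printed output reports. It uses the ability.cov example from the factanal help file rather than simulated data.

```r
# Fit a two-factor model to a covariance matrix that ships with R.
fit <- factanal(factors = 2, covmat = ability.cov)

# Drop the "loadings" class to work with a plain matrix.
load_mat <- unclass(loadings(fit))

# Sum of squared loadings per factor, then as a proportion of the variables.
ss_loadings <- colSums(load_mat^2)
prop_var <- ss_loadings / nrow(load_mat)
cum_var <- cumsum(prop_var)

var_table <- rbind(
  "SS loadings" = ss_loadings,
  "Proportion Var" = prop_var,
  "Cumulative Var" = cum_var
)
print(round(var_table, 3))
```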
In contrast, when you run summary on an lm fit, you can explore the object and extract things like the adjusted R-squared. E.g.,
fit <- lm(y ~ x, mydata)
sfit <- summary(fit)
This will show the elements of what has been calculated. Depending on trade-offs for computation time, it might even be simpler if more of these relevant summary statistics were calculated with the fit, so that a user only has to fit the object and can then extract the relevant information with fit$ (tab).
- Package authors should try to ensure that for every important bit of output in a print function, there should be a standard way of extracting that information into an object. For example, the summary method for lm returns the adjusted r-squared.
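When the statistics are stored in an object, combining them across models takes only a few lines. The following is a small sketch using the built-in mtcars data; the two models and the choice of statistics are arbitrary and only for illustration.

```r
# Two example models fitted to the built-in mtcars data.
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
fits <- list(wt_only = fit1, wt_hp = fit2)

# Pull fit statistics out of each summary object into one table.
fit_table <- data.frame(
  model = names(fits),
  r_squared = sapply(fits, function(f) summary(f)$r.squared),
  adj_r_squared = sapply(fits, function(f) summary(f)$adj.r.squared),
  sigma = sapply(fits, function(f) summary(f)$sigma),
  row.names = NULL
)
print(fit_table)
```

None of this would be possible if the values only existed as text printed to the console.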
Many different object exploration operators
- $ (dollar) to extract named elements of a list (particularly used for output of statistical functions, variables in data.frames and general lists of things)
- :: (double colon) to extract functions and other objects in a package (e.g., mypackage::foo())
- ::: (triple colon) to extract hidden functions
- @ (at symbol) to extract elements of S4 class objects
- . (period) which is a notational rule relevant to understanding S3 methods (e.g., print.lm)
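The operators above can be seen side by side in a short sketch. The Point class is a hypothetical S4 class defined only for this example; everything else ships with base R.

```r
library(methods)  # S4 support; attached explicitly in case the session lacks it

# $ : named element of a list-like object
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- fit$coefficients

# :: : exported object from a package namespace
med <- stats::median(c(1, 3, 5))

# ::: : object that is not exported from the namespace
hidden_fun <- stats:::print.lm

# @ : slot of an S4 object (Point is a hypothetical class)
setClass("Point", representation(x = "numeric", y = "numeric"))
pt <- new("Point", x = 1, y = 2)
pt_x <- pt@x

# . : naming convention generic.class for S3 methods, here print for class lm
lm_print_method <- getS3method("print", "lm")
```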