Survive R

September 29, 2009
(This article was first published on Win-Vector Blog » R, and kindly contributed to R-bloggers)

New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009).

We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues (see: Choose your weapon: Matlab, R or something else? and R and data). While we do like R (see: Exciting Technique #1: The “R” language) we also understand the need to defend oneself against the abuse regularly dished out by R. Here we will quickly share a few fighting techniques.

If you are not already using R the following will not mean much. If you are using R this may scratch a few itches.

  • First: Write down everything; keep notes in a separate file.

    When you do figure out how to do something in R, the answer will be concise, powerful, completely un-mnemonic and impossible to find again through the help system.

  • Second: Find some way to search for R answers.

    http://stackoverflow.com/questions/102056/how-to-search-for-r-materials

  • Third: Learn unclass().

    # Here is an example of fitting a generalized linear model (from the help(glm) documentation)
    ## Dobson (1990) Page 93: Randomized Controlled Trial :
    > counts <- c(18,17,15,20,10,20,25,13,12)
    > outcome <- gl(3,1,9)
    > treatment <- gl(3,3)
    > glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())
    
    

    Want to get the model coefficients and don't feel like suffering through the documentation/help system? Typing glm.D93 at the prompt won't show you how the object is built, because the "glm" class supplies its own print() and summary() methods that hide the details (in particular you can't see the member data). No problem, type this:

    > model <- unclass(glm.D93)
    

    The model is now a harmless list without a bunch of pesky methods hiding the information.
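
    A minimal sketch of what the unclassed object lets you do, assuming the glm.D93 fit from the example above. The calls to names(), all.equal() and coef() are ordinary base/stats R functions added here for illustration; they are not part of the original example.

    ```r
    # Refit the Dobson example so this block runs on its own
    counts <- c(18,17,15,20,10,20,25,13,12)
    outcome <- gl(3,1,9)
    treatment <- gl(3,3)
    glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())

    model <- unclass(glm.D93)

    # With the class attribute stripped, the object is treated as a plain list
    class(model)
    names(model)

    # The coefficients are just one named element of that list
    model$coefficients

    # The accessor coef() pulls the same numbers out of the classed object
    all.equal(model$coefficients, coef(glm.D93))
    ```

    Once names(model) has shown you which elements exist, you can usually go back to the classed object and use the same $ access (for example glm.D93$coefficients) without unclassing at all.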

  • Fourth: Learn how to list classes and methods.

    Often one of methods(), showMethods() or getS3method() can show you what methods are on a class or object. Be prepared to try them all as they apply in different contexts.

    # let's make a tricky generic function
    > fe <- function(x) UseMethod("fe")
    > fe.formula <- function(x) { print('formula')}
    > fe.numeric <- function(x) { print('numeric')}
    

    How will anyone figure out what we have done?

    > class(fe)
    [1] "function"
    
    > methods(fe)
    [1] fe.formula fe.numeric
    
    > getS3method('fe','numeric')
    function(x) { print('numeric')}
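
    The transcript above covers S3 dispatch. As a hedged sketch of one of the "different contexts", here is the S4 counterpart, where showMethods() is the tool that applies. The class Square and the generic area are hypothetical names made up for this illustration.

    ```r
    library(methods)  # normally attached already; harmless to load again

    # Hypothetical S4 class and generic, for illustration only
    setClass("Square", representation(side="numeric"))
    setGeneric("area", function(shape) standardGeneric("area"))
    setMethod("area", "Square", function(shape) shape@side^2)

    # S4 introspection: is it a generic, and does a method exist for a class?
    isGeneric("area")
    existsMethod("area", "Square")

    # List the method signatures, then fetch one method definition
    showMethods("area")
    getMethod("area", "Square")

    # Dispatch works as usual
    area(new("Square", side=3))
    ```

    For S3, the same question asked from the class side is methods(class="glm"), which lists the S3 methods registered for that class.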
    

  • Fifth: Learn to stomp out attributes.

    Ever have this crud follow you around?

    > m <- summary(c(1,2))[4]
    > m
    Mean
     1.5
    

    Ah that’s cute: a little “Mean” tag is following the data around. But what if we try to use this value:

    > m*m
    Mean
    2.25
    

    Okay, now the “Mean” tag has outstayed its welcome. The fix:

    > attributes(m) <- c()
    > m*m
    [1] 2.25
    

    MUCH better.
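
    If you would rather not modify m in place, a small sketch of some standard base R alternatives (these are additions for illustration, not part of the original tip):

    ```r
    m <- summary(c(1,2))[4]

    # as.numeric() returns a bare number with names and other attributes dropped
    x <- as.numeric(m)
    attributes(x)      # NULL
    x * x

    # Double-bracket extraction by name also returns the bare element
    y <- m[["Mean"]]
    y * y

    # unname() removes only the names and leaves any other attributes alone
    z <- unname(m)
    names(z)           # NULL
    ```

    Which one to use depends on how much you want to strip: as.numeric() is the bluntest of the three, while unname() touches nothing but the names.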

  • Sixth: Swallow your pride.

    My example: does R have map structures? I have no idea and I am too ashamed to ask. However, I know I can fake it with environments (which may be “the R way to do this” or may be “a horrible abuse of the language”; I have no idea which).

    > map <- new.env(hash=TRUE,parent=emptyenv())
    > assign('dog',7,map)
    > ls(map)
    [1] "dog"
    > get('dog',envir=map)
    [1] 7
    

    That (nearly) gives you maps with string keys. For maps with numeric keys we can fake something else up with findInterval(). For maps keyed by generic comparable objects, I have no idea how you would trick R into helping. This is one reason we like to separate out all data-preparation into a pre-processing step implemented in Java or SQL.

    Note important correction from Eward Ratzer: use “map <- new.env(hash=TRUE,parent=emptyenv())”, see comments.
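
    To round out the environment-as-map idea, here is a hedged sketch of the usual map operations (insert, membership test, lookup with default, delete), plus a findInterval() lookup for numeric keys. The helper lookup() and the variables breaks and size_names are hypothetical names introduced for this illustration.

    ```r
    map <- new.env(hash=TRUE, parent=emptyenv())

    # Insert or update with [[<-, which behaves like assign() here
    map[["dog"]] <- 7
    map[["cat"]] <- 3

    # Membership test; inherits=FALSE keeps the search inside this environment
    exists("dog", envir=map, inherits=FALSE)
    exists("bird", envir=map, inherits=FALSE)

    # Hypothetical helper: lookup with a default value for missing keys
    lookup <- function(map, key, default=NA) {
      if (exists(key, envir=map, inherits=FALSE)) get(key, envir=map) else default
    }
    lookup(map, "dog")
    lookup(map, "bird", 0)

    # Delete a key and list what is left
    rm("cat", envir=map)
    ls(map)

    # Numeric keys: findInterval() returns the index of the bracket each value falls in
    breaks <- c(0, 10, 100)
    size_names <- c("small", "medium", "large")
    size_names[findInterval(c(5, 50, 500), breaks)]
    ```

    The parent=emptyenv() setting matters for the membership test: with the default parent, a lookup that is allowed to inherit could find unrelated objects from enclosing environments.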

  • Seventh: Find and rely on “the one-liners.”

    Reading in an entire comma-separated file (read.table()), re-aggregating data (table() or doBy’s summaryBy() command) or building an empirical cumulative distribution function (ecdf()), each in a single line of code, is an experience not to be missed.
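
    A small sketch of those one-liners in action. The data is made up, and an inline character vector stands in for a file on disk; aggregate() is used as a base R stand-in for summaryBy() so the example needs no extra packages.

    ```r
    # Made-up CSV content; with a real file you would pass its path instead of text=
    csv_lines <- c("animal,weight", "dog,7", "cat,3", "dog,9")

    # One line to read the whole table
    d <- read.table(text=csv_lines, header=TRUE, sep=",")

    # One line to count rows per group
    counts_by_animal <- table(d$animal)

    # One line for per-group means (base R alternative to doBy's summaryBy())
    mean_by_animal <- aggregate(weight ~ animal, data=d, FUN=mean)

    # One line to build the empirical cumulative distribution function
    Fn <- ecdf(d$weight)
    Fn(7)   # fraction of observed weights that are <= 7
    ```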

The overall point is that while R has some (unnecessarily) sharp edges and pain points, it is a powerful tool worth using. I would much rather struggle through a minor R-language issue when preparing my data than do without the many special functions, distributions, fitters and plotters built into the R system.
