Friday function triple bill: with vs. within vs. transform

April 29, 2011

(This article was first published on 4D Pie Charts » R, and kindly contributed to R-bloggers)

When you first learnt about data frames in R, I’m sure that, like me, you thought “This is a lot of hassle having to type the names of data frames over and over in order to access each column”.

anorexia$wtDiff <- anorexia$Postwt - anorexia$Prewt #I have to type anorexia how many times?

Indeed, any time you see chunks of code repeated over and over, there’s an indication that they need rewriting. Thus the first time you discovered the attach function was a blissful moment. Ah, the hours you would save by not typing variable names! Alas, those hours were more than made up for by the hundreds of hours you spent debugging impenetrable buggy code that was a side effect of attach.

anorexia$wtDiff <- Postwt - Prew   #Deliberate typo!

In the above snippet of code, the typo causes execution to halt after the second line, so the call to detach never happens. Then, after you’ve fixed the typo and run the code again, anorexia is on your search path twice. This is problematic because when you detach it, there is still a copy of the data frame on the search path. Cue wailing and gnashing of teeth as you waste half an hour trying to find the bug.

Today we’re going to look at three functions that let you manipulate data frames, without the nasty side-effects of attachwith, within and transform.

For adding (or overwriting) a column to a data frame, like in the above example, any of the three functions is perfectly adequate; they just have slightly different syntaxes. with often has the most concise formulation, though there isn’t much in it.

anorexia$wtDiff <- with(anorexia, Postwt - Prewt)
anorexia <- within(anorexia, wtDiff2 <- Postwt - Prewt)
anorexia <- transform(anorexia, wtDiff3 = Postwt - Prewt)

For multiple changes to the data frame, all three functions can still be used, but now the syntax for with is more cumbersome. I tend to favour within or transform in these situations.

fahrenheit_to_celcius <- function(f) (f - 32) / 1.8
airquality <- with(airquality, list(
airquality <- within(airquality,
  cTemp2     <- fahrenheit_to_celcius(Temp)
  logOzone2  <- log(Ozone)
  MonthName2 <-[Month]
airquality <- transform(airquality,
  cTemp3     = fahrenheit_to_celcius(Temp),
  logOzone3  = log(Ozone),
  MonthName3 =[Month]

The most important lesson to take away from this is that if you are manipulating data frames, then with, within and transform can be used almost interchangeably, and all of them should be used in preference to attach. For further refinement, I prefer with for single updates to data frames, and within or transform for multiple updates.

Tagged: data-manipulation, data-transformation, r, with

To leave a comment for the author, please follow the link and comment on their blog: 4D Pie Charts » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , ,

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)