Using the plyr package

January 18, 2014

(This article was first published on Dan Kelley Blog/R, and kindly contributed to R-bloggers)

Introduction

The base R system provides lapply() and related functions, and the package plyr provides alternatives that are worth considering. It will be assumed that readers are familiar with lapply() and are willing to spend a few moments reading the plyr documentation, to see why the illustration here will use the ldply() function.

The test task will be extraction of latitude (and then both latitude and longitude) from the section dataset in the oce package. (Users of that package may be aware that there is a built-in accessor for doing this, so results can easily be checked.)

Methods

First, load the data

library(oce)
data(section)

Next, find latitudes using lapply

lat <- unlist(lapply(section[["station"]], function(x) x[["latitude"]]))

Next, find latitudes with ldply

library(plyr)
lat <- ldply(section[["station"]], function(x) x[["latitude"]])
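The two approaches can be compared directly. The following is a minimal sketch that assumes a small hypothetical stand-in list (stations, with made-up coordinates) in place of section[["station"]], so that it runs without the oce package; the names lat_vec and lat_df are likewise illustrative only.

```r
library(plyr)

# Hypothetical stand-in for section[["station"]]: a list of stations,
# each holding a latitude and a longitude (values are made up).
stations <- list(
  list(latitude = 36.4, longitude = -70.0),
  list(latitude = 36.6, longitude = -69.5),
  list(latitude = 36.9, longitude = -69.0)
)

# Base R: lapply() returns a list, which unlist() flattens to a vector.
lat_vec <- unlist(lapply(stations, function(x) x[["latitude"]]))

# plyr: ldply() returns a data frame; the unnamed result lands in column V1.
lat_df <- ldply(stations, function(x) x[["latitude"]])

# Compare the vector with the data-frame column.
all.equal(lat_vec, lat_df$V1)
```

The same comparison should apply to the real section data, with section[["station"]] substituted for the stand-in list.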

Results

The reader can check that the results match, although ldply() returns a data frame, not a vector as in the first method.

Tests of speed

library(microbenchmark)
microbenchmark(ldply(section[["station"]], function(x) x[["latitude"]])$V1, 
    unlist(lapply(section[["station"]], function(x) x[["latitude"]])))

yield the following

## Unit: milliseconds
##                                                               expr   min    lq median    uq   max neval
##        ldply(section[["station"]], function(x) x[["latitude"]])$V1 18.99 20.26  20.56 21.02 36.05   100
##  unlist(lapply(section[["station"]], function(x) x[["latitude"]])) 18.36 19.71  19.93 20.64 63.18   100

suggesting a difference too small to be of much practical interest.

Discussion

Since ldply() returns a data frame, it is more flexible than the unlist(lapply()) approach, which returns a vector. For example, the following creates a data frame with columns for latitude and longitude (named V1 and V2 by default):

latlon <- ldply(section[["station"]], function(x) c(x[["latitude"]], x[["longitude"]]))
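If more descriptive column names are wanted, one option is to have the function return a named vector, since ldply() carries those names over to the columns. The sketch below assumes a small hypothetical stand-in list (stations, with made-up coordinates) rather than the oce section data, and the name latlon_named is illustrative only.

```r
library(plyr)

# Hypothetical stand-in for section[["station"]] (values are made up).
stations <- list(
  list(latitude = 36.4, longitude = -70.0),
  list(latitude = 36.6, longitude = -69.5),
  list(latitude = 36.9, longitude = -69.0)
)

# Returning a named vector gives named columns instead of V1 and V2.
latlon_named <- ldply(stations, function(x) {
  c(latitude = x[["latitude"]], longitude = x[["longitude"]])
})

# Inspect the column names and the first rows.
names(latlon_named)
head(latlon_named)
```

With named columns, later code can refer to latlon_named$longitude and latlon_named$latitude rather than to V2 and V1.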

A station plot is produced as follows.

data(coastlineWorld)  # coastline dataset supplied with oce
mapPlot(coastlineWorld, projection = "orthographic", orientation = c(20, -40, 0))
mapPoints(latlon$V2, latlon$V1, pch = "+", cex = 1/2, col = "red")

(Figure: map showing the station locations.)

Conclusions

The effort of learning how to use the plyr package is likely to pay off in more flexible code, particularly because of the use of data frames in that package. On this theme, note that the author of plyr is developing a similar package called dplyr, which centres more closely on data frames and offers many new features; see http://blog.rstudio.org/2014/01/17/introducing-dplyr/ for a blog item introducing dplyr.
