Converting strsplit() output to a data.frame

January 28, 2011
(This article was first published on Gregor Gorjanc (gg), and kindly contributed to R-bloggers)

R has a nice set of utilities for working with strings, and the paste function is surely one of them. It can be used to "glue" several strings together with an optional separator. The following example shows how paste can be used to create a new variable in a dataset:
dat <- data.frame(x=1:5, y=letters[1:5])
(dat$z <- with(dat, paste(x, y, sep="-")))
Today I was in a situation where I only had column z and wanted to reverse the action of paste. Is there a way to do it? Not directly (AFAIK), but strsplit seems to be quite useful for this:
(tmp <- strsplit(x=dat$z, split="-"))
However, strsplit returns a list with one element (a character vector) per element of my column z, rather than one element per split component. Consequently, the output cannot easily be converted back to a data.frame of the shape I wanted, as you can test yourself with:
as.data.frame(tmp)
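To make the shape problem concrete, here is a small sketch that rebuilds the example and inspects the list. The name wide is introduced here only for illustration.

```r
# Rebuild the example data and split column z
dat <- data.frame(x = 1:5, y = letters[1:5])
dat$z <- with(dat, paste(x, y, sep = "-"))
tmp <- strsplit(dat$z, split = "-")

length(tmp)          # one list element per row of dat
sapply(tmp, length)  # number of split components in each element

# Each list element becomes a column, so the result is "sideways":
# rows correspond to components, columns to the original rows
wide <- as.data.frame(tmp)
dim(wide)
```

The conversion itself runs, but the components end up in rows and the automatically generated column names are not usable.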
Argh. I understand that strsplit is meant to be very general (for example, the elements could have an unequal number of components, as in c("1-a-0", "1-a")), but its output is really inconvenient to transform into a data.frame. I came up with the following solution, which seems to work nicely and is quite fast. Note that it assumes every element of z splits into the same number of components.
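The uneven case mentioned above can be checked before any conversion. The following is only a sketch: the names z, parts, n_parts, all_equal, padded and mat are made up for illustration, and padding with NA is one possible way to handle it, not something the original approach does.

```r
# The uneven example from the text
z <- c("1-a-0", "1-a")
parts <- strsplit(z, split = "-", fixed = TRUE)

# Count the components in each element and check whether they agree
n_parts <- sapply(parts, length)
all_equal <- length(unique(n_parts)) == 1

# One option: pad the shorter elements with NA up to the longest one
padded <- lapply(parts, function(p) {
  length(p) <- max(n_parts)  # extending a vector fills with NA
  p
})

# Stack the padded vectors as rows of a character matrix
mat <- do.call(rbind, padded)
```

If all_equal is TRUE the padding step changes nothing, so the same code covers both situations.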
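# Flatten the list; the components of each row are stored consecutively
tmp <- unlist(strsplit(dat$z, split="-"))
cols <- c("x2", "y2")
nC <- length(cols)
# Position of the first component of every row in the flattened vector
ind <- seq(from=1, by=nC, length.out=nrow(dat))
for(i in seq_len(nC)) {
  # Shift by i - 1 to pick the i-th component of every row
  dat[, cols[i]] <- tmp[ind + i - 1]
}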
Does anyone have a better (more obvious) solution?
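One alternative, offered as a sketch rather than a definitive answer, is to stack the list elements as rows with do.call and rbind. It assumes every element splits into the same number of components; the names parts and dat2 are introduced here for illustration.

```r
# Rebuild the example data
dat <- data.frame(x = 1:5, y = letters[1:5])
dat$z <- with(dat, paste(x, y, sep = "-"))

# Each list element becomes one row of a character matrix
parts <- do.call(rbind, strsplit(dat$z, split = "-", fixed = TRUE))
colnames(parts) <- c("x2", "y2")

# Attach the recovered columns to the original data
dat2 <- cbind(dat, as.data.frame(parts, stringsAsFactors = FALSE))

# All components come back as character; convert where needed
dat2$x2 <- as.integer(dat2$x2)
```

This avoids the explicit index arithmetic, at the cost of building an intermediate matrix in which every component is a character string.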
