Expand delimited columns in R

November 14, 2012
By Eldon Prince

(This article was first published on Eldon Prince » R-bloggers, and kindly contributed to R-bloggers)

A postdoctoral researcher asked me the other day to help him expand a vector of comma-delimited values so that he could do computations with them in R. I wrote an R function to solve the problem. Here is the before and after (with the default arguments, only the second column, Score1, is expanded):

> data
  Name      Score1   Score2
1 Bill 1,3,4,3,6,9 F1,F3,F2
2  Bob       3,2,3 F2,F2,F4
3  Sam       2,5,3 F5,F2,F4
> expand.delimited(data)
   Name Score1
1  Bill      1
2  Bill      3
3  Bill      4
4  Bill      3
5  Bill      6
6  Bill      9
7   Bob      3
8   Bob      2
9   Bob      3
10  Sam      2
11  Sam      5
12  Sam      3

# Description
# Accepts a data.frame in which column col1 holds a grouping (factor) value
# and column col2 holds comma- (or otherwise) delimited values. Each delimited
# value becomes its own row, paired with its col1 value.
# Returns a two-column data.frame.

# Usage
# expand.delimited(x, ...)

# Default
# expand.delimited(x, col1=1, col2=2, sep=",")

# Arguments
# x     A data.frame
# col1  Index of the column to act as the factor (repeated for each value)
# col2  Index of the delimited column to be expanded
# sep   Delimiter; passed to strsplit(), so it is treated as a regular expression
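Because sep is handed to strsplit(), it is interpreted as a regular expression by default. For a comma that makes no difference, but delimiters such as "|" or "." are regex metacharacters and do not split the way you might expect. A small illustration on a made-up string:

```r
# strsplit() treats its split argument as a regular expression by default
x <- "F1|F3|F2"

# "|" is regex alternation of two empty patterns, so this does NOT
# split on the literal bar
regex_split <- strsplit(x, "|")[[1]]

# Escaping the metacharacter makes the regex match a literal bar
escaped_split <- strsplit(x, "\\|")[[1]]

# fixed = TRUE skips regex matching and splits on the exact string
fixed_split <- strsplit(x, "|", fixed = TRUE)[[1]]
print(fixed_split)  # "F1" "F3" "F2"
```

So to split on a literal bar with expand.delimited(), pass sep = "\\|", or change its strsplit() call to use fixed = TRUE.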

# Download data
# Read in data
data <- read.table("expand_delimited.txt", header=TRUE)
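If the expand_delimited.txt file is not at hand, the sample data from the before/after listing can be read from an inline string instead. This sketch assumes the file is whitespace-separated and contains exactly the three rows shown above; colClasses = "character" keeps every column as text, so a cell holding a single number is not silently converted to numeric.

```r
# Re-create the sample data shown at the top of the post from a string
data <- read.table(text = "
Name Score1      Score2
Bill 1,3,4,3,6,9 F1,F3,F2
Bob  3,2,3       F2,F2,F4
Sam  2,5,3       F5,F2,F4
", header = TRUE, colClasses = "character")

str(data)
```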

# Function to expand data
expand.delimited <- function(x, col1=1, col2=2, sep=",") {
  # Expand one row: split the delimited cell into its pieces and
  # repeat the col1 value once per piece
  expand_row <- function(y) {
    factr <- y[[col1]]
    expand <- strsplit(as.character(y[[col2]]), sep)[[1]]
    data.frame(factor=rep(factr, length(expand)), expand=expand,
               stringsAsFactors=FALSE)
  }
  # apply() hands each row to expand_row() as a character vector and
  # returns a list of data.frames, which rbind stacks into one
  expanded <- apply(x, 1, expand_row)
  df <- do.call("rbind", expanded)
  rownames(df) <- NULL  # number the rows 1..n
  names(df) <- c(names(x)[col1], names(x)[col2])
  return(df)
}
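The same reshaping can also be done without a per-row helper, by splitting the whole column in one strsplit() call and repeating the factor values with rep() and lengths(). This is a swapped-in vectorized technique, not the function above; the name expand_delimited_vec is hypothetical, it assumes col1 and col2 are column indices, and it uses fixed = TRUE so sep is matched literally.

```r
# Hypothetical vectorized variant of expand.delimited()
expand_delimited_vec <- function(x, col1 = 1, col2 = 2, sep = ",") {
  # Split every cell of the delimited column at once; fixed = TRUE
  # treats sep as a literal string rather than a regular expression
  pieces <- strsplit(as.character(x[[col2]]), sep, fixed = TRUE)
  # Repeat each col1 value once per piece, then flatten the pieces
  out <- data.frame(key   = rep(as.character(x[[col1]]), lengths(pieces)),
                    value = unlist(pieces),
                    stringsAsFactors = FALSE)
  names(out) <- names(x)[c(col1, col2)]
  out
}

# Example on the sample data from the post
data <- data.frame(
  Name   = c("Bill", "Bob", "Sam"),
  Score1 = c("1,3,4,3,6,9", "3,2,3", "2,5,3"),
  Score2 = c("F1,F3,F2", "F2,F2,F4", "F5,F2,F4"),
  stringsAsFactors = FALSE
)
expand_delimited_vec(data)            # expands Score1
expand_delimited_vec(data, col2 = 3)  # expands Score2 instead
```

Splitting the column in one call avoids apply(), which first converts the whole data.frame to a character matrix, and avoids building and rbind-ing one small data.frame per row. One behavioral difference to keep in mind: an empty cell splits into zero pieces, so its row disappears from the output.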

# Example
expand.delimited(data)
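The expanded values are still strings, so they need converting before any arithmetic. As a minimal sketch, the snippet below re-creates the expanded Score1 table from the listing above (so it runs on its own) and computes a mean per person with tapply(). Calling as.character() before as.numeric() is harmless for character data and guards against factor columns: as.data.frame() defaulted to stringsAsFactors = TRUE before R 4.0.0, and as.numeric() on a factor returns the internal level codes rather than the values.

```r
# Expanded table as printed by expand.delimited(data) above
expanded <- data.frame(
  Name   = rep(c("Bill", "Bob", "Sam"), times = c(6, 3, 3)),
  Score1 = c("1", "3", "4", "3", "6", "9", "3", "2", "3", "2", "5", "3"),
  stringsAsFactors = FALSE
)

# Convert the split strings to numbers before computing with them
expanded$Score1 <- as.numeric(as.character(expanded$Score1))

# Mean score per person
score_means <- tapply(expanded$Score1, expanded$Name, mean)
round(score_means, 2)  # Bill 4.33, Bob 2.67, Sam 3.33
```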
