Import/Export data to and from xlsx files

April 5, 2013

(This article was first published on Insights of a PhD student » R, and kindly contributed to R-bloggers)

As I've already written, getting data into R straight from your precious xlsx files is really handy. No need to clutter up your computer with txt or csv files. In my previous post I wrote about the gdata package for importing data from xlsx files, and I was pointed to, among others, the xlsx package. xlsx seems to be a good package: easy to use and, importantly, fast. It's based on Java, but it comes with all the relevant jar files in an accompanying package (xlsxjars), which installs on its own if you install with dependencies set to TRUE.

Reading in with xlsx works the same as any other read function; you just need to tell it which sheet to read, either by name (the sheetName argument) or by number (sheetIndex):

library(xlsx)
dat <- read.xlsx("testfile.xlsx", sheetName="")  # fill in the name of the sheet to read

There are various other options that other packages for importing Excel files don't seem to have, such as rowIndex and colIndex for specifying which rows or columns you want to import. There is also a second function (read.xlsx2), which is apparently an order of magnitude faster on particularly big files. Once you've selected the data and run the code, you can happily work with the data.
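As a rough sketch of those options, the snippet below first writes a small throwaway file (so that it runs on its own) and then reads back only part of it. The temporary file, the sheet name and the use of the built-in mtcars data are my assumptions for illustration, not part of the example above.

```r
library(xlsx)

# Hypothetical test file: write mtcars to a temporary workbook so there is something to read
tmp <- tempfile(fileext = ".xlsx")
write.xlsx(mtcars, tmp, sheetName = "MTCars", row.names = FALSE)

# Read only the header row plus the first five data rows, and only the first three columns
part <- read.xlsx(tmp, sheetName = "MTCars", rowIndex = 1:6, colIndex = 1:3)

# read.xlsx2 takes the same file and sheet arguments; here the sheet is picked by number
all2 <- read.xlsx2(tmp, sheetIndex = 1)
```

Note that read.xlsx2 does not guess column types in the same way as read.xlsx, so it may be worth checking str(all2) and passing colClasses if you need numeric columns.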

Writing to xlsx files might be useful too, for storage or for sharing data with people who don't use R, for instance. This is dead easy with xlsx!

If you want just a single dataframe in the workbook you simply do something like the following:

write.xlsx(cars, "cars_dataframe.xlsx")

To create a new file containing multiple dataframes from R, you first create the workbook, add sheets to that workbook, then add the dataframes to the sheets, and finally save the workbook to whatever file you want.

library(xlsx)
data(cars); data(mtcars)
# the workbook needs its own name: calling it "cars" would be overwritten by data(cars)
wb <- createWorkbook()
cars1 <- createSheet(wb=wb, sheetName="Cars")
cars2 <- createSheet(wb=wb, sheetName="MTCars")
addDataFrame(x=cars, sheet=cars1)
addDataFrame(x=mtcars, sheet=cars2)
saveWorkbook(wb, "Cars_datasets.xlsx")

By default this will add both column and row names, but this can be overridden using the row.names or col.names arguments of the addDataFrame function. You can also place a dataframe at a particular starting position in the sheet using the startRow and startColumn arguments of addDataFrame.
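A minimal sketch of those arguments, assuming you want two dataframes stacked on one sheet: the sheet name, the two-row gap and the temporary output file are made up for illustration.

```r
library(xlsx)

wb <- createWorkbook()
sheet <- createSheet(wb, sheetName = "Both")

# cars goes in the top-left corner, without the row-name column
addDataFrame(cars, sheet, row.names = FALSE)

# mtcars goes below it, leaving a gap of two empty rows and starting in column 2
addDataFrame(mtcars, sheet, startRow = nrow(cars) + 4, startColumn = 2)

out <- tempfile(fileext = ".xlsx")  # hypothetical output location
saveWorkbook(wb, out)
```

Keeping the second table below the first (rather than beside it) means the two blocks never share a row, which makes it easy to read either one back with rowIndex.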

There's also some funky styling stuff you can do using the CellStyle, Fill, Alignment, Font and setCellStyle functions, of the following sort (from ?CellStyle):

  library(xlsx)
  wb <- createWorkbook()
  sheet <- createSheet(wb, "Sheet1")
  rows  <- createRow(sheet, rowIndex=1)
  cell.1 <- createCell(rows, colIndex=1)[[1,1]]
  setCellValue(cell.1, "Hello R!")
  # build the style by adding font, fill and alignment to a base CellStyle
  cs <- CellStyle(wb) +
    Font(wb, heightInPoints=20, isBold=TRUE, isItalic=TRUE,
      name="Courier New", color="orange") +
    Fill(backgroundColor="lavender", foregroundColor="lavender",
      pattern="SOLID_FOREGROUND") +
    Alignment(h="ALIGN_RIGHT")
  setCellStyle(cell.1, cs)
  # you need to save the workbook now if you want to see this art

