Quickly read Excel worksheets into R (Windows only…sorry)

June 18, 2013
By

(This article was first published on Houses of Stones » R, and kindly contributed to R-bloggers)

I suppose most companies use the Microsoft Office suite of programs, and my office is no exception. It easy to import data from an API or a database into R, but importing data from an Excel workbook is a different story. There are a few R packages for reading Excel files, but I’ve had problems with all of them:

  • `read.xlsx` (`gdata` package): pretty convenient to run in R, but requires Perl which for some reason I have a hard time installing on my Windows machine…that might just be an issue with me, not the machine.
  • `odbcConnectExcel2007` (`RODBC` package): from what I’ve seen on the listservs, this one has a hard time reading xlsx files because of a driver mismatch – you have to access the files through 32-bit R, which is annoying.
  • `readWorksheetFromFile` (`XLConnect` package): uses Java, easy to install, and has tons of functionality to write in addition to read, but I don’t really need the write functionality and for large files especially XLConnect is very slow.

So I set off in search of a faster way to pull information out of an Excel file. The gist below shows what I came up with. Excel already has Visual Basic capabilities built in. So I stole a little VB script from here and stuck it in a function that writes the script to a temporary file, calls the script from the command line, and then outputs the contents of the formerly-Excel file.

The function takes the following inputs:

  • file_path: the full path of the Excel file
  • keep_sheets: the names of the sheets you want to keep; extracts all sheets if NULL
  • target_dir: the directory to which you want to output the excel sheets; defaults to the directory where the Excel sheet is kept. If set to FALSE, it doesn’t output any files – rather, it reads them all into R as a list of data frames.

Unfortunately, this only works on Windows. I had hoped it would work on a Mac where Excel was installed, but I haven’t had any luck getting it to work on my home computer.

To leave a comment for the author, please follow the link and comment on his blog: Houses of Stones » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.