Quickly read Excel worksheets into R (Windows only…sorry)

[This article was first published on Houses of Stones » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I suppose most companies use the Microsoft Office suite of programs, and my office is no exception. It easy to import data from an API or a database into R, but importing data from an Excel workbook is a different story. There are a few R packages for reading Excel files, but I’ve had problems with all of them:

  • `read.xlsx` (`gdata` package): pretty convenient to run in R, but requires Perl which for some reason I have a hard time installing on my Windows machine…that might just be an issue with me, not the machine.
  • `odbcConnectExcel2007` (`RODBC` package): from what I’ve seen on the listservs, this one has a hard time reading xlsx files because of a driver mismatch – you have to access the files through 32-bit R, which is annoying.
  • `readWorksheetFromFile` (`XLConnect` package): uses Java, easy to install, and has tons of functionality to write in addition to read, but I don’t really need the write functionality and for large files especially XLConnect is very slow.

So I set off in search of a faster way to pull information out of an Excel file. The gist below shows what I came up with. Excel already has Visual Basic capabilities built in. So I stole a little VB script from here and stuck it in a function that writes the script to a temporary file, calls the script from the command line, and then outputs the contents of the formerly-Excel file.

The function takes the following inputs:

  • file_path: the full path of the Excel file
  • keep_sheets: the names of the sheets you want to keep; extracts all sheets if NULL
  • target_dir: the directory to which you want to output the excel sheets; defaults to the directory where the Excel sheet is kept. If set to FALSE, it doesn’t output any files – rather, it reads them all into R as a list of data frames.

Unfortunately, this only works on Windows. I had hoped it would work on a Mac where Excel was installed, but I haven’t had any luck getting it to work on my home computer.

To leave a comment for the author, please follow the link and comment on their blog: Houses of Stones » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)