Get data out of excel and into R with readxl

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m pleased to announced that the first version of readxl is now available on CRAN. Readxl makes it easy to get tabular data out of excel. It:
- Supports both the legacy
.xls
format and the modern xml-based.xlsx
format..xls
support is made possible the with libxls C library, which abstracts away many of the complexities of the underlying binary format. To parse.xlsx
, we use the insanely fast RapidXML C++ library. - Has no external dependencies so it’s easy to use on all platforms.
- Re-encodes non-ASCII characters to UTF-8.
- Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.
- Blank columns are automatically dropped.
- Returns output with class
c("tbl_df", "tbl", "data.frame")
so if you also use dplyr you’ll get an enhanced print method (i.e. you’ll see just the first ten rows, not the first 10,000!).
You can install it by running:
install.packages("readxl")
There’s not really much to say about how to use it:
library(readxl) # Use a excel file included in the package sample <- system.file("extdata", "datasets.xlsx", package = "readxl") # Read by position head(read_excel(sample, 2)) #> mpg cyl disp hp drat wt qsec vs am gear carb #> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 #> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 #> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 #> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 #> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 #> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 # Or by name: excel_sheets(sample) #> [1] "iris" "mtcars" "chickwts" "quakes" head(read_excel(sample, "mtcars")) #> mpg cyl disp hp drat wt qsec vs am gear carb #> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 #> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 #> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 #> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 #> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 #> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
You can see the documentation for more info on the
col_names
, col_types
and na
arguments.
Readxl is still under active development. If you have problems loading a dataset, please try the development version, and if that doesn’t work, file an issue.
To leave a comment for the author, please follow the link and comment on their blog: RStudio Blog.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.