Site icon R-bloggers

XLConnect 0.2-4

[This article was first published on Mirai Solutions » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-4, which is available from CRAN.

This newest release comes along with a number of new features:

A summary of these news can be found in the NEWS file on CRAN.

To see some of the new features in action, please find below some examples:

# Load the XLConnect package
require(XLConnect)

# Download sample data
download.file("http://miraisolutions.files.wordpress.com/2013/01/example.xlsx", destfile = "example.xlsx")

# Load the workbook
wb = loadWorkbook("example.xlsx")

# Read the data on the sheet named "data"
data = readWorksheet(wb, sheet = "data")
# Note all the NA values due to the cells with text containing
# 'some stuff'
print(head(data))

# Let's assume we only roughly know the data is located within some
# given boundaries (exact location would be startRow = 10,
# startCol = 8, endRow = 42, endCol = 20)
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23)
# However, XLConnect is capable to find the actual data region
print(head(data))
# --> This is due to the arguments autofitRow and autofitCol
# which default to TRUE

# Let's see what would happen if we put them to FALSE
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, autofitRow = FALSE,
                     autofitCol = FALSE)
# --> Well, it reads the region exactly as specified above,
# so we get NAs obviously.

# Going back to the previous version
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23)

# Did you note the column named 'complex' which is all NAs?
# This is due to the fact that the 'complex' column is computed
# using the BESSELJ function in Excel which is not yet implemented
# in Apache POI. However, we can still read the values by instructing
# XLConnect to read the _cached_ values rather than trying to evaluate
# the formula
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE)
print(head(data))

# Let's get on to filtering. Let's assume we are only interested
# in the columns "car", "mpg" and "wt". Generally, we could
# do something like this:
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE)
dataSub = data[, c("car", "mpg", "wt")]
print(head(dataSub))
# What we do here is import ALL the data and subset the data
# in R. This is fine and no problem for small data sets. For
# large data sets, however, there might be a performance penalty.
# It may be better to the subsetting on the Java side directly:
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE,
                     keep = c("car", "mpg", "wt"))
print(head(data))

# Similarly, instead of 'keeping' columns, you can also drop
# them:
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE,
                     drop = c("disp", "hp", "drat", "carb"))
print(head(data))

# Now on to missing values. Noted the values 999999 in column
# "gear"? We might would want those to read in as NAs. How can
# we achieve that?

# Set the missing value identifier
setMissingValue(wb, value = 999999)
# Read in the data
data = readWorksheet(wb, sheet = "data", startRow = 7, startCol = 7,
                     endRow = 46, endCol = 23, useCachedValues = TRUE)
# Here we go ...
print(head(data))

XLConnect is a comprehensive and cross-platform R package for manipulating Microsoft Excel files from within R. XLConnect differs from other related R packages in that it is completely cross-platform and as such runs under Windows, Unix/Linux and Mac (32- and 64-bit). Moreover, it does not require any installation of Microsoft Excel or any other special drivers to be able to read & write Excel files. The only requirement is a recent version of a Java Runtime Environment (JRE).

The package can easily be installed from CRAN via install.packages("XLConnect") (use install.packages("XLConnect", type = "source") on Mac OS X). In order to get started have a look at the XLConnect and XLConnect for the Impatient package vignettes, the numerous demos available via demo(package = "XLConnect") or browse through the comprehensive reference manual.

We are always happy to hear from you! For general feedback write us an email to xlconnect@mirai-solutions.com. If you find any bugs or hope to see some currently missing features in the future, just send us an email to xlconnect-bugs@mirai-solutions.com. You may also want to check out our XLConnect FAQ.


To leave a comment for the author, please follow the link and comment on their blog: Mirai Solutions » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.