Saving your R dataframe to a
.csv can be useful; being able to view the data all at once can help to see the bigger picture. Often though, multiple dataframes, all pieces of the same project, need to be viewed this way and related back to one another. In this case viewing becomes far easier when these dataframes are written to
.xlsx across multiple sheets in a single workbook. Not to mention the time and energy saved when you no longer have to find and open multiple files.
Four packages in R are available to do just this. I generated some test data (a 30000 x 40 dataframe with sampled values between 1 and 100) and tested each one with varying levels of success.
The xlsx package
I ran into problems very early on with this package. The installation was dependent on the
rJava package which gave an error during installation. Given that we’re using a Unix platform, the fix to this was to run
sudo apt-get install r-cran-rjava in console. Executing this command succesfully installed the package and after restarting R-studio I was able to install and load the
The next problem I ran into was a java out of memory error, specifically
java.lang.OutOfMemoryError. This time the fix was allocating more memory to Java trough the
options(java.parameters = "-Xmx40000m") setting. The memory increase amount (and therefore the amount of data that can be written) is dependent on the amount of memory your computer can allocate to R. In this case I have allocated 40gig, which most of us do not have laying around. Once this was done I was able write up to 10 worksheets without any problems, but again, I ran into memory problems when I tried to scale up.
The XLConnect package
This package had all the same problems as the previous with one added extra. During testing, the writing process was done many times; each time it was done the previous
.xlsx file saved as the same name needed to be deleted. If this wasn’t done R would continue attempting to write to the file indefinitely with no indications of stopping. There seems to be a fix for this using the
createNames() function but it didn’t seem worth the effort.
The WriteXLS package
This package was easy enough to use with no apparent errors. The problem was that because this method writes the data directly into a workbook, instead of first creating it locally within the R environment, it was extremely slow.
The openxlsx package
Finally, I looked to the
opnexlsx package. Again, no problems here; the most user friendly of the four.
Microbenchmark test findings
After using each of the four packages I decided, taking usability, memory issues and speed into account, not to continue further with the
WriteXLS and the
XLConnect packages. I then tested the speed of the remaining two,
xlsx, using the
microbenchmark package with an evaluation number of 5 and varying sheet numbers.
1 Sheet with a 30000 x 40 dataframe run 5 times:
3 Sheets with a 30000 x 40 dataframe run 5 times:
5 Sheets with a 30000 x 40 dataframe run 5 times:
10 Sheets with a 30000 x 40 dataframe run 5 times:
Graphing the mean, median, min and max times for both the
openxlsx and the
xlsxpackages show that other than min time, the
openxlsx package is faster.
After testing each of the four packages using a dataframe of 30000 x 40 random numbers bewteen 1 and 100, I found that 2 of the packages, the
XLConnectpackage and the
WriteXLS package, were either not very user friendly or were simply too slow. The remaining two, the
openxlsx package and the
xlsx package, were then tested using the
microbenchmark package with an evaluation number of 5 and various sheet numbers. The tests concluded that the
openxlsx package was overall faster. With this in mind and with the ease of use for this package, it comes in as the favorite.