R Read and Write xlsx Files
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
An xlsx is a file format used for Microsoft Excel spreadsheets. Excel can be used to store tabular data.
R has built-in functionality that makes it easy to read and write an xlsx file.
Sample xlsx File
To demonstrate how we read xlsx files in R, let's suppose we have an excel file named studentinfo.xlsx with the following data:
We will be reading these data with the help of R's built-in functions.
Install and Load xlsx Package
In order to read, write, and format Excel files into R, we first need to install and load the xlsx package as:
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
Here, we have successfully installed and loaded the xlsx package.
Now, we are able to read data from an xlsx file.
Read an xlsx File in R
In R, we use the read.xlsx() function to read an xlsx file available in our current directory. For example,
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
# read studentinfo.xlsx file from our current directory
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
# display xlsx file
print(read_data)
Output
      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
5    Sammy      20     Economics        Ohio
6    Pam        21     Arts             Arizona
In the above example, we have read the studentinfo.xlsx file that is available in our current directory. Notice the code,
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
Here,
- read.xlsx()- reads the xlsx file- studentinfo.xlsxand creates a dataframe that is stored in the read_data variable.
- sheetIndex = 1- reads specified worksheet i.e. 1
Note:
- If the file is in some other location, we have to specify the path along with the file name as: read.xlsx("D:/folder1/studentinfo.xlsx", sheetIndex = 1).
- We can also use the read.xlsx2()function if the dataset we are working on is larger.
xlsx rowIndex and colIndex Argument in R
In R, we can also read a specific range of data from excel files. We can pass the rowIndex and colIndex argument inside read.xlsx() to read specific range.
- rowIndex- reads a specific range of rows
- colIndex- read a specific range of columns
Example: Read Range of Rows
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
# read first five rows of xlsx file
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  rowIndex = 1:5
)
# display xlsx file
print(read_data)
Output
      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
In the above example, we have passed rowIndex = 1:5 inside read.xlsx() so the function reads only the first five rows from the studentinfo.xlsx file.
Example: Read Range of Columns
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
# read first three columns of xlsx file
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  colIndex = 1:3
)
# display xlsx file
print(read_data)
Output
      Name      Age   Faculty           
1    Abby       24     Business         
2    Hazzle     23     Engineering   
3    Cathy      20     Engineering    
4    Paterson   22     Arts                
5    Sammy      20     Economics      
6    Pam        21     Arts                
Here, colIndex = 1:3 inside read.xlsx() reads only the first three columns from the studentinfo.xlsx file.
xlsx startRow Argument in R
Sometimes the excel file may contain headers at the beginning that we may not want to include. For example,
Here, the 1st Row of the excel file contains a header, and the 2nd row is empty. So we don't want to include these two rows.
To start reading data from a specific row in the excel worksheet, we pass the startRow argument inside read.xlsx().
Let's take a look at an example,
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
# start reading from 3rd row
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  startRow = 3
)
# display xlsx file
print(read_data)
Output
      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
5    Sammy      20     Economics        Ohio
6    Pam        21     Arts             Arizona
In the above example, we have used the startRow argument inside the read.xlsx() function to start reading from the specified row.
startRow = 3 means the first two rows are ignored and read.xlsx() starts reading data from the 3rd row.
Write Into xlsx File in R
In R, we use the write.xlsx() function to write into an xlsx file. We pass the data in the form of dataframe. For example,
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))
# write dataframe1 into file1 xlsx file
write.xlsx(dataframe1, "file1.xlsx")
In the above example, we have used the write.xlsx() function to export a data frame named dataframe1 to a xlsx file. Notice the arguments passed inside write.xlsx(),
write.xlsx(dataframe1, "file1.xlsx")
Here,
- dataframe1- name of the data frame we want to export
- file1.xlsx- name of the xlsx file
Finally, the file1.xlsx file would look like this in our directory:
Rename Current Worksheet
We can rename the current worksheet by using the sheetName argument inside the write.xlsx() function. For example,
# install xlsx package 
install.package("xlsx")
# load xlsx file
library("xlsx")
# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))
# name current worksheet
write.xlsx(dataframe1, "file1.xlsx",
  sheetName = "Voting Eligibility"
)
Here, we have passed sheetname = "Voting Eligibility" inside write.xlsx(), so the name of the sheet is changed to "Voting Eligibility".
So the file1.xlsx looks like this:
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
