You might find that loading data into R can be quite frustrating. Almost every single type of file that you want to get into R seems to require its own function, and even then you might get lost in the functions’ arguments. In short, it can be fairly easy to mix up things from time to time, whether you are a beginner or a more advanced R user…
To cover these needs, DataCamp decided to publish a comprehensive, yet easy tutorial on quickly importing data into R, going from simple text files to the more advanced SPSS and SAS files. Keep on reading to find out how to easily import your files into R!
To import data into R, you first need to have data. This data can be saved in a file on your computer (e.g. a local Excel, SPSS, or some other type of file), but can also live on the Internet or be obtained through other sources. Where to find this data is outside the scope of this tutorial, so for now it’s enough to mention this blog post, which explains well how to find data on the internet, and DataCamp’s interactive tutorial, which deals with how to import and manipulate Quandl data sets.
Tip: before you move on and discover how to load your data into R, it might be useful to go over the following checklist that will make it easier to import the data correctly into R:
- If you work with spreadsheets, the first row is usually reserved for the header, while the first column is used to identify the sampling unit;
- Avoid names, values or fields with blank spaces, otherwise each word will be interpreted as a separate variable, resulting in errors that are related to the number of elements per line in your data set;
- If you want to concatenate words, insert a . between two words instead of a space;
- Short names are preferred over longer names;
- Try to avoid using names that contain symbols such as
- Delete any comments that you have made in your Excel file to avoid extra columns or NAs being added to your file; and
- Make sure that any missing values in your data set are indicated with
Preparing Your R Workspace
Make sure to go into RStudio and see what needs to be done before you start your work there. You might have an environment that is still filled with data and values, which you can all delete using the following line of code:
The rm() function allows you to “remove objects from a specified environment”. In this case, you specify that you want to consider a list for this function, which is the outcome of the ls() function. This last function returns a vector of character strings that gives the names of the objects in the specified environment. Since this function is called without arguments, it is assumed that you mean the data sets and functions that you as a user have defined.
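As a small sketch of how rm() and ls() work together, the example below clears a scratch environment rather than your real workspace, so you can try it safely. The environment name scratch and the objects x and f are hypothetical and only serve the illustration.

```r
# Work in a scratch environment so the demonstration does not touch
# your real workspace (`scratch`, `x` and `f` are hypothetical names).
scratch <- new.env()
assign("x", 1:3, envir = scratch)
assign("f", function() "hello", envir = scratch)

# ls() returns the names of the objects as a character vector
before <- ls(envir = scratch)

# rm() accepts that character vector through its `list` argument
rm(list = ls(envir = scratch), envir = scratch)
after <- ls(envir = scratch)

# In an interactive session, the equivalent for your own workspace is:
# rm(list = ls())
```

After running this, before holds the two object names and after is empty, which is what you would see for your global workspace after the one-liner in the final comment.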
Next, you might also find it handy to know where your working directory is set at the moment:
And you might consider changing the path that you get as a result of this function, maybe to the folder in which you have stored your data set:
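A minimal sketch of checking and changing the working directory could look like this. The folder used here is R's temporary directory, which is only a stand-in for the folder that holds your own data.

```r
# Remember where the working directory currently points
old_dir <- getwd()

# Hypothetical data folder: tempdir() is used only so the sketch
# runs anywhere; replace it with the path to your own data folder
data_dir <- tempdir()
setwd(data_dir)

# Confirm the change, then restore the original directory
current <- getwd()
setwd(old_dir)
```

Restoring the original directory at the end is optional, but it keeps the rest of your session unaffected.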
Getting Data From Common Sources into R
You will see that the following basic R functions focus on getting spreadsheet-like data that is stored in plain text files into R, rather than Excel or other types of files. If you are more interested in the latter, scroll a bit further to discover the ways of importing other files into R.
Importing TXT files
If you have a .txt or a tab-delimited text file, you can easily import it with the basic R function read.table(). In other words, your file will look similar to this:
// Contents of .txt
1 6 a
2 7 b
3 8 c
4 9 d
5 10 e
and can be imported as follows:
df <- read.table(".txt", header = FALSE)
Note that by using this function, your data from the file will become a data.frame object. Note also that the first argument isn’t always a filename, but could possibly also be a webpage that contains data. The header argument specifies whether or not you have specified column names in your data file. The final result of your importing will show in the RStudio console as:
  V1 V2 V3
1  1  6  a
2  2  7  b
3  3  8  c
4  4  9  d
5  5 10  e
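If you want to try this without a file of your own, the following sketch first writes the example contents to a temporary file and then reads them back. The temporary file is an assumption made purely so that the example is self-contained.

```r
# Write the example contents to a temporary .txt file
# (a stand-in for your own file)
txt_file <- tempfile(fileext = ".txt")
writeLines(c("1 6 a", "2 7 b", "3 8 c", "4 9 d", "5 10 e"), txt_file)

# No header line in the file, so R generates the names V1, V2, V3
df <- read.table(txt_file, header = FALSE)

# Inspect the structure of the resulting data frame
str(df)
```

Because header is FALSE, R makes up the column names itself, which is where V1, V2 and V3 in the console output come from.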
Good to know
The read.table() function is the most important and commonly used function to import simple data files into R. It is easy and flexible. That is why you should definitely check out our previous tutorial on reading and importing Excel files into R, which explains in great detail how to use the read.table() function optimally.
For files that are not delimited by tabs, like .csv and other delimited files, you actually use variants of this basic function. These variants are almost identical to the read.table() function and differ from it in three aspects only:
- The separator symbol;
- The header argument is set to TRUE by default, which indicates that the first line of the file being read contains the header with the variable names;
- The fill argument is also set to TRUE by default, which means that if rows have unequal length, blank fields will be added implicitly.
Importing CSV Files
If you have a file that separates the values with a , or a ;, you usually are dealing with a .csv file. It looks somewhat like this:
// Contents of .csv file
Col1,Col2,Col3
1,2,3
4,5,6
7,8,9
a,b,c
In order to successfully load this file into R, you can use the read.table() function in which you specify the separator character, or you can use the read.csv() or read.csv2() functions. The former function is used if the separator is a , and the latter if a ; is used to separate the values in your data file.
Remember that the read.csv() as well as the read.csv2() function are almost identical to the read.table() function, with the main difference being that they have the header and fill arguments set to TRUE by default.
df <- read.table(".csv", header = FALSE, sep = ",")
df <- read.csv(".csv", header = FALSE)
df <- read.csv2(".csv", header = FALSE)
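To see the difference between the two variants in action, here is a small sketch that builds one comma-separated and one semicolon-separated temporary file. The file contents are made up for the illustration, and the non-numeric row of the example above is left out so that the columns stay numeric.

```r
# Comma-separated file (hypothetical contents)
comma_file <- tempfile(fileext = ".csv")
writeLines(c("Col1,Col2,Col3", "1,2,3", "4,5,6", "7,8,9"), comma_file)

# Semicolon-separated file that uses a decimal comma
semi_file <- tempfile(fileext = ".csv")
writeLines(c("Col1;Col2;Col3", "1,5;2;3", "4,5;5;6"), semi_file)

# read.csv() expects "," as separator and "." as decimal mark
df_comma <- read.csv(comma_file)

# read.csv2() expects ";" as separator and "," as decimal mark
df_semi <- read.csv2(semi_file)

# read.table() can do the same once you spell out the arguments
df_table <- read.table(comma_file, header = TRUE, sep = ",")
```

Note that header is left at its default of TRUE here, because both files start with a line of column names.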
Tip: if you want to know more about the arguments that you can use in the read.csv() and read.csv2() functions, you can always check out our reading and importing Excel files into R tutorial, which explains in great detail how to use these functions.
Importing Files With Other Separator Characters
In case you have a file with a separator character that is different from a tab, a comma or a semicolon, you can always use the read.delim() and read.delim2() functions. These are variants of the read.table() function, just like the read.csv() function. Consequently, they have much in common with the read.table() function, except for the fact that they assume that the first line that is being read in is a header with the attribute names, while they use a tab as the default separator instead of a whitespace, comma or semicolon. They also have the fill argument set to TRUE, which means that blank fields will be added to rows of unequal length.
You can use the read.delim() and read.delim2() functions as follows:
df <- read.delim("")
df <- read.delim2("")
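As a sketch of how a different separator can be handled, the example below passes the sep argument to read.delim(). The pipe-delimited file and its column names are hypothetical and are created on the fly.

```r
# Hypothetical file that uses "|" as the separator
pipe_file <- tempfile(fileext = ".txt")
writeLines(c("id|score|label", "1|6.5|a", "2|7.25|b"), pipe_file)

# The default separator of read.delim() is a tab; `sep` overrides it.
# The first line is treated as the header by default.
df <- read.delim(pipe_file, sep = "|")

# read.delim2() works the same way, but expects a decimal comma
```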
Importing Excel Files Into R
To load Excel files into R, you first need to do some further prepping of your workspace in the sense that you need to install packages. Simply run the following piece of code to accomplish this:
When you have installed the package, you can just type in the following to activate it in your workspace:
To check if you already installed the package or not, you can look it up in the output of installed.packages().
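The three steps above (installing, loading and checking) can be sketched as follows. The package name is an assumption: stats is used only because it ships with R, so the sketch runs without downloading anything. Replace it with the package that you actually need.

```r
# Hypothetical package name; swap in the package you need
pkg <- "stats"

# Check whether the package is already installed
is_installed <- pkg %in% rownames(installed.packages())

# Install it only when it is missing
if (!is_installed) {
  install.packages(pkg)
}

# Activate the package in your workspace
library(pkg, character.only = TRUE)
```

The character.only argument is needed here because the package name is stored in a variable; with a literal name you can simply write library(stats).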
Importing Excel Files With The XLConnect Package
The first way to get Excel files directly into R is by using the XLConnect package. Install the package and if you’re not sure whether or not you already have it, check if it is already there.
Next, you can start using the readWorksheetFromFile() function, as shown below:
library(XLConnect)
df <- readWorksheetFromFile("", sheet = 1)
Note that you need to add the sheet argument to specify which sheet you want to load into R. You can also add more specifications. You can find these explained in our tutorial on reading and importing Excel files into R.
You can also load in a whole workbook with the loadWorkbook() function, to then read in worksheets that you desire to appear as data frames in R through readWorksheet():
wb <- loadWorkbook("")
df <- readWorksheet(wb, sheet = 1)
Note again that the sheet argument is not the only argument that you can use in readWorksheetFromFile(). If you want more information about the package or about all the arguments that you can pass to the readWorksheetFromFile() function or to the two alternative functions that were mentioned, you can visit the package’s RDocumentation page.
Importing Excel Files With The Readxl Package
The readxl package has only recently been published and allows R users to easily read in Excel files, just like this:
library(readxl)
df <- read_excel("
Note that the first argument specifies the path to your .xlsx file, which is interpreted relative to the working directory that you can set by using the setwd() function. You can also add a sheet argument, just like with the XLConnect package, and many more arguments on which you can read up here or in this blog post.
Importing JSON Files Into R
To get JSON files into R, you first need to install or load the rjson package. If you want to know how to install packages or how to check if packages are already installed, scroll up a bit to the section on importing Excel files into R.
Once you have done this, you can use the fromJSON() function. Here, you have two options:
Your JSON file is stored in your working directory.
library(rjson)
JsonData <- fromJSON(file = "
Your JSON file is available through a URL.
library(rjson)
JsonData <- fromJSON(file = "
Importing XML Data Into R
If you want to get XML data into R, one of the easiest ways is by using the XML package. First, you make sure you install and load the XML package in your workspace, just like demonstrated above. Then, you can use the xmlTreeParse() function to parse the XML file directly from the web:
library(XML)
xmlfile <- xmlTreeParse("
Next, you can check whether R knows that xmlfile is in XML by entering:
class(xmlfile)
# Result is usually similar to this: "XMLDocument" "XMLAbstractDocument"
Tip: you can use the xmlRoot() function to access the top node:
topxml <- xmlRoot(xmlfile)
You will see that the data is presented in a rather odd way when you try printing out the xmlfile object. That is because the XML file is still a real XML document in R at this point. In order to put the data in a data frame, you first need to extract the XML values. You can use the xmlSApply() function to do this:
topxml <- xmlSApply(topxml, function(x) xmlSApply(x, xmlValue))
The first argument of this function will be topxml, since it is the top node on whose children you want to perform a certain function. Then, you list the function that you want to apply to each child node. In this case, you want to extract the contents of a leaf XML node. This, in combination with the first argument topxml, will make sure that you will do this for each leaf XML node.
Lastly, you put the values in a data frame! You use the data.frame() function in combination with the matrix transposition function t() to do this. Additionally, you also specify that no row names should be included:
xml_df <- data.frame(t(topxml), row.names=NULL)
You can also choose not to do all the previous steps, which are a bit more complicated, and to just do the following:
url <- ""
data_df <- xmlToDataFrame(url)
Importing Data From HTML Tables Into R
Getting data from HTML tables into R is pretty straightforward:
Note that the which argument allows you to specify which tables to return from within the document.
If this gives you an error along the lines of “failed to load external entity”, don’t be confused: this error has been reported by many people and has been confirmed by the package’s author here. You can work around this by using the RCurl package in combination with the XML package to read in your data:
library(XML)
library(RCurl)
url <- "YourURL"
urldata <- getURL(url)
data <- readHTMLTable(urldata, stringsAsFactors = FALSE)
Note that you don’t want the strings to be registered as factors or categorical variables! You can also use the httr package to accomplish exactly the same thing, except for the fact that you will want to convert the raw bytes of the URL’s content to characters by using the rawToChar() function:
library(httr)
urldata <- GET(url)
data <- readHTMLTable(rawToChar(urldata$content), stringsAsFactors = FALSE)
Getting Data From Statistical Software Packages into R
For the following more advanced statistical software programs, there are corresponding packages that you first need to install in order to read your data files into R, just like you do with Excel or JSON.
Importing SPSS Files into R
If you’re a user of SPSS software and you are looking to import your SPSS files into R, firstly install the foreign package. After loading the package, run the read.spss() function that is contained within it and you should be good to go!
library(foreign)
mySPSSData <- read.spss("example.sav")
Tip: if you wish the result to be returned as a data frame, make sure to set the to.data.frame argument of the read.spss() function to TRUE. Furthermore, if you do NOT want the variables with value labels to be converted into R factors with corresponding levels, you should set the use.value.labels argument to FALSE:
library(foreign)
mySPSSData <- read.spss("example.sav", to.data.frame = TRUE, use.value.labels = FALSE)
Remember that factors are variables that can only contain a limited number of different values. As such, they are often called “categorical variables”. The different values of factors can be labeled and are therefore often called “value labels”.
Importing Stata Files into R
To import Stata files, you keep on using the foreign package, this time with its read.dta() function:
library(foreign)
mydata <- read.dta("
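If you have no Stata file at hand, a round trip is one way to try read.dta(). The sketch below assumes that the foreign package is installed and uses its write.dta() function to create a temporary .dta file from a made-up data frame before reading it back.

```r
library(foreign)

# Hypothetical data, written to a temporary Stata file
dta_file <- tempfile(fileext = ".dta")
original <- data.frame(id = 1:3, score = c(6.5, 7.25, 8))
write.dta(original, dta_file)

# Read the Stata file back into a data frame
mydata <- read.dta(dta_file)
```

With a real Stata export you would skip the write.dta() step and pass the path of your own file to read.dta().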
Importing Systat Files into R
library(foreign)
mydata <- read.systat("
Importing SAS Files into R
For those R users that also want to import SAS files into R, it’s very simple! For starters, install the sas7bdat package. Load it, and then invoke the read.sas7bdat() function contained within the package and you are good to go!
library(sas7bdat)
mySASData <- read.sas7bdat("example.sas7bdat")
Does this function interest you and do you want to know more? Visit the Rdocumentation page.
Importing Minitab Files into R
Is your software of choice for statistical purposes Minitab? Look no further if you want to use Minitab data in R!
Importing .mtp files into R is pretty straightforward. To begin with, install the foreign package and load it. Then simply use the read.mtp() function from that package:
library(foreign)
myMTPData <- read.mtp("example2.mtp")
Importing RDA or RData Files into R
If your data file is one that you have saved in R as an .rdata file, you can read it in as follows:
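A minimal sketch of this, assuming the file was created with save(), uses the base R load() function. The data frame scores and the temporary file are hypothetical and only there to make the example self-contained.

```r
# Hypothetical object, saved to a temporary .RData file
rda_file <- tempfile(fileext = ".RData")
scores <- data.frame(id = 1:3, score = c(6, 7, 8))
save(scores, file = rda_file)

# Remove the object, then restore it from the file
rm(scores)
loaded_names <- load(rda_file)

# load() recreates the objects under their original names and
# returns those names as a character vector
print(loaded_names)
```

Unlike the other import functions in this tutorial, you do not assign the result of load() to a data frame: the saved objects reappear in your workspace under the names they had when they were saved.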
Getting Data From Other Sources Into R
Since this tutorial focuses on importing data from different types of sources, it is only right to also mention that you can import data into R that comes from databases, webscraping, etc.
Importing Data From Databases
Importing Data From Relational Databases
If you are interested in knowing more about this last package, make sure to check out DataCamp’s interactive course, which is definitely a must for everyone that wants to use dplyr to access data stored outside of R in a database. Furthermore, the course also teaches you how to perform sophisticated data manipulation tasks using dplyr!
Importing Data From Non-Relational Databases
For more information on loading data from non-relational databases into R, like data from MongoDB, you can read this blog post from “Yet Another Blog in Statistical Computing” for an overview of how to load data from MongoDB into R.
Importing Data Through Webscraping
Tip: you can check out this set of amazing tutorials which deal with the basics of webscraping.
Importing Data Through The TM Package
For those of you who are interested in importing textual data to start mining texts, you can read in the text file in the following way after having installed and activated the tm package:
text <- readLines("
Then, you have to make sure that you load these data as a corpus in order to get started correctly:
docs <- Corpus(VectorSource(text))
You can find an accessible tutorial on text mining with R here.
This Is Just The Beginning…
Loading your data into R is just a small step in your exciting data analysis, manipulation and visualization journey. DataCamp is here to guide you through it!
If you are already a more advanced R user, you might be interested in reading our tutorial on 15 Easy Solutions To Your Data Frame Problems In R.
Also, don’t forget to pass by DataCamp to see whether our offer of interactive courses on R can interest you!
The post This R Data Import Tutorial Is Everything You Need appeared first on The DataCamp Blog.