R is a powerful system for statistical analysis and data visualization. However, it's not exactly user-friendly for data storage, so for some time yet your data will probably still be archived with Excel, SPSS or similar programs.
How do you open an SPSS (.sav) file in R? Several packages can perform this operation, such as foreign. The package foreign is already included in the standard distribution of R, so you just need to load it with the function library():
library(foreign)
Once the package is loaded, you can open your file if you know where it is located. The simplest way to locate a file (yes, I know, you can set the working directory, but I have abrupt manners) is to run the instruction:
file.choose()
The system will open a file-selection window, where you can browse to the folder in which you saved your file. R returns the path to the file, for example:
[1] "C:\\PathToFile\\MyDataFile.sav"
Now you can read the SPSS file with foreign, specifying the path to the file (yes, you understood correctly: you need to copy and paste the path):
dataset = read.spss("C:\\PathToFile\\MyDataFile.sav", to.data.frame=TRUE)
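Inside an R string the backslash is an escape character, which is why the Windows path above doubles each one. As a small sketch (the folder and file names are the same placeholders as above), here are three ways to write that path; Windows treats all three as the same file, since R also accepts forward slashes there, and file.path() joins the pieces with a forward slash:

```r
# 1) Escaped backslashes: each "\\" in the source code is one backslash in the string
p1 = "C:\\PathToFile\\MyDataFile.sav"
# 2) Forward slashes, which R also accepts on Windows
p2 = "C:/PathToFile/MyDataFile.sav"
# 3) file.path() joins the parts with "/" (the value of .Platform$file.sep)
p3 = file.path("C:", "PathToFile", "MyDataFile.sav")

nchar("\\")        # 1: the string holds a single backslash
identical(p2, p3)  # TRUE
```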
Do you want to avoid copying and pasting? You can assign the result of the instruction file.choose() to an object named db (short for database):
db = file.choose()
As before, you get the path to the file, but this time R does not print it because you assigned it to the object db. The object db now contains a character string with the path that R will follow to retrieve the file. With this approach you need to run file.choose() in every session, whereas a path written out in full can be reused every time. Ready? Go:
dataset = read.spss(db, to.data.frame=TRUE)
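The trade-off just described (file.choose() in every session versus a path written once) can be softened with a small helper. This is only a minimal sketch: get_data_path() is a hypothetical name, not a function from foreign or base R. It returns the path you pass if that file exists, and falls back to the file.choose() dialog only in an interactive session:

```r
# Hypothetical helper, not part of base R or foreign:
# reuse a saved path when it points to an existing file,
# otherwise fall back to the file.choose() dialog.
get_data_path = function(path = NULL) {
  if (!is.null(path) && file.exists(path)) {
    return(path)
  }
  if (interactive()) {
    return(file.choose())  # opens the file-selection window
  }
  stop("No valid path given and no interactive session to choose a file in.")
}

# Usage (the path is the placeholder used above):
# library(foreign)
# db = get_data_path("C:\\PathToFile\\MyDataFile.sav")
# dataset = read.spss(db, to.data.frame=TRUE)
```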
read.spss() reads a dataset in .sav format. Be careful, however, to set the argument to.data.frame to TRUE: it tells the function to arrange the data in a data frame (the class of R object for data tables). By default (to.data.frame=FALSE), read.spss() returns a list instead.
Yolo, man. Another very simple way to open an SPSS file in R is to save it in a format that R handles very well: the .dat format (tab-delimited). So you save your SPSS file as .dat and proceed as before, looking for the file with file.choose() and assigning the resulting string to an object.
This time, the function to read the file is read.table(). Pay attention to missing data: if there are missing values, you should tell R which code represents them (e.g. 999) by specifying a value for the argument na.strings, for example na.strings = "999".
Do you have your file in .dat format?
db = file.choose()
dataset = read.table(db, header=TRUE)
header = TRUE specifies that the first row of the file contains the variable names, so those values must not be interpreted as data.
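The effect of header and of the missing-value argument na.strings can be checked without any file at hand, because read.table() also accepts the data as a string through its text argument. In this sketch the three-row table is made up, and 999 is the illustrative missing-value code mentioned above:

```r
# A tiny tab-delimited table, given inline instead of from a .dat file
txt = "id\tscore\n1\t12\n2\t999\n3\t15"

# header = TRUE: the first row holds variable names; na.strings turns 999 into NA
ok = read.table(text = txt, header = TRUE, na.strings = "999")

# Without na.strings, 999 stays a real number and distorts any summary
raw = read.table(text = txt, header = TRUE)

# With header = FALSE the names become data: 4 rows, and the columns are read as text
noheader = read.table(text = txt, header = FALSE)

mean(ok$score, na.rm = TRUE)  # 13.5
mean(raw$score)               # 342
dim(noheader)                 # 4 2
```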
In a hurry? Combine all the operations in a single line:
dataset = read.spss(file.choose(), to.data.frame=TRUE)
or, with .dat:
dataset = read.table(file.choose(), header = TRUE)
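To see what a tab-delimited .dat file looks like and to check that read.table() parses it, here is a small round trip that needs no SPSS at all. It is only a sketch: the toy data frame and the temporary file stand in for the .dat file you would export from SPSS, and write.table() with sep = "\t" plays the role of the SPSS export:

```r
# Toy data standing in for an SPSS export (values are made up for illustration)
toy = data.frame(id = 1:3, age = c(25, 31, 47), group = c("a", "b", "a"))

# Write it as a tab-delimited .dat file with a header row and no row names
dat_file = tempfile(fileext = ".dat")
write.table(toy, dat_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back as you would a .dat file exported from SPSS.
# sep = "\t" is explicit here; the default (any white space) also splits on tabs.
back = read.table(dat_file, header = TRUE, sep = "\t")
str(back)
```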
Once you have imported a file, it's a good idea to verify that it was read correctly.
To check the size of your database, use the dim() function:
dim(dataset)
You will get two numbers: the first is the number of cases (the rows of your database), the second is the number of variables (the columns of your database).
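A quick post-import check could look like the sketch below, assuming the imported object is called dataset as in the examples above; here a small made-up data frame stands in for it. Besides dim(), it counts the missing values per variable with colSums(is.na(...)), so you can confirm that missing values were recognised as NA:

```r
# Stand-in for an imported dataset (made-up values)
dataset = data.frame(
  id    = 1:4,
  score = c(12, NA, 15, 9),
  group = c("a", "b", NA, "a")
)

dim(dataset)                  # 4 3: 4 cases (rows), 3 variables (columns)
nrow(dataset); ncol(dataset)  # the same two numbers, one at a time

# Missing values per variable
colSums(is.na(dataset))
```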
It can also be useful to preview the data. To inspect the first six rows of the dataset, use the head() function:
head(dataset)
To inspect the last six rows of the dataset, use the tail() function:
tail(dataset)
To inspect the structure of the dataset, use the str() function:
str(dataset)
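The three inspection functions can be tried on any data frame. In this sketch the built-in mtcars data frame (32 rows, 11 variables) stands in for your imported dataset; the optional n argument changes how many rows are shown:

```r
data(mtcars)  # built-in example data: 32 cars, 11 numeric variables

head(mtcars)         # first 6 rows
tail(mtcars, n = 3)  # last 3 rows instead of the default 6
str(mtcars)          # class, dimensions, and the type and first values of each variable
```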
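Do you want to view the entire data matrix of your dataset? If the data table is large, rather than printing it in the console it is advisable to use the function fix(), which opens the data in a spreadsheet-like editor and also allows you to edit the cell contents manually:
fix(dataset)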
This post was originally written in Italian by Davide Massidda and Antonello Preti and published on the InsulaR blog.
Want to open a Microsoft Excel file in R? Please read the post Read Excel files from R.