How to open an SPSS file into R

March 26, 2014
By

(This article was first published on Milano R net, and kindly contributed to R-bloggers)

R is a powerful system for statistical analysis and data visualization. However, it is not particularly convenient for storing data, so for some time to come your data will probably still be archived in Excel, SPSS, or similar programs.

How do you open a file stored in SPSS (.sav) format in R? Several packages can do this, for example foreign. The foreign package is included in the standard R distribution as one of its recommended packages, so you only need to load it with the library() function:

library(foreign)

Once the package is loaded, you can open your file, provided you know where it is located. The simplest way to locate a file (yes, I know, you could set the working directory, but I like shortcuts) is to run:

file.choose()
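For comparison, the working-directory route mentioned above can be sketched as follows. The data folder is hypothetical: tempdir() stands in for it only so the snippet runs anywhere. Once the working directory is set, a file inside it can be referred to by its name alone.

```r
# Hypothetical data folder: tempdir() is only a stand-in so the snippet runs
# anywhere; with your own data you would write e.g. setwd("C:/PathToFile")
data_dir <- tempdir()

old_wd <- setwd(data_dir)  # setwd() invisibly returns the previous directory
current_wd <- getwd()      # the folder R now resolves relative paths against

# From here on, a file in that folder can be named without its path, e.g.
# dataset <- read.spss("MyDataFile.sav", to.data.frame = TRUE)
file.exists("MyDataFile.sav")  # TRUE only if the file sits in that folder

setwd(old_wd)  # go back to the original working directory
```

Restoring the old directory at the end keeps the change from leaking into the rest of a script.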

R opens a file-selection window, where you can browse to the folder in which you saved your file. Once you select it, R returns the path to the file:

"C:\\PathToFile\\MyDataFile.sav"

Now you can read the SPSS file with foreign, passing it the path to the file (yes, you understood correctly: you need to copy and paste the path):

dataset = read.spss("C:\\PathToFile\\MyDataFile.sav", to.data.frame=TRUE)
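The doubled backslashes are needed because a single backslash starts an escape sequence inside an R string. As a minimal sketch, here are two equivalent spellings of the same placeholder path (replace it with your own): forward slashes, which R also accepts on Windows, and file.path(), which joins the pieces with "/". The file.exists() guard is an addition so the snippet does not fail when the file is absent.

```r
library(foreign)

# Three spellings of the same placeholder path (replace it with your own)
p1 <- "C:\\PathToFile\\MyDataFile.sav"                 # escaped backslashes
p2 <- "C:/PathToFile/MyDataFile.sav"                   # forward slashes
p3 <- file.path("C:", "PathToFile", "MyDataFile.sav")  # joined by file.path()

identical(p2, p3)                                 # TRUE: file.path() joins with "/"
identical(gsub("\\", "/", p1, fixed = TRUE), p2)  # TRUE: only the separators differ

# Import only if the file is actually there
if (file.exists(p2)) {
  dataset <- read.spss(p2, to.data.frame = TRUE)
}
```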

Do you want to avoid copying and pasting? You can assign the result of file.choose() to an object named db (short for database):

db = file.choose()

As before, you get the path to the file, but this time R does not print it, because the result is assigned to db. The object db now contains a character string with the path R must follow to retrieve the file. The drawback of this approach is that you must run file.choose() in every session, whereas a path written into your script can be reused every time. Ready? Go:
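One way to combine the two options is a sketch like the one below. It assumes you keep a fixed path in your script (the one shown is a placeholder) and opens the file dialog only when R is running interactively; a script run with Rscript falls back on the written path.

```r
# Placeholder path: replace it with the location of your own file
default_path <- "C:/PathToFile/MyDataFile.sav"

# Ask with a dialog only when someone is at the console;
# in a non-interactive run (e.g. Rscript) use the stored path
db <- if (interactive()) file.choose() else default_path

print(db)
```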

dataset = read.spss(db, to.data.frame=TRUE)

The read.spss() function reads a dataset in .sav format. Be careful, however, to set the argument to.data.frame to TRUE: it tells the function to return the data as a data frame (the class of R object used for data tables). Its default value is FALSE, in which case read.spss() returns a list with one element per variable.
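To check the default and see the two return types side by side, a sketch along these lines can help. The path is a placeholder, and the import runs only if the file exists.

```r
library(foreign)

# The default of to.data.frame, read straight from the function definition
formals(read.spss)$to.data.frame  # FALSE

db <- "C:/PathToFile/MyDataFile.sav"  # placeholder path: use your own file

if (file.exists(db)) {
  as_list <- read.spss(db)                        # default: a list, one element per variable
  as_df   <- read.spss(db, to.data.frame = TRUE)  # the same data as a data frame
  print(is.data.frame(as_list))  # FALSE
  print(is.data.frame(as_df))    # TRUE
}
```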

Another very simple way to open an SPSS file in R is to save it in a format that R handles very well: the .dat format (tab-delimited). Save your SPSS file as .dat and proceed as before, locating the file with file.choose() and assigning the resulting string to an object.

This time, the function to read the file is read.table(). Pay attention to missing data: if there are missing values, tell R which code represents them (e.g. 999) through the na.strings argument.
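Here is a minimal sketch of na.strings at work, assuming 999 is the missing-value code. A small tab-delimited file is written to a temporary location as a stand-in for an SPSS export and then read back.

```r
# Stand-in for a .dat file exported from SPSS, with 999 marking missing values
tmp <- tempfile(fileext = ".dat")
writeLines(c("id\tage\tscore",
             "1\t34\t12",
             "2\t999\t15",
             "3\t27\t999"), tmp)

# Keep R's own "NA" marker and add the SPSS missing-value code
dataset <- read.table(tmp, header = TRUE, sep = "\t",
                      na.strings = c("NA", "999"))

dataset
colSums(is.na(dataset))  # missing values per column: id 0, age 1, score 1
```

na.strings compares the raw text of each field, so if the export wrote the code as, say, 999.00, that exact string is what needs to be listed.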

Is your file in .dat format? Then use:

db = file.choose()
dataset = read.table(db, header=TRUE, sep="\t")

The argument header = TRUE specifies that the first row of the file contains the variable names, so those values are not interpreted as data, while sep = "\t" declares that the fields are separated by tabs.
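To see what header changes, here is a minimal sketch that reads the same small tab-delimited file (written to a temporary location as a stand-in for your .dat file) with and without a header row.

```r
# A tiny tab-delimited file whose first row holds the variable names
tmp <- tempfile(fileext = ".dat")
writeLines(c("id\tscore", "1\t12", "2\t15"), tmp)

with_header    <- read.table(tmp, header = TRUE,  sep = "\t")
without_header <- read.table(tmp, header = FALSE, sep = "\t")

names(with_header)     # "id" "score"
nrow(with_header)      # 2
names(without_header)  # "V1" "V2": R makes up the names
nrow(without_header)   # 3: the row of names is read as data
```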

In a hurry? Combine all the operations in a single line:

dataset = read.spss(file.choose(), to.data.frame=TRUE)

or, with .dat:

dataset = read.table(file.choose(), header = TRUE, sep = "\t")

Once you have imported a file, it is a good idea to check that it was read correctly.

To check the size of your dataset, use the dim() function. It returns two numbers: the first is the number of cases (the rows of your dataset), and the second is the number of variables (the columns).
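A short sketch of this check, using mtcars (a data set that ships with R) as a stand-in for your imported data. The expected number of cases is hypothetical: in practice it would be the N that SPSS reports for your file.

```r
# mtcars, a data set that ships with R, stands in for your imported data
dataset <- mtcars

dim(dataset)   # 32 11: 32 cases (rows) and 11 variables (columns)
nrow(dataset)  # only the number of cases
ncol(dataset)  # only the number of variables

# Hypothetical check: compare with the number of cases SPSS reported
expected_cases <- 32
stopifnot(nrow(dataset) == expected_cases)
```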

dim(dataset)

It can also be useful to preview the data. To inspect the first six rows of the dataset, use the head() function:

head(dataset)

To inspect the last six rows of the dataset, use the tail() function:

tail(dataset)

To inspect the structure of the dataset, use the str() function:

str(dataset)
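Both head() and tail() also take an n argument when six rows are not the right amount. A brief sketch, again with mtcars standing in for your data:

```r
dataset <- mtcars  # stand-in for your imported data

head(dataset, n = 10)  # the first 10 rows instead of the default 6
tail(dataset, n = 3)   # the last 3 rows
```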

Do you want to see the whole data table? If it is large, rather than printing it in the console, use View(), which displays it in a spreadsheet-style viewer, or fix(), which also lets you edit the cell contents by hand.

View(dataset)
fix(dataset)

This post was originally written in Italian by Davide Massidda and Antonello Preti and published on the InsulaR blog.

Want to open a Microsoft Excel file in R? See the earlier post Read Excel files from R.
