Using Excel for Data Entry

[This article was first published on R – r4stats.com, and kindly contributed to R-bloggers].

This article shows you how to enter data in Excel so that you can easily open it in statistics packages such as R, SAS, SPSS, or jamovi (code or GUI steps for each appear below). Excel has some statistical analysis capabilities, but they often provide incorrect answers. For a comprehensive list of these limitations, see http://www.forecastingprinciples.com/paperpdf/McCullough.pdf and http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction.

Simple Data Sets

Most data sets are easy to enter using the following rules.

Relational Data Sets

Some data sets contain observations that are related in some way. They may be people who all live in the same home, or samples that all came from the same site. There may be higher levels of relations, such as students within classrooms, then classrooms within schools. Data that contain such relations (a.k.a. nesting) may be stored in a “relational” database, but databases are harder to learn than spreadsheet software. Relational data can easily be entered as two or more spreadsheets and combined later during data analysis. This saves a lot of data entry, since the higher-level data (e.g. family house value, socio-economic status, etc.) only need to be entered once, instead of on several lines (e.g. once for each family member).

If you have such data, make sure that each data set contains a “key” variable that acts as a common ID number for family, site, school, etc. You can later read two files at a time and combine them, matching on that key variable. R calls this combination a join or merge; SAS calls it a merge; and SPSS calls it Add Variables.
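As a minimal sketch of that combining step in R, the code below joins a hypothetical family-level table to a hypothetical person-level table using base R's merge(). The variable names (family_id, house_value, person, age) and all values are made up for illustration. Each family's house value is typed only once, yet it ends up on every member's row.

```r
# Hypothetical family-level data: entered once per family
families <- data.frame(
  family_id   = c(101, 102),
  house_value = c(250000, 180000)
)

# Hypothetical person-level data: one row per family member
members <- data.frame(
  family_id = c(101, 101, 102, 102, 102),
  person    = c("A", "B", "C", "D", "E"),
  age       = c(45, 12, 38, 36, 9),
  stringsAsFactors = FALSE
)

# Match rows on the shared key so each member gets their family's values
combined <- merge(members, families, by = "family_id")
combined
```

By default merge() keeps only key values found in both tables; adding all.x = TRUE keeps members whose family_id has no match (their family columns become NA). The dplyr package offers the same operation as inner_join() and left_join().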

Example of a Good Data Structure

This data set follows all the rules for simple data sets above, so any statistics software can read it easily. Gender is coded 0 for female and 1 for male.

ID   Gender   Income
1    0        32000
2    1        23000
3    0        137000
4    1        54000
5    1        48500
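To illustrate why this layout is easy to work with, here is a sketch that types the same five rows into R, saves them as a CSV file, and reads them back. Every column comes back as numeric with no cleanup needed. The file path is a throwaway temporary file, not something from the original article.

```r
# The good layout above, typed in as an R data frame
good <- data.frame(
  ID     = 1:5,
  Gender = c(0, 1, 0, 1, 1),
  Income = c(32000, 23000, 137000, 54000, 48500)
)

# Write it to a temporary CSV file and read it back
path <- tempfile(fileext = ".csv")
write.csv(good, path, row.names = FALSE)
back <- read.csv(path)

str(back)          # all three columns arrive as numbers
mean(back$Income)  # arithmetic works immediately
```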

Example of a Bad Data Structure

This is the same data shown above, but it violates the rules for simple data sets in several ways: there is no column for gender, the income values contain dollar signs and commas, variable names appear on more than one line, variable names are not even consistent (income vs. salary), and there is a blank line in the middle. This would not be easy to read!

Data for Female Subjects
ID   Income
1    $32,000
3    $137,000

Data for Male Subjects
ID   Salary
2    $23,000
4    $54,000
5    $48,500
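For contrast, this sketch shows the cleanup in R that the bad layout forces on you before any analysis can begin: renaming Salary to Income, turning the section titles into a Gender column, and stripping the dollar signs and commas. The 0/1 gender codes are assumed to match the good example above.

```r
# The bad layout: two separately typed blocks with dollar-formatted text
female <- data.frame(ID = c(1, 3), Income = c("$32,000", "$137,000"),
                     stringsAsFactors = FALSE)
male   <- data.frame(ID = c(2, 4, 5), Salary = c("$23,000", "$54,000", "$48,500"),
                     stringsAsFactors = FALSE)

# Fix 1: use one consistent variable name
names(male)[names(male) == "Salary"] <- "Income"

# Fix 2: store gender as a column instead of in a title line
female$Gender <- 0
male$Gender   <- 1

# Fix 3: stack the blocks, remove "$" and "," and convert to numbers
fixed <- rbind(female, male)
fixed$Income <- as.numeric(gsub("[$,]", "", fixed$Income))
fixed <- fixed[order(fixed$ID), c("ID", "Gender", "Income")]
fixed
```

Every one of these steps is work that entering the data in the good layout avoids entirely.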

Excel Tips for Data Entry

Backups

Save your data frequently and make backup copies often. Don’t leave all your backup copies connected to a computer, where viruses could reach them. Don’t store them all in the same building, or a fire or theft could wipe out all your hard work. Get a free account at http://drive.google.com, http://dropbox.com, or http://onedrive.live.com and save copies there.
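If you want to automate part of this, the sketch below is a hypothetical R helper, backup_file(), that copies a file into a backup folder with a date-time stamp in its name so that older copies are never overwritten. It only makes a local copy; pointing backup_dir at a folder synced by one of the cloud services above would be one way to also get an off-site copy, though that setup is an assumption rather than something the article prescribes.

```r
# Hypothetical helper: copy a file into backup_dir with a timestamp in its name
backup_file <- function(path, backup_dir) {
  dir.create(backup_dir, showWarnings = FALSE, recursive = TRUE)
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  base  <- tools::file_path_sans_ext(basename(path))
  ext   <- tools::file_ext(path)
  name  <- if (nzchar(ext)) paste0(base, "_", stamp, ".", ext) else paste0(base, "_", stamp)
  target <- file.path(backup_dir, name)
  # overwrite = FALSE: never clobber an existing backup
  if (!file.copy(path, target, overwrite = FALSE)) stop("Backup failed: ", target)
  target
}

# Example: back up a small file into a temporary folder
src  <- tempfile(fileext = ".csv")
writeLines("ID,Gender,Income", src)
copy <- backup_file(src, file.path(tempdir(), "backups"))
file.exists(copy)
```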

Steps for Reading Excel Data Into R

There are several ways to read an Excel file into R. Perhaps the easiest method uses the following commands. They read an Excel file named mydata.xlsx into an R data frame called mydata. For examples of how to read many other file formats into R, see:
http://r4stats.com/examples/data-import/.

# Do this once to install:
install.packages("readxl")

# Each time you read a file, follow these steps
library("readxl")
mydata <- read_excel("mydata.xlsx")  # reads the first worksheet by default
mydata  # print the data frame to check that it was read correctly
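By default, read_excel() reads the first worksheet. If your workbook has several sheets, you can list them with excel_sheets() and choose one with the sheet argument. The sketch below uses the example workbook that ships with readxl (located by readxl_example()), so it runs without a file of your own; with your own data you would pass "mydata.xlsx" instead.

```r
library("readxl")

# Path to the example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the worksheets, then read one of them by name
excel_sheets(path)
mtcars_xl <- read_excel(path, sheet = "mtcars")

# Check dimensions and column types before analysis
dim(mtcars_xl)
str(mtcars_xl)
```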

Steps for Reading Excel Data Into SPSS

  1. In SPSS, choose File > Open > Data.
  2. Change the “Files of type” box to “Excel (*.xlsx)”.
  3. When the Read Excel File box appears, select the worksheet name, check the box for Read variable names from the first row of data, and click OK.
  4. When the data appear in the SPSS Data Editor spreadsheet, choose File > Save As and leave the Save as type box set to SPSS (*.sav).
  5. Enter the name of the file without the .sav extension, then click Save to store the file in SPSS format.
  6. Next time, open the .sav version; you won’t need to convert the file again.
  7. If you create variable or value labels in the SPSS file and later need to read your data from Excel again, you can copy those labels into the new file. First, make sure you use the same variable names. Next, after opening the new file in SPSS, choose Copy Data Properties from the Data menu. Name the SPSS file that has the properties (such as labels) you want to copy, check the items you want to copy, and click OK.

Steps for Reading Excel Data Into SAS

The code below reads an Excel file called mydata.xlsx and stores it as a permanent SAS dataset called sasuser.mydata. If your organization is considering migrating from SAS to R, I offer some tips here: http://r4stats.com/articles/migrate-to-r/

proc import datafile="mydata.xlsx"
dbms=xlsx out=sasuser.mydata replace;
getnames=yes;
run;

Steps for Reading Excel Data into jamovi

At the time of writing, jamovi can open CSV, JASP, SAS, SPSS, and Stata files, but not Excel, so you must open the data in Excel and use Save As to store it as a comma-separated values (CSV) file. The ability to read Excel files should be added in a future release. For more information about the free and open source jamovi software, see my review here:
http://r4stats.com/2018/02/13/jamovi-for-r-easy-but-controversial/.
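If you already use R, one alternative to saving by hand in Excel is to let R do the conversion. The sketch below defines a hypothetical helper, xlsx_to_csv(), that reads one worksheet with readxl and writes it out with base R's write.csv(); readxl's bundled example workbook stands in for your own file. Missing cells are written as empty fields.

```r
library("readxl")

# Hypothetical helper: convert one worksheet of an .xlsx file to CSV
xlsx_to_csv <- function(xlsx_path, csv_path, sheet = 1) {
  dat <- read_excel(xlsx_path, sheet = sheet)
  write.csv(dat, csv_path, row.names = FALSE, na = "")
  invisible(csv_path)
}

# Example: convert the "iris" sheet of the bundled workbook
out <- tempfile(fileext = ".csv")
xlsx_to_csv(readxl_example("datasets.xlsx"), out, sheet = "iris")
head(read.csv(out))
```

The resulting CSV file can then be opened in jamovi like any other CSV file.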

More to Come

If you found this post useful, I invite you to check out many more on my website or follow me on Twitter, where I announce my blog posts.
