Site icon R-bloggers

Understanding Data Import and Export in R: Working with CSV and Excel Files

[This article was first published on A Statistician's R Notebook, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

< section id="introduction" class="level2" data-number="1">

1 Introduction

When learning R, most people focus on functions, models, and visualizations. However, many real-world problems start much earlier — at the data import stage — and end much later — with exporting results.

If data is read incorrectly, no statistical method can save the analysis.

In this post, we focus on the logic of data import and export in R, using CSV and Excel files. Rather than memorizing functions, we build a mental model for how R interacts with files.

< section id="why-data-import-and-export-matters" class="level2" data-number="2">

2 Why Data Import and Export Matters

Data analysis is a workflow:

Data source → Import → Analysis → Results → Export → Sharing

Errors often occur at the import stage:

The result?
A model that runs perfectly — on the wrong data.

< section id="csv-vs-excel-not-a-competition" class="level2" data-number="3">

3 CSV vs Excel: Not a Competition

Before touching R, we should clarify the difference between file formats.

< section id="csv-files" class="level3" data-number="3.1">

3.1 CSV Files

Example:

total_bill,tip,sex
16.99,1.01,Female
< section id="excel-files" class="level3" data-number="3.2">

3.2 Excel Files

Key idea:
CSV is a data transport format.
Excel is a communication format.

< section id="working-directory-where-r-actually-looks" class="level2" data-number="4">

4 Working Directory: Where R Actually Looks

One of the most common beginner mistakes has nothing to do with R syntax.

R does not search your entire computer for files. It only looks inside its working directory.

getwd()

This command shows where R is currently looking.

If a file exists on your computer but not in this directory, R behaves as if the file does not exist.

This is why errors like:

cannot open the connection

usually indicate a path problem, not a coding problem.

< section id="the-example-dataset-tips" class="level2" data-number="5">

5 The Example Dataset: tips

Throughout this post, we use a single dataset: tips.

Data source:
https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv

< section id="reading-csv-files-the-core-logic" class="level2" data-number="6">

6 Reading CSV Files: The Core Logic

When R reads a CSV file, it needs answers to four questions:

  1. How are columns separated?

  2. Is the first row a header?

  3. What is the decimal separator?

  4. How should text be interpreted?

These answers are provided via function arguments.

< section id="read.table-the-foundation" class="level2" data-number="7">

7 read.table(): The Foundation

All CSV-reading functions in base R are built on read.table().

tips <- read.table(
  file = "tips.csv",
  header = TRUE,
  sep = ",",
  dec = ".",
  stringsAsFactors = FALSE
)

Understanding this function means understanding CSV import in R.

< section id="read.csv-and-its-assumptions" class="level2" data-number="8">

8 read.csv() and Its Assumptions

read.csv() is simply a shortcut for a common case:

tips <- read.csv("tips.csv")

This works perfectly — if the assumptions match the file.

The dangerous part? R may not throw an error even if the assumptions are wrong.

The most dangerous errors are silent ones.

< section id="read.csv2-and-regional-differences" class="level2" data-number="9">

9 read.csv2() and Regional Differences

In many European datasets:

total_bill;tip;sex
16,99;1,01;Female

For this structure, read.csv2() is designed.

tips2 <- read.csv2("tips_semicolon.csv")

Important nuance:
Even if decimals use dots, read.csv2() may still work in some cases — but this is not guaranteed.

Correct approach:

Always inspect the file structure before choosing the function.

< section id="writing-csv-files-from-r" class="level2" data-number="10">

10 Writing CSV Files from R

Data analysis rarely ends in R. Results are shared as files.

< section id="writing-comma-separated-csv" class="level3" data-number="10.1">

10.1 Writing comma-separated CSV

write.csv(tips, "tips_comma.csv", row.names = FALSE)
< section id="writing-semicolon-separated-csv" class="level3" data-number="10.2">

10.2 Writing semicolon-separated CSV

write.csv2(tips, "tips_semicolon.csv", row.names = FALSE)

Choosing the correct format depends on who will read the file next.

< section id="why-we-still-need-excel" class="level2" data-number="11">

11 Why We Still Need Excel

CSV is technically superior in many ways. Yet Excel remains dominant in practice.

Why?

Excel is not an analysis tool — but it is a powerful delivery tool.

< section id="working-with-excel-in-r-openxlsx" class="level2" data-number="12">

12 Working with Excel in R: openxlsx

The openxlsx package allows Excel operations without requiring Excel itself.

library(openxlsx)
< section id="writing-a-simple-excel-file" class="level3" data-number="12.1">

12.1 Writing a simple Excel file

write.xlsx(tips, "tips.xlsx", sheetName = "tips")
< section id="reading-from-excel" class="level3" data-number="12.2">

12.2 Reading from Excel

tips_excel <- read.xlsx("tips.xlsx", sheet = 1)
< section id="multiple-sheets-a-mini-report" class="level2" data-number="13">

13 Multiple Sheets: A Mini Report

Excel shines when organizing related tables.

summary_tips <- aggregate(tip ~ day, data = tips, mean)

wb <- createWorkbook()

addWorksheet(wb, "Raw Data")
writeData(wb, "Raw Data", tips)

addWorksheet(wb, "Summary")
writeData(wb, "Summary", summary_tips)

saveWorkbook(wb, "tips_report.xlsx", overwrite = TRUE)

One file.

Multiple views.

Clean structure.

< section id="common-mistakes-to-watch-for" class="level2" data-number="14">

14 Common Mistakes to Watch For

Most errors are not caused by R, but by assumptions:

A healthy habit after every import:

head(data)
str(data)
summary(data)
< section id="final-thoughts" class="level2" data-number="15">

15 Final Thoughts

If you can:

you have already crossed one of the most important thresholds in data analysis.

For a complementary discussion, you may also find this article useful:
https://medium.com/p/e730f4a84b3b


Extended version on Medium:
https://medium.com/@Fatih.Tuzen/understanding-data-import-and-export-in-r-working-with-csv-and-excel-files-6322e61049b2

< !-- -->
To leave a comment for the author, please follow the link and comment on their blog: A Statistician's R Notebook.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version