Quantitative research often begins with the humble process of counting. Historical documents are never as plentiful as a historian would wish, but counting words, material objects, court cases, etc. can lead to a better understanding of the sources and the subject under study. When beginning the process of counting, the first instinct is to open a spreadsheet. The end result might be the production of tables and charts created in the very same spreadsheet document. In this post, I want to show why this spreadsheet-centric workflow is problematic and recommend the use of a programming language such as R as an alternative for both analyzing and visualizing data.

The post gives a good overview of the pros and cons of using spreadsheets for data analysis, and then offers a useful introduction, aimed at spreadsheet users, to using R for the problematic parts. It includes:

  • Basics of the R command line
  • An overview of the Tidyverse, a suite of R packages for data manipulation
  • Working with data in R: numbers, strings and dates
  • Manipulating data frames by linking operations together with the pipe operator
  • Visualizing data with the ggplot2 package
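
To give a flavour of the last two items, here is a minimal sketch of a piped dplyr summary followed by a ggplot2 bar chart. It is not taken from the guide: the tiny data frame, the `letters_df` name and the `writer` and `source` columns are made-up placeholders standing in for a table with one row per letter, and the sketch assumes the dplyr and ggplot2 packages are installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per letter (NOT the real correspondence data)
letters_df <- data.frame(
  writer = c("A", "A", "B", "C", "C", "C"),
  source = c("Antwerp", "Antwerp", "Haarlem", "Venice", "Venice", "Venice")
)

# Link operations with the pipe: group by city, count rows, sort descending
per_source <- letters_df %>%
  group_by(source) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

# Build a bar chart of letters per source city from the summary table
p <- ggplot(per_source, aes(x = source, y = count)) +
  geom_col() +
  labs(x = "Source city", y = "Number of letters")

print(per_source)
```

Each step reads top to bottom, much like a sequence of spreadsheet actions, but the whole analysis lives in a script that can be rerun or changed later. Calling `print(p)` draws the chart.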

The guide is built around a worked analysis of an interesting historical data set: the correspondence network from 6,600 letters written to the 16th-century Dutch diplomat Daniel van der Meulen. You can find the complete guide, including a link to download the data for the examples, at the link below.

Jesse Sadler: Excel vs R: A Brief Introduction to R