In this post, I’ll describe a fun visualization of Anscombe’s quartet I whipped up recently.
If you aren’t familiar with Anscombe’s quartet, here’s a brief description from its Wikipedia entry: “Anscombe’s quartet comprises four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. He described the article as being intended to counter the impression among statisticians that ‘numerical calculations are exact, but graphs are rough.’”
In essence, there are four datasets with quite different patterns in the data. Fitting a linear regression model to each dataset yields (nearly) identical regression coefficients, while graphing the data makes it clear that the underlying patterns are very different. What’s amazing to me is how these simple datasets (and their accompanying graphs) make the importance of data visualization immediately intuitive, and drive home how much well-constructed graphs help analysts understand the data they are working with.
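To make the "nearly identical coefficients" point concrete, here is a minimal sketch that fits a separate regression to each dataset and tabulates a few summary numbers. It assumes the `anscombe` data frame that ships with R's `datasets` package (columns `x1` to `x4` and `y1` to `y4`); the name `quartet_stats` and the choice of statistics are mine, for illustration only.

```r
# anscombe ships with base R (datasets package): columns x1..x4 and y1..y4.
data(anscombe)

# Fit y_i ~ x_i for each of the four datasets and collect summary numbers.
# quartet_stats is an illustrative name, not something from the original post.
quartet_stats <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared,
    mean_x    = mean(x),
    mean_y    = mean(y))
}))
rownames(quartet_stats) <- paste("dataset", 1:4)

# One row per dataset; the rows should look almost the same.
print(round(quartet_stats, 2))
```

Each row should show an intercept of roughly 3.0 and a slope of roughly 0.5, which is exactly why the numbers alone can't tell the four datasets apart.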
Because the 1980s are back, I decided to make a visualization of Anscombe’s quartet using the like-most-totally-rad 1980s graphing elements I could come up with. For the colors, I drew on a number of graphic design palettes that list their hex codes. I used the excellent showtext package for the 1980s font, which is the Google font “Press Start 2P.” (Note: if you’re reproducing the graph at home, the fonts won’t show properly in RStudio. Run the code in the standalone R program and everything works like a charm.) I had to tweak a number of graphical parameters to get the layout right, but in the end I’m quite pleased with the result.
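For anyone who wants to try the general approach, here is a rough sketch of how the pieces can fit together in base R. It is not the code behind the graph in this post: the hex codes in `retro` are placeholders, and the names `retro`, `font_family`, `plot_quartet`, and the family label `"pressstart"` are all made up for illustration. It assumes the showtext and sysfonts packages are installed and that the font can be downloaded; if not, it falls back to the default font.

```r
# Placeholder palette: these hex codes are illustrative, not the post's colors.
retro <- c(bg = "#1B1B3A", fg = "#F8F8F2", pts = "#FF2E97", line = "#00F0FF")

# Try to register the Google font "Press Start 2P" via sysfonts/showtext.
# If the packages are missing or the download fails, keep the default font.
font_family <- ""
if (requireNamespace("showtext", quietly = TRUE) &&
    requireNamespace("sysfonts", quietly = TRUE)) {
  try(sysfonts::font_add_google("Press Start 2P", family = "pressstart"),
      silent = TRUE)
  if ("pressstart" %in% sysfonts::font_families()) {
    showtext::showtext_auto()  # render text with showtext on new devices
    font_family <- "pressstart"
  }
}

# Draw the four datasets in a 2 x 2 grid with a fitted line in each panel.
plot_quartet <- function(pal = retro, family = font_family) {
  old <- par(mfrow = c(2, 2), bg = pal[["bg"]], fg = pal[["fg"]],
             col.axis = pal[["fg"]], col.lab = pal[["fg"]],
             col.main = pal[["fg"]], family = family,
             mar = c(4, 4, 3, 1))
  on.exit(par(old))  # restore the previous graphical parameters
  for (i in 1:4) {
    x <- anscombe[[paste0("x", i)]]
    y <- anscombe[[paste0("y", i)]]
    # Shared axis limits so the panels are directly comparable.
    plot(x, y, pch = 15, col = pal[["pts"]],
         xlim = c(3, 20), ylim = c(2, 14),
         xlab = paste0("x", i), ylab = paste0("y", i),
         main = paste("Dataset", i))
    abline(lm(y ~ x), col = pal[["line"]], lwd = 2)
  }
  invisible(NULL)
}

plot_quartet()
```

Swapping in a different palette is just a matter of passing another named vector with the same four names, e.g. `plot_quartet(pal = my_palette)`.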
In this post, I used data available in R to make a 1980s-themed version of the Anscombe’s quartet graphs. The main visual elements I manipulated were the colors and the fonts. R’s wonderful and flexible plotting capabilities (here using base R!) made it very straightforward to edit every detail of the graph to achieve the desired retro-kitsch aesthetic.
OK, so maybe this isn’t the most serious use of R for data analysis and visualization. There are doubtless more important business cases and analytical problems to solve. Nevertheless, this was super fun to do. Data analysis (or data science, or whatever you’d like to call it) is a field in which there are countless ways to be creative with data. It’s not always easy to bring this type of creativity to every applied project, but this blog is a place where I can do any crazy thing I set my mind to and just have fun. Judging by that standard, I think this project was a success.
Coming Up Next
In the next post, I’ll do something a little bit different with data. Rather than doing data analysis, I’ll describe a project in which I used Python to manage and edit metadata (ID3 tags) in MP3 files. Stay tuned!