Articles by R on

Covdata Package

April 10, 2020 | R on

Partly because it grew out of a few code-throughs I was doing, but mostly as a classroom exercise, I pulled together a small data package for R called covdata, available at It contains COVID-19 data from three sources: National level data from ... [Read more...]

A COVID Small Multiple

March 27, 2020 | R on

John Burn-Murdoch has been doing very good work at the Financial Times producing various visualizations of the progress of COVID-19. One of his recent images is a small-multiple plot of cases by country, showing the trajectory of the outbreak for a large number of countries, with the background of ... [Read more...]

Covid 19 Tracking

March 21, 2020 | R on

Get Your Epidemiology from Epidemiologists. The COVID-19 pandemic continues to rage. I’m strongly committed to what should be the uncontroversial view that we should listen to the recommendations of those institutions and individuals with strong expertise in the relevant fields of Public Health, Epidemiology, Disease Control, and Infection Modeling. ... [Read more...]

U.S. Census Counts Data

March 15, 2020 | R on

As promised previously, I packaged up the U.S. Census data that I pulled together to make the population density and pyramid animations. The package is called uscenpops and it’s available to install via GitHub or with install.packages() if you set up drat first. The instructions are on ... [Read more...]

Spanish Flu

March 5, 2020 | R on

I was teaching some dplyr and ggplot today. Because Coronavirus is in the, uh, air, I decided to work with the mortality data from and have the students practice getting a bunch of data files into R and then plotting the resulting ... [Read more...]

A New Baby Boom Poster

February 26, 2020 | R on

I wanted to work through a few examples of more polished graphics done mostly but perhaps not entirely in R. So, I revisited the Baby Boom visualizations I made a while ago and made a new poster with them. This allowed me to play around with a few packages that ... [Read more...]

Dataviz Workshop at RStudio::conf

February 18, 2020 | R on

Workshop materials are available here: Consider buying the book; it’s good: Data Visualization: A Practical Introduction. I was delighted to have the opportunity to teach a two-day workshop on Data Visualization using ggplot2 at this year’s rstudio::conf(2020) in January. ... [Read more...]

Cleaning the Table

November 10, 2019 | R on

While I’m talking about getting data into R this weekend, here’s another quick example that came up in class this week. The mortality data in the previous example were nice and clean coming in the door. That’s usually not the case. Data can be and usually is ... [Read more...]

Reading in Data

November 9, 2019 | R on

Here’s a common situation: you have a folder full of similarly-formatted CSV or otherwise structured text files that you want to get into R quickly and easily. Reading data into R is one of those tasks that can be a real source of frustration for beginners, so I like ... [Read more...]

Dogs of New York

October 28, 2019 | R on

The other week I took a few publicly-available datasets that I use for teaching data visualization and bundled them up into an R package called nycdogs. The package has datasets on various aspects of dog ownership in New York City, and amongst other things you can draw maps with it ... [Read more...]

Reconstructing Images Using PCA

October 27, 2019 | R on

A decade or more ago I read a nice worked example from the political scientist Simon Jackman demonstrating how to do Principal Components Analysis. PCA is one of the basic techniques for reducing data with multiple dimensions to some much smaller subset that nevertheless represents or condenses the information we ... [Read more...]

Widening Multiple Columns Redux

October 21, 2019 | R on

Last year I wrote about the slightly tedious business of spreading (or widening) multiple value columns in Tidyverse-flavored R. Recent updates to the tidyr package, particularly the introduction of the pivot_wider() and pivot_longer() functions, have made this rather more straightforward to do than before. Here I recapitulate the ... [Read more...]

Parsing SDA Pages

October 15, 2019 | R on

SDA is a suite of software developed at Berkeley for the web-based analysis of survey data. The Berkeley SDA archive lets you run various kinds of analyses on a number of public datasets, such as the General Social Survey. It also provides consistently-formatted HTML versions of ... [Read more...]

Back in the GSSR

October 10, 2019 | R on

The General Social Survey, or GSS, is one of the cornerstones of American social science and one of the most-analyzed datasets in Sociology. It is routinely used in research, in teaching, and as a reference point in discussions about changes in American society since the early 1970s. It is also ... [Read more...]

Earned Doctorates

June 23, 2019 | R on

PhDs awarded in selected disciplines, 2006-2016. Thierry Rossier asked me for the code to produce plots like the one above. The data come from the Survey of Earned Doctorates, a very useful resource for tracking trends in PhDs awarded in the United States. The plot is made with geom_line() ... [Read more...]

Baby Name Animation

May 13, 2019 | R on

I was playing around with the gganimate package this morning and thought I’d make a little animation showing a favorite finding about the distribution of baby names in the United States. This is the fact—I think first noticed by Laura Wattenberg, of the Baby Name Voyager—that there ... [Read more...]

A Quick and Tidy Look at the 2018 GSS

March 22, 2019 | R on

The data from the 2018 wave of the General Social Survey was released during the week, leading to a flurry of graphs showing various trends. The GSS is one of the most important sources of information on various aspects of U.S. society. One of the best things about it is ... [Read more...]

Installing Socviz

March 12, 2019 | R on

I’ve gotten a couple of reports from people having trouble installing the socviz library that’s meant to be used with Data Visualization: A Practical Introduction. As best as I can tell, the difficulties are being caused by GitHub’s rate limits. The symptom is that, after installing the ... [Read more...]