COVID-19 Data Forum: Data Journalism

[This article was first published on R Views, and kindly contributed to R-bloggers].

The COVID-19 Data Forum, a joint project of the Stanford Data Science Institute and the R Consortium, is an ongoing series of multidisciplinary webinars where topic experts discuss data-related aspects of the scientific response to the pandemic. The most recent event, held on March 18, 2021, explored the role of data journalism in the pandemic. This was a bit of a departure from previous forum events [1] because it focused on issues relating to using and interpreting COVID-19 data, and not on the particular kinds of COVID-19-related data that are available.

I think you will find the webinar video worth watching. If you are a statistician or epidemiologist working on COVID-19, you may find the data journalists’ accounts of difficulties they faced working with COVID data and statistical models instructive. But, even if you are not directly working on COVID, you may find that listening to the journalists fills in some gaps between what you know about statistics and data visualizations and what you see in the news.

The data journalism event was moderated by Dr. Irena Hwang, a data reporter at ProPublica. Speakers included Dr. Mark Hansen, David and Helen Gurley Brown Professor of Journalism and Innovation at Columbia University; Ana Carolina Moreno, a senior data journalist at TV Globo in São Paulo, Brazil; and Meghan Hoyer, Director of Data Reporting at the Washington Post.

The video of the data journalism event is available here. The following short time map and the times referenced in my comments below should be helpful for browsing the ninety-minute event.

  • 2:37 Irena Hwang introduces Mark Hansen
  • 3:50 Start of Mark’s talk
  • 19:30 Irena introduces Ana Carolina Moreno (Carol)
  • 21:10 Start of Carol’s talk
  • 39:20 Irena introduces Meghan Hoyer
  • 40:00 Start of Meghan’s talk
  • 1:01:40 Start of discussion

Mark Hansen

In his talk, Mark offers an overview of the profession of data journalism that provides some historical context and emphasizes the hybrid nature of the practice, which blends a hard-nosed detective’s drive to uncover facts with the empathy to tell stories “about who we are and how we live”.

7:00 Mark introduces Joseph Pulitzer’s 1904 paper The College of Journalism in which Pulitzer includes Statistics as a subject journalists should study. On page 673, Pulitzer writes:

You want statistics to tell you the truth. You can find truth there if you know how to get at it, and romance, human interest, humor and fascinating revelations as well.

10:19 Mark describes a piece, An ode to reporter’s notebooks, by Philip Eil in the Columbia Journalism Review, that offers a personal account of reporting. Eil writes:

To report is to be alert and alive at a particular time and place.

11:00 Mark remarks:

when we’re thinking about bringing computation to journalism we are taking that basic curiosity that we are cultivating in our students’ minds … and adding computational lines of inquiry to that habit of mind, that questioning why things look the way they do…

12:08 Mark calls attention to the report by Charles Berret and Cheryl Phillips Teaching Data And Computational Journalism and describes some recent activities of the Brown Institute at the Columbia School of Journalism.

Ana Carolina Moreno

22:32 Carol introduces Brazil’s universal healthcare system and shows a schematic of the available official and unofficial COVID-19 data sources.

26:00 Carol notes that a platform originally built to track SARS data was adapted to track COVID.

27:38 Carol explains that, in practice, many obstacles make it difficult to obtain the data needed to understand how the pandemic is developing. She calls out some of these on a slide at this point in the video.

30:37 Carol remarks that hospital data seems to be the most reliable.

31:06 Carol describes how the government changed its policy for reporting deaths. The new scheme, which reports only deaths confirmed in the past twenty-four hours, vastly understates the current death rate.

31:57 In an effort to obtain more reliable data, a consortium of competing journalists at local news organizations began cooperating by sharing information directly obtained from hospitals every day.

32:35 Carol provides a view of day-to-day journalism at the local news organizations and describes how data journalists scrape data on a daily basis to populate dashboards showing rolling averages and daily indicators. By focusing on the more reliable hospitalization data, journalists are doing their best to track the spread of the pandemic and expose inequities in the health care system.
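
For readers who have not computed one, the rolling averages shown on dashboards like these can be sketched in a few lines of Python. This is a minimal illustration, not the journalists' actual pipeline: the seven-day window, the function name rolling_mean, and the daily counts are all assumptions made up for the example.

```python
from collections import deque

def rolling_mean(values, window=7):
    """Trailing mean over the last `window` values; None until the window is full."""
    buf = deque(maxlen=window)  # the oldest value drops out automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out

# Made-up daily hospital admissions, including a weekend reporting dip.
daily_admissions = [120, 135, 128, 140, 150, 90, 85, 130, 142, 138]
smoothed = rolling_mean(daily_admissions)
```

Averaging over a full week damps the weekend reporting dips that make raw daily counts hard to read.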

Meghan Hoyer

40:07 Meghan begins her walk-through of what last year was like for data journalists who were trying to tell the story of the pandemic in real time.

41:22 Meghan recounts her experiences trying to make sense of COVID-19 models and expresses the frustration she and other data journalists felt with the multitude of contradictory predictive models.

44:03 In a memorable quote, Meghan remarks:

Models were inherently problematic and yet they were being forced upon us by society…

Consequently, journalists at the AP agreed that they would not base stories on models.

In the absence of reliable case data, and wanting nothing to do with the models, Meghan explains that data journalists turned to whatever data they could get their hands on to quantify the story of the pandemic.

46:00 Meghan recounts how journalists used garbage pickup data as a proxy for population density to estimate where people were living in NYC and correlate it with case data.

47:30 Journalists struggled to find data to verify the anecdotal stories they were hearing about the disparities in who was being affected by the virus. Finding that one quarter to one third of the COVID case data was missing information on race, data journalists “hand collected” data by looking city by city to find the missing data.

50:30 Meghan recounts how they turned to age-adjusted data to determine the impact of the virus on communities of color.
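
Meghan does not say which adjustment method was used. One common approach is direct standardization, in which each age band's death rate is weighted by that band's share of a common reference population. The sketch below uses that approach; the age bands, the reference shares, and all of the counts are invented for illustration.

```python
def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, scaled to `per` people."""
    return sum(deaths.values()) / sum(population.values()) * per

def age_adjusted_rate(deaths, population, standard_share, per=100_000):
    """Directly standardized rate: age-specific rates weighted by reference shares."""
    total = 0.0
    for band, share in standard_share.items():
        total += share * deaths[band] / population[band]
    return total * per

# Hypothetical reference population shares (they sum to 1).
standard_share = {"0-44": 0.60, "45-64": 0.25, "65+": 0.15}

# Hypothetical younger community.
pop_a = {"0-44": 80_000, "45-64": 15_000, "65+": 5_000}
deaths_a = {"0-44": 16, "45-64": 30, "65+": 50}

# Hypothetical older community.
pop_b = {"0-44": 50_000, "45-64": 30_000, "65+": 20_000}
deaths_b = {"0-44": 5, "45-64": 30, "65+": 100}

crude_a = crude_rate(deaths_a, pop_a)
crude_b = crude_rate(deaths_b, pop_b)
adjusted_a = age_adjusted_rate(deaths_a, pop_a, standard_share)
adjusted_b = age_adjusted_rate(deaths_b, pop_b, standard_share)
```

In this made-up example the younger community has the lower crude rate (96 versus 135 per 100,000) but the higher age-adjusted rate (212 versus 106), which is the kind of disparity a crude rate can hide.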

52:10 Data journalists find that excess deaths is a reliable metric for determining the impact of what is happening on the ground.
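
The idea behind excess deaths is simple arithmetic: compare the deaths observed in a period with the number that would have been expected without the pandemic. Here is a minimal sketch that takes the expected count to be the plain average of the same week in earlier years. That baseline is my simplifying assumption, published estimates typically model the baseline more carefully, and the counts are made up.

```python
from statistics import mean

def excess_deaths(observed, baseline_counts):
    """Observed deaths minus the expected count, taken here as the mean of prior years."""
    expected = mean(baseline_counts)
    return observed - expected

# Made-up death counts for the same calendar week in five earlier years.
prior_years = [1000, 1040, 1020, 980, 1010]
# Made-up count for that week during the pandemic.
pandemic_week = 1400

excess = excess_deaths(pandemic_week, prior_years)  # 1400 - 1010 = 390
```

Because the calculation counts deaths from all causes, it does not depend on how consistently COVID-19 was tested for or recorded as a cause of death.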

54:18 Journalists developed a survey which was returned by seven hundred schools to investigate how going back to school might be affecting students. Among their findings was that districts serving students of color were more likely to start online.

56:35 Meghan discusses the COVID Tracking Project and the effort to sort out the impact of test positivity rates. She reports that because states do not all count the number of people tested in the same way, correctly comparing test positivity rates among states remains an unsolved problem.
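
To see why the counting method matters, consider a small Python sketch in which the same hypothetical state reports positivity two ways: once per test performed and once per person tested. The two definitions and all of the numbers are assumptions for illustration; the talk says only that states count differently.

```python
def positivity(positives, total):
    """Share of `total` that came back positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positives / total

# Hypothetical state: some people are tested more than once.
tests_performed = 1500
positive_tests = 120
people_tested = 1000
positive_people = 100

by_tests = positivity(positive_tests, tests_performed)   # 0.08
by_people = positivity(positive_people, people_tested)   # 0.10
```

The same underlying situation yields 8 percent under one definition and 10 percent under the other, so a gap of that size between two states may reflect nothing more than bookkeeping.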

58:33 Meghan shares the need to “flip the numbers” to help people understand the meaning of statistics stated in terms of very large numbers. For example, saying that “Since January of last year at least 1 in 15 people who live in Alexandria, Virginia have been infected by the virus” is easier for people to understand than something like: “On March 17th there were 14 cases per 100,000 in Alexandria”.
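
The conversion between the two phrasings is a single division. The helper names in this Python sketch are my own, and it only changes how a number is expressed. Note that the two figures quoted above measure different things (cumulative infections versus cases on a single day), so they are not conversions of one another.

```python
def per_100k_to_one_in(rate_per_100k):
    """Express a rate per 100,000 as '1 in N', rounded to the nearest whole N."""
    if rate_per_100k <= 0:
        raise ValueError("rate must be positive")
    return round(100_000 / rate_per_100k)

def one_in_to_per_100k(n):
    """Express '1 in N' as a rate per 100,000."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 100_000 / n

daily_one_in = per_100k_to_one_in(14)        # 14 per 100,000 is about 1 in 7143
cumulative_rate = one_in_to_per_100k(15)     # 1 in 15 is about 6,667 per 100,000
```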

59:53 Vaccination tracking is another problematic area of data reporting. Not only are vaccinations reported differently from state to state, but the data that is reported changes from day to day. The CDC is apparently still adding new fields to the vaccination data sets.

The Q & A Discussion

1:01:40 The question and answer discussion begins.

1:02:56 Mark talks about how visualizations evolved over the course of the pandemic.

1:06:08 Carol and then Meghan talk about the lessons the pandemic taught data journalists about competition and collaboration.

1:10:04 Meghan describes how during the pandemic data journalists became advocates for public data.

1:11:21 Carol answers a question about the opportunities for data journalism in Brazil.

1:15:50 One of the panelists answers a question about how academia is supporting data journalism during the pandemic and mentions an effort to have statistical and scientific experts collaborate with data journalists.

1:20:19 Meghan responds to a question about technical and social challenges for data journalists during the pandemic.

1:23:10 Carol talks about the difference between reporting online news and television news.

1:26:01 Mark answers a question about communicating emotional impact in COVID reporting and ends by emphasizing the importance of communicating honestly about what we do and do not know.

[1] The first forum on May 14, 2020 focused on the data needs and challenges of modeling and controlling the spread of COVID-19. The second forum on August 13, 2020 explored what was being done to make clinical data available and useful. The third forum on December 10, 2020 discussed the role of mobility data.
