# Lies, Damn Lies, “Data Journalism” and Charts That Don’t Start at 0

January 28, 2014

(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

This tweet by @moorehn (who is usually a superb economics journalist) really bugged me:

I grabbed the raw data from EPI (http://www.epi.org/files/2012/data-swa/jobs-data/Employment%20to%20population%20ratio%20(EPOPs).xls), properly started the y-axis at 0, and broke out men and women separately, since the Excel spreadsheet includes that data. It’s a really different picture:

I’m not saying employment is great right now, but it’s nowhere near a “ski jump”. So much for the state of data journalism at the start of 2014.

Here’s the hastily crafted R-code:

    library(ggplot2)
    library(ggthemes)
    library(reshape2)

    a <- read.csv("empvyear.csv")
    b <- melt(a, id.vars="Year")

    gg <- ggplot(data=b, aes(x=Year, y=value, group=variable))
    gg <- gg + geom_line(aes(color=variable))
    gg <- gg + ylim(0, 100)
    gg <- gg + theme_economist()
    gg <- gg + labs(x="Year", y="Employment as share of population (%)", title="Employment-to-population ratio, age 25–54, 1975–2011")
    gg <- gg + theme(legend.title = element_blank())
    gg

And, here’s the data extracted from the Excel file:

Year,Men,Women
1975,89.0,51.0
1976,89.5,52.9
1977,90.1,54.8
1978,91.0,57.3
1979,91.1,59.0
1980,89.4,60.1
1981,89.0,61.2
1982,86.5,61.2
1983,86.1,62.0
1984,88.4,63.9
1985,88.7,65.3
1986,88.5,66.6
1987,89.0,68.2
1988,89.5,69.3
1989,89.9,70.4
1990,89.1,70.6
1991,87.5,70.1
1992,86.8,70.1
1993,87.0,70.4
1994,87.2,71.5
1995,87.6,72.2
1996,87.9,72.8
1997,88.4,73.5
1998,88.8,73.6
1999,89.0,74.1
2000,89.0,74.2
2001,87.9,73.4
2002,86.6,72.3
2003,85.9,72.0
2004,86.3,71.8
2005,86.9,72.0
2006,87.3,72.5
2007,87.5,72.5
2008,86.0,72.3
2009,81.5,70.2
2010,81.0,69.3
2011,81.4,69.0
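
One way to sanity-check the “ski jump” impression against these numbers is a quick sketch like the one below. It uses only the 2007 and 2011 rows copied from the table above and base R. The helper names `change_between` and `axis_share` are made up for illustration, and the 65 to 95 cropped axis is a hypothetical range chosen for comparison, not the axis used in the tweeted chart.

```r
# Two rows copied from the table above (2007 and 2011).
emp <- read.csv(text = "Year,Men,Women
2007,87.5,72.5
2011,81.4,69.0")

# Percentage-point change between two years for both series.
# (Hypothetical helper, not part of the original post.)
change_between <- function(df, from, to) {
  unlist(df[df$Year == to, c("Men", "Women")]) -
    unlist(df[df$Year == from, c("Men", "Women")])
}

drop_2007_2011 <- change_between(emp, 2007, 2011)

# Fraction of the plot's vertical extent that a change occupies
# for a given y-axis range. (Hypothetical helper.)
axis_share <- function(change, ymin, ymax) abs(change) / (ymax - ymin)

# Full 0-100 axis, matching ylim(0, 100) in the plot code,
# versus an assumed axis cropped to 65-95.
share_full    <- axis_share(drop_2007_2011, 0, 100)
share_cropped <- axis_share(drop_2007_2011, 65, 95)

print(round(drop_2007_2011, 1))
print(round(share_full, 3))
print(round(share_cropped, 3))
```

From 2007 to 2011 the ratio fell 6.1 points for men and 3.5 points for women. On a 0 to 100 axis that is about 6% and 3.5% of the plot height. On the assumed cropped axis, the same drops take up roughly 20% and 12% of the height, which is how a truncated y-axis makes a modest decline look like a cliff.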