Extracting data from news articles: Australian pollution by postcode

[This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers.]

The recent ABC News article “Australia’s pollution mapped by postcode reveals nation’s dirty truth” is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.

So here’s how I got them, using R.

The full details are in this GitHub repository. There you’ll find the code that generates this report.

Essentially, the procedure goes like this:

  1. Use rvest to create a data frame from the data table in the online article
  2. Clean and pre-process the data using dplyr
  3. Join the pollution data with geospatial data derived from a shapefile of Australian postal areas
  4. Filter by postcode range for the city of interest
  5. Finally, plot maps using ggplot2
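The five steps above can be sketched end to end as follows. This is not the author's code, which lives in the repository and report. It is a minimal, self-contained sketch that makes several assumptions. The inline HTML table, its columns and values, the two square "postal areas", the `POA_CODE` column name, and the 2000 to 2234 postcode range for Sydney are all hypothetical placeholders. The sf package is my choice for the shapefile step; the original may have used other tools.

```r
# rvest, dplyr and ggplot2 are named in the post; sf is an ASSUMED choice
# for handling the postal-area geometry.
library(rvest)
library(dplyr)
library(sf)
library(ggplot2)

# Step 1: parse an HTML table into a data frame. This inline snippet is a
# HYPOTHETICAL stand-in for the article's table (columns/values are made up);
# for a live page you would pass the article URL to read_html() instead.
html <- "
<table>
  <tr><th>Postcode</th><th>Suburb</th><th>Emissions (kg)</th></tr>
  <tr><td>2000</td><td>Sydney</td><td>1,234</td></tr>
  <tr><td>2150</td><td>Parramatta</td><td>56,789</td></tr>
  <tr><td>3000</td><td>Melbourne</td><td>9,876</td></tr>
</table>"
raw <- read_html(html) %>%
  html_element("table") %>%
  html_table()

# Step 2: tidy column names, store postcodes as 4-character strings (so a
# code like "0800" would keep its leading zero) and strip thousands
# separators before converting emissions to numbers.
pollution <- raw %>%
  setNames(c("postcode", "suburb", "emissions_kg")) %>%
  mutate(
    postcode     = sprintf("%04d", as.integer(postcode)),
    emissions_kg = as.numeric(gsub(",", "", emissions_kg))
  )

# Step 3: join with postal-area polygons. In practice these would come from
# st_read() on a postal-area shapefile; the two unit squares and the
# POA_CODE column name here are HYPOTHETICAL placeholders.
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
postal_areas <- st_sf(
  POA_CODE = c("2000", "2150"),
  geometry = st_sfc(square(0), square(1), crs = 4326)
)
# Keeping the sf object on the left of the join preserves its geometry;
# postcodes without a polygon (3000 here) are dropped by inner_join().
mapped <- postal_areas %>%
  inner_join(pollution, by = c("POA_CODE" = "postcode"))

# Step 4: keep one city by postcode range. 2000 to 2234 is an ILLUSTRATIVE
# range for Sydney, not an authoritative definition.
sydney <- mapped %>%
  filter(between(as.integer(POA_CODE), 2000L, 2234L))

# Step 5: draw a choropleth with ggplot2's geom_sf().
p <- ggplot(sydney) +
  geom_sf(aes(fill = emissions_kg)) +
  scale_fill_viridis_c(name = "Emissions (kg)") +
  theme_void()
# print(p), or ggsave("sydney.png", p), renders the map.
```

To point this at real data, you would swap the inline string for the article's URL and the toy squares for `st_read()` on the postal-area shapefile. The CSS selector, column order and join key would then need adjusting to whatever the page and shapefile actually contain.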

Rather than copying/pasting/formatting code here, I encourage you to look at the report.

Result: maps, like the one on the right. I sometimes think R makes this kind of thing almost too easy.
