Extracting data from news articles: Australian pollution by postcode

November 28, 2018

(This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers)

The recent ABC News article “Australia’s pollution mapped by postcode reveals nation’s dirty truth” is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps.

So here’s how I got them, using R.

The full details, including the code that generates this report, are in this GitHub repository.

Essentially, the procedure goes like this:

  1. Use rvest to create a data frame from the data table in the online article
  2. Clean and pre-process the data using dplyr
  3. Join the pollution data with geospatial data derived from a shapefile of Australian postal areas
  4. Filter by postcode range for the city of interest
  5. Finally, plot maps using ggplot2
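Steps 1, 2 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the code from the report: it assumes the rvest (1.0 or later, for `html_element()`) and dplyr packages are installed, and it parses a small made-up HTML table instead of the live article so that it runs offline. The column names, suburbs, values and the postcode range are all hypothetical placeholders.

```r
# Sketch of steps 1, 2 and 4 under stated assumptions.
# The HTML table, its column names and its values are HYPOTHETICAL
# stand-ins for the ABC article's table, not real pollution data.
library(rvest)
library(dplyr)

html <- '<table>
  <tr><th>Postcode</th><th>Suburb</th><th>Pollution (kg)</th></tr>
  <tr><td>2000</td><td>Suburb A</td><td>1,234</td></tr>
  <tr><td>2150</td><td>Suburb B</td><td>56,789</td></tr>
  <tr><td>3000</td><td>Suburb C</td><td>900</td></tr>
</table>'

# Step 1: parse the HTML and turn the first <table> into a data frame.
# For the real article you would pass its URL to read_html() instead.
raw <- read_html(html) %>%
  html_element("table") %>%
  html_table()

# Step 2: tidy column names, keep postcodes as 4-character strings
# (so leading zeros such as "0800" survive), and strip thousands
# separators before converting the pollution figures to numbers.
pollution <- raw %>%
  rename(postcode = Postcode,
         suburb = Suburb,
         pollution_kg = `Pollution (kg)`) %>%
  mutate(postcode = sprintf("%04d", as.integer(postcode)),
         pollution_kg = as.numeric(gsub(",", "", pollution_kg)))

# Step 4: keep only postcodes inside a range for the city of interest.
# 2000-2249 is an illustrative range, not the report's own definition.
city <- pollution %>%
  filter(between(as.integer(postcode), 2000, 2249))

print(city)
```

Storing postcodes as zero-padded strings rather than integers is a design choice that should make a later join against shapefile postcode fields (usually stored as text) less error-prone; check the field type in whichever shapefile you use.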

Rather than copying/pasting/formatting code here, I encourage you to look at the report.

The result is a set of maps showing pollution by postcode. I sometimes think R makes this kind of thing almost too easy.
