Durban EDGE DataQuest

[This article was first published on R | datawookie, and kindly contributed to R-bloggers.]

The Durban EDGE (Economic Development and Growth in eThekwini) DataQuest was held at UKZN (Westville Campus) on 13 November 2019. Participants were tasked with creating something interesting and useful with the civic data on the new Durban EDGE Open Data Portal developed by Open Data Durban.

These datasets were available:

  • EThekwini Water and Sanitation
  • Durban Skills Audit 2016
  • EThekwini Financial Statistics Survey
  • EThekwini Rate Collection and Valuation Roll
  • EThekwini Business Licensing
  • EThekwini DMOSS (Durban Metropolitan Open Space System)
  • Rentable Office Data
  • EThekwini Labour Force
  • EThekwini Building Plans
  • Durban Film Sector Data
  • KZN Formal Education – Current
  • EThekwini Electricity Usage and Access and
  • EThekwini Ward Maps.

Here’s a presentation by Richard Gevers on auxiliary data sources.

None of the participants had prior experience with R, but most had used Excel. I’d hoped to get at least a few of them to try using R. To make this more accessible I introduced them to RStudio Cloud, which is a phenomenal tool for this sort of gig since it requires zero setup on the participants’ machines. I also put together a couple of starter scripts: one on electricity usage and one on the distribution of drivers’ licenses.

Let’s take a quick look at them.

Electricity Usage

This script loads the electricity consumption data, does some simple wrangling (mostly just fixing the year column) and then creates a few plots.
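The original script isn't reproduced here, but the "fixing the year column" step might look something like the sketch below. It uses base R only. The column names, the made-up values and the assumption that years arrive as financial-year labels such as "2005/06" are all hypothetical, not taken from the actual dataset.

```r
# Hypothetical sample standing in for the electricity data. The column names,
# the values and the "2005/06" year format are assumptions, not the real schema.
electricity <- data.frame(
  year = c("2006/07", "2005/06", "2007/08"),
  consumers = c(520000, 500000, 550000),
  energy_gwh = c(10400, 10000, 9900),
  stringsAsFactors = FALSE
)

# Keep the first four characters of each label and convert to an integer year.
electricity$year <- as.integer(substr(electricity$year, 1, 4))

# Sort by year so that a line plot is drawn from left to right.
electricity <- electricity[order(electricity$year), ]
```

With an integer year column the data can go straight into a line plot of consumers or energy against time.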

The first plot shows how the number of (formal) electricity consumers has increased over time.

We see that there is a systematic increase in the number of consumers, which makes sense in terms of population growth and urbanisation.

How much energy is being consumed?

Again there is a systematic growth in energy consumption. But something clearly happens in 2007: the introduction of load shedding.

With these two pieces of information we can also assess the average power consumed per customer.
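As a rough illustration of that calculation, here is a minimal base R sketch. The figures are invented, and the column names and the GWh unit are assumptions rather than the dataset's actual layout.

```r
# Made-up figures for illustration only. Column names and units are assumptions.
electricity <- data.frame(
  year = c(2005L, 2006L, 2007L),
  consumers = c(500000, 520000, 550000),
  energy_gwh = c(10000, 10400, 9900)
)

# 1 GWh = 1,000,000 kWh, so this is the annual energy per consumer in kWh.
electricity$kwh_per_consumer <-
  electricity$energy_gwh * 1e6 / electricity$consumers

# Average power in kW: annual energy divided by the hours in a (non-leap) year.
electricity$avg_kw_per_consumer <- electricity$kwh_per_consumer / 8760
```

Dividing annual energy by the number of hours in the year converts an energy figure into an average power, which is what makes years comparable on a per-customer basis.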

Distribution of Drivers’ Licenses

This script merges data from two sources:

  • a KML file giving ward boundaries and
  • a skills survey.

Although there’s a wealth of informative data in the survey, to keep things simple I used a single Boolean column: whether or not the respondent had a drivers’ license.

Mashing these two datasets together created the map below: the proportion of people with drivers’ licenses broken down by ward.
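The aggregate-then-join step behind a map like this can be sketched in base R as follows. The data frames and column names are hypothetical, and a plain table of ward IDs stands in for the boundaries, which the real script reads from the KML file.

```r
# Hypothetical survey extract: one row per respondent. Column names are assumptions.
survey <- data.frame(
  ward = c(1, 1, 1, 2, 2, 3),
  has_license = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Stand-in for the ward boundaries (just the ward IDs, no geometry).
wards <- data.frame(ward = c(1, 2, 3, 4))

# The mean of a logical vector is the proportion of TRUE values.
by_ward <- aggregate(has_license ~ ward, data = survey, FUN = mean)
names(by_ward)[2] <- "prop_license"

# Left join so that wards with no respondents are kept (with NA).
ward_stats <- merge(wards, by_ward, by = "ward", all.x = TRUE)
```

Keeping wards without respondents as NA, rather than dropping them, means they still show up on the map as "no data" instead of leaving holes.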

Both of these scripts provide potentially interesting starting points for a deeper analysis. The main motivation for them though was simply to show how such an analysis can be done in R.
