Site icon R-bloggers

Council spending – open data

[This article was first published on R – scottishsnow, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

My local authority recently decided to publish all spending over £500 in an effort to be more transparent. Here’s a post taking an overview of what they’ve published. I’ve used R for the analysis. The dataset doesn’t contain much detail, but if you’ve analysis suggestions, please add them in the comments!

You can download the spending data here. It’s available in pdf (why?!) and xlsx (plain text would be more open).

First off, some packages:

library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(formattable)

Read in the dataset:

df = read_excel("~/Downloads/midlothian_payments_over_500_01042019_to_15092019.xlsx") %>%
   clean_names()

We’ve got six columns:

 

Busiest day:

df %>%
   mutate(day = weekdays(date_paid)) %>%
   group_by(day) %>%
   summarise(transactions = n(),
             thousands_pounds_spent = sum(amount) / 1000) %>%
   mutate(day = fct_relevel(day, rev(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")))) %>%
   gather(var, value, -day) %>%
   ggplot(aes(day, value)) +
   geom_col() +
   facet_wrap(~var, scales = "free_x") +
   coord_flip() +
   scale_y_continuous(labels = scales::comma) +
   labs(title = "Busiest day of the week",
        x = "",
        y = "")

Busiest time of year:

df %>%
   mutate(dow = weekdays(date_paid),
          dow = if_else(dow == "Tuesday" | dow == "Friday", "Tue/Fri", "Other")) %>%
   group_by(date_paid, dow) %>%
   summarise(transactions = n(),
             pounds_spent = sum(amount)) %>%
   gather(var, value, -date_paid, -dow) %>%
   ggplot(aes(date_paid, value, colour = dow)) +
   geom_point() +
   facet_wrap(~var, scales = "free_y") +
   scale_y_log10(labels = scales::comma) +
   scale_colour_brewer(type = "qual", palette = "Set2") +
   labs(title = "Busiest day of the year",
        x = "",
        y = "")

Top 10 payees by value:

df %>%
   group_by(supplier) %>%
   summarise(pounds_spent = sum(amount),
             transactions = n()) %>%
   arrange(desc(pounds_spent)) %>%
   top_n(n = 10, wt = pounds_spent) %>%
   mutate(pounds_spent = currency(pounds_spent, "£", digits = 0L)) %>%
   formattable(list(`pounds_spent` = color_bar("#FA614B"),
                    `transactions` = color_bar("lightpink")))

In Scotland local authorities collect water charges on behalf of the water authority, which they then pass on. It’s not surprise that Scottish Water is the biggest supplier.

Top 10 payees by frequency:

df %>%
   group_by(supplier) %>%
   summarise(pounds_spent = sum(amount),
             transactions = n()) %>%
   arrange(desc(transactions)) %>%
   top_n(n = 10, wt = transactions) %>%
   mutate(pounds_spent = currency(pounds_spent, "£", digits = 0L)) %>%
   formattable(list(`pounds_spent` = color_bar("lightpink"),
                    `transactions` = color_bar("#FA614B")))

As a final note, writing this post is reminding me again I should be moving away from wordpress because incorporating code and output would be much easier with mark/blog down! As always, legacy is holding me back.

To leave a comment for the author, please follow the link and comment on their blog: R – scottishsnow.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.