UK Energy company acquisitions – a great use case for Sankey plots

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Like many others in the ‘United’ Kingdom, I’ve found my energy supplier has gone bust, due to ongoing price hikes, and I’ve been moved over to another one, whom ordinarily I wouldn’t have signed up with.

Fantastic.

not really
Because I’m not always sure you get the sarcasm

How many folk have been affected by this sort of thing since the energy market prices went haywire in 2021?

Here’s what I’m using:

library(readxl)
library(dplyr)
library(tidyr)
library(networkD3)
library(htmlwidgets)

Data duly pinched from the internet and bunged into excel for no reason other than to preserve the dates, rather than face the horror of trying to convert them all later (NB – lubridate and anytime make this easy, I’m just being lazy.)

I have since converted it to csv (with data.table’s ‘fread’, of course) and it can currently be found here if you want to try replicating this

energy

Here’s how the data looks:

head(data,10)

## # A tibble: 10 x 4
##    Date                `Failed_supplier`           Customers`Acquired_b~
##    <dttm>              <chr>                           <dbl> <chr>              
##  1 2022-01-01 00:00:00 Together Energy Retail Ltd.    176000 British Gas        
##  2 2021-12-01 00:00:00 Zog Energy                      11700 EDF                
##  3 2021-11-01 00:00:00 Entice Energy Supply Ltd         5400 Scottish Power     
##  4 2021-11-01 00:00:00 Orbit Energy                    65000 Scottish Power     
##  5 2021-11-01 00:00:00 Bulb                          1700000 *In special admini~
##  6 2021-11-01 00:00:00 Neon Reef Ltd.                  30000 British Gas        
##  7 2021-11-01 00:00:00 Social Energy Supply Ltd.        5500 British Gas        
##  8 2021-11-01 00:00:00 CNG Energy Ltd.                 41000 Pozitive Energy    
##  9 2021-11-01 00:00:00 Omni Energy Ltd.                 6000 Utilita            
## 10 2021-11-01 00:00:00 MA Energy Ltd.                    300 SmartestEnergy Bus~

I don’t like the column headers, and I want to restrict my final plot to just plotting by year rather than individual date, so I create a ‘year_acquired’ column:

data2 <- data %>% 
  rename(failed = `Failed_supplier`,
         acquired =  `Acquired_by`) %>% 
  mutate(year_acquired = lubridate::year(Date)) %>% 
  relocate(year_acquired, .before = 'Date') %>% 
    select(-Date)

I want a sankey plot, so I need to get my data into shape , and while I’m at it, I’ll filter it for 2021 and onwards:

links <-
  data2 %>%
  filter(year_acquired >= 2021) %>% 
  mutate(row = row_number()) %>%
  gather('column', 'source', -row) %>%
  mutate(column = match(column, names(data2))) %>%
  group_by(row) %>%
  arrange(column) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(target))

Network plots are all about links and nodes. So here’s some more jiggery-pokery with links and nodes. It’s all dplyr, and as that’s so easy to understand, I won’t delve into this too much

links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target)

nodes <- data.frame(name = unique(c(links$source, links$target)))

links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
links$value <- 1

nodes$name <- sub('_[0-9]+$', '', nodes$name)

Wohoo. Time for a Sankey plot. No idea how to get an interactive plot on here though, unfortunately, so a boring old static image will have to do.

Sorry about that.

But not as sorry as I will be, when I get my winter gas bill.

sankeyNetwork(Links = links,
              Nodes = nodes,
              Source = 'source',
              Target = 'target',
              Value = 'value',
              NodeID = 'name')

# assign plot to 'p', then save
#saveNetwork(p, file= "sankeyEnergy.html", selfcontained = TRUE)

Looks like some big players are doing well in the acquisition stakes, as the smaller (cheaper, with good customer service) suppliers get squeezed out.

Probably a metaphor for something, who knows?

To leave a comment for the author, please follow the link and comment on their blog: HighlandR.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)