
Tidy Tuesday Energy Analysis

[This article was first published on Andy Pickering, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Tidy Tuesday: Energy (2023 – Week 23)


Introduction

In this document I’ll analyze and visualize some energy data that were the focus of Tidy Tuesday 2023, week 23. The data come from Our World in Data, where the full data set is available. Data source citation: Ritchie, Roser, and Rosado (2022).


Analysis and Visualization

I’ll start by loading the tidyverse packages and the data set. The result is a data frame with a row for each country and year, from 1900 to 2022.

# Load the tidyverse (readr, dplyr, tidyr, ggplot2, ...) without startup messages
suppressPackageStartupMessages(library(tidyverse))

# Read the Tidy Tuesday 2023 week 23 copy of the OWID energy data
owid_energy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-06/owid-energy.csv', show_col_types = FALSE)

head(owid_energy)
# A tibble: 6 × 129
  country      year iso_code population   gdp biofuel_cons_change_pct
  <chr>       <dbl> <chr>         <dbl> <dbl>                   <dbl>
1 Afghanistan  1900 AFG         4832414    NA                      NA
2 Afghanistan  1901 AFG         4879685    NA                      NA
3 Afghanistan  1902 AFG         4935122    NA                      NA
4 Afghanistan  1903 AFG         4998861    NA                      NA
5 Afghanistan  1904 AFG         5063419    NA                      NA
6 Afghanistan  1905 AFG         5128808    NA                      NA
# ℹ 123 more variables: biofuel_cons_change_twh <dbl>,
#   biofuel_cons_per_capita <dbl>, biofuel_consumption <dbl>,
#   biofuel_elec_per_capita <dbl>, biofuel_electricity <dbl>,
#   biofuel_share_elec <dbl>, biofuel_share_energy <dbl>,
#   carbon_intensity_elec <dbl>, coal_cons_change_pct <dbl>,
#   coal_cons_change_twh <dbl>, coal_cons_per_capita <dbl>,
#   coal_consumption <dbl>, coal_elec_per_capita <dbl>, …
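Only a handful of these 129 columns are used below. As an optional sketch that the original analysis does not use, readr's read_csv() can load just the needed columns through its col_select argument, which accepts the same selection helpers as dplyr::select(). So that it runs offline, the example writes a tiny made-up CSV to a temporary file. With the real data you would pass the GitHub URL instead.

```r
library(readr)

# Hypothetical miniature version of the OWID file (all values are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,electricity_demand,coal_share_elec,gas_share_elec,gdp",
  "A,2000,10,50,20,1e9",
  "A,2001,11,48,22,1.1e9"
), tmp)

# Read only the identifying columns plus every *_share_elec column;
# the unneeded gdp column is never loaded
small <- read_csv(
  tmp,
  col_select = c(country, year, electricity_demand,
                 dplyr::ends_with("share_elec")),
  show_col_types = FALSE
)

names(small)
```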

# Count the distinct values in the country column
length(unique(owid_energy$country))
[1] 306

That’s a lot! I’ll focus on just the United States for now.
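Not all of these 306 entries are countries. OWID also includes aggregate entries such as "World" and regional totals. The sketch below separates countries from aggregates under one assumption, which you should check against the real file: aggregates are the entities with a missing iso_code. The rows are made up and only mimic the structure of owid_energy.

```r
library(dplyr)

# Made-up rows mimicking owid_energy's structure (not real data)
toy <- tibble(
  country  = c("Afghanistan", "Afghanistan", "World", "Africa", "United States"),
  year     = c(2000, 2001, 2000, 2000, 2000),
  iso_code = c("AFG", "AFG", NA, NA, "USA")
)

# Assumption: aggregate entities (World, regions) have no ISO code
entity_counts <- toy %>%
  distinct(country, iso_code) %>%
  mutate(is_aggregate = is.na(iso_code)) %>%
  count(is_aggregate)

entity_counts
```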

Next, I make a new data frame containing only the US data, removing years that have no electricity demand data.

# Keep only US rows, then drop years where electricity demand is missing
usa <- owid_energy %>% 
  filter(country == "United States") %>% 
  filter(!is.na(electricity_demand))

head(usa)
# A tibble: 6 × 129
  country        year iso_code population     gdp biofuel_cons_change_pct
  <chr>         <dbl> <chr>         <dbl>   <dbl>                   <dbl>
1 United States  2000 USA       282398560 1.30e13                   14.6 
2 United States  2001 USA       285470496 1.31e13                    6.24
3 United States  2002 USA       288350240 1.33e13                   19.5 
4 United States  2003 USA       291109824 1.37e13                   35.7 
5 United States  2004 USA       293947872 1.42e13                   26.2 
6 United States  2005 USA       296842656 1.47e13                   16.8 
# ℹ 123 more variables: biofuel_cons_change_twh <dbl>,
#   biofuel_cons_per_capita <dbl>, biofuel_consumption <dbl>,
#   biofuel_elec_per_capita <dbl>, biofuel_electricity <dbl>,
#   biofuel_share_elec <dbl>, biofuel_share_energy <dbl>,
#   carbon_intensity_elec <dbl>, coal_cons_change_pct <dbl>,
#   coal_cons_change_twh <dbl>, coal_cons_per_capita <dbl>,
#   coal_consumption <dbl>, coal_elec_per_capita <dbl>, …
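The filter(!is.na(electricity_demand)) step keeps only years where that column has a value. For reference, tidyr::drop_na() with a column name does the same thing more compactly. The sketch below compares the two on made-up rows in which demand is missing before 2000.

```r
library(dplyr)
library(tidyr)

# Made-up data: electricity_demand is missing for the first two years
toy_usa <- tibble(
  country = "United States",
  year = c(1998, 1999, 2000, 2001),
  electricity_demand = c(NA, NA, 3800, 3850)
)

# Two equivalent ways to drop rows with a missing electricity_demand
via_filter  <- toy_usa %>% filter(!is.na(electricity_demand))
via_drop_na <- toy_usa %>% drop_na(electricity_demand)

via_filter
via_drop_na
```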

Share of electricity generation by fuel type

For this analysis, I’ve chosen to investigate how the mix of fuel types used to generate electricity has changed over time. We need to reduce carbon emissions in order to prevent or mitigate the effects of climate change, and electricity generation is a large source of these emissions. I’m interested to see what progress has been made in transitioning to renewable and other low-carbon fuels for electricity generation.

Conveniently, the data already contain a field for each fuel type’s share of total electricity generation! I’ll make a new data frame with just these fields. I can select these columns (all ending in share_elec) using the ends_with() selection helper, available through the dplyr package (Wickham et al. 2023).

# Select year plus every column whose name ends in "share_elec"
usa %>% select(year, dplyr::ends_with('share_elec')) %>% 
  head()
# A tibble: 6 × 13
   year biofuel_share_elec coal_share_elec fossil_share_elec gas_share_elec
  <dbl>              <dbl>           <dbl>             <dbl>          <dbl>
1  2000               1.60            51.7              70.9           16.2
2  2001               1.34            51.1              71.9           17.1
3  2002               1.40            50.3              71.0           18.0
4  2003               1.38            51.0              71.2           16.8
5  2004               1.36            50.1              71.3           18.0
6  2005               1.34            49.9              71.9           18.9
# ℹ 8 more variables: hydro_share_elec <dbl>, low_carbon_share_elec <dbl>,
#   nuclear_share_elec <dbl>, oil_share_elec <dbl>,
#   other_renewables_share_elec <dbl>, renewables_share_elec <dbl>,
#   solar_share_elec <dbl>, wind_share_elec <dbl>
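ends_with() matches a literal suffix, not a regular expression. For pattern matching, tidyselect provides matches() instead. The small sketch below uses column names borrowed from the OWID data with invented values. It shows that a similar column such as wind_share_energy is not picked up.

```r
library(dplyr)

# Invented values; only the column names resemble the OWID data
toy <- tibble(
  year = 2000:2001,
  coal_share_elec = c(50, 49),
  coal_consumption = c(1, 2),
  wind_share_elec = c(0.1, 0.2),
  wind_share_energy = c(0.05, 0.1)
)

# Keep year plus columns ending exactly in "share_elec"
shares <- toy %>% select(year, ends_with("share_elec"))
names(shares)
```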

Now I have a data frame with just the variables I want to plot, but the wide format isn’t ideal: I would have to specify and plot each variable separately. Instead, I want to give a single plot command and group/color the lines by fuel type. To do this, I pivot the data frame from wide to long format with tidyr::pivot_longer(). Older tidyr code did this with gather() (now superseded), and the reshape2 package calls the same operation melt().
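Here is the reshape on a tiny made-up table before it is applied to usa. The table has two years and two fuels, and the values are invented. The superseded gather() call is included for comparison. It produces the same rows, but in a different order, because gather() stacks the data column by column.

```r
library(dplyr)
library(tidyr)

# Invented wide data: one column per fuel type
wide <- tibble(
  year = c(2000, 2001),
  coal_share_elec = c(50, 48),
  gas_share_elec  = c(20, 22)
)

# Current approach: one row per (year, fuel type)
long <- wide %>%
  pivot_longer(cols = ends_with("share_elec"),
               names_to = "FuelType",
               values_to = "Percentage")

# Superseded equivalent: gather every column except year
long_old <- wide %>%
  gather(key = "FuelType", value = "Percentage", -year)

long
long_old
```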

# Reshape from wide (one column per fuel) to long (one row per year and fuel)
usa_share <- usa %>% select(year, ends_with('share_elec')) %>% 
  tidyr::pivot_longer(cols = ends_with('share_elec'),
                      names_to = 'FuelType',
                      values_to = 'Percentage')
head(usa_share)
# A tibble: 6 × 3
   year FuelType              Percentage
  <dbl> <chr>                      <dbl>
1  2000 biofuel_share_elec          1.60
2  2000 coal_share_elec            51.7 
3  2000 fossil_share_elec          70.9 
4  2000 gas_share_elec             16.2 
5  2000 hydro_share_elec            7.10
6  2000 low_carbon_share_elec      29.1 

Now the data frame has one row per year and fuel type, with the value stored in Percentage, so I can simply group or color by fuel type when I plot.
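One optional refinement, not used in the original code, is to tidy the legend labels. The names_pattern argument of pivot_longer() takes a regular expression, and its capture group becomes the value stored in names_to. This drops the repeated "_share_elec" suffix, so the legend would read "coal", "low_carbon", and so on. The sketch below uses invented data.

```r
library(dplyr)
library(tidyr)

# Invented wide data
wide <- tibble(
  year = c(2000, 2001),
  coal_share_elec = c(50, 48),
  low_carbon_share_elec = c(30, 31)
)

long <- wide %>%
  pivot_longer(cols = ends_with("share_elec"),
               names_to = "FuelType",
               # the capture group keeps everything before "_share_elec"
               names_pattern = "(.*)_share_elec",
               values_to = "Percentage")

unique(long$FuelType)
```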

# One line per fuel type, colored by FuelType
usa_share %>% 
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US Electricity Generation by Fuel Type") +
  xlab("Year") +
  ylab("Percent")

Time series of the percent of total US electricity generation by fuel type.

Here we finally have a plot of the share of electricity generation by fuel type. The shares of fossil fuels and coal have decreased, while the share of renewables has increased. But there’s a lot on this plot and it’s hard to read, so I’ll focus on some more specific subsets of the data.
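Each of the subset plots in the following sections repeats the same select, pivot, and plot steps. Those steps could be wrapped in a small helper function, sketched below. The name plot_share() is hypothetical and not part of any package, and the toy data are invented. With the real data you would pass usa.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: line plot of selected *_share_elec columns over time
plot_share <- function(df, share_cols, title) {
  df %>%
    select(year, all_of(share_cols)) %>%
    pivot_longer(cols = all_of(share_cols),
                 names_to = "FuelType",
                 values_to = "Percentage") %>%
    ggplot(aes(year, Percentage)) +
    geom_line(aes(color = FuelType), linewidth = 1.5) +
    ggtitle(title) +
    xlab("Year") +
    ylab("Percent")
}

# Usage on invented data
toy <- tibble(
  year = 2000:2002,
  fossil_share_elec = c(70, 69, 68),
  nuclear_share_elec = c(20, 20, 19)
)
p <- plot_share(toy, c("fossil_share_elec", "nuclear_share_elec"),
                "Percent of US Electricity Generation")
p
```

Using all_of() makes the selection fail loudly if one of the requested columns does not exist, rather than silently dropping it.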


Total fossil, renewables, and nuclear shares

First, we can look at the total shares of fossil (coal, gas, oil), renewable (hydro, wind, solar, biofuel, and other renewables), and nuclear generation. Grouping into these categories de-clutters the plot and makes it easier to interpret.
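The fossil share should be the sum of the coal, gas, and oil shares, which is my understanding of OWID's definitions. That relationship gives a quick sanity check on the grouped totals. The sketch below flags any year where the parts differ from the total by more than a tolerance. The 0.1 percentage-point tolerance is a hypothetical choice, and the numbers are invented, with 2001 deliberately inconsistent. With the real data you would run the same steps on usa.

```r
library(dplyr)

# Invented shares in percent; 2001 deliberately does not add up
toy <- tibble(
  year = c(2000, 2001),
  coal_share_elec   = c(50, 48),
  gas_share_elec    = c(17, 19),
  oil_share_elec    = c(3, 3),
  fossil_share_elec = c(70, 72)
)

tolerance <- 0.1  # hypothetical tolerance, in percentage points

# Compare the sum of the components with the reported fossil total
check <- toy %>%
  mutate(parts_sum = coal_share_elec + gas_share_elec + oil_share_elec,
         mismatch  = abs(parts_sum - fossil_share_elec) > tolerance) %>%
  select(year, parts_sum, fossil_share_elec, mismatch)

check
```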

# Compare total fossil, renewable, and nuclear shares
usa %>% select(year, fossil_share_elec, renewables_share_elec, nuclear_share_elec) %>% 
  tidyr::pivot_longer(cols = dplyr::ends_with('share_elec'),
                      names_to = 'FuelType',
                      values_to = 'Percentage') %>% 
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US Electricity Generation") +
  xlab("Year") +
  ylab("Percent")

Observations from this plot:


Breakdown of fossil fuel shares

# Total fossil share alongside its oil, gas, and coal components
usa %>% select(year, fossil_share_elec, oil_share_elec, gas_share_elec, coal_share_elec) %>% 
  tidyr::pivot_longer(cols = dplyr::ends_with('share_elec'),
                      names_to = 'FuelType',
                      values_to = 'Percentage') %>% 
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US Electricity Generation: Fossil Fuels") +
  xlab("Year") +
  ylab("Percent")

Observations from this plot:


Renewables breakdown

# Total renewables share alongside its solar, hydro, and wind components
usa %>% select(year, renewables_share_elec, solar_share_elec,
               hydro_share_elec, wind_share_elec) %>% 
  tidyr::pivot_longer(cols = dplyr::ends_with('share_elec'),
                      names_to = 'FuelType',
                      values_to = 'Percentage') %>% 
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US Electricity Generation: Renewables") +
  xlab("Year") +
  ylab("Percent")

Observations from this plot:


References

Ritchie, Hannah, Max Roser, and Pablo Rosado. 2022. “Energy.” Our World in Data.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. “dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.