usa %>%
  select(year, dplyr::ends_with("share_elec")) %>%
  head()
# A tibble: 6 × 13
   year biofuel_share_elec coal_share_elec fossil_share_elec gas_share_elec
  <dbl>              <dbl>           <dbl>             <dbl>          <dbl>
1  2000               1.60            51.7              70.9           16.2
2  2001               1.34            51.1              71.9           17.1
3  2002               1.40            50.3              71.0           18.0
4  2003               1.38            51.0              71.2           16.8
5  2004               1.36            50.1              71.3           18.0
6  2005               1.34            49.9              71.9           18.9
# ℹ 8 more variables: hydro_share_elec <dbl>, low_carbon_share_elec <dbl>,
#   nuclear_share_elec <dbl>, oil_share_elec <dbl>,
#   other_renewables_share_elec <dbl>, renewables_share_elec <dbl>,
#   solar_share_elec <dbl>, wind_share_elec <dbl>
Now I have a dataframe with just the variables I want to plot, but the format isn’t ideal: I would have to specify and plot each variable separately. What I want is a single plot command that groups/colors the lines by variable (fuel type). To achieve this, I’ll pivot the data frame from wide to long format using the pivot_longer function from the tidyr (Wickham, Vaughan, and Girlich 2023) package. In earlier versions of tidyr this operation was done with gather, and the reshape2 package calls it melt.
usa_share <- usa %>%
  select(year, ends_with("share_elec")) %>%
  tidyr::pivot_longer(
    cols = ends_with("share_elec"),
    names_to = "FuelType",
    values_to = "Percentage"
  ) %>%
  # strip the common suffix so the legend shows just the fuel name
  mutate(FuelType = str_remove(FuelType, "_share_elec"))

head(usa_share)
# A tibble: 6 × 3
   year FuelType   Percentage
  <dbl> <chr>           <dbl>
1  2000 biofuel          1.60
2  2000 coal            51.7
3  2000 fossil          70.9
4  2000 gas             16.2
5  2000 hydro            7.10
6  2000 low_carbon      29.1
Now my dataframe has one row for each combination of year and fuel type, with the value in a single Percentage column, so I can simply group or color by fuel type when I plot.
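To make the reshaping easy to check by eye, here is a minimal sketch on a toy table. The numbers are made up for illustration and are not from the energy data set. It also shows the older gather call mentioned above, which should give the same rows in a different order.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide table: one column per fuel type (made-up values, illustration only)
wide <- tibble(
  year = c(2000, 2001),
  coal_share_elec = c(50, 49),
  gas_share_elec = c(16, 17)
)

# Wide to long: one row per year/fuel combination
long <- wide %>%
  pivot_longer(
    cols = ends_with("share_elec"),
    names_to = "FuelType",
    values_to = "Percentage"
  ) %>%
  mutate(FuelType = str_remove(FuelType, "_share_elec"))

# The superseded gather() interface, for comparison
long_gather <- wide %>%
  gather(key = "FuelType", value = "Percentage", -year) %>%
  mutate(FuelType = str_remove(FuelType, "_share_elec"))

print(long)
```

Two years and two fuel columns become four rows, which is the shape ggplot wants for mapping color to FuelType.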
g <- usa_share %>%
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US electricity Generation By Fuel type") +
  xlab("Year") +
  ylab("Percent")

# convert the static ggplot into an interactive plotly figure
plotly::ggplotly(g)
Figure 1: Timeseries of the percent of total US electricity generation by fuel type.
Here we finally have a plot (Figure 1) of the share of electricity generation by fuel type. We can see that the shares of fossil fuels and coal have decreased, and the share of renewables has increased. But there’s a lot on this plot and it’s hard to read, so I’ll focus on some more specific subsets of the data.
Total fossil, renewables, and nuclear shares
First we can look at the total shares of fossil (oil, coal, gas), renewable (wind, solar, hydro), and nuclear generation. Grouping into these categories de-clutters the plot and makes it easier to interpret.
g <- usa %>%
  select(year, fossil_share_elec, renewables_share_elec, nuclear_share_elec) %>%
  tidyr::pivot_longer(
    cols = dplyr::ends_with("share_elec"),
    names_to = "FuelType",
    values_to = "Percentage"
  ) %>%
  mutate(FuelType = str_remove(FuelType, "_share_elec")) %>%
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US electricity Generation") +
  xlab("Year") +
  ylab("Percent")

plotly::ggplotly(g)
Figure 2: Timeseries of the percent of total US electricity generation by fuel types
Observations from this plot (Figure 2):
Fossil fuel share has been decreasing steadily since about 2007.
Renewable share has been increasing steadily since about 2007.
Nuclear has remained relatively constant at around 20%.
Fossil fuels still supply the majority of generation, but their share is decreasing. Renewables became approximately equal to nuclear around 2020 and are continuing to increase.
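The select/pivot/plot pattern used for this figure recurs for the fossil and renewables breakdowns. One way to avoid repeating it is a small helper. The sketch below is only an illustration: plot_shares is a hypothetical name, not part of the original analysis, and it assumes a wide data frame with a year column and columns named like fossil_share_elec.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)

# Hypothetical helper (not from the original post): plot the share of
# electricity generation for a chosen set of fuel types.
plot_shares <- function(df, fuels, title) {
  df %>%
    # keep year plus the requested <fuel>_share_elec columns
    select(year, all_of(paste0(fuels, "_share_elec"))) %>%
    pivot_longer(
      cols = ends_with("share_elec"),
      names_to = "FuelType",
      values_to = "Percentage"
    ) %>%
    mutate(FuelType = str_remove(FuelType, "_share_elec")) %>%
    ggplot(aes(year, Percentage)) +
    geom_line(aes(color = FuelType), linewidth = 1.5) +
    ggtitle(title) +
    xlab("Year") +
    ylab("Percent")
}

# Toy input with made-up values, just to show the call
toy <- tibble(
  year = c(2000, 2001),
  fossil_share_elec = c(70, 71),
  nuclear_share_elec = c(20, 20),
  wind_share_elec = c(1, 2)
)
g_toy <- plot_shares(toy, c("fossil", "nuclear"), "Toy example")
```

With the real data, a call like plot_shares(usa, c("fossil", "renewables", "nuclear"), "Percent of US electricity Generation") should reproduce the figure above, and the result can be passed to plotly::ggplotly in the same way.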
Breakdown of fossil fuel shares
In this dataset, fossil fuels include coal, gas, and oil.
g <- usa %>%
  select(year, fossil_share_elec, oil_share_elec, gas_share_elec, coal_share_elec) %>%
  tidyr::pivot_longer(
    cols = dplyr::ends_with("share_elec"),
    names_to = "FuelType",
    values_to = "Percentage"
  ) %>%
  mutate(FuelType = str_remove(FuelType, "_share_elec")) %>%
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US electricity Generation: Fossil Fuels") +
  xlab("Year") +
  ylab("Percent")

plotly::ggplotly(g)
Figure 3: Timeseries of the percent of total US electricity generation by fossil fuels
Observations from this plot (Figure 3):
We can see that the fossil fuel share of electricity generation has been decreasing, starting around 2008.
Coal and gas make up the majority of the fossil fuel generation.
Coal share has been decreasing while the gas share has increased. Coal was much higher than gas previously, but their shares became equal around 2015 and gas now makes up a larger share of the fossil fuel generation.
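Crossover years like this one are read off the plot by eye, but they can also be checked numerically. Here is a minimal sketch, assuming long-format data with the year, FuelType, and Percentage columns used above. The helper name first_year_exceeding is hypothetical, and the toy numbers are made up rather than taken from the data set.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: first year in which the `rising` fuel's share
# is strictly greater than the `falling` fuel's share.
# Returns a zero-length vector if that never happens.
first_year_exceeding <- function(long_df, rising, falling) {
  long_df %>%
    filter(FuelType %in% c(rising, falling)) %>%
    # one column per fuel so the two shares can be compared row by row
    pivot_wider(names_from = FuelType, values_from = Percentage) %>%
    filter(.data[[rising]] > .data[[falling]]) %>%
    arrange(year) %>%
    slice_head(n = 1) %>%
    pull(year)
}

# Toy data (made-up values): the two shares are tied in 2015
toy <- tibble(
  year = rep(2013:2016, times = 2),
  FuelType = rep(c("coal", "gas"), each = 4),
  Percentage = c(40, 38, 33, 30, 27, 28, 33, 34)
)

crossover <- first_year_exceeding(toy, rising = "gas", falling = "coal")
print(crossover)
```

On the real data the call would be first_year_exceeding(usa_share, "gas", "coal"). Because the comparison is strict, a year in which the two shares are exactly tied is not counted.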
Renewables breakdown
In this dataset, renewables include wind, solar, and hydro.
g <- usa %>%
  select(
    year, renewables_share_elec, solar_share_elec,
    hydro_share_elec, wind_share_elec
  ) %>%
  tidyr::pivot_longer(
    cols = dplyr::ends_with("share_elec"),
    names_to = "FuelType",
    values_to = "Percentage"
  ) %>%
  mutate(FuelType = str_remove(FuelType, "_share_elec")) %>%
  ggplot(aes(year, Percentage)) +
  geom_line(aes(color = FuelType), linewidth = 1.5) +
  ggtitle("Percent of US electricity Generation: Renewables") +
  xlab("Year") +
  ylab("Percent")

plotly::ggplotly(g)
Figure 4: Timeseries of the percent of total US electricity generation by renewable source
Observations from this plot (Figure 4):
The share of renewable electricity production has increased sharply, approximately doubling from 2008 to 2020.
The share of hydro generation has remained relatively constant.
Solar and wind shares have increased significantly.
Wind started to increase earlier, around 2005.
Solar started increasing around 2012.
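A statement like "approximately doubling" can be checked by comparing the share in two years. The sketch below is one way to do that, assuming long-format data with exactly one row per year and fuel type. The helper name share_change is hypothetical, and the toy values are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: share in two years plus their ratio, per fuel type.
# Assumes each fuel type has exactly one row for `from` and one for `to`.
share_change <- function(long_df, from, to) {
  long_df %>%
    filter(year %in% c(from, to)) %>%
    group_by(FuelType) %>%
    summarise(
      start = Percentage[year == from],
      end = Percentage[year == to],
      ratio = end / start,
      .groups = "drop"
    )
}

# Toy data (made-up values, not from the data set)
toy <- tibble(
  year = c(2008, 2020, 2008, 2020),
  FuelType = c("renewables", "renewables", "hydro", "hydro"),
  Percentage = c(10, 20, 6, 6.3)
)

changes <- share_change(toy, from = 2008, to = 2020)
print(changes)
```

A ratio near 2 corresponds to a doubling, and a ratio near 1 to a roughly constant share. On the real data the call would be share_change(usa_share, 2008, 2020).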
SessionInfo
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Denver
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] plotly_4.10.3 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0
[5] dplyr_1.1.3 purrr_1.0.2 readr_2.1.4 tidyr_1.3.0
[9] tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 renv_1.0.3 stringi_1.7.12
[5] hms_1.1.3 digest_0.6.33 magrittr_2.0.3 evaluate_0.22
[9] grid_4.3.1 timechange_0.2.0 fastmap_1.1.1 jsonlite_1.8.7
[13] httr_1.4.7 fansi_1.0.5 crosstalk_1.2.0 viridisLite_0.4.2
[17] scales_1.2.1 lazyeval_0.2.2 cli_3.6.1 rlang_1.1.1
[21] crayon_1.5.2 ellipsis_0.3.2 bit64_4.0.5 munsell_0.5.0
[25] withr_2.5.1 yaml_2.3.7 parallel_4.3.1 tools_4.3.1
[29] tzdb_0.4.0 colorspace_2.1-0 curl_5.1.0 vctrs_0.6.4
[33] R6_2.5.1 lifecycle_1.0.3 htmlwidgets_1.6.2 bit_4.0.5
[37] vroom_1.6.4 pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.4
[41] glue_1.6.2 data.table_1.14.8 xfun_0.40 tidyselect_1.2.0
[45] rstudioapi_0.15.0 knitr_1.44 farver_2.1.1 htmltools_0.5.6.1
[49] labeling_0.4.3 rmarkdown_2.25 compiler_4.3.1
References
Ritchie, Hannah, Max Roser, and Pablo Rosado. 2022. “Energy.” Our World in Data.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. “Dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. “Tidyr: Tidy Messy Data.” https://CRAN.R-project.org/package=tidyr.
[This article was first published on Andy Pickering, and kindly contributed to R-bloggers].
Tidy Tuesday: Energy (2023 – Week 23)
Introduction
In this document I’ll analyze and visualize some energy data that were the focus of Tidy Tuesday 2023 week 23. The data come from Our World in Data, which also hosts the full data set. Data source citation: Ritchie, Roser, and Rosado (2022).
Analysis and Visualization
I’ll start by loading the tidyverse (Wickham et al. 2019) library and the data set. The result is a dataframe with a row for each country and year, from 1900 to 2022.
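The loading step itself isn't shown here, so the following is only a sketch of what it might look like. It assumes the Tidy Tuesday CSV has been saved locally, and it assumes the usa data frame is the full table filtered to the United States from 2000 onward, which matches the first rows printed below. The function name load_usa, its path argument, and the start_year default are all hypothetical.

```r
library(dplyr)
library(readr)

# Hypothetical helper: read the energy CSV and keep only US rows
# from `start_year` on. `path` is wherever the CSV was saved.
load_usa <- function(path, start_year = 2000) {
  readr::read_csv(path, show_col_types = FALSE) %>%
    dplyr::filter(country == "United States", year >= start_year)
}

# Example call (the file name is an assumption):
# usa <- load_usa("owid-energy.csv")
# head(usa)
```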
# A tibble: 6 × 129
country year iso_code population gdp biofuel_cons_change_pct
<chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 United States 2000 USA 282398560 1.30e13 14.6
2 United States 2001 USA 285470496 1.31e13 6.24
3 United States 2002 USA 288350240 1.33e13 19.5
4 United States 2003 USA 291109824 1.37e13 35.7
5 United States 2004 USA 293947872 1.42e13 26.2
6 United States 2005 USA 296842656 1.47e13 16.8
# ℹ 123 more variables: biofuel_cons_change_twh <dbl>,
# biofuel_cons_per_capita <dbl>, biofuel_consumption <dbl>,
# biofuel_elec_per_capita <dbl>, biofuel_electricity <dbl>,
# biofuel_share_elec <dbl>, biofuel_share_energy <dbl>,
# carbon_intensity_elec <dbl>, coal_cons_change_pct <dbl>,
# coal_cons_change_twh <dbl>, coal_cons_per_capita <dbl>,
# coal_consumption <dbl>, coal_elec_per_capita <dbl>, …
Share of electricity generation by fuel type
For this analysis, I’ve chosen to investigate how the mix of fuel types used to generate electricity has changed over time. We need to reduce carbon emissions in order to prevent or mitigate the effects of climate change, and electricity generation is a large component of these emissions. I’m interested to see what progress has been made in transitioning to more renewable/low-carbon fuels for electricity generation.
Conveniently, the data already contain fields for the share of total electricity generation for each fuel type! I’ll make a new data frame with just these fields. I can select these columns (all ending in share_elec) using the ends_with function from the dplyr (Wickham et al. 2023) package.
usa %>%
  select(year, dplyr::ends_with("share_elec")) %>%
  head()