Smoke from a distant fire

[This article was first published on Bayes Ball, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Forest fires and air quality

August 31, 2018


It was recently announced that during 2018, British Columbia has seen the most extensive forest fire season on record. As I write this (2018-08-31) there are currently 442 wildfires burning in British Columbia. These fires have a significant impact on people’s lives–many areas are under evacuation order and evacuation alert, and there are reports that homes have been destroyed by the blazes.

The fires also create a significant amount of smoke, which has been pushed great distances by the shifting winds. This includes the large population centres of Vancouver and Victoria in British Columbia, as well as the Seattle metropolitan region and elsewhere in Washington. (Clifford Mass, Professor of Atmospheric Sciences at the Universtiy of Washington in Seattle, has written extensively about the smoke events in the region; see for example Western Washington Smoke: Darkest Before the Dawn from 2018-08-22.)

The Province of British Columbia has many air quality monitoring stations around the province, and makes the data available. The measure most used for monitoring the effects on health is PM25 or PM2.5, for fine particles with a diameter of 2.5 microns (millionths of a metre). The B.C. government has a Current Particulate Matter map that colour codes the one hour average measures for all the testing stations around the province.

The data file and a simple plot


The DataBC Catalogue provides access to air quality data. There’s “verified” to the end of 2017, and “unverified” for the past 30 days. Since we want to see what happened this month, it’s the latter we want. (The page with the links to the raw files is here.)

The files are arranged by particulate or gas type; there’s a table for ozone and another for sulpher dioxide, and others for the particulate matter. Note that the data are made available under the Province of B.C.’s Open Data license, and are in nice tidy form. And the date format is ISO 8601, which makes me happy.

To make sure we’ve got a reproducible version, I’ve saved the file I downloaded early this morning to my google drive. The link to the folder is here.

For the first plot, let’s look at the PM2.5 level for my hometown of Victoria, B.C. The code below loads the R packages we’ll use, reads the data, and generates the plot.


# tidyverse packages
library(tidyverse)
library(glue)

PM25_data <- readr::read_csv("PM25_2018-08-31.csv")
filter(STATION_NAME == "Victoria Topaz") %>% ggplot() + geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) + labs(x = "date", title = glue("Air quality: Victoria Topaz"), subtitle = "one hour average, µg/m3 of PM2.5", caption = "data: B.C. Ministry of Environment and Climate Change Strategy")


There are 61 air quality monitoring stations around British Columbia. It would be interesting to see how the air quality was in other parts of the region–and since over half (54% in 2017) of the province’s population lives in the Vancouver Census Metropolitan Area (CMA), let’s plot the air quality there. There are multiple stations in the Vancouver CMA, so I chose the one at Burnaby South…it’s fairly central in the region.We can run this line of code to see a listing of all 61 stations (but we won’t do that now…)

# list all the air quality stations for which there is PM2.5 data
unique(PM25_data$STATION_NAME)

And since we’re going to be doing this often, let’s wrap the code that filters for the location we want and runs the plot in a function. Note that we’ll create a new variable station_name so all we need to do to change the plot is assign the name of the station we want, and off we go. Not only does this simplify our lives now, but is all-but-essential for a Shiny application.
# the air quality plot
PM25_plot <- function(datafile, station_name){
  datafile %>%
  filter(STATION_NAME == station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
  labs(x = "date",
       title = glue("Air quality: ", station_name),
       subtitle = "one hour average, µg/m3 of PM2.5", 
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy")
}

Now that we’ve got the function, the code to create the plot for the Burnaby South station is significantly simplified: assign the station name, and call the function.

# our Burnaby plot
station_name <- "Burnaby South"

PM25_plot(PM25_data, station_name)




And what about the towns that are the closest to the fires? While there are fires burning across the province, the fires that are burning the forests of the Nechako Plateau have understandably received a lot of attention. You may have seen the news stories and images from Prince George like this and this, or the images of the smoke plume from the NASA Worldview site.

Prince George is east of many major fires, downwind of the prevailing westerly winds. So what has the air quality in Prince George been like?

station_name <- "Prince George Plaza 400"
PM25_plot(PM25_data, station_name)




Or still closer to the fires, the town of Burns Lake.

station_name <- "Burns Lake Fire Centre"
PM25_plot(PM25_data, station_name)



The town of Smithers is west of the fires that are burning on the Nechako Plateau and producing all the smoke experienced in Burns Lake and Prince George. The residents of Smithers have had a very different experience, only seeing smoke in the sky when the winds shifted to become easterly.
station_name <- "Smithers St Josephs" 
PM25_plot(PM25_data, station_name)


multiple stations in one plot

You may have noticed that the Y axis on the plots can be quite different–for example, Victoria reaches 300, Smithers gets to 400, and Prince George is double that at 800, and Burns Lake is more than double again. There are two ways we can compare multiple stations: a single plot, or faceted plots.

a line plot with four stations

station_name <- c("Burns Lake Fire Centre", "Prince George Plaza 400", 
                  "Smithers St Josephs", "Victoria Topaz")

PM25_data %>%
  filter(STATION_NAME %in% station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE, colour = STATION_NAME)) +
  labs(x = "date",
       title = glue("Air quality: Burns Lake, Prince George, Smithers, Victoria"),
       subtitle = "one hour average, µg/m3 of PM2.5", 
       caption = "data: Ministry of Environment and Climate Change Strategy")




With four complex lines as we have here, it can be hard to discern which line is which. 

Use facets to plot the four stations separately


Facets give us another way to view the comparisons. In the first version, with the facets stacked vertically, it emphasizes comparisons on the X axis–that is, over time. In this way, we can see that the four locations have had smoke events that have occurred at different times.

station_name <- c("Burns Lake Fire Centre", "Prince George Plaza 400", 
                  "Smithers St Josephs", "Victoria Topaz")


PM25_data %>%
  filter(STATION_NAME %in% station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
  facet_grid(STATION_NAME ~ .) +
  labs(title = glue("Air quality: Burns Lake, Prince George, Smithers, Victoria"),
       subtitle = "one hour average, µg/m3 of PM2.5", 
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy") +
  theme(axis.text.x=element_text(size=rel(0.75), angle=90),
        axis.title = element_blank())
In the second version, the facets are placed horizontally, making comparisons on the Y axis clear. The smoke events in the four locations have been of very different magnitudes.
PM25_data %>%
  filter(STATION_NAME %in% station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
  facet_grid(. ~ STATION_NAME) +
  labs(title = glue("Air quality: Burns Lake, Prince George, Smithers, Victoria"),
       subtitle = "one hour average, µg/m3 of PM2.5", 
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy") +
  theme(axis.text.x=element_text(size=rel(0.75), angle=90),
        axis.title = element_blank())




These two plots show not only that Burns Lake and Prince George have had the most extreme smoke events, but that they have had sustained periods of poor air quality through the whole month. While the most extreme event in Smithers exceeds that of Victoria, there hasn’t been a prolonged period of smoke in the air like the other three locations.

-30-






To leave a comment for the author, please follow the link and comment on their blog: Bayes Ball.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)