Sakura blossoms in Japan

[This article was first published on Opiate for the masses, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Springtime and cherry blossoms

It is spring, and the flowers bloom. One tree in particular symbolises this: the Japanese cherry (sakura) tree. Every year, millions of Japanese enjoy hanami, jointly watching the cherry blossoms, philosophising about the the transient nature of beauty, and the inevitability of death.

There are regular “sakura forecasts” on TV, and elsewhere. The Japanese National Tourist Office has its own map and details available on their website as well.

The data and graph section of The Economist recently published a beautiful chart showing 1200 years of data on the cherry blossom – as well as one of their customary great puns: “the elephant in the bloom”. I wondered if I could replicate the graph in R.

I remembered an emoji font I had stumbled across, and that there is a sakura (cherry blossom) emoji. I also felt a bit competitive – with a Japanese on this blog’s author roll, and another author being a certified expert on emojis, I felt that I needed show off that I, too, can poach in their areas of expertise!

Data

The Economist’s article mentions a professor Yasuyuki Aono, who wrote several papers using data on Japanese cherry blossom dates. A quick googleing brought up his homepage. He also linked to a page detailing the cherry blossom data. Fantastic! A pretty straightforward data grab, it is then.

if(!file.exists("sakura.xls")){
  # data is available on AONO's university homepage: 
  # http://atmenv.envi.osakafu-u.ac.jp/aono/kyophenotemp4/
  aono.dta <- "http://atmenv.envi.osakafu-u.ac.jp/osakafu-content/uploads/sites/251/2015/10/KyotoFullFlower7.xls"
  # make sure to read as binary, because windows is stupid
  download.file(aono.dta, "sakura.xls", mode= "wb")
} 

# read in data
dta <- readxl::read_excel("sakura.xls", skip = 25) %>% 
  # make "good" column names
  setNames(make.names(names(.), unique = TRUE)) %>% 
  mutate(
    # convert the blossom date to R dateformat 
    blossom.date = as.Date(paste0(sprintf("%04d", AD), "0", Full.flowering.date), format = "%Y%m%d")
  )

We download the data if not available on disk, do some cleaning of column names, and generate a proper date column of the full flowering date, since this is the data with the most observations.

It’s a fascinating dataset: 1200 years of human records on a yearly event! It’s not often that one can work with these kinds of data. For this, I would like to thank professor Aono. Generating such a historical dataset is a tremendous amount of work (he probably had to find and read through hundreds of historical records), and providing the data on his homepage is not something I see often in academia.

Graphing

The original emoji graphing system I had in mind, emoGG, worked, but I could not change the size of the individual emojis. It turns out, emoGG is deprecated, and I should look at emojifont instead.

emojifont uses geom_text() to draw the emojis onto a ggplot() graph. After a quick install of the font I was ready to go. Two things one needs to keep in mind when working with emojifont: you will need to load the font to start with, and if you’re using RStudio, you will need to initialise a new graphics environment. The standard RStudio environment is not able to preview/show the fonts. Both can be accomplished with these commands:

# need this to load the font
load.emojifont()
# only needed for RStudio
windows()

One other thing that was problematic: emojifont uses a font to populate the graph by ggplot2::aes(label = "labelname") to specify the emoji. It needs the label pulled from the original dataset, and not generated within the aes().

# plot as sakuras
dta %>% 
  # include the sakura as emoji character in dataset
  mutate(
    sakura.emoji = emoji("cherry_blossom"),
    # join to a common year for axis label (2000 is a leap year)
    common.date = as.Date(paste0("2000-", format(blossom.date, "%m-%d")), "%Y-%m-%d")
  ) 

I created the common.date variable to scale the y-axis of the graph to a month-day format.

dta %>% 
  # plot in ggplot
  ggplot(aes(x=AD, y=common.date))+
  # alternatively, with geom_emoji:
  # geom_emoji("cherry_blossom", x=dta$AD, y=dta$Full.flowering.date..DOY., size = 3)+
  #geom_point(size = 0.5, colour = "red")+
  # include emoji as text, h-/vjust to center; strange it needs vjust 0.25 -- why? 975572 BD77A4
  geom_text(aes(label = sakura.emoji, hjust = 0.5, vjust = 0.25), family = "EmojiOne", size = 4, colour = "#FF506E")+
  # trend line
  geom_smooth(method = "loess", span = 0.1, fill = "#D2A5C2", colour = "grey", size = 0.5)

To copy the Economist’s graph as closely as possible, I manually set the y-axis breaks. The x-axis breaks are more interesting, because the original graphs has major ticks every 200 years, labelled, and minor ticks every 100 years in between, which are not labelled. This is unfortunately not possible in ggplot, currently, so I had to resort to a workaround.

I create the sequence of years, with ticks every 100 years (so 800, 900, 1000, ...). I then “overwrite” every even element (900, 1100, 1300, ...) of the vector, resulting in a “blank” label for those years. The axis label would thus look like this: 800, "", 1000, "", 1200, ....

scale_y_date(
  name = "Date of cherry-blossom peak-bloom",
  breaks = c("2000-03-27", "2000-04-01", "2000-04-10", "2000-04-20", "2000-05-01", "2000-05-04") %>% as.Date(),
  # Apr-01
  labels = scales::date_format("%b-%d")
)+
scale_x_continuous(
  limits = c(800, 2020),
  # axis ticks every 200 years
  breaks = seq(800, 2000, 100),
  # no minor axis ticks in ggplot2::theme(): http://stackoverflow.com/questions/14490071/adding-minor-tick-marks-to-the-x-axis-in-ggplot2-with-no-labels
  # length(breaks): 13; replace every even element with empty string, to remove from axis labels
  labels = replace(seq(800, 2000, 100), seq(2, 12, 2), ""), 
  name = "Year"
)

All that remains then are some theme() hacking to get the graph pretty.

Behold, 1200 years of cherry blossom history!

sakura plot

I’m not super happy with the graph colours, it seems too “light”, and pastel. Not really very readable. The Economist’s version is in purple; much more readable and much prettier to look at. I did want to use the sakura pink tones, though, and thus the graph colour scheme seems as fragile and transient as the actual sakura blossoms.

Concluding remarks

The code is available on github, as always.

Aono uses the data to predict March temperatures, and in various papers he manages to do this to an astonishing degree of accuracy: 0.1°C.

It’s very disheartening to see the blossom dates dramatically moving earlier into the year in the last 50 years. Global warming is to blame, and it always saddens me that people chose to ignore these facts… As a cold-loving northern German, I will need to find a retirement location in northern Norway, if things continue the way they do!

Sakura blossoms in Japan was originally published by Kirill Pomogajko at Opiate for the masses on April 11, 2017.

To leave a comment for the author, please follow the link and comment on their blog: Opiate for the masses.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)