Visualizing the economics of tourism in the Pacific with Plotly

[This article was first published on Kyle Walker, and kindly contributed to R-bloggers.]

Over the last month or so, I’ve been working a lot with Plotly to visualize data for my World Regional Geography course. Plotly is a web-based service for creating interactive, D3-based visualizations that can be easily shared. Visualizations are deployed and hosted on Plotly’s servers, so the service requires an internet connection to work. One of the great things about Plotly is that it has APIs for popular data programming languages like R, Python, and Julia. I really got hooked on Plotly, however, when I learned about ggplotly, which converts ggplot2 graphs in R to interactive Plotly charts with just a couple of extra lines of code. Below, I walk through an example of how I used ggplotly to support a discussion of the economic significance of tourism in the Pacific.


In my final class of the semester, I focused on Oceania (defined as Australia, New Zealand, and Pacific Island nations). In this class, I wanted to show students how small, relatively isolated countries in the Pacific can be acutely affected by larger global forces. We focused on topics such as trash accumulation in the Pacific, sea level rise due to climate change, and the ways in which island countries are connected to the global economy. As part of this, we discussed the role of tourism in Pacific Island economies. To help illustrate this to my students, I sought out data on the tourist economies of these countries, which is freely available from the World Travel & Tourism Council’s Economic Data Search Tool. To download the data, I followed these steps:

  1. For Step 1 of the search tool, I clicked Countries > Oceania, then selected Fiji, Kiribati, Other Oceania, Solomon Islands, Tonga, and Vanuatu.
  2. For Step 2, I selected “Travel & Tourism Total Contribution to GDP.” These figures include both tourism’s direct contributions to GDP (e.g., GDP generated by hotels, airlines, etc.) and its indirect contributions to the economy; see this report for more details.
  3. For Step 3, I selected “% share” as my unit, because I wanted to see how large a share of each country’s economy tourism accounts for.
  4. I chose the entire date range from 1988 to 2024 (projected), and clicked Submit.

The data can then be downloaded as an Excel file. However, the Excel file is laid out like the HTML table on the web page, so it is not in the “tidy” format (one row per observation) that ggplot2 expects. I therefore saved the file as a CSV (which is available from my GitHub repository) and imported the data into R for some cleaning.
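To make the “tidy” requirement concrete, here is a minimal base-R sketch of the wide-to-long reshape that melt() performs later in this post. The data frame, its country labels, and its values are hypothetical placeholders, not WTTC figures.

```r
# Hypothetical wide table: one row per country, one column per year
# (placeholder values, not WTTC data)
wide <- data.frame(
  country = c("Country A", "Country B"),
  y2010 = c(10, 20),
  y2011 = c(11, 21),
  stringsAsFactors = FALSE
)

# Tidy (long) form: one row per country-year observation
year_cols <- grep("^y", names(wide), value = TRUE)
long <- data.frame(
  # repeat the country names once per year column
  country = rep(wide$country, times = length(year_cols)),
  # strip the "y" prefix and repeat each year once per country
  year = rep(as.numeric(sub("^y", "", year_cols)), each = nrow(wide)),
  # stack the year columns end to end (column-major order)
  value = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

Each row of long is now a single observation, which is the shape ggplot2 needs when mapping year to x, value to y, and country to color.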


The R code below shows one way to clean up data downloaded from the WTTC in this particular format (multiple countries, one variable). I’ve written a function, tidy_WTTC, that should work for any dataset in this form available from the WTTC.

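# Load packages and data

library(reshape2)  # melt() for reshaping wide data to long
library(ggplot2)
library(plotly)
library(zoo)       # na.locf() for filling in NAs

# Keep text as character rather than factors so the coercions below behave
dat <- read.csv('tourism.csv', stringsAsFactors = FALSE)

## Define the "tidy" function and call it

tidy_WTTC <- function(df) {

  # Row 7 holds the years: first year in column 2, last year in the final column.
  # as.character() first, so factor input is converted by label, not by level code
  start <- as.numeric(as.character(df[7, 2]))
  end <- as.numeric(as.character(df[7, ncol(df)]))

  # Drop the header rows; the data begin on row 8
  df <- df[8:nrow(df), ]

  # Name the columns "country", "y1988", "y1989", ...
  nms <- c("country", paste0("y", seq(start, end, 1)))

  names(df) <- nms

  # Each country name sits on its own row with its values on the row below;
  # carrying values backward fills the NAs on the country-name rows
  df <- na.locf(df, fromLast = TRUE)

  # Keep only the (now complete) country-name rows
  df <- df[seq(1, nrow(df), 2), ]

  # Reshape from wide to long ("tidy") format
  df.melt <- melt(df, id.vars = "country", value.name = "value", variable.name = "year")

  # Turn "y1988" etc. into numeric years
  df.melt$year <- as.numeric(gsub("y", "", df.melt$year))

  # Convert values to numbers (via character, in case they were read as text)
  df.melt$value <- as.numeric(as.character(df.melt$value))

  df.melt
}

tidy_dat <- tidy_WTTC(dat)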

A few notes about the above code. To work with Plotly in R, you need the plotly R package, which at the time of writing was available from rOpenSci’s repository on GitHub. It must therefore be installed first with the devtools package, using the command devtools::install_github("ropensci/plotly"). (The package is now also on CRAN and can be installed with install.packages("plotly").)

I’d also like to explain the function briefly. I’ve set it up to handle data downloads from the WTTC, provided that a) you’ve chosen a group of countries and one variable, and b) you’ve saved the Excel download as a CSV. Given the structure of the WTTC data, the function reads the first and last years of your download from the header row and names the columns accordingly. Next, I use the na.locf function from the zoo package to deal with the fact that, in the original file, each country name sits on one row and its values sit on the row below. With fromLast = TRUE, na.locf carries each row of values backward, so the NAs on the country-name row are filled with that country’s data. I then keep only every other row (the completed country rows), reshape the data to long format with melt, and convert the year and value columns to numbers for ggplot2.
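The backward fill is easiest to see on a toy table. The sketch below assumes the layout described above (a country-name row with empty value cells, followed by a row of values with an empty name cell). The mock data frame and the backfill() helper are hypothetical: backfill() is a simplified base-R stand-in for zoo’s na.locf(x, fromLast = TRUE), written so the example runs without zoo.

```r
# Hypothetical, simplified stand-in for zoo::na.locf(x, fromLast = TRUE):
# replace each NA with the next non-NA value below it
backfill <- function(x) {
  if (length(x) < 2) return(x)
  # walk upward from the second-to-last element; x[i + 1] is already filled
  for (i in rev(seq_len(length(x) - 1))) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Hypothetical mock of the WTTC layout (placeholder values):
# a name row followed by a value row for each country
mock <- data.frame(
  country = c("Country A", NA, "Country B", NA),
  y2010 = c(NA, 10, NA, 20),
  y2011 = c(NA, 11, NA, 21),
  stringsAsFactors = FALSE
)

# Fill each column backward, then keep every other row (the name rows)
filled <- as.data.frame(lapply(mock, backfill), stringsAsFactors = FALSE)
country_rows <- filled[seq(1, nrow(filled), 2), ]

print(country_rows)
```

The value rows end up with a wrong (or missing) country name after the fill, but that does not matter: they are the rows the every-other-row step discards.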

The data are now ready for plotting with ggplot2, and can then be converted to an interactive Plotly chart. The code below shows how to get this done.

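## Create the ggplot, initialize the plotly object, and convert the ggplot to plotly

t1 <- ggplot(tidy_dat, aes(x = year, y = value, color = country)) + 
  geom_line(size = 2) +  # on ggplot2 >= 3.4.0, linewidth = 2 is preferred
  scale_color_brewer(palette = "Set1") + 
  labs(x = "Year", 
       y = "Total contribution of tourism to GDP (percent)", 
       title = "Total contribution of tourism to GDP (percent), select Pacific Island countries.  Data source: WTTC")

# Initialize the Plotly interface object with your account credentials
py <- plotly("YOUR USERNAME HERE", "YOUR API KEY HERE")

# Convert the ggplot into an interactive Plotly chart.
# (This is the early plotly API used at the time of writing; in current CRAN
# releases, ggplotly(t1) renders the chart locally without an account.)
py$ggplotly(t1)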

The ggplot2 code is fairly straightforward; I’m making a basic line chart with minimal customization. After creating the ggplot, the last two lines of code are all that is needed to create an interactive chart with Plotly. You will need to sign up for a Plotly account to get an API key; fill in your own username and API key where indicated to initialize the Plotly interface object, which I’ve called py. Finally, the py$ggplotly(t1) command converts the ggplot t1 into an interactive chart, which should appear in your browser. If everything has worked correctly, you’ll get a chart that looks like the one below:

[Interactive Plotly chart: Total contribution of tourism to GDP (percent), select Pacific Island countries. Data source: WTTC]



I now have an interactive chart that contains a wealth of information about the tourist economies of select Pacific Island countries. As the chart reveals, the economic importance of tourism varies considerably from country to country in the Pacific. In places like Vanuatu and Fiji, tourism is a large component of the local economy and is projected to grow in importance. In contrast, tourism is less prominent in places like Tonga, which relies heavily on remittances from Tongans living abroad; this example is familiar to my students, as there is a large Tongan community in cities very near TCU.

By default, Plotly embeds several options for interacting with the chart. The user can click and drag on the chart itself to zoom in on a particular area; with a time series chart, this is useful for highlighting a particular period. Additionally, tooltips appear on hover that give direct access to the data behind the chart. More controls are available from the icons in the upper-right corner of the chart, which let users zoom, pan, and choose whether tooltips show all data series at once or one at a time.

Once the chart is published to your Plotly account, you can customize it even further using Plotly’s web-based GUI. In the above example, I moved the position of the legend manually using the GUI to maximize the space occupied by the data; you can also edit the chart’s appearance, change the axis and chart titles, and add notes to the chart, among other options.

Using ggplotly in this way has become a major part of my workflow in preparing interactive materials for my teaching. Here is a sampling of other charts I’ve prepared in this way:

More examples can be found on my Plotly page, and I’ll be posting the code for these visualizations to GitHub. While ggplotly has worked great for me so far, it is still very early in development and supports only a limited number of ggplot2 chart types (see this post for details). For unsupported chart types, you’ll need to use Plotly’s APIs directly, which I have just started to work with. This summer, I’m also going to explore using Plotly from Python, which has a well-developed API and appears to work very well with the IPython Notebook.

As always, please contact me by email or get in touch with me on Twitter if you have any questions or feedback.

Thanks to:
