Reproducible Finance with R: ETF Country Exposure

[This article was first published on RStudio, and kindly contributed to R-bloggers].

by Jonathan Regenstein

Today, we are going to tackle a project that has long been on my wish list: a Shiny app to take a fund or portfolio, analyze its exposure to different countries, and display those exposures on a world map. Now you know how exciting my wish lists are.

Before describing our data importing and wrangling work here in the Notebook, it might be helpful to look at where we’re headed. The final Shiny app is here. This is similar to a previous project because we are building a leaflet map, shading it according to data added to the spatial dataframe, and including another HTML widget that is responsive to the map. However, our current project differs in important ways and has a completely different use.

The previous project allowed a user to click a country on the map and view the time series of returns. Our current project will allow the user to choose an ETF and see how that ETF is invested in different countries by how a world map is shaded.

From a substantive perspective, this app helps visualize country risks instead of returns – indeed, it’s the first in our series that does not import stock returns in any way. From an R perspective, in our current project the map is the object that responds to user inputs, whereas before, the dygraph was the object that responded to user clicks on a map. The two projects are related, and both require spatial dataframes, but they are very different.

If you looked closely at the Shiny app, you noticed that we do have a data object that is responsive to a map click: we display a datatable of companies held by the ETF in whatever country is clicked. That is, if a user chooses an ETF and sees by the shading that the ETF is allocated X% to China, the user can click on the map to see which companies the ETF owns in China. That functionality is similar to the dygraphs functionality, except, of course, that we have to wire up a datatable and do some filtering by country instead of passing an xts object to dygraphs. The fulcrum will still be the clicked map shape.

Alright, that app is what we’re ultimately building. Here’s the roadmap for what we’ll do in this Notebook.

First, we are going to grab the data for one fund, the MSCI Emerging Markets ETF. Note that we are not going to get return data over time. Instead, we just want a snapshot of the ETF holdings: its constituents, their weights, and their home countries. Our eventual app will include several ETFs, but we are going to work with one ETF in this Notebook, with the foreknowledge that we want to reuse our steps when it’s time to build the Shiny app. In short, let’s get it right for this Emerging Markets ETF, and then we can iterate over other ETFs when we move to building our Shiny app.

After we download the snapshot of the emerging markets fund, we will do some data wrangling and some country weight aggregation, and then merge that data to our spatial dataframe. Adding that data will depend on the ETF using the same country naming convention as our spatial dataframe, so we’ll pay attention to that in the wrangling process.

Once we add the data to our spatial dataframe, we will recycle some old code, build a leaflet map, and shade it according to the ETF’s country exposure. This is just a test to see how things will look in the Shiny app, and we can even play around with different color palettes to get things just right.

Once we have the map aesthetics sorted, we’ll turn to Part Two: displaying the details of each country holding. Really, this is just filtering our dataframe by country name – whatever country the user clicks – but we’ll go ahead and make sure things look how we want in this Notebook, and then pass that object to our app eventually.

Let’s get to it!

First, let’s grab the fund data from MSCI’s homepage. We will use the read_csv() function from the readr package. We will title it emerging_markets_fund, since we’ll be pulling in other funds later.

Note that we have to skip the first 11 rows, which is why the ‘skip = 11’ argument is included. That’s because this csv file is loaded with oddly formatted data in the first 11 rows. If we don’t skip those 11 rows, this import will be totally unhelpful. The ‘import dataset’ button in the IDE saved me minutes/hours of frustration here!
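As a rough illustration of the skip argument, here is a minimal sketch that does not touch the real fund file. It writes a small stand-in csv with 11 lines of preamble and then reads it back. The file, its column names, and its values are all made up for the example; only read_csv() and skip = 11 come from the text above.

```r
library(readr)

# Stand-in file: 11 preamble lines, then a header row and two data rows.
# Column names and values are placeholders, not the real fund data.
demo_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    paste("preamble line", 1:11),
    "Ticker,Name,Weight,Sector,Country",
    "AAA,Company A,4.0,Information Technology,China",
    "BBB,Company B,3.5,Financials,Brazil"
  ),
  demo_file
)

# skip = 11 drops the preamble so that line 12 is treated as the header.
emerging_markets_fund <- read_csv(demo_file, skip = 11)

emerging_markets_fund
```

With the real file, the first argument would be the path or URL of the downloaded csv instead of demo_file.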

Now, we have our fund data and the wrangling begins. We are actually going to use this initial object to create two other objects: one will be merged with the spatial dataframe, and one will be a standalone object to be loaded in our Shiny app.
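The first of those objects, the country weights, could be built along these lines. This is only a sketch with dplyr: the holdings below are placeholders rather than the fund’s actual constituents, the column names Name, Weight and Country are assumptions about the csv layout, and the single recoding rule is a hypothetical example of aligning a fund label with the name used in the spatial dataframe.

```r
library(dplyr)

# Placeholder holdings; not real fund data.
emerging_markets_fund <- data.frame(
  Name    = c("Company A", "Company B", "Company C", "Company D"),
  Weight  = c(4.0, 3.5, 2.0, 1.5),
  Country = c("China", "Korea (South)", "China", "Brazil"),
  stringsAsFactors = FALSE
)

country_weights <- emerging_markets_fund %>%
  # Hypothetical recoding so the fund's label matches the map's 'name' column.
  mutate(name = if_else(Country == "Korea (South)", "Korea", Country)) %>%
  # Sum the constituent weights within each country.
  group_by(name) %>%
  summarise(EEM = sum(Weight)) %>%
  arrange(desc(EEM))

country_weights
```

Naming the weight column EEM and the country column name mirrors the columns referred to later when the data is merged onto the map.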

Those country weights are striking! China + Korea + Taiwan comprise 51% of this fund. The fund is concentrated in economies that are probably closely linked. Perhaps that’s by design? Perhaps the inter-economy correlation isn’t as high as I believe? A cross-border investment or trade Shiny app would be helpful here.

It’s worth a second to consider the definition of ’emerging market’, a term that has become ubiquitous and has a know-it-when-we-see-it feel (if you’re not into political economy, feel free to skip this paragraph). The phrase was coined in 1981 by the World Bank’s Antoine Van Agtmael to help encourage investment in developing nations, as he felt that the phrase ‘Third World’ country was both distasteful and stifling to investors. Learn more here. Today, the phrase connotes an economy that is growing and transitioning from developing to developed, though some commentators include a political transition as well. Since we are working with an MSCI fund, we should consider their definition. It wasn’t easy to track down, but according to the Financial Times, MSCI takes into account the number of listed companies of a certain size (an economic measure) and openness to foreign capital (a political measure).

Back to our task at hand: we have downloaded the fund data and gotten it into shape to be added to our shapefile. That process is the exact same as in our previous post, so before we do that, let’s use the original fund data to create one other object to store country-level detail on companies, weights and sectors. If that seems a bit confusing, head back to the Shiny app and click on a country. The datatable displays company names and details, and we need to create a dataframe to extract and hold that data.

This is what a user of our Shiny app will see upon clicking on Brazil; it is the country-level detail of how the fund is invested in Brazil. We will save that ‘EEM’ object in the .RDat file so it can be loaded into our Shiny app.
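The country-level lookup amounts to a filter on the clicked country name. Here is a minimal sketch, again with placeholder rows and assumed column names (Name, Sector, Weight, Country); in the app, clicked_country would come from the map click rather than being hard-coded.

```r
library(dplyr)

# Placeholder holdings table standing in for the 'EEM' object.
EEM <- data.frame(
  Name    = c("Company A", "Company B", "Company C", "Company D"),
  Sector  = c("Financials", "Energy", "Materials", "Financials"),
  Weight  = c(1.2, 0.9, 0.7, 2.5),
  Country = c("Brazil", "Brazil", "Brazil", "China"),
  stringsAsFactors = FALSE
)

# Hard-coded here; in the Shiny app this would be the clicked country.
clicked_country <- "Brazil"

country_detail <- EEM %>%
  filter(Country == clicked_country) %>%
  select(Name, Sector, Weight) %>%
  arrange(desc(Weight))

country_detail
```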

Okay, let’s go ahead and build that map of the world and add our fund country weights to it. This process is identical to how we did it here, but we’ll go through the steps again.

First, let’s download the spatial dataframe. We will also use the ms_simplify() function from rmapshaper to reduce the size of the dataframe. This function will reduce the number of longitude and latitude coordinates used to build each country. It will make loading faster in our Shiny app, but won’t affect any of our logic.
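A sketch of that step follows. The source of the spatial data is an assumption on my part: Natural Earth country polygons via the rnaturalearth package. The keep value is illustrative rather than a recommendation.

```r
library(rnaturalearth)
library(rmapshaper)

# Assumption: Natural Earth country polygons stand in for the spatial dataframe.
world <- ne_countries(type = "countries", returnclass = "sf")

# keep = 0.05 retains roughly 5% of the vertices; keep_shapes = TRUE prevents
# small countries from being dropped entirely during simplification.
world_simple <- ms_simplify(world, keep = 0.05, keep_shapes = TRUE)

# Compare in-memory sizes before and after simplification.
object.size(world)
object.size(world_simple)
```

This sketch returns an sf object. To use the sp-based merge() afterwards, it would first need converting to a Spatial object, for example with sf::as_Spatial().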

Now, we will use the merge() function from the sp package to add our country weight data. Remember above where we made sure to use a consistent country naming convention when wrangling the ETF data? This is where it will come in handy – we use the ‘name’ column to perform the merge. After the merging, ETF exposures will be added for each country that has a match in the ‘name’ column. For those with no match, the EEM column will be filled with NA.
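To see the NA behavior without downloading anything, here is a toy sketch: three unit squares stand in for countries, and the weights are placeholders rather than the fund’s real exposures. The helper square() is purely illustrative.

```r
library(sp)

# Illustrative helper: a unit square (clockwise ring) wrapped as a Polygons object.
square <- function(x0, id) {
  coords <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(coords)), ID = id)
}

# Toy spatial dataframe with a 'name' column, like the world map data.
world <- SpatialPolygonsDataFrame(
  SpatialPolygons(
    list(square(0, "a"), square(2, "b"), square(4, "c")),
    proj4string = CRS("+proj=longlat +datum=WGS84")
  ),
  data.frame(
    name = c("Brazil", "China", "France"),
    row.names = c("a", "b", "c"),
    stringsAsFactors = FALSE
  )
)

# Placeholder country weights; France is deliberately absent.
country_weights <- data.frame(
  name = c("China", "Brazil"),
  EEM  = c(26, 7),
  stringsAsFactors = FALSE
)

# sp's merge() keeps every polygon; unmatched names get NA in the EEM column.
world_etf <- merge(world, country_weights, by = "name")

world_etf@data
```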

We have our data added to the shapefile. Let’s go ahead and construct a map. First we’ll build a popup to show some detail, then we will create a green palette and a purple palette for no other reason than to see which is more visually appealing.
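The popup and palettes might look something like the sketch below. It uses plain vectors with placeholder weights so it can run on its own; choosing colorNumeric() (rather than, say, a quantile palette) and the light gray for NA countries are my assumptions, not necessarily what the original code did.

```r
library(leaflet)

# Placeholder countries and weights (not real fund data); NA = no holdings.
name <- c("China", "Brazil", "France")
EEM  <- c(26, 7, NA)

# HTML popup text, one string per country.
popup <- paste0(
  "<strong>Country: </strong>", name,
  "<br><strong>EEM weight: </strong>", EEM, "%"
)

# Two candidate palettes over the observed range of weights.
weight_range <- range(EEM, na.rm = TRUE)
green_pal  <- colorNumeric("Greens",  domain = weight_range, na.color = "#f0f0f0")
purple_pal <- colorNumeric("Purples", domain = weight_range, na.color = "#f0f0f0")

# Each palette is a function mapping weights to hex colors.
green_pal(EEM)
purple_pal(EEM)
```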

Let’s invoke leaflet! As before, we will use layerId = ~name. This is, again, massively important because when we create a Shiny app, we want to pass country names to our datatable and filter accordingly. The layerId is how we’ll do that: when a user clicks on a country, we capture the layerId, which is a country name that can be used for filtering.
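Here is a minimal sketch of the leaflet call, built on a toy spatial dataframe of two squares with placeholder weights so that it runs on its own. The styling values and the square() helper are illustrative; the part that matters is layerId = ~name.

```r
library(sp)
library(leaflet)

# Illustrative helper: a unit square (clockwise ring) wrapped as a Polygons object.
square <- function(x0, id) {
  coords <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(coords)), ID = id)
}

# Toy stand-in for the merged world data; weights are placeholders.
world_etf <- SpatialPolygonsDataFrame(
  SpatialPolygons(
    list(square(0, "a"), square(2, "b")),
    proj4string = CRS("+proj=longlat +datum=WGS84")
  ),
  data.frame(
    name = c("Brazil", "China"),
    EEM  = c(7, 26),
    row.names = c("a", "b"),
    stringsAsFactors = FALSE
  )
)

pal <- colorNumeric("Purples", domain = world_etf$EEM)

etf_map <- leaflet(world_etf) %>%
  addPolygons(
    stroke = TRUE, color = "black", weight = 0.4,
    fillColor = ~pal(EEM), fillOpacity = 0.7,
    popup = ~paste0(name, ": ", EEM, "%"),
    layerId = ~name  # the country name becomes the id reported on click
  )

etf_map
```

In a Shiny app whose map output is called, say, map, leaflet exposes the clicked shape as input$map_shape_click, and its id element holds this layerId, which is the country name to filter on.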

Both those maps look good to me, but purple might be the way to go ultimately. That’s a decision for next time – see you then!
