Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Today, we go back a bit to where we probably should have started in the first place, but it wouldn’t have been as much fun. In our previous work on volatility, we zipped through the steps of data import, tidy and transformation. Let’s correct that oversight and do some spade work on transforming daily asset prices to monthly portfolio log returns.

Our five-asset portfolio will consist of the following securities and weights:

+ SPY (S&P500 fund) weighted 25%
+ EFA (a non-US equities fund) weighted 25%
+ IJS (a small-cap value fund) weighted 20%
+ EEM (an emerging-mkts fund) weighted 20%
+ AGG (a bond fund) weighted 10%

A brief interlude on portfolios: a portfolio is a weighted collection of assets (its etymology harkens back to the Latin for “to carry sheets of paper”, which I s’pose made its way to the world of investments because securities used to be sheets of paper). The theoretical reason that rational investors prefer a portfolio to a single asset is that a portfolio can offer a better risk/return trade-off due to low or negative covariance amongst portfolio components.

Back to the task at hand: transform a collection of daily ETF prices into an object of portfolio log returns.

library(tidyverse)
library(tidyquant)
library(timetk)

First, we import daily prices for the five ETFs, using getSymbols to grab the data, map(~Ad(get(.))) to select adjusted prices only, and reduce(merge) to mash our five prices into one xts object.

# The symbols vector holds our tickers.
symbols <- c("SPY","EFA", "IJS", "EEM","AGG")

# The prices object will hold our raw price data throughout this book.
prices <-
getSymbols(symbols, src = 'yahoo', from = "2005-01-01",
auto.assign = TRUE, warnings = FALSE) %>%
reduce(merge) %>%
colnames<-(symbols)

Next, we convert those daily adjusted prices to monthly log returns using two methods. For the first method, we stay in the xts world.

prices_monthly <- to.monthly(prices, indexAt = "last", OHLC = FALSE)
asset_returns_xts <- na.omit(Return.calculate(prices_monthly, method = "log"))

For the second method, we will head to the tidyverse/tidyquant world. We will convert from xts to tibble using a call to tk_tbl(preserve_index = TRUE, rename_index = "date"). We will add a column for log returns using mutate(returns = (log(returns) - log(lag(returns)))).

# Tidyverse method, to long, tidy format
asset_returns_long <-
prices %>%
to.monthly(indexAt = "last", OHLC = FALSE) %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
mutate(returns = (log(returns) - log(lag(returns))))

Have a peek at both asset return objects.

head(asset_returns_xts)
##                     SPY          EFA         IJS         EEM          AGG
## 2005-02-28  0.020688126  0.037150948  0.02860922  0.09241734 -0.003721035
## 2005-03-31 -0.018461970 -0.026583514 -0.02388198 -0.08240681 -0.009790455
## 2005-04-29 -0.018913092 -0.016309073 -0.05255677 -0.01255390  0.017081857
## 2005-05-31  0.031716351 -0.008674664  0.05973603  0.03111818  0.008242118
## 2005-06-30  0.001514103  0.014225362  0.03840792  0.03892299  0.008724056
## 2005-07-29  0.037547542  0.029527397  0.05677107  0.07400825 -0.010408874
## # A tibble: 6 x 3
## # Groups:   asset [1]
##         date asset      returns
##       <date> <chr>        <dbl>
## 1 2005-01-31   SPY           NA
## 2 2005-02-28   SPY  0.020688126
## 3 2005-03-31   SPY -0.018461970
## 4 2005-04-29   SPY -0.018913092
## 5 2005-05-31   SPY  0.031716351
## 6 2005-06-30   SPY  0.001514103

Do we notice any differences?

First, have a look at the left most part of asset_returns_xts, where the date is stored. The asset_returns_xts has a date index, not a column. It is accessed via index(asset_returns_xts). asset_returns_long has a column called “date”, accessed via the $date convention, i.e., asset_returns_long$date.

Second, notice the first date observation for January of 2005. asset_returns_long contains NA, and asset_returns_xts excludes the observation completely. Does it matter? It depends. In a few weeks when we get to the Sortino Ratio, we will see that it can matter quite a bit.

Third, asset_returns_xts is in wide format, which in this case means there is a column for each of our assets. This is the format that xts likes, and it’s the format that is easier to read as a human. However, asset_returns_long is in long, tidy format so that each variable has its own column. It’s a bit harder to read as human, but the tidyverse wants data in this format.

Now on to constructing a portfolio and calculating returns. To turn these five ETFs into a portfolio we need to assign them weights. Let’s first create a weights vector.

w <- c(0.25, 0.25, 0.20, 0.20, 0.10)

Before we use the weights in our calculations, we will run a quick sanity check in the next code chunk. This might not be necessary with five assets as we have today, but it is good practice because if we had 50 assets, it could save us a lot of grief to catch a mistake early.

# Make sure the weights line up with assets.
asset_weights_sanity_check <- tibble(w, symbols)
asset_weights_sanity_check
## # A tibble: 5 x 2
##       w symbols
##   <dbl>   <chr>
## 1  0.25     SPY
## 2  0.25     EFA
## 3  0.20     IJS
## 4  0.20     EEM
## 5  0.10     AGG

Make sure that tibble match up with the portfolio we want to create.

Finally, make sure the weights sum to 100%, or 1. Again, we can eyeball this with five assets, but with 50 assets it would be easier to run the sanity check.

sum(asset_weights_sanity_check$w) ## [1] 1 They sum to 1. Good to go, and on to portfolio returns. We will start with the textbook equation for the return of a multi-asset portfolio which is: $Return_{portfolio} = W_{1}*Return_{asset1}~+~W_{2}*Return_{asset2}~+~W_{3}*Return_{asset3}~+~W_{4}*Return_{asset4}~+~W_{5}*Return_{asset5}$ Here’s the LaTeX code for that equation. # $$Return_{portfolio} = W_{1}*Return_{asset1}~+~W_{2}*Return_{asset2}~+~W_{3}*Return_{asset3}~+~W_{4}*Return_{asset4}~+~W_{5}*Return_{asset5}$$ We ground through the LaTeX; now let’s grind through the R calculation by hand instead of using built-in functions. First, assign each weight from our w vector to a variable. Next, assign each asset return stored in asset_returns_xts to a variable. Last, we insert those new variables into the equation. w_1 <- w[1] w_2 <- w[2] w_3 <- w[3] w_4 <- w[4] w_5 <- w[5] asset1 <- asset_returns_xts[,1] asset2 <- asset_returns_xts[,2] asset3 <- asset_returns_xts[,3] asset4 <- asset_returns_xts[,4] asset5 <- asset_returns_xts[,5] portfolio_returns_byhand <- (w_1 * asset1) + (w_2 * asset2) + (w_3 * asset3) + (w_4 * asset4) + (w_5 * asset5) names(portfolio_returns_byhand) <- "returns" Our first portfolio returns calculation is now complete and stored as portfolio_returns_byhand. From a substantive perspective, we are finished and could head to visualization. We want to cover more methods, though, so let’s head to to the xts world and the PerformanceAnalytics package. We didn’t explicitly load that package in the setup, because tidyquant imports it for us. We will use theReturn.portfolio function, which requires two arguments for a portfolio, an xts object of asset returns, and a vector of weights. We have those at hand: asset_returns_xts and w. It’s not necessary, but we will set rebalance_on = "months" so we can confirm it matches our by-hand calculations. Remember, in the by-hand equation, the portfolio weights are fixed, meaning they never change on a month-to-month basis. That is equivalent to re-balancing every month, which in practice would be quite rare. portfolio_returns_xts_rebalanced_monthly <- Return.portfolio(asset_returns_xts, weights = w, rebalance_on = "months") %>% colnames<-("returns") Next let’s change to a more realistic annual re-balancing and set rebalance_on = "years". This will change our results so that they no longer match our by-hand calculation, which effectively re-balanced every month (since we hard-coded asset weights to be the same each month). portfolio_returns_xts_rebalanced_yearly <- Return.portfolio(asset_returns_xts, weights = w, rebalance_on = "years") %>% colnames<-("returns") We can take a peek at our three portfolio objects and see how the annual re-balance made a small but important difference. head(portfolio_returns_byhand) ## returns ## 2005-02-28 0.03829298 ## 2005-03-31 -0.03349817 ## 2005-04-29 -0.02011949 ## 2005-05-31 0.02475548 ## 2005-06-30 0.02027345 ## 2005-07-29 0.04188371 head(portfolio_returns_xts_rebalanced_monthly) ## returns ## 2005-02-28 0.03829298 ## 2005-03-31 -0.03349817 ## 2005-04-29 -0.02011949 ## 2005-05-31 0.02475548 ## 2005-06-30 0.02027345 ## 2005-07-29 0.04188371 head(portfolio_returns_xts_rebalanced_yearly) ## returns ## 2005-02-28 0.03829298 ## 2005-03-31 -0.03418759 ## 2005-04-29 -0.02018237 ## 2005-05-31 0.02441794 ## 2005-06-30 0.02032339 ## 2005-07-29 0.04228070 Do you notice where the annual re-balancing starts to show a difference from monthly re-balancing? As before, we could stop here and have accomplished our substantive task (twice already - by hand and using the built-in function from PerformanceAnalytics), but we want to explore alternate methods in the world of tidyverse/tidyquant. We will use our long, tidy-formatted asset_returns_long and convert to portfolio returns using the tq_portfolio function from tidyquant. The tq_portfolio function takes a tibble and then asks for an assets column to group by, a returns column to find return data, and a weights column. It’s a wrapper for Return.portfolio, and thus also accepts the argument rebalance_on = "months". Since we are re-balancing by months, we should again get a portfolio returns object that matches our two existing objects portfolio_returns_byhand and portfolio_returns_xts_rebalanced_monthly. portfolio_returns_tq_rebalanced_monthly <- asset_returns_long %>% tq_portfolio(assets_col = asset, returns_col = returns, weights = w, col_rename = "returns", rebalance_on = "months") If we want to re-balance annually, it’s the same code as above, except we set rebalance_on = "years". portfolio_returns_tq_rebalanced_yearly <- asset_returns_long %>% tq_portfolio(assets_col = asset, returns_col = returns, weights = w, col_rename = "returns", rebalance_on = "years") We now have two more portfolio returns objects and they are both tidy tibbles. Let’s take a quick look and compare how a tidy tibble of portfolio returns compares to an xts object of portfolio returns. head(portfolio_returns_tq_rebalanced_yearly) ## # A tibble: 6 x 2 ## date returns ## <date> <dbl> ## 1 2005-01-31 0.00000000 ## 2 2005-02-28 0.03829298 ## 3 2005-03-31 -0.03418759 ## 4 2005-04-29 -0.02018237 ## 5 2005-05-31 0.02441794 ## 6 2005-06-30 0.02032339 head(portfolio_returns_xts_rebalanced_yearly) ## returns ## 2005-02-28 0.03829298 ## 2005-03-31 -0.03418759 ## 2005-04-29 -0.02018237 ## 2005-05-31 0.02441794 ## 2005-06-30 0.02032339 ## 2005-07-29 0.04228070 Again, we can see a discrepancy for January of 2005. Our xts object elides that date completely, while our tibble records it as a 0.00. Since there is only one column of returns, there is no wide versus long format for the tibble, and it looks almost identical to the xts object. The only difference is the date: the tibble has a column that holds the date that can be accessed with the$ operator, whereas the xts object has a date index, accessed with index.

That’s all for today. The xts and tidyquant object have their own uses and advantages depending on our end goal. Next time we will think about how to visualize portfolio returns, and how the different objects fit into different visualization paradigms.