# Monte Carlo Part Two

June 12, 2018
By

(This article was first published on R Views, and kindly contributed to R-bloggers)

In a previous post, we reviewed how to set up and run a Monte Carlo (MC) simulation of future portfolio returns and growth of a dollar. Today, we will run that simulation many, many, times and then visualize the results.

Our ultimate goal is to build a Shiny app that allows an end user to build a custom portfolio, simulate returns, and visualize the results. If you just can’t wait, a link to the final Shiny app is available here.

This post builds off the work we did previously. I won’t go through the logic again, but the code for building a portfolio, calculating returns, mean and standard deviation of returns, and using them for a simulation is here:

``````library(tidyquant)
library(tidyverse)
library(timetk)
library(broom)
library(highcharter)

symbols <- c("SPY","EFA", "IJS", "EEM","AGG")

prices <-
getSymbols(symbols, src = 'yahoo',
from = "2012-12-31",
to = "2017-12-31",
auto.assign = TRUE, warnings = FALSE) %>%
reduce(merge) %>%
`colnames<-`(symbols)

w <- c(0.25, 0.25, 0.20, 0.20, 0.10)

asset_returns_long <-
prices %>%
to.monthly(indexAt = "lastof", OHLC = FALSE) %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, returns, -date) %>%
group_by(asset) %>%
mutate(returns = (log(returns) - log(lag(returns)))) %>%
na.omit()

portfolio_returns_tq_rebalanced_monthly <-
asset_returns_long %>%
tq_portfolio(assets_col  = asset,
returns_col = returns,
weights     = w,
col_rename  = "returns",
rebalance_on = "months")

mean_port_return <-
mean(portfolio_returns_tq_rebalanced_monthly\$returns)

stddev_port_return <-
sd(portfolio_returns_tq_rebalanced_monthly\$returns)

simulation_accum_1 <- function(init_value, N, mean, stdev) {
tibble(c(init_value, 1 + rnorm(N, mean, stdev))) %>%
`colnames<-`("returns") %>%
mutate(growth =
accumulate(returns,
function(x, y) x * y)) %>%
select(growth)
}``````

That code allows us to run one simulation of the growth of a dollar over the next 10 years, with the `simulation_accum_1()` that we built for that purpose. Today, we will review how to run 51 simulations, though we could choose any number (and our Shiny application allows an end user to do just that).

First, we need an empty matrix with 51 columns, an initial value of \$1, and intuitive column names.

We will use the `rep()` function to create 51 columns with a 1 as the value, and `set_names()` to name each column with the appropriate simulation number.

``````sims <- 51
starts <-
rep(1, sims) %>%
set_names(paste("sim", 1:sims, sep = ""))``````

Take a peek at `starts` to see what we just created, and how it can house our simulations.

``head(starts)``
``````sim1 sim2 sim3 sim4 sim5 sim6
1    1    1    1    1    1 ``````
``tail(starts)``
``````sim46 sim47 sim48 sim49 sim50 sim51
1     1     1     1     1     1 ``````

51 columns, with a value of 1 in one row. This is where we will store the results of the 51 simulations.

Now, we want to apply `simulation_accum_1` to each of the 51 columns of the `starts` matrix, and we will do that using the `map_dfc()` function from the `purrr` package.

`map_dfc()` takes a vector – in this case, the columns of `starts` – and applies a function to it. By appending `dfc()` to the `map_` function, we are asking the function to store each of its results as the column of a data frame (`map_df()` does the same thing, but stores results in the rows of a data frame). After running the code below, we will have a data frame with 51 columns, one for each of our simulations.

We also need to choose how many months to simulate (the N argument to our simulation function) and supply the distribution parameters as we did before. We do not supply the `init_value` argument because the `init_value` is 1, that same 1 that is already in the 51 columns.

``````monte_carlo_sim_51 <-
map_dfc(starts,
simulation_accum_1,
N = 120,
mean = mean_port_return,
stdev = stddev_port_return)

tail(monte_carlo_sim_51 %>%  select(growth1, growth2,
growth49, growth50), 3)``````
``````# A tibble: 3 x 4
growth1 growth2 growth49 growth50

1    1.81    1.38     2.32     1.80
2    1.84    1.37     2.38     1.84
3    1.82    1.33     2.39     1.82``````

Have a look at the results. We now have 51 simulations of the growth of a dollar, and we simulated that growth over 120 months; however, the results are missing a piece that we need for visualization, namely a `month` column.

Let’s add that `month` column with `mutate()` and give it the same number of rows as our data frame. These are months out into the future. We will use `mutate(month = seq(1:nrow(.)))`, and then clean up the column names. `nrow()` is equal to the number of rows in our object. If we were to change to 130 simulations, that would generate 130 rows, and `nrow()` would be equal to 130.

``````monte_carlo_sim_51 <-
monte_carlo_sim_51 %>%
mutate(month = seq(1:nrow(.))) %>%
select(month, everything()) %>%
`colnames<-`(c("month", names(starts))) %>%
mutate_all(funs(round(., 2)))

tail(monte_carlo_sim_51 %>%  select(month, sim1, sim2,
sim49, sim50), 3)``````
``````# A tibble: 3 x 5
month  sim1  sim2 sim49 sim50

1   119  2.16  1.81  1.46  2.32
2   120  2.28  1.84  1.46  2.38
3   121  2.26  1.82  1.46  2.39``````

We have accomplished our goal of running 51 simulations, and could head to data visualization now, but let’s explore an alternative method using the the `rerun()` function from `purrr`. As its name implies, this function will “rerun” another function, and we stipulate how many times to do that by setting `.n = number of times to rerun`. For example to run the `simulation_accum_1` function 5 times, we would set the following:

``````monte_carlo_rerun_5 <-
rerun(.n = 5,
simulation_accum_1(1,
120,
mean_port_return,
stddev_port_return))``````

That returned a list of 5 data frames, or 5 simulations. We can look at the first few rows of each data frame by using `map(..., head)`.

``map(monte_carlo_rerun_5, head)``
``````[[1]]
# A tibble: 6 x 1
growth

1  1
2  0.983
3  0.965
4  0.946
5  0.967
6  0.962

[[2]]
# A tibble: 6 x 1
growth

1  1
2  0.980
3  0.975
4  0.964
5  0.969
6  0.914

[[3]]
# A tibble: 6 x 1
growth

1  1
2  1.03
3  0.997
4  0.979
5  1.04
6  1.04

[[4]]
# A tibble: 6 x 1
growth

1  1
2  0.974
3  0.962
4  0.943
5  0.942
6  0.963

[[5]]
# A tibble: 6 x 1
growth

1  1
2  0.990
3  1.02
4  1.06
5  1.12
6  1.13 ``````

Let’s consolidate that list of data frames to one `tibble`. We start by collapsing to vectors with `simplify_all()`, then add nicer names with the `names()` function and finally coerce to tibble with `as_tibble()`. Let’s run it 51 times to match our previous results.

``````reruns <- 51

monte_carlo_rerun_51 <-
rerun(.n = reruns,
simulation_accum_1(1,
120,
mean_port_return,
stddev_port_return)) %>%
simplify_all() %>%
`names<-`(paste("sim", 1:reruns, sep = " ")) %>%
as_tibble() %>%
mutate(month = seq(1:nrow(.))) %>%
select(month, everything())

tail(monte_carlo_rerun_51 %>%  select(`sim 1`, `sim 2`,
`sim 49`, `sim 50`), 3)``````
``````# A tibble: 3 x 4
`sim 1` `sim 2` `sim 49` `sim 50`

1    1.99    1.97     3.66     2.20
2    2.00    1.86     3.68     2.25
3    2.02    1.95     3.78     2.18``````

Now we have two objects holding the results of 51 simulations, `monte_carlo_rerun_51` and `monte_carlo_sim_51`.

Each has 51 columns of simulations and 1 column of months. Note that we have 121 rows because we started with an initial value of 1, and then simulated returns over 120 months.

Now let’s get to visualization with `ggplot()` and visualize the results in `monte_carlo_sim_51`. The same code flows for visualization would also apply to `monte_carlo_rerun_51`, but we will run them for only `monte_carlo_sim_51` here.

We start with a chart of all 51 simulations, and assign a different color to each one by setting `ggplot(aes(x = month, y = growth, color = sim))`. `ggplot()` will automatically generate a legend for all 51 time series, but that gets quite crowded. We will suppress the legend with `theme(legend.position = "none")`.

``````monte_carlo_sim_51 %>%
gather(sim, growth, -month) %>%
group_by(sim) %>%
ggplot(aes(x = month, y = growth, color = sim)) +
geom_line() +
theme(legend.position="none")``````

We can check the minimum, maximum and median simulation with the `summarise()` function here.

``````sim_summary <-
monte_carlo_sim_51 %>%
gather(sim, growth, -month) %>%
group_by(sim) %>%
summarise(final = last(growth)) %>%
summarise(
max = max(final),
min = min(final),
median = median(final))
sim_summary``````
``````# A tibble: 1 x 3
max   min median

1  4.81  1.31    2.3``````

We can clean up our original visualization by including only the max, min and median that were just calculated.

``````monte_carlo_sim_51 %>%
gather(sim, growth, -month) %>%
group_by(sim) %>%
filter(
last(growth) == sim_summary\$max ||
last(growth) == sim_summary\$median ||
last(growth) == sim_summary\$min) %>%
ggplot(aes(x = month, y = growth)) +
geom_line(aes(color = sim)) ``````

Now let’s port our results over to `highcharter`, but in a major departure from our usual code flow, we will pass a tidy `tibble` instead of an `xts` object.

Our first step is to convert the data from wide to long tidy format with the `gather()` function.

``````mc_gathered <-
monte_carlo_sim_51 %>%
gather(sim, growth, -month) %>%
group_by(sim)

``````# A tibble: 6 x 3
# Groups:   sim [1]
month sim   growth

1     1 sim1    1
2     2 sim1    0.99
3     3 sim1    1.01
4     4 sim1    1.06
5     5 sim1    1.08
6     6 sim1    1.1 ``````

We can now pass this `tibble` directly to the `hchart()` function, specify the type of chart as `line`, and then work with a similar grammar to `ggplot()`. The difference is we use `hcaes`, which stands for `highcharter aesthetic`, instead of `aes`.

``````# This takes a few seconds to run
hchart(mc_gathered,
type = 'line',
hcaes(y = growth,
x = month,
group = sim)) %>%
hc_title(text = "51 Simulations") %>%
hc_xAxis(title = list(text = "months")) %>%
hc_yAxis(title = list(text = "dollar growth"),
labels = list(format = "\${value}")) %>%
hc_exporting(enabled = TRUE) %>%
hc_legend(enabled = FALSE)``````

We just plotted 51 lines in `highcharter` using a tidy `tibble`. For tidy data fans out there, this is a big deal because we can stay in the grammar of the tidyverse but also use `highcharter`.

Very similar to what we did with `ggplot`, let’s isolate the maximum, minimum and median simulations and save them to an object called `mc_max_med_min`.

``````mc_max_med_min <-
mc_gathered %>%
filter(
last(growth) == sim_summary\$max ||
last(growth) == sim_summary\$median ||
last(growth) == sim_summary\$min) %>%
group_by(sim)``````

Now we pass that filtered object to `hchart()`.

``````hchart(mc_max_med_min,
type = 'line',
hcaes(y = growth,
x = month,
group = sim)) %>%
hc_title(text = "Min, Max, Median Simulations") %>%
hc_xAxis(title = list(text = "months")) %>%
hc_yAxis(title = list(text = "dollar growth"),
labels = list(format = "\${value}")) %>%
hc_exporting(enabled = TRUE) %>%
hc_legend(enabled = FALSE)``````

That concludes our visualization of Monte Carlo simulations.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...