IPO Exploration Part Two

[This article was first published on R Views, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed with those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way.

Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive. If you don’t want to tax your machine too much, you can always subset down the data before running the script. With that, let’s do this!

I won’t go through the code again in detail since we covered it last time, but our first step is to get the tickers and IPO years for a large universe of companies. Here is the code to retrieve the tickers for all companies that went public between 2004 and 2014 – and have survived (meaning today’s analysis is going to have serious survivor bias).

nasdaq <-
  tq_exchange("NASDAQ")

amex  <- 
  tq_exchange("AMEX")

nyse <- 
  tq_exchange("NYSE")

company_ipo_sector <-
  nasdaq %>% 
  bind_rows(amex) %>% 
  bind_rows(nyse) %>% 
  select(symbol, company, ipo.year, sector) %>% 
  filter(!is.na(ipo.year) & ipo.year < 2015)

And here is what we get:

company_ipo_sector %>% 
  head()
# A tibble: 6 x 4
  symbol company                                ipo.year sector           
  <chr>  <chr>                                     <dbl> <chr>            
1 PIH    1347 Property Insurance Holdings, Inc.     2014 Finance          
2 FLWS   1-800 FLOWERS.COM, Inc.                    1999 Consumer Services
3 VNET   21Vianet Group, Inc.                       2011 Technology       
4 TWOU   2U, Inc.                                   2014 Technology       
5 JOBS   51job, Inc.                                2004 Technology       
6 ABIL   Ability Inc.                               2014 Capital Goods    

There are about 1,830 tickers in this data frame and it conveniently includes a column called ipo.year, which we’ll make us of later in this post. Our next step is to download the price histories of these tickers from tiingo. We’ll need to divide our tickers into smaller subsets and pull in the data in pieces so we don’t hit our tiingo API limits.

tickers_2004_2006 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>% 
  filter(!is.na(sector) &  between(ipo.year, 2004, 2006)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

tickers_2007_2009 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>%
  filter(!is.na(sector) &  between(ipo.year, 2007, 2009)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

tickers_2010_2012 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>% 
  filter(!is.na(sector) &  between(ipo.year, 2010, 2012)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

tickers_2013_2014 <-
  company_ipo_sector %>% 
  distinct(company, .keep_all = TRUE) %>% 
  filter(!is.na(sector) &  between(ipo.year, 2013, 2014)) %>% 
  group_by(ipo.year) %>%
  pull(symbol)

We now have 4 sets of tickers. Let’s pass them to the riingo function one at a time and set the resample_frequency to monthly, so we’re pulling in 12 prices per year per company instead of ~252.

prices_riingo_2004_2006 <-
  tickers_2004_2006 %>%
  riingo_prices(start_date = "2004-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)


prices_riingo_2007_2009 <-
  tickers_2007_2009 %>%
  riingo_prices(start_date = "2007-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)

prices_riingo_2010_2012 <-
  tickers_2010_2012 %>%
  riingo_prices(start_date = "2010-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)


prices_riingo_2013_2014 <-
  tickers_2013_2014 %>%
  riingo_prices(start_date = "2013-01-01", resample_frequency = "monthly") %>% 
  group_by(ticker)


prices_riingo_full <-
  prices_riingo_2004_2006 %>% 
  bind_rows(prices_riingo_2007_2009) %>% 
  bind_rows(prices_riingo_2010_2012) %>% 
  bind_rows(prices_riingo_2013_2014)

We have our prices, now let’s add a column of monthly returns with a call to mutate(monthly_returns = close/lag(close) - 1).

prices_riingo_full <-
  prices_riingo_full %>% 
  group_by(ticker) %>% 
  mutate(monthly_returns = close/lag(close) - 1)

One more step, I won’t bore you with the details but after a lot of slogging, I discovered a few errors in my data, where the ipo.year came after the first year in which the ticker was traded. That’s probably because of some equities being relisted on other exchanges or changing ticker names, but suffice it to say, it jacked up my portfolio analysis! I’m going to remove those problematic tickers with the code below. First, we filter(date == min(date)) to get the first day for which we have returns. Then we isolate the year of that first trade with mutate(first_trade = year(date)). If that year doesn’t match ipo.year, we pull() the ticker.

tickers_remove_mismatch_dates <- 
prices_riingo_full %>% 
  left_join(company_ipo_sector, by = c("ticker" = "symbol")) %>% 
  filter(date == min(date)) %>% 
  select(ticker, date, ipo.year) %>% 
  mutate(first_trade = year(date)) %>% 
  filter(ipo.year != first_trade) %>% 
  pull(ticker)

We now have a vector of the tickers to remove, but we haven’t removed them yet. I separated these two steps so I could take a peak at those tickers first, and maybe come back to later to address this issue in a better way.

Let’s go ahead and remove them (I’m going to remove PGTI also, it was showing up duplicate downstream in my code).

prices_riingo_full <- 
 prices_riingo_full %>% 
 filter(!(ticker %in% tickers_remove_mismatch_dates) & ticker != 'PGTI')

A final step before saving: we’ll left_join() the company_ipo_sector tibble so that we have company name and ipo.year as part of this object.

prices_riingo_full <- 
prices_riingo_full %>% 
  left_join(company_ipo_sector, by = c("ticker" = "symbol")) 

To avoid going through that process each time we revisit this project, we can save this price/returns data as a .RDS file.

write_rds(prices_riingo_full, "prices_riingo_full.RDS")

But, another exciting way to save and share data is the new pins package (see intro to pins here). Pins is a package that makes it easy to save and share data. I’m going to use RStudio Connect for that, but you could use GitHub or kaggle as well.

First, we register our Connect server.

library(pins)

pins::board_register(server = "path to your RStudio Connect server here", key = "your API key here", board = "rsconnect")

Now, we’re ready to pin this data up to Connect where we, or any of our colleagues, could pull down this data in the future and take advantage of all the gymnastics we just performed to get this in shape.

pins::pin(
  x = prices_riingo_full, 
  name = "ipo_riingo_prices_pins", 
  description = "Monthly price history of IPOs from 2004 - 2014.", 
  board = "rsconnect"
)

For the curious, here’s how to retrieve that data from RStudio Connect.

ipo_riingo_prices_pins <- 
  pins::pin_get(name = "ipo_riingo_prices_pins", 
  board = "rsconnect")

Now, back to our ultimate goal: we want to see how portfolios that allocated dollars to IPOs each year would have performed. That is, we’ll build a portfolio for 2004, 2005…through to 2014, 11 portfolios consisting solely of the companies that IPO’d in a specific year. Obviously, we are dealing with selection bias, meaning if we had implemented this strategy, we’d have invested in lots of companies that went to zero and delisted. Those dead companies didn’t make it to today’s project because we pulled the tickers of companies currently listed on the AMEX, NYSE and NASDAQ. So don’t try this at home and nothing in this post is investment advice.

Let’s start by building a portfolio of stocks that IPO’d in 2006, assuming we invest equally in each company. We’ll select() a few columns needed for the analysis and then filter(ipo.year == 2006). Next we use the built-in tq_portfolio() function from tidyquant wherein we specify the assets_col as our tickers, the returns_col as monthly_returns and paste together a name that incorporates 2006. We also set it to rebalance_on = "months" so that each month we make sure to maintain an equal weighting to each company. We do not need to specify a weights argument because equal weighting is the default.

prices_riingo_full %>%
  select(ticker, date, monthly_returns, ipo.year) %>% 
  filter(ipo.year == 2006) %>% 
  group_by(ticker) %>% 
  filter(!is.na(monthly_returns)) %>%
  tq_portfolio(assets_col  = ticker,
               returns_col = monthly_returns,
               col_rename  = paste(2006, "_port_returns", sep = ""),
               wealth.index = F,
               rebalance_on = "months") %>% 
  tail()
# A tibble: 6 x 2
  date                `2006_port_returns`
  <dttm>                            <dbl>
1 2019-05-31 00:00:00             0.261  
2 2019-06-28 00:00:00             0.0533 
3 2019-07-31 00:00:00             0.0346 
4 2019-08-30 00:00:00            -0.00789
5 2019-09-30 00:00:00            -0.00539
6 2019-10-31 00:00:00             0.0233 
  # slice(1:3, (n()-3):n())

That code works for 2006, but now we want to scale it for more general use. We can turn the code flow into a function that takes year as an argument and then apply the function to all of our IPO returns on a year-by-year basis. I’m also going to add a second argument, show_growth, that can be set to TRUE or FALSE if we wish output the growth of a dollar instead of monthly returns.

ipo_by_year_portfolios <- function(year, show_growth = F){
  
  prices_riingo_full %>%
  select(ticker, date, monthly_returns, ipo.year) %>% 
  filter(ipo.year == year) %>% 
  tq_portfolio(assets_col  = ticker,
               returns_col = monthly_returns,
               col_rename  = paste(year, "_port_returns", sep = ""),
               wealth.index = show_growth,
               rebalance_on = "months")
  
}

Let’s test that function on 2006 and confirm we get the same result.

ipo_by_year_portfolios(2006)  %>% 
  tail()
# A tibble: 6 x 2
  date                `2006_port_returns`
  <dttm>                            <dbl>
1 2019-05-31 00:00:00             0.261  
2 2019-06-28 00:00:00             0.0533 
3 2019-07-31 00:00:00             0.0346 
4 2019-08-30 00:00:00            -0.00789
5 2019-09-30 00:00:00            -0.00539
6 2019-10-31 00:00:00             0.0233 

And now, we want to apply that function to every year in our data set and mash the results together into one data frame.

We’ll start by creating a vector of years from 2004 to 2014.

years_numeric <- seq(2004, 2014, by = 1)

And now, we map() our function across that vector of years. If we just ran map(years_numeric, ipo_by_year_portfolios), it would result in 11 lists of portfolio returns, one for each year of our IPOs. I want a data frame with all the results stored together. Let’s use reduce(left_join) for that task.

returns_each_year_ipo_portfolios <-
map(years_numeric, ipo_by_year_portfolios) %>% 
  reduce(left_join) 

returns_each_year_ipo_portfolios %>% 
  head()
# A tibble: 6 x 12
  date                `2004_port_retu… `2005_port_retu… `2006_port_retu…
  <dttm>                         <dbl>            <dbl>            <dbl>
1 2004-02-27 00:00:00          0                     NA               NA
2 2004-03-31 00:00:00         -0.00296               NA               NA
3 2004-04-30 00:00:00         -0.00141               NA               NA
4 2004-05-31 00:00:00          0.00703               NA               NA
5 2004-06-30 00:00:00         -0.0194                NA               NA
6 2004-07-30 00:00:00         -0.0472                NA               NA
# … with 8 more variables: `2007_port_returns` <dbl>,
#   `2008_port_returns` <dbl>, `2009_port_returns` <dbl>,
#   `2010_port_returns` <dbl>, `2011_port_returns` <dbl>,
#   `2012_port_returns` <dbl>, `2013_port_returns` <dbl>,
#   `2014_port_returns` <dbl>

We now have one data frame with a date column that runs from the beginning of 2004, the first year for which we pulled in IPO tickers, to the today’s date, the last date for which we have returns for these companies. Then we have one column with the returns for each yearly portfolio constructed by equal weighting each company that IPO’d from 2004 to 2014. Since each column represents a year, we start out with values for the 2004 column but the rest have NA for all of 2004. Then in 2005, the 2005 portfolio springs into existence, but the rest of the columns have NA. And then each year another column gets values as new companies IPO in those years.

We constructed this so that each year we allocated fresh money to the new IPOs, equal weighted, with no regard to the past or any other market conditions. We didn’t have to do any of that, of course. We could use a very similar construction to tweak our existing portfolio based on what happened last year, or look at the past performance of basket of funds in a sector and then allocate to certain tickers going forward. In that case, we would map our function to first create a signal year-by-year, then run our allocation algorithm based on the signal from the previous year. Also note that we’re doing something a bit unusual by constructing each portfolio by year and then doing nothing beyond a monthly rebalance: not quite set it and forget it, but pretty close.

Let’s dig in to our new portfolio returns object by calculating the Sharpe Ratio for each of these IPO portfolios. We currently have our data in wide, decidedly un-tidy format so let’s pivot_longer() this data to make it tidy (long time readers or dplyers might note that we would have used gather() for this a few months ago - pivot_longer() has replaced gather() in the latest release of dplyr). We are treating each yearly portfolio as a discrete return stream so we can goup_by() the portfolio_by_year column. From there we call tq_performance(), same as if these return streams were from ETFs or any other asset. Our result should be one Sharpe ratio for each year, or 11 Sharpe Ratios.

port_sharpes <-
  returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>%
  filter(!is.na(monthly_return)) %>%
  tq_performance(Ra = monthly_return,
                 performance_fun = SharpeRatio,
                 Rf = 0,
                 FUN= "StdDev") %>%
  `colnames<-`(c("portfolio_by_year", "port_sharpe"))%>%
  add_column(year = years_numeric)

port_sharpes
# A tibble: 11 x 3
# Groups:   portfolio_by_year [11]
   portfolio_by_year port_sharpe  year
   <chr>                   <dbl> <dbl>
 1 2004_port_returns       0.234  2004
 2 2005_port_returns       0.192  2005
 3 2006_port_returns       0.249  2006
 4 2007_port_returns       0.190  2007
 5 2008_port_returns       0.142  2008
 6 2009_port_returns       0.220  2009
 7 2010_port_returns       0.279  2010
 8 2011_port_returns       0.152  2011
 9 2012_port_returns       0.309  2012
10 2013_port_returns       0.182  2013
11 2014_port_returns       0.218  2014

Let’s port this data to plotly and create a bar chart, where the height of each bar is the portfolio’s Sharpe Ratio.

(
port_sharpes %>% 
  ggplot(aes(x = year, y = port_sharpe, fill = portfolio_by_year)) + 
  geom_col(width = .4) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "", y = "sharpe by year", title = "IPO Portfolio Sharpe Ratios") +
  theme_minimal()
) %>% 
  ggplotly()

Here is the exact same data, except displayed as a point chart. Notice how defaults to a slightly different scale, with the y-axis not extending all the way to 0. That makes the poor performance of 2008 seem even more dramatic. Were the companies that IPO’d in 2008 all dogs, or is it just a bad idea to invest in IPOs when the market is in a sharp correction because of a financial crisis?

(
port_sharpes %>% 
  ggplot(aes(x = year, y = port_sharpe, fill = portfolio_by_year)) + 
  geom_point(size = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "", y = "sharpe by year", title = "IPO Portfolio Sharpe Ratios") +
  theme_minimal()
) %>% 
  ggplotly()

If you’re an avid reader of these posts, it won’t surprise you to learn that we’re going to calculate the rolling Sharpe Ratio as well. We can use the same code as last time and the blazing fast roll package.

library(roll)
returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>% 
  mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sharpe = rolling_mean/rolling_sd) %>% 
  na.omit() %>% 
  head()
# A tibble: 6 x 6
# Groups:   portfolio_by_year [1]
  date                portfolio_by_ye… monthly_return rolling_mean
  <dttm>              <chr>                     <dbl>        <dbl>
1 2004-07-30 00:00:00 2004_port_retur…       -0.0472     -0.0107  
2 2004-08-31 00:00:00 2004_port_retur…        0.00956    -0.00906 
3 2004-09-30 00:00:00 2004_port_retur…        0.0493     -0.000354
4 2004-10-29 00:00:00 2004_port_retur…        0.00395     0.000539
5 2004-11-30 00:00:00 2004_port_retur…        0.131       0.0211  
6 2004-12-31 00:00:00 2004_port_retur…        0.0737      0.0367  
# … with 2 more variables: rolling_sd <dbl>, rolling_sharpe <dbl>

And let’s port these rolling calculations straight to plotly.

(
 returns_each_year_ipo_portfolios %>%
  pivot_longer(-date, names_to = "portfolio_by_year", values_to = "monthly_return") %>%
  group_by(portfolio_by_year) %>%
  arrange(portfolio_by_year, date) %>% 
  mutate( rolling_mean = roll_mean(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sd = roll_sd(as.matrix(monthly_return), 6, complete_obs  = T),
         rolling_sharpe = rolling_mean/rolling_sd,
         date = ymd(date)) %>% 
  na.omit() %>% 
  ggplot(aes(x = date, y = rolling_sharpe, color = portfolio_by_year)) +
   scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  geom_line() +
   theme_minimal() 
) %>% 
  ggplotly()

To leave a comment for the author, please follow the link and comment on their blog: R Views.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)