IPO Exploration
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Inspired by recent headlines like “Fear Overtakes Greed in IPO Market after WeWork Debacle” and “This Year’s IPO Class is Least Profitable since the Tech Bubble”, today we’ll explore historical IPO data. Next time we’ll look at the performance of IPO-driven portfolios constructed during the ten-year period from 2004 to 2014. I’ll admit, I’ve often wondered how a portfolio that allocated money to new IPOs each year might perform, because this has to be the ultimate example of a few headline-gobbling whales dominating the collective consciousness. We hear a lot about a few IPOs each year, but there are dozens about which we hear nothing.
Here are the packages we’ll be using today.
```r
library(tidyverse)
library(tidyquant)
library(dplyr)
library(plotly)
library(riingo)
library(roll)
library(tictoc)
```
Let’s get all the companies listed on the NASDAQ, NYSE, and AMEX exchanges, along with their IPO years. That’s not every company that has IPO’d over the years, of course, but we’ll go with it as a convenience for today’s purposes. Fortunately, the tq_exchange() function from tidyquant makes it painless to grab this data.
```r
# Download the current listings for each exchange
nasdaq <- tq_exchange("NASDAQ")
amex <- tq_exchange("AMEX")
nyse <- tq_exchange("NYSE")
```
Big-time warning alert: not only have we missed companies that IPO’d and are not listed on those exchanges, we have also missed companies that IPO’d and have since ceased to exist. Those are the companies that went bust, which are exactly the companies that would scare the heck out of us before we invested in things like recent IPOs. You would need to correct for major survivorship bias should you choose to explore this for actual trading. As we’ll see next time, even without those dead companies, portfolios built on these IPOs are very risky. And, while we’re on the caveat theme, nothing in this post is financial advice in any way.
Back to the exciting part, the code!
Notice that we pulled in data from three different exchanges, yet our objects have the same column structure. That’s thanks to some nice work from the tidyquant authors, and it makes our lives easier in the next step, where we use bind_rows() to combine the data into one object.
After binding these data together, we use select(symbol, company, ipo.year, sector) to isolate a few columns of interest. We also filter out any tickers whose ipo.year is NA with filter(!is.na(ipo.year)).
```r
# Stack the three exchanges, keep four columns, drop missing IPO years
company_ipo_sector <-
  nasdaq %>%
  bind_rows(amex) %>%
  bind_rows(nyse) %>%
  select(symbol, company, ipo.year, sector) %>%
  filter(!is.na(ipo.year))

company_ipo_sector %>%
  head()
```

```
# A tibble: 6 x 4
  symbol company                                ipo.year sector
  <chr>  <chr>                                     <dbl> <chr>
1 TXG    10x Genomics, Inc.                         2019 Capital Goods
2 YI     111, Inc.                                  2018 Health Care
3 PIH    1347 Property Insurance Holdings, Inc.     2014 Finance
4 FLWS   1-800 FLOWERS.COM, Inc.                    1999 Consumer Services
5 BCOW   1895 Bancorp of Wisconsin, Inc.            2019 Finance
6 VNET   21Vianet Group, Inc.                       2011 Technology
```
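To see what each step of that pipeline does without hitting the network, here is a minimal sketch on two hand-made tibbles standing in for the tq_exchange() results. The tickers, company names, and values are invented for illustration, and the extra industry column is only there to show that select() drops it.

```r
library(dplyr)

# Hypothetical toy stand-ins for two exchange listings (values are made up)
nasdaq_toy <- tibble(
  symbol   = c("AAA", "BBB"),
  company  = c("Alpha Inc.", "Beta Corp."),
  ipo.year = c(2015, NA),
  sector   = c("Technology", "Finance"),
  industry = c("Software", "Banks")
)
nyse_toy <- tibble(
  symbol   = c("CCC", "DDD"),
  company  = c("Gamma Ltd.", "Delta Co."),
  ipo.year = c(2018, 2004),
  sector   = c("Health Care", "Energy"),
  industry = c("Biotech", "Oil")
)

# Same pattern as the real pipeline: stack, keep four columns,
# and drop rows whose IPO year is missing
toy_ipo_sector <- nasdaq_toy %>%
  bind_rows(nyse_toy) %>%
  select(symbol, company, ipo.year, sector) %>%
  filter(!is.na(ipo.year))

toy_ipo_sector
```

The row for BBB disappears because its ipo.year is NA. Note that bind_rows() matches columns by name, so if one exchange had lacked a column it would be filled with NA rather than raising an error.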
Before we start implementing and testing portfolio strategies in next week’s post, let’s spend today exploring this data set. We have the sector and IPO year of each company, and a good place to start is visualizing the number of IPOs by year. The key here is to call count(ipo.year), which does exactly what we hope: it gives us the number of IPOs in each year.
```r
company_ipo_sector %>%
  group_by(ipo.year) %>%
  count(ipo.year) %>%
  tail()
```

```
# A tibble: 6 x 2
# Groups:   ipo.year [6]
  ipo.year     n
     <dbl> <int>
1     2014   258
2     2015   210
3     2016   184
4     2017   274
5     2018   397
6     2019   310
```
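As a quick aside, count() does its own grouping, so the group_by(ipo.year) step isn't needed for the tally itself. This sketch, on a made-up five-company tibble, checks that count() matches the longhand group_by() + summarise(n = n()) version.

```r
library(dplyr)

# Hypothetical toy data: five companies with invented IPO years
toy <- tibble(
  symbol   = c("A", "B", "C", "D", "E"),
  ipo.year = c(2017, 2018, 2018, 2019, 2019)
)

# count() groups by its arguments and tallies rows per group
by_count <- toy %>% count(ipo.year)

# The longhand equivalent
by_summarise <- toy %>%
  group_by(ipo.year) %>%
  summarise(n = n())

by_count
```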
Next, we pipe the counts straight to ggplot() and put the new n column on the y-axis.
```r
company_ipo_sector %>%
  group_by(ipo.year) %>%
  count(ipo.year) %>%
  ggplot(aes(x = ipo.year, y = n)) +
  geom_col(color = "cornflowerblue") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 20)) +
  theme(axis.text.x = element_text(angle = 90))
```
I like that chart, but it would be nice to hover over the bars and get more information. Let’s wrap the whole code flow inside the ggplotly() function from plotly, which converts it to an interactive chart. The tooltip displays the column names, so let’s use rename(`num IPOs` = n, year = ipo.year) to create better labels.
```r
ggplotly(
  company_ipo_sector %>%
    group_by(ipo.year) %>%
    count(ipo.year) %>%
    # Friendlier column names show up in the hover tooltip
    rename(`num IPOs` = n, year = ipo.year) %>%
    ggplot(aes(x = year, y = `num IPOs`)) +
    geom_col(color = "cornflowerblue") +
    scale_x_continuous(breaks = scales::pretty_breaks(n = 20)) +
    theme(axis.text.x = element_text(angle = 90))
)
```
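The backticks around `num IPOs` are what make a column name containing a space legal in R. A small sketch, reusing the 2018 and 2019 counts printed earlier, shows the renamed columns and confirms that the unquoted version doesn't even parse.

```r
library(dplyr)

# Two rows taken from the yearly counts printed above
toy_counts <- tibble(ipo.year = c(2018, 2019), n = c(397L, 310L))

# Backticks let us create (and later reference) a name with a space
renamed <- toy_counts %>% rename(`num IPOs` = n, year = ipo.year)
names(renamed)
# "year" "num IPOs"

# Without backticks, R cannot parse the expression at all
bad <- try(parse(text = "rename(num IPOs = n)"), silent = TRUE)
inherits(bad, "try-error")
# TRUE
```

The same backticks are needed again in aes(y = `num IPOs`), which is why they appear in the ggplot() call as well.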
We see a big decline in 2008 due to the financial crisis and a steady rise until 2014, when things jump. That jump might be partly because companies that went public after 2014 have had less time to be delisted, so more of them still appear in today’s listings. I’ll leave it to an IPO maven to explain things further. I did come across this treasure trove of data on the IPO market for the curious. There’s a lot of interesting stuff in there, but one thing to note about this data source, and others I stumbled upon, is that IPO data tends to focus on companies above a certain market cap, generally greater than $50 million. We didn’t apply any market-cap cutoff, and thus will have more observations than you might find if you Google something like ‘number of IPOs in year XXXX’. For the curious, I’ll post how to create this market-cap filter on LinkedIn.

More importantly, the fact that researchers tend to focus on IPOs above a certain market cap sets off some neurons in my brain. That usually means there’s weird data stuff going on in the ignored area, or it’s risky, or it’s not worth institutional investors’ time because of market-structure issues, or any of a host of reasons to investigate the stuff that other people find unattractive.
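The post doesn’t show the market-cap filter itself, so here is a minimal sketch of one way it might look. It assumes the exchange data carries a market-cap column (called market.cap here, an assumption) stored as text such as "$1.2B", "$450.3M", or "n/a"; the helper parse_market_cap() and the toy tickers are hypothetical, and the parsing would need adjusting if your data use another format.

```r
library(dplyr)

# Hypothetical helper. ASSUMPTION: caps are strings like "$1.2B" or "$450.3M",
# with "n/a" for missing values.
parse_market_cap <- function(x) {
  # Strip everything except digits and the decimal point
  value <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
  # Scale by the trailing unit letter; anything else becomes NA
  multiplier <- case_when(
    grepl("B$", x) ~ 1e9,
    grepl("M$", x) ~ 1e6,
    TRUE ~ NA_real_
  )
  value * multiplier
}

# Made-up listings for illustration only
toy_caps <- tibble(
  symbol     = c("AAA", "BBB", "CCC", "DDD"),
  market.cap = c("$1.2B", "$450.3M", "$12.5M", "n/a")
)

min_cap <- 50e6  # the roughly $50 million cutoff mentioned above

large_caps <- toy_caps %>%
  mutate(market_cap_usd = parse_market_cap(market.cap)) %>%
  filter(market_cap_usd >= min_cap)  # rows with NA caps are dropped too

large_caps
```

Whether to drop or keep the unparseable "n/a" rows is a judgment call; filter() drops them here because the comparison returns NA.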
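Let’s get back on course and chart IPOs by sector by year. Instead of count(), we’ll use add_count(), which is shorthand for group_by() + add_tally(). Unlike count(), add_count() keeps every row and appends the group size as a new column, n, so we follow it with slice(1) to keep a single row per year-sector group.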
```r
company_ipo_sector %>%
  group_by(ipo.year, sector) %>%
  select(ipo.year, sector) %>%
  add_count(ipo.year, sector) %>%
  slice(1) %>%  # one row per year-sector group
  filter(ipo.year > 2003)
```

```
# A tibble: 193 x 3
# Groups:   ipo.year, sector [193]
   ipo.year sector                    n
      <dbl> <chr>                 <int>
 1     2004 Basic Industries          3
 2     2004 Capital Goods             4
 3     2004 Consumer Durables         1
 4     2004 Consumer Non-Durables     2
 5     2004 Consumer Services        14
 6     2004 Energy                    2
 7     2004 Finance                   8
 8     2004 Health Care              13
 9     2004 Miscellaneous             2
10     2004 Public Utilities          2
# … with 183 more rows
```
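To make the add_count() plus slice(1) pattern concrete, this sketch runs it on a four-row toy table (years and sectors invented for illustration) and checks that the collapsed result carries the same counts as a plain count(ipo.year, sector).

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  ipo.year = c(2004, 2004, 2004, 2005),
  sector   = c("Finance", "Finance", "Energy", "Finance")
)

# add_count() keeps all four rows and appends the group size as n
with_counts <- toy %>% add_count(ipo.year, sector)
with_counts$n
# 2 2 1 1

# Keeping the first row of each group collapses it to one row per group
collapsed <- with_counts %>%
  group_by(ipo.year, sector) %>%
  slice(1) %>%
  ungroup() %>%
  arrange(ipo.year, sector)

# count() reaches the same summary in one step
counted <- toy %>% count(ipo.year, sector)
```

Both routes end at the same year-sector counts; add_count() is the more useful of the two when you also want to keep the row-level data alongside the group sizes.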
Now let’s take the year-sector counts and pipe them to ggplot(). I want to highlight sector differences by year, so we’ll use the fill = sector aesthetic mapping along with facet_wrap(~ipo.year). Let’s also save some room on the x-axis labels by removing the word Consumer from the sector column with mutate(sector = str_remove(sector, "Consumer")).
```r
company_ipo_sector %>%
  group_by(ipo.year) %>%
  filter(ipo.year > 2003 & !is.na(sector)) %>%
  # Shorten labels like "Consumer Services"
  mutate(sector = str_remove(sector, "Consumer")) %>%
  count(sector) %>%  # counts by ipo.year (existing group) and sector
  ggplot(aes(x = sector, y = n, fill = sector)) +
  geom_col() +
  facet_wrap(~ipo.year) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "")
```
Not perfect, but better. Looking at 2013 through 2019, it immediately jumps out that the Health Care and Finance sectors have the most IPOs. Let’s use between(ipo.year, 2004, 2019) to restrict the chart explicitly to 2004 through 2019, and control the layout with facet_wrap(~ipo.year, nrow = 5).
We’ll also wrap our entire code flow in parentheses and then pipe it to ggplotly().
```r
(
  company_ipo_sector %>%
    group_by(ipo.year) %>%
    filter(between(ipo.year, 2004, 2019) & !is.na(sector)) %>%
    mutate(sector = str_remove(sector, "Consumer")) %>%
    count(sector) %>%
    ggplot(aes(x = sector, y = n, fill = sector)) +
    geom_col() +
    facet_wrap(~ipo.year, nrow = 5) +
    theme(axis.text.x = element_text(angle = 90)) +
    labs(x = "")
) %>%
  ggplotly()
```
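Two small details in the pipeline above are easy to check on toy vectors (the sector labels and years below are made up for illustration). First, str_remove(sector, "Consumer") removes only the word, so a label such as "Consumer Services" keeps a leading space; putting the space in the pattern, or wrapping the result in str_trim(), tidies that up if it matters to you. Second, between() includes both endpoints.

```r
library(dplyr)
library(stringr)

sectors <- c("Consumer Services", "Consumer Non-Durables", "Finance")

# Removing just "Consumer" leaves a leading space on the shortened labels
str_remove(sectors, "Consumer")
# " Services" " Non-Durables" "Finance"

# Including the trailing space in the pattern gives clean labels
str_remove(sectors, "Consumer ")
# "Services" "Non-Durables" "Finance"

# between() is inclusive at both ends, i.e. x >= left & x <= right
years <- c(2003, 2004, 2012, 2019, 2020)
between(years, 2004, 2019)
# FALSE TRUE TRUE TRUE FALSE
```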