Over the course of 2016, I manually gathered press clippings announcing Venture Capital (VC) deals from various public online sources and newsletters, each time I came across something that caught my attention. In early January, I put the data together, cleaned it and made it R-usable as a csv dataset of 1,720 different deals. The dataset covers VC deals that took place in 50 different countries, for a total value of approx. 22.3b US$ across 10 different funding rounds (from angel to series F). However, the dataset should not be considered a representative sample (in the statistical sense) that accurately reflects the universe of 2016 VC deals. With that caveat clearly stated, I nonetheless found it interesting to dig a bit deeper into the data, at least to get some flavor of last year's VC deals. As expected, that flavor reflects what captured my attention more than anything else.
A quick visualisation of the data shows three important points:
- More than 40% of all deals in the dataset have a value lower than 7.5m US$. Based on my experience in the investment industry, I am personally convinced that the vast majority of deals are actually worth less than 0.5m US$, but smaller deals probably do not attract media attention. They would have to be found through one of the many ad hoc monitoring services available online for a (high) fee rather than through my own cherry-picking approach.
- In all industries but pharma, there are many more small deals than big ones. In pharma and, to a lesser extent, in healthcare and biotech, deals are more evenly distributed, from small to very big. This is a known phenomenon: bringing innovation to life in pharma, healthcare and biotech usually takes more resources (time and funding), especially when heavy capital expenditure is needed.
- Despite several outliers, the value of deals increases from one funding round to the next, up to series D. From series D onwards, the median value does not vary much, although the variability (interquartile range) of series D is much higher than that of series E and F.
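To make the last point concrete, here is a minimal sketch of how the median and interquartile range per funding round could be compared. It is written in Python rather than the R used for the analysis, the function name `spread_by_round` is my own, and the deal values are made up for illustration (they are not taken from the dataset). Note that the quartile convention of Python's `statistics.quantiles` may differ slightly from the one behind the original plots.

```python
from statistics import median, quantiles


def spread_by_round(deals_by_round):
    """Return {round: (median, interquartile range)} for each funding round."""
    summary = {}
    for round_name, values in deals_by_round.items():
        q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
        summary[round_name] = (median(values), q3 - q1)
    return summary


# Hypothetical deal values in US$ millions, NOT taken from the dataset.
toy_deals = {
    "series_d": [5, 10, 20, 40, 60, 80, 120],
    "series_e": [30, 35, 38, 40, 42, 45, 50],
}

summary = spread_by_round(toy_deals)
for round_name, (med, iqr) in summary.items():
    print(f"{round_name}: median={med}, IQR={iqr}")
```

In this toy example both rounds share the same median, but the interquartile range of the first is several times larger, which is the kind of pattern described above for series D versus series E and F.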
ANOVA test of value of deal ~ industry:
- F Value = 2.9164321
- Pr(>F) = 0.0031213
ANOVA test of value of deal ~ funding round:
- F Value = 62.9453751
- Pr(>F) ≈ 0 (below the displayed precision)
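For readers unfamiliar with the F value reported above, the following is a minimal sketch of how a one-way ANOVA F statistic is computed: the variance between group means divided by the variance within groups. It is written in Python rather than R, the function name `one_way_anova_f` is hypothetical, and the numbers are toy values, not the dataset. The p-value, Pr(>F), is not computed here because it requires the F distribution, which is not in Python's standard library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict {label: [values]}."""
    samples = [values for values in groups.values() if values]
    k = len(samples)                      # number of groups
    n = sum(len(s) for s in samples)      # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical deal values in US$ millions, NOT taken from the dataset.
toy_deals = {
    "seed": [1.0, 2.0, 3.0],
    "series_a": [4.0, 5.0, 6.0],
    "series_b": [7.0, 8.0, 9.0],
}

f_value, df_b, df_w = one_way_anova_f(toy_deals)
print(f"F = {f_value:.2f} on {df_b} and {df_w} degrees of freedom")
```

A large F value means the group means differ by much more than the spread inside each group would explain, which is why the funding-round test above yields a p-value so close to zero.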
It is worth noting that the rankings of countries by number of deals, by total value of deals and, consequently, by average value per deal are quite different.
a. Countries Ranked by Number of Deals
|Country||Number of VC deals|
b. Countries Ranked by Total Value of Deals
|Country||Total value of VC deals (US$)|
|united arab emirates||362,700,000|
c. Countries Ranked by Average Value of Deals
|Country||Average value of VC deals (US$)|
|united arab emirates||72,540,000|
At this point, it is quite easy to verify that the difference in rankings is caused, among other things, by the fact that the mix of funding rounds varies from country to country, while each funding round has a different mean deal value.
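The effect can be reproduced with a small sketch (Python rather than R, with hypothetical country names and made-up deal values): a country with many early-stage deals tops the ranking by count, while countries with a few late-stage deals top the rankings by total or average value. The function name `rank_countries` is an assumption of mine, not part of the original analysis.

```python
from collections import defaultdict


def rank_countries(deals):
    """Rank countries by deal count, total value and average value.

    `deals` is an iterable of (country, funding_round, value) tuples.
    Returns three lists of country names, each sorted in descending order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for country, _round, value in deals:
        totals[country] += value
        counts[country] += 1
    averages = {c: totals[c] / counts[c] for c in totals}

    def ranked(metric):
        return sorted(metric, key=metric.get, reverse=True)

    return ranked(counts), ranked(totals), ranked(averages)


# Hypothetical deals in US$ millions, NOT taken from the dataset.
toy_deals = (
    [("country_a", "seed", 1.0)] * 4 + [("country_a", "series_b", 10.0)]
    + [("country_b", "series_d", 40.0)] * 2
    + [("country_c", "series_c", 30.0)] * 3
)

by_count, by_total, by_average = rank_countries(toy_deals)
print("by number of deals:", by_count)
print("by total value:    ", by_total)
print("by average value:  ", by_average)
```

With these toy numbers, each of the three rankings puts a different country first, purely because of the mix of funding rounds.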
1.3. Dominant Industry
I was also curious to see where investments go (in terms of industry) in each country. Rather than the absolute number of deals, the total amount invested in each industry seems a better indicator. I am pretty convinced that this map would be (very) different for any other year. However, I think the main caveat with this approach is the reliability of the allocation of deals to industries in the dataset: many deals are ambiguous and could very well be classified under several different industries. Take the example of a VC deal to support a new digital imaging system for agricultural purposes, to be mounted on commercially available drones. Under which industry would it fall? Agriculture? ICT? AR (augmented reality)? Any other category? There is no right or wrong answer here (as long as consistency is ensured throughout the data collection phase), but the choice has a strong impact on the visualisation and on the conclusions that can or cannot be drawn from it.
A similar approach could be adopted to spot, for example, the dominant funding round in each country present in the dataset.
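As a rough sketch of the idea (in Python rather than R, with hypothetical field names `country`, `industry`, `round` and `value`, and made-up deals), the dominant category per country can be found by summing deal values per category and keeping the largest. Switching the grouping key from industry to funding round gives the variant mentioned above.

```python
from collections import defaultdict


def dominant_by_country(deals, key="industry"):
    """Return {country: category with the highest total deal value}.

    `key` names the field to group on, e.g. "industry" or "round".
    """
    totals = defaultdict(lambda: defaultdict(float))
    for deal in deals:
        totals[deal["country"]][deal[key]] += deal["value"]
    return {
        country: max(by_category, key=by_category.get)
        for country, by_category in totals.items()
    }


# Hypothetical deals in US$ millions, NOT taken from the dataset.
toy_deals = [
    {"country": "country_a", "industry": "ict", "round": "seed", "value": 1.0},
    {"country": "country_a", "industry": "ict", "round": "seed", "value": 1.0},
    {"country": "country_a", "industry": "ict", "round": "seed", "value": 1.0},
    {"country": "country_a", "industry": "pharma", "round": "series_c", "value": 20.0},
    {"country": "country_b", "industry": "ict", "round": "series_a", "value": 5.0},
    {"country": "country_b", "industry": "agriculture", "round": "seed", "value": 2.0},
]

dominant_industry = dominant_by_country(toy_deals, key="industry")
dominant_round = dominant_by_country(toy_deals, key="round")
print(dominant_industry)
print(dominant_round)
```

In the toy data, `country_a` has three ICT deals but a single, much larger pharma deal, so pharma comes out as dominant by value. This is also where the classification caveat bites: relabelling one large deal is enough to change a country's dominant industry.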
Out of the 1,720 press clippings in the dataset, 1,433 provide info on the investors participating in the deals, for a total of 2,804 unique investors. Many deals are closed by several investors (1,074 deals have more than one investor), and many investors are involved in more than one deal.
|Investor||Number of VC deals|
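The counts above (deals with investor info, unique investors, deals with more than one investor, deals per investor) could be derived along the following lines. This is only a sketch in Python rather than R; it assumes each deal has already been parsed into a list of investor names, and both the function name `investor_stats` and the investor names are made up.

```python
from collections import Counter


def investor_stats(deals):
    """Summarise investor participation.

    `deals` is a list where each item is the list of investor names
    extracted from one press clipping (empty if none were mentioned).
    """
    with_info = [set(names) for names in deals if names]
    # Number of deals each investor took part in.
    deals_per_investor = Counter(name for names in with_info for name in names)
    return {
        "deals_with_investor_info": len(with_info),
        "unique_investors": len(deals_per_investor),
        "multi_investor_deals": sum(1 for names in with_info if len(names) > 1),
        "most_active": deals_per_investor.most_common(1),
    }


# Hypothetical deals, NOT taken from the dataset.
toy_deals = [
    ["fund_x", "fund_y"],
    ["fund_x"],
    [],  # clipping that does not name any investor
    ["fund_z", "fund_x", "angel_1"],
]

stats = investor_stats(toy_deals)
print(stats)
```

Deduplicating names within a deal (the `set`) avoids counting an investor twice when a clipping mentions it more than once.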
Overall, the number of investors per deal seems to be affected by the funding round; however, it does not appear to depend on the industry.
ANOVA test of number_of_investors per deal ~ funding round:
- F Value = 7.1950899
- Pr(>F) ≈ 0 (below the displayed precision)
ANOVA test of number_of_investors per deal ~ industry:
- F Value = 0.7514098
- Pr(>F) = 0.6459718
3. More to come…
My original objective in saving press clippings, rather than just facts and figures, over a whole year was to run some text mining and analysis. More on that in the next post.
The dataset and complete R code will be made available for download at the end of the second part of this case study.