Venture Capital Deals in 2016 – An Overview (1/2)

[This article was first published on English – R-blog, and kindly contributed to R-bloggers].


Throughout 2016, I manually gathered press clippings announcing Venture Capital (VC) deals from various public online sources and newsletters, each time I bumped into something that caught my attention. In early January, I put the data together, cleaned it and made it R-usable as a CSV dataset of 1,720 different deals. The dataset comprises info about VC deals that took place in 50 different countries, summing to a total value of approx. 22.3b US$ over 10 different funding rounds (from angel to series F). However, the dataset should not be considered a representative sample (in the statistical sense) that accurately reflects the 2016 VC deals universe. With that caveat clearly stated, I nonetheless found it interesting to dig a bit deeper into the data, at least to get some kind of flavor of VC deals during last year. As expected, that flavor reflects what captured my attention more than anything else.


1. Investments

1.1. Overview

A quick visualisation of the data shows 3 important points:

  1. More than 40% of all deals in the dataset have a value lower than 7.5m US$. Based on my experience in the investment industry, I am personally convinced that the vast majority of deals are in fact worth less than 0.5m US$, but smaller deals may not attract the media's attention and are more likely to be found through one of the many ad hoc monitoring services available online for a (high) fee than through my own cherry-picking approach.
  2. In all industries but pharma, there are many more small deals than big ones. In pharma and, to a lesser extent, in healthcare and biotech, deals are more uniformly distributed, from small to very big. This is a known phenomenon: bringing innovation to life in pharma, healthcare and biotech usually takes more resources (time and funding), especially when heavy capital expenditure is needed.
  3. Despite several outliers, the value of deals increases from one investment round to the next until series D. From series D onwards, the median value does not vary much, although the variability (interquartile range) of series D is much higher than that of series E and F.
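
The summary statistics behind these three points (share of deals under a threshold, median and interquartile range per funding round) can be sketched as follows. This is an illustration only, not the R code behind the post: the figures are made up, and the names `deals`, `summarise_by_round` and `share_below` are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles

# Made-up (funding round, deal value in million US$) pairs, for illustration only.
deals = [
    ("seed", 0.5), ("seed", 1.0), ("seed", 1.5), ("seed", 2.0),
    ("series a", 4.0), ("series a", 6.0), ("series a", 8.0), ("series a", 10.0),
    ("series b", 12.0), ("series b", 20.0), ("series b", 28.0), ("series b", 36.0),
]


def summarise_by_round(rows):
    """Return {funding round: (median, interquartile range)} of deal values."""
    values = defaultdict(list)
    for funding_round, value in rows:
        values[funding_round].append(value)
    summary = {}
    for funding_round, vals in values.items():
        # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
        q1, _, q3 = quantiles(vals, n=4, method="inclusive")
        summary[funding_round] = (median(vals), q3 - q1)
    return summary


def share_below(rows, threshold):
    """Fraction of deals whose value is strictly below the threshold."""
    return sum(1 for _, value in rows if value < threshold) / len(rows)


if __name__ == "__main__":
    for funding_round, (med, iqr) in summarise_by_round(deals).items():
        print(f"{funding_round}: median={med}, IQR={iqr}")
    print("share below 7.5m:", share_below(deals, 7.5))
```

Different quartile conventions give slightly different interquartile ranges on small groups, so the `method="inclusive"` choice here is an assumption rather than a match for any particular plotting library.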

ANOVA test of value of deal ~ industry:

  • F Value = 2.9164321
  • Pr(>F) = 0.0031213


ANOVA test of value of deal ~ funding round:

  • F Value = 62.9453751
  • Pr(>F) = 0 (rounded; below the displayed precision)
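
For readers who want to see what the F value measures, here is a minimal sketch of the one-way ANOVA F statistic computed by hand. It is not the author's code: the function name `one_way_anova_f` and the sample values are made up. It compares the variance between group means with the variance within groups.

```python
from collections import defaultdict


def one_way_anova_f(rows):
    """Return (F, df_between, df_within) for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)

    n_total = sum(len(vals) for vals in groups.values())
    grand_mean = sum(sum(vals) for vals in groups.values()) / n_total
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(vals) * (means[g] - grand_mean) ** 2
                     for g, vals in groups.items())
    # Within-group sum of squares: how far each value is from its own group mean.
    ss_within = sum((x - means[g]) ** 2
                    for g, vals in groups.items() for x in vals)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up deal values (million US$) for three hypothetical industry labels.
sample = [
    ("ict", 1.0), ("ict", 2.0), ("ict", 3.0),
    ("healthcare", 2.0), ("healthcare", 3.0), ("healthcare", 4.0),
    ("pharma", 6.0), ("pharma", 7.0), ("pharma", 8.0),
]

if __name__ == "__main__":
    print(one_way_anova_f(sample))
```

The Pr(>F) figure additionally needs the upper-tail probability of the F distribution with these two degrees of freedom, which this standard-library sketch leaves out.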


1.2. Rankings

It is worth noting that the rankings of countries by number of deals, by total value of deals and by average value per deal are quite different.


a. Countries Ranked by Number of Deals

Country                 Number of VC deals


b. Countries Ranked by Total Value of Deals

Country                 Total value of VC deals (US$)
united arab emirates    362,700,000


c. Countries Ranked by Average Value of Deals

Country                 Average value of VC deals (US$)
united arab emirates    72,540,000


The difference in rankings is explained, among other factors, by how the number of deals in each funding round varies from country to country, given that each funding round has a different mean deal value.
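
To make the point concrete, the sketch below ranks three placeholder countries by number of deals, total value and average value. The data are invented and the function name `country_rankings` is hypothetical; the only purpose is to show how a country with many small deals and one with a single large deal end up in different positions depending on the metric.

```python
from collections import defaultdict

# Invented (country, deal value in million US$) pairs; "A", "B", "C" are placeholders.
deals = [
    ("A", 2.0), ("A", 3.0), ("A", 4.0), ("A", 3.0),  # many small (early-round) deals
    ("B", 30.0), ("B", 40.0),                        # a couple of mid-sized deals
    ("C", 50.0),                                     # a single large (late-round) deal
]


def country_rankings(rows):
    """Return {metric: [countries, best first]} for count, total and average."""
    values = defaultdict(list)
    for country, value in rows:
        values[country].append(value)

    stats = {
        country: {
            "count": len(vals),
            "total": sum(vals),
            "average": sum(vals) / len(vals),
        }
        for country, vals in values.items()
    }
    # Sort the country names once per metric, highest value first.
    return {
        metric: sorted(stats, key=lambda c: stats[c][metric], reverse=True)
        for metric in ("count", "total", "average")
    }


if __name__ == "__main__":
    for metric, ranking in country_rankings(deals).items():
        print(metric, ranking)
```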



1.3. Dominant Industry

I was also curious to see where investments go (in terms of industry) in each country. Rather than the absolute number of deals, the total amount of investment in each industry seems a better indicator. I am pretty convinced that this map would look (very) different for any other year. I think the main caveat with this approach, however, is the reliability of the allocation of deals to industries in the dataset: many deals are ambiguous and could very well be classified under several different industries. Take the example of a VC deal to support a new digital imaging system for agricultural purposes, to be mounted on commercially available drones. Under which industry would it fall: agriculture, ict, ar (augmented reality), or some other category? There is no right or wrong answer here (as long as consistency is ensured throughout the data collection phase), but the choice has a strong impact on the visualisation and on the conclusions that can or cannot be drawn from it.


A similar approach could be adopted to spot, for example, the dominant funding round in each country present in the dataset.
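
A minimal sketch of this "dominant category per country" idea, assuming each deal is available as a (country, category, value) triple: sum the values per country and category, then keep the category with the largest total. The function name `dominant_category` and the data are hypothetical; passing the funding round instead of the industry as the category would give the dominant funding round.

```python
from collections import defaultdict


def dominant_category(rows):
    """Return {country: category with the highest total deal value}."""
    totals = defaultdict(lambda: defaultdict(float))
    for country, category, value in rows:
        totals[country][category] += value
    # On a tie, max() keeps the category that was encountered first.
    return {
        country: max(by_category, key=by_category.get)
        for country, by_category in totals.items()
    }


# Invented (country, industry, deal value in million US$) triples.
deals = [
    ("A", "ict", 5.0), ("A", "ict", 3.0), ("A", "pharma", 7.0),
    ("B", "agriculture", 2.0), ("B", "ict", 1.0), ("B", "ict", 0.5),
]

if __name__ == "__main__":
    print(dominant_category(deals))
```

In country "B" of this toy data, ict has more deals but agriculture has the larger total, which is why the choice between counting deals and summing values matters. Reassigning a single ambiguous deal to another industry can flip the result in the same way.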



2. Investors

Out of the 1,720 press clippings in the dataset, 1,433 provide info on the investors participating in the deals, for a total count of 2,804 unique investors. Many deals are closed by several investors (1,074 deals have more than 1 investor), and many investors are involved in more than a single deal.
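
Counts of this kind can be derived as sketched below, assuming the investor names have already been extracted from each clipping into a list (an empty list when the clipping names no investor). The extraction step itself is not shown; the names and the function `investor_summary` are made up for illustration.

```python
from collections import Counter

# One list of investor names per deal; invented names, empty list = no investor info.
deal_investors = [
    ["fund x", "fund y"],
    ["fund x"],
    [],
    ["fund y", "fund z", "angel w"],
]


def investor_summary(deals):
    """Summarise investor participation across a list of deals."""
    with_info = [set(names) for names in deals if names]
    # Each investor is counted at most once per deal.
    deals_per_investor = Counter(name for names in with_info for name in names)
    return {
        "deals_with_investor_info": len(with_info),
        "unique_investors": len(deals_per_investor),
        "deals_with_several_investors": sum(1 for names in with_info if len(names) > 1),
        "deals_per_investor": dict(deals_per_investor),
    }


if __name__ == "__main__":
    for key, value in investor_summary(deal_investors).items():
        print(key, value)
```

In practice the same investor often appears under several spellings, so the unique count depends heavily on how names are normalised beforehand.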



Investor                Number of VC deals


Overall, the number of investors per deal seems to depend on the funding round but does not appear to depend on the industry.


ANOVA test of number_of_investors per deal ~ funding round:

  • F Value = 7.1950899
  • Pr(>F) = 0 (rounded; below the displayed precision)


ANOVA test of number_of_investors per deal ~ industry:

  • F Value = 0.7514098
  • Pr(>F) = 0.6459718


3. More to come…

My original objective in saving press clippings, rather than just facts & figures, over a year was to run some text mining and analysis. More on that in the next post.

The dataset and complete R code will be made available for download at the end of the second part of this case study.
