Deconstructing Crunchbase: Understanding Public Self-Reported Data
JD Maturen, 2013/12/18, San Francisco, CA

Introduction

In this post we’ll take a stab at revising our expected startup outcomes based on the public self-reported funding and exit data over the last decade on Crunchbase. The first question to ask your data is: how much can I trust you? What bias went into your creation? As a worst case scenario the safe rule is that companies will only release data when a) there is some substantial gain to be had and b) the upside is much greater than the downside, or c) there is a regulatory requirement. Given that, we can reasonably expect frequent reporting of favorable news, e.g. large exits, and underreporting of unfavorable news, e.g. companies dying. This raises the question: will we find Poincaré’s Baker and be able to determine what the true underlying data is? Or by looking at Crunchbase are we really just looking at our collective ideal, that which we as a whole want to be true?

One further note before we dive in: Crunchbase was founded in late 2007. What further bias can we expect for companies which are on Crunchbase but were founded before then?

Source & Overview

I grabbed 189,729 full company profiles via the Crunchbase API on November 12, 2013. The Yahoo Finance API was then used to grab market cap data on November 16, 2013. Of the 54,155 (29%) companies which have a founding date, 48,137 were founded since 2003, the timespan we'll be looking at. Of those founded since 2003, 7,728 have raised money.

The Crunchbase Odds

In order to revise our expectations we need to calculate the conditional odds of an exit along with the average value of said exit. Of the 4,051 companies that raised $1M or more since 2003: 43 (1.1%) have gone public and have an average current market cap of $5.8B; 49 (1.2%) were acquired for cash with an average valuation of $116M; and 430 (10.6%) were acquired for some mix of stock and cash at an average valuation of $51M.
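As a rough check on how those three exit pools combine, here is a minimal sketch that sums the exit counts and takes the count-weighted average exit value. It uses only the rounded figures quoted above and assumes the stock-and-cash acquisitions are fully liquid; the names `EXIT_POOLS` and `aggregate_exit_odds` are made up for illustration and are not from the original analysis.

```python
# Figures quoted in the post for companies that raised $1M+ since 2003.
FUNDED_COMPANIES = 4_051

# (label, number of exits, average exit value in USD), rounded as in the post.
EXIT_POOLS = [
    ("ipo", 43, 5.8e9),
    ("cash acquisition", 49, 116e6),
    # Assumption: treated as fully liquid, which overstates its real value.
    ("stock/cash acquisition", 430, 51e6),
]


def aggregate_exit_odds(pools, total_companies):
    """Return (probability of any exit, count-weighted average exit value)."""
    exits = sum(count for _, count, _ in pools)
    total_value = sum(count * average for _, count, average in pools)
    return exits / total_companies, total_value / exits


probability, average_exit = aggregate_exit_odds(EXIT_POOLS, FUNDED_COMPANIES)

# Share of the total exit value that comes from the IPO pool.
total_exit_value = sum(count * average for _, count, average in EXIT_POOLS)
ipo_share = (43 * 5.8e9) / total_exit_value

print(f"chance of an exit: {probability:.1%}")
print(f"average exit: ${average_exit / 1e6:,.0f}M")
print(f"share of value from IPOs: {ipo_share:.0%}")
```

With these rounded inputs the chance of an exit comes out at about 12.9% and the average exit at a little over $530M. Small differences from figures computed on the raw data are expected, since the averages here are already rounded.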
To calculate a proper liquid value for that last category, the stock and cash acquisitions, we’d need to know the valuation of the acquiring company at the time of acquisition. That isn’t provided in this dataset. We’ll be generous and include it as if it were liquid. Aggregated, that gives us a 12.9% chance of an average $529M exit. Plugging those numbers into the calculator gives us a net discounted value of $22,710 for a 0.1% stake in a random early stage funded startup. 90% of that value is from the IPO pool and, to once again reiterate the problem with averages, Facebook makes up 44% of the total value of all exits. Removing Facebook from the equation, that 0.1% is worth $12,717.

The $22k valuation is 27x larger than the previous analysis, driven up by a much smaller denominator (4,000 companies instead of 60,000) and a bigger IPO/unicorn pool ($250B instead of $140B). While the latter is hard to argue with, the former is by definition an underestimate of the true number of funded companies over the last decade. What else can we see in this data?

Money Raised

  money raised  | companies | sum money raised
----------------+-----------+------------------
              ? |    40,409 |               $0
         $1,000 |        92 |         $403,766
        $10,000 |     1,114 |      $41,940,530
       $100,000 |     2,451 |     $920,360,065
     $1,000,000 |     2,812 |   $9,301,408,021
    $10,000,000 |     1,152 |  $33,352,352,104
   $100,000,000 |       103 |  $21,407,386,660
 $1,000,000,000 |         4 |  $10,433,154,927

The above table is a log histogram of companies by the amount of money they have raised as well as the total amount invested in companies with that amount raised. The first row is exclusively companies with an unknown amount of money raised. The fourth row is read as “there are 2,451 companies founded since 2003 that raised between $100,000 and $999,999 and investors have put $920M into these companies.” We can see a few things here. Companies are normally distributed with respect to the log of the amount of money raised, with a mean of $1.3M raised.
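The bucketing behind a log histogram like the one above can be sketched as follows. This is only an illustration of the idea, not the query used for the post: each amount is assigned to the power of ten at or below it, and unknown amounts get their own `None` bucket. The function names and the sample amounts are hypothetical.

```python
from collections import defaultdict


def log_bucket(amount):
    """Return the largest power of ten <= amount, or None if the amount is unknown."""
    if amount is None or amount < 1:
        return None
    # Counting digits avoids floating point edge cases at exact powers of ten.
    return 10 ** (len(str(int(amount))) - 1)


def log_histogram(amounts):
    """Map each bucket to (number of companies, total raised in that bucket)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for amount in amounts:
        bucket = log_bucket(amount)
        counts[bucket] += 1
        totals[bucket] += amount or 0
    # Sort with the unknown bucket first, then by ascending bucket floor.
    ordered = sorted(counts, key=lambda bucket: bucket or 0)
    return {bucket: (counts[bucket], totals[bucket]) for bucket in ordered}


# Hypothetical amounts raised, in USD; None means no amount was reported.
sample_amounts = [None, 5_000, 250_000, 999_999, 1_300_000, 45_000_000]
histogram = log_histogram(sample_amounts)

for bucket, (companies, total) in histogram.items():
    label = "?" if bucket is None else f"${bucket:,}"
    print(f"{label:>15} | {companies:>9,} | ${total:,.0f}")
```

Both $250,000 and $999,999 land in the $100,000 bucket, which matches how the fourth row of the table is meant to be read.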
The peak of money invested (the largest sum column) is the next bracket up, in companies that have raised between $10M and $99M. The top 4 companies, each of which has raised over a billion dollars, make up 14% of all reported investments.

Market Cap

  money raised  | companies |  sum_market_cap  | ipos
----------------+-----------+------------------+------
              ? |    40,409 |  $41,122,200,000 |   10
         $1,000 |        92 |                  |    0
        $10,000 |     1,114 |                  |    0
       $100,000 |     2,451 |                  |    0
     $1,000,000 |     2,812 |   $4,816,500,000 |    6
    $10,000,000 |     1,152 |  $19,742,800,000 |   17
   $100,000,000 |       103 |  $72,530,200,000 |   17
 $1,000,000,000 |         4 | $151,407,000,000 |    4

The above table shows total market cap as well as the number of IPOs, again on the log histogram of money raised. We can see a nice normal distribution of IPO counts but the bulk of all value created lies in not just the top bracket but the top company, Facebook. Removing the outlier it looks like:

  money raised  | companies |  sum_market_cap  | ipos
----------------+-----------+------------------+------
              ? |    40,409 |  $41,122,200,000 |   10
         $1,000 |        92 |                  |    0
        $10,000 |     1,114 |                  |    0
       $100,000 |     2,451 |                  |    0
     $1,000,000 |     2,812 |   $4,816,500,000 |    6
    $10,000,000 |     1,152 |  $19,742,800,000 |   17
   $100,000,000 |       103 |  $72,530,200,000 |   17
 $1,000,000,000 |         3 |  $31,107,000,000 |    3

Roughly a log normal distribution with the mode somewhere in the $100M raised bucket.

Cash Acquisitions

  money raised  | companies | sum_cash_acq_value | cash_acqs
----------------+-----------+--------------------+-----------
              ? |    40,409 |    $14,503,167,359 |        99
         $1,000 |        92 |                    |         0
        $10,000 |     1,114 |                    |         0
       $100,000 |     2,451 |         $8,500,000 |         7
     $1,000,000 |     2,812 |       $646,975,000 |        32
    $10,000,000 |     1,152 |     $4,629,500,000 |        18
   $100,000,000 |       103 |     $1,200,000,000 |         1
 $1,000,000,000 |         4 |                    |         0

Here we can see a big peak in cash acquisitions in the $1-9M raised bucket but the bulk of the money spent was on fewer companies in the next bracket up.
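To put the counts in the cash acquisitions table on a common footing, one can divide each bucket's acquisitions by its number of companies. The sketch below does that with the rows copied from the table, leaving out the unknown-amount row; the names `CASH_ACQUISITIONS` and `acquisition_rates` are illustrative only.

```python
# (bucket floor in USD, companies, cash acquisitions), copied from the table above.
# The row for companies with an unknown amount raised is intentionally left out.
CASH_ACQUISITIONS = [
    (1_000, 92, 0),
    (10_000, 1_114, 0),
    (100_000, 2_451, 7),
    (1_000_000, 2_812, 32),
    (10_000_000, 1_152, 18),
    (100_000_000, 103, 1),
    (1_000_000_000, 4, 0),
]


def acquisition_rates(rows):
    """Return the fraction of companies in each bucket that were acquired."""
    return {bucket: acquired / companies for bucket, companies, acquired in rows}


rates = acquisition_rates(CASH_ACQUISITIONS)

for bucket, rate in rates.items():
    print(f"${bucket:>13,} | {rate:6.2%}")
```

The $1M, $10M and $100M buckets all come out between roughly 1% and 1.6%, while the buckets below $100K show no cash acquisitions at all.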
Past $1M raised, the cash acquisition rate is around 1%, and because so few companies raise more than $100M the number of cash acquisitions drops towards zero quickly. Of the 58 cash acquisitions of companies that had raised money, 37 also listed a valuation. Of those, 3 were for less than the amount of money raised and 5 were for 1-4x the amount raised.

Other Acquisitions

  money raised  | companies | sum_other_acq_value | other_acqs
----------------+-----------+---------------------+------------
              ? |    40,409 |      $5,502,751,000 |        553
         $1,000 |        92 |                     |          0
        $10,000 |     1,114 |                     |         14
       $100,000 |     2,451 |        $105,235,000 |         85
     $1,000,000 |     2,812 |      $2,132,900,000 |        275
    $10,000,000 |     1,152 |     $17,174,270,000 |        152
   $100,000,000 |       103 |      $2,557,500,000 |          7
 $1,000,000,000 |         4 |                     |          1

Other acquisitions follow a similar pattern though at a 10x higher acquisition rate. Of the 534 other acquisitions of companies that had raised money, only 114 listed a valuation. Of those, 12 were for less than the amount of money raised and 35 were for 1-4x the amount raised.

Company Deaths

  money raised  | companies | deadpooled
----------------+-----------+------------
              ? |    40,409 |        441
         $1,000 |        92 |         19
        $10,000 |     1,114 |        142
       $100,000 |     2,451 |        226
     $1,000,000 |     2,812 |        173
    $10,000,000 |     1,152 |         44
   $100,000,000 |       103 |          1
 $1,000,000,000 |         4 |          0

Companies die young. Reported deaths peak in the $100K raised bucket, one bracket below the peak of the population as a whole.

Relative Value of Options

Using the sum of acquisition and IPO valuations we can compute the relative value of different stage companies.

  money raised  |   shares necessary
                | to match value at $1m
----------------+-----------------------
              ? |                9.6944
         $1,000 |                1.9017
        $10,000 |                1.8790
       $100,000 |                1.6041
     $1,000,000 |                1.0000
    $10,000,000 |                0.3162
   $100,000,000 |                0.0320
 $1,000,000,000 |                0.0017

So, if as an employee you had an offer of 0.1% from Company A, which had raised $1M, and you wanted Company B, which had raised $10M, to match it, you’d want at least 0.03% (0.1% * 0.3162). Interestingly, from this we can see that going from $1M to $10M raised only increases the value by about 3x, while going from $10M to $100M increases it by 10x, and $100M to $1B increases it by 19x. However, if we discount the outlier company, Facebook, we significantly lower the value of those higher tiers:

  money raised  |   shares necessary
                | to match value at $1m
----------------+-----------------------
              ? |                1.0575
         $1,000 |                1.0000
        $10,000 |                1.0000
       $100,000 |                1.0000
     $1,000,000 |                1.0000
    $10,000,000 |                0.8774
   $100,000,000 |                0.5419
 $1,000,000,000 |                0.1526

Going from $1M to $10M raised increases value by only about 15%, while $10M to $100M and $100M to $1B only increase it by about 60% and 255% (a 3.5x multiple) respectively. Intuitively this seems massively wrong. One possible explanation for this would be if Crunchbase significantly underestimates the number of early stage companies.

Age At Acquisition

 years old |   #   | w/ val. | 25th percentile | 75th percentile
  at acq   |       |         |                 |
-----------+-------+---------+-----------------+-----------------
         0 |    95 |     16% |          $5,000 |     $26,300,000
         1 |   240 |     16% |     $10,000,000 |     $90,000,000
         2 |   257 |     14% |      $4,800,000 |     $90,000,000
         3 |   180 |     17% |     $13,950,000 |    $250,000,000
         4 |   158 |     18% |     $25,000,000 |    $190,000,000
         5 |   111 |     31% |     $10,000,000 |    $170,000,000
         6 |    84 |     32% |     $31,000,000 |    $300,000,000
         7 |    51 |     29% |     $30,000,000 |    $119,000,000
         8 |    29 |     17% |     $24,000,000 |     $59,000,000
         9 |     8 |     38% |     $47,500,000 |    $775,000,000

The above table shows the distribution of acquired companies by the age of the company at the time of the acquisition as well as the percent of companies each year that had an acquisition valuation.
For the 25% of companies that had both an acquisition date and valuation, the 25th and 75th percentile valuations are listed. The other 75% of companies are presumed to have been acquired at a loss or break even.

The easiest thing to see here is that acquisitions peak at 2 years old. This could be explained by a few different factors. Perhaps it is a reflection of the underlying distribution of company ages and companies die at a similar rate. Or companies may choose to be acquired instead of raising more money, and/or they may be unable to raise money but are able to get acquired. It is also worth noting that there is no clear relationship between age and valuation, though between 4 and 5 years old there is a big uptick in the share of acquired companies with a listed valuation (18% to 31%). This may just be an artifact of survivorship bias, as those companies are more likely to have been founded pre-Crunchbase.

Epilogue

Crunchbase paints a much rosier, though sometimes confusing, picture of startup value than our previous analysis. It reinforces several of the presumed shapes of the startup landscape: that companies die young and get acquired young, that investment and returns follow a roughly log-normal distribution, and that while the $75B invested in the companies we’ve looked at has only generated $48B in acquisitions it has helped create $250B worth of market cap.

As employees of venture-backed startups we can use this data to get a more realistic expectation of the value of our options. As founders and as managers we may be able to use this data to come up with more equitable options schemes, particularly as our companies age. I’d be particularly curious to see how the relative valuation of options from this analysis stacks up against the typical math used for options or the empirical distribution of options.

Appendix A