The Surprising Frequency of Low Probability Events
Damon Levine, CFA, ARM, MA
Experienced Risk, ERM, and Operational Risk leader. Specialized experience in Open FAIR, TPRM, model risk, BC/DR, and strategic risk management.
Expecting the Unexpected and an Appearance from e
Consider a very unlikely event with annual probability 1/n. Perhaps this event is regarded as a “deep tail” scenario, and an event of such impact does not exist in the historical record. This means we may think of n as a large number such as 100 or 500. In a given year the probability the event does not occur is 1 - 1/n. Assuming year-to-year independence, the probability that it does not occur over a period of n years is (1 - 1/n)^n. As n approaches infinity this expression approaches 1/e ≈ 0.37, and therefore, for large n, the probability the event does occur at least once in n years is approximately 1 - 1/e ≈ 0.63.
This convergence is relatively fast, so the value of n need not be very large for the approximation to work well. This means that for an event with (annual) probability of 1/50, the probability of the event occurring within 50 years is approximately 1 - 1/e, or about 63%. The same can be said for an event with probability 1/100 over a 100-year horizon. This approximation works well for large n (e.g., n > 50) and any “1 in n year” event with year-to-year independence.
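A few lines of Python make the speed of this convergence concrete (a minimal sketch; the chosen values of n are illustrative only):

```python
# Probability that a "1 in n year" event occurs at least once in n years,
# assuming year-to-year independence: 1 - (1 - 1/n)^n -> 1 - 1/e.
import math

for n in [10, 50, 100, 500]:
    p_at_least_once = 1 - (1 - 1 / n) ** n
    print(f"1-in-{n} event over {n} years: {p_at_least_once:.4f}")

print(f"Limit, 1 - 1/e:               {1 - 1 / math.e:.4f}")
```

Even at n = 10 the probability is already about 0.65, within a couple of percentage points of the limiting value of roughly 0.632.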
We now turn to a related question. Assume we have N independent potential events, each having probability p. What is the probability that at least one of them occurs in a year? Using similar reasoning, the answer is 1 - (1 - p)^N. If N = 20 and p = 1/100 (so the events are regarded as having low likelihood), the probability of at least one occurring in a single year is about 0.18, or 18%. In 10 years the probability that at least one occurs is about 1 - (1 - 0.18)^10, or 86%.
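The same arithmetic in Python (a sketch using the figures from the text; N, p, and the horizon are the values just discussed):

```python
# Probability that at least one of N independent events, each with
# annual probability p, occurs in one year and over a multi-year horizon.
N, p, years = 20, 1 / 100, 10

p_one_year = 1 - (1 - p) ** N                 # ~0.182
p_horizon = 1 - (1 - p_one_year) ** years     # ~0.866

print(f"At least one event in 1 year:   {p_one_year:.3f}")
print(f"At least one event in {years} years: {p_horizon:.3f}")
```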
The practical takeaway is that if an ERM framework identifies 20 low-likelihood risks which are independent (or “close enough” from a practical viewpoint), then in a given year we should not be very surprised if one of these “unlikely” events occurs. Furthermore, over a relatively long time horizon we should really be ready for it!
Modeling Black Swans: A Rebuttal to Nassim Taleb’s Ludic Fallacy
The ludic fallacy is a term coined by Nassim Taleb in his 2007 book The Black Swan. He uses this term to invoke the Latin noun ludus, referring to games or play. He views mathematical models that attempt to forecast or quantify future results as deeply flawed and doomed to fail. He goes on to say that statistical models are better left to casino gambling and other well-defined games of chance. He dismisses models based on empirical data as flawed because, in his view, they are not able to predict large-impact events which have not been previously observed. In other words, he eschews nearly all mathematical business models because, he claims, they cannot model black swans. Taleb may have been overreaching on this last point.
The above is certainly true for many models, but the failure may be due to a flaw in the model parameters or the chosen modeling approach rather than in modeling itself. Indeed, it is possible to make statistically sound inferences about future observations which are worse than any previously seen. Two possible approaches to such modeling are the use of Extreme Value Theory (EVT) and the application of Chebyshev’s inequality.
EVT is a branch of statistics dealing with extreme deviations from the median of probability distributions. Under very general conditions, one of EVT’s main results, the Pickands-Balkema-de Haan theorem (PBH), shows that observations above a high, fixed threshold are approximately described by a generalized Pareto distribution (GPD). Given a set of historical data, one may choose a high threshold T within that data (e.g., the 95th percentile) and then examine the excesses above T for the subset of observations greater than or equal to T. That collection of distances, or excesses (positive real numbers), can be well modeled as a GPD, whose two parameters are relatively easy to estimate. The resulting GPD is capable of modeling, in a statistically sound manner, the potential magnitude and likelihood of future observations which are worse than any previously seen. The PBH theorem applies to an extremely large family of distributions, and an example of this approach applied to corporate bond returns can be found in the CIA/CAS/SOA September 2009 Joint Risk Management Section newsletter.
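As a rough illustration (not the newsletter’s example), here is a minimal Python sketch using scipy; the simulated lognormal losses, the random seed, and the 95th-percentile threshold are all assumptions made for demonstration:

```python
# Fit a generalized Pareto distribution (GPD) to excesses over a high
# threshold, per the PBH theorem. Simulated losses stand in for history.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # stand-in data

T = np.quantile(losses, 0.95)          # high threshold (95th percentile)
excesses = losses[losses >= T] - T     # distances above the threshold

# Estimate the GPD's shape and scale; location is fixed at 0 because
# the excesses are measured from the threshold itself.
shape, _, scale = stats.genpareto.fit(excesses, floc=0)

# Tail probability of an excess beyond the worst yet observed, i.e.,
# a loss larger than anything in the (simulated) historical record.
worst = excesses.max()
p_beyond = stats.genpareto.sf(worst, shape, loc=0, scale=scale)
print(f"shape={shape:.3f}, scale={scale:.3f}, P(excess > worst)={p_beyond:.5f}")
```

The fitted tail assigns a small but non-zero probability to losses worse than any in the data set, which is precisely the black swan behavior Taleb claims such models cannot capture.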
Chebyshev’s inequality is another result which is powerful in that very few assumptions are needed for its application. For any random variable X with finite expected value μ and finite, non-zero standard deviation σ, we have, for any real number k > 0,
P(|X - μ| ≥ kσ) ≤ 1/k^2
The inequality is only useful for k > 1 because otherwise the right-hand side is larger than 1, and a statement that a probability is bounded above by 1 carries no information. For k > 1 it states that the probability of a realization of X being at least k standard deviations away from the mean is at most 1/k^2. For example, for an arbitrary random variable with finite expected value and finite, non-zero variance (“typical”), we can make the practical statement that, for large sample sizes, the observed proportion of observations 3 or more standard deviations away from the mean is at most 1/9. This allows us to assign probability bounds for observations in various tails. Many of us have intuition regarding tails that has been shaped by the familiar bell curve of the normal distribution. (The pun was only somewhat intentional.) The upper bound of 1/k^2 in the inequality helps us understand the possibly higher likelihood of deep tail events for unspecified but typical distributions. We can therefore avoid what could be called a “Gaussian bias”.
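A quick simulation shows the bound in action (a sketch; the Student’s t distribution with 3 degrees of freedom and the sample size are illustrative choices, picked because that distribution is heavy-tailed yet has finite variance):

```python
# Compare Chebyshev's bound 1/k^2 with the observed tail frequency
# for a heavy-tailed distribution with finite variance.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=1_000_000)   # heavy tails, finite variance
mu, sigma = x.mean(), x.std()

for k in [2, 3, 4]:
    observed = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: observed {observed:.5f} <= bound {1 / k**2:.5f}")
```

The observed tail frequencies sit well inside the 1/k^2 ceiling, yet they exceed what a normal distribution would suggest; the Chebyshev bound is what remains trustworthy when we drop the Gaussian assumption.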
In many cases we are interested solely in events from the left tail, and the following one-sided version of the inequality (sometimes called Cantelli’s inequality) may be used for typical distributions and any k > 0:
P(X ≤ μ - kσ) ≤ 1/(1 + k^2)
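For instance, with k = 3 the one-sided bound caps the chance of a result at least three standard deviations below the mean: P(X ≤ μ - 3σ) ≤ 1/(1 + 3^2) = 1/10. That 10% ceiling applies to any “typical” distribution, Gaussian or not, and is far more generous than the roughly 0.13% a normal distribution would suggest.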
Use of either PBH or Chebyshev’s inequality in risk models allows for modeling of black swans in a rigorous manner. PBH tends to provide better results when a large data set is available.
A good model is a map of sorts. It helps one to understand a particular area of interest: the “territory”. The model is not meant to fully capture reality any more than a map reflects all details of the territory. The Polish-American scientist and philosopher Alfred Korzybski remarked that "the map is not the territory". Lewis Carroll put things somewhat less seriously in Sylvie and Bruno Concluded, with his description of a fictional map that had "the scale of a mile to the mile". A character notes some practical difficulties with such a map and states that "we now use the country itself, as its own map, and I assure you it does nearly as well." The (serious) point to remember is that a model can be useful even though it does not perfectly mimic reality or perhaps comes up short in black swan prediction.
In his book The Failure of Risk Management, Douglas Hubbard makes a good point when he cites this quote from DC-based consultant Jerry Brashear: “A successful model tells you things you didn’t tell it to tell you.” There is nothing to prevent a model from offering surprises to its own builders, even though the model assumptions, parameters, and logic were well known before the model was run. Hubbard describes a model he produced for the Marine Corps which used stochastic modeling to forecast battlefield fuel usage. The model provided the far-from-obvious result that road conditions on the main supply routes were much better predictors of fuel use than the chance of enemy contact. He mentions that “such revelations are even more profound (and helpful) when we use empirical measurements that themselves had surprising results.”
Note: the preceding is an excerpt from ERM at the Speed of Thought: Mitigation of Cognitive Bias in Risk Assessment, available at ermvalue.com.