Modelling an extreme value for a variable | Vose Software

# Modelling an extreme value for a variable

Imagine that we are building a bridge between two islands. The bridge must stand up to extreme weather events, like very high or powerful waves, and very high sustained winds or gusts. For example, it might be specified that the bridge must have a 90% probability of withstanding the highest sustained (lasting more than 10 minutes, for example) wind that might occur in the next one hundred years. Of course, we could be very unlucky: the highest wind of the century could occur tomorrow, and then with 10% probability it blows the bridge down! However, we can't build infinitely strong bridges, and costs force us to a specification compromise like the one above.

Since the wind speed at any moment is a continuous random variable, it follows that the greatest wind speed over the next century is also a continuous random variable. There are many such situations in which we wish to model not the entire range that a variable might take, but an extreme, either the minimum or the maximum. Examples include: the earthquake power impinging on a building, which must be designed to sustain the largest earthquakes with minimum damage within the bounds of the finances available to build it; the maximum wave height for designing offshore platforms, breakwaters and dikes; pollution emissions from a factory, to ensure that even at their maximum they fall below the legal limit; the strength of a chain, which is equal to the strength of its weakest link; and the extremes of meteorological events, since these cause the greatest impact. People have put a lot of effort into determining the distributions of these extremes for various situations, but it is often not easy. You can imagine that if, for example, we have only ten years of wind data, we will have to make some assumptions to estimate what the greatest wind speed of the century might be.

Engineers are interested in the extreme values of a parameter (like minimum strength or maximum impinging force) because these are the values that determine whether a system will potentially fail, but they are not the only ones. Insurance companies, for example, are also interested in the size of a claim from extreme events, like hurricanes and terrorist attacks.

The theory behind determining the extreme value distributions is as follows:

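Let X be a random variable with cumulative distribution function F(x), and let X1, X2, ..., Xn be n independent observations of X.

Let Xmax = MAX(X1, X2, ..., Xn) and Xmin = MIN(X1, X2, ..., Xn)

Then the cumulative distribution functions of Xmax and Xmin are:

$$
F_{X_{\max}}(x) = [F(x)]^{n}
$$

and

$$
F_{X_{\min}}(x) = 1 - [1 - F(x)]^{n}
$$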

Substituting the cumulative distribution functions for each parent distribution and then letting n approach infinity gives the equations of each distribution's respective extreme value distribution.
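As an illustration of that limiting step, consider an Exponential parent with mean b. This parent is chosen here only because the algebra is short, and the symbols y and a_n are introduced just for this sketch. For independent observations the CDF of the maximum is the parent CDF raised to the power n, so:

$$
F_{X_{\max}}(x) = \left(1 - e^{-x/b}\right)^{n}, \qquad x \ge 0
$$

Writing x = a_n + y with a_n = b ln(n) gives:

$$
F_{X_{\max}}(a_n + y) = \left(1 - \frac{e^{-y/b}}{n}\right)^{n} \longrightarrow \exp\left(-e^{-y/b}\right) \quad \text{as } n \to \infty
$$

The limit is a Gumbel-type CDF in y with scale b, so for large n the maximum behaves approximately like a Gumbel variable with location b ln(n) and scale b.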

The ExtValueMax distribution offered by ModelRisk is also frequently known as the Gumbel distribution, or the Extreme Value distribution. Actually, it is one of only three possible extreme value distributions. The other two are a version of the Weibull distribution (the variable -X is Weibull distributed) and the Frechet distribution, although the Frechet is not widely used. They have the following cumulative distribution functions:

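Distributions for largest extreme

| Distribution | CDF |
| --- | --- |
| Type I (GumbelMax(a,b) = VoseExtValueMax(a,b)) | $F(x) = \exp\left[-\exp\left(-\frac{x-a}{b}\right)\right]$, $-\infty < x < \infty$, $b > 0$ |
| Type II (FrechetMax(a,b,c)) | $F(x) = \exp\left[-\left(\frac{x-a}{b}\right)^{-c}\right]$, $x > a$, $b, c > 0$ |
| Type III (Weibull-typeMax(a,b,c)) | $F(x) = \exp\left[-\left(\frac{a-x}{b}\right)^{c}\right]$, $x \le a$, $b, c > 0$ |

Distributions for smallest extreme

| Distribution | CDF |
| --- | --- |
| Type I (GumbelMin(a,b)) | $F(x) = 1 - \exp\left[-\exp\left(\frac{x-a}{b}\right)\right]$, $-\infty < x < \infty$, $b > 0$ |
| Type II (FrechetMin(a,b,c)) | $F(x) = 1 - \exp\left[-\left(\frac{a-x}{b}\right)^{-c}\right]$, $x < a$, $b, c > 0$ |
| Type III (Weibull-typeMin(a,b,c)) | $F(x) = 1 - \exp\left[-\left(\frac{x-a}{b}\right)^{c}\right]$, $x \ge a$, $b, c > 0$ |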

The theory of extreme values says that the largest or smallest value from a set of values drawn independently from the same parent distribution tends to an asymptotic distribution that depends only on the tail of the parent distribution. The Gumbel distribution is the extreme value distribution for parent distributions with exponential-type tails, e.g. Exponential, Gamma, Normal, Lognormal, Logistic and itself. The Frechet distribution is the extreme value distribution for parent distributions of the form of Pareto, Student-t, Cauchy, log-Gamma and itself. The Weibull distribution is the extreme value distribution for Beta, Uniform and Weibull distributed variables, but the convergence can be very slow.

As discussed above, the three standard extreme value distributions are the Gumbel, the Frechet (not directly available with ModelRisk - but Model Frechet.xls generates the distribution), and the Weibull.

The problems with all these extreme value distributions are that:

1. they only work for certain types of parent distributions,

2. they are only asymptotically correct, meaning that one needs to be considering the extreme of a potentially very large set of observations before the extreme distribution is a good model, and

3. the parameter values for these extreme distributions are difficult to estimate, or even to calculate when one knows the parent distribution very well.
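The second point can be made concrete with a small numerical check. The sketch below compares the exact CDF of the largest of n independent Exponential(mean 1) values with its Gumbel approximation (location ln(n), scale 1) and reports the largest gap between the two curves. The Exponential parent, the grid settings and all function names are assumptions chosen for illustration; other parents can converge far more slowly than this one.

```python
import math


def exact_max_cdf(x, n):
    """Exact CDF of the largest of n independent Exponential(mean 1) draws."""
    return (1.0 - math.exp(-x)) ** n if x > 0 else 0.0


def gumbel_max_cdf(x, a, b=1.0):
    """Gumbel (Type I, largest extreme) CDF with location a and scale b."""
    return math.exp(-math.exp(-(x - a) / b))


def worst_gap(n, steps=4000, x_max=40.0):
    """Largest absolute difference between the two CDFs on a grid over [0, x_max]."""
    a = math.log(n)  # asymptotic Gumbel location for this parent
    grid = (i * x_max / steps for i in range(steps + 1))
    return max(abs(exact_max_cdf(x, n) - gumbel_max_cdf(x, a)) for x in grid)


# The gap shrinks as the number of observations n grows.
gaps = {n: worst_gap(n) for n in (1, 10, 100, 1000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}   largest CDF gap = {gap:.5f}")
```

For small n the asymptotic form is visibly off, which is why the extreme value distributions should be used with care when the extreme is taken over only a modest number of observations.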

At times, a more practical approach to determining the extreme value distribution is to first estimate the underlying parent distribution, and then simulate a set of observations from that distribution and determine at each iteration what the maximum (or minimum) of that set of observations is. The ModelRisk functions VoseLargest and VoseSmallest do this directly.

Thus, by running many iterations one arrives at a well-defined extreme distribution. A lot of iterations (probably several thousand) are needed to determine the extreme distribution well because simulation statistics like a maximum or minimum take a long time to stabilise.
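A minimal Python sketch of this simulation approach follows. It is not ModelRisk code and does not reproduce VoseLargest; the Exponential parent, its mean of 10, the 100 observations per iteration and the function names are all hypothetical choices. The Exponential parent is used only because the exact answer is available for comparison.

```python
import math
import random


def simulate_maxima(sample_parent, n_obs, n_iter, rng):
    """Return n_iter simulated maxima, each the largest of n_obs parent draws."""
    return [max(sample_parent(rng) for _ in range(n_obs)) for _ in range(n_iter)]


def empirical_quantile(values, p):
    """Simple empirical quantile: the value at rank floor(p * N) in sorted order."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[index]


def exact_max_quantile_exponential(mean, n_obs, p):
    """Invert F_max(x) = (1 - exp(-x / mean)) ** n_obs for an Exponential parent."""
    return -mean * math.log(1.0 - p ** (1.0 / n_obs))


rng = random.Random(12345)   # fixed seed so the run is repeatable
mean_value = 10.0            # hypothetical parent mean
n_obs = 100                  # hypothetical number of observations per iteration

maxima = simulate_maxima(lambda r: r.expovariate(1.0 / mean_value), n_obs, 5000, rng)
q90_sim = empirical_quantile(maxima, 0.90)
q90_exact = exact_max_quantile_exponential(mean_value, n_obs, 0.90)
print(f"simulated 90th percentile of the maximum: {q90_sim:.2f}")
print(f"exact 90th percentile of the maximum:     {q90_exact:.2f}")
```

In practice the parent sampler would be replaced by whatever distribution has been fitted to the data, in which case no exact quantile is available and the simulated percentiles are the result.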

The parameters of the Extreme Value distribution are usually determined by data fitting, except in certain circumstances where the parent distribution is known and the relationship between its parameter values and the parameter values of the appropriate extreme value distribution is also known. Gumbel (1958) provides an old but still excellent treatise on extreme value theory.
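One simple way to fit the Gumbel (largest extreme) distribution to data is the method of moments, which uses the facts that its mean is a + γb (γ being the Euler-Mascheroni constant) and its standard deviation is bπ/√6. The sketch below is only one possible fitting approach and is not necessarily what ModelRisk's fitting tools do; the function names and the synthetic data with a = 30 and b = 4 are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_max_moments(data):
    """Method-of-moments estimates (a, b) for a Gumbel largest-extreme distribution."""
    mean = statistics.fmean(data)
    std = statistics.stdev(data)
    b = std * math.sqrt(6.0) / math.pi   # from std = b * pi / sqrt(6)
    a = mean - EULER_GAMMA * b           # from mean = a + gamma * b
    return a, b


def sample_gumbel_max(a, b, rng):
    """Draw one Gumbel(a, b) value by inverting F(x) = exp(-exp(-(x - a) / b))."""
    u = rng.random()
    while u == 0.0:                      # avoid log(0)
        u = rng.random()
    return a - b * math.log(-math.log(u))


rng = random.Random(7)
true_a, true_b = 30.0, 4.0               # hypothetical parameters for synthetic data
data = [sample_gumbel_max(true_a, true_b, rng) for _ in range(20000)]
a_hat, b_hat = fit_gumbel_max_moments(data)
print(f"estimated a = {a_hat:.2f}, estimated b = {b_hat:.2f}")
```

With real observations, `data` would hold the observed extremes (for example, annual maxima), and the quality of the fit should still be checked against the data.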

## Contagious extreme value distributions

Sometimes we are interested in the largest (or smallest) of a random number of random variables. For example, the largest flood that might occur in a period, where both the number of floods and the size of each flood are random. Other examples are earthquakes, explosions, stock price jumps, and accidents. Sometimes, neat mathematical solutions are available for modelling the extremes of such systems. For example, if the number of gas explosions in a period can be described by VosePoisson(λ) and the intensity of an explosion is described by a shifted Exponential distribution (e.g. = c + VoseExpon(b)), then the maximum explosion intensity is given by an Extreme Value distribution: = VoseExtValueMax(c + b*LN(λ), b). Example Model Contagious_extreme_value_distribution.xls demonstrates the result by simulation.
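The gas explosion result can also be checked with a short Python simulation. This is a sketch, not a reproduction of the example spreadsheet: the parameter values (Poisson mean 5, shift c = 2, Exponential mean b = 1.5), the threshold x = 6 and the function names are hypothetical. It also assumes that a period with no explosions counts as having a maximum below any threshold, which is the convention under which the Gumbel CDF matches for x ≥ c.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) count: number of unit-rate exponential arrivals before time lam."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_period_max(lam, c, b, rng):
    """Largest of a Poisson(lam) number of intensities, each c + Exponential(mean b)."""
    n_events = poisson_draw(lam, rng)
    if n_events == 0:
        return float("-inf")             # assumption: no events means below any threshold
    return max(c + rng.expovariate(1.0 / b) for _ in range(n_events))


def gumbel_max_cdf(x, a, b):
    """Gumbel (largest extreme) CDF with location a and scale b."""
    return math.exp(-math.exp(-(x - a) / b))


rng = random.Random(2024)
lam, c, b = 5.0, 2.0, 1.5                # hypothetical parameter values
maxima = [simulate_period_max(lam, c, b, rng) for _ in range(20000)]

x = 6.0                                   # threshold at which the two CDFs are compared
empirical = sum(m <= x for m in maxima) / len(maxima)
theoretical = gumbel_max_cdf(x, c + b * math.log(lam), b)
print(f"simulated   P(max <= {x}) = {empirical:.4f}")
print(f"Gumbel form P(max <= {x}) = {theoretical:.4f}")
```

The comparison is only meaningful for thresholds at or above the shift c, since no explosion intensity can fall below it.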

Similarly, if the number of explosions in a period can be described by VosePoisson(λ) and the size of an explosion is described by a Pareto(θ, a) distribution, then the maximum explosion intensity is given by a Frechet(0, aλ^(1/θ), θ) distribution. Care needs to be taken here in that one is assuming that the frequency of events and the event intensities are independent. For example, it is well-recognised that earthquake intensities are related to the number of earthquakes: the more earthquakes, the more gently released the tectonic plate energy, and thus the lower the earthquake intensities. Similar arguments can be made about floods. Kottegoda and Rosso (1998) provide plenty of excellent worked examples.
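Both closed forms follow from the same short argument, sketched here under the independence assumption just mentioned. The symbols N (the Poisson number of events, with mean λ), F (the CDF of a single event's intensity) and M (the largest intensity in the period) are introduced only for this sketch, and a period with no events is counted as having M below any threshold.

$$
P(M \le x) = \sum_{n=0}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!}\,[F(x)]^{n} = \exp\left\{-\lambda\,[1 - F(x)]\right\}
$$

For the shifted Exponential, 1 - F(x) = e^{-(x-c)/b} for x ≥ c, so:

$$
P(M \le x) = \exp\left\{-\lambda\, e^{-(x-c)/b}\right\} = \exp\left\{-\exp\left[-\frac{x - (c + b\ln\lambda)}{b}\right]\right\}
$$

which is a Gumbel CDF with location c + b ln(λ) and scale b. For the Pareto with shape θ and minimum a, 1 - F(x) = (a/x)^θ for x ≥ a, so:

$$
P(M \le x) = \exp\left\{-\lambda \left(\frac{a}{x}\right)^{\theta}\right\} = \exp\left\{-\left(\frac{x}{a\lambda^{1/\theta}}\right)^{-\theta}\right\}
$$

which is a Frechet CDF with location 0, scale aλ^(1/θ) and shape θ.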