The Three Extreme Value Distributions: An Introductory Review

The statistical distribution of the largest value drawn from a sample of a given size has only three possible shapes: it is either a Weibull, a Fréchet or a Gumbel extreme value distribution. In this short review, I describe how to relate the statistical distribution followed by the numbers in the sample to the associated extreme value distribution followed by the largest value within the sample. Nothing I present here is new. However, from experience, I have found that a simple and compact guide on this matter, written for the physics community, is missing.


I. INTRODUCTION
Extreme value statistics offers a powerful toolbox for the theoretical physicist. But it is the kind of toolbox that is not missed before one has been introduced to it, perhaps a little like the smartphone. It concerns the statistics of extreme events, and it aims to answer questions like: "if the strongest signal I have observed over the last hour had the value x, what would the strongest signal be expected to be if measured over a hundred hours?" Furthermore, if I divide this hundred-hour interval into a hundred one-hour intervals, what would be the statistical distribution of the strongest signal in each one-hour interval?
It is the latter question which is the focus of this minireview.
There is no lack of literature on extreme value statistics, see e.g. [1][2][3][4][5] or simply google the term. We find it used in connection with spin glasses and disordered systems [6], in connection with 1/f noise [7], in connection with optics [8], in connection with the fiber bundle model [9], etc. There are plenty of examples from diverse fields of physics.
So, there is no lack of material for the novice who has seen a need for this tool. The problem is that the literature is not so easy to penetrate, as it is often cast in a rather mathematical language that takes work to get through. The aim of this mini-review is to present the theory behind, and the main results concerning, the extreme value distributions in a simple and compact way. We present nothing new. For a longer, wider and more detailed review of extreme value statistics, Fortin and Clusel [10] and Majumdar et al. [11] provide exactly that.
We have a statistical distribution p(x) and its associated cumulative probability

P(x) = ∫_{−∞}^{x} dx′ p(x′) ,   (1)

which is the probability to find a number smaller than or equal to x. We draw N numbers from this distribution and record the largest of the N numbers. We repeat this procedure M times and thereby obtain M largest numbers, one for each sequence. What is the distribution of these M largest numbers in the limit when M → ∞, which then defines the extreme value distribution? It turns out that depending on p(x), the extreme value distribution will have one of three functional forms:

• The Weibull cumulative probability,

Φ(u) = exp[−(−u)^α] for u ≤ 0, and Φ(u) = 1 for u > 0,   (2)

where we assume α > 0. Note that Φ(−∞) = 0. The corresponding Weibull extreme value distribution is

φ(u) = dΦ(u)/du = α(−u)^(α−1) exp[−(−u)^α] for u ≤ 0.   (3)

• The Fréchet cumulative probability,

Φ(u) = exp[−u^(−α)] for u > 0, and Φ(u) = 0 for u ≤ 0.   (4)

Also here we assume α > 0. Note that Φ(∞) = 1. The Fréchet extreme value distribution is

φ(u) = dΦ(u)/du = α u^(−α−1) exp[−u^(−α)] for u > 0.   (5)

• The Gumbel cumulative probability,

Φ(u) = exp[−e^(−u)],   (6)

where −∞ < u < ∞, so that Φ(−∞) = 0 and Φ(∞) = 1. The corresponding Gumbel extreme value distribution is given by

φ(u) = dΦ(u)/du = e^(−u) exp[−e^(−u)].   (7)

The questions are 1. which classes of distributions p(x) lead to which of the three extreme value distributions and 2. what is the connection between x and u in each case? It turns out that

• distributions that are bounded from above by a value x_0 and that behave as a power law close to it,

p(x) ∼ (x_0 − x)^(α−1) for x → x_0^− ,   (8)

lead to the Weibull extreme value distribution,

• distributions with a power law tail,

p(x) ∼ x^(−α−1) for x → ∞ ,   (9)

lead to the Fréchet extreme value distribution,

• and distributions where p(x) falls off faster than any power law as x → ∞, see equation (53), lead to the Gumbel extreme value distribution.

* Electronic address: Alex.Hansen@ntnu.no
Furthermore, we will find that

• for the Weibull extreme value distribution, u is given in terms of x in equation (13),

• for the Fréchet extreme value distribution, u is given in terms of x in equation (28),

• for the Gumbel extreme value distribution, u is given in terms of x in equations (51) and (43).
The discussion that now follows is built on the following relation. When drawing N numbers from the probability distribution p(x), the cumulative probability for the largest value is the probability that all N values are smaller than or equal to it. This probability is P(x)^N. Our task is to figure out the asymptotic shape of P(x)^N → Φ(u) as N → ∞, and what u = u(x) is as we approach this limit.
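This relation is easy to check directly. The following Python script is a minimal sketch (the choice of distribution, sample sizes and threshold are our own, purely for illustration): it draws maxima of N numbers uniform on the unit interval, for which P(x) = x, and compares the empirical cumulative probability of the maxima at a point with P(x)^N.

```python
import random

# The cumulative probability of the largest of N draws is P(x)**N.
# Quick numerical check with the uniform distribution, where P(x) = x.
N, M = 10, 20000                  # sample size and number of repetitions
random.seed(1)
maxima = [max(random.random() for _ in range(N)) for _ in range(M)]

x = 0.8
empirical = sum(m <= x for m in maxima) / M   # fraction of maxima <= x
exact = x ** N                                # P(x)**N
print(empirical, exact)
```

Increasing M tightens the agreement, since the empirical estimate converges at the usual 1/√M rate.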

II. WEIBULL CLASS

We first assume that there is a largest value x_0 which the numbers drawn cannot exceed, and that the cumulative probability approaches one as a power law in the distance to x_0. This is equivalent to p(x) having the form

p(x) = αb (x_0 − x)^(α−1) for x → x_0^− ,   (10)

where b is positive. We note that 0 < α < 1 leads to a diverging probability density as x → x_0^−. We furthermore note that α = 1 implies that p(x) approaches a constant when x → x_0^−, which for example is the case when the distribution is uniform. The corresponding cumulative probability is given by

P(x) = 1 − b (x_0 − x)^α for x → x_0^− .   (11)

The extreme value cumulative probability for N samplings is given by

P(x)^N = [1 − b (x_0 − x)^α]^N   (12)

for x → x_0^−. We introduce the variable change

u = −(bN)^(1/α) (x_0 − x) .   (13)

Equation (12) then becomes

P(x)^N = [1 − (−u)^α/N]^N .   (14)

In the limit of N → ∞, this becomes

P(x)^N → exp[−(−u)^α]   (15)

for negative u. Hence, we have that

Φ(u) = exp[−(−u)^α] for u ≤ 0, and Φ(u) = 1 for u > 0,   (16)

which is the Weibull cumulative probability, valid for all values of u even though we only know the behavior of p(x) close to x_0. The Weibull probability density is given by

φ(u) = dΦ(u)/du = α(−u)^(α−1) exp[−(−u)^α] for u ≤ 0 .   (17)

We note that the Weibull distribution resembles a stretched exponential. This is correct for α < 1. However, α ≥ 1 is much more common in the wild.
We express the Weibull cumulative probability in terms of the original variable x using equation (13),

Φ̃(x) = exp[−bN (x_0 − x)^α] .   (18)

Hence, in terms of the original variable x, the Weibull extreme value distribution becomes

φ̃(x) = dΦ̃(x)/dx = αbN (x_0 − x)^(α−1) exp[−bN (x_0 − x)^α] .   (19)

A. Weibull: An Example
We now work out a concrete example. Let us assume that p(x) is given by

p(x) = α (1 − x)^(α−1) for 0 ≤ x ≤ 1 ,   (20)

i.e., b = 1 and x_0 = 1 in equation (10). The cumulative probability is then

P(x) = 1 − (1 − x)^α .   (21)

FIG. 1: The curve that has its maximum at x = 0 is the probability distribution (20) with α = 3. The curve that has its maximum in the middle is φ̃(x), equation (22), with N = 100, and the curve that has its maximum to the right is φ̃(x) with N = 1000.

From equation (19) we then have that

φ̃(x) = αN (1 − x)^(α−1) exp[−N (1 − x)^α] .   (22)
We show the distribution (20) with α = 3, together with the corresponding extreme value distributions for N = 100 and N = 1000, equation (19), in figure 1.
Using a random number generator producing numbers r uniformly distributed on the unit interval, we may stochastically generate numbers that are distributed according to the probability density p(x) given in (20). We do this by inverting the expression P(x) = r, where the cumulative probability is given by (21). Hence, we have

x = 1 − r^(1/α) ,   (23)

where we have also used that r may be substituted for 1 − r in (21). We generate a sequence of sequences of numbers using this algorithm, each sequence having length N. We then identify the largest value within each sequence. We chose N = 100 and N = 1000, in each case generating 10^7 such sequences. The histograms based on the random numbers themselves, and of the extreme values for each sequence of length either 100 or 1000, we show in figure 2. This figure should be compared to figure 1.

The Weibull distribution, equation (17), is much used in connection with material strength [13]. This is no coincidence. Consider a chain. Each link in the chain can sustain a load up to a certain value, above which it fails. This maximum value is distributed according to some probability distribution. When the chain is loaded, it is the link with the smallest failure threshold that breaks first, causing the chain as a whole to fail. Hence, the strength distribution of an ensemble of chains is an extreme value distribution, but with respect to the smallest rather than the largest value. The link strength must be a positive number. Hence, the link strength distribution is cut off at zero or at some positive value. Close to this cutoff, the distribution must behave as a power law in the distance to the cutoff, e.g. due to a Taylor expansion around the cutoff. The corresponding extreme value distribution, which is the chain strength distribution, must then be a Weibull distribution.
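The sampling procedure described above can be sketched in a few lines of Python. We assume here, as in the example, that p(x) = α(1 − x)^(α−1) on the unit interval with α = 3, so that inverting P(x) = r gives x = 1 − r^(1/α); the smaller sample counts below are our own choices, to keep the sketch fast.

```python
import random, math

# Inverse sampling for the Weibull example: p(x) = alpha*(1-x)**(alpha-1)
# on [0, 1], sampled via x = 1 - r**(1/alpha) with r uniform on [0, 1).
alpha, N, M = 3.0, 100, 20000
random.seed(2)
maxima = [
    max(1.0 - random.random() ** (1.0 / alpha) for _ in range(N))
    for _ in range(M)
]

# Compare the empirical cumulative probability of the maxima at a point
# with the Weibull form exp(-N*(1-x)**alpha).
x0 = 0.8
empirical = sum(m <= x0 for m in maxima) / M
predicted = math.exp(-N * (1.0 - x0) ** alpha)
print(empirical, predicted)
```

The residual difference between the two numbers shrinks both with the number of sequences M (statistics) and with the sequence length N (convergence to the limiting Weibull form).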

III. FRÉCHET CLASS
We now assume a probability distribution p(x) whose associated cumulative probability approaches one as a power law,

1 − P(x) → k x^(−α) as x → ∞ ,   (24)

where k > 0 and α > 0. This means that p(x) behaves as

p(x) = αk x^(−α−1)   (25)

and the corresponding cumulative probability behaves as

P(x) = 1 − k x^(−α)   (26)

as x → ∞. The extreme value cumulative probability for N samplings is given by

P(x)^N = [1 − k x^(−α)]^N   (27)

for x → ∞. We introduce the variable change

u = x / (kN)^(1/α) .   (28)

We now plug this change of variables into equation (27) to find

P(x)^N = [1 − u^(−α)/N]^N .   (29)

In the limit of N → ∞, this becomes

P(x)^N → exp[−u^(−α)] ,   (30)

where u ≥ 0 is given by equation (28). We see that Φ(u) → 0 as u → 0^+. Furthermore, for u < 0, the function is no longer real. Hence, we define Φ(u) = 0 for u < 0. The ensuing extreme value cumulative probability is then given by

Φ(u) = exp[−u^(−α)] for u > 0, and Φ(u) = 0 for u ≤ 0,   (31)

which is the Fréchet cumulative probability. The Fréchet probability density is given by

φ(u) = dΦ(u)/du = α u^(−α−1) exp[−u^(−α)] for u > 0 .   (32)

We express the Fréchet cumulative probability in terms of the original variable x using equation (28),

Φ̃(x) = exp[−kN x^(−α)] .   (33)

Hence, in terms of the original variable x, the Fréchet extreme value distribution becomes

φ̃(x) = dΦ̃(x)/dx = αkN x^(−α−1) exp[−kN x^(−α)] .   (34)

A. Fréchet: An Example
We consider the distribution

p(x) = α x^(−α−1) for x ≥ 1 ,   (35)

i.e., k = 1 in equation (25). The corresponding cumulative probability is given by

P(x) = 1 − x^(−α) .   (36)

Using equation (34), we find the corresponding Fréchet extreme value distribution to be

φ̃(x) = αN x^(−α−1) exp[−N x^(−α)] ,   (37)

valid for all x > 1. We show p(x) and the corresponding φ̃(x) for α = 3 and N = 100 and N = 1000 in figure 3.
In order to compare with numerical results, we generate numbers distributed according to (35) by solving the equation P(x) = r, where r is drawn from a uniform distribution on the unit interval. From equation (36), we get

x = (1 − r)^(−1/α) .   (38)

We generate a sequence of numbers using this algorithm, grouping them together in sequences of N = 100 or N = 1000. We generate 10^7 such sequences. The histograms based on the random numbers themselves, and of the extreme values for each sequence, we show in figure 4.

FIG. 3: The curve that has its maximum at x = 1 is the probability distribution (35) with α = 3. The curve that has its maximum in the middle is φ̃(x), equation (37), with N = 100, and the curve that has its maximum to the right is φ̃(x) with N = 1000.

FIG. 4: The histograms shown here are based on data generated according to the probability distribution (35) with α = 3. The histogram having its maximum to the left shows all the generated data. The histogram having its maximum in the middle shows the largest number among each sequence of numbers of length 100, and the histogram having its maximum to the right shows the largest number among each sequence of numbers of length 1000. For each sequence length, 10^7 such sequences were generated.
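The corresponding Python sketch for the Fréchet example follows the same pattern as in the Weibull case (again with our own, smaller sample counts so the sketch runs quickly):

```python
import random, math

# Inverse sampling for the Frechet example: P(x) = 1 - x**(-alpha) for
# x >= 1, inverted as x = (1 - r)**(-1/alpha) with r uniform on [0, 1).
alpha, N, M = 3.0, 100, 20000
random.seed(3)
maxima = [
    max((1.0 - random.random()) ** (-1.0 / alpha) for _ in range(N))
    for _ in range(M)
]

# Compare the empirical cumulative probability of the maxima at a point
# with the Frechet form exp(-N*x**(-alpha)), here with k = 1.
x0 = 5.0
empirical = sum(m <= x0 for m in maxima) / M
predicted = math.exp(-N * x0 ** (-alpha))
print(empirical, predicted)
```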

IV. GUMBEL CLASS
We now assume we have a probability distribution that takes the form

p(x) = f′(x) e^(−f(x)) for x > x_0 ,   (39)

where f′(x) = df(x)/dx. Here x_0 is any number, positive or negative, and f(x) is an increasing function of x. The cumulative probability is then

P(x) = 1 − e^(−f(x)) for x > x_0 .   (40)

We do not care about the form of p(x) or P(x) for x ≤ x_0.
The extreme value cumulative probability for N samplings is given by

P(x)^N = [1 − e^(−f(x))]^N   (41)

for x > x_0. We introduce the variable change

u = f(x) − f(x_N) ,   (42)

where x_N is given by

N [1 − P(x_N)] = 1 .   (43)

From equation (40) we then have that

e^(−f(x_N)) = 1/N ,   (44)

that is, f(x_N) = ln N. Let us now define f^(n)(x) = d^n f(x)/dx^n. We then expand f(x) around x_N,

f(x) = f(x_N) + Σ_{n=1}^{∞} [f^(n)(x_N)/n!] (x − x_N)^n ,   (45)

so that equation (42) becomes

u = Σ_{n=1}^{∞} [f^(n)(x_N)/n!] (x − x_N)^n .   (46)

We keep f′(x_N)(x − x_N) constant as N increases, so that the first order term in the expansion stays constant. The n-th order term in equation (46) may then be written

[f^(n)(x_N)/[f′(x_N)]^n] [f′(x_N)(x − x_N)]^n / n! .   (47)

Hence, in this limit, we will find

u = f(x) − f(x_N) → f′(x_N)(x − x_N) ,   (48)

provided that

lim_{N→∞} f^(n)(x_N)/[f′(x_N)]^n = 0 for all n > 1 ,   (49)

so that all higher order terms vanish. Furthermore, using equations (40) and (44), we have that

e^(−f(x)) = e^(−f(x_N)) e^(−u) = e^(−u)/N ,   (50)

and the variable change (42) may be written

u = f(x) − ln N .   (51)

If we combine equation (49) for n = 2 with equations (39) and (40), we find

lim_{x→∞} f″(x)/[f′(x)]² = 0 ,   (52)

which is equivalent to

lim_{x→∞} (d/dx) [ (1 − P(x)) / p(x) ] = 0 ,   (53)

since (1 − P(x))/p(x) = 1/f′(x). Equation (53) is in fact a sufficient condition for (49) to hold for all n > 1. We may show this through induction.
We have that

f^(n+1)(x)/[f′(x)]^(n+1) = [1/f′(x)] (d/dx)[ f^(n)(x)/[f′(x)]^n ] + n [f″(x)/[f′(x)]²] [f^(n)(x)/[f′(x)]^n] .   (54)

If condition (52) is fulfilled, that is, when f″(x)/[f′(x)]² vanishes in the limit x → ∞, then equation (54) for n = 2 gives

lim_{x→∞} f‴(x)/[f′(x)]³ = 0 ,

since both terms on the right hand side of equation (54) are zero in this limit. We now assume equation (49) to be true for some n ≥ 3. We then have that

lim_{x→∞} f^(n+1)(x)/[f′(x)]^(n+1) = 0 ,

again because both terms on the right hand side of equation (54) are zero in this limit. This completes the proof.

We now combine equations (42) and (44) with equation (41) to find

P(x)^N = [1 − e^(−u)/N]^N .   (55)

In the limit of N → ∞, this becomes

lim_{N→∞} [1 − e^(−u)/N]^N = exp[−e^(−u)] ,   (56)

so that

Φ(u) = exp[−e^(−u)] ,   (57)

which is the Gumbel cumulative probability. Here −∞ < u < ∞. The Gumbel probability density is given by

φ(u) = dΦ(u)/du = e^(−u) exp[−e^(−u)] .   (58)

We express the Gumbel cumulative probability in terms of the original variable x using equation (51),

Φ̃(x) = exp[−N e^(−f(x))] .   (59)

Hence, in terms of the original variable x, the Gumbel extreme value distribution becomes

φ̃(x) = dΦ̃(x)/dx = N f′(x) e^(−f(x)) exp[−N e^(−f(x))] .   (60)

A. An Example: the Gaussian

The gaussian probability density is given by

p(x) = [1/√(2πσ²)] exp[−x²/(2σ²)] ,   (61)

where σ is the standard deviation. Comparing with equation (40), for large x this corresponds to

f(x) = x²/(2σ²) + ln( x√(2π)/σ ) .   (62)

The cumulative probability is

P(x) = [1 + erf( x/(σ√2) )]/2 ,   (63)

where erf(x) is the error function. In order to verify that the gaussian generates the Gumbel extreme value distribution, we use the sufficient condition (53). The gaussian cumulative probability in equation (63) has the asymptotic form

P(x) = 1 − [σ/(x√(2π))] exp[−x²/(2σ²)]   (64)

for large x, so that (1 − P(x))/p(x) → σ²/x, and

lim_{x→∞} (d/dx)[(1 − P(x))/p(x)] = lim_{x→∞} (−σ²/x²) = 0 .   (65)

The gaussian hence belongs to the Gumbel class. We determine x_N by solving equation (43) using this asymptotic form. Equation (43) gives 1 − P(x_N) = 1/N, or

(x_N/σ) exp[x_N²/(2σ²)] = N/√(2π) .   (66)

Squaring both sides,

(x_N/σ)² exp[(x_N/σ)²] = N²/(2π) ,   (67)

so that

(x_N/σ)² = W( N²/(2π) ) ,   (68)

where W(z) is the Lambert W function, also known as the product logarithm, which is the solution to the equation W(z) exp[W(z)] = z. For large arguments, it approaches the natural logarithm. Hence we find

x_N = σ √( W( N²/(2π) ) ) .   (69)

This gives us f(x_N) = ln N when inserting the expression for x = x_N, equation (69), into equation (62). Thus we may now express the variable u in the Gumbel cumulative probability (57) in terms of the variables x, σ and N using equation (51),

u = x²/(2σ²) + ln( x√(2π)/σ ) − ln N .   (70)

We show in figure 5 the gaussian and the corresponding Gumbel distributions for σ = 1 and N = 100 and N = 1000. We find that x_100 = 2.375 and x_1000 = 3.115. These correspond to confidence levels of 99% and 99.9%, respectively.
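Equation (69) is easy to evaluate numerically. The sketch below reproduces the quoted values of x_100 and x_1000 for σ = 1; the helper lambert_w is our own minimal Newton iteration, used as a stand-in for a library Lambert W routine.

```python
import math

# Evaluates x_N = sigma*sqrt(W(N**2/(2*pi))) for the gaussian, sigma = 1.
def lambert_w(z, w0=1.0):
    """Solve w*exp(w) = z by Newton's method (valid for z > 0)."""
    w = w0
    for _ in range(100):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (1.0 + w))
    return w

x_N = {
    N: math.sqrt(lambert_w(N * N / (2.0 * math.pi), w0=math.log(N)))
    for N in (100, 1000)
}
print(x_N)  # x_100 and x_1000
```

For z > 0 the function w e^w is increasing and convex, so the Newton iteration converges from any positive starting point; starting from ln N, close to the large-argument behavior of W, is simply a convenient choice.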
We show in figure 6 histograms based on numbers distributed according to a gaussian distribution, generated using the Box-Muller algorithm [12]. These numbers were grouped together in sets of either N = 100 or N = 1000 elements. We generated 10^7 such sets. The figure displays the two extreme value distributions for the two set sizes. This figure should be compared to figure 5. In contrast to the two other extreme value distributions, we see that there are visible discrepancies between the calculated Gumbel distributions in figure 5 and the extreme value histograms in figure 6. We see furthermore that the histogram for N = 1000 is closer to the calculated Gumbel distribution than the histogram for N = 100. This is due to the very slow convergence induced by the Lambert W function. Slow convergence is typical for the Gumbel extreme value distribution. This slow convergence has recently been analyzed and, through clever use of scaling methods, remedied [14].
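The contrast with a fast-converging member of the Gumbel class can be illustrated with the exponential distribution, for which f(x) = x, x_N = ln N and u = x − x_N hold exactly, with no Lambert W function involved. A minimal numerical sketch:

```python
import math

# Convergence toward the Gumbel law for the exponential distribution,
# P(x) = 1 - exp(-x): here P(x_N + u)**N = (1 - exp(-u)/N)**N, which
# approaches exp(-exp(-u)) with a deviation of order 1/N.
u = 0.5
gumbel = math.exp(-math.exp(-u))
errors = []
for N in (10, 100, 10000):
    approx = (1.0 - math.exp(-(math.log(N) + u))) ** N
    errors.append(abs(approx - gumbel))
print(errors)
```

The error shrinks roughly as 1/N here, whereas for the gaussian the corresponding deviation decays only logarithmically in N.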

V. CONCLUDING REMARKS
We have only discussed the distributions associated with the largest values of x, the exception being the chain example in connection with the Weibull extreme value distribution, section II, which concerned the smallest value. The extreme value statistics of the smallest value is, however, easy to work out from that of the largest: just transform x → −x. Otherwise, the story presented here is rather complete.
There is one remark that needs to be made, though. In the derivation of the Gumbel extreme value distribution, section IV, we defined a variable x_N in equation (43). This variable may in fact be calculated for any cumulative probability P(x), and it has an interpretation that makes it very useful.
The probability density for the largest among N numbers drawn using the probability distribution p(x) is given by

φ_N(x) = N p(x) P(x)^(N−1) .   (71)

We calculate the average of the cumulative probability P(x) for the extreme value based on N samples,

⟨P⟩ = ∫_{−∞}^{∞} dx φ_N(x) P(x) = N ∫_0^1 dP P^N = N/(N + 1) .   (72)

For large N, we may write this as

⟨P⟩ = 1 − 1/(N + 1) ≈ 1 − 1/N = P(x_N) ,   (73)

using here equation (43). Hence, we may interpret x_N as the x value corresponding to the average confidence level of the largest observed value in sequences of N numbers. It is essentially the typical size of the extreme value for a sample of size N.
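This interpretation is easy to verify numerically. The sketch below, using the exponential distribution as an arbitrary test case (our own choice), averages P evaluated at the maximum of N draws and compares the result with N/(N + 1) ≈ 1 − 1/N = P(x_N):

```python
import random, math

# Check of the interpretation of x_N, equation (43): P(x_N) = 1 - 1/N.
# Test case: the exponential distribution, P(x) = 1 - exp(-x), where
# x_N = ln(N). The average of P at the maximum of N draws is N/(N + 1).
random.seed(4)
N, M = 100, 20000
avg_P = sum(
    1.0 - math.exp(-max(random.expovariate(1.0) for _ in range(N)))
    for _ in range(M)
) / M
print(avg_P, N / (N + 1), 1.0 - 1.0 / N)
```

The exact result ⟨P⟩ = N/(N + 1) holds for any continuous p(x), since P evaluated at the maximum is the largest of N uniform numbers, regardless of the underlying distribution.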