How many random variables add up to a fixed total?

The other topics about aggregate modeling mostly focus on determining the distribution of the sum of a (usually random) number of random variables.

We are also often interested in the reverse question: how many random variables will it take to exceed a given total?

For example, we might want to answer the questions:

How many random people entering a lift will it take to exceed the maximum load allowed?
How many sales will a company need to make to reach its year-end target?
How many random exposures to a chemical will it take to reach the exposure limit?

Some questions like this are answered directly by known distributions. For example, the Negative Binomial, Beta-Negative Binomial and Inverse Hypergeometric distributions describe how many trials will be needed to achieve s successes for the binomial, beta-binomial and hypergeometric processes respectively. However, if the random variables being summed are continuous rather than taking only the values 0 or 1, no standard distribution answers the question directly.

The most general method is to use Monte Carlo simulation with a loop that adds successive random samples from the distribution in question until the required total is exceeded, recording how many samples were needed.

ModelRisk offers such a function, called VoseStopSum. This can, however, be quite computationally intensive when the required number is large, so it is useful to have some quicker methods available.
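The simulation loop described above can be sketched in Python as follows. This is only an illustration of the general idea, not the ModelRisk implementation: the function names stop_sum and stop_sum_distribution, the Gamma(16, 5) passenger weights and the 630 kg lift limit are all hypothetical. The sketch assumes the sampled variable is positive, since otherwise the loop may never terminate.

```python
import random
from collections import Counter

def stop_sum(draw, total):
    """Count how many samples from draw() are needed for the running sum
    to exceed total. Assumes draw() returns positive values."""
    running = 0.0
    n = 0
    while running <= total:
        running += draw()
        n += 1
    return n

def stop_sum_distribution(draw, total, iterations=10_000):
    """Estimate the distribution of the stopping count by repeating the loop."""
    counts = Counter(stop_sum(draw, total) for _ in range(iterations))
    return {n: c / iterations for n, c in sorted(counts.items())}

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical inputs: weights ~ Gamma(shape=16, scale=5) kg, limit 630 kg.
    draw = lambda: rng.gammavariate(16.0, 5.0)
    for n, p in stop_sum_distribution(draw, 630.0).items():
        print(n, round(p, 4))
```

Each iteration costs roughly as many samples as the count being estimated, which is why the approach becomes slow when the required number is large.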

The table shown in the Aggregate distributions introduction gives us some identities that we can use. For example, the sum of n independent variables following a Gamma(a, b) distribution is itself a Gamma(n*a, b) variable. Let F_n(T) be the cumulative probability at T for a Gamma(n*a, b) distribution, that is, the probability that the sum of n variables does not exceed T, with F_0(T) = 1 because a sum of no variables is zero. If we require a total of more than T, then the probability that (n-1) Gamma(a, b) variables will exceed T is 1-F_(n-1)(T), where F_(n-1)(T) is the cumulative probability for a Gamma((n-1)*a, b), and the probability that n variables will exceed T is 1-F_n(T). Excel has the GAMMADIST function that calculates F(x) for a Gamma distribution (ModelRisk has the function VoseGammaProb that performs the same task but without the errors GAMMADIST sometimes produces). Thus, the probability that it was the nth random variable that took the sum over the threshold is (1-F_n(T)) - (1-F_(n-1)(T)) = F_(n-1)(T) - F_n(T). You can therefore construct a model that calculates the distribution for n directly as shown in the following spreadsheet:
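The same calculation can also be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the spreadsheet model itself: a is taken to be the shape parameter and b the scale parameter, the names gamma_cdf and stop_count_pmf are hypothetical, and the Gamma cumulative probability is computed from the standard power series for the regularized lower incomplete gamma function in place of GAMMADIST or VoseGammaProb.

```python
import math

def gamma_cdf(x, shape, scale):
    """P(X <= x) for X ~ Gamma(shape, scale), from the power series for the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape                  # k = 0 term of the series
    total = term
    for k in range(1, 10_000):
        term *= z / (shape + k)
        total += term
        if term < total * 1e-16:        # remaining terms are negligible
            break
    prefactor = math.exp(shape * math.log(z) - z - math.lgamma(shape))
    return min(1.0, total * prefactor)

def stop_count_pmf(threshold, a, b, n_max):
    """Return {n: P(N = n)} for n = 1..n_max, where N is the number of
    Gamma(a, b) variables needed for the running sum to exceed threshold."""
    pmf = {}
    f_prev = 1.0                        # F_0(T) = 1: an empty sum is 0 <= T
    for n in range(1, n_max + 1):
        f_n = gamma_cdf(threshold, n * a, b)   # sum of n ~ Gamma(n*a, b)
        pmf[n] = f_prev - f_n           # F_(n-1)(T) - F_n(T)
        f_prev = f_n
    return pmf

if __name__ == "__main__":
    # Hypothetical inputs: Gamma(16, 5) variables and a threshold of 630.
    for n, p in stop_count_pmf(630.0, 16.0, 5.0, 12).items():
        print(n, round(p, 4))
```

Because each probability needs only two cumulative probabilities, the cost grows with the largest n considered rather than with the number of simulated samples.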

The same idea can be applied with the Cauchy, Chi Squared, Erlang, Exponential, Levy, Normal, and Student-T distributions.

The VoseStopSum function in ModelRisk implements this shortcut automatically.
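As a worked instance of the identity F_(n-1)(T) - F_n(T) for one of the distributions listed above, assume each variable is Exponential with mean \beta, and write N for the number of variables needed to exceed T (both symbols are introduced here for illustration only). The sum of n such variables follows an Erlang distribution, which gives:

```latex
F_n(T) = 1 - \sum_{k=0}^{n-1} e^{-T/\beta}\,\frac{(T/\beta)^k}{k!},
\qquad
P(N = n) = F_{n-1}(T) - F_n(T)
         = e^{-T/\beta}\,\frac{(T/\beta)^{n-1}}{(n-1)!},
\quad n = 1, 2, \ldots
```

In this case N - 1 follows a Poisson distribution with mean T/\beta, so no summation over n is needed at all.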
