Recursive formulas for discrete distributions | Vose Software

# Recursive formulas for discrete distributions

The Binomial(106, 10-6) distribution is shown below:

Though in theory it extends from 0 to 106, in practice it is very much constrained to the lowest values. Calculating the probabilities for each possible value of this distribution is a problem, since one needs to evaluate 106!, which is beyond the capacity of most computers (Excel will calculate factorials up to 170, for example). The Stirling formula can be used to obtain a very good approximation of high factorials, but we still end up with the problem of having to deal with manipulating very high numbers. An easier approach is to use recursive formulae. These formulae relate the equation for the probability of the (i+1)th value to the probability of the ith value. Then one simply has to calculate the probability of any one value explicitly (a value chosen to give the simplest calculation) and then use the recursive formula to determine all other probabilities.

The binomial probability mass function gives:
and

Thus,

i.e.:

The binomial probability of zero successes is easily calculated:

so p(1), p(2), etc. can be readily determined using the recursive formula:

If the binomial probability in this example had been extremely high, say 0.999 99, instead of 0.000 01, we would use the same technique but calculate backwards from p(1 000 000), i.e.

and work backwards using

Here are other useful recursive formulae for some of the most common discrete probability distributions:

 Poisson Negative Binomial Geometric Hypergeometric

The formula for p(0) for the Hypergeometric distribution is unwieldy but can still be very accurately approximated without resorting to factorial calculations by using the Stirling formula to give:

This last formula will usually have to be calculated in log space first, then converted to real space at the end in order to avoid intermediary calculations of very large numbers. Another formula for p(0) which is only very slightly less accurate for larger M and generally more accurate for small n is provided by Cannon and Roe (1982):

(1)

This formula is produces by expanding the factorials for the equation for p(0) and cancelling common factors top and bottom to give:

(2)

We then take the first and last terms in each of the numerator and denominator, average the two terms and raise to the power of the number of terms (D) top and bottom:

(3)

An alternative formulation is obtained by swapping around n and D, to give:

This works since Equation (2) is symmetric in n and D, i.e. if n and D swap places the equation remains the same. Equation (1) is a better approximation when n > D and Equation (3) is better when n < D.