Number of hypergeometric samples to get a specific number of successes | Vose Software

# Number of hypergeometric samples to get a specific number of successes

Consider sampling without replacement from a population of M items, D of which have the characteristic of interest, until we have s items with the required characteristic. The distribution of the number of failures before the sth success can be derived in the same manner as the Negative Binomial distribution. The probability of observing (s-1) successes in (x+s-1) trials (i.e. x failures) is given by direct application of the Hypergeometric distribution:

$$P(s-1 \text{ successes in } x+s-1 \text{ trials}) = \frac{\binom{D}{s-1}\binom{M-D}{x}}{\binom{M}{x+s-1}}$$

The probability p of then observing a success in the next trial (the (s+x)th trial) is simply the number of D items remaining (= D-(s-1)) divided by the size of the population remaining (= M-(s+x-1)):

$$p = \frac{D-s+1}{M-s-x+1}$$

and the probability of having exactly x failures up to the sth success, where trials are stopped at the sth success, is then the product of these two probabilities:

$$f(x) = \frac{\binom{D}{s-1}\binom{M-D}{x}}{\binom{M}{x+s-1}} \cdot \frac{D-s+1}{M-s-x+1}$$

This is the probability mass function for the Inverse Hypergeometric distribution InvHypergeo(s,D,M), which is analogous to the Negative Binomial distribution for the binomial process and the Gamma distribution for the Poisson process. The total number of samples n needed to obtain s successes is therefore:

n = s + VoseInvHypergeo(s,D,M)
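The construction described above (a Hypergeometric probability multiplied by the probability of a success on the next trial) can be sketched in Python using only the standard library. This is an illustrative sketch, not the implementation behind VoseInvHypergeo; the function name `inv_hypergeo_pmf` and the parameter values are assumptions chosen for the example.

```python
from math import comb

def inv_hypergeo_pmf(x, s, D, M):
    """P(exactly x failures before the s-th success), sampling without replacement.

    Hypothetical helper for illustration; s, D and M follow the notation of the text.
    """
    if s < 1 or s > D or x < 0 or x > M - D:
        return 0.0
    # Hypergeometric probability of (s-1) successes in the first (x+s-1) trials
    first = comb(D, s - 1) * comb(M - D, x) / comb(M, x + s - 1)
    # Probability that the next trial, the (s+x)-th, is a success
    p = (D - (s - 1)) / (M - (s + x - 1))
    return first * p

s, D, M = 4, 5, 50  # illustrative values
pmf = [inv_hypergeo_pmf(x, s, D, M) for x in range(M - D + 1)]

total = sum(pmf)                                        # should be 1
mean_failures = sum(x * q for x, q in enumerate(pmf))   # mean number of failures
mean_n = s + mean_failures                              # mean of n = s + failures
print(total, mean_failures, mean_n)
```

For these values the probabilities sum to 1 and the mean number of failures agrees with the known closed form s(M-D)/(D+1) = 30, so the mean total number of samples n is 34.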

For a population M that is large compared to s, the Inverse Hypergeometric distribution is closely approximated by the Negative Binomial:

InvHypergeo(s,D,M) ≈ NegBin(s,D/M)

and if the probability D/M is very small:

InvHypergeo(s,D,M) ≈ Gamma(s,M/D)
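Both approximations can be checked numerically. The sketch below compares cumulative probabilities at a single point for an illustrative case with D/M = 0.01. All function names are hypothetical, the Gamma second parameter is assumed to be a scale (so the mean is s·M/D), and the Gamma CDF uses the closed form that holds only for integer s.

```python
from math import comb, exp, factorial

def inv_hypergeo_pmf(x, s, D, M):
    # P(x failures before the s-th success) when sampling without replacement
    if x < 0 or x > M - D:
        return 0.0
    return (comb(D, s - 1) * comb(M - D, x) / comb(M, x + s - 1)
            * (D - s + 1) / (M - s - x + 1))

def negbin_pmf(x, s, p):
    # P(x failures before the s-th success) in independent trials with success probability p
    return comb(x + s - 1, x) * p**s * (1 - p)**x

def gamma_cdf(t, s, beta):
    # Gamma(s, beta) CDF for integer shape s (Erlang form); beta is assumed to be the scale
    return 1.0 - sum(exp(-t / beta) * (t / beta)**k / factorial(k) for k in range(s))

s, D, M = 3, 200, 20000  # illustrative: M large compared to s, D/M = 0.01
x0 = 300                 # evaluate P(failures <= x0) under each model

ih_cdf = sum(inv_hypergeo_pmf(x, s, D, M) for x in range(x0 + 1))
nb_cdf = sum(negbin_pmf(x, s, D / M) for x in range(x0 + 1))
g_cdf = gamma_cdf(x0, s, M / D)
print(ih_cdf, nb_cdf, g_cdf)
```

The three values should lie close together. The Gamma figure is the roughest of the three because it replaces a discrete count with a continuous variable.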

The four figures below show examples of the Inverse Hypergeometric distribution. The first figure shows the probability mass function of the number of failures before getting 4 successes when drawing samples from a population of 50 in which 5 individuals have the characteristic of interest. We leave it to you to explain figures 2 to 4 in words.
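The scenario of the first figure (4 successes, 5 individuals with the characteristic, population of 50) can also be explored by simulation. The following is a rough sketch under the assumption that shuffling the population and reading it in order is an acceptable stand-in for sampling without replacement; the function name, seed and number of runs are arbitrary choices for illustration.

```python
import random
from collections import Counter

def failures_before_sth_success(s, D, M, rng):
    # Population: D items with the characteristic (True), M - D without it (False)
    population = [True] * D + [False] * (M - D)
    rng.shuffle(population)  # reading a random order = sampling without replacement
    successes = failures = 0
    for item in population:
        if item:
            successes += 1
            if successes == s:
                return failures  # stop at the s-th success
        else:
            failures += 1
    raise ValueError("s cannot exceed D")

rng = random.Random(0)  # fixed seed so the run is repeatable
runs = 20000
counts = Counter(failures_before_sth_success(4, 5, 50, rng) for _ in range(runs))

sim_mean = sum(x * c for x, c in counts.items()) / runs
for x in sorted(counts):
    # Relative frequency of each failure count, an estimate of the probability mass
    print(x, counts[x] / runs)
print("mean failures:", sim_mean)
```

The relative frequencies give an empirical estimate of the probability mass function, and the simulated mean number of failures should fall near 30.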

An Inverse Hypergeometric distribution shifted k units along the domain is sometimes called a Negative Hypergeometric distribution.