Multivariate Hypergeometric distribution | Vose Software

# Multivariate Hypergeometric distribution

Format: MultiHypergeo(n, {D})

The Multivariate Hypergeometric distribution is an extension of the Hypergeometric distribution to groups in which individuals can be in more than two different states (categories).

## Example

In a group of 50 people, of whom 20 are male, a VoseHypergeo(10,20,50) would describe how many of ten randomly chosen people would be male (and, by deduction, how many would therefore be female). However, let's say we have a group of 10 people as follows:

| German | English | French | Canadian |
|---|---|---|---|
| 3 | 2 | 1 | 4 |

Now let's take a sample of 4 people at random from this group. We could have various numbers of each nationality in our sample:

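| German | English | French | Canadian |
|---|---|---|---|
| 3 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 |
| 2 | 2 | 0 | 0 |
| 2 | 1 | 1 | 0 |
| 2 | 1 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 2 | 0 | 0 | 2 |
| ... | ... | ... | ... |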

and each combination has its own probability. The Multivariate Hypergeometric distribution is an array distribution: in this case it simultaneously generates four numbers giving how many individuals in the random sample came from each sub-group (German, English, French and Canadian).
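Because every individual is equally likely to be sampled, the probability of each combination can be checked by brute force: list every possible sample of 4 people from the 10, and tally how often each vector of nationality counts occurs. The sketch below does exactly that in Python; `GROUP` and `combination_probabilities` are hypothetical names for illustration, not ModelRisk functions. It also confirms that a combination with two French people cannot occur, since the group contains only one.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# The example group of 10 people: 3 German, 2 English, 1 French, 4 Canadian.
GROUP = {"German": 3, "English": 2, "French": 1, "Canadian": 4}
NATIONALITIES = list(GROUP)

# Expand the group into individuals, each tagged with its category index.
people = [i for i, nat in enumerate(NATIONALITIES) for _ in range(GROUP[nat])]

def combination_probabilities(people, n, k):
    """Enumerate every equally likely sample of n distinct individuals and
    return the exact probability of each vector of per-category counts."""
    tally = Counter()
    total = 0
    for sample in combinations(range(len(people)), n):
        counts = [0] * k
        for idx in sample:
            counts[people[idx]] += 1
        tally[tuple(counts)] += 1
        total += 1
    return {combo: Fraction(c, total) for combo, c in tally.items()}

probs = combination_probabilities(people, n=4, k=len(NATIONALITIES))
# Print the first few combinations, in the same order as the table above.
for combo in sorted(probs, reverse=True)[:5]:
    print(combo, probs[combo])
```

Exact fractions are used so the probabilities can be compared directly with the closed-form equation; this enumeration only scales to small groups, since the number of samples grows combinatorially.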

## Generation

The Multivariate Hypergeometric distribution is created by extending the mathematics of the Hypergeometric distribution. For the Hypergeometric distribution with a sample of size n, we want the probability of observing s individuals from a sub-group of size D, and therefore (n - s) from the remaining (M - D) individuals. This results in the probability distribution for s:

$$ f(s) = \frac{\binom{D}{s}\binom{M-D}{n-s}}{\binom{M}{n}} $$

where M is the group size and D is the size of the sub-group of interest. The numerator is the number of different sampling combinations (each of which has the same probability, because each individual has the same probability of being sampled) in which one would have exactly s from the sub-group D, and by implication (n - s) from the remaining (M - D). The denominator is the total number of different combinations of individuals one could have when selecting n individuals from a group of size M. The equation is therefore just the proportion of equally likely scenarios that would give us s from D.
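The counting argument translates directly into code. The following is a minimal Python sketch (the function name `hypergeo_pmf` is hypothetical, and `math.comb` needs Python 3.8 or later) that evaluates the ratio of combinations for the VoseHypergeo(10,20,50) example and checks that the probabilities over all possible s sum to 1.

```python
from math import comb

def hypergeo_pmf(s, n, D, M):
    """Probability of s individuals from a sub-group of size D when n are
    sampled without replacement from M: C(D, s) * C(M - D, n - s) / C(M, n)."""
    if s < max(0, n - (M - D)) or s > min(n, D):
        return 0.0  # outside the support: not enough individuals of one kind
    return comb(D, s) * comb(M - D, n - s) / comb(M, n)

# The VoseHypergeo(10,20,50) example: 10 chosen from 50 people, 20 of them male.
pmf = [hypergeo_pmf(s, n=10, D=20, M=50) for s in range(11)]
print(sum(pmf))                              # 1 up to floating-point rounding
print(max(range(11), key=pmf.__getitem__))   # most likely number of males
```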

The Multivariate Hypergeometric probability equation is just an extension of this idea. The figure below shows a graphical representation of the multivariate hypergeometric process: D1, D2, D3, ... are the numbers of individuals of each type in a population, and x1, x2, x3, ... are the numbers of successes, i.e. the numbers of individuals in our random sample (circled) belonging to each category.

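This results in the probability distribution for {s}:

$$ f(\{s\}) = \frac{\prod_{i=1}^{j} \binom{D_i}{s_i}}{\binom{M}{n}} $$

where j is the number of categories, s1, s2, ..., sj (the x1, x2, ... of the figure) are the numbers sampled from each category, M = D1 + D2 + ... + Dj is the group size, and n = s1 + s2 + ... + sj is the sample size.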
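The multivariate formula can be evaluated the same way, as a product of one binomial coefficient per category divided by the number of ways of choosing n from M. This is a minimal Python sketch with a hypothetical function name `multi_hypergeo_pmf` (not the ModelRisk API), applied to the nationality example with {D} = {3, 2, 1, 4}.

```python
from math import comb, prod

def multi_hypergeo_pmf(s, D):
    """f({s}) = prod_i C(D_i, s_i) / C(M, n), with M = sum(D) and n = sum(s)."""
    if len(s) != len(D):
        raise ValueError("s and D must have the same length")
    if any(x < 0 for x in s):
        return 0.0
    M, n = sum(D), sum(s)
    # math.comb(D_i, s_i) is 0 when s_i > D_i, so impossible combinations get 0.
    return prod(comb(d, x) for d, x in zip(D, s)) / comb(M, n)

D = [3, 2, 1, 4]  # German, English, French, Canadian
print(multi_hypergeo_pmf([3, 1, 0, 0], D))
print(multi_hypergeo_pmf([2, 0, 2, 0], D))  # only one French person, so 0.0
```

With only two categories this reduces to the univariate Hypergeometric formula, since C(M - D, n - s) is then the second factor of the product.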

## ModelRisk functions added to Microsoft Excel for the Multivariate Hypergeometric distribution

VoseMultiHypergeo generates random values from this distribution for Monte Carlo simulation
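To illustrate the kind of random vector such a function returns, the sketch below draws n individuals without replacement from an expanded population and counts them per category. It is a naive illustration under the assumption that only the output distribution matters, not ModelRisk's algorithm; `multi_hypergeo_sample` is a hypothetical name. For large groups, a common alternative avoids building the population by drawing the categories one after another from univariate Hypergeometric distributions conditioned on the earlier draws.

```python
import random
from collections import Counter

def multi_hypergeo_sample(n, D, rng=random):
    """Draw one random vector {s}: sample n individuals without replacement
    from a population with D[i] individuals in category i, then count them."""
    population = [i for i, d in enumerate(D) for _ in range(d)]
    chosen = rng.sample(population, n)
    counts = [0] * len(D)
    for i in chosen:
        counts[i] += 1
    return counts

rng = random.Random(42)  # seeded so runs are reproducible
D = [3, 2, 1, 4]         # German, English, French, Canadian
draws = [tuple(multi_hypergeo_sample(4, D, rng)) for _ in range(100_000)]
freq = Counter(draws)
# Monte Carlo estimate; compare with the exact probability 2/210 for (3, 1, 0, 0).
print(freq[(3, 1, 0, 0)] / len(draws))
```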

VoseMultiHypergeoProb returns the probability mass or cumulative distribution function for this distribution

VoseMultiHypergeoProb10 returns the log10 of the probability mass or cumulative distribution function
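A log10 version is presumably useful because, for large groups, the probability of an individual combination can be smaller than the smallest positive double and would round to zero. The sketch below assumes the log10 of the probability mass f({s}) is wanted (the cumulative option is not covered) and works in log space using the log-gamma function; `log_comb` and `multi_hypergeo_log10_pmf` are hypothetical names, not ModelRisk functions.

```python
from math import lgamma, log

LN10 = log(10)

def log_comb(a, b):
    """Natural log of C(a, b) via log-gamma; valid for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def multi_hypergeo_log10_pmf(s, D):
    """log10 of f({s}) = prod_i C(D_i, s_i) / C(M, n), computed as a sum of
    logs so that extremely small probabilities stay representable."""
    if any(x < 0 or x > d for x, d in zip(s, D)):
        return float("-inf")  # probability 0
    M, n = sum(D), sum(s)
    ln_p = sum(log_comb(d, x) for d, x in zip(D, s)) - log_comb(M, n)
    return ln_p / LN10

# A finite log10 value even though the probability itself is far below
# the smallest positive double.
print(multi_hypergeo_log10_pmf([1000, 0, 0, 0], [30_000, 20_000, 10_000, 40_000]))
```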