Modeling correlation

Independent variables may take any value from their distributions irrespective of the values taken by any other variable. For example, if one wished to model the weight of 10 random people queuing to get into a lift, each person's weight is independent of every other person's. In other words, the probability that one variable will take a specific value is unrelated to the value that any of the other variables take.

For dependent variables, the probability that a dependent variable takes a specific value is in some way related to the value that another variable, or other variables, take. For example, we might have three distributions representing the time it will take to design (D), code (C) and test (T) a new software application that our company is writing:

We might argue that the longer the time required to design the software, the more complicated it turned out to be, and thus the longer it would take to code. In that case, D and C will tend to either both be large, both be small, or both be in the middle:

If we were to plot out random scenarios from these two distributions, they might look like this:

The figure below plots distributions of values of the code time C generated when D is in a low range (between 7 and 8) and again when D is in a high range (between 32 and 33). These are called conditional distributions for C: they are conditioned on the value that D takes.
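As a rough illustration of conditioning, the sketch below simulates (D, C) scenarios and then keeps only the C values whose D falls in a chosen range. The triangular distribution for D, the linear-plus-noise dependence of C on D, and the function name conditional_values are all assumptions made for this example; they are not taken from the model described above.

```python
import random
from statistics import mean


def conditional_values(scenarios, d_low, d_high):
    """Return the C values of all (D, C) scenarios whose D lies in [d_low, d_high]."""
    return [c for d, c in scenarios if d_low <= d <= d_high]


rng = random.Random(1)
scenarios = []
for _ in range(20000):
    d = rng.triangular(5.0, 40.0, 15.0)  # hypothetical design-time distribution
    c = d + 5.0 + rng.gauss(0.0, 3.0)    # hypothetical dependence of C on D
    scenarios.append((d, c))

# Conditional samples of C for a low and a high range of D.
c_given_low_d = conditional_values(scenarios, 7.0, 8.0)
c_given_high_d = conditional_values(scenarios, 32.0, 33.0)
print(mean(c_given_low_d), mean(c_given_high_d))
```

Histograms of c_given_low_d and c_given_high_d would approximate the two conditional distributions for C.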

There are several different ways to model correlation:

Rank order correlation

Most Monte Carlo add-ins for Excel offer rank order correlation as a quick way of sampling two or more distributions so that the generated values are correlated. It is quick and easy, but non-intuitive, and works by correlating variables through their random number generation. Because rank order correlation does not model the direction of the influence, one does not have to specify which variable is dependent on which.
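One way to picture rank order correlation is as a reordering of samples: each variable keeps exactly the values drawn from its own distribution, but the values are paired so that their ranks follow a correlated pattern. The sketch below does this with correlated normal scores. It is a simplified illustration, not necessarily the algorithm any particular add-in uses; the triangular distributions and the target of 0.8 are assumptions, and the rank correlation actually achieved is only approximately equal to the target.

```python
import random


def ranks(values):
    """Return the 0-based rank of each value (ties are not expected for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    centre = (len(x) - 1) / 2.0
    cov = sum((a - centre) * (b - centre) for a, b in zip(rx, ry))
    var = sum((a - centre) ** 2 for a in rx)
    return cov / var


def rank_correlate(sample_x, sample_y, rho, rng):
    """Reorder two samples so that their ranks follow correlated normal scores."""
    n = len(sample_x)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rho * a + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in z1]
    xs, ys = sorted(sample_x), sorted(sample_y)
    # The i-th scenario receives the value whose rank matches the i-th score's rank.
    return [xs[r] for r in ranks(z1)], [ys[r] for r in ranks(z2)]


rng = random.Random(42)
d_raw = [rng.triangular(5.0, 40.0, 15.0) for _ in range(2000)]  # hypothetical D
c_raw = [rng.triangular(8.0, 50.0, 20.0) for _ in range(2000)]  # hypothetical C
d_corr, c_corr = rank_correlate(d_raw, c_raw, 0.8, rng)
print(spearman(d_raw, c_raw), spearman(d_corr, c_corr))
```

Because only the ordering changes, the marginal distributions of D and C are untouched, which is why neither variable has to be declared the dependent one.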

Envelope method

In this method, the dependent variable is modeled by a distribution whose parameters are functions of the independent variable. It is well suited to modeling expert opinion of correlated variables, and is easy to use and check. It can model one-to-many relationships, but is difficult to adapt to many-to-many relationships, which would require determining a logical sequence of relationships.
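A minimal sketch of the envelope idea follows, assuming the code time C is triangular with a minimum, most likely value and maximum that are each linear functions of the design time D. The coefficients and the function name sample_code_time are invented for illustration; in practice they would come from expert opinion or from fitting lines to the edges of observed data.

```python
import random


def envelope(design_time):
    """Return hypothetical (min, mode, max) for C as linear functions of D."""
    low = 0.8 * design_time + 2.0
    mode = 1.0 * design_time + 5.0
    high = 1.4 * design_time + 10.0
    return low, mode, high


def sample_code_time(design_time, rng):
    """Sample C from a triangular distribution bounded by the envelope at this D."""
    low, mode, high = envelope(design_time)
    return rng.triangular(low, high, mode)


rng = random.Random(7)
pairs = []
for _ in range(1000):
    d = rng.triangular(5.0, 40.0, 15.0)  # hypothetical independent variable D
    pairs.append((d, sample_code_time(d, rng)))
```

The direction of influence is explicit here: D is sampled first and C is sampled from a distribution that depends on it.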

Using lookup tables

This method modifies a distribution, or selects from different distributions, to model a variable according to the value generated for the variable that influences it. The lookup table method is well suited to modeling expert opinion of correlated relationships, and can model one-to-many influences, but is difficult to adapt to many-to-many influences, which would require defining a sequence of influence.
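The lookup table approach can be sketched as a table that maps bands of the influencing variable to distribution parameters. In the example below the band boundaries, the triangular parameters and the names are all hypothetical; the point is only that the value of D selects which distribution C is drawn from.

```python
import bisect
import random

# Upper boundaries of the first two D bands; the last band is open-ended.
D_BREAKPOINTS = [15.0, 25.0]

# One hypothetical (min, mode, max) row for C per band of D.
C_PARAMETERS = [
    (5.0, 10.0, 18.0),   # D below 15
    (12.0, 20.0, 30.0),  # D from 15 up to (not including) 25
    (22.0, 32.0, 48.0),  # D of 25 or more
]


def lookup_parameters(design_time):
    """Return the triangular parameters for C in the band containing this D."""
    return C_PARAMETERS[bisect.bisect_right(D_BREAKPOINTS, design_time)]


def sample_code_time(design_time, rng):
    """Sample C from the distribution selected by the lookup table."""
    low, mode, high = lookup_parameters(design_time)
    return rng.triangular(low, high, mode)


rng = random.Random(3)
example = sample_code_time(20.0, rng)  # drawn from the middle row
```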

Conditional logic

There are various functions (e.g. IF(), AND(), OR()) in Excel that allow one to build logic that makes a cell switch between values according to the values of other cells. We can capitalise on these features to build up relationships between our model variables.

Copulas

Copulas have received a great deal of attention in recent years, especially in the insurance and finance fields. A copula is a multivariate distribution whose marginal distributions are all Uniform(0,1). If we generate values from a copula, we can then use those values to generate samples from univariate distributions by the inversion method, which imposes a correlation structure on the variables. Like rank order correlation, copulas do not model the direction of the influence, so one does not have to specify which variable is dependent on which. ModelRisk offers the most popular copulas.
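The two steps described above (draw correlated Uniform(0,1) values from a copula, then invert each marginal distribution) can be sketched with a bivariate Gaussian copula, used here only because it is simple to write with the Python standard library. The triangular marginals, the parameter 0.8 and the function names are assumptions for illustration, and this is not a description of how ModelRisk implements its copulas.

```python
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def gaussian_copula_pair(rho, rng):
    """Draw one (u1, u2) pair from a bivariate Gaussian copula with parameter rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    # The normal CDF maps each correlated score to a Uniform(0,1) value.
    return STANDARD_NORMAL.cdf(z1), STANDARD_NORMAL.cdf(z2)


def triangular_inv_cdf(u, low, mode, high):
    """Inverse CDF of a Triangular(low, mode, high) distribution (inversion method)."""
    split = (mode - low) / (high - low)
    if u < split:
        return low + (u * (high - low) * (mode - low)) ** 0.5
    return high - ((1.0 - u) * (high - low) * (high - mode)) ** 0.5


rng = random.Random(5)
uniform_pairs = [gaussian_copula_pair(0.8, rng) for _ in range(2000)]
samples = [
    (triangular_inv_cdf(u1, 5.0, 15.0, 40.0),   # hypothetical marginal for D
     triangular_inv_cdf(u2, 8.0, 20.0, 50.0))   # hypothetical marginal for C
    for u1, u2 in uniform_pairs
]
```

Swapping in a different copula changes only gaussian_copula_pair; the inversion step and the marginal distributions stay the same.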
