Probability distributions used in Tamara | Vose Software

# Probability distributions used in Tamara

Tamara uses the following probability distributions:

• Bernoulli – for modelling a risk event’s occurrence where it may only occur once, including productivity risk factors that may or may not occur

• Poisson – for modelling a risk event’s occurrence where it may occur several times

• Modified PERT – for modelling scope uncertainty and productivity risk factors

• ThreePointEstimate – for modelling the delays incurred by risk events

These distributions are described in detail below.

## The Bernoulli distribution

The Bernoulli distribution is defined by a single parameter p. A random variable following a Bernoulli(p) distribution has probability p of being 1 and probability (1-p) of being 0.

p can take any value from 0 to 1. Tamara uses Bernoulli(p) internally to simulate the occurrence of a risk event, where 1 represents the event occurring and 0 represents the event not occurring. It applies when the user has checked the Single Event box, in which case p is the ‘expected frequency’ (i.e. probability) defined in the risk register.
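As an illustration of the idea (not Tamara's actual code, which is not shown here), a Bernoulli draw can be sketched with Python's standard library. The function name `risk_event_occurs`, the probability 0.2 and the seed are hypothetical choices made for the example.

```python
import random

def risk_event_occurs(p, rng=random):
    """Return 1 (event occurs) with probability p, otherwise 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    # rng.random() is uniform on [0, 1), so the test succeeds with probability p
    return 1 if rng.random() < p else 0

# Hypothetical risk with a 20% chance of occurring, simulated many times
rng = random.Random(42)
draws = [risk_event_occurs(0.2, rng) for _ in range(100_000)]
occurrence_rate = sum(draws) / len(draws)  # should be close to 0.2
```

Over many simulated iterations the fraction of 1s approaches p, which is why the risk register value can be read as a probability when the event can occur only once.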

## The Poisson distribution

The Poisson distribution is defined by a single parameter λ. A random variable following a Poisson(λ) distribution can take any non-negative integer value (0, 1, 2, …) and has a mean of λ.

λ can take any non-negative value. The Poisson(λ) distribution arises naturally as the distribution of the number of events that may occur within a certain period where:

• On average, we expect λ events to occur over the period in question;

• Events occur randomly in time;

• The system has no memory, meaning that the probability of an event occurring in each moment in time is constant and unrelated to how many events have already occurred.

Poisson(λ) is used internally in Tamara to simulate the number of occurrences of a risk event where λ is the expected frequency defined in the risk register and the user has not checked the Single Event box.
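The conditions listed above can be turned into a small simulation. The sketch below is an assumption about one way to generate Poisson counts, not a description of Tamara's internals: it treats the gaps between events as exponentially distributed (the memoryless property) and counts how many events fall within one period. The name `poisson_count` and the expected frequency of 1.5 are hypothetical.

```python
import random

def poisson_count(lam, rng=random):
    """Count events in one period when gaps between events are exponential with rate lam."""
    if lam < 0:
        raise ValueError("the expected frequency must be non-negative")
    if lam == 0:
        return 0  # an expected frequency of zero means the event never occurs
    count = 0
    elapsed = rng.expovariate(lam)  # time of the first event
    while elapsed <= 1.0:           # the period is scaled to length 1
        count += 1
        elapsed += rng.expovariate(lam)  # memoryless gap to the next event
    return count

# Hypothetical risk expected to occur 1.5 times over the period
rng = random.Random(7)
counts = [poisson_count(1.5, rng) for _ in range(50_000)]
mean_count = sum(counts) / len(counts)  # should be close to 1.5
```

Each simulated count is a non-negative integer, and the average over many iterations approaches the expected frequency.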

## The Modified PERT distribution

The Modified PERT distribution was first introduced in Vose (2000)[1]. It is a version of the common PERT distribution, which itself was offered as an alternative to the problematic Triangle distribution.

A Triangle distribution is intuitively appealing as a subjective estimate of the uncertainty of a continuous[2] variable because it is defined by three easily understandable parameters: the minimum, most likely (mode), and maximum values the variable might take. For example, if the time taken to get home from work was between 20 mins (minimum) and 80 mins (maximum), and most likely 30 mins (mode), the estimate would be modelled as Triangle(20,30,80).

The main issue with a Triangle distribution is that it tends to overestimate the probabilities in the tails of the distribution. To correct this, the PERT distribution was created.

The PERT distribution, however, assigns too little weight to the most extreme tail (around 80 in the example above). One way to compare the Triangle and PERT is to look at the equations for their means:

$$\text{Triangle mean} = \frac{\text{Minimum} + \text{Mode} + \text{Maximum}}{3}$$

$$\text{PERT mean} = \frac{\text{Minimum} + 4 \times \text{Mode} + \text{Maximum}}{6}$$

The PERT places 4 times more emphasis on the Mode (which is often fairly well understood) than on the Minimum or Maximum. The Triangle places equal emphasis on all three parameters. The Modified PERT alters this weighting. As a rough rule of thumb, a weighting of 3 works best in practice, so:

$$\text{Modified PERT mean} = \frac{\text{Minimum} + 3 \times \text{Mode} + \text{Maximum}}{5}$$

This places a little more emphasis on the tails, whilst retaining the more natural curved shape of the PERT.
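The three means share one form, (Minimum + w × Mode + Maximum) / (w + 2), with w = 1 for the Triangle, w = 4 for the PERT and w = 3 for the Modified PERT rule of thumb. A short sketch makes the comparison concrete for the journey-home example; the helper name `weighted_mean` is hypothetical.

```python
def weighted_mean(minimum, mode, maximum, mode_weight):
    """Mean of a three-point estimate that weights the mode by mode_weight."""
    return (minimum + mode_weight * mode + maximum) / (mode_weight + 2)

# Journey-home example: minimum 20, mode 30, maximum 80 (minutes)
triangle_mean = weighted_mean(20, 30, 80, 1)       # (20 + 30 + 80) / 3
pert_mean = weighted_mean(20, 30, 80, 4)           # (20 + 120 + 80) / 6
modified_pert_mean = weighted_mean(20, 30, 80, 3)  # (20 + 90 + 80) / 5 = 38.0
```

The Modified PERT mean of 38 minutes sits between the PERT mean (about 36.7) and the Triangle mean (about 43.3), reflecting its intermediate emphasis on the long right-hand tail.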

ModifiedPERT(min, mode, max) is used internally in Tamara to model Work Amount Uncertainty and Productivity Risk Factors.
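A Modified PERT variable is commonly generated as a rescaled Beta distribution whose two shape parameters depend on where the mode sits within the range. The sketch below follows that standard construction with a mode weight of 3; whether Tamara generates values in exactly this way is an assumption, and the name `modified_pert_sample` is hypothetical.

```python
import random

def modified_pert_sample(minimum, mode, maximum, weight=3.0, rng=random):
    """Draw one value from a Modified PERT built on a rescaled Beta distribution."""
    if not minimum <= mode <= maximum or minimum == maximum:
        raise ValueError("require minimum <= mode <= maximum and minimum < maximum")
    span = maximum - minimum
    # Shape parameters: the further the mode is from an end, the lighter that tail
    alpha = 1.0 + weight * (mode - minimum) / span
    beta = 1.0 + weight * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)

# Journey-home example: ModifiedPERT(20, 30, 80)
rng = random.Random(2024)
samples = [modified_pert_sample(20, 30, 80, rng=rng) for _ in range(50_000)]
sample_mean = sum(samples) / len(samples)  # should be near (20 + 3*30 + 80) / 5 = 38
```

Every sample stays within the minimum and maximum, and the sample mean converges on the weighted-mean formula for the chosen weight.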

## The ThreePointEstimate distribution

The Modified PERT distribution requires that one provide the Minimum, Mode and Maximum values for a variable. Although these parameters are intuitive, in practice it turns out to be very difficult to elicit them for the impact of risk events because:

• Such events will probably never, or at least rarely, have occurred in the past and there will be little data to base an estimate on;

• Risk events can have a potentially extremely large maximum impact. Experts providing the estimates often don’t feel comfortable trying to put a number to that maximum, or perhaps even conceiving what the scenario might be.

This becomes a blocking point in the estimation process. Tamara gets round this issue when estimating the delay due to a risk event by asking for the minimum, the mode (most likely value) and the P90 (a value that the expert estimates the risk impact has a 90% chance of being below). The ThreePointEstimate distribution then creates a version of the Modified PERT that satisfies these conditions.

The distribution extrapolates beyond the P90 so that there is a 90% probability of lying below it. For example, with a P90 of 60, the fitted distribution extends past 60 and places 90% of its probability below that value.

Note that the ThreePointEstimate has a finite maximum, in contrast to other distributions like the Lognormal with a similar shape that have an infinite tail. Thus Tamara will not generate scenarios of impact delays that are impossibly long, which would occur occasionally with infinite-tail distributions.
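The fitting procedure Tamara uses is not described here, so the following is only a sketch of one plausible approach: search for the finite maximum at which a Modified PERT (mode weight 3, rescaled Beta form) puts 90% of its probability below the stated P90. The function names, the bisection search, the midpoint-rule integration and the example minimum of 10 and mode of 20 are all assumptions made for illustration; only the P90 of 60 comes from the text above.

```python
import math

def beta_cdf(x, a, b, steps=2000):
    """Approximate the Beta(a, b) CDF at x by midpoint-rule integration (a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th slice, strictly inside (0, 1)
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * h

def modified_pert_cdf(x, minimum, mode, maximum, weight=3.0):
    """CDF of a Modified PERT expressed as a rescaled Beta distribution."""
    span = maximum - minimum
    a = 1.0 + weight * (mode - minimum) / span
    b = 1.0 + weight * (maximum - mode) / span
    return beta_cdf((x - minimum) / span, a, b)

def solve_maximum(minimum, mode, p90, weight=3.0, target=0.9, tol=1e-6):
    """Find the maximum for which P(X <= p90) equals target, by bisection."""
    if not minimum <= mode < p90:
        raise ValueError("require minimum <= mode < p90")
    low = p90                      # a maximum equal to p90 would give probability 1
    high = p90 + (p90 - minimum)   # first guess for an upper bracket
    while modified_pert_cdf(p90, minimum, mode, high, weight) > target:
        high += high - minimum     # widen until the probability drops to the target
    while high - low > tol:
        mid = 0.5 * (low + high)
        if modified_pert_cdf(p90, minimum, mode, mid, weight) > target:
            low = mid              # maximum too small: too much probability below p90
        else:
            high = mid
    return 0.5 * (low + high)

# Hypothetical delay estimate: minimum 10, mode 20, P90 60
fitted_maximum = solve_maximum(10, 20, 60)
probability_below_p90 = modified_pert_cdf(60, 10, 20, fitted_maximum)
```

The fitted maximum lies above the P90 but is finite, which matches the point made above about avoiding impossibly long delays.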

[1] Vose, David (2000). Risk Analysis: A Quantitative Guide. Published by John Wiley and Sons, Chichester, UK.

[2] A continuous variable may take any value within a range. Typical continuous variables are time, distance, volume, weight, and perhaps money, which have an infinitely divisible scale.