Fitting a first order parametric distribution to observed data | Vose Software

See also: Fitting distributions to data, Fitting in ModelRisk, Analyzing and using data

This section describes methods of finding the theoretical (parametric) distribution that best fits the observed data. Another section deals with fitting a second order parametric distribution, i.e. a distribution where the uncertainty about the parameters needs to be recognised. A parametric distribution type may be selected as the most appropriate to fit the data for one of three reasons:

  • The mathematics of the distribution corresponds to a model that accurately represents the behaviour of the variable being considered;

  • The distribution to be fitted to the data is well known to closely fit this type of variable;

  • The analyst simply wants to find the theoretical distribution that best fits the data, whatever it may be.

The third option is very tempting, especially when software can automatically attempt fits to a large number of distribution types at the click of an icon. However, it should be used with caution. The analyst must ensure that the fitted distribution covers the full range over which, in theory, the variable being modelled may extend: for example, a four-parameter Beta distribution will not extend past the range of the observed data if its minimum and maximum are set to the minimum and maximum of the observed data. The analyst should also ensure that the discrete or continuous nature of the distribution matches that of the variable, and should be prepared to use a different distribution type in a later model should more data become available, though this may cause confusion when comparing old and new versions of the same model. Finally, it may be difficult to persuade the decision-maker of the validity of the model: an unusual distribution whose parameters have no intuitive interpretation can easily promote distrust of the model itself. The analyst should consider including in the report a plot of the fitted distribution against the observed data to reassure the decision-maker of its appropriateness.
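The range problem described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the variable is taken to lie, in theory, between 0 and 10, the sample is simulated rather than observed, and the names `THEORETICAL_MIN`, `THEORETICAL_MAX`, `fitted_min` and `fitted_max` are hypothetical. It shows only that bounds taken from the data leave part of the theoretical range with zero probability; it does not fit the Beta shape parameters.

```python
import random

random.seed(1)

# Assumed theoretical range of the variable (hypothetical values).
THEORETICAL_MIN, THEORETICAL_MAX = 0.0, 10.0
SPAN = THEORETICAL_MAX - THEORETICAL_MIN

def draw():
    # Simulated "true" process: a Beta(2, 5) variable rescaled to [0, 10].
    return THEORETICAL_MIN + SPAN * random.betavariate(2, 5)

# A small simulated sample standing in for the observed data.
data = [draw() for _ in range(30)]

# If the four-parameter Beta takes its bounds from the data,
# its support is exactly [min(data), max(data)].
fitted_min, fitted_max = min(data), max(data)

# Parts of the theoretical range that the fitted distribution can never reach.
uncovered_below = fitted_min - THEORETICAL_MIN
uncovered_above = THEORETICAL_MAX - fitted_max

# Fraction of fresh draws from the same process that fall outside the fitted
# support, i.e. values the fitted distribution treats as impossible.
new_draws = [draw() for _ in range(10_000)]
outside_fraction = sum(
    1 for x in new_draws if x < fitted_min or x > fitted_max
) / len(new_draws)

print(f"fitted support: [{fitted_min:.3f}, {fitted_max:.3f}]")
print(f"uncovered below: {uncovered_below:.3f}, above: {uncovered_above:.3f}")
print(f"fraction of new draws outside fitted support: {outside_fraction:.4f}")
```

One way to avoid the problem is to fix the minimum and maximum at their theoretical values and estimate only the shape parameters from the data.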

The parameters that make a given distribution type best fit the available data can be determined in several ways. The most common technique is to find the maximum likelihood estimates (MLEs): the parameter values that maximise the joint probability density (or, for a discrete distribution, the joint probability mass) of the observed data. MLEs are very useful both because, for many distributions, they provide a quick way to arrive at the best-fitting parameters, and because they can readily be adapted to censored data. An alternative is to use an optimiser (such as the Microsoft Solver add-in that comes with Excel) to minimise the differences between the cumulative probability curves of the data and the fitted distribution, or to optimise some other measure of goodness of fit. Both approaches yield first order distributions. Fitting second order distributions requires additional techniques for quantifying uncertainty about the parameters, such as the Bootstrap, Bayesian inference and some classical statistics.
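The two fitting approaches can be compared with a minimal Python sketch. It rests on several assumptions that do not come from the text above: the data are simulated from an Exponential distribution with mean 2, the empirical cumulative probability of the i-th smallest observation is taken as i/(n+1), the goodness-of-fit measure is the sum of squared differences between the two cumulative curves, and a simple golden-section search stands in for an optimiser such as Solver. The function names are hypothetical.

```python
import math
import random

random.seed(42)

# Simulated observations (hypothetical data) from an Exponential with mean 2.
data = sorted(random.expovariate(1 / 2.0) for _ in range(200))
n = len(data)

def exp_cdf(x, mean):
    """Cumulative probability of an Exponential distribution with this mean."""
    return 1.0 - math.exp(-x / mean)

def log_likelihood(mean):
    """Joint log density of the data: sum of log(1/mean) - x/mean."""
    return sum(-math.log(mean) - x / mean for x in data)

# Approach 1: maximum likelihood. For the Exponential distribution the MLE
# of the mean has a closed form: the sample mean.
mle_mean = sum(data) / n

# Approach 2: minimise the squared differences between the empirical and
# fitted cumulative probability curves.
def cdf_sse(mean):
    return sum(
        (exp_cdf(x, mean) - (i + 1) / (n + 1)) ** 2
        for i, x in enumerate(data)
    )

def golden_section_minimise(f, lo, hi, tol=1e-8):
    """One-dimensional minimiser; assumes f has a single minimum on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

lsq_mean = golden_section_minimise(cdf_sse, 0.01, 10 * mle_mean)

print(f"MLE of mean:           {mle_mean:.4f}")
print(f"least-squares CDF fit: {lsq_mean:.4f}")
print(f"log-likelihood at MLE: {log_likelihood(mle_mean):.3f}")
print(f"log-likelihood at LSQ: {log_likelihood(lsq_mean):.3f}")
```

The two estimates generally differ slightly because they optimise different criteria: the MLE gives the highest likelihood by construction, while the least-squares fit gives the closest match between the cumulative curves. Both are single best-fit values, so neither conveys any uncertainty about the parameter.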