 Some Poisson models | Vose Software

# Some Poisson models

Lightning strikes, car accidents, machine failures, political crises and disease outbreaks are all random events in time that can be thought of as independent of each other. Daisies on a lawn, bacteria in a liquid, mould in a silo and diamonds in a rock can all be thought of as random events in two-dimensional (surface) or three-dimensional (volume) space.

The most common approach to modelling the distribution of how many of these events, a, might occur in a given amount of time or space t is to assume that the events come from a Poisson process, in which case the counts follow a Poisson distribution:

Counts a = Poisson(l*t)

where l is the mean (expected) number of events that would occur per unit t. Care needs to be taken with the units of l and t to ensure that they match.  The product l*t is the expected number of events over the period t and is sometimes called the Poisson intensity.
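The unit check described above can be made explicit in code. The following is a minimal Python sketch, not part of ModelRisk: the function names and the example rate of 2 events per week are illustrative assumptions.

```python
import math

def poisson_intensity(rate, rate_unit_days, exposure_days):
    """Return l*t after converting the exposure into the rate's own time unit.

    rate           -- expected events per `rate_unit_days` days (l)
    rate_unit_days -- length of the time unit that the rate refers to, in days
    exposure_days  -- length of the exposure period, in days
    """
    t = exposure_days / rate_unit_days  # exposure measured in units of the rate
    return rate * t

def poisson_pmf(k, intensity):
    """P(count = k) for a Poisson distribution with mean `intensity`."""
    return math.exp(-intensity) * intensity ** k / math.factorial(k)

# Hypothetical example: 2 events per week, observed for 14 days.
intensity = poisson_intensity(rate=2.0, rate_unit_days=7, exposure_days=14)
print(intensity)                  # 4.0, not 2 * 14 = 28
print(poisson_pmf(4, intensity))  # probability of exactly 4 events
```

Passing the rate's time unit alongside the rate is one way to keep l and t from being multiplied in mismatched units.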

In a Poisson process, there is a continuous and constant opportunity for an event to occur. The Poisson process can be used to model the number of events that occur in either time or space. For example, if there is a constant opportunity that a person receives an e-mail with a virus, then the number of viruses a that the person will receive within a year could be modelled as:

a = Poisson (l * t )

where l is the expected number of infected e-mails per period of time (one day, for example), and t is the number of periods (365 days in this case).

The same applies to the number of events in space. If bacteria were randomly distributed in a vat of liquid, and not dying or multiplying, the number of bacteria consumed a by drinking from that vat would follow a Poisson process, where the measure of exposure would be the amount of liquid consumed:

a = Poisson (l * t )

where l is the expected number of bacteria per unit of volume (one cm³, for example), and t is the number of cubic centimetres of liquid consumed by a person.

ModelRisk provides a Poisson function =VosePoisson(l) with just one parameter: the Poisson intensity, which it calls l. We prefer to separate l (the expected count per unit of exposure) and t (the amount of exposure) as it helps to avoid some common confusion over units, but the result is the same.

Excel offers a function POISSON( ) that calculates probabilities for the required Poisson distribution. This sometimes causes confusion as the two functions (Excel and ModelRisk) have similar names, but they are easy to distinguish as their formats are quite different:

= POISSON(x, l, 0/1_toggle)

=VosePoisson(l)

The VosePoisson function returns random integer samples from the distribution, whereas the POISSON function returns a probability.
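The difference between a probability function and a sampling function can be illustrated outside the spreadsheet. The sketch below is a Python analogy only: `poisson_prob` and `poisson_sample` are hypothetical names, not the Excel or ModelRisk implementations, and the sampler uses Knuth's multiplication method, which is only suitable for small means.

```python
import math
import random

def poisson_prob(x, mean, cumulative):
    """Return P(X = x) if cumulative is falsy, otherwise P(X <= x)."""
    def pmf(k):
        return math.exp(-mean) * mean ** k / math.factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

def poisson_sample(mean, rng=random):
    """Draw one random integer count with Knuth's multiplication method.

    Only suitable for small means: exp(-mean) underflows for large values.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

print(poisson_prob(2, 1.8, 0))  # a probability between 0 and 1
print(poisson_prob(2, 1.8, 1))  # a cumulative probability
print(poisson_sample(1.8))      # a random non-negative integer
```

The first function is deterministic and always returns the same value for the same arguments; the second returns a different integer on each call.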

Two more useful formulae: from the probability mass equation for the Poisson distribution, we get:

Probability of zero counts = EXP(-l*t) = POISSON(0,l*t,0)

Probability of at least one count = 1 - EXP(-l*t) = 1 - POISSON(0,l*t,0)
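These two formulae translate directly into code. The following Python sketch uses hypothetical function names and example values; `math.expm1` is used for the second formula because it stays accurate when l*t is very small.

```python
import math

def prob_zero_counts(l, t):
    """P(no events) = exp(-l*t)."""
    return math.exp(-l * t)

def prob_at_least_one(l, t):
    """P(one or more events) = 1 - exp(-l*t), computed stably for small l*t."""
    return -math.expm1(-l * t)

# Hypothetical values: 0.5 events per unit of exposure, 2 units of exposure.
print(prob_zero_counts(0.5, 2.0))
print(prob_at_least_one(0.5, 2.0))
```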

## Examples of Poisson count modelling

## Example 1

An insurance company has sold 230 000 car insurance policies for the next year. Last year, there were 0.045 claims/car insured/year, and this rate is expected to continue for next year. How many insurance claims will there be next year?

Ignoring our uncertainty about l (see here), we would proceed as follows:

l  = 0.045 claims/car insured/year

t = 230 000 insured car years

Claims = Poisson(230 000 * 0.045) = Poisson(10350)

This Poisson distribution is almost exactly a Normal(l*t, √(l*t)) = Normal(10350, √10350) distribution - see Central Limit Theorem for an explanation. It is interesting to note that there is not much randomness about the 10 350 value: the distribution varies between about 10 000 and 10 700. This is because a Poisson distribution has a standard deviation equal to the square root of its mean, so a Poisson(10 350) distribution has a standard deviation of about 102: less than 1% of its mean. If the insurance company had sold just a few policies, so that it could expect 50 claims in a year for example, the standard deviation would have been about 7, or about 14% of the mean. The stability that comes from large numbers of claims enables insurance companies to predict their expenditure accurately and therefore offer very competitive policies.
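The figures in this example can be checked numerically. The sketch below is an illustration in Python with hypothetical function names; the Poisson probabilities are computed in log space because EXP(-10350) underflows in ordinary floating-point arithmetic.

```python
import math

def poisson_pmf_log_space(k, mean):
    """Poisson pmf computed via logarithms so that large means do not underflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

l = 0.045      # claims per insured car per year (from the example)
t = 230_000    # insured car years (from the example)
mean = l * t
sd = math.sqrt(mean)

print(f"mean = {mean:.0f}, sd = {sd:.1f}, sd/mean = {sd / mean:.2%}")

# Compare the Poisson pmf with the Normal density at a few claim counts.
for k in (10_150, 10_350, 10_550):
    print(k, poisson_pmf_log_space(k, mean), normal_pdf(k, mean, sd))

# The same ratio for a small portfolio expecting 50 claims.
small_mean = 50
print(f"sd/mean for 50 expected claims = {math.sqrt(small_mean) / small_mean:.1%}")
```

The two columns printed in the loop should agree closely, which is the Normal approximation referred to above.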

## Example 2

In a factory, it is estimated that there will be about 0.03 injuries/person/1000 hours worked. There are 20 machines, each manned by one person. The machines run 12 hours/day, 250 days per year. How many injuries will there be in the next year?

l = 0.03 injuries/person/1000 hours worked = 0.00003 injuries/person hour worked

t = 20*12*250 = 60000 person hours of work

Injuries = Poisson(0.00003*60000) = Poisson(1.8)

The probability that there will be no injuries next year = EXP(-1.8) ≈ 17%, and thus the probability of at least one injury in the year is about 83%.
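The arithmetic of this example can be reproduced as follows. This is a minimal Python sketch using the rate of 0.03 injuries per person per 1000 hours that appears in the calculation; the variable names are illustrative.

```python
import math

rate_per_1000_hours = 0.03            # injuries per person per 1000 hours worked
l = rate_per_1000_hours / 1000        # injuries per person hour worked
t = 20 * 12 * 250                     # person hours of work in the year
mean = l * t                          # Poisson intensity for the year

def poisson_pmf(k, mean):
    """P(count = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

p_none = poisson_pmf(0, mean)
print(f"expected injuries = {mean:.1f}")
print(f"P(no injuries) = {p_none:.1%}, P(at least one) = {1 - p_none:.1%}")

# Probability of each injury count from 0 to 5.
for k in range(6):
    print(k, round(poisson_pmf(k, mean), 4))
```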

## Extension to the model to account for seasonality

The rate at which things occur in time is often seasonal. For example, car accidents occur more during rush hour (a daily seasonality), at the beginning of summer holidays (a yearly seasonality), from Monday to Friday when people work (a weekly seasonality), etc. If the time being considered means that any seasonality will be averaged out, we can ignore it. For example, if we wanted to estimate insurance claims next year, all daily, weekly and yearly seasonality are averaged out in a yearly estimate of l. However, if we wanted to forecast for just one month, it would be important to know which month: is it a winter month, in which case there may be more accidents due to snow or ice, etc.

There is a probability identity that says that Poisson(a) + Poisson(b) = Poisson(a+b) for independent Poisson variables. So, if we calculate the rates l1, l2, ..., ln for, say, n periods each of length t, the counts for the total n*t period can be modelled as:

Poisson(l1*t + l2*t + ... + ln*t) = Poisson(t * (l1 + l2 + ... + ln))

In other words, we simply have to sum the ls for each period to get a total l for the entire n*t period and use one Poisson distribution to model the total counts.
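The additivity of independent Poisson counts can be verified numerically by convolving the per-period distributions and comparing the result with a single Poisson distribution. The Python sketch below uses three hypothetical seasonal rates; the names and values are assumptions made for illustration.

```python
import math

def poisson_pmf(k, mean):
    """P(count = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def convolve(p, q):
    """Distribution of the sum of two independent counts with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

rates = [3.0, 1.5, 0.5]   # hypothetical l for each of three periods
t = 1.0                   # length of each period
n_terms = 40              # counts 0..39 are compared

# Sum of the separate per-period Poisson counts, built up by convolution.
total = [1.0]
for l in rates:
    total = convolve(total, [poisson_pmf(k, l * t) for k in range(n_terms)])

# One Poisson distribution using the summed rates.
direct = [poisson_pmf(k, sum(rates) * t) for k in range(n_terms)]

max_diff = max(abs(a - b) for a, b in zip(total, direct))
print(max_diff)  # should be at floating-point rounding level
```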

## Extension to the model to account for 'over-dispersion'

Sometimes historic observations in equal-length periods have a wider distribution than a fitted Poisson distribution would suggest. If that 'over-dispersion' cannot be accounted for by seasonal variations in the Poisson intensity, then it could be that the Poisson intensity is itself varying randomly. A rather neat result is that, if one models the random variation of l as a Gamma(a,b) distribution, the resultant distribution of counts is a Negative Binomial distribution. So:

=VosePoisson(VoseGamma(a,b)) is the same as = VoseNegBin(a, 1/(1+b))

This is very useful because a Gamma distribution can take a wide variety of right-skewed shapes: from an Exponential, through a Lognormal type of shape, to a Normal distribution, so the NegBin gives us quite some flexibility.
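The Gamma-Poisson identity can be checked numerically. The Python sketch below assumes that b is the scale parameter of the Gamma distribution (so its mean is a*b) and that the Negative Binomial counts failures before the a-th success; these parameterisation assumptions, the function names and the values a = 2, b = 1.5 are illustrative and not taken from ModelRisk documentation.

```python
import math

def poisson_pmf(k, mean):
    """P(count = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def gamma_pdf(x, a, b):
    """Gamma density with shape a and scale b (assumed parameterisation)."""
    return x ** (a - 1) * math.exp(-x / b) / (math.gamma(a) * b ** a)

def negbin_pmf(k, s, p):
    """P(k failures before the s-th success); s may be non-integer."""
    log_coeff = math.lgamma(k + s) - math.lgamma(k + 1) - math.lgamma(s)
    return math.exp(log_coeff + s * math.log(p) + k * math.log(1.0 - p))

def mixture_pmf(k, a, b, upper=60.0, steps=60_000):
    """Integrate Poisson(k; x) * Gamma(x; a, b) over x with the midpoint rule.

    The fixed upper limit is adequate for the small example below only.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += poisson_pmf(k, x) * gamma_pdf(x, a, b)
    return total * h

a, b = 2.0, 1.5  # hypothetical Gamma parameters
for k in range(6):
    print(k, mixture_pmf(k, a, b), negbin_pmf(k, a, 1.0 / (1.0 + b)))
```

Under these assumptions the two printed columns should agree to within the error of the numerical integration.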

## Example 3

In 79 weeks, a company has observed 21 transaction failures.

Each transaction failure could cost:

| Cost (£) | Probability |
|---|---|
| 70 | 70% |
| 120 | 20% |
| 190 | 10% |

Question: What is the cost for these transaction failures next year?

Example model Transaction failures shows the solution to this problem.
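One possible way to approach this question is a simple Monte Carlo simulation. The Python sketch below is not the referenced example model: it assumes a point estimate l = 21/79 failures per week (ignoring uncertainty about l), a year of 52 weeks, and independent costs per failure, and all names are hypothetical.

```python
import math
import random

WEEKS_OBSERVED = 79
FAILURES_OBSERVED = 21
WEEKS_NEXT_YEAR = 52          # assumption: next year has 52 weeks of exposure
COSTS = [70, 120, 190]        # cost per failure (£)
PROBS = [0.7, 0.2, 0.1]       # probability of each cost

def poisson_sample(mean, rng):
    """Draw one Poisson count with Knuth's method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_annual_cost(rng):
    """Simulate one year's total cost of transaction failures."""
    l = FAILURES_OBSERVED / WEEKS_OBSERVED      # point estimate, failures/week
    n_failures = poisson_sample(l * WEEKS_NEXT_YEAR, rng)
    return sum(rng.choices(COSTS, weights=PROBS, k=n_failures))

rng = random.Random(0)
results = sorted(simulate_annual_cost(rng) for _ in range(10_000))
mean_cost = sum(results) / len(results)
expected_cost_per_failure = sum(c * p for c, p in zip(COSTS, PROBS))
analytic_mean = FAILURES_OBSERVED / WEEKS_OBSERVED * WEEKS_NEXT_YEAR * expected_cost_per_failure

print(f"simulated mean cost = {mean_cost:.0f}, analytic mean = {analytic_mean:.0f}")
print(f"5th to 95th percentile: {results[500]} to {results[9500]}")
```

The simulated mean can be compared with the analytic mean (expected number of failures multiplied by the expected cost per failure) as a sanity check; the percentiles give an idea of the spread that a single expected value hides.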