Time series projection of events occurring randomly in time

Many things we are concerned about occur randomly in time: people arriving at a queue (customers, emergency patients, telephone calls into a centre, etc.), accidents, natural disasters, shocks to a market, terrorist attacks, particles passing through a bubble chamber (a physics experiment), and so on. Naturally we may want to model these over time, perhaps to figure out whether we will have enough stock, vaccine, storage space, etc.

The natural contender for modelling random events is the Poisson distribution, which returns the number of random events occurring in time t when λ events are expected per unit time within t. Often we might think that the expected number of events may increase or decrease over time, so we make λ a function of t.

For example, we could use the following equation:

St=Poisson(m*t+c)

The example model Poisson_random_walk.xls illustrates this process.
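As a rough illustration of the equation above (not a reproduction of the spreadsheet model), the following Python sketch simulates St=Poisson(m*t+c) using only the standard library. The helper names `poisson_sample` and `linear_poisson_series` and the parameter values are assumptions made for this example. The Poisson count is drawn by counting exponential inter-arrival times that fall within one unit of time, which is one of several valid ways to sample it.

```python
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count by counting exponential
    inter-arrival times that fit inside one unit of time."""
    if lam < 0:
        raise ValueError("Poisson mean must be non-negative")
    if lam == 0:
        return 0
    count = 0
    elapsed = rng.expovariate(lam)  # time of the first event
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)  # time of the next event
    return count


def linear_poisson_series(m, c, periods, seed=0):
    """Return a list of (t, mean, count) for S_t = Poisson(m*t + c),
    with t running from 1 to periods."""
    rng = random.Random(seed)
    series = []
    for t in range(1, periods + 1):
        mean = m * t + c
        series.append((t, mean, poisson_sample(mean, rng)))
    return series


if __name__ == "__main__":
    # Hypothetical slope and intercept, chosen only for illustration.
    for t, mean, count in linear_poisson_series(m=0.5, c=5.0, periods=10):
        print(f"t={t:>2}  expected={mean:5.1f}  simulated={count}")
```

Plotting the expected value next to the simulated counts makes it easy to see how much of the movement in the series is trend and how much is Poisson randomness.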

This model can be used, for example, to describe vehicle accident claims made to an insurance company, or cases of a disease for a health authority: as the number of cars increases, the number of car crashes increases correspondingly according to some function; as the pollution level in a city increases, the number of people with respiratory disease increases.

The fractional variation of the series is much bigger in the left panel than in the right panel. This is because the standard deviation of a Poisson(λ) count equals √λ, so the coefficient of variation (std. dev./mean) is 1/√λ, which gets smaller as λ gets bigger: the larger the expected number of events, the smaller the fractional variation one would observe. This property of a Poisson process is very useful to insurance companies: the more people they cover, the more stable their liabilities become, and the less margin they need to cover themselves at a certain risk level. It is an example of when big is actually better.
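The 1/√λ relationship can be checked with a few lines of arithmetic. The sketch below is only an illustration: the claim rate per policy and the portfolio sizes are hypothetical values, and `poisson_cv` is a name invented for this example.

```python
import math


def poisson_cv(lam):
    """Coefficient of variation (std. dev. / mean) of a Poisson(lam) count."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return math.sqrt(lam) / lam  # equals 1 / sqrt(lam)


CLAIM_RATE = 0.05  # hypothetical expected claims per policy per year

for policies in (100, 10_000, 1_000_000):
    lam = policies * CLAIM_RATE  # expected claims for the whole portfolio
    print(f"{policies:>9} policies: expected claims {lam:>8.0f}, "
          f"CV {poisson_cv(lam):.4f}")
```

Each hundredfold increase in the expected number of claims reduces the coefficient of variation by a factor of ten.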

The equation St=Poisson(m*t+c) has some limitations in that if m is negative then after time T=-c/m the equation will produce negative (i.e. impossible) values for the Poisson mean. If one is approaching such a situation it is worth considering the following equation, which is the basis of Poisson regression techniques:

St=Poisson(EXP(m*t+c))

i.e. ln(λ)=m*t+c
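The difference between the two forms is easy to see numerically. In the following sketch the parameter values are hypothetical and are chosen so that both forms start with an expected 5 events at t = 0. Note that m and c do not mean the same thing in the two equations: in the log-linear form m is approximately the fractional change per unit time.

```python
import math


def linear_mean(m, c, t):
    """Expected count under S_t = Poisson(m*t + c); can become negative."""
    return m * t + c


def log_linear_mean(m, c, t):
    """Expected count under S_t = Poisson(EXP(m*t + c)); always positive."""
    return math.exp(m * t + c)


# Hypothetical parameters for illustration only.
LIN_M, LIN_C = -0.5, 5.0            # linear mean crosses zero at t = -c/m = 10
LOG_M, LOG_C = -0.1, math.log(5.0)  # log-linear mean decays but stays positive

for t in (0, 5, 10, 15):
    lin = linear_mean(LIN_M, LIN_C, t)
    status = "valid" if lin >= 0 else "invalid: negative mean"
    log_lin = log_linear_mean(LOG_M, LOG_C, t)
    print(f"t={t:>2}  linear={lin:6.2f} ({status})  log-linear={log_lin:.3f}")
```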

A variation of this model is to take account of seasonality by multiplying the expected number of events by seasonal indices (which should average to 1).

Seasonality for lambda

Imagine that an insurance company needs to create a risk analysis model of the number of car crashes that will occur in the country in the next 52 weeks. A reasonable assumption (which can be checked by analyzing the historic data) is that the number of car crashes n(t) over a period of time follows a Poisson process, i.e. each car crash is independent of any other. This is, of course, not exactly true, since many car crashes involve at least two cars, and sometimes more than 10, though probably not all insured by the same company. Here we will ignore this small inaccuracy, so:

n(t) = Poisson(λ(t))

The Poisson intensity parameter λ(t) is the mean, or expected, number of events per unit time. In this model it is not constant throughout the year because of two factors:

The trend factor. The number of crashes depends on the number of cars in the country. Let's assume that the number of cars will grow by 15% within a period of one year. Since the correlation between the two quantities is probably not perfect, the number of car crashes is expected to increase by 10% over the same period.
The seasonality factor. The number of car crashes increases in the winter season for reasons such as slippery roads and low visibility, and around certain yearly events like summer holidays, Christmas, etc. Seasonality is a repeated underlying pattern (perhaps disguised by overlying randomness) from one year to the next.

We can model seasonality as follows:

λ(t) = f(t) * Si

where f(t) is a trend function and Si is the seasonality factor for period i.

The example model Poisson_series illustrates this technique.
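To make the construction of the intensity concrete, here is a minimal Python sketch of a trend multiplied by seasonal indices that average to 1. Only the 10% annual increase comes from the text. The base level of 1000 crashes per week, the linear shape of the trend, and the cosine-shaped indices peaking in week 1 are all assumptions for illustration; real indices would be estimated from historic data.

```python
import math

WEEKS = 52
ANNUAL_GROWTH = 0.10           # from the text: crashes rise 10% over the year
BASE_WEEKLY_CRASHES = 1000.0   # hypothetical starting level


def trend(week):
    """f(t): expected weekly crashes before seasonality, rising
    linearly so that week 52 is 10% above the base level."""
    return BASE_WEEKLY_CRASHES * (1.0 + ANNUAL_GROWTH * week / WEEKS)


def seasonal_indices():
    """Hypothetical indices S_i: a cosine peaking in week 1,
    rescaled so that the 52 indices average exactly 1."""
    raw = [1.0 + 0.2 * math.cos(2 * math.pi * (week - 1) / WEEKS)
           for week in range(1, WEEKS + 1)]
    mean_raw = sum(raw) / WEEKS
    return [r / mean_raw for r in raw]


def intensity():
    """lambda(t) = f(t) * S_i for weeks 1..52."""
    indices = seasonal_indices()
    return [trend(week) * indices[week - 1] for week in range(1, WEEKS + 1)]


if __name__ == "__main__":
    for week, lam in enumerate(intensity(), start=1):
        if week % 13 == 0 or week == 1:
            print(f"week {week:>2}: expected crashes {lam:7.1f}")
```

Each weekly value of λ(t) would then be passed to a Poisson random number generator to produce the simulated count n(t).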

The Poisson intensity parameter may also include other factors; in fact, as many factors as are needed to give a fair estimate of the mean number of events over a period of time. For example, if the same insurance company were to model the number of deaths among elderly people in a transition-economy country X, λ(t) might consist of the following factors:

The trend factor, which is influenced by the changes in the population size and improvement of medical care;
The seasonality factor. Elderly people tend to die more often in the hottest and coldest seasons, and less often in other seasons; and
The economic factor. As Country X is going through economic hardship, many elderly people are affected by the instability, and their deaths can be caused by factors such as stress, cold (as they are not able to pay for central heating) and malnutrition.

The example model Seasonal_Poisson_random_walk illustrates this.

Using a Polya

The Pólya and Delaporte distributions are counting distributions that are similar to the Poisson but allow λ to be a random variable too. The Pólya is particularly helpful because with one extra parameter, h, we can add some volatility to the expected number of events, as shown in the following model:

Example model Polya_time_series - A Pólya time series with expected intensity as a linear function of time and coefficient of variation of h = 0.3.

Notice the much greater peaks in the plot for this model compared to that of the previous model. Mixing a Poisson with a Gamma distribution to create the Pólya is a helpful tool because we can get the likelihood function directly from the pmf of the Pólya and therefore fit to historical data. If the MLE value for h is very small then the Poisson model will be as good a fit and has one less parameter to estimate, so the Pólya model is a useful first test.
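The Gamma mixing described above can be sketched as follows. This is an illustration rather than the spreadsheet's implementation: it assumes that h is the coefficient of variation of a Gamma-distributed multiplier with mean 1 applied to the expected intensity, so the Gamma has shape 1/h² and scale h². The function names and parameter values are hypothetical.

```python
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count by counting exponential
    inter-arrival times that fit inside one unit of time."""
    if lam <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


def polya_series(m, c, h, periods, seed=0):
    """Return (t, mean, count) where count ~ Poisson(mean * G) and
    G ~ Gamma with mean 1 and coefficient of variation h (assumed form)."""
    rng = random.Random(seed)
    shape = 1.0 / (h * h)  # Gamma shape that gives CV = h
    scale = h * h          # Gamma scale that gives mean = shape * scale = 1
    series = []
    for t in range(1, periods + 1):
        mean = m * t + c
        lam = mean * rng.gammavariate(shape, scale)  # randomised intensity
        series.append((t, mean, poisson_sample(lam, rng)))
    return series


if __name__ == "__main__":
    # Constant mean of 50 so the extra spread is easy to see.
    counts = [n for _, _, n in polya_series(m=0.0, c=50.0, h=0.3, periods=5000)]
    print("sample mean    :", round(statistics.mean(counts), 2))
    print("sample variance:", round(statistics.variance(counts), 2))
    print("Poisson variance would be 50; mixture theory gives 50 + 50**2 * 0.3**2 = 275")
```

Under this parameterisation the variance of a count with mean μ is μ + μ²h², which is why the peaks are so much larger than in the pure Poisson series.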

The linear equation used in the above two models for giving an approximate description of the relationship of the expected number with time is often quite convenient, but one needs to be careful because a negative slope will ultimately produce a negative expected value, which is clearly nonsensical (which is why it is good practice to plot the expected value together with the modelled counts as shown in the two figures above). The more correct Poisson regression model considers the log of the expected value of the number of counts to be a linear function of time, i.e.:

ln(λ) = b₀ + b₁*t + ln(e)

where b₀, b₁ are regression parameters. The ln(e) term in the equation is included for data where the amount of exposure e varies between observations. For example, we might be analyzing data to determine the annual increase in burglaries across a country, where our data are given for different parts of the country with different population levels, or where the population size is changing significantly (so the exposure measure e would be person-years). Where e is constant, ln(e) can be absorbed into b₀ and we can simplify the previous equation to:

ln(λ) = b₀ + b₁*t

The following model fits a Pólya regression to data on annual sports accidents (year <= 0) and projects the next three years. The population is considered constant, so we can use the simplified equation presented above:

Example model Polya_regression - Pólya regression model fitted to data and projected three years into the future. The LogL variable is optimized using Excel's Solver with the constraint that h > 0.
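The fitting step can be sketched outside a spreadsheet as well. The Python example below is a rough sketch, not the example model itself. It assumes the Pólya is a Poisson whose mean is multiplied by a Gamma variable with mean 1 and coefficient of variation h, and it replaces Excel's Solver with a simple hand-written pattern search. The accident counts are made-up numbers used only to exercise the code, and all function names are hypothetical.

```python
import math


def polya_logpmf(n, mean, h):
    """Log pmf of a Gamma-mixed Poisson count with the given mean,
    where the Gamma multiplier has mean 1 and CV h (assumed form)."""
    k = 1.0 / (h * h)  # Gamma shape
    return (math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
            + k * math.log(k / (k + mean))
            + n * math.log(mean / (k + mean)))


def log_likelihood(params, years, counts):
    """LogL for ln(mean) = b0 + b1*t, with the constraint h > 0."""
    b0, b1, h = params
    if h <= 0:
        return -math.inf
    return sum(polya_logpmf(n, math.exp(b0 + b1 * t), h)
               for t, n in zip(years, counts))


def pattern_search(objective, start, step=0.5, tol=1e-6, max_iter=10_000):
    """Maximise objective by trying +/- step on each parameter in turn,
    halving the step whenever no single move improves the value."""
    best = list(start)
    best_val = objective(best)
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = objective(trial)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
            if step < tol:
                break
    return best, best_val


# Made-up illustrative data: years -9..0 and annual accident counts.
years = list(range(-9, 1))
counts = [38, 66, 52, 85, 59, 71, 102, 74, 118, 93]

start = [math.log(sum(counts) / len(counts)), 0.0, 0.3]
start_logl = log_likelihood(start, years, counts)
(b0, b1, h), logl = pattern_search(
    lambda p: log_likelihood(p, years, counts), start)

print(f"b0={b0:.3f}  b1={b1:.3f}  h={h:.3f}  LogL={logl:.2f}")
for t in (1, 2, 3):
    print(f"year {t}: expected accidents {math.exp(b0 + b1 * t):.1f}")
```

A gradient-based or library optimizer would normally be preferable; the pattern search is used here only to keep the sketch free of dependencies. If the fitted h comes out very close to zero, the simpler Poisson regression fits about equally well.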

Read on: Seasonal time series
