Time series

Introduction

Time series projections are used to model variables like import volumes, outbreak numbers, consumption rates, share price, exchange rates and bacterial growth where we are interested in modeling the variable over more than one period. We model these variables over time because it is important to us to know their values at intermediate stages in the history of the variable, not just at one point in time.

For the use and implementation of time series models in ModelRisk, see the Time series in ModelRisk topic.

The time series models that we produce must reflect:

The relationship between the variable's values in successive modelled periods;
Realistic ranges of the variable with time;
Any trends (drift), seasonality, and cyclicity (identifiable, non-periodic events);
The relationship between uncertainty and time (whether it increases or decreases, for example).

The simplest forecasting technique is to use the last value available in a time series as our estimate of all future values. This naive forecast is useful because we can look at the forecasting errors it produces and compare those errors with the errors produced by the other more sophisticated techniques. Clearly, if a more sophisticated and time-consuming technique does not provide us with an appreciable increase in accuracy over the naive forecast, it will not be worth adopting. We should attempt to find the technique that produces the smallest forecasting error for the least effort as our best estimator of the future.

The naive forecast may seem over-simplistic, but it is the most appropriate single point estimate of all methods if the parameter being estimated varies according to a random walk. The simplest random walk is where the (n+1)th term in a series is equal to the nth term plus a movement that has a symmetric, zero-centred probability distribution. Such a series has no memory of the path it took to arrive at the nth value: thus, no seasonal or cyclical patterns or trends exist except by pure chance. There are several other types of random walks, and we will look at some of the most important ones.
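To illustrate the comparison described above, here is a minimal Python sketch that simulates the simplest random walk and measures the one-step-ahead error of the naive forecast against a running-mean forecast. The function names, the Normal step distribution, the starting value of 100 and the choice of the running mean as the competing technique are all assumptions made for illustration; they are not taken from ModelRisk.

```python
import random


def simulate_random_walk(n_periods, start=100.0, step_sd=1.0, seed=None):
    """Additive random walk: each term is the previous term plus a
    symmetric, zero-centred step (a Normal step is assumed here)."""
    rng = random.Random(seed)
    series = [start]
    for _ in range(n_periods):
        series.append(series[-1] + rng.gauss(0.0, step_sd))
    return series


def naive_errors(series):
    """One-step-ahead errors when the forecast for period t is the value at t-1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def running_mean_errors(series):
    """One-step-ahead errors when the forecast for period t is the mean
    of all values observed before t."""
    errors, running_sum = [], 0.0
    for t in range(1, len(series)):
        running_sum += series[t - 1]
        errors.append(series[t] - running_sum / t)
    return errors


def mean_abs_error(errors):
    """Average size of the forecasting errors."""
    return sum(abs(e) for e in errors) / len(errors)


if __name__ == "__main__":
    walk = simulate_random_walk(2000, seed=1)
    print("naive MAE:       ", mean_abs_error(naive_errors(walk)))
    print("running-mean MAE:", mean_abs_error(running_mean_errors(walk)))
```

For a series that really is a random walk, the naive forecast should show the smaller error, because the running mean gives weight to values the series has no memory of.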

A notational convention

We use the following convention for describing a time series: S_t is the value of the time series variable at time t. Thus, for example, a random walk might be expressed as:

S_t = S_{t-1} * Uniform(0.9, 1.1)

This means that the variable S at time t is only dependent on its value in the previous period (t-1), and is between 90% and 110% of its previous value.
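The random walk formula above can be sketched in Python as follows. This is only an illustration of the notation: the function name, the starting value and the number of periods are assumptions, and Python's random.uniform stands in for the Uniform(0.9, 1.1) distribution.

```python
import random


def simulate_multiplicative_walk(s0, n_periods, low=0.9, high=1.1, seed=None):
    """Generate S_0 .. S_n where S_t = S_{t-1} * Uniform(low, high)."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_periods):
        # Each value depends only on the value in the previous period.
        path.append(path[-1] * rng.uniform(low, high))
    return path


if __name__ == "__main__":
    for t, value in enumerate(simulate_multiplicative_walk(100.0, 10, seed=42)):
        print(t, round(value, 2))
```

Each call produces one possible path; in a risk analysis model one would generate many such paths and look at the distribution of S_t in each period.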

Some useful principles

The model's behavior can be checked with embedded Excel x-y scatter plots;
Split the model up into components rather than create long, complicated formulae. That way you'll see that each component is working correctly, and therefore have confidence in the time series projection as a whole;
Be realistic about the match between historic patterns and projections. Don't always go for a forecast model because it fits the data the best - also look at whether there is a logical reason for choosing one model over another;
Be creative. Short-term forecasts (say 20-30% of the historic period for which you have good data) are often adequately produced from a statistical analysis of your data. Even then, be selective about the model. However, beyond that time frame we move into crystal ball gazing. Including your perceptions of where the future may go, possible influencing events, etc., will be just as valid as an extrapolation of historic data.

The properties of a time series forecast

When producing a risk analysis model that forecasts some variable over time, I recommend you go through a list of properties that the variable might exhibit over time, as this will help you both to analyze statistically any past data you have and to select the most appropriate model to use. The properties are: trend, randomness, seasonality, cyclicity or shocks, and constraints.

Trend

Most modeled variables have a general direction - a trend - in which they have been moving, or in which we believe they will move in the future. The four plots below give some examples of the expected value of a variable over time: top left - a steady relative decrease, such as one might expect for sales of an old technology or the number of individuals remaining alive from a group; top right - a steady (straight line) increase, such as is often assumed for financial returns over a reasonably short period (sometimes called 'drift'); bottom left - a steady relative increase, such as bacterial growth or take-up of new technology; and bottom right - a drop turning into an increase, such as the rate of component failures over time (like the bathtub curve in reliability modelling) or advertising expenditure (more at a launch, then lower, then ramping up to offset reduced sales).

Examples of expected value trend over time

Randomness

The second most important property is randomness. The four plots below give some examples of the different types of randomness: top left - a relatively small and constant level of randomness that doesn't hide the underlying trend; top right - a relatively large and constant level of randomness that can disguise the underlying trend; bottom left - a steadily increasing randomness, which one typically sees in forecasting (care needs to be taken to ensure that the extreme values don't become unrealistic); and bottom right - levels of randomness that vary seasonally.

Examples of the behavior of randomness over time
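The difference between constant and steadily increasing randomness can be sketched in a few lines of Python. This is a hedged illustration only: it assumes independent Normal noise added to a trend, and the function name, the trend and the noise levels are hypothetical choices rather than the values used to draw the plots.

```python
import random


def noisy_series(n_periods, trend, noise_sd, seed=None):
    """Return trend(t) plus Normal noise whose standard deviation is noise_sd(t).

    Both trend and noise_sd are functions of the period t, so the level of
    randomness can be constant, increasing, or seasonal."""
    rng = random.Random(seed)
    return [trend(t) + rng.gauss(0.0, noise_sd(t)) for t in range(n_periods)]


def linear_trend(t):
    """Hypothetical expected value: starts at 100 and rises by 2 per period."""
    return 100.0 + 2.0 * t


if __name__ == "__main__":
    constant = noisy_series(24, linear_trend, lambda t: 5.0, seed=1)
    increasing = noisy_series(24, linear_trend, lambda t: 1.0 + 2.0 * t, seed=1)
    for t in range(24):
        print(t, round(constant[t], 1), round(increasing[t], 1))
```

Passing the noise level as a function of time keeps the trend and the randomness as separate components, which makes each one easier to check on its own.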

Seasonality

Seasonality means a consistent pattern of variation of the expected value (but also sometimes its randomness) of the variable. There can be several overlaying seasonal periods but we should usually have a pretty good guess at what the periods of seasonality might be: hour of the day; day of the week; time of the year (summer/winter, for example, or holidays, or end of financial year). The following plot shows the effect of two overlaying seasonal periods. The first is weekly with a period of 7, the second is monthly with a period of 30, which complicates the pattern. Monthly seasonality often occurs with financial transactions that occur on a certain day of the month: for example, volumes of documents that a bank's printing facility must produce each day - at the end of the month they have to churn out bank and credit card statements and get them in the post within some legally defined time.

Expected value of a variable with two overlapping seasonal periods.
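A pattern of this kind can be reproduced with a short Python sketch that overlays a 7-day and a 30-day cycle on a base level. The sinusoidal shape, the base level and the amplitudes are assumptions made purely for illustration; real seasonal patterns, like the month-end printing peak, are rarely this smooth.

```python
import math


def seasonal_expected_value(t, base=100.0, weekly_amp=10.0, monthly_amp=20.0):
    """Expected value on day t with two overlaid seasonal components.

    The weekly component repeats every 7 days and the monthly component
    every 30 days (a fixed-length 'month' is assumed for simplicity)."""
    weekly = weekly_amp * math.sin(2.0 * math.pi * t / 7.0)
    monthly = monthly_amp * math.sin(2.0 * math.pi * t / 30.0)
    return base + weekly + monthly


if __name__ == "__main__":
    for day in range(60):
        print(day, round(seasonal_expected_value(day), 2))
```

With these two periods the combined pattern only repeats exactly every 210 days (the least common multiple of 7 and 30), which is why the plot looks more complicated than either component alone.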

One difficulty in analyzing monthly seasonality from data is that months have different lengths, so one cannot simply investigate a difference each 30 days, say. Another hurdle in analyzing data on variables with monthly and holiday peaks is that there can be some spread of the effect over two or three days.

For example, we performed an analysis recently looking at the calls received into a US insurance company's national call centre to help them optimize how to staff the centre. We were asked to produce a model that predicted call volumes in 15-minute intervals for the next two weeks, and another model to predict out six weeks. We looked at the patterns by individual state and language (Spanish and English). There was a very obvious and stable pattern through the day that was constant during the working week, but had a different pattern on Saturday and on Sunday. The pattern was largely the same between states but different between languages. Holidays like Thanksgiving (the fourth Thursday of November, so not even a fixed date) were very interesting: call rates dropped hugely on the holiday to 10% of the level one would have usually expected, but were slightly lower than normal the day before (Wednesday), significantly lower the day after (Friday), a little lower during the following weekend, and then significantly higher the following Monday and Tuesday (presumably because people were catching up on calls they needed to make). Memorial Day, the last Monday of May, exhibited a similar pattern, as shown in the figure below.

Effect of holidays on daily calls to a call centre. The four lines show the effect in each of the last four years.
Zero on the x-axis is the day of the holiday.

The final models had logic built into them to look for forthcoming holidays and apply these patterns to forecast expected levels, which had a trend by state and a daily seasonality. For the 15-minute models we also had to take into account the time zone of the state, since all calls from around the US were received into one location. That also involved thinking about when states changed their clocks between summer and winter time, and about little peculiarities: some states have two time zones, Arizona doesn't observe daylight saving (to conserve energy used by air-conditioners), etc.

Cyclicity or shocks

Cyclicity is a confusing term, because it sounds rather similar to seasonality, but it refers to the effect of obvious single events on the variable being modelled. For example, the Hatfield rail crash in the UK on 17 October 2000 was a single event with a long-term effect on the UK railway network. The accident was caused by lapsed maintenance of the track, which led to 'gauge corner cracking', resulting in the rail separating. Investigators found many more such cracks in the area, and a temporary speed restriction was imposed over very large lengths of track because of fears that other track might be suffering from the same degradation. The UK network was already at capacity levels, so slowing down trains resulted in huge delays. The cost of repairs to the under-maintained track also sent Railtrack, the company managing the network, into administration. In analyzing the cause of train delays for our client, Network Rail, a not-for-dividend company that took over from Railtrack, we had to estimate and remove the persistent effect of Hatfield.

Another obvious example is 9/11. Anyone who regularly flies on commercial airlines will have experienced the extra delays and security checks. The airline industry was also greatly affected, with several US carriers filing for protection under Chapter 11, though other factors also played a part, like oil price increases and other terrorist attacks (also cyclicity events) which dissuaded people from going abroad. We performed a study to determine what price should be charged for parking at a US national airport, part of which included estimating future demand. Analyzing historic data, it was evident that the effect of 9/11 on passenger levels was quite immediate, and that as of 2006 passenger levels were only just returning to 2000 levels. Since there had previously been consistent growth in passenger numbers, levels still remained far below what would have been predicted before the terrorist attack.

Events like Hatfield and 9/11 are, of course, almost impossible to predict with any confidence. However, other types of cyclicity events are more predictable. As I write this (20 June, 2007), there are seven days left before Tony Blair steps down as Prime Minister of the UK, which he announced on 10 May, and Gordon Brown takes over. Newspaper columnists are debating what changes will come about and, for people in the know, there are probably some predictable elements.

Two examples of the effect of a cyclicity shock. On the left, the shock produces a sudden and sustained increase of the variable; on the right, the shock produces a sudden increase that gradually reduces over time - an exponential decay is often used to model this reduction.
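The two shock shapes described in the caption can be sketched as simple functions of time that are added to the baseline expected value. This is an illustrative sketch under stated assumptions: the function names, shock sizes and decay rate are hypothetical, and reading the gradual reduction as an exponential decay curve is my interpretation of the caption.

```python
import math


def sustained_shock(t, shock_time, size):
    """Step change: zero before the shock, then a permanent shift of `size`."""
    return size if t >= shock_time else 0.0


def decaying_shock(t, shock_time, size, decay_rate):
    """Sudden jump of `size` at shock_time that dies away exponentially."""
    if t < shock_time:
        return 0.0
    return size * math.exp(-decay_rate * (t - shock_time))


if __name__ == "__main__":
    baseline = 100.0  # hypothetical expected value without any shock
    for t in range(30):
        left = baseline + sustained_shock(t, shock_time=10, size=25.0)
        right = baseline + decaying_shock(t, shock_time=10, size=25.0, decay_rate=0.3)
        print(t, round(left, 2), round(right, 2))
```

Keeping the shock as its own component means it can be estimated and removed from historic data, or switched on in a forecast scenario, without touching the trend or seasonality.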

Constraints

Randomly varying time series projections can quite easily produce extreme values far beyond the range that the variable might realistically take. There are a number of ways to constrain a model. Mean reversion, discussed later, will pull a variable back towards its mean so that it is far less likely to produce extreme values. Simple logical bounds like IF(S_t>100,100,S_t) will constrain a variable to remain at or below 100, and one can make the constraining parameter (100) a function of time too.
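The logical bound above, including a constraining parameter that varies with time, might be sketched in Python as follows. The function names, the additive Normal random walk and the particular cap are assumptions for illustration. Note one design choice in this sketch: the constrained value, not the raw value, is carried forward to the next period.

```python
import random


def constrain(value, upper):
    """Python equivalent of the spreadsheet bound IF(S_t>upper, upper, S_t)."""
    return upper if value > upper else value


def capped_walk(s0, n_periods, cap_at, step_sd=5.0, seed=None):
    """Additive random walk whose value in period t may not exceed cap_at(t).

    cap_at is a function of the period, so the bound can change over time."""
    rng = random.Random(seed)
    path = [constrain(s0, cap_at(0))]
    for t in range(1, n_periods + 1):
        raw = path[-1] + rng.gauss(0.0, step_sd)
        path.append(constrain(raw, cap_at(t)))
    return path


if __name__ == "__main__":
    fixed = capped_walk(95.0, 20, lambda t: 100.0, seed=2)
    rising = capped_walk(95.0, 20, lambda t: 100.0 + 0.5 * t, seed=2)
    for t in range(21):
        print(t, round(fixed[t], 2), round(rising[t], 2))
```

A hard bound like this piles probability up at the limit, so it is worth plotting the results to check that the constrained series still looks realistic.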

Read on: Time series models with leading indicators