Sum of a random number of random variables | Vose Software

Sum of a random number of random variables

See also: Aggregate modeling in ModelRisk, Aggregate distributions introduction

In most situations, we know precisely how many random variables we have to add together. However, a problem frequently arises in which the number of random variables being summed is itself a random variable. Some examples are:

  • The total purchases made by the N customers who might enter a shop next week, where we know the probability distribution of the purchase amount for a random customer.

  • The amount of lake water that might be drunk by the N campsite visitors this summer, where we know the probability distribution of the amount of lake water drunk by a random camper, and the resultant number of giardia cysts that might be consumed, where we know the concentration of giardia cysts in the lake water.

  • The cost of insurance claims to an insurer where it knows the expected number of claims it will receive in a period, and knows the probability distribution of the size of a random claim.
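For random sums like these, the mean and variance follow from standard results, provided the individual amounts are independent and identically distributed and are independent of the count N. The symbols S and X_i below are introduced here for illustration only:

```latex
S = \sum_{i=1}^{N} X_i, \qquad
\mathrm{E}[S] = \mathrm{E}[N]\,\mathrm{E}[X], \qquad
\mathrm{Var}[S] = \mathrm{E}[N]\,\mathrm{Var}[X] + \mathrm{Var}[N]\,\bigl(\mathrm{E}[X]\bigr)^{2}
```

When N is Poisson distributed, E[N] = Var[N], so the variance simplifies to Var[S] = E[N] E[X^2].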

ModelRisk has many functions designed specifically for handling the distribution of the sum of random variables. See Aggregate modeling in ModelRisk.

An in-depth explanation of summing random variables ('aggregate modeling'), including many more example models and advanced techniques, can be found in the Aggregate distributions section.

Example 1

A company insures aeroplanes. The aeroplanes crash at a rate of 0.23 crashes per month. Each crash costs $Lognormal(120,52) million.

Question: What is the distribution of the value of the liability if we discount it at the risk-free rate of 5%?

This requires that we know the time at which each accident occurs. Because crashes arrive at a constant rate, the time between successive crashes can be modeled with Exponential distributions with a mean of 1/0.23 ≈ 4.35 months. The solution is shown in the example model plane_crashes2.
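The following is a minimal Python sketch of the idea, not a reproduction of the plane_crashes2 model. It rests on several assumptions that are not stated in the example: a 12-month horizon, a 5% rate that is annual and compounded annually, and Lognormal(120,52) read as a lognormal distribution with mean 120 and standard deviation 52. The function names are hypothetical.

```python
import math
import random


def lognormal_from_mean_sd(rng, mean, sd):
    """Draw a lognormal value whose own mean and sd are `mean` and `sd`.

    Assumption: Lognormal(120,52) is parameterized by its mean and sd.
    """
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return rng.lognormvariate(mu, math.sqrt(sigma2))


def discounted_liability(rng, rate_per_month=0.23, horizon_months=12.0,
                         annual_discount=0.05, mean=120.0, sd=52.0):
    """Present value ($ million) of all crashes in one simulated horizon."""
    # Time of the first crash; later crashes add Exponential gaps.
    t = rng.expovariate(rate_per_month)
    total = 0.0
    while t <= horizon_months:
        cost = lognormal_from_mean_sd(rng, mean, sd)
        # Assumed: annual compounding, with t converted from months to years.
        total += cost / (1.0 + annual_discount) ** (t / 12.0)
        t += rng.expovariate(rate_per_month)
    return total


rng = random.Random(42)
samples = [discounted_liability(rng) for _ in range(20000)]
mean_pv = sum(samples) / len(samples)
share_zero = sum(1 for s in samples if s == 0.0) / len(samples)
print(f"mean discounted liability: {mean_pv:.1f} ($ million)")
print(f"share of scenarios with no crash: {share_zero:.3f}")
```

The list `samples` approximates the distribution asked for in the question. Because the number of crashes in the horizon is random, a noticeable share of scenarios contains no crash at all and so has zero liability.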

Example 2

For extremely large numbers of random variables, we can use the Central Limit Theorem (CLT). For example, suppose we think that there will be Poisson(270000) potential customers passing by the front of a store in a year, and that there is a 3% probability that any one of them will enter the store. Assuming each passer-by makes their decision to enter independently of any other passer-by, the number of people entering the store in a year will be Poisson(270000*3%) = Poisson(8100). If there is a 10% probability that a customer in the store makes a purchase, and again we assume that they make the decision to buy independently of others, the number of purchasers will be Poisson(270000*3%*10%) = Poisson(810). Let's also suppose that we have empirical data on past purchase sizes that can be summarized in the following histogram plot:

A plot of the Poisson(810) distribution shows that the number of purchasers will in all probability be above about 720, a value more than three standard deviations (√810 ≈ 28.5) below the mean.
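As a rough check on this statement, the probability of fewer than 720 purchasers can be computed directly from the Poisson probability mass function. The sketch below uses only the Python standard library, and the helper name `poisson_cdf` is hypothetical:

```python
import math


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summing the pmf in log space."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k + 1))


lam = 270000 * 0.03 * 0.10          # mean number of purchasers, 810
p_below_720 = poisson_cdf(719, lam)  # P(N < 720) = P(N <= 719)
print(f"mean = {lam:.0f}, sd = {math.sqrt(lam):.2f}")
print(f"P(N < 720) = {p_below_720:.5f}")
```

Working in log space avoids overflow in the factorial and power terms, which would otherwise be far too large to represent for values of this size.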

Since the distribution of purchase size by customer is not too skewed, and the number we are adding together is large, we can use the Central Limit Theorem. The mean and standard deviation of the histogram plot are $12.71 and $7.27 respectively, so a model of the total sales receipts for the year can be built as shown in the example model Sales_at_the_store.

The CLT limit distribution of a sum of random variables is implemented in ModelRisk with the VoseCLTSum function.
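Outside ModelRisk, the logic of Example 2 might be sketched in Python as follows. This is an illustration of the CLT approximation only, not the Sales_at_the_store model and not the VoseCLTSum implementation. The function names are hypothetical, and the Poisson sampler is a simple standard-library stand-in that counts unit-rate Exponential arrivals.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Count unit-rate Exponential arrivals that occur before time `lam`."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def total_sales(rng, lam=810.0, mean=12.71, sd=7.27):
    """One simulated year: draw N purchasers, then the CLT sum of N purchases."""
    n = sample_poisson(rng, lam)
    if n == 0:
        return 0.0
    # CLT: the sum of n purchases is approximately Normal(n*mean, sqrt(n)*sd).
    return rng.gauss(n * mean, math.sqrt(n) * sd)


rng = random.Random(1)
totals = [total_sales(rng) for _ in range(2000)]
avg_total = statistics.fmean(totals)
sd_total = statistics.stdev(totals)
print(f"mean of total sales: {avg_total:.0f}")
print(f"sd of total sales:   {sd_total:.0f}")
```

Each iteration draws a single Normal value rather than hundreds of individual purchases, which is the practical benefit of the CLT here. The approximation relies on the number of purchasers being large in essentially every iteration.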