Bootstrap estimate of prevalence

Prevalence is the proportion of a population that has a particular characteristic. An estimate of the prevalence p is usually made by randomly sampling from the population and seeing what proportion of the sample has that characteristic. Our confidence around this single point estimate can be obtained quite easily using the non-parametric Bootstrap.

Imagine that we have randomly surveyed 50 people in the capital city of country A and asked each of them whether they will vote for party X in a presidential election the following day. Let's rather naively assume that they all tell the truth and that none of them will change their mind before tomorrow. The result of the survey is that 19 people said they will vote for X. Our data set is therefore a set of 50 values, 19 of which are 1 and 31 of which are 0. The relative frequency distribution of the results therefore has just two bars: 0.62 at the value 0 and 0.38 at the value 1.

A non-parametric Bootstrap would resample, with replacement, 50 values from this data set. Each Bootstrap replicate is therefore equivalent to a set of 50 independent Binomial(1,0.38) distributions.
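The resampling step can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular software package: the function name `nonparametric_bootstrap`, the number of replicates and the seed are all assumptions made for the example.

```python
import random


def nonparametric_bootstrap(data, n_replicates=10_000, seed=0):
    """Return one prevalence estimate per Bootstrap replicate.

    The name, replicate count and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_replicates):
        # Resample n values from the observed data, with replacement.
        replicate = rng.choices(data, k=n)
        # The replicate's prevalence is the proportion of values equal to 1.
        estimates.append(sum(replicate) / n)
    return estimates


# Survey data from the text: 19 ones (will vote for X) and 31 zeros.
survey = [1] * 19 + [0] * 31
estimates = nonparametric_bootstrap(survey)
mean_estimate = sum(estimates) / len(estimates)
print(round(mean_estimate, 2))
```

The list `estimates` is an empirical approximation to the uncertainty distribution of p; a histogram of it would show the spread around the observed proportion of 0.38.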

The estimate of prevalence is then just the proportion of the Bootstrap samples that are 1, i.e.:

$$\hat{p} = \frac{1}{50}\sum_{i=1}^{50} \mathrm{Binomial}(1,\,0.38)$$

The sum of 50 independent Binomial(1,0.38) distributions is just Binomial(50,0.38), so our Bootstrap estimate of p is:

$$\hat{p} = \frac{\mathrm{Binomial}(50,\,0.38)}{50}$$
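As a quick check on the scale of this uncertainty, the mean and standard deviation of a Binomial(50,0.38) count divided by 50 follow from the standard Binomial moments. The symbol \hat{p} for the Bootstrap estimate of p is notation assumed here for the worked arithmetic.

```latex
\mathrm{E}[\hat{p}] = \frac{50 \times 0.38}{50} = 0.38,
\qquad
\mathrm{sd}[\hat{p}] = \frac{\sqrt{50 \times 0.38 \times 0.62}}{50}
                     = \sqrt{\frac{0.38 \times 0.62}{50}} \approx 0.069
```

So the Bootstrap distribution is centred on the observed proportion, with a standard deviation of roughly seven percentage points.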

In general terms, for s observed successes in n trials, this is:

$$\hat{p} = \frac{\mathrm{Binomial}(n,\,s/n)}{n}$$
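The general form, a Binomial(n, s/n) count divided by n, can also be simulated directly. The sketch below is a minimal illustration under stated assumptions: the helper names `parametric_bootstrap` and `percentile_interval`, the 90% level and the simple index-based percentile rule are choices made for the example, not something prescribed by the text.

```python
import random


def parametric_bootstrap(s, n, n_replicates=10_000, seed=0):
    """Draw values of Binomial(n, s/n) / n (hypothetical helper)."""
    rng = random.Random(seed)
    p_hat = s / n
    draws = []
    for _ in range(n_replicates):
        # A Binomial(n, p_hat) count built as a sum of n Bernoulli trials.
        successes = sum(rng.random() < p_hat for _ in range(n))
        draws.append(successes / n)
    return draws


def percentile_interval(draws, level=0.90):
    """Equal-tailed interval read off the sorted draws (assumed rule)."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1 - tail) * (len(ordered) - 1))]
    return lower, upper


# The survey example from the text: s = 19 successes in n = 50 trials.
draws = parametric_bootstrap(19, 50)
low, high = percentile_interval(draws)
print(low, high)
```

Because each draw is a count divided by n, the simulated values of p can only fall on multiples of 1/n, so the interval endpoints are themselves multiples of 0.02 in this example.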

This is exactly the same as the classical statistics estimate. It is also the same result whether one takes the parametric or the non-parametric approach to the Bootstrap. Note that we don't recommend this particular estimation method because a more sophisticated analysis is possible.
