Critical Values and Confidence Intervals for Goodness-of-Fit Statistics | Vose Software

# Critical Values and Confidence Intervals for Goodness-of-Fit Statistics

Analysis of the $\chi^2$, K-S and A-D statistics can provide confidence levels related to the probability that the fitted distribution could have produced the observed data. This is not the same as the probability that the data did, in fact, come from the fitted distribution. Many distributions with similar shapes could equally well have generated the observed data. This is particularly so for data that are approximately normally distributed, since many distributions tend to a Normal shape under certain conditions.

Critical values are determined by the required significance level $\alpha$ (equivalently, the confidence level $1-\alpha$). A critical value is the value of the goodness-of-fit statistic that has probability $\alpha$ of being exceeded when the fitted distribution really did generate the data. Critical values for the $\chi^2$ test are found directly from the $\chi^2$ distribution. Its shape and range are defined by the degrees of freedom $\nu$, where:

$\nu = N - a - 1$

N = number of histogram bars or classes

a = number of parameters that are estimated to determine the best-fitting distribution
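You can compute the $\chi^2$ critical value without printed tables. The sketch below is a minimal standard-library Python version under these assumptions:

- The degrees of freedom are the integer $\nu = N - a - 1$ defined above.
- It uses the closed-form $\chi^2$ survival function for integer $\nu$, then inverts it by bisection.
- The function names are hypothetical, not part of any Vose product.

In practice a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, nu)` gives the same number.

```python
import math


def chi2_degrees_of_freedom(num_classes: int, num_estimated_params: int) -> int:
    """nu = N - a - 1 for a chi-squared goodness-of-fit test."""
    nu = num_classes - num_estimated_params - 1
    if nu < 1:
        raise ValueError("need at least a + 2 histogram classes")
    return nu


def chi2_survival(x: float, nu: int) -> float:
    """P(X > x) for X ~ chi-squared(nu), integer nu, using the closed-form series."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if nu % 2 == 0:
        # Even nu: exp(-x/2) * sum_{k=0}^{nu/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, nu // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd nu: erfc(sqrt(x/2)) plus (nu - 1)/2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (nu - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def chi2_critical_value(alpha: float, nu: int) -> float:
    """Value of the chi-squared(nu) statistic exceeded with probability alpha."""
    lo, hi = 0.0, max(1.0, float(nu))
    while chi2_survival(hi, nu) > alpha:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection: the survival function decreases in x
        mid = 0.5 * (lo + hi)
        if chi2_survival(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical example: 10 histogram classes, Normal fit (mean and sd estimated)
    nu = chi2_degrees_of_freedom(10, 2)
    print(nu, round(chi2_critical_value(0.05, nu), 3))  # 7 14.067
```

A fitted distribution is rejected at level $\alpha$ when its $\chi^2$ statistic exceeds the returned value.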

Critical values for the K-S and A-D statistics have been found by Monte Carlo simulation (Stephens 1974, Stephens 1977 and Chandra et al 1981). Tables of critical values for the K-S statistic are very commonly found in statistical textbooks. Unfortunately, the standard K-S and A-D critical values are of limited use if there are fewer than about 30 data points.

A further problem is that these statistics are designed to test whether a distribution with known parameters could have produced the observed data. Often the parameters of the fitted distribution have been estimated from the same data. The fitted distribution is then pulled towards the data, so the K-S and A-D statistics tend to be smaller than the standard tables assume. Tests based on the standard critical values are therefore conservative: the fitted distribution is rejected less often than the chosen significance level implies, so a poorly fitting distribution is more likely to be accepted. The size of this effect depends on the type of distribution being fitted.
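A short simulation illustrates both the Monte Carlo approach and the effect of estimating parameters. This is a minimal standard-library Python sketch, assuming a Normal fit and hypothetical function names. It works in three steps:

1. Draw many samples of size $n$ from a standard Normal distribution.
2. Compute $D_n$ for each sample, either against the true N(0, 1) or against a Normal whose mean and standard deviation are estimated from that same sample.
3. Take the empirical $(1-\alpha)$ quantile of the simulated $D_n$ values.

```python
import math
import random
import statistics


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(data, cdf) -> float:
    """D_n: largest vertical gap between the empirical CDF of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def simulated_critical_value(n: int, alpha: float, estimate_params: bool,
                             reps: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of the D_n value exceeded with probability alpha
    when Normal(0, 1) samples of size n are tested against a Normal fit."""
    rng = random.Random(seed)
    sims = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if estimate_params:
            # Parameters estimated from the same sample that is being tested
            mu, sigma = statistics.fmean(sample), statistics.stdev(sample)
        else:
            # Parameters known in advance (the case the standard tables assume)
            mu, sigma = 0.0, 1.0
        sims.append(ks_statistic(sample, lambda x: normal_cdf(x, mu, sigma)))
    sims.sort()
    return sims[math.ceil((1.0 - alpha) * reps) - 1]


if __name__ == "__main__":
    n = 20
    print("known parameters:    ", round(simulated_critical_value(n, 0.05, False), 3))
    print("estimated parameters:", round(simulated_critical_value(n, 0.05, True), 3))
```

With estimated parameters the simulated critical value comes out clearly smaller than with known parameters. This is why comparing such a $D_n$ with the standard table rejects too rarely.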

Modifications to the K-S and A-D statistics have been determined to correct for the effect of estimating the parameters from the data. They are given below, where $n$ is the number of data points and $D_n$ and $A_n^2$ are the unmodified K-S and A-D statistics respectively:

Kolmogorov-Smirnov Statistics

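| Distribution | Modified test statistic |
|---|---|
| Normal | $\left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right) D_n$ |
| Exponential | $\left(D_n - \frac{0.2}{n}\right)\left(\sqrt{n} + 0.26 + \frac{0.5}{\sqrt{n}}\right)$ |
| Weibull & Extreme Value | $\sqrt{n}\, D_n$ |
| All others | $\left(\sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}}\right) D_n$ |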
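Here is a minimal Python sketch of how the modified K-S statistic might be computed for a fitted distribution. It makes three assumptions:

- The correction coefficients are the ones commonly quoted from Stephens (1974) and Chandra et al (1981). Check them against the table before relying on them.
- The caller supplies the fitted CDF values at the ranked observations.
- The helper names are hypothetical.

The modified value is then compared with the critical value published for that case.

```python
import math


def ks_statistic(sorted_cdf_values):
    """D_n from F0 evaluated at the observations ranked in ascending order."""
    n = len(sorted_cdf_values)
    return max(max(i / n - f, f - (i - 1) / n)
               for i, f in enumerate(sorted_cdf_values, start=1))


def modified_ks(d_n: float, n: int, family: str) -> float:
    """Modified K-S statistic; coefficients are the commonly quoted ones (assumption)."""
    rn = math.sqrt(n)
    if family == "normal":
        return (rn - 0.01 + 0.85 / rn) * d_n
    if family == "exponential":
        return (d_n - 0.2 / n) * (rn + 0.26 + 0.5 / rn)
    if family in ("weibull", "extreme value"):
        return rn * d_n
    return (rn + 0.12 + 0.11 / rn) * d_n  # "all others"


if __name__ == "__main__":
    # Hypothetical data; Exponential fit with the mean estimated from the data
    data = sorted([0.2, 0.5, 0.9, 1.4, 3.0])
    mean = sum(data) / len(data)
    u = [1.0 - math.exp(-x / mean) for x in data]
    d_n = ks_statistic(u)
    print(round(d_n, 4), round(modified_ks(d_n, len(data), "exponential"), 4))
```

Passing CDF values rather than raw data keeps the statistic independent of which fitting method produced the distribution.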

Anderson-Darling Statistics

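| Distribution | Modified test statistic |
|---|---|
| Normal | $\left(1 + \frac{4}{n} - \frac{25}{n^2}\right) A_n^2$ |
| Exponential | $\left(1 + \frac{0.6}{n}\right) A_n^2$ |
| Weibull & Extreme Value | $\left(1 + \frac{0.2}{\sqrt{n}}\right) A_n^2$ |
| All others | $A_n^2$ |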
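A similar Python sketch covers the A-D case. It assumes the standard form of the statistic:

$A_n^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F_0(X_i) + \ln\left(1 - F_0(X_{n+1-i})\right)\right]$

with the observations ranked in ascending order. It also assumes the commonly quoted correction factors, which should be checked against the table. The helper names and the data are hypothetical.

```python
import math
import statistics


def ad_statistic(sorted_cdf_values):
    """A_n^2 from F0(X_i) at the ascending observations; values must lie in (0, 1)."""
    u = sorted_cdf_values
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def modified_ad(a2_n: float, n: int, family: str) -> float:
    """Modified A-D statistic; coefficients are the commonly quoted ones (assumption)."""
    if family == "normal":
        return (1.0 + 4.0 / n - 25.0 / n ** 2) * a2_n
    if family == "exponential":
        return (1.0 + 0.6 / n) * a2_n
    if family in ("weibull", "extreme value"):
        return (1.0 + 0.2 / math.sqrt(n)) * a2_n
    return a2_n  # "all others": no modification


if __name__ == "__main__":
    data = sorted([4.1, 4.8, 5.0, 5.3, 5.9, 6.2])  # hypothetical observations
    fit = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    a2 = ad_statistic([fit.cdf(x) for x in data])
    print(round(a2, 4), round(modified_ad(a2, len(data), "normal"), 4))
```

The logarithms weight the tails heavily. This is why the A-D statistic is more sensitive than K-S to misfit in the tails.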

Another goodness-of-fit statistic with intuitive appeal, similar to the A-D and K-S statistics, is the Cramér-von Mises statistic $Y$:

$Y = \frac{1}{12n} + \sum_{i=1}^{n}\left[F_0(X_i) - \frac{2i-1}{2n}\right]^2$

The statistic essentially sums squared differences between two quantities at each observation $X_i$, with the observations ranked in ascending order:

- the cumulative probability $F_0(X_i)$ of the fitted distribution;
- $(2i-1)/(2n)$, the average of $(i-1)/n$ and $i/n$. These are the low and high steps of the empirical cumulative distribution at $X_i$.

Tables for this statistic can be found in Anderson and Darling (1952).
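Here is a minimal Python sketch of the calculation. It assumes the standard form of the statistic, which adds the constant $1/(12n)$ to the sum described above. The function name and the data are hypothetical. For a fully specified distribution, SciPy also offers `scipy.stats.cramervonmises`.

```python
import statistics


def cramer_von_mises(sorted_cdf_values):
    """Y = 1/(12n) + sum_i [F0(X_i) - (2i - 1)/(2n)]^2, X_i in ascending order."""
    n = len(sorted_cdf_values)
    return 1.0 / (12 * n) + sum((f - (2 * i - 1) / (2 * n)) ** 2
                                for i, f in enumerate(sorted_cdf_values, start=1))


if __name__ == "__main__":
    data = sorted([4.1, 4.8, 5.0, 5.3, 5.9, 6.2])  # hypothetical observations
    fit = statistics.NormalDist(statistics.fmean(data), statistics.stdev(data))
    print(round(cramer_von_mises([fit.cdf(x) for x in data]), 4))
```

The smallest possible value, $1/(12n)$, occurs when every $F_0(X_i)$ sits exactly midway between the two empirical steps. As with K-S and A-D, estimating the parameters from the same data makes the standard tables conservative.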