Rank Order Correlation Coefficient | Vose Software

# Rank Order Correlation Coefficient

Note that there is also an entire section devoted to modeling correlation (explaining more advanced methods).

The Spearman rank order correlation coefficient r is a non-parametric statistic for quantifying the correlation between two variables. Non-parametric means that the statistic is unaffected by the type of mathematical relationship between the variables, unlike linear least squares regression analysis, for example, which requires the relationship to be a straight line with Normally distributed variation of the dependent variable about that line. The rank order correlation analysis proceeds as follows:

Replace the n observed values of the two variables X and Y by their rankings: the largest value of each variable is given a rank of 1 and the smallest a rank of n, or vice versa. The Excel function RANK( ) can do this, but it is inaccurate where there are ties, i.e. where two or more observations share the same value. In such cases, each tied observation should be assigned the average of the ranks the tied group would have occupied had the values been infinitesimally different.
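The tie-averaging rule above can be sketched in Python as follows (the function name `average_ranks` is illustrative, not from the article; it ranks from 1 = smallest, the reverse convention works equally well):

```python
def average_ranks(values):
    """Rank values from 1 (smallest) to n (largest), giving each member
    of a tied group the average of the ranks the group occupies."""
    n = len(values)
    # Indices sorted by value, so position in 'order' is the raw rank - 1.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Average of the 1-based ranks (i + 1) .. (j + 1).
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, `average_ranks([10, 20, 20, 30])` gives the two tied values the average of ranks 2 and 3, i.e. 2.5 each.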

The Spearman rank order correlation coefficient r is calculated as

$$r = 1 - \frac{6\sum_{i=1}^{n}\left(u_i - v_i\right)^2}{n\left(n^2 - 1\right)}$$

where u<sub>i</sub>, v<sub>i</sub> are the ranks of the ith pair of the X and Y variables. This is, in fact, a shortcut formula: it is not exact when there are tied measurements, but it still works well provided there are not too many ties relative to the size of n. The exact formula is:

$$r = \frac{\sum_{i=1}^{n}\left(u_i - \bar{u}\right)\left(v_i - \bar{v}\right)}{\sqrt{\sum_{i=1}^{n}\left(u_i - \bar{u}\right)^2 \; \sum_{i=1}^{n}\left(v_i - \bar{v}\right)^2}}$$
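The shortcut formula can be sketched in Python as follows (a minimal illustration assuming no ties, so plain 1..n ranks suffice; the function name is illustrative):

```python
def spearman_shortcut(x, y):
    """Shortcut Spearman formula: r = 1 - 6 * sum(d_i**2) / (n * (n**2 - 1)),
    where d_i is the difference between the ranks of the ith pair.
    Assumes no tied values (with ties, use averaged ranks and the
    exact formula instead)."""
    n = len(x)

    def ranks(vals):
        # Rank 1 = smallest value ... rank n = largest value.
        order = sorted(range(n), key=lambda i: vals[i])
        r = [0] * n
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    u, v = ranks(x), ranks(y)
    d2 = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Any strictly increasing relationship gives r = 1 and any strictly decreasing one gives r = -1, regardless of whether the relationship is a straight line, which is the non-parametric property described above.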

where

$$\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i \qquad \text{and} \qquad \bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i$$

and where u<sub>i</sub>, v<sub>i</sub> are the ranks of the ith observation in samples 1 and 2 respectively. This calculation does not require one to identify which variable is dependent and which is independent: the formula for r is symmetric, so X and Y could swap places with no effect on the value of r. The value of r varies from -1 to 1 in the same way as the least squares regression coefficient r. A value of r close to -1 or +1 means that the variables are highly negatively or positively correlated respectively, while a value of r close to zero means that there is no correlation between the variables.
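The exact formula is simply the Pearson correlation applied to the two rank vectors. A minimal Python sketch (the function name is illustrative; it takes the rank vectors directly, e.g. averaged ranks when there are ties):

```python
import math

def spearman_exact(u, v):
    """Exact Spearman r: the Pearson correlation of the rank vectors
    u and v (valid with ties, provided tied ranks are averaged)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    # Numerator: sum of cross-products of deviations from the mean ranks.
    cov = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    # Denominator: product of the root sums of squared deviations.
    su = math.sqrt(sum((ui - u_bar) ** 2 for ui in u))
    sv = math.sqrt(sum((vi - v_bar) ** 2 for vi in v))
    return cov / (su * sv)
```

Note the symmetry mentioned above: `spearman_exact(u, v)` and `spearman_exact(v, u)` always return the same value.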

Just as with least squares regression, it is important to determine whether the degree of correlation given by the value of r is real or a spurious result brought about by the effects of randomness. The value of r can be tested for statistical significance by constructing a test statistic t in the same way as we have seen for least squares regression:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

which approximately follows a t-distribution with (n - 2) degrees of freedom.
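The test statistic is a one-line calculation; the sketch below (function name illustrative) computes t from an observed r and sample size n, to be compared against t-distribution tables with n - 2 degrees of freedom:

```python
import math

def spearman_t_statistic(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r**2)),
    approximately t-distributed with (n - 2) degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

For example, with r = 0.5 from n = 11 observations, t = 0.5 * sqrt(9 / 0.75) ≈ 1.73 on 9 degrees of freedom, which is not significant at the usual two-sided 5% level.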

Read on: Least Squares Linear Regression