next up previous
Next: About this document ... Up: No Title Previous: A Simple Test for

Using Runs to test for randomness

Suppose that a sequence of H's and T's was supposed to represent an outcome for tosses of a fair coin. How could we construct a test that would help us distinguish the null hypothesis, ``It is a random sequence from a fair coin'' from the alternative hypothesis that it is not.

One way is to count the number of runs of H's and T's. For example, if the outcome were

HHHHHHHHHHHHHHHHHHHH

there would be one run of H's of length 20, and we would suspect that that sequence was not random (see Rosencrantz and Guildenstern Are Dead by T. Stoppard). Too many runs may not be good either, since

HTHTHTHTHTHTHTHTHTHT

has 20 runs of length 1, and we would have a hard time believing that this is random.

What then is the expected number of runs in N tosses of a fair coin. If we knew this, we would have the beginings of a test, as we would want to accept the null hypothesis if, in some sense, the observed number of runs was not far from this value. More importantly, we want the probability distribution for the number of runs. This can be obtained by a simple counting argument (see An Introduction to Probability Theory and Its Applications by W. Feller). We find that if the sequence contains rH runs of H's and rT runs of T's and k=2v then

\begin{displaymath}
Pr(2v\;{\rm runs}) =
2\frac{\mbox{$\left(\begin{array}
{c}r_...
 ...mbox{$\left(\begin{array}
{c}r_H+r_T\\ r_H\end{array}\right)$}}\end{displaymath}

and if k=2v+1 then

\begin{displaymath}
Pr(2v+1\;{\rm runs}) = 
\frac{\mbox{$\left(\begin{array}
{c}...
 ...mbox{$\left(\begin{array}
{c}r_H+r_T\\ r_H\end{array}\right)$}}\end{displaymath}

If we let R denote the number of runs, then the expected number of runs, given the observed number of H's and T's is

\begin{displaymath}
{\rm E}[R] = \frac{2r_Hr_T}{r_H+r_T} + 1\end{displaymath}

and the variance is

\begin{displaymath}
{\rm Var}[R] = \frac{2r_Hr_T(2r_Hr_T-r_H-r_T)}{(r_T+r_H)^2(r_T+r_H-1)}\end{displaymath}

The standardized number of runs has an approximately normal distribution when the number of H's and T's is large.
next up previous
Next: About this document ... Up: No Title Previous: A Simple Test for
Eric S Key
2/12/1999