Note: knowing intermediate probability and calculus is necessary to understand this post.
Last week, I completed a summer course at Stanford: CS109, Probability for Computer Scientists. This 8-week long class was intense and challenging but one of the most rewarding classes I’ve taken at Stanford so far. A few weeks ago, I had an idea for a homework problem but I couldn’t quite figure out how to implement the idea. The problem is as follows:
Below are two sequences of 300 “coin flips” (H for heads, T for tails). One of these is a true sequence of 300 independent flips of a fair coin. The other was generated by a person typing out H’s and T’s and trying to seem random. Which sequence is the true sequence of coin flips? Make an argument that is justified with probabilities calculated on the sequences. Both sequences have 148 heads, two less than the expected number for a 0.5 probability of heads.
Sequence 1: TTHHTHTTHTTTHTTTHTTTHTTHTHHTHHTHTHHTTTHHTHTHTTHTHH TTHTHHTHTTTHHTTHHTTHHHTHHTHTTHTHTTHHTHHHTTHTHTTTHH TTHTHTHTHTHTTHTHTHHHTTHTHTHHTHHHTHTHTTHTTHHTHTHTHT THHTTHTHTTHHHTHTHTHTTHTTHHTTHTHHTHHHTTHHTHTTHTHTHT HTHTHTHHHTHTHTHTHHTHHTHTHTTHTTTHHTHTTTHTHHTHHHHTTT HHTHTHTHTHHHTTHHTHTTTHTHHTHTHTHHTHTTHTTHTHHTHTHTTT
Sequence 2: HTHHHTHTTHHTTTTTTTTHHHTTTHHTTTTHHTTHHHTTHTHTTTTTTH THTTTTHHHHTHTHTTHTTTHTTHTTTTHTHHTHHHHTTTTTHHHHTHHH TTTTHTHTTHHHHTHHHHHHHHTTHHTHHTHHHHHHHTTHTHTTTHHTTT THTHHTTHTTHTHTHTTHHHHHTTHTTTHTHTHHTTTTHTTTTTHHTHTH HHHTTTTHTHHHTHHTHTHTHTHHHTHTTHHHTHHHHHHTHHHTHTTTHH HTTTHHTHTTHHTHHHTHTTHTTHTTTHHTHTHTTTTHTHTHTTHTHTHT
There’s actually a straightforward approach to solving this problem that involves the geometric distribution, but after researching ways to approach this problem, I stumbled across something known as the Wald–Wolfowitz runs test (or simply runs test). According to Wikipedia, this is a “statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.” In layman’s terms, it can be used to test the randomness of a distribution comprised of only two possible outcomes (heads and tails for example).
Intuitively, if someone flips a coin 10 times and gets 5 heads and 5 tails, it is more plausible that the order of outcomes H, T, T, H, T, H, H, H, T, T is more likely to be random than something like H, H, H, H, H, T, T, T, T, T. The runs test can analyze both experiments to see which one is more likely to be a truly random outcome of flips. By calculating the expectation (mean) and variance of the number of runs in this type of experiment, we can use the normal distribution to approximate the probability that we get a certain number of runs. We use the word “runs” to denote consecutive outcomes of the same result.
I was on a mission to find the expectation and variance for this problem. The expectation formula was simple to solve in a matter of minutes, but for some reason, I couldn’t find a proof of the variance anywhere on the internet (after an hour of searching). Every source I found cited textbooks that I do not have easy access to (this should be remedied once I’m an official Masters student this fall). I did, however, find a PDF of a book written in 1940 by the very people that the runs test is named after: “On a test whether two samples are from the same population”, by Wald and Wolfowitz. Excited to finally have the source of this test right at my fingertips, I began skimming the pages. To my disappointment, the authors decided to not reveal the “tedious calculations”:
*Blank stare*. I’m really curious to find out where exactly they proved these equations. Sigh. Thus I was left on my own to figure it out. I tried over several days to wrap my head around this problem, and it wasn’t until after finishing the class that I finally was able to approach the problem with a clear head. Here is my approach.
Let be the number of runs (consecutive outcomes of the same result) in a series of coin tosses that result in heads and tails. We will find the equations for the expectation and variance of .
First, we will find the expected number of runs. Define to be an indicator variable that represents whether the coin flip result does not equal the coin flip result. For any positive number of coin tosses, we can count the number of times the outcome switches to a different result. Since the second run counts as the first switch, we add 1 to the equation below to account for the first run, which defines the way the runs can alternate between heads and tails. For example, has 4 runs: 1 run of two heads followed by 3 switches (heads to tails, tails to heads, and heads to tails). We can now express R as 1 plus the sum of the indicator variables:
The summation starts from the first flip and goes to the second to last flip since we want to compare every flip to the flip that comes after it. Thus we compare the flips in every such pair. We can now express the expected number of runs when there are heads and tails as follows:
(Via linearity of expectation)
(Via linearity of expectation)
Since the expected value of an indicator variable is simply the probability of the variable’s success, we have:
We need the flip to be a different outcome than the flip that comes after it. If the flip is heads, there are ways to pick heads out of outcomes, giving a probability that the flip is heads. After choosing one heads, there are remaining flips and T ways to choose tails out of those remaining flips. We lastly double this result since we have the same probability of having tails on the flip followed by heads. Simplifying gives:
(Since the body of the summation didn’t depend on j)
Thus we have the expected number of runs when we have heads and tails.
To find the variance, we must compute . First, compute :
To compute , we’ll first find . Recall that . Thus:
(Substitution using equation for R)
(Equation 1: Expansion of square of summation)
We can split the square of the summation this way because the square of a summation represents all pairs in the range 1 to , inclusive, where there are two cases we can consider: the case where (represented by the first summation) and the case where (represented by the second summation). To show why this works, consider a simple example:
Note that we can actually compute the square using the FOIL method as follows:
We can rearrange the terms to get the following:
Notice that we can express these terms as the sum of two summations: a summation of squares and a summation of products of different-valued integer pairs:
This is like the trick we used earlier. At this point, we can also use another summation property to expand the latter summation:
Notice that the last three terms shown at the end of Equation 2 each have a product of an integer and a larger integer. Thus we can represent these products as the summation shown in Equation 3, where we double the summation to account for symmetry (e.g. ). We could’ve also reversed the inequality sign to achieve the same result. I arbitrarily chose .
We can now use this approach in our equation for (Equation 1):
(because I is an indicator variable)
(Substitution using equation for R)
Now we can compute :
(Via linearity of expectation)
(Via linearity of expectation & substitution using equation for R)
(Both variables must = 1 in order for their product to = 1)
In order to find the probability that both indicator variables equal 1, let’s further split the summation into two cases. Either the indicator variables and each refer to pairs of flips that overlap (meaning the second flip of (the flip) equals the first flip of ), or the two indicator variables represent pairs of flips that have no flips in common. Since these two outcomes have different probabilities, we split them into summations that cover each case separately. Let’s explore these cases in more depth.
Case 1: We consider all indicator variables that share a flip, starting with indicator variables that compare the first three flips and incrementing by one until we end the summation with indicator variables that compare the last three flips. For example, say we have the following outcomes of 6 coin flips: . The following highlighted outcomes represent indicator variables that share an outcome, where the first two purple-colored outcomes are the outcomes compared in the first indicator variable and the last two purple-colored outcomes are the ones compared in the second indicator variable:
This example makes it easier to see that the summation bounds go from to so we always have three outcomes to consider.
Case 2: We consider all indicator variables that do not share a flip, thus four flips are considered at a time. Let’s visualize this with our 6 coin flips example:
In this case, we can use a nested summation to ensure that and never refer to the same flip. The outer summation starts from . It ends at to ensure there are always four outcomes to consider (two for each indicator variable). The inner summation starts at (to ensure that it doesn’t consider one of the two flips that considers), and it ends at (to allow to consider the last two flips).
Notice that these two cases along with the case we saw earlier (where an indicator variable is squared) are mutually exclusive and exhaustive: we’ve accounted for all possible pairs of flips that the two indicator variables can compare. Now that we understand the cases, let’s update our equation:
Let’s first expand the probability for the first case, recalling that and share a flip. For both and to be true at the same time, flip must be different from flip , and flip must be different from flip (thus flip and must have the same outcome). This means that these three sequential flips are either or , as expressed below:
(Since summation body didn’t depend on j)
Plugging this back into Equation 4 gives us:
Now let’s consider the case where the two indicator variables do not overlap. We compute the probability that the two flips represented by each indicator variable have one and one :
We multiply the probability of each pair of flips having different outcomes by 2 to account for either HT or TH being the outcome. Let’s simplify, first by pulling out the constants that don’t depend on or :
At this point, we can split the summation:
Now let’s simplify by using the arithmetic series formula:
Let’s plug this back into equation 5:
Whew, that’s a lot of beautiful simplifying! Now we can plug this into our variance formula:
Thus for R, which represents the number of runs when there are heads and tails, we have:
You can verify on the Wikipedia page that this is indeed the formula for the variance of the number of runs. This is probably the most rigorous math I’ve ever written for a proof! Whew! I hope this is useful, and if you stumbled across this page because you were looking for a proof of the variance of the number of runs, please give me credit! :)
At this point, one can use the normal distribution to see which sequence in the homework problem I provided is the true sequence of coin flips, but I’ll leave that as an exercise for the reader!
Powered by QuickLatex
You know what? This is pretty awesome. I mean the sheer breakdown and elaboration of each step is commendable. I googled, went through the nonparametric mammoths but no one offered anything. Zilch. Finally the main paper. Have a book that did kind of obscure proof beyond my brain. And the author seemed to have omitted them. In fact l, I reached here after going through google images. Helpful is a underrated word. Amazing. Best wishes.
P.S. The moment mathematician speak or write about ‘tedious proof’ it isn’t tedious at all. The question is: can a shorter proof be found?
Thanks, the book I just studied skipped the proof. I now know why. By the way, I am not a mathematician.
I have sent a reply earlier to Roslyn, I used her math for some algorithm in currency trading. Very useful and a persistent job she made with the proof.
Mama should be really proud.
This was super helpful, thank you!
My head hurts just trying to read this! Lol Very proud!