Can anyone explain this probability paradox?

Tossing a coin until you get two heads in a row takes an average of 6 tosses.
But to get a head followed by a tail takes an average of only 4 tosses.

The following program demonstrates it.

#include <iostream>
#include <random>
#include <chrono>

const int Reps = 1000000;

// set to false for head followed by tail
const bool TwoHeadsInARow = true;

int main() {
    unsigned seed = std::chrono::system_clock::
                    now().time_since_epoch().count();
    auto rng = std::default_random_engine(seed);
    auto dist = std::uniform_int_distribution<>(0, 1);

    int sum = 0;
    for (int i = 0; i < Reps; ++i) {
        int cnt = 0;
        bool prevWasHead = false;
        while (true) {
            ++cnt;
            bool head = dist(rng);
            if (prevWasHead && head == TwoHeadsInARow)
                break;
            prevWasHead = head;
        }
        sum += cnt;
    }
    std::cout << double(sum) / Reps << '\n';
}

Given a set of 3 ordered numbers, (X Y Z), each either being 0 or 1, how many possible trials have 1 followed by 0?


0 0 0 // false
0 0 1 // false
0 1 0 // true
0 1 1 // false
1 0 0 // true
1 0 1 // true
1 1 0 // true
1 1 1 // false

(4 / 8)

Given a set of 3 ordered numbers, (X Y Z), each either being 0 or 1, how many possible trials have 1 followed by a 1?


0 0 0 // false
0 0 1 // false
0 1 0 // false
0 1 1 // true
1 0 0 // false
1 0 1 // false
1 1 0 // true
1 1 1 // true

(3 / 8)

We can see there are more ways of having (1, 0) than of having (1, 1), at least within a set of 3 limited tosses. What would it look like if each trial had 4 numbers instead of 3 (e.g. 1 0 1 0)?

I haven't formulated the precise binomial trial math for this yet, but maybe that's a good hint?
If I figure out the binomial probability math, I'll post it.

Notice that in my second example, the sequence 1 1 1 counts as true. But hidden within it are actually two occurrences of 1 1, and only one of them can ever count because you stop iterating after finding the first one.

Perhaps even simpler:
There are two ways to get the unordered set {0, 1} [rolling a 0 and then 1, or 1 and then 0], but there's only one way to get the unordered set {1, 1} [rolling a 1, and then 1 again]. If you looked at your results from an unordered point of view, I believe the chances would be 50/50 as you would expect.

Again, I know this isn't anywhere near a complete answer, but maybe it opens some doors to explore relating to binomial trials.

As my professor once said about probability... nothing about it is intuitive.
I think you are on to something there. It definitely is non-intuitive. I remember this conundrum:

I toss two coins and cover them up so you can't see the result. Then I peek at both and tell you that "at least one is a head". What is the probability that both are heads?

Naively, we might think, "well, we know at least one is a head, so it just depends on whether the 'other' is a head, which is 50/50".

But enumerating the possibilities where at least one is a head gives:
H H
H T
T H

And only 1 out of 3 of the possibilities is "both heads".

I got the paradox in the OP from this interesting article:
https://www.quantamagazine.org/mathematicians-discover-prime-conspiracy-20160313/
Another way to look at it is this. Look at 2 throws. The last value in my tables is the minimum number of additional throws needed to complete the challenge.


Heads followed by Tails
0	0	FALSE	2
0	1	FALSE	1
1	0	TRUE	
1	1	FALSE	1


2 Heads
0	0	FALSE	2
0	1	FALSE	1
1	0	FALSE	2
1	1	TRUE	


After 2 throws, both challenges have a 1/4 chance of being completed. But for the other 3 outcomes, Heads/Tails needs fewer minimum additional throws in total (4) than 2 Heads does (5). This difference compounds each time 2 more throws are needed.

I don't remember the technical probability terminology (that was decades ago), but I think you can get the idea from this.
The probability that it takes n throws to get head followed by tail is P(n)=(n-1)/2^n. Dunno if your program can test that.

The expectation is Σ n·P(n) and gives 4 as @dutch said (differentiate the power series for 1/(1-t) twice and set t=1/2).

Still working on the probabilities for the throws to get HH I'm afraid ...
@doug4, I see what you're saying, and clearly that must be the reason. Still very unintuitive, though. I don't think I'll ever quite wrap my head around it.

Minimum additional tosses required after 2 initial tosses:
Initial  H/T  T/H  H/H  T/T
 T  T     2    1    2    0
 T  H     1    0    1    2
 H  T     0    1    2    1
 H  H     1    2    0    2
          4    4    5    5

@lastchance, Your calc looks pretty good (not that I would know).
I think this "proves" your P(n)=(n-1)/2^n calculation for the H/T case.
It also shows simulation of the H/H case.
      H/T          H/H          Calc H/T
  2:  0.24997010   0.24968540   0.25000000
  3:  0.24996640   0.12506610   0.25000000
  4:  0.18758660   0.12509210   0.18750000
  5:  0.12521370   0.09383590   0.12500000
  6:  0.07802730   0.07825540   0.07812500
  7:  0.04681720   0.06250200   0.04687500
  8:  0.02722800   0.05079730   0.02734375
  9:  0.01565420   0.04104690   0.01562500
 10:  0.00879530   0.03317260   0.00878906
 11:  0.00489330   0.02684240   0.00488281
 12:  0.00268190   0.02166540   0.00268555
 13:  0.00145320   0.01761750   0.00146484
 14:  0.00079530   0.01421150   0.00079346
 15:  0.00042320   0.01147650   0.00042725
 16:  0.00023020   0.00930500   0.00022888
 17:  0.00012510   0.00749300   0.00012207
 18:  0.00006890   0.00609530   0.00006485
 19:  0.00003100   0.00493780   0.00003433
 20:  0.00001740   0.00398690   0.00001812
 21:  0.00001070   0.00321360   0.00000954
 22:  0.00000490   0.00260350   0.00000501
 23:  0.00000260   0.00212170   0.00000262
 24:  0.00000180   0.00171860   0.00000137
 25:  0.00000090   0.00139080   0.00000072
 26:  0.00000050   0.00111850   0.00000037
 27:  0.00000020   0.00090150   0.00000019
 28:  0.00000010   0.00073300   0.00000010
 29:  0.00000000   0.00059400   0.00000005
 30:  0.00000000   0.00047920   0.00000003

There's still more rows to the table. The longest H/H run was 77 (in that run).

#include <iostream>
#include <iomanip>
#include <random>
#include <chrono>

const unsigned Reps = 10000000;
const unsigned CountSize = 100;

unsigned seed = std::chrono::system_clock::
                now().time_since_epoch().count();
auto rng = std::default_random_engine(seed);
auto dist = std::uniform_int_distribution<>(0, 1);

void doTrial(bool TwoHeadsInARow, unsigned counts[], unsigned& max_cnt) {
    for (unsigned i = 0; i < Reps; ++i) {
        unsigned cnt = 0;
        bool prevWasHead = false;
        while (true) {
            ++cnt;
            bool head = dist(rng);
            if (prevWasHead && head == TwoHeadsInARow)
                break;
            prevWasHead = head;
        }
        if (cnt >= CountSize)
            std::cerr << "counts overflow: cnt=" << cnt << '\n';
        else {
            ++counts[cnt];
            if (cnt > max_cnt) max_cnt = cnt;
        }
    }
}

int main() {
    using std::cout; using std::setw;

    unsigned counts[2][CountSize] {}, max_cnt = 0;
    doTrial(false, counts[0], max_cnt);
    doTrial(true, counts[1], max_cnt);

    double expect1 = 0, expect2 = 0, expect3 = 0, pow2 = 2;
    cout << std::fixed << std::setprecision(8);
    cout << "      H/T          H/H          Calc H/T\n";
    for (unsigned i = 2; i <= max_cnt; ++i) {
        double a = double(counts[0][i]) / Reps;
        double b = double(counts[1][i]) / Reps;
        double c = double(i - 1) / (pow2 *= 2);
        expect1 += i * a;
        expect2 += i * b;
        expect3 += i * c;
        cout << setw(3)  << i << ":  " << setw(10) << a << "   "
             << setw(10) << b << "   " << setw(10) << c << '\n';
    }

    cout << "Expectations:\n      ";
    cout << expect1 << "   " << expect2 << "   " << expect3 << '\n';
}
@Dutch,
Could you possibly test your "first end in HH" case against the probability distribution:
P(n) = F(n-1) / 2^n, for n >= 2

where F(r) is the rth Fibonacci number of the sequence
F(1) = 1, F(2) = 1, F(r) = F(r-1) + F(r-2) for r >= 3

The expectation Σ n·P(n) of this distribution is, indeed, 6.
@doug4, I see what you're saying, and clearly that must be the reason. Still very unintuitive, though. I don't think I'll ever quite wrap my head around it.


I was thinking about this again this morning. I think an even better way to consider it is as follows:

For both the H/T and the H/H cases, you have the same probability or average number of throws to get to the first match. (This is trivially proven because in both cases you are looking for the same result for the first match.) Also in both cases, after the first match you have the same chance of succeeding on the following throw: 50% T for H/T and 50% H for H/H.

The difference comes when you fail to match on the succeeding throw. When you are looking for H/T, a failure means you are now sitting on a H, so you still need to only throw a T. If you are looking for H/H, a failure means you are sitting on a T, and you need to throw the first H again. So after you attain your first match, a failure on the subsequent throw differs between still needing to match the second or having to start over.
@doug4, It's one of those things where you really need to look at it very closely for it to make sense. Thanks for the explanation.

@lastchance, The Fibonacci series?! How do you figure this shit out! Anyway, here's the result. It's a perfect fit.

      H/T          Calc H/T     H/H          Calc H/H
  2:  0.25038310   0.25000000   0.25024300   0.25000000
  3:  0.24973450   0.25000000   0.12482450   0.12500000
  4:  0.18740790   0.18750000   0.12484490   0.12500000
  5:  0.12505620   0.12500000   0.09373750   0.09375000
  6:  0.07814610   0.07812500   0.07802830   0.07812500
  7:  0.04688520   0.04687500   0.06250910   0.06250000
  8:  0.02735100   0.02734375   0.05081590   0.05078125
  9:  0.01559380   0.01562500   0.04103470   0.04101562
 10:  0.00876260   0.00878906   0.03320470   0.03320312
 11:  0.00483490   0.00488281   0.02689730   0.02685547
 12:  0.00267300   0.00268555   0.02175960   0.02172852
 13:  0.00145270   0.00146484   0.01758170   0.01757812
 14:  0.00080290   0.00079346   0.01425310   0.01422119
 15:  0.00043280   0.00042725   0.01150860   0.01150513
 16:  0.00022150   0.00022888   0.00930060   0.00930786
 17:  0.00012710   0.00012207   0.00759860   0.00753021
 18:  0.00006490   0.00006485   0.00611180   0.00609207
 19:  0.00003590   0.00003433   0.00493320   0.00492859
 20:  0.00001590   0.00001812   0.00396850   0.00398731
 21:  0.00000930   0.00000954   0.00320870   0.00322580
 22:  0.00000440   0.00000501   0.00257890   0.00260973
 23:  0.00000210   0.00000262   0.00211540   0.00211132
 24:  0.00000110   0.00000137   0.00170780   0.00170809
 25:  0.00000100   0.00000072   0.00136280   0.00138187
 26:  0.00000000   0.00000037   0.00112250   0.00111796
 27:  0.00000000   0.00000019   0.00091390   0.00090445
 28:  0.00000010   0.00000010   0.00073700   0.00073171
 29:  0.00000000   0.00000005   0.00059610   0.00059197
 30:  0.00000000   0.00000003   0.00048000   0.00047891
 31:  0.00000000   0.00000001   0.00038940   0.00038745
 32:  0.00000000   0.00000001   0.00031520   0.00031345
 33:  0.00000000   0.00000000   0.00024970   0.00025359
 34:  0.00000000   0.00000000   0.00020980   0.00020516
 35:  0.00000000   0.00000000   0.00016330   0.00016598
 36:  0.00000000   0.00000000   0.00013250   0.00013428
 37:  0.00000000   0.00000000   0.00010470   0.00010863
 38:  0.00000000   0.00000000   0.00008470   0.00008789
 39:  0.00000000   0.00000000   0.00007130   0.00007110
 40:  0.00000000   0.00000000   0.00006020   0.00005752
 41:  0.00000000   0.00000000   0.00004570   0.00004654
 42:  0.00000000   0.00000000   0.00003840   0.00003765
 43:  0.00000000   0.00000000   0.00003030   0.00003046
 44:  0.00000000   0.00000000   0.00002290   0.00002464
 45:  0.00000000   0.00000000   0.00001940   0.00001994
 46:  0.00000000   0.00000000   0.00001640   0.00001613
 47:  0.00000000   0.00000000   0.00001410   0.00001305
 48:  0.00000000   0.00000000   0.00001040   0.00001056
 49:  0.00000000   0.00000000   0.00000810   0.00000854
 50:  0.00000000   0.00000000   0.00000700   0.00000691
 ...
Expectations:
      3.99887870   4.00000000   6.00064220   5.99999355
(lines 51 to 78 elided)

(The Calc H/H expectation is off a little since only the first 78 terms were summed. Although it is even closer to 6 if I let it add up the first 92 terms, after that the fibs overflow 64 bits.)

#include <iostream>
#include <iomanip>
#include <random>
#include <chrono>

const unsigned Reps = 10000000;
const unsigned CountSize = 100;

unsigned seed = std::chrono::system_clock::
                now().time_since_epoch().count();
auto rng = std::default_random_engine(seed);
auto dist = std::uniform_int_distribution<>(0, 1);

void doTrial(bool TwoHeadsInARow, unsigned counts[], unsigned& max_cnt) {
    for (unsigned i = 0; i < Reps; ++i) {
        unsigned cnt = 0;
        bool prevWasHead = false;
        while (true) {
            ++cnt;
            bool head = dist(rng);
            if (prevWasHead && head == TwoHeadsInARow)
                break;
            prevWasHead = head;
        }
        if (cnt >= CountSize)
            std::cerr << "counts overflow: cnt=" << cnt << '\n';
        else {
            ++counts[cnt];
            if (cnt > max_cnt) max_cnt = cnt;
        }
    }
}

int main() {
    using std::cout; using std::setw;

    unsigned counts[2][CountSize] {}, max_cnt = 0;
    doTrial(false, counts[0], max_cnt);
    doTrial(true, counts[1], max_cnt);

    unsigned long long F[CountSize] = {0, 1};
    double expect[4] = {0}, pow2 = 2;

    cout << std::fixed << std::setprecision(8);
    cout << "      H/T          Calc H/T     H/H          Calc H/H\n";
    for (unsigned i = 2; i <= max_cnt; ++i) {
        pow2 *= 2;
        double calc[4] = {
            double(counts[0][i]) / Reps,
            double(i - 1) / pow2,
            double(counts[1][i]) / Reps,
            double(F[i - 1]) / pow2
        };
        F[i] = F[i - 2] + F[i - 1];
        expect[0] += i * calc[0];
        expect[1] += i * calc[1];
        expect[2] += i * calc[2];
        expect[3] += i * calc[3];
        cout << setw(3)  << i << ":  "
             << setw(10) << calc[0] << "   "
             << setw(10) << calc[1] << "   "
             << setw(10) << calc[2] << "   "
             << setw(10) << calc[3] << '\n';
    }

    cout << "Expectations:\n      ";
    cout << expect[0] << "   "
         << expect[1] << "   "
         << expect[2] << "   "
         << expect[3] << '\n';
}


Thank you so much lastchance, doug4, and Ganado, for giving both a (semi-!) intuitive description and a mathematical analysis of this problem. (I'm still scratching my head about how those Fibs crept in!)
I'm wondering about that Fibonacci thing, too, amazing lastchance. Never even heard the word fibonacci in a probability course. Definitely will look into that when I have free time.

Apparently this Stack Exchange link sort of illuminates this:
https://math.stackexchange.com/questions/81805/how-to-show-that-this-binomial-sum-satisfies-the-fibonacci-relation

PS:
The difference comes when you fail to match on the succeeding throw. When you are looking for H/T, a failure means you are now sitting on a H, so you still need to only throw a T. If you are looking for H/H, a failure means you are sitting on a T, and you need to throw the first H again. So after you attain your first match, a failure on the subsequent throw differs between still needing to match the second or having to start over.

Really like this explanation.
Thanks for the testing problem and the coding, @Dutch.

@lastchance, The Fibonacci series?! How do you figure this shit out! Anyway, here's the result. It's a perfect fit.


There has to be a better way of deriving it, but if you can read my awful handwriting then enjoy:
https://imgur.com/a/BvAQZnc

Basically, probability = (number satisfying criterion) / (total number of sequences), and the denominator is 2^n since each of the n tosses is either H or T.

Then apart from the initial "edge cases", where there is only one sequence each,
HH
THH
for n >= 4 you need to count sequences of the form
(length n-3 sequence with NO consecutive H)THH
(number of n-3 length sequences with NO consecutive H)THH

The Fibonacci numbers come into that last bracketed count.

Firstly, that's a nice piece of work. I enjoyed studying it. Thank you very much. :-)

I had to read your explanation through a few times in order to not bother you with silly questions. But I'm still a little confused about the meaning of C(-1). Could you explicate that a little?

Here is the transcribed image. I very slightly reworded it in a couple of places and changed n to m for use with C until the end. If you want to fix it up and repost it I'll delete this version so as not to confuse anyone.

Total number of sequences of length n in H and T is 2^n.

Each probability is:

        number satisfying criterion
P(n) =  ---------------------------
           total number (= 2^n)

The criterion is that the sequence ends in HH.

Edge case n=2: single sequence HH,  P(2) = 1/2^2 = 1/4
Edge case n=3: single sequence THH, P(3) = 1/2^3 = 1/8

All others are of form:
    (n-3 values with no consecutive H's) | THH     (*)

Let C(m) = count of sequences of length m with no consecutive H's
           (the left side of (*) with m = n - 3)

Ordering by first H:
    TT ... T                     (m T's; no H's)
    HT | (C(m-2) sequences)
    THT | (C(m-3) sequences)
      .
      .
      .
    TT ... THT | (C(0) = 1)
    TT ... TTH | (C(-1) = 1)

Then
  C(m)   = 1 + C(m-2) + C(m-3) + ... + C(0) + C(-1)     m >= 2
  C(m-1) = 1          + C(m-3) + ... + C(0) + C(-1)     m >= 3

Subtract:
  C(m) - C(m-1) = C(m-2)                (for m >= 3)
therefore C(m) = C(m-1) + C(m-2)        Fibonacci type

But  C(-1) = 1, C(0) = 1, C(1) = 2, ...
and  F(1)  = 1, F(2) = 1, F(3) = 2, ...
Hence, C(m) = F(m+2)

Then from (*), P(n) = C(n-3)/2^n = F(n-1)/2^n

Checking shows that it also works for the edge cases.

Hi dutch.

C(0) and C(-1) are really just placeholders. The sequence has already finished at the bar |, and I just wrote them there to put the sum in a common form. You could have just left them out, with 1 for each of them in the sums for C(m) and C(m-1). I have borrowed your write-up to do that below. Perhaps it would have been clearer not to have included them in the first place:
Ordering by first H:
    |TT .....................TT|  (m T's; no H's)
    |HT |  (C(m-2) sequences)  |
    |THT | (C(m-3) sequences)  |
        .
        .
    |TTTTT ......THT| C(1) seq |
    |TTTTTTT................THT|
    |TTTTTTTTT...............TH|

Then
  C(m)   = 1 + C(m-2) + C(m-3) + ... + 1    + 1         m >= 2
  C(m-1) = 1          + C(m-3) + ... + 1    + 1         m >= 3


I only put C(0) and C(-1) in to see where in the Fibonacci series you were at this point. To avoid this you could have started with C(1)=2, C(2)=3:
C(1) = 2 (i.e. single-letter sequences, just H or T)
C(2) = 3 ( sequences are TT, HT, TH)
You could then match these to
F(3) = 2
F(4) = 3
The Fibonacci relation comes out straightforwardly, but the starting values are a little fiddly.

For the edge cases at the start I couldn't do anything better than check that they satisfied
the same relation.

Thanks for typesetting my awful scribble. You did a great job of putting it in a more readable form.


Given how simple the final expression is, I can't help wondering if someone can find a simpler argument to get to it ...
I guess the -1 can be interpreted as "subtracting" (removing) the final T at the end of the last sequence since all the others have it. And there's only one way to do that.

Good lesson in probability, which as you've demonstrated is, in this case at least, a matter of coming up with a way of systematically enumerating/counting possibilities.