Birthday Paradox coding

Hi,

I currently teaching myself C++, and am building a simple code that given a set class number of people, will work out the probability of at least two of those people sharing a birthday, also known as the birthday problem/ paradox. I have done this problem analytically, and thought it would be a cool program to begin learning c++.

It works fine, however due to the size of the numbers used to calculate the probabilities, c++ does not seem to recognise 365! as it's too large of a number. Is there an alternative form of this, or is there a way to allow the program to handle large numbers?

To work out the factorial of 365, I have use a for loop to do this, which just multiples the current number by i at each iteration. Is there a more efficient way to doing this?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#include "stdafx.h"
#include <iostream>
using namespace std;

int fact(int a_1);
int fact(int b_1, int b_2);
float paradox(int x, int y, int z);

int main()
{
	int a = 1;
	int A = fact(a);

	int b = 1;
	int N;

	cout << "On the planet of Zog, there are only 12 days in a year..." << endl;
	cout << "Choose a class size: ";
	cin >> N;

	int B = fact(b, N);

	float C = paradox(A, B, N);

	cout << endl << "The probability of a class full of " << N;
        cout << "people having at least one shared birthday is: " << C << "%" << endl;

    return 0;
}

int fact(int a_1) {
	a_1 = 1;
	
	for (int i = 1; i <= 12; i++) {
		a_1 = a_1 * i;
	}

	return a_1;
}

int fact(int b_1, int b_2) {
	b_1 = 1;

	for (int i = 1; i <= 12 - b_2; i++) {
		b_1 = b_1 * i;
	}

	return b_1;
}

float paradox(int x, int y, int z) {
	float birthday = (1 - ((x) / (pow(12, z)*(y))));
	birthday *= 100;
	return birthday;
}

Last edited on

c++ does not seem to recognise 365!


Hi,

Look at the range of values section - factorials get large quickly.

http://en.cppreference.com/w/cpp/language/types
Hi,

Thank you for your response. Yep, I can see the issue there. Currently, there does not seem a way to display a number to 778 figures. :p I guess in my program, a year will have to remain considerably less than 365 days. :p
Currently, there does not seem a way to display a number to 778 figures.


Not with the built-in C++ types, but a Google search is revealing :+)


http://stackoverflow.com/questions/15400031/c-what-variable-type-for-extremely-big-integer-numbers#15400100

boost is a very useful library, lots of really good stuff in there :+)
Last edited on
Is there any benefit in displaying 778 figures? - if it was me, I would use the boost multi-precision library with a floating point type (so one can get the exponent) , and just display 6 significant figures.

Maybe you just meant having a type that could handle a number that large?
Or, just take a different approach to the problem.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <iostream>

int main()
{
    auto year_length = 365.00;

    auto group_size = 2u;

    for (auto j = 0; j < 40; ++j)
    {
        auto percent_all_different = 1.0;

        for (auto i = 0u; i < group_size; ++i)
            percent_all_different *= (1 - i / year_length);


        std::cout << "The probability that a group of size ";
        std::cout << group_size++ << " has at least one birthday in common is ";
        std::cout << (1 - percent_all_different) * 100 << "%\n";
    }
}
Thank you all for your help. I will indeed have a look a the boost library, but as I am relatively new to c++ and aim to simplify my project, I would rather not add in extra library at this stage. I will take cire's advice and re-evaluate the code. Though I don't really like the way the code above outputs the problem, but it has given me an idea of what I can do, which if works will bypass having to use the factorial.
Last edited on
It works!! :D

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Birthday Paradox v.2.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <iostream>
using namespace std;

float probNot(float a);
float prob(float b);

int main()
{
	float N;
	cout << "Choose a class size: ";
	cin >> N;

	float A = probNot(N);
	int B = prob(A);

	cout << endl << "Probability of at least 2 people 
        sharing a birthday in a class of " << N << " is: " << B << "%" << endl;

    return 0;
}

float probNot(float a)
{
	float prob_not = 1;
	for (float i = 365; i>=(365-(a-1)); i--) {
		prob_not *= (i/(365));
	}

	return prob_not;
}

float prob(float b)
{
	float probability = (1 - b) * 100;
	return probability;
}
Last edited on
Hi,

One more thing - prefer to use double rather than float, the precision of float is easily exceeded, that is why double is the default for literals. It might not make any difference to this particular problem, but it is a good habit to get into. float can do 6 or 7 significant figures, while double can do 15 or 16. One might use float if a library required it - typically for a graphics library. Otherwise if you have billions of them and rquire extra performance because of that - but then there are a bunch of other considerations in that case.

Just noticed you are using float in the for loop on line 29. Don't do that - floating point numbers are not represented exactly, so this can cause off by 1 errors. Always use integral types in for loops. For example variable i 365.0 (as a double) could actually be 365.000000000003 , so when it reaches what you think is zero, it is still 0.000000000003 meaning the end condition of the for loop is still true, so the loop executes 1 more time. On a different implementation variable i could start out as 364.999999999997, so the for loop would execute the correct number of times - but it means the code now has different behaviour - which is bad.

You could use a while loop instead of a for loop.
Last edited on
TheIdeasMan wrote:
One might use float if a library required it - typically for a graphics library. Otherwise if you have billions of them and rquire extra performance because of that - but then there are a bunch of other considerations in that case.

That's good advice generally, but it's worth noting that for practical purposes we can assume "small" integer values that are double (or float) literals are exactly representable.

http://stackoverflow.com/questions/759201/representing-integers-in-doubles

http://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e
@ cire Ok. You are of course correct. Something must have changed somewhere along the line, I was sure that in the past the number 2.0 didn't come out exactly, and that was due to the mantissa being calculated with binary fractions. Now it seems that if the exponent is zero, the mantissa is basically just an integer representation? Maybe the IEEE spec changed at some point? I guess I have always stuck to the tried and true methods.


Line 29 has variable a which is also float, but should have been unsigned int, so the only line which needed a floating point is line 30.
I initially had an 'int i;' in the for loop, but at every iteration, the probability was always 100%?

I do not understand why this was happening, but swapping it to a float or a double seemed to "fix" the issue, but of course for the reason you stated, it rendered an extra iteration due the condition, hence I had to use the condition i>=(365-(a-1)) instead of i>=(365-a). This is probably bad practice, but I don't really understand the full workings of int's and float's to explain this.

EDIT

Could this may be due to the "prob_not" variable? Does setting i as int somehow set "prob_not" as an int, which will always be less than 1? Hence going to 0, thus the probability will always be 100%.

Last edited on
Hi,

Maybe you should study cire's code closely. It is really quite simple - there is no overly tricky math. Perhaps the auto keyword is making the types more difficult to understand? auto in this context deduces the type from the declaration, I have annotated cire's code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <iostream>

int main()
{
    auto year_length = 365.00;  // double because of digits after the decimal point

    auto group_size = 2u; // unsigned int - the u in the literal

    for (auto j = 0; j < 40; ++j)  // int
    {
        auto percent_all_different = 1.0; // double

        for (auto i = 0u; i < group_size; ++i)   // unsigned int - the u in the literal
            percent_all_different *= (1 - i / year_length);


        std::cout << "The probability that a group of size ";
        std::cout << group_size++ << " has at least one birthday in common is ";
        std::cout << (1 - percent_all_different) * 100 << "%\n";
    }
}


About int and double: An int does not have a fraction part, a double is a number with a decimal fraction and possibly an exponent. So there are some calculations that can be done with just an int, and others where double is required.

I will mention your code just to point out some things about how the types work, but you should do something near to what cire has. In your code a double type is necessary because of the division. Integer division is truncated because it can't have a fraction part. In this code if variable i was an int:

prob_not *= (i/(365));

Then one can just change the 365 to 365.0. Because of the literal 365.0 is a double, and the type promotion rules, this makes the result a double.

prob_not *= (i/(365.0));

So the variables in the for loop could and should be integral. But there is nothing stopping one from converting variables to a double type inside the loop body.


Another really good thing to learn is the naming of your variables - make them meaningful and don't abbreviate them. Instead of N I would have used ClassSize. People have different preferences for things like upper or lower_case with underscore, PascalCase, or camelCase.Good names can make your code self documenting - the code should read like a story of what is happening. Here is a comical example of what I mean:

http://www.cplusplus.com/forum/lounge/176684/#msg872881

I like to use the same variable name in the function call and in the function definition - it reinforces the idea that they are the same thing.

With doubles, I always put numbers before and after the decimal point - it reinforces the idea that it is a double, do this:

1
2
double a = 1.0;
double b  = 0.3;


Not this:

1
2
double a = 1;
double b  = .3;


More modern with auto and braces:

1
2
auto a {1.0};
auto b {0.3};


Last edited on
Something must have changed somewhere along the line, I was sure that in the past the number 2.0 didn't come out exactly, and that was due to the mantissa being calculated with binary fractions


If it's the result of calculation, you shouldn't count on it being exact. On the other hand, if it's a literal there's no possibility of a rounding error. You probably didn't bother differentiating between those cases, as there aren't many practical reasons for doing so.
Hey guys,

Again thanks for your help. I have taken your advice, and have fully got my code to work! I noticed that my code took an extra iteration of the loop because I foolishly set it to (going from 365 to 355 for a class size of 10, which counts both 365 AND 355).

I'll also rename the variables in my code to make it more readable. I kind of assumed N is always used for size (well for me at least), well coming from a Physics background, we used N a lot for "number of".

Thank's for annotating the code to, Cire's seems like a much easier way to code the problem.
cire wrote:
You probably didn't bother differentiating between those cases, as there aren't many practical reasons for doing so.


I meant that I had a different understanding of how a decimal number is converted to floating point. This is how I thought that it occurred - basically that is. For example, take the number 750 - first it is converted to 0.75; then to binary fraction 0.11 ; then the exponent is adjusted so that it means 750 and not 75 for example. The point was that numbers exist that don't resolve exactly using binary fractions with a limit. But I gather this is not what happens for a whole number that is sufficiently small enough: the mantissa is stored like an int with an exponent of zero - which makes a lot more sense to me :+) And I was wondering whether this methodology had changed at some point, or maybe it wasn't explained correctly to me all those years ago (1990), or I was discombobulated from the start :+D

@moodydog

..... well coming from a Physics background, we used N a lot for "number of".


I should have said that there are some valid exceptions, namely if the variable names match the mathematical documentation. ( It is a good idea to put a link to documentation as a comment, even copy/paste relevant parts of the documentation as comments. ) So, not many would have a problem with factorial(n) as one example. In a similar fashion, things like x = r * cos(a); But even then, some might argue that XDelta = Radius * cos(Angle); is better. I guess no one really wants to see a whole page of code with lots of single char variable names. Sometimes poor naming can lead to confusion while the code is being written.

The big thing to remember is that you might not be the only one to be looking at the code. Many a time there have been stories of people confused by their own code only a few months after they had written it. Also consider in the work environment, where a poor maintenance coder has to comprehend something written 10 years earlier by a coder they will never meet.

Good Luck, don't be afraid to ask questions :+)
Topic archived. No new replies allowed.