Normalized distribution probability

So I'm trying to help some guy in another thread, and I have an idea for a solution to his problem, but it involves doing some statistical stuff that I don't know how to do.

Simply put, if I have a normal distribution (mean=0, stddev=10), and I use C++11's RNG to generate an integer with that distribution... I have an X% chance of that number being 10.

My question is... how do I find that percentage? How do I find X?
I am not sure what you are trying to do, but picking a random number should give an equal chance of getting any number in the range. If your range is 0 to 1, the probability is 50% for either. If your range is 1 to 10, the probability of getting a 10, or any other number in the range, is 10%. For 1 to 100 you have a 1% probability of getting the 10, or any other number. This all assumes integers, of course, and generating the random number only once. In other words, the probability is 1/range.

admkrk: That is true for a uniform distribution, not a normal distribution.

Normal distributions form a bell curve, so the probabilities are weighted more toward the center than toward the ends.

EDIT:

Actually you might be able to simulate this with a sine wave. While it may not be the same as mean/stddev, it ought to work similarly.
Lachlan: Aha! That's what I was looking for! Thanks!
Oh well, I am barely passing probability after all. lol
Lachlan: Yes that does the trick. Although it doesn't quite give a PERCENTAGE, but it is good enough for my purposes. Thanks again.

Testbed program:

#include <iostream>
#include <cmath>
#include <iomanip>

using namespace std;

namespace c
{
    const double e =  2.71828182845904523536;
    const double pi = 3.14159265358979323846;
    const double rad2pi = std::sqrt(2*pi);
}

double getPercentage(int val, int mean, int stddev)
{
    double dif = val - mean;
    return 1.0/stddev * std::pow(c::e,-0.5*dif*dif) / c::rad2pi;
}

int main()
{
    int mean = 10;
    int stddev = 3;
    int drawcount = 20;

    cout << "mean    : " << mean << "\n";
    cout << "stddev  : " << stddev << "\n\n";

    for(int i = 0; i < drawcount; ++i)
    {
        cout << setw(2) << i << " -> " << fixed << getPercentage(i,mean,stddev) << '\n';
    }

    cout.flush();
}

And output:


mean    : 10
stddev  : 3

 0 -> 0.000000
 1 -> 0.000000
 2 -> 0.000000
 3 -> 0.000000
 4 -> 0.000000
 5 -> 0.000000
 6 -> 0.000045
 7 -> 0.001477
 8 -> 0.017997
 9 -> 0.080657
10 -> 0.132981
11 -> 0.080657
12 -> 0.017997
13 -> 0.001477
14 -> 0.000045
15 -> 0.000000
16 -> 0.000000
17 -> 0.000000
18 -> 0.000000
19 -> 0.000000
That doesn't actually give you the probability.

The integral from a to b of the density function gives you the probability of a random variable from the distribution being between a and b ( this is a continuous distribution ).

In fact, the probability of getting exactly 10 is 0. You have to take the probability of getting between 9.5 and 10.5 or something along those lines. And that would be the integral from 9.5 to 10.5 of the density function.
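As a rough illustration of the point above, the integral of the density between 9.5 and 10.5 can be approximated numerically. This is only a sketch: the names normalPdf and probabilityBetween and the steps parameter are made up for this example, and the midpoint rule is just one simple way to approximate the area.

```cpp
#include <cassert>
#include <cmath>

// Density of a normal distribution with the given mean and standard deviation.
double normalPdf(double x, double mean, double stddev)
{
    assert(stddev > 0.0);
    const double pi = 3.14159265358979323846;
    const double z = (x - mean) / stddev;
    return std::exp(-0.5 * z * z) / (stddev * std::sqrt(2.0 * pi));
}

// Approximates the integral of the density from a to b with the midpoint rule.
// `steps` is an illustrative parameter; more steps give a finer approximation.
double probabilityBetween(double a, double b, double mean, double stddev, int steps = 1000)
{
    assert(steps > 0);
    const double width = (b - a) / steps;
    double area = 0.0;
    for (int i = 0; i < steps; ++i)
    {
        const double mid = a + (i + 0.5) * width;   // centre of the i-th slice
        area += normalPdf(mid, mean, stddev) * width;
    }
    return area;
}
```

With mean 10 and stddev 3, probabilityBetween(9.5, 10.5, 10.0, 3.0) should come out at about 13.2%.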
If I'm not mistaken again, those numbers are the Z-scores, which equals a 55.17% chance to get a 10.

Edit @ htirwin, it is not continuous since he's using integers between 0 and 19. It is still an integral, but I cannot remember exactly what it is an integral of.
htirwin:

Yeah you're probably right, but what I have is good enough for my purposes. It's generating a reasonable bell curve based on the given mean/stddev... which is all I really want.

So I'm going to mark this as solved. Thanks everyone.
> std::pow(c::e,-0.5*dif*dif)
http://www.cplusplus.com/reference/cmath/exp/
std::exp( -0.5*dif*dif )

> And output:
Don't you find it weird? It gets to 0 really quickly.
That's because you've used \sigma = 1. Scale the difference by the standard deviation:
// \frac{ x-\mu }{ \sigma }
double dif = (val - mean) / (double) stddev;
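Putting the two suggestions together (std::exp and dividing the difference by the standard deviation), the density function might look like the sketch below. The names getDensity and sumDensity are invented for this example, and the parameters are taken as double instead of int, which is an assumption and not part of the original testbed.

```cpp
#include <cassert>
#include <cmath>

// Normal density evaluated at an integer value, with the difference scaled by stddev.
double getDensity(int val, double mean, double stddev)
{
    assert(stddev > 0.0);
    const double rad2pi = std::sqrt(2.0 * 3.14159265358979323846);
    const double dif = (val - mean) / stddev;   // (x - mu) / sigma
    return std::exp(-0.5 * dif * dif) / (stddev * rad2pi);
}

// Adds up the density over the integers first..last (inclusive).
double sumDensity(int first, int last, double mean, double stddev)
{
    double sum = 0.0;
    for (int i = first; i <= last; ++i)
        sum += getDensity(i, mean, stddev);
    return sum;
}
```

For mean 10 and stddev 3, summing over 0 to 19 lands very close to 1. The small shortfall is the part of the tails that falls outside that range.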



> it is not continuous since he's using integers between 0 and 19.
I don't see anywhere where he stated the range of the rng.
ne555: Excellent! Yeah that does not only seem to make it better, but actually does generate a true percentage (ie: sums to 1)

Thanks!


Also, whoops @ exp. I forgot that function existed.
Hi Disch, your solution **is not** good enough: you cannot replace the integral of a function with the function itself and expect any good results. Please don't use your work as is!

You need to tell us a few important things:
1. When you generate your number, do you generate a double with the specified parameters, and then round it off?
2. How do you round off - nearest integer?
3. The function capital Phi (cumulative distribution function - see wikipedia link from code comments) measures the chance of getting a **double** equal to or smaller than val (using normalized bell curve).
4. If you are aiming to get the number 10, and you are rounding off to nearest integer, then what you need is
Phi(10.5)-Phi(9.5). Phi(10.5)-Phi(9.5) measures the chance of getting a number less than 10.5 (all numbers that will get rounded to 10 or less) minus chance of getting a number less than 9.5 (all numbers that will get rounded to 9 or less). However you have to tell us exactly how you generate your number (if you floor it, or if you round it, the computation will change essentially).
5. The above considerations were for normalized distribution. So if you use round of a double, what you will need in the end is
int goal=10;
double whatYouNeedIfUsingRound=
getPercentageNormalDistribution( ((double) goal)+0.5, yourMean, yourDeviation)- 
getPercentageNormalDistribution( ((double) goal)-0.5, yourMean, yourDeviation);


#include <cmath>

namespace c
{
  const double e =  2.71828182845904523536;
  const double pi = 3.14159265358979323846;
  const double rad2pi = std::sqrt(2*pi);
  const int numIterations =100;
}

double getNormalDistributionPhi(double x); //declared here because it is used before its definition

double getPercentageNormalDistribution(double val, double mean, double stddev) //all arguments should be floating point, int is no good!
{ return getNormalDistributionPhi((val-mean)/stddev);
}

double getNormalDistributionPhi(double x)
{ //reference: http://en.wikipedia.org/wiki/Normal_distribution#Cumulative_distribution_function
  //code copied from pascal from Wikipedia and translated to c++. I have not made any checks for errors.   
  //Please give the code a second look. 
  //Original pascal code from wikipedia:
  //begin
  //  sum:=x;
  //  value:=x;
  //  for i:=1 to 100 do
  //    begin
  //      value:=(value*x*x/(2*i+1));
  //      sum:=sum+value;
  //    end;
  //  result:=0.5+(sum/sqrt(2*pi))*exp(-(x*x)/2);
  //end; 
  double value=x;
  double sum=x;
  for (int i=1; i<=c::numIterations; i++) //inclusive bound, like the pascal loop
  { value*=x*x/(2*i+1);
    sum+=value;
  }
  return 0.5+(sum/c::rad2pi)*std::exp(-(x*x)/2); //pi lives in namespace c, so use c::rad2pi
}
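To show why the rounding question in the list above matters, here is a small sketch comparing the two cases. The function names are hypothetical, and std::erf is used for Phi instead of the series above purely to keep the example short.

```cpp
#include <cassert>
#include <cmath>

// Cumulative distribution function of a normal distribution, via std::erf.
double normalCdf(double x, double mean, double stddev)
{
    assert(stddev > 0.0);
    return 0.5 + 0.5 * std::erf((x - mean) / (stddev * std::sqrt(2.0)));
}

// Chance of landing on `goal` when the generated double is rounded to the nearest integer:
// every value in [goal - 0.5, goal + 0.5) ends up as goal.
double chanceIfRounded(int goal, double mean, double stddev)
{
    return normalCdf(goal + 0.5, mean, stddev) - normalCdf(goal - 0.5, mean, stddev);
}

// Chance of landing on `goal` when the generated double is floored instead:
// every value in [goal, goal + 1) ends up as goal.
double chanceIfFloored(int goal, double mean, double stddev)
{
    return normalCdf(goal + 1.0, mean, stddev) - normalCdf(goal, mean, stddev);
}
```

With mean 10 and stddev 3, rounding gives a slightly higher chance of a 10 than flooring does, because the rounding interval is centred on the peak of the curve while the flooring interval starts at it.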
> Hi Disch, your solution **is not** good enough: you cannot replace the integral of a function with the function itself and expect any good results. Please don't use your work as is!


Keep in mind I'm not necessarily going for a true normalized distribution. I'm just going for some kind of controllable bell curve. The solution posted (after ne555's input) effectively accomplishes that, even if it isn't perfect.


But I appreciate the input for sure. I'll try your approach tomorrow (too late to fiddle with it tonight) and compare it to the existing output.

Thanks. =)
Hey Disch,

I made a few important corrections in my post, possibly after you went to bed if you are in the US, so you may want to take a second look.

If this is, say, game-related, sure, go ahead with a simple approximate solution (although doing things with great mathematical precision will give you a warm fuzzy feeling inside). However, if this is gambling-related, definitely go with the most accurate math you can :)
Cheers,
tition
I've never been any good with probability, but doesn't std::normal_distribution produce a bell curve? http://www.cplusplus.com/reference/random/normal_distribution/
Yes.

And <cmath> has the error function std::erf() (the CDF of the normal distribution in a different form).

#include <iostream>
#include <cmath>
#include <iomanip>

double phi( double x, double mu, double sigma )
{ return 0.5 +  0.5 * (  std::erf( (x-mu) / ( std::sqrt(2.0) * sigma ) ) ) ; }

int main()
{
    const double mu = 10 ;
    const double sigma = 3 ;

    std::cout << "\nmu: " << mu << " sigma: " << sigma << std::fixed 
              << "\n\npercent chance of number being between 9.5 and 10.5 is: " 
              << std::setprecision(8) << 100 * ( phi( 10.5, mu, sigma ) - phi( 9.5, mu, sigma ) ) << "\n\n" ;             
    
    // test harness: verify agreement with the three-sigma rule
    for( int i = 1 ; i < 4 ; ++i )
    {
        std::cout << "within +- " << i << " sigma: " << std::setprecision(8) << std::setw(12)  
                  << phi( mu + i*sigma, mu, sigma ) - phi( mu - i*sigma, mu, sigma ) << '\n' ;
    }             
                  
}

http://coliru.stacked-crooked.com/a/c3ac43dde78109bd
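One way to sanity-check the phi-based figure is to draw a lot of samples from std::normal_distribution, round them, and count how often the target integer comes up. The sketch below assumes the doubles are rounded to the nearest integer with std::lround. The function names, the seed and the draw count are all arbitrary choices for this example.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draws `draws` samples from std::normal_distribution, rounds each to the nearest
// integer, and returns the fraction that came out equal to `goal`.
double observedFraction(int goal, double mu, double sigma, int draws, unsigned seed)
{
    assert(draws > 0);
    std::mt19937 engine(seed);
    std::normal_distribution<double> dist(mu, sigma);
    int hits = 0;
    for (int i = 0; i < draws; ++i)
    {
        if (std::lround(dist(engine)) == goal)
            ++hits;
    }
    return static_cast<double>(hits) / draws;
}

// Analytic counterpart: phi(goal + 0.5) - phi(goal - 0.5), written with std::erf.
double expectedFraction(int goal, double mu, double sigma)
{
    const double scale = sigma * std::sqrt(2.0);
    return 0.5 * (std::erf((goal + 0.5 - mu) / scale) - std::erf((goal - 0.5 - mu) / scale));
}
```

With a large number of draws the observed fraction should settle near the analytic value, though the exact count depends on the seed and on the standard library's implementation of the distribution.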
So I tried tition's and JLBorges's latest postings as a comparison -- and I think I must have screwed up tition's code because it doesn't work properly at all. I did not bother to try and correct it because I don't care enough about it and early morning math gives me a headache.

JLBorges is comparable to Lachlan/ne555's suggested code, but not different enough to warrant a change to the slightly more complicated code, IMO. If I was super concerned about accuracy I'd consider it, but for this application Lachlan/ne555's approach is fine.

Test:
http://ideone.com/kQy7D7

output:
Lachlan/ne555 - JLBorges - tition
Hi Disch, my code contains a brilliant error: I wrote

 
sum+=sum; //multiplies sum by two, brilliant 


instead of
 
sum+=value; // changing this fixes the result. 


My code is now fixed:

http://ideone.com/wLP72l

JLBorges and my code produce the same results: he just used the built-in std::erf function to compute phi, while I implemented phi (with an error) directly because I wasn't aware of the built-in arsenal. I vote for JLBorges' code of course (much shorter to use std::erf instead of manually re-implementing it).

You will notice that the error in your original approach was quite noticeable: you were getting 0.000067 instead of 0.000078 - this is a difference of more than 10%.

My opinion: since you already have JLBorges' code and that is a more correct approach than the original, why not use it? Even if being 10% off doesn't matter for what you are doing, still, doing it right will have that good warm feeling of having done the right thing. You may even end up bragging about it one day ("I did my research right!").
> I'm just going for some kind of controllable bell curve.

The density function you are using is what gives the correct bell curve, it's just that a bell curve doesn't plot probabilities.

It's the area under the curve that gives actual probabilities. The density function always integrates to 1 (which is 100% probability), but the density value at a given point can be greater than 1. For example, a density in the form of a rectangle that is 10 high and 1/10 wide has an area of 1.
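The same thing happens with a narrow bell curve. The sketch below uses two hypothetical helpers to show that the peak of the density grows as the standard deviation shrinks, while the area within a given number of standard deviations stays the same.

```cpp
#include <cassert>
#include <cmath>

// Height of the normal density at its peak (x == mean).
double peakDensity(double stddev)
{
    assert(stddev > 0.0);
    return 1.0 / (stddev * std::sqrt(2.0 * 3.14159265358979323846));
}

// Area under the density within k standard deviations of the mean.
// This depends only on k, not on how narrow or wide the curve is.
double areaWithin(double k)
{
    assert(k >= 0.0);
    return std::erf(k / std::sqrt(2.0));
}
```

For a standard deviation of 0.1 the peak density is close to 4, which clearly cannot be a probability, yet the total area under the curve is still 1.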