Weighted random distribution

Hi all,

I'm trying to generate the numbers 0 .. 9 in a random fashion but with different probabilities for the individual digits.

The distribution comes from all the digits of the time values from 00:00 to 23:59
9 : 264 (4.6%)
8 : 264 (4.6%)
7 : 264 (4.6%)
6 : 264 (4.6%)
5 : 504 (8.8%)
4 : 504 (8.8%)
3 : 564 (10%)
2 : 804 (14%)
1 : 1164 (20%)
0 : 1164 (20%)

The PRNG is supposed to generate random numbers with 20% zeros, 20% ones, 14% twos and so on.

My current solution is pretty naive:
1
2
3
4
5
6
7
8
9
10
11
12
int v = std::rand() % 1000;

     if(v < 200) return 0;
else if(v < 400) return 1;
else if(v < 540) return 2;
else if(v < 640) return 3;
else if(v < 728) return 4;
else if(v < 816) return 5;
else if(v < 862) return 6;
else if(v < 908) return 7;
else if(v < 954) return 8;
else             return 9;


I'm not the big math geek, but I'm sure there's a neater solution. (no C++20 please)

Also, please don't frown upon std::rand(), that's just for testing. I'm going to use a proper RNG later.

And no, it's not an assignment, it's for a nifty screensaver.
Maybe? Use rand(0 .. 9) inclusive

1. Set up an empty array of 1000 (say) integers, fill each with -1
2. Using the number 9 (say), 'fire' 46 of them at randomly selected elements
3. If the element is empty put 9 there otherwise put it in the next available spot
5. Do the same for 46 8's etc blah blah

You now have a sequence of 1000 pseudo-random (rand) 0 .. 9's
Man, that was quick. Awesome. Thanks.
don't frown upon std::rand(), that's just for testing. I'm going to use a proper RNG later.

Why not start your testing with "a proper RNG" from the beginning instead of using using random number functions that are less than adequate?
your original is fine, apart from needing a good uniform generator to pick the number for the conditions.
if you need to do an unholy number of them, you can mess with alternates.
one way is to approximate the curve, get an equation for it, generate an x and compute y=f(x) to get your answer. This seems like a lot of trouble. It can be approximate if its a fun screen saver, no one will notice a .001 or something % miss in your random data. Its probably easier to approximate if you move it around so you have a near bell curve on the %s
again, it does not need to be ultra precise here ... don't overthink it.
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include <iostream>
#include <ctime>

#include <iomanip>

int main()
{
    const int LIMIT{500};// NEED AT LEAST ~500 - ?ROUNDING
    int array[LIMIT];
    
    for(int i = 0; i < LIMIT; i++)
    array[i] = -1;
    
    double distribution[]
    { 0.20, 0.20, 0.14, 0.10, 0.088, 0.088, 0.046, 0.046, 0.046, 0.046 };
    
    srand( time(nullptr) );
    int count{0};
    int cell{0};
    for(int number = 0; number < 10; number++)
    {
        for(int no_shots = 0; no_shots < LIMIT * distribution[number]; no_shots++)
        {
            cell = rand() % LIMIT;
            
            while( array[cell] >= 0 )
            {
                cell++;
                cell = cell % LIMIT;
            }
            array[cell] = number;
            
            count++;
        }
    }
    
    for(auto i:array)
        std::cout << std::setw(4) << std::left << i ;
    
    std::cout << "\nCOUNT = " << count << '\n';
    
    return 0;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
#include <iostream>
#include <ctime>

#include <iomanip>

int main()
{
    const int LIMIT{1000};// NEED AT LEAST ~500 - ?ROUNDING
    int array[LIMIT];
    
    for(int i = 0; i < LIMIT; i++)
    array[i] = -1;
    
    double distribution[]
    { 0.20, 0.20, 0.14, 0.10, 0.088, 0.088, 0.046, 0.046, 0.046, 0.046 };
    
    int check_count[10]{0};
    
    srand( time(nullptr) );
    int count{0};
    int cell{0};
    for(int number = 0; number < 10; number++)
    {
        for(int no_shots = 0; no_shots < LIMIT * distribution[number]; no_shots++)
        {
            cell = rand() % LIMIT;
            
            while( array[cell] >= 0 )
            {
                cell++;
                cell = cell % LIMIT;
            }
            array[cell] = number;
            check_count[number]++;
            
            count++;
        }
    }
    
    int col_count{0};
    for(auto i:array)
    {
        std::cout << std::setw(4) << std::left << i ;
        
        col_count++;
        if(col_count % 20 == 0)
            std::cout << '\n';
    }
    
    std::cout << "\nCOUNT = " << count << '\n';
    
    for (int i = 0; i < 10; i++)
    {
        std::cout  << i << std::setw(5) << std::right <<check_count[i] << '\n';
    }
    
    return 0;
}



2   2   4   4   1   0   3   3   1   1   5   0   2   5   5   1   5   5   4   6   
0   4   0   1   1   1   1   1   2   3   0   2   5   1   0   2   3   5   1   5   
1   2   6   1   6   6   1   7   3   2   5   7   7   4   0   1   2   1   2   1   
3   1   4   0   0   5   6   4   6   6   2   3   7   2   7   2   4   0   1   0   
7   7   2   4   0   1   2   3   5   3   1   1   2   6   1   2   2   1   5   2   
8   2   4   1   1   1   2   2   8   8   0   5   6   4   0   1   0   4   0   2   
6   6   7   2   5   8   8   2   8   4   9   2   0   0   0   1   6   7   5   6   
1   0   0   7   7   0   0   4   4   5   6   1   3   5   9   9   5   0   0   1   
2   3   4   0   1   0   3   9   4   1   3   4   0   0   2   1   2   3   5   5   
3   0   0   5   4   5   1   5   2   0   0   3   3   0   0   3   1   1   1   0   
5   4   5   0   0   1   2   3   2   2   3   5   5   1   0   3   1   2   0   1   
3   4   4   5   0   4   2   4   0   0   0   1   2   3   0   4   4   0   0   4   
1   0   2   0   3   4   4   5   5   0   1   5   0   4   5   6   6   2   7   1   
0   2   3   7   1   0   1   3   4   2   6   0   1   3   3   3   0   2   0   3   
6   7   0   1   1   1   2   4   5   2   7   3   1   1   7   0   1   7   2   7   
0   1   2   2   4   4   0   3   6   8   8   8   0   8   8   2   4   0   1   2   
3   2   0   0   2   1   0   3   2   0   0   0   1   0   1   1   2   3   2   4   
5   1   0   1   2   2   0   0   4   5   0   1   1   1   6   0   6   0   5   6   
2   5   5   7   1   3   7   2   3   0   4   5   2   8   8   1   9   5   9   9   
0   0   1   5   5   4   4   5   0   3   9   4   2   0   9   9   9   9   9   4   
5   9   0   1   5   9   1   1   2   2   6   0   2   0   1   1   1   5   4   7   
1   6   3   2   0   1   8   8   2   2   5   8   8   3   9   9   3   2   2   4   
3   9   3   3   2   9   5   0   0   1   1   7   9   5   1   0   1   5   3   1   
0   4   0   1   1   2   7   3   9   3   2   9   0   2   6   1   9   9   9   0   
9   0   1   1   5   7   8   0   1   2   3   1   2   1   3   2   1   3   1   4   
0   0   0   0   1   7   7   5   3   1   0   2   1   1   3   0   1   2   3   4   
4   2   2   5   6   2   3   4   1   8   3   9   8   9   1   5   9   0   0   0   
1   1   2   0   2   3   5   3   0   3   4   4   4   0   0   1   2   4   5   2   
1   4   5   6   0   1   3   2   6   0   7   9   0   9   0   2   9   3   1   1   
5   7   9   4   0   2   1   2   3   2   1   0   1   3   4   4   0   2   4   1   
0   0   0   4   5   2   5   6   6   2   1   2   5   6   4   7   8   8   9   2   
9   0   0   1   4   6   2   0   0   0   0   6   1   9   0   1   2   9   1   0   
0   1   2   0   3   4   8   8   8   9   9   1   7   5   9   0   9   7   1   2   
2   5   1   1   0   0   1   1   1   7   0   3   3   3   0   3   1   3   1   5   
6   1   7   0   1   1   1   1   1   1   0   2   3   4   5   2   5   7   2   8   
5   0   2   1   4   5   6   7   0   6   7   4   8   1   2   8   4   1   2   2   
7   0   0   0   1   2   6   7   1   0   4   7   2   0   1   0   1   3   2   3   
6   1   1   1   2   1   0   2   7   4   4   5   7   1   7   3   0   0   3   1   
1   1   8   1   3   3   0   2   0   2   4   3   5   1   1   5   8   0   0   1   
3   5   8   8   2   1   0   0   0   1   2   9   0   8   0   1   4   4   1   2   
1   1   1   4   5   0   6   3   7   0   1   8   8   0   1   2   0   3   5   0   
1   4   5   6   3   0   2   4   7   4   4   0   0   6   8   8   0   9   1   0   
1   1   2   3   1   2   5   1   0   2   5   0   0   0   1   0   2   0   2   4   
0   1   0   2   2   5   0   6   2   0   0   0   1   0   1   2   2   2   3   0   
0   1   2   4   5   5   5   6   1   8   0   0   2   2   1   3   6   8   1   8   
8   5   8   9   1   0   0   2   5   4   7   8   3   8   1   3   1   1   1   4   
1   7   8   8   0   1   9   2   9   1   5   3   1   3   1   2   3   4   0   4   
0   3   2   5   2   3   3   0   5   5   6   1   0   1   1   3   3   4   4   1   
0   1   0   1   2   2   3   3   3   5   0   6   0   0   1   1   3   3   0   0   
3   4   4   0   4   0   1   1   0   1   1   0   1   0   1   0   2   2   3   1   

COUNT = 1000
0  200
1  200
2  140
3  100
4   88
5   88
6   46
7   46
8   46
9   46
Program ended with exit code: 0
@lastchance

Exactly as indicated in your reference the expectation of drawing goodness of fit from this is irrelevant because @OP requirement has determined nothing more than a means of jumbling pre-determined number occurences. The distibution check is simply an accounting exercise to make sure nothing has been missed.

No conclusions or meaning in relation to any chi-squared or any other least squares approach can or should be drawn.

Similar non-meaning would be arrived at for any predetermined frequency distribution. All it will show, if anything at all, is random might be random.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
#include <iostream>
#include <ctime>

#include <iomanip>

#include <random> // PART 2

int main()
{
    // ALTERNATIVE 1
    const int LIMIT{16000};// NEED AT LEAST ~500 - ?ROUNDING
    int array[LIMIT];
    
    for(int i = 0; i < LIMIT; i++)
    array[i] = -1;
    
    double distribution[]
    { 0.200, 0.200, 0.140, 0.100, 0.088, 0.088, 0.046, 0.046, 0.046, 0.046 };
    
    int check_count[10]{0};
    
    srand( time(nullptr) );
    int count{0};
    int cell{0};
    for(int number = 0; number < 10; number++)
    {
        for(int no_shots = 0; no_shots < LIMIT * distribution[number]; no_shots++)
        {
            cell = rand() % LIMIT;
            
            while( array[cell] >= 0 )
            {
                cell++;
                cell = cell % LIMIT;
            }
            array[cell] = number;
            check_count[number]++;
            
            count++;
        }
    }
    
    int col_count{0};
    for(auto i:array)
    {
        std::cout << std::setw(4) << std::left << i ;
        
        col_count++;
        if(col_count % 20 == 0)
            std::cout << '\n';
    }
    
    std::cout << "\nCOUNT = " << count << '\n';
    
    for (int i = 0; i < 10; i++)
    {
        std::cout  << i << std::setw(5) << std::right <<check_count[i] << '\n';
    }
    
    // ALTERNATIVE 2
    const int nrolls = LIMIT; // number of experiments
    const int nstars = 100;   // maximum number of stars to distribute
    
    std::default_random_engine generator;
    std::discrete_distribution<int> distribution_2
    { 200, 200, 140, 100, 88, 88, 46, 46, 46, 46 };
    
    
    int p[10]={};
    
    for (int i=0; i<nrolls; ++i) {
        int number = distribution_2(generator);
        ++p[number];
    }
    
    std::cout << "a discrete_distribution:" << std::endl;
    for (int i=0; i<10; ++i)
    std::cout << i << ": " << p[i] /*std::string(p[i]*nstars/nrolls,'*')*/ << std::endl;
    
    
    
    return 0;
}


9   9   9   9   0   2   9   4   1   2   8   9   9   9   9   3   9   0   4   0   
0   2   2   0   2   4   1   1   1   1   0   1   2   0   2   2   0   1   4   2   
0   4   1   1   1   2   0   2   0   1   1   4   1   5   2   3   6   6   0   2   
7   7   0   0   1   2   0   2   2   1   3   5   2   2   4   6   0   0   2   3   

    ...
    ...

3   4   5   1   0   1   5   1   2   2   3   3   7   3   2   4   0   7   8   8   
0   9   9   0   9   0   9   9   0   0   0   2   0   2   5   1   5   3   4   1   
4   4   5   0   7   7   2   2   8   0   3   5   1   5   5   3   0   0   1   2   
3   7   1   8   0   5   2   8   2   8   8   9   0   0   2   5   1   1   4   9   

COUNT = 16000
0 3200
1 3200
2 2240
3 1600
4 1408
5 1408
6  736
7  736
8  736
9  736
a discrete_distribution:
0: 3183
1: 3164
2: 2177
3: 1584
4: 1405
5: 1456
6: 779
7: 757
8: 737
9: 758
Program ended with exit code: 0
Last edited on
@againtry

Thanks for your ideas. It's a bit heavy for my purpose because it's just used to initialize an array once when the program (screensaver) starts.
I had a solution like this in mind, but it doesn't need to be a general implementation.

@Furry Guy
I used std::rand() in the past so many times, it just rolls off my hands.
For testing I don't even seed the thing.
Using classes from <random> hasn't made it into muscle memory, yet.

Using classes from <random> hasn't made it into muscle memory, yet.

I have this problem as well, so I wrapped <random> into a class that behaves exactly like rand() so I can have it both ways (uses random, but just as easy as rand). Its actually easier, because I can set the range, so myrand() replaces the "rand()%x + y" since myrand() has the range built into it.

Last edited on
I adapted the simple random toolkit idea from this paper: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3551.pdf

Added a couple of ideas and this was the result, a stand-alone header-only library (random_toolkit.hpp). Not need for class wrapping, just light namespace and function layering over what C++ provides:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
/* A simple toolkit to help beginners make using <random> library an easier task */

// shamelessly stolen and adapted from a C++ working paper: WG21 N3551
// http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3551.pdf

#ifndef __RANDOM_TOOLKIT_HPP__
#define __RANDOM_TOOLKIT_HPP__

#include <chrono>
#include <random>
#include <stdexcept>

namespace rtk
{
   static bool seeded = false;

   inline std::default_random_engine& urng()
   {
      static std::default_random_engine URNG { };

      return URNG;
   }

   inline void srand(bool FORCE_SEED = false)
   {
      static const std::seed_seq::result_type seeds[] { std::random_device {}(),
                                                        std::seed_seq::result_type(std::chrono::system_clock::now().time_since_epoch().count()) };
      static std::seed_seq sseq(std::begin(seeds), std::end(seeds));

      // the URNG can't be reseeded unless forced
      if (!seeded || FORCE_SEED)
      {
         urng().seed(sseq);

         seeded = true;
      }
   }

   inline void srand(unsigned seed, bool FORCE_SEED = false)
   {
      // the URNG can't be reseeded unless forced
      if (!seeded || FORCE_SEED)
      {
         urng().seed(seed);

         seeded = true;
      }
   }

   // two function overloads to obtain uniform distribution ints and doubles
   inline int rand(int from, int to)
   {
      static std::uniform_int_distribution<> dist { };

      if (from > to) { throw std::invalid_argument("bad int params"); }

      return dist(urng(), decltype(dist)::param_type { from, to });
   }

   inline double rand(double from, double to)
   {
      static std::uniform_real_distribution<> dist { };

      if (from > to) { throw std::invalid_argument("bad double params"); }

      return dist(urng(), decltype(dist)::param_type { from, to });
   }

   // function for rolling dice, and checking if the # of pips is nonstandard
   inline int roll_die(int pips)
   {
      //check to see if the number of die pips is less than 2
      if (pips < 2)
      {
         return 0;
      }

      return rand(1, pips);
   }
}

#endif 

A somewhat involved test chassis for trying out the various features:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
#include <array>
#include <iostream>
#include <numeric>
#include "random_toolkit.hpp"

int main()
{
   // "create" a random engine and randomize it
   rtk::srand();

   using card = unsigned short;

   // manufacture a deck of cards:
   std::array<card, 52> deck;

   // create the cards, 0 (zero) to 51
   std::iota(deck.begin(), deck.end(), 0);

   // lambdas to display the card in text representation:
   auto rank = [] (card c) { return "AKQJT98765432"[c % 13]; };
   auto suit = [] (card c) { return "SHDC"[c / 13]; };

   card count { };

   for (card c : deck)
   {
      std::cout << rank(c) << suit(c) << ' ';
      count++;

      if (0 == (count % 13)) { std::cout << '\n'; }
   }
   std::cout << '\n';

   // shuffle the deck:
   std::shuffle(deck.begin(), deck.end(), rtk::urng());

   count = 0;

   for (card c : deck)
   {
      std::cout << rank(c) << suit(c) << ' ';
      count++;

      if (0 == (count % 13)) { std::cout << '\n'; }
   }
   std::cout << '\n';

   for (unsigned loop { }; loop < 50; loop++)
   {
      std::cout << rtk::roll_die(6) << ' ';

      if ((loop + 1) % 20 == 0) { std::cout << '\n'; }
   }
   std::cout << "\n\n";

   // lambda to "flip a coin," returning a text representation of coin side
   auto flip_coin = [] () { return (rtk::rand(0, 1) ? "Heads" : "Tails"); };

   for (size_t loop { }; loop < 25; loop++)
   {
      std::cout << flip_coin() << '\t';

      if ((loop + 1) % 8 == 0) { std::cout << '\n'; }
   }
   std::cout << '\n';

   std::cout << "\nLet's see if we can have a non-standard die.....\nA die with 1 side: ";
   std::cout << rtk::roll_die(1) << '\n';

   std::cout << "A die with zero sides: ";
   std::cout << rtk::roll_die(0) << '\n';

   std::cout << "A die with negative sides: ";
   std::cout << rtk::roll_die(-6) << "\n\n";

   // let's try to create a bad distribution
   std::cout << "Creating a bad random distribution, it should be be\n";
   std::cout << "encapsulated within a try/catch block.  Unless you want to crash.\n";

   try
   {
      auto test { rtk::rand(5, 2) };
   }

   catch (const std::exception& e)
   {
      std::cout << "\n>>> A standard exception was caught, with message '" << e.what() << "'\n";
   }
}

Probably not the prettiest bits of code, but so far nothing majorly bad has happened using it. It might not be large company production code worthy, but it works for my needs being a light wrap-around.

I also get better sequences of random numbers, better than the C library can provide. Even the C standard recommends not using rand() if there are alternatives available.
https://web.archive.org/web/20180123103235/http://cpp.indi.frih.net/blog/2014/12/the-bell-has-tolled-for-rand/
the reason I did a class was to allow multiple instances. I found that useful, but in general, that may not be necessary. That ^^ is nicely done.
Last edited on
A class based approach to encapsulating <random>/<chrono> is a valid and good choice as well. I might have gone that route if I hadn't seen that C++ paper.

Many of the additions I made to the proposed "simple tool-kit" were ones I had seen here at CPlusPlus in other random number threads. Others were error-checking ideas/enhancements I thought were needed. Too many newbies like to use srand() multiple times, etc.

I stood on the shoulders of experts to mangle up that header-only file. I don't take credit for the idea. If others find it useful, thank them. If there are problems with the implementation, then that is on me.
Registered users can post here. Sign in or register to post.