Rand Question

Pages: 12
lol, did u even bother to test it??

Your reasoning is fundamentally flawed, because if rand() gave such predictable results, the distribution should be the least of your concerns.

I believe I m being misunderstood :/ I don't intend to make rand() give predictable results. What I want is to make it so that the fact that the distribution is uniform becomes apparent within only few calls. What I wrote does exactly that. Anyway I just came up with another way to do it. Let's say that we want a number between 1 and n. We could make rand() pick a number from a predefined sequence where we'd put m permutations of the multiset {1,...,1,2,...,2,3,...3,n,...,n}, where 1 is contained k times, 2 is contained k times etc. Let's say, for example, that you want to draw a number from 1 to 5 and take m=9 and k=3. A sequence where rand() could draw from could be:
{1,1,1,2,2,2,3,3,3,4,4,4,5,5,5, 1,2,3,4,5,1,1,2,2,3,3,4,4,5,5, 1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
1,3,5,2,4,1,2,3,4,5,1,3,5,2,4, 1,1,2,3,3,4,5,5,1,2,2,3,4,4,5, 5,4,1,2,3,5,4,3,2,1,2,4,1,3,5,
1,2,2,3,4,4,5,1,1,2,3,3,4,5,5, 1,5,2,4,3,5,1,2,4,3,1,2,3,5,4, 5,5,1,2,3,4,4,2,3,4,3,5,2,1,1}
What do you think about that?
What I want is to make it so that the fact that the distribution is uniform becomes apparent within only few calls.
But the distribution is uniform. That's a known property of LCGs.

I just re-read you previous post, by the way, and there's no such thing as distribution in time.
The distribution is a measure of the relative probability of the possible outputs given n samples. Both "010101" and "000111" have the same distributions. Their patterns are an unfortunate effect of the input being deterministic.
Your idea of "distribution in time" appears to be the average difference between Xn and Xn+1, relative to the size of the output space, which is not at all useful, since a uniform generator will produce output with a distribution in time of 0.5 (since outputting 0 and then 2^32-1 is as likely as outputting 0 and then 0. Or at least it's supposed to be).
Your example failed because you assumed that the generator would produce output with a distribution in time of 0.1.

What do you think about that?
I think you have a problem with producing overcomplicated solutions.
Ok, I decided to take your advice and use the LCG. I modified my program a bit so that the generator uses 5 LCGs and the value returned from each call to my rand() is chosen with the std::rand() from one of these 5 values. Does this make the numbers produced more random??? Is it now safer to use rand()%n than it would be if we simply used the std::rand()???

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
#include <iostream>
#include <cstdlib>
using namespace std;

class RNG
{
      public:
      RNG(unsigned int sp=5u, unsigned int bp=7u){std::srand(time(0));init(sp,bp);}
      ~RNG(){}
      
      unsigned int rand()
      {
            int i;
            unsigned int ret;
            
            for (i=0; i<5; i++)
            {
                seed[i]=(alpha[i]*seed[i]+small_prime)%big_prime;
            }
            
            return seed[std::rand()%5];
      }
      
      unsigned int srand(unsigned int s)
      {
               int i;
               for (i=0; i<5; i++) seed[i]=s;
               
               return rand();
      }
      
      void init(unsigned int sp, unsigned int bp)
      {
           small_prime=sp;
           big_prime=bp;
           
           int i;
           for (i=0; i<5; i++) seed[i]=0;
      }
      
      private:
      unsigned int small_prime;
      unsigned int big_prime;

      unsigned int seed[5];
      static const unsigned int alpha[5];
};

const unsigned int RNG::alpha[5]={2,3,5,7,11};

int main()
{
    RNG rng;
    
    int i;
    unsigned int p, bp,n;
    
    unsigned int *frequency;
    unsigned int rnd;
    
    while (true)
    {
        system("cls");  
        cout << "enter a small prime, a big prime and n (for rand()%n operation): ";
        cin >> p >> bp >> n;
        
        rng.init(p,bp);
        frequency=new unsigned int[n];
        memset(frequency,0,n*sizeof(int));
        
        cout << "{ ";
        
        for (i=0; i<bp; i++)
        {
            rnd=rng.rand()%n;
            cout << rnd << ' ';
            frequency[rnd]++;
        }
        cout << '}' << endl;
    
        for (i=0; i<n; i++)
        {
            cout << "frequency of " << i << ": " << frequency[i] << endl;
        }
    
        delete[] frequency;
    
        cout << "do it again? (0->no, 1->yes) ";
        cin>>i;
        if (i==0) break;    
    }
        
    system("pause");
    return 0;
}
Last edited on
Is it now safer to use rand()%n than it would be if we simply used the std::rand()???
Safer in which sense? I don't think I understand the "problem" you're trying to solve.
The only problem with rand()%n is that it's biased to the lower end when n isn't a power of two, and that already has a partial solution.
The problem with rand()%x is that the lower values are slightly more likely when x is not a power of two. For example, for x=6, the distribution would be 0, 1, 2, 3, 4, 5, 0, 1.

I am a bit confused here. Does this problem occur when x is not a power of two...
or when RAND_MAX+1 is not divisible by x???

suppose your RAND_MAX is 14, so your sequence has 15 numbers
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}

the result modulo 5 is:
{0,1,2,3,4,0,1,2,3,4,0,1,2,3,4}

as you see we have no problem.

same with 3:
{0,1,2,0,1,2,0,1,2,0,1,2,0,1,2}

And why would you even bother with this if your RAND_MAX is big enough? I'd say
that this problem with rand()%x is actually a problem when RAND_MAX is small,
regardless of x. Suppose your RAND_MAX is 4 and your sequence has 5 numbers:
{0,1,2,3,4}

the result modulo 3 would be:
{0,1,2,0,1}

now that's a real problem because 0 and 1 appear twice as frequently as 2!!!

however if your RAND_MAX was like 9 things would go like this:
{0,1,2,3,4,5,6,7,8,9}

modulo 3:
{0,1,2,0,1,2,0,1,2,0} the highest/lowest frequency quotient is 4/3=1.333... < 2

and this continues to drop as RAND_MAX increases.
Am I misunderstanding something? :/
Last edited on
r0shi wrote:
Does this make the numbers produced more random???


It might increase the period. Although the way you're doing it, it also looks like you might be breaking distribution, worsening the problem even more.

I am a bit confused here. Does this problem occur when x is not a power of two...
or when RAND_MAX+1 is not divisible by x???


Technically the latter, but since std::RAND_MAX+1 is pretty much always going to be a power of two, its divisors will always be powers of two.

now that's a real problem because 0 and 1 appear twice as frequently as 2!!!


As you noted, this is more a significant a problem the closer x comes to RAND_MAX.

However I think your attempt at a solution to a problem is ill fated. If an RNG produces an even distribution between [0;RAND_MAX], there is no way (afaik) to retain even distribution for any value of x, no matter how you scale/transform the RNG output.

And if the RNG doesn't produce an even distribution, well then you're screwed anyway.

The only way to really "solve" the problem is to only use a "safe" value for x (ie: a RAND_MAX+1 divisor) and "reroll" if the result is >= desired_x.

But of course that solution is pretty impractical.
or when RAND_MAX+1 is not divisible by x???
But since RAND_MAX is always a power of two minus one...

I'd say that this problem with rand()%x is actually a problem when RAND_MAX is small,
regardless of x.
Come to think of it, yes.
It might increase the period. Although the way you're doing it, it also looks like you might be breaking distribution, worsening the problem even more.

Ok I fixed it, turned the rand definition to this:

1
2
3
4
5
6
unsigned int rand()
{
       unsigned int ret=std::rand()%5;
       seed[ret]=(alpha[ret]*seed[ret]+small_prime)%big_prime;
       return seed[ret];
}


Now, I m increasing the period without breaking any distribution, as I only move forward in the sequence I choose to use.
ok, helios, disch, (and kevin06s, ofc) I want you guys to take a look at this. I made a simple implementation for the thing with the permutations of which I spoke earlier, using my custom RNG (with 10 LCGs for long period and a RAND_MAX of 100109 (that's a prime) -1).

The good things about this is that (1) the distribution is uniform ("in space") and (2) you can guarantee that there never appear sequences of consecutive identical numbers with length (sequence length) more than 2*k (I'd interpret that as the distribution being uniform "in time"), where k is a parameter that you give.

The bad thing is that generating a random permutation is time consuming... :/ do you know of any faster way to do it???

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
#include <iostream>
#include <cstdlib>
using namespace std;

class RNG
{
      public:
      RNG(unsigned int sp=5u, unsigned int bp=7u){std::srand(time(0));init(sp,bp);}
      ~RNG(){}
      
      unsigned int rand()
      {
            unsigned int ret=std::rand()%LCG_count;
            seed[ret]=(alpha[ret]*seed[ret]+small_prime)%big_prime;
            return seed[ret];
      }
      
      unsigned int srand(unsigned int s)
      {
               int i;
               for (i=0; i<LCG_count; i++) seed[i]=s;
               
               return rand();
      }
      
      void init(unsigned int sp, unsigned int bp)
      {
           small_prime=sp;
           big_prime=bp;
           
           int i;
           for (i=0; i<LCG_count; i++) seed[i]=0;
      }
      
      private:
      unsigned int small_prime;
      unsigned int big_prime;

      static const unsigned int LCG_count=10;
      unsigned int seed[LCG_count];
      static const unsigned int alpha[LCG_count];
};

const unsigned int RNG::alpha[RNG::LCG_count]={2,3,5,7,11,13,17,19,23,29};

class PRNG
{
      public:
      PRNG(unsigned int _n, unsigned int _k):rng(100103,100109)
      {      
                    
           n=_n;
           k=_k;
           seed=0;
           cur_perm=new unsigned int[n*k];
           
           //start with {1,1,...,1,2,2,...,...,n,n,...,n}
           int i,j;
           for (i=0; i<n; i++)
           {
               for (j=0; j<k; j++)
                   cur_perm[i*k+j]=i+1;
           }
           
           next_perm();
      }
      ~PRNG(){delete[] cur_perm;}
      
      int draw()
      {
          int ret=cur_perm[seed++];
          if (seed==n*k) {next_perm(); seed=0;}
          return ret;
      }
      
      private:
      RNG rng;
      unsigned int n;
      unsigned int k;
      unsigned int * cur_perm;
      unsigned int seed;
      
      void next_perm()
      {
           
           int i,j;
           unsigned int temp;
           unsigned int size=n*k;
           for (i=0; i<size; i++)
           {
               j=i+rng.rand()%(size-i);
               
               temp=cur_perm[i];
               cur_perm[i]=cur_perm[j];
               cur_perm[j]=temp;
           }
           
      }
};

const unsigned int N=2;
const unsigned int K=5;

int main()
{
    PRNG prng(N,K);
    
    int i;
    int size=10*N*K;
    for (i=0; i<size; i++)
    {
        cout << prng.draw()-1 << ' ';
    }
    
    cout << endl;    
    system("pause");
    return 0;
}
Last edited on
Talk about wasted effort. Mersenne Twister has a period of 2^19937-1 and generates nothing but 32-bit integers each iteration.
Last edited on
ok, I don't doubt that, but that's not what I asked... :/
Topic archived. No new replies allowed.
Pages: 12