Rand Question

What is the difference between:
 
rand() % 10 + 1;

and
 
rand() % 11;


As far as I understand, there is no difference, is there?
Also, I have seen people using (rand() % 10) + 1;, and I am wondering if there is any reason for the explicit parentheses?
rand()%10+1; and (rand()%10)+1; do the same thing, but rand()%11; does something different: the first two return a number ranging from 1 to 10, and the last one returns a number from 0 to 10. The parentheses are redundant because % binds tighter than +; people add them for readability. In general, if you want a random number from a to b, I would suggest using int(0.5+a+(b-a)*(rand()/double(RAND_MAX)));
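To make the two ranges concrete, here is a small sketch (not from the thread; the helper names one_to_ten and zero_to_ten are made up for illustration). It takes the raw value as a parameter instead of calling rand(), so the mapping can be checked deterministically:

```cpp
// Hypothetical helpers: r stands for the value returned by rand().

// r % 10 lies in [0, 9]; adding 1 shifts it to [1, 10].
// % binds tighter than +, so (r % 10) + 1 means exactly the same thing.
constexpr int one_to_ten(int r) { return r % 10 + 1; }

// r % 11 lies in [0, 10]: eleven possible values, including 0.
constexpr int zero_to_ten(int r) { return r % 11; }
```

In real code you would call these as one_to_ten(rand()) and zero_to_ten(rand()) after including <cstdlib>.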
Ugh! Conversion back and forth between floating point and integer is a no no.

rand()%(max-min)+min gives a biased number in the range [min;max).
rand()*(max-min)/RAND_MAX+min gives a slightly less biased number in the range [min;max).
um... I believe that the casting to double is necessary. You see, all operands are ints so rand()/RAND_MAX would always be 0. Same goes for (max-min)/RAND_MAX in the case that max-min is less than RAND_MAX. (this doesn't mean that things are ok if it's not less...) Even if you define max and min as doubles to have the compiler decide to do the casting, it (the casting) will still occur so what's bad about doing it explicitly? Oh, and the evaluation of rand()*(max-min) could cause an overflow...
As long as RAND_MAX<(unsigned long(1<<(sizeof(int)/4))-1) and (max-min)<RAND_MAX, the multiplication will not overflow.
As for the division, if you first multiply the generated number by the range, which is at least 2 (the smallest range that makes sense; with anything less you'd just be generating the same number over and over again), the result will be an interpolated value between 0 and (max-min)-1.
For example, given RAND_MAX=15, min=5, and max=14, the results for rand() returning 0 through 14 are: 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 12, 13 (rand()==RAND_MAX itself gives max, 14).
min=5, max=7: 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6 (and 7 for rand()==RAND_MAX).
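The worked numbers above can be reproduced with a short sketch. The function name scale and its parameter list are my own; it takes the raw value r and the generator maximum as arguments rather than using rand() and RAND_MAX, so the toy case RAND_MAX=15 can be checked. Note that r == rand_max maps to max itself, so the 15 values listed correspond to r = 0 through 14:

```cpp
// Hypothetical helper: the scaling formula r*(max-min)/RAND_MAX+min,
// with the raw value r and the generator maximum passed in explicitly.
constexpr unsigned long scale(unsigned long r, unsigned long rand_max,
                              unsigned long min, unsigned long max) {
    // Multiply first, then divide, so the integer division does not
    // throw the whole result away.
    return r * (max - min) / rand_max + min;
}
```

For example, scale(9, 15, 5, 14) gives 10, matching the tenth entry of the first list.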

Even if you define max and min as doubles
OP is clearly not defining them as doubles, so I don't know where you're going.
-.- (unsigned long(1<<(sizeof(int)/4))-1) this is like... 1, and nobody guarantees that (max-min)<RAND_MAX is true... I'm not convinced, try harder... (I got the part with the division though; I never actually argued with that in the first place. All I said was that if you calculate rand()*((max-min)/RAND_MAX)+min you would end up with min if (max-min)<RAND_MAX, and if you calculate (rand()*(max-min))/RAND_MAX+min you risk having an overflow. I didn't say that it won't work if it just happens that it doesn't overflow.)
Whoops. That formula is completely wrong.
RAND_MAX<unsigned long(1<<(sizeof(int)*4))+1

nobody guarantees that (max-min)<RAND_MAX is true
If it's not true, then the more popular formula rand()%(max-min)+min will also not work, because it's trying to get more information out of rand() than it can possibly generate.
By the way, the condition actually is (max-min)<=RAND_MAX.

I'm not convinced, try harder
The formula basically means "as long as rand() can't return a value that has bits in the upper half of an int set". For a 32-bit integer, that would be 2^16-1. When both RAND_MAX and max-min (from now on, x) ==2^16-1, RAND_MAX*x==FFFE0001.
If both RAND_MAX and x were 2^16, then the result is zero ((1<<16)<<16).
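The two hexadecimal cases above can be checked with a small sketch. The helpers mul32 and product_fits are hypothetical names, and fixed-width types are used so the result does not depend on the platform's int size:

```cpp
#include <cstdint>

// Product of two values as it would come out of 32-bit unsigned
// arithmetic: computed in 64 bits, then truncated to the low 32 bits.
constexpr std::uint32_t mul32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b);
}

// True when rand_max * range still fits in 32 bits, i.e. when the
// multiplication in rand()*(max-min) cannot wrap around.
constexpr bool product_fits(std::uint32_t rand_max, std::uint32_t range) {
    return static_cast<std::uint64_t>(rand_max) * range <= UINT32_MAX;
}
```

With both operands at 0xFFFF the product is 0xFFFE0001 and still fits; with both at 0x10000 the low 32 bits are zero.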
ok, I got it now. so, in other words you check if RAND_MAX is less than sqrt(ULONG_MAX) and then check if (max-min)=range is less than RAND_MAX. If both conditions are true then rand()*range can't be more than sqrt(ULONG_MAX)*sqrt(ULONG_MAX)=ULONG_MAX, thus we never have overflow. Ok, to sum up then, if range is bigger than RAND_MAX you use the modulo method, else if RAND_MAX is less than sqrt(ULONG_MAX) and range<=RAND_MAX you use the other method, ELSE (if RAND_MAX>sqrt(ULONG_MAX)>range || RAND_MAX>range>sqrt(ULONG_MAX)) you have to cast something to double :D :P I win!!! `(^o^)' (no offense)
All I can say is, enjoy your horribly slow random number generator.
#include <iostream>
#include <ctime>
#include <cstdlib>

unsigned long A(unsigned long min,unsigned long max){
	return 0.5+min+(max-min)*(rand()/double(RAND_MAX));
}

unsigned long B(unsigned long min,unsigned long max){
	return ((unsigned long)rand())*(max-min)/((unsigned long)RAND_MAX)+min;
}

#define N (100*1000*1000)

int main(){
	unsigned long t=time(0),
		t0,t1;
	srand(t);
	int n=0;
	t0=clock();
	for (int a=N;a;--a)
		n+=A(0,0x7FFF);
	t1=clock();
	std::cout <<n<<std::endl; //had to output to trick the compiler into actually calling the function
	std::cout <<t1-t0<<std::endl;
	srand(t);
	t0=clock();
	n=0;
	for (int a=N;a;--a)
		n+=B(0,0x7FFF);
	t1=clock();
	std::cout <<n<<std::endl;
	std::cout <<t1-t0<<std::endl;
	return 0;
}

F:\>g++ test.cpp -o test -O3 && test
2104571572
3250
2104571572
1781

That's a 45% speedup by just changing types.
ok, maybe we could call it a tie... you're definitely faster and I'm definitely not going to overflow... and if one called B when the range is less than 0x7fff and A when the range is more than 0x7fff, that would be better. ^^
Maybe I missed something here.

How does the simple rand() % x cause an overflow? Why are you guys proposing these ludicrously complex alternatives? What benefits do they have over the traditional (and simpler and faster) % approach?
ofc rand() % x can't cause an overflow. The problem is to find an effective way to get a random number ranging from integer min to integer max. helios suggested the formula min+(rand()*(max-min))/RAND_MAX (now, this: rand()*(max-min) could cause an overflow) and I suggested 0.5+min+(max-min)*(rand()/double(RAND_MAX)). In this case (rand()/double(RAND_MAX)) is a floating point number in [0,1], so an overflow never occurs. We don't want to use the popular min+rand()%(max-min) because this way uses very little information from rand() when (max-min) is small. Helios' approach proved to be faster but works for a smaller range. Maybe the best thing you can do is use min+rand()%(max-min) when (max-min) is >= RAND_MAX, else if (max-min) is less than 0x7fff use helios' method, else use mine.
We don't want to use the popular min+rand()%(max-min) because this way uses very little information from rand() when (max-min) is small


That doesn't make any sense. Why does that even matter?

If you only use the low 2 bits of a 32 bit randomly generated number, those 2 bits are still just as random. And assuming the RNG produces a full period of 2^32, then those two bits are just as evenly distributed as doing a complex calculation on the full 32 bit number.

I fail to see why the more popular approach isn't just as good or better than any of these more complex approaches.

To prove my point... here's a simplistic example:

#include <iostream>
#include <iomanip>

int rnd()
{
    // faux random number with a period of 16
    //  every value between [0..15] appears exactly once
    static const int nums[16] = {4,10,11,15,14,5,13,1,6,9,3,7,8,2,0,12};
    static int pos = 0;

    int ret = nums[pos];
    pos = (pos + 1) & 0xF;
    return ret;
}

int main()
{
    using std::cout;
    using std::setw;

    int min = 2;  // sub in whatever numbers you want
    int max = 7;  //  as long as they're valid

    cout << "output the full 16 period\n";
    cout << "output is in the form of:\n";
    cout << "  unaltered -> r0shi -> helios -> normal";
    cout << "\n_______________\n\n";

    for(int i = 0; i < 16; ++i)
    {
        int r = rnd();  // the random number to use

        // unaltered
        cout << setw(2) << r << " -> ";

        // r0shi
        cout << setw(2) << (int)(0.5+min+(max-min)*(r/double(16))) << " -> ";

        // helios
        cout << setw(2) << (r)*(max-min)/(16)+min << " -> ";

        // normal
        cout << setw(2) << r%(max-min)+min << "\n";
    }

    return 0;
}


Here's the output, followed by my count of the distribution. As you can see, all methods produce equally random results*, and all have equal distribution (except for r0shi's approach which actually yields a 7, which should be impossible):

* helios' approach seems to have "doubles" coming up a lot (ie: 5 followed by another 5), but I think that's a fluke due to rnd() being kind of lame (and having such a small period)

The point I'm making here, anyway, is that the output via the normal method is just as random, just as well distributed, and therefore just as good as either of these more complex approaches.

 4 ->  3 ->  3 ->  6
10 ->  5 ->  5 ->  2
11 ->  5 ->  5 ->  3
15 ->  7 ->  6 ->  2
14 ->  6 ->  6 ->  6
 5 ->  4 ->  3 ->  2
13 ->  6 ->  6 ->  5
 1 ->  2 ->  2 ->  3
 6 ->  4 ->  3 ->  3
 9 ->  5 ->  4 ->  6
 3 ->  3 ->  2 ->  5
 7 ->  4 ->  4 ->  4
 8 ->  5 ->  4 ->  5
 2 ->  3 ->  2 ->  4
 0 ->  2 ->  2 ->  2
12 ->  6 ->  5 ->  4

r0shi:
  2's:  2
  3's:  3
  4's:  3
  5's:  4
  6's:  3
  7's:  1  (wtf)

helios:
  2's:  4
  3's:  3
  4's:  3
  5's:  3
  6's:  3

normal:
  2's:  4
  3's:  3
  4's:  3
  5's:  3
  6's:  3
The problem with rand()%x is that the lower values are slightly more likely when x is not a power of two (more precisely, when x does not evenly divide RAND_MAX+1). For example, for x=6 and a rand() that returns 0 through 7, the outputs would be 0, 1, 2, 3, 4, 5, 0, 1.
You can't get rid of that bias by just moving the operations about. Some values are always going to be more likely than others. What you can do, however, is redistribute the bias so that it's not clustered at the beginning of the range.
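A counting sketch of that point, for a toy generator returning 0 through 7 and x=6 (C++14 or later, because of the loops in constexpr functions). The function names are made up, and the scaling variant divides by rand_max + 1 rather than RAND_MAX; that divisor is my assumption, chosen so the result stays inside [0, x):

```cpp
// Counts how many raw values r in [0, rand_max] map to v under r % x.
constexpr int count_modulo(int rand_max, int x, int v) {
    int count = 0;
    for (int r = 0; r <= rand_max; ++r)
        if (r % x == v) ++count;
    return count;
}

// Same count for the scaling mapping r * x / (rand_max + 1).
constexpr int count_scaled(int rand_max, int x, int v) {
    int count = 0;
    for (int r = 0; r <= rand_max; ++r)
        if (r * x / (rand_max + 1) == v) ++count;
    return count;
}
```

With rand_max=7 and x=6, the modulo mapping hits 0 and 1 twice each and everything else once. The scaled mapping also has two doubled values, but they are 0 and 3, so the extra weight is spread over the range instead of sitting at the low end.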

Your logic is wrong. You're dividing by 16, but you're still using r as the randomness source, instead of i.
What you can do, however, is redistribute the bias so that it's not clustered at the beginning of the range.


That does kind of make sense, actually. It doesn't really seem like that much of an improvement, but I guess I understand what the point of all this is now.

Your logic is wrong. You're dividing by 16, but you're still using r as the randomness source, instead of i.


I'm using r because it's the output of rnd(). You could use i instead and the results would be exactly the same, only more ordered (the unaltered output would count up 0-15 instead of being scrambled like it is above)

Although I did mess up and use 16 as RAND_MAX when it really should be 15. Could that be why I got the 7 with r0shi's code?
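A quick check of that question, as a sketch only: rounded is a hypothetical helper that wraps r0shi's formula with the divisor passed in as a parameter. It suggests the divisor is not the cause. The +0.5 rounding sends the top of the input range to max with either 15 or 16, which fits a formula meant for the inclusive range [min;max]:

```cpp
// Hypothetical helper: r0shi's rounding formula, with the value that
// plays the role of RAND_MAX passed in as `divisor`.
constexpr int rounded(int r, int divisor, int min, int max) {
    return static_cast<int>(
        0.5 + min + (max - min) * (r / static_cast<double>(divisor)));
}
```

For r=15, min=2, max=7 this returns 7 whether divisor is 16 (as in the test program) or 15.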
You could use i instead and the results would be exactly the same
No, because the formulas use RAND_MAX to redistribute rand(). If rand() can produce values outside the range, that changes the distribution completely.
Oh, never mind. I didn't see it said rnd(), rather than rand().

Although I did mess up and use 16 as RAND_MAX when it really should be 15.
rand() is supposed to return a value in the range [0;RAND_MAX], so as long as rnd() doesn't return 17, there shouldn't be a problem.
well rnd() returns [0;15] (never 16)

Hence my error.
Helios is right... but there is more.... If we only cared about the distribution in space there would be no problem! However we care about the distribution in time too... Every method one can use to get a random number from min to max, heavily depends on the implementation of the rand() function. Let's say that rand() works giving those 10 numbers periodically:

{0,1,2,3,4,5,6,7,8,9}

in that case rand()%2 would give something like:

{0,1,0,1,0,1,0,1,0,1} (uniform distribution in both time and space, nice!)

while (rand()*2)/10 (scaling by RAND_MAX+1, so the result stays below 2) would give:

{0,0,0,0,0,1,1,1,1,1} (uniform distribution only in space, I don't like this...)

on the other hand, if rand() was using a table like:

{0,2,4,6,8,1,3,5,7,9}

the results for the two methods would be:

{0,0,0,0,0,1,1,1,1,1} for rand()%2 and
{0,0,0,1,1,0,0,1,1,1} for (rand()*2)/10
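The four sequences can be reproduced with a short sketch. The function names are mine, and the generator is replaced by an explicit list of values so the order of the outputs is visible. The sequences listed above match a divisor of RAND_MAX+1 = 10 (with 9, the entry for rand()==9 would come out as 2), so that is what the sketch uses:

```cpp
#include <cassert>
#include <vector>

// Applies r % n to every element of seq, keeping the order.
std::vector<int> by_modulo(const std::vector<int>& seq, int n) {
    assert(n > 0);
    std::vector<int> out;
    out.reserve(seq.size());
    for (int r : seq) out.push_back(r % n);
    return out;
}

// Applies r * n / (rand_max + 1) to every element of seq, keeping the order.
std::vector<int> by_scaling(const std::vector<int>& seq, int n, int rand_max) {
    assert(n > 0 && rand_max >= 0);
    std::vector<int> out;
    out.reserve(seq.size());
    for (int r : seq) out.push_back(r * n / (rand_max + 1));
    return out;
}
```

Calling both functions on {0,1,2,3,4,5,6,7,8,9} and on {0,2,4,6,8,1,3,5,7,9} with n=2 and rand_max=9 shows that which method looks better in time depends entirely on the order in which the generator emits its values.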

So, the thing is (and Stroustrup himself says that in his book, you know the one with the ocean wave and the lime colored C++ logo on the front cover) that (rand()*n)/RAND_MAX gives better results (but ofc with the same space distribution) than rand()%n in most implementations of the rand() function.

After thinking about this, it occurred to me:
"maybe our problem could be changed into finding a rand() implementation that works fine with rand()%n" so I got down to work and came up with this one:

#include <iostream>
#include <cstdlib>
#include <cstring> // memset
using namespace std;

class RNG //random number generator
{
      public:
      RNG(unsigned int sp=5u, unsigned int bp=7u){seed_seq=0; init(sp,bp);}
      ~RNG(){if (seed_seq) delete[]seed_seq;}
      
      unsigned int rand()
      {
               
            if (cur_seed >= big_prime) cur_seed%=big_prime;
            
            unsigned int ret=(small_prime*(cur_seed))%big_prime;
            
            cur_seed=seed_seq[cur_seed];
            
            return ret;
      }
      
      unsigned int srand(unsigned int s)
      {
               cur_seed=s;
               return rand();
      }
      
      void init(unsigned int sp, unsigned int bp)
      {
           small_prime=sp;
           big_prime=bp;
           cur_seed=0;
           
           if (seed_seq) delete[] seed_seq;
           seed_seq=new unsigned int[bp];
           
           int i;
           for (i=0; i<bp/2; i++)
           {
               seed_seq[i]=bp-i-1;
           }
           
           for (i=bp-1; i>bp/2; i--)
           {
               seed_seq[i]=bp-i;
           }
           
           seed_seq[bp/2]=0;
           
           //for (i=0; i<bp; i++)
           // cout << "seed[" << i << "]=" << seed_seq[i] << endl;
      }
      
      private:
      unsigned int small_prime;
      unsigned int big_prime;

      unsigned int * seed_seq;
      unsigned int cur_seed;
      
};

int main()
{
    RNG rng;
    
    int i;
    unsigned int p, bp,n;
    
    unsigned int *frequency;
    unsigned int rnd;
    
    while (true)
    {
        system("cls");  
        cout << "enter a small prime, a big prime and n (for rand()%n operation): ";
        cin >> p >> bp >> n;
        
        rng.init(p,bp);
        frequency=new unsigned int[n];
        memset(frequency,0,n*sizeof(int));
        
        cout << "{ ";
        
        for (i=0; i<bp; i++)
        {
            rnd=rng.rand()%n;
            cout << rnd << ' ';
            frequency[rnd]++;
        }
        cout << '}' << endl;
    
        for (i=0; i<n; i++)
        {
            cout << "frequency of " << i << ": " << frequency[i] << endl;
        }
    
        delete[] frequency;
    
        cout << "do it again? (0->no, 1->yes) ";
        cin>>i;
        if (i==0) break;    
    }
        
    system("pause");
    return 0;
}


I tried 5 7 2, 5 13 2, 13 17 2, 79 89 3 and more and everything was fine! ^^
Here is a site with primes so you can do your tests: http://www.prime-numbers.org/prime-number-1-5001.htm
Your reasoning is fundamentally flawed, because if rand() gave such predictable results, the distribution should be the least of your concerns.

And wouldn't it be easier to just write an LCG with your own RAND_MAX, anyway?
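typedef unsigned long ulong; // ulong is not a standard type

const ulong var_RAND_MAX=0x7FFF;

// LCG with a configurable modulus (default template argument needs C++11).
// var_rand<>(seed) with a nonzero seed stores the seed and returns 0;
// var_rand<>() returns the next value, which lies in [0;max).
template <ulong max=var_RAND_MAX>
inline ulong var_rand(ulong seed=0){
	static ulong var_rand_last=0;
	if (!seed)
		return var_rand_last=(214013*var_rand_last+2531011)%max;
	var_rand_last=seed;
	return 0;
}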

I think you have a problem with producing overcomplicated solutions.
Occam's Razor and KISS.
helios wrote:
Your reasoning is fundamentally flawed, because if rand() gave such predictable results, the distribution should be the least of your concerns.


+1