finding largest prime number in a file

Pages: 12
hello guys,Anyone having an idea of creating a programm that is able to find largest prime number in a file containing 100,000 numbers.
Also,to print out the first ten large prime number.
Open file. Look at each number in turn. If it's a prime number, check if it's the largest prime number you've seen so far.
Break the program into individual challenges.

Challenge 1: Given a number, determine if it is prime
Challenge 2: Open a file and iterate through the numbers in that file
Challenge 3: (Putting it together) Keeping track of a maximum prime number as you iterate over numbers.

Focus on just challenge 1, make a function like
bool isPrime(int n);
that returns whether or not a number is prime. There's plenty of resources out there to help you if you get stuck.

Here's some basic psuedo code that might be helpful:

1
2
3
4
5
6
7
8
9
10
11
12
13
std::ifstream file("filename");
int largest_prime = -1;
int number;
while (file >> number) // while getting a number from the file
{
    if (isPrime(number))
    {
        if (number > largest_prime)
        {
             largest_prime = number;
        }
    }
}


Of course, this is only to find the 1st largest prime number. You'd have to have an array if you want to keep track of the N largest prime numbers.

Alternatively, if you find a prime that is larger than the last prime you found, then add it to an std::vector, and at the end, sort the vector and print the last 10 elements.
Last edited on
Find the largest number in the file.

Create a Sieve of Eratosthenes of that size.

Read off the last 10 primes.
a Sieve of Eratosthenes of that size

So big? IMO up to the sqare root would be sufficient. If you know all primes including 17 it is enough to filter for next primes up to (17+2)2-1 = 360. Or -2 = 359 to be precise (assuming you do not look for even primes > 2).
MikeStgt wrote:
So big?


Yes, so big.

If the largest number in the file is 100 then I require a sieve big enough to tell me that 89 and 97 are prime, not stop at sqrt( 100 ) = 10.
Last edited on
I require a sieve big enough to tell me that 89 and 97 are prime

How do you test a number to be prime? By lookup a list of primes? Or by computing? (Using a significant smaller list of primes.)
1) there are a bunch of "very likely" primality testing algorithms. These have some tiny chance at being wrong.
2) you can sieve it up and see if its in your sieve. typical sieves are a vector of bools, so its literally if(list[value]) then its prime.
a simple version of it...
3) for very small numbers you can use GCD against a constant or two. the constants grow like a factorial and its not as efficient as the bool list, so its of dubious value.

1
2
3
4
5
6
7
8
9
vector<bool> pn(max_size,true);
  for(i = 4; i < pn.size(); i+=2)
	  pn[i] = false;  
  for(i = 3; i < pn.size(); i+=2)
  {
     if(pn[i])
      for(j = i*2; j < pn.size(); j+=i)
         pn[j] = false; 	
  }
Last edited on
1) there are a bunch of "very likely" primality testing algorithms.
Rabin-Miller is one of them. AFAIK, it reports no false positives below 10^10 (at least not on an HP-42S and its emulators).

2) you can sieve it up...
For this you must have sieved up to max value found in list to build the vector of bools.

3) for very small numbers...
A counter proposal:
1
2
3
4
5
6
7
8
9
bool isPrime(int number)
{
    if(number < 2) return false;
    if(number == 2) return true;
    if(number % 2 == 0) return false;
    for(int i=3; (i*i)<=number; i+=2)
        if(number % i == 0 ) return false;
    return true;
}

My idea was to "improve" the loop with a small array of primes p[] to:
for(int i=0; (p[i]*p[i])<=number; ++i)
if(number % p[i] == 0 ) return false;


Well, I have to confess, I forgot the option of a vector of bools what is a grand saving in storage and "lookup" in contrast to an array of primes. In case the list contains values close to max of unsigned long long (80 bits I assume), the vector of bools will use more than 3.8e16 MB of your storage (if compacted), what is not nothing ;)
I assume, depending on the max value option 1) with verification by something like option 3) could be beneficial.

Edit: bits not bytes
Edit: size of vector of bools
Last edited on
Supplement
In 1876 Lucas proved that 2127-1 is prime.
That was some time before unsigned long long :)
Anyone having an idea of creating a programm that is able to find largest prime number in a file containing 100,000 numbers.

Took a while, sorry --
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include <iostream>	/* cout, cin, endl */
#include <ctime>	/* clock */
#include <cmath>	/* sqrt */
#include <cstdint>      /* __uint128_t */
using namespace std;

/* MRT from https://de.wikipedia.org/wiki/Miller-Rabin-Test#Implementierung */
#include <cstdint>
using std::uint64_t;
bool mrt (const uint64_t n, const uint64_t a) { // n ungerade, 1 < a < n-1
   if (n % 2 == 0) return false;
   const uint64_t m = n - 1;
   uint64_t d = m >> 1, e = 1;
   while (!(d & 1)) d >>= 1, ++e;
   __uint128_t p = a, q = a;
   while (d >>= 1) { // exponentiere modular: p = a^d mod n
      q *= q, q %= n; // quadriere modular: q = q^2 mod n
      if (d & 1) p *= q, p %= n; // multipliziere modular: p = (p * q) mod n
   }
   if (p == 1 || p == m) return true; // n ist wahrscheinlich prim
   while (--e) {
      p *= p, p %= n;
      if (p == m) return true;
      if (p <= 1) break;
   }
   return false; // n ist nicht prim
}

int main()
{
	unsigned long max(~0), delta2(100000), n, cnt(0);

	cnt = 0; clock_t t0 = clock();		/* time(R) */
	for (n = max - delta2; n < max; ++n)
		if (mrt(n,  2) && 
		    mrt(n,  3) &&
		    mrt(n,  5) &&
		    mrt(n,  7) &&
		    mrt(n, 11) &&
		    mrt(n, 13) &&
		    mrt(n, 17) &&
		    mrt(n, 19) &&
		    mrt(n, 23))		cnt++;
	double te = (double)( clock() - t0 ) / CLOCKS_PER_SEC;  /* time(E) */
	cout << "MRT counts " << cnt << " primes from " << max - delta2 << " to " << max << " within " << te << " seconds.\n";
}

My focus was on a way to scan 100'000 numbers for primes in some reasonable time, to present the 10 largest found is left as exercise for others.
Last edited on
A hybrid approach would be faster if the range is 0-N ... do the vector bool lookup table sieve as big as you can tolerate it first, nearly instant check for the ones in that table, then do the harder check for anything bigger. MRT there is doing an awful lot of work to see if 11 is prime or not.
A hybrid approach would be faster
Correct. For that I am about to get the speed of different methods at different "spots", 0..100'000, 8000000..8100000, and 4294957295 to 4294967295 so far. See http://www.cplusplus.com/forum/beginner/253928/#msg1115830 A simple primality check seems to be sufficient in the typical number range of pocket calculators.
It is work in progress, as I am not yet familiar enough with C++ to code my ideas right away. So the method using a bool lookup table is currently missing. I've done it, partially, see http://www.cplusplus.com/forum/beginner/253520/#msg1114742 , but to put it as a function so far failed, bitset<z> s; insists of z being a constant.
my code above gives you a bool lookup table. if pn[value] == true then its prime. can tweak it to use unsigned long long or whatever, but the core code should be fine. You may be able to make it faster somehow too. Even cuter you can run it once and save the vector to a file or whatever.
Last edited on
the core code should be fine

The core of your routine is SoE, Sieve of Eratosthenes :)

You may be able to make it faster somehow too.
Yes, I am. Instead of your sieving loop
for(j = i*2; j < pn.size(); j+=i) I suggest
for(j = i*i; j < pn.size(); j+=i+i) Next speed up would be using "wheels".

Even cuter you can run it once and save the vector to a file or whatever.
Exactly that is my goal, for reasons of size I'd prefere bitset instead of bool array. Drawback, its size must be known at compilation already. At least that's what I grasped.
vector bool is optimized to use ~ 1 bit per bool.
The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.
found at http://www.cplusplus.com/reference/vector/vector-bool
Not necessarily, may optimize storage is indefinite -- sorry, this is not so stringent as your statement. Things That Make You Go Hmmmm...
In contrast: "The class emulates an array of bool elements, but optimized for space allocation", see http://www.cplusplus.com/reference/bitset/bitset
You need to determine the primality of 100,000 numbers. To make each test fast, it might be useful to keep a collection of primes up to some value. As long as you have primes up to sqrt(N), you can quickly determine if N is prime by seeing if any of those primes is a factor of N.

That helps to check primality of 100,000 numbers, but only if they are relatively small. If they can range up to 1018 Then another test like MRT might be better.

So how big can these numbers get?

It seems to me that this is really an exercise in your ability to analyze a problem and choose an efficient algorithm. In this case, then key factors are how many numbers you must examine (100,000), and the range of values for those numbers.

It seems to me that this is really an exercise

Yes, Dave, I try to take my first steps with it in C++. For sure it is not for the primes, there are several known meanwhile :)

BTW, several years ago there was an information, it is faster to generate the first "few" prime numbers than reading them from a disk file. That was before SSD.
It is still true even on SSD. SSD is (very slightly) slower than ram, and ram is slower than cache. Rotating disk is much slower than SSD, but the delay still exists because ram does not have a file system / OS interface layer to contend with, and they are not always tied to the motherboard as optimally as ram (depends on the system).

'few' is system dependent so its hard to nail that down.

modern PC timings are hard to grasp ... Ive seen too many examples where doing more work is faster than having logic to avoid some of the work. A couple of the primality test examples thrown around suffer from this ... 6+ if statements out the gate take more time than they save, at least on my setup.
Last edited on
Pages: 12