Array Exerercise-Find top N values in array

Hey guys,

So for a course I am taking I am have a question which asks me to do the following.
Given: A two column txt file, with in the first column being strings and the second column being integers,open/read text file into an array and find the top x(50) integers in the second column with its corresponding string.
Note:Txt file have 999 enterties in each column
Condition: Can not sort arrays.

So far: I have successful read the text file and stored each column into an individual array of its corresponding data type. So let first column array be X and second column array be Y.

So I know how I would find the top int value in the array, but my problem exist were how I would obtain the top 50.

What I have considered was to find the max num in the array. Store the int in a top50i int array and the corresponding string in a top50s string array, then assign the int value stored to zero.
Then when I go to find the max value again the previous max value will no longer be there and I will obtain the second max value store assign zero and so on.

Note: I will be using these values later on, so storing them in a separate array I think would be wise.



Cheers!
Sort array by descending and take first 50 rows.
Condition: Can not sort arrays.

Unfortunately you can not sort the arrays.
That would be the obvious and ideal solution.
Oops. I din't read carefully it but answer.Sorry.
This problem is a good example of looking for ways to simplify the problem.
A sorted list of numbers would make this problem easy.
1,6,8,92,145,678,3456

It should be obvious that the top numbers are on the right side of this list.
But you do mention that sorting is not allowed.
So your only option is too scan through the list N times and each time taking the maximum, but ignoring values if they have already been encountered. Which is the solution that you wrote in your post. However, what if these integers are negative numbers? Assigning 0 would create a new maximum. So instead assigning a really large negative number would be better, but there is an even better solution than that. You can make an additional bool array that keeps track of what numbers are in use from the original array.

http://ideone.com/kO4hVv
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <iostream>
#include <vector>

int main() {
	int unsorted[] = {5,7,12,-5,-4,3,2,1,56,890,-1043};
	std::vector<bool> tracker(11, false);
	const int n = 5; //we want the top n largest numbers
	std::vector<int> topN;
	
	for(int i = 0; i < n; i++) {
		//an integer needs to store the maximum in the list.
		//but it cannot be one that has already been marked
		//as true in tracker.
		//so first we will loop until the first un-marked number.
		int unmarked_index = 0;
		for(; unmarked_index < tracker.size(); unmarked_index++) {
			if(!tracker[unmarked_index]) {
				//we found one that has not been visited yet
				break;
			}
		}
		//the max variable will hold the maximum after the next
		//loop finishes.  first we must initialize it
		//and the loop we just came from gives us the
		//index to initialize with.
		//also, we must keep track of where the max is found
		//so that it can be marked in tracker after the loop
		int max = unsorted[unmarked_index];
		int max_index = unmarked_index;
		for(int j = unmarked_index+1; j < tracker.size(); j++) {
			if(!tracker[j] && unsorted[j] > max) {
				//if j is not marked and j is > max
				max = unsorted[j];
				max_index = j;
			}
		}
		//time to mark max as used
		tracker[max_index] = true;
		//and add max to topN
		topN.push_back(max);
	}
	
	for(int i = 0; i < topN.size(); i++) {
		std::cout << topN[i];
		if(i != topN.size() - 1) std::cout << ", ";
	}
	std::cout << std::endl;
	
	return 0;
}
Yo shoul not set value to zero. What if user specially enter all values as zeroes?

First iteration: find maximum
1
2
max = 0; 
for(...)(if max < curvalue) max = curvalue;


Second iteration: print all strings with that maximum value. There may be more than one such value.
for (...) if (curvalue == max) print.

Third iteration: Find maximum less then old maximum.
1
2
3
oldmax = max;
max = 0; 
for(...)if (max < curvalue && curvalue < oldmax) max = curvalue;


Fourth iteration: print all strings with that maximum value.
for (...) if (curvalue == max) print.

Fifth iteration: Find maximum less then old maximum.
1
2
3
oldmax = max;
max = 0; 
for(...)if (max < curvalue && curvalue < oldmax) max = curvalue;


And so on until you print 50 values. Count them.
Thanks for the help guys.

Probably an important item I left out is all integers are greater than zero.

That would probably reduce the complexity of the problem.
Also say if you were to have x values being equal to the max.
How would I go about storing the each value x with it corresponding string
in an array of 50 elements.?
My example works even if there are duplicate values in the list (x values equal to max).
Is it written in the task? No. Choose simpler solution.

First time you get first x value as maximum. Next time you get second x value as maximum. Repeat it until you fill all your 50 items array.
Variation on the previous themes:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#include <iostream>
#include <algorithm>
#include <set>

struct myclass {
    const int * Arr;
    myclass( const int * arr )
    : Arr( arr ) {}

    bool operator() ( int lhs, int rhs ) {
        return Arr[lhs] < Arr[rhs];
    }
};

int main () {
    constexpr size_t TopN = 4;
    int myints[] = {3,7,2,5,9,4,7};
    std::set<int> index( {0,1,2,3,4,5,6}); // indices of the elements of myints
    std::set<int> tops;
    while ( tops.size() < TopN ) {
        auto it = std::max_element( std::begin(index), std::end(index), myclass( myints ) );
        // move index of max element from 'index' to 'tops'
        tops.insert( *it );
        index.erase( it );
    }

    // show TopN values
    for ( auto x : tops ) {
        std::cout << x << ' ' << myints[x] << '\n';
    }
    return 0;
}


How would I go about storing the each value x with it corresponding string

Create a struct that has string and int members, or use std::pair<std:.string,int>
closed account (D80DSL3A)
Yet another approach. Finds the top N values in an array of all positive integers.
It's done in a single pass through the array, so this could be done while reading the file.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include<iostream>
#include<vector>
#include <cstdlib>

int main()
{
    const int aSz = 50;
    int a[aSz];

    for(auto& x: a) x = rand()%99 + 1;// fill array with 50 random positive ints
    for(auto x: a){ std::cout << x << ' '; }
    std::cout << '\n';// show the array values

    int N;// the number of highest values to find
    std::cout << "How many largest values? (1 to " << aSz << "): ";
    std::cin >> N;
    // the maxs': mx[0] = highest max, mx[N-1] = lowest max
    std::vector<int> mx( N, 0 );// Initialize all = 0

    for(int i=0; i<aSz; ++i)// for each element in a
    {
        for(int j=0; j<N; ++j)// for each max starting with the highest
        {
            if( a[i] > mx[j] )// 1st one we hit only (hence the break below)
            {
                for(int k=N-1; k>j; --k)// copy them downward from the highest max
                   mx[k] = mx[k-1];
                mx[j] = a[i];// the one we hit
                break;
            }
        }
    }

    std::cout << "\nThe " << N << " highest values are:\n";
    for(auto x: mx) std::cout << x << ' '; std::cout << '\n';

    return 0;
}
Last edited on
boost has a somewhat underappreciated library for that, fun2code's example can be shortened to just

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#include<iostream>
#include<vector>
#include <cstdlib>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/tail.hpp>

namespace ac = boost::accumulators;
int main()
{
    const int aSz = 50;
    int a[aSz];

    for(auto& x: a) x = rand()%99 + 1;// fill array with 50 random positive ints
    for(auto x: a){ std::cout << x << ' '; }
    std::cout << '\n';// show the array values

    int N;// the number of highest values to find
    std::cout << "How many largest values? (1 to " << aSz << "): ";
    std::cin >> N;

    ac::accumulator_set<int, ac::features<ac::tag::tail<ac::right>>> acc (
        ac::tag::tail<ac::right>::cache_size = N
    );
    acc = std::for_each(std::begin(a), std::end(a), acc);

    std::cout << "\nThe " << N << " highest values are:\n";
    for(auto x: ac::tail(acc)) std::cout << x << ' '; std::cout << '\n';
}
closed account (D80DSL3A)
@Cubbi. That's cool. Can it work as values are read one by one from a file? Or, is it necessary to read and store all 999 values from the file into an array, then process this array?
My method is meant to find the N highest values as they are found while reading a file, or while taking user input.
Generating values on the fly instead:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include<iostream>
#include<vector>
#include <cstdlib> //for rand and srand

int main()
{
    const int aSz = 50;

    int N;// the number of highest values to find
    std::cout << "How many largest values? (1 to " << aSz << "): ";
    std::cin >> N;
    // the maxs': mx[0] = highest max, mx[N-1] = lowest max
    std::vector<int> mx( N, 0 );// Initialize all = 0

    for(int i=0; i<aSz; ++i)// for each element in a
    {
        int value = rand()%99 + 1;// generate values on the fly
        std::cout << value << ' ';
        
        for(int j=0; j<N; ++j)// for each max starting with the highest
        {
            if( value > mx[j] )// 1st one we hit only (hence the break below)
            {
                for(int k=N-1; k>j; --k)// copy them downward from the highest max
                   mx[k] = mx[k-1];
                mx[j] = value;// the one we hit
                break;
            }
        }
    }

    std::cout << "\nThe " << N << " highest values are:\n";
    for(auto x: mx) std::cout << x << ' '; std::cout << '\n';

    return 0;
}

EDIT: Explaining method.
I wrote that code in response to a thread where the problem was to find the two highest values in an array (which is why an array was involved).
The code is just a generalization of:
1
2
3
4
5
6
7
if( value > firstMax )
{
    secondMax = firstMax;
    firstMax = value;
}
else if( value > secondMax )
    secondMax = value;
Last edited on
Solution 1. Indirect using sorting. One scanning of input file.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <fstream>
#include <string>
#include <vector>

using std::string;
using std::ifstream;
using std::ofstream;
using std::vector;
using std::move;
using std::endl;

// This struct keep our pairs
struct S {
	string s;
	int x;
	//this is used for indirect sorting resulting array by descending
	bool operator<(const S& param) const
	{
		return x > param.x;
	};
};

int main()
{
	ifstream in("input.txt");
	ofstream out("output.txt");
	vector<S> v;
	for (;;) {
		S add;
		in >> add.s >> add.x;
		if (!in) break;
		vector<S>::iterator pos = lower_bound(v.begin(), v.end(), add);
		v.insert(pos, add);
		if (v.size() > 50) v.resize(50);
	}
	for (S s : v)
		out << s.s << ' ' << s.x << endl;
}


Solution 2. No sorting. Many scannings of input file.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <fstream>
#include <string>

using std::ifstream;
using std::ofstream;
using std::string;
using std::endl;

int main()
{
	ifstream in("input.txt");
	ofstream out("output.txt");
	int cnt = 0;
	int oldmax = 0;
	while (cnt < 50) {
		int max = 0;
		for (;;) {
			string foo;
			int curval;
			in >> foo >> curval;
			if (!in) break;
			if (max < curval && (curval < oldmax || !oldmax)) max = curval;
		}
		if (!max) break;
		in.clear();
		in.seekg(0, in.beg);
		for (;;) {
			string foo;
			int curval;
			in >> foo >> curval;
			if (!in) break;
			if (max != curval) continue;
			out << foo << ' ' << curval << endl;
			if (++cnt >= 50) break;
		}
		oldmax = max;
		in.clear();
		in.seekg(0, in.beg);
	}
}

One scan, no sort. Requires a container "foo" to store/refer the TopN values.

Scanning has to happen in two parts. The first TopN input values are simply added to the foo.

The second part scans the remaining input. Each new value is compared against the then smallest element of foo and that element is updated if the new value is larger.
1
2
3
4
while ( in >> value ) {
  auto it = std::min_element( std::begin(foo), std::end(foo), comp );
  if ( comp( *it, value ) ) *it = value;
}

The 'comp' is a predicate functor.


Note: the Top3 of {2,2,1,2,2,3} are {3,2,2} so only half of the elements with value 2 are included in Top3. The various methods presented in this thread will select different elements from those four candidates.
The standard library will do this for you:
http://www.cplusplus.com/reference/algorithm/nth_element/

Algorithms to do this are pretty cool. There may be others but I remember learning one that's very similar to quicksort, except that the goal is to find a pivot point at the n'th element.
The OP wanted a solution that did not use sorting.
Although I find this to be odd because essentially this is sorting the way it has been done so far.
Partitioning is a better more efficient way to go. I saw something somewhere that split a trillion elements let me find that.

Edit: I found the link finally
http://matpalm.com/median/algorithm.html
Basically you need to find the kth order statistic where k is the length of the list - N + 1
And everything to the right of that is part of topN. Including the kth order statistic also.
Last edited on
nth_element() doesn't sort the data, it only partitions it, so I believe it meets the criteria.
@dhayden: I had not seen your post. I was merely commenting that most of the solutions including mine were giving sorted output when actually this is a partitioning problem that does not require output to be sorted.
Topic archived. No new replies allowed.