How can I make my solution more efficient?

I'm having trouble with a competitive programming problem, here it is.
Farmer John's N cows (1 <= N <= 100,000) are standing at various positions along
a long one-dimensional fence. The ith cow is standing at position x_i (an
integer in the range 0...1,000,000,000) and has breed b_i (either 'G' for
Guernsey or 'H' for Holstein). No two cows occupy the same position.

FJ wants to take a photo of a contiguous interval of cows for the county
fair, but we wants all of his breeds to be fairly represented in the photo.
Therefore, he wants to ensure that, for whatever breeds are present in the
photo, there is an equal number of each breed (for example, a photo with
all Holsteins is ok, a photo with 27 Holsteins and 27 Guernseys is ok, but a
photo with 10 Holsteins and 9 Guernseys is not ok). Help FJ take his fair
photo by finding the maximum size of a photo that satisfies FJ's
constraints. The size of a photo is the difference between the maximum and
minimum positions of the cows in the photo. It is possible that FJ could
end up taking a photo of just a single cow, in which case this photo would
have size zero.

PROBLEM NAME: fairphoto

INPUT FORMAT:

* Line 1: The integer N.

* Lines 2..1+N: Line i+1 contains x_i and b_i.

SAMPLE INPUT:

6
4 G
10 H
7 G
16 G
1 G
3 H

INPUT DETAILS:

There are six cows with breeds (from left to right) G, H, G, G, H, G.

OUTPUT FORMAT:

* Line 1: A single integer indicating the maximum size of a fair
photo.

SAMPLE OUTPUT:

7

OUTPUT DETAILS:

The largest fair photo Farmer John can take is of the middle 4 cows,
containing 2 Holsteins and 2 Guernseys.

My solution utilizes prefix sums.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
#include <iostream> 
#include <algorithm>
using namespace std;
struct data {
    char cow;
    int pos;
    int bin;
    int psum;
} cows [100000];
bool comp (data a, data b) {
    return a.pos<b.pos;
}
int main () {
    int n;
    cin>>n;
    for (int i=1; i<=n; i++) {
        cin>>cows[i].pos>>cows[i].cow;
    }
    for (int i=1; i<=n; i++) {
        if (cows[i].cow=='G')
        cows[i].bin=1;
        else {
            cows[i].bin=-1;
        }
    }
    sort (cows, cows+n+1, comp);
    cows[0].psum=0;
    for (int i=1; i<=n; i++) {
        cows[i].psum=cows[i].bin+cows[i-1].psum;
    }
   int first [n+1];
   int last [n+1];
   fill (first, first+n+1, -1);
   for (int i=0; i<=n; i++) {
       if (first[cows[i].psum]==-1)
       first[cows[i].psum]=cows[i].pos;
       if (first[cows[i].psum]!=-1){
           if(first[cows[i].psum]>cows[i].pos)
           first[cows[i].psum]=cows[i].pos;
           }
       }
    fill (last, last+n+1, -1);
    for (int i=0; i<=n; i++) {
       if (last[cows[i].psum]==-1)
       last[cows[i].psum]=cows[i].pos;
       if (last[cows[i].psum]!=-1){
           if(last[cows[i].psum]<cows[i].pos)
           last[cows[i].psum]=cows[i].pos;
           }
       }
    int temp[n+1];
    for (int i=0; i<=n; i++) {
        temp[i]=cows[i].pos;
    }
int dist [n+1];
    for (int i=0; i<=n; i++) {
        int ind=upper_bound(temp, temp+n+1, first[i])-temp;
        dist[i]=last[i]-cows[ind].pos;
    }
    
    int max=dist[max_element(dist, dist+n+1)-dist];
    int cd [n+1];
    for (int i=1; i<=n; i++) {
        int j=i;
        while (j<n&&cows[j].cow==cows[j+1].cow) {
            j++;
        }
        cd[i]=cows[j].pos-cows[i].pos;
    }
    int maxx=cd[max_element (cd, cd+n+1)-cd];
        if (maxx>max)
        cout<<maxx<<endl;
        else {
            cout<<max;
        }
}

My solution utilizes prefix sums. What I essentially do is denote Guernseys as 1 and Holsteins as -1. Thus, an valid photo of cows with equal numbers of these two breeds will have a sum of zero. Given an array that represents our cows, we can quickly find the sums of various blocks of cows by taking the prefix sums of this array. For each cow at position k, we store a number S_k representing the sum of the cows' numbers from 1 to k. To find the sum of cows' numbers from i to j, we can simply compute S_j - S_(i-1). For a photo with i to k cows, the prefix sums of the i-1th cow and kth cow essentially must be equal.

I then store the leftmost position of the cow with each distinct prefix number, and do the same for the rightmost cow. I then store the difference in these positions in an array and find the maximum distance out of all possible prefix number pairs.

I then tackle the corner case of all cows being the same breed by find largest continued section of same breed and finding the distance accrodingly.

Then I compare the distance for the longest continuous section of same breed with the longest photo with equal numbers of both breeds and output the largest one.

I read the solution after I coded this, but it was in pseudocode, and had many obvious errors, so I did not really understand how to implement some of the things in the solution.
1. There is apparently some way to calculate the difference in distance without actually making an array for the rightmost cow, which would make my solution more efficient, as currently it does not run for very large inputs in time.
2. Apparently I have to double some array value or index thing so that it is 2n? to prevent out of bounds or something. it has to do with the prefix sum being a minimum of -n and max of n (all holsteins or all the other breed).

I know that was a mouthful. Thanks for making it to the end!
Last edited on
you need to understand the algorithm that you saw in pseudo-code or the solution. Making the code you show faster is like trying to optimize a bubblesort for a billion numbers... afterwards its faster than it was, and it still stinks, whereas do-over with a better algorithm is what was needed, right?

I freely admit I don't know how to do this without digging in and doing it for you. But that is the key, ... step away from the code, figure out the algorithm (even if using the one they give you, understand it) then code THAT. Most of the time, even poorly written code with the right algorithm will get full marks on these sites. If it does not, THEN we can try to make it even faster with some c++ tweaks and tricks.

> I then tackle the corner case of all cows being the same breed by find
> largest continued section of same breed and finding the distance accrodingly.
1
2
3
4
5
6
7
for (int i=1; i<=n; i++) {
	int j=i;
	while (j<n&&cows[j].cow==cows[j+1].cow) {
		j++;
	}
	cd[i]=cows[j].pos-cows[i].pos;
}
there is your bottleneck
in the rest of the code you've got a sort O(n lg n) and simple traversals of arrays, with the most expensive operation being `upper_bound()' O(lg n), so you end up with a complexity O(n lg n)
but here, when counting same breed, you've got nested loops, with a worst case of O(n^2)

say that you've got "GGGGGH..."
you start at 0, count 4 equal cows, find the different one, break
continue at 1, count 3 equal cows, find the different one, break
continue at 2, count 2 equal cows, find the different one, break
continue at 3, count 1 equal cows, find the different one, break
...
but you could have started at 0, count 4 equal cows
and then jump directly at position 5 (H) to start another count.



apart from that
1
2
3
4
5
cows [100000];
//...
    for (int i=1; i<=n; i++) {
        cin>>cows[i].pos>>cows[i].cow;
    }
n may be 100000, so out of bounds

dist[max_element(dist, dist+n+1)-dist]; may be wrote as *max_element(dist, dist+n+1);


> Apparently I have to double some array value or index thing so that it is 2n?
> to prevent out of bounds or something. it has to do with the prefix sum being
> a minimum of -n and max of n (all holsteins or all the other breed).
last[cows[i].psum]
¿what values may cows[i].psum hold? ¿how big should `last' be then?
Thanks for all the input!
I've changed everything per your advice, but I think the loop at the end that calculates the longest string of the same breed goes out of bounds and gives some random value sometimes. I don't know where the error is.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
#include <iostream> 
#include <algorithm>
using namespace std;
struct data {
    char cow;
    int pos;
    int bin;
    int psum;
} cows [200002];
bool comp (data a, data b) {
    return a.pos<b.pos;
}
int main () {
    int n;
    cin>>n;
    for (int i=1; i<=n; i++) {
        cin>>cows[i].pos>>cows[i].cow;
    }
    sort (cows, cows+n+1, comp);
    for (int i=1; i<=n; i++) {
        if (cows[i].cow=='G')
        cows[i].bin=1;
        else {
            cows[i].bin=-1;
        }
    }
    for (int i=1; i<=n; i++) {
        cows[i].psum=cows[i].bin+cows[i-1].psum;
    }
    for (int i=0; i<=n; i++) {
        cows[i].psum=cows[i].psum+n;
    }
       int first [2*n+1];
   int last [2*n+1];
   fill (first, first+2*n+2, -1);
   for (int i=0; i<=n; i++) {
       if (first[cows[i].psum]==-1)
       first[cows[i].psum]=cows[i].pos;
       if (first[cows[i].psum]!=-1){
           if(first[cows[i].psum]>cows[i].pos)
           first[cows[i].psum]=cows[i].pos;
           }
       }
    fill (last, last+2*n+2, -1);
    for (int i=0; i<=n; i++) {
       if (last[cows[i].psum]==-1)
       last[cows[i].psum]=cows[i].pos;
       if (last[cows[i].psum]!=-1){
           if(last[cows[i].psum]<cows[i].pos)
           last[cows[i].psum]=cows[i].pos;
           }
       }
    int temp[n+1];
    for (int i=0; i<=n; i++) {
        temp[i]=cows[i].pos;
    }
int dist [n+1];
    for (int i=0; i<=n; i++) {
        int ind=upper_bound(temp, temp+n+1, first[i+n])-temp;
        dist[i]=last[i+n]-cows[ind].pos;
    }
    int max=dist[max_element(dist, dist+n+1)-dist];
    int cd [n+1];
    for (int i=1; i<=n; i++) {
        int j=i;
        while (j<=n&&cows[j].cow==cows[j+1].cow) {
            j++;
        }
        cd[i]=cows[j].pos-cows[i].pos;
        i=j;
    }
     int maxx=cd[max_element (cd, cd+n+1)-cd];
        if (maxx>max)
        cout<<maxx<<endl;
        else {
            cout<<max;
        }
}
cheezed automated the inputs... but output is HUGE. Is that expected???
edit: something is not right. in the shell here I get max = -1 and 6 and small values but the -1 concerns me. @home I get small positive values.

edit 2:
I made this change and got what I think are correct results. I believe this uninitialized array is your malfunction. This array is illegal in c++, by the way: size not known at compile time. But that seems to be a hallmark of coders for these problems.
int cd [n+1] = {0};
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
#include <iostream> 
#include <algorithm>
using namespace std;
struct data {
    char cow;
    int pos;
    int bin;
    int psum;
} cows [200002];
bool comp (data a, data b) {
    return a.pos<b.pos;
}
int main () 
{
	srand(time(0));
    int n = 6;
    //cin>>n;
	char derp[] = "GH";
    for (int i=1; i<=n; i++) {
        //cin>>cows[i].pos>>cows[i].cow;
		cows[i].pos = rand()%20;
		cows[i].cow = derp[rand()%2];
		cout << cows[i].pos << cows[i].cow<<endl;
	    }
	
	
	
    sort (cows, cows+n+1, comp);
    for (int i=1; i<=n; i++) {
        if (cows[i].cow=='G')
        cows[i].bin=1;
        else {
            cows[i].bin=-1;
        }
    }
    for (int i=1; i<=n; i++) {
        cows[i].psum=cows[i].bin+cows[i-1].psum;
    }
    for (int i=0; i<=n; i++) {
        cows[i].psum=cows[i].psum+n;
    }
       int first [2*n+1];
   int last [2*n+1];
   fill (first, first+2*n+2, -1);
   for (int i=0; i<=n; i++) {
       if (first[cows[i].psum]==-1)
       first[cows[i].psum]=cows[i].pos;
       if (first[cows[i].psum]!=-1){
           if(first[cows[i].psum]>cows[i].pos)
           first[cows[i].psum]=cows[i].pos;
           }
       }
    fill (last, last+2*n+2, -1);
    for (int i=0; i<=n; i++) {
       if (last[cows[i].psum]==-1)
       last[cows[i].psum]=cows[i].pos;
       if (last[cows[i].psum]!=-1){
           if(last[cows[i].psum]<cows[i].pos)
           last[cows[i].psum]=cows[i].pos;
           }
       }
    int temp[n+1];
    for (int i=0; i<=n; i++) {
        temp[i]=cows[i].pos;
    }
int dist [n+1];
    for (int i=0; i<=n; i++) {
        int ind=upper_bound(temp, temp+n+1, first[i+n])-temp;
        dist[i]=last[i+n]-cows[ind].pos;
    }
    int max=dist[max_element(dist, dist+n+1)-dist];
	
	cout <<"max:" << max << endl;
    
	
	int cd [n+1];
    for (int i=1; i<=n; i++) {
        int j=i;
        while (j<=n&&cows[j].cow==cows[j+1].cow) {
            j++;
        }
        cd[i]=cows[j].pos-cows[i].pos;
        i=j;
    }
     int maxx=cd[max_element (cd, cd+n+1)-cd];
        if (maxx>max)
        cout<<maxx<<endl;
        else {
            cout<<max;
        }
}


compiled with max optimize, this runs instantly on my machine for 500 random cows in 20k slots.
Last edited on
@jonnin that code does not give the right values for any inputs.
I've written my code to eliminate all the out of bounds errors and made it efficient such that it can accept the largest possible input without problem per @ne555's advice (thank you so much). However, it still does not pass 1 test case out of the ten supplied. It is probably some corner case that I have not considered. I feel it is time to move on to learning another topic, as throughout these 2 days that I've been intermittently trying to fix my errors for this problem, I've easily solved 2 other equally difficult prefix sums problems. I'll see if I can solve this after I come back to this in a year or so haha. Thank you everyone for helping me with my admittedly non-compliant code. It really means a lot to me, as someone who is unable to afford a proper teacher for programming. I truly feel I have found a community online that I can look to for help. My final code is below.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
#include <iostream> 
#include <algorithm>
using namespace std;
struct data {
    char cow;
    int pos;
    int bin;
    int psum;
} cows [200002];
bool comp (data a, data b) {
    return a.pos<b.pos;
}
int main () {
    int n;
    cin>>n;
    for (int i=1; i<=n; i++) {
        cin>>cows[i].pos>>cows[i].cow;
    }
    sort (cows, cows+n+1, comp);
    for (int i=1; i<=n; i++) {
        if (cows[i].cow=='G')
        cows[i].bin=1;
        else {
            cows[i].bin=-1;
        }
    }
    for (int i=1; i<=n; i++) {
        cows[i].psum=cows[i].bin+cows[i-1].psum;
    }
    for (int i=0; i<=n; i++) {
        cows[i].psum=cows[i].psum+n;
    }
	int first [2*n+1];
   int last [2*n+1];
   fill (first, first+2*n+2, -1);
   for (int i=1; i<=n; i++) {
       if (first[cows[i].psum]==-1)
       first[cows[i].psum]=cows[i].pos;
       if (first[cows[i].psum]!=-1){
           if(first[cows[i].psum]>cows[i].pos)
           first[cows[i].psum]=cows[i].pos;
           }
       }
    fill (last, last+2*n+2, -1);
    for (int i=1; i<=n; i++) {
       if (last[cows[i].psum]==-1)
       last[cows[i].psum]=cows[i].pos;
       if (last[cows[i].psum]!=-1){
           if(last[cows[i].psum]<cows[i].pos)
           last[cows[i].psum]=cows[i].pos;
           }
       }
    int temp[n+1];
    for (int i=0; i<=n; i++) {
        temp[i]=cows[i].pos;
    }
	int dist [n+1];
    for (int i=0; i<=n; i++) {
        int ind=upper_bound(temp, temp+n, first[i+n])-temp;
        dist[i]=last[i+n]-cows[ind].pos;
    }
    int max=dist[max_element(dist, dist+n+1)-dist];
    int cd [n+1];
    fill (cd, cd+n+2, 0);
    int j;
    for (int i=1; i<n; i++) {
        j=i;
        while (j<n&&cows[j].cow==cows[j+1].cow) {
            j++;
        }
        cd[i]=cows[j].pos-cows[i].pos;
        i=j;
    }
     int maxx=cd[max_element (cd, cd+n+1)-cd];
        if (maxx>max)
        cout<<maxx<<endl;
        else {
            cout<<max<<endl;
        }   
}
I hope you didn't upload that and try it lol :) That code does NOT ACCEPT inputs. I don't have your test cases, and I am not going to sit around typing stuff in, so I shorted that part out.

Just make the change I said (initialize that array to zeros) in your version and try it.
1
2
    int cd [n+1];
    fill (cd, cd+n+2, 0);
out of bounds.
Your compare function is taking parameters passed by value.
1
2
3
 for (int i=1; i<=n; i++) {
        cin>>cows[i].pos>>cows[i].cow;
    }


Since this ignores cows[0], "sort" will not function correctly.

The array should contain entries from 0 to n-1. Not n.

Skipping 0, and placing an entry at cows[n] is not a format the sort function will "understand" when called with:

sort (cows, cows+n+1, comp);

This call will inform sort to start at position 0, but is also tells sort to include cows at n, which is more cows that were entered, and one of those cows will be invalid data.

That's not all that needs attention, but it jumps out at me.

Based on @ne555's last post, it seems you have a tendency to start arrays at 1 (which is it's own problem), but then to "pad" the parameters "just in case", which isn't correct either.

If there are 100 entries in an array, they run from 0 to 99. Measure, process and read from arrays exactly in that manner, and drop the habit of "padding" parameters "just so you get everything"....because, you're not getting everything, you're missing 0 and processing invalid entries in the array.

Last edited on
@niccolo I start my array at 1 in this problem because the prefix sum at position zero is zero, so I need to have that in the array.
So, you're saying zero holds a special value, but is that value supposed to be moved by sort?

If not, don't provide that to sort.
Topic archived. No new replies allowed.