simple function performance

Hi,

I am coding a simple task by both C++ and C#,and compare the results.
Each time I amend the algorithm to achive better, C++ takes the lead being 20-30% faster.

Last amend I did involved use of stacks/vectors, and C++ code went 10 times slower.
Problem is I can not see why, as code is identical for such drastic difference.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
static void ExaminePairOfRows(int i, int j)
{
	long long tempSum = 0;

	// tempSum accepts all sums of intermediate rows
	tempSum = buckets[j - 1][sqrtN - 1] - buckets[i][sqrtN - 1];

	vector<StackItem> leftVector;
	// construct left vector
	for (int k = sqrtN - 1; k >= 0; --k)
	{
		long long tempLeftSum = buckets[i][sqrtN - 1] - (k == 0 ? i == 0 ? 0 : buckets[i - 1][sqrtN - 1] : buckets[i][k - 1]);
		while (leftVector.size() != 0 && leftVector.back().sum > tempLeftSum)
		{
			leftVector.pop_back();
		}

		leftVector.push_back(StackItem(k + i * sqrtN, tempLeftSum));
	}

	vector<StackItem> rightVector;
	// construct right vector
	for (int k = 0; k < sqrtN && buckets[j][k] < ((long long)1 << 40); ++k)
	{
		long long tempRightSum = buckets[j][k] - buckets[j - 1][sqrtN - 1];
		while (rightVector.size() != 0 && rightVector.back().sum > tempRightSum)
		{
			rightVector.pop_back();
		}

		rightVector.push_back(StackItem(k + j * sqrtN, tempRightSum));
	}
	vector<StackItem> reversedrightVector;
	while (rightVector.size() != 0)
	{
		reversedrightVector.push_back(rightVector.back());
		rightVector.pop_back();
	}

	while (leftVector.size() != 0) // greedy run over left and reversed-right stacks
	{
		StackItem tempLeftItem = leftVector.back();
		leftVector.pop_back();
		StackItem tempReversedRightItem = StackItem(-1, 0);
		while (reversedrightVector.size() != 0 && (tempLeftItem.sum + tempSum + reversedrightVector.back().sum <= M))
		{
			tempReversedRightItem = reversedrightVector.back();
			reversedrightVector.pop_back();
		}

		if (tempReversedRightItem.index != -1)
		{
			int tempCount = tempReversedRightItem.index - tempLeftItem.index + 1;
			if (maxProperties< tempCount)
			{
				maxProperties = tempCount;
				firstIndex = tempLeftItem.index;
			}

		}
	}



}


the data is in 2 dim. long long array, after calculation of preffix sums in a row of numbers.

the above method loads row "i" right to left, in a leftVector,
loads row "j" left to right, in a rightVector,
and reverses the right one to reversedRightVector.

And then makes some check working with both lefr and rightReversed vectors...
There will be a lot of resizing going on with those vectors and stacks. What if you write it such that there will be no resizing? Try making them big enough at the start to hold all the values.
How does the C# code look like ?
a simple function, lol. My favorite is
long long tempLeftSum = buckets[i][sqrtN - 1] - (k == 0 ? i == 0 ? 0 : buckets[i - 1][sqrtN - 1] : buckets[i][k - 1]);

at the top, yeah, consider reserving space in advance to set the capacity, although not exactly sure on the resizes after removes
1
2
3
vector<StackItem> leftVector, rightVector;
leftVector.reserve(sqrtN);
rightVector.reserve(sqrtN);


instead of
1
2
3
4
5
6
        vector<StackItem> reversedrightVector;
	while (rightVector.size() != 0)
	{
		reversedrightVector.push_back(rightVector.back());
		rightVector.pop_back();
	}


you want
vector<StackItem> reversedrightVector(rightVector.rbegin(), rightVector.rend());

and since it seems you're not actually indexing things in the middle but adding/removing doing stuff at the end, you probably just want a std::stack
std::stack has implementation-defined performance characteristics. In the best case std::stack::push() is equivalent to std::vector::push_back(). In the worst case, it's equivalent to std::list::push_back().
@icy1, he may as well just go through the original vector forwards.

I don't know if I've done this right, but you could try this version

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
static void ExaminePairOfRows(int i, int j) {
    // sum accepts all sums of intermediate rows
    long long sum = buckets[j - 1][sqrtN - 1] - buckets[i][sqrtN - 1];

    long long leftSum = buckets[i][sqrtN - 1] - buckets[i][k - 1];
    vector<StackItem> left;
    left.reserve(sqrtN);
    // construct left vector
    for (int k = sqrtN - 1; k > 0; --k) {
        while (left.size() != 0 && left.back().sum > leftSum)
            left.pop_back();
        left.push_back(StackItem(k + i * sqrtN, leftSum));
    }
    leftSum = buckets[i][sqrtN - 1] - (i == 0 ? 0 : buckets[i - 1][sqrtN - 1]);
    while (left.size() != 0 && left.back().sum > leftSum)
        left.pop_back();
    left.push_back(StackItem(k + i * sqrtN, leftSum));

    vector<StackItem> right;
    right.reserve(sqrtN);
    // construct right vector
    for (int k = 0; k < sqrtN && buckets[j][k] < (1LL << 40); ++k) {
        long long rightSum = buckets[j][k] - buckets[j - 1][sqrtN - 1];
        while (right.size() != 0 && right.back().sum > rightSum)
            right.pop_back();
        right.push_back(StackItem(k + j * sqrtN, rightSum));
    }

    // greedy run over left and reversed-right stacks
    size_t i = 0;
    while (left.size() != 0) {
        StackItem *rightItem = nullptr;
        while (i < right.size() && left.back().sum + sum + right[i].sum <= M)
            rightItem = &right[i++];
        if (rightItem) {
            int count = rightItem->index - left.back().index + 1;
            if (maxProperties < count) {
                maxProperties = count;
                firstIndex = left.back().index;
            }
        }
        left.pop_back();
    }
}

Last edited on
@helios never mind, i misunderstood the internals of std::stack. I thought it'd be a cut-down STL container, but it's actually using vectors or similar internally with a reduced public API.
Kinda weird.
Last edited on
thank you all guys, appreciate your help.

I tried with stack container first,
but when poor result came, I thought to use the underlying vector directly,
and switched to vectors. Thought also of the container resizing overhead,
but obviously the way I did it was insufficient ( I tried vector<StackItem> leftVector(sqrtN); )

I was unawere of 'reserve' way to do it, so I amended it this way now.

The reversed iterator instead of copying right to rightReversed vectors is also nice idea,
but I will put it at work later, just to have both # and ++ codes identical.

Summing it up, slight improvement from 120sec to 96sec execution time( C++ ),
away from 1.8sec ( C# ). Despite that, I still believe the bottle neck lies in this function - if I commenent it out, C++ takes it's regular 20-30% speed advantage.

Last edited on
I thought also, about the access to this 'buckets' array, which is feeding the stacks -
did it once by static, and once by dynamic allocation, it was the same.
I want to see the C# implementation. Something must be seriously wrong for a nearly 100x slowdown.
is this thing called once with big data or many times with smaller chunks?
Are you using threads?
Last edited on
here is the c# text of the same idea:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
public static void ExaminePairOfRows(int i, int j)
    {
        long tempSum = 0;

        // tempSum accepts all sums of intermediate buckets
        tempSum = buckets[j-1,sqrtN-1]-buckets[i,sqrtN-1];

        Stack<StackItem> leftStack = new Stack<StackItem>(sqrtN);
        // construct left stack
        for (int k = sqrtN - 1; k >= 0; --k)
        {
            long tempLeftSum = buckets[i, sqrtN - 1] - (k == 0 ? i==0 ? 0: buckets[i-1, sqrtN-1] : buckets[i, k - 1]);
            while (leftStack.Count != 0 && leftStack.Peek().sum > tempLeftSum)
            {
                leftStack.Pop();
            }

            leftStack.Push(new StackItem(k+i*sqrtN, tempLeftSum));
        }

        Stack<StackItem> rightStack = new Stack<StackItem>(sqrtN);
        // construct right stack
        for (int k=0; k < sqrtN && buckets[j, k] < ((long)1 << 40); ++k)
        {
            long tempRightSum = buckets[j, k]-buckets[j-1,sqrtN-1];
            while (rightStack.Count !=0 && rightStack.Peek().sum > tempRightSum )
            {
                rightStack.Pop();
            }

            rightStack.Push(new StackItem(k+j*sqrtN, tempRightSum));
        }

        Stack<StackItem> reversedRightStack = new Stack<StackItem>(sqrtN);
        while (rightStack.Count != 0)
        {
            reversedRightStack.Push(rightStack.Pop());
        }

        while(leftStack.Count!=0) // greedy run over left and reversed-right stacks
        {
            StackItem tempLeftItem = leftStack.Pop();
            StackItem tempReversedRightItem = new StackItem(-1,0);
            while (reversedRightStack.Count!=0 && ( tempLeftItem.sum + tempSum + reversedRightStack.Peek().sum <=M ) )
            {
                tempReversedRightItem = reversedRightStack.Pop();
            }

            if(tempReversedRightItem.index!=-1)
            {
                int tempCount = tempReversedRightItem.index-tempLeftItem.index+1;
                if ( maxProperties< tempCount)
                {
                    maxProperties = tempCount;
                    firstIndex = tempLeftItem.index;
                }

            }
        }


the program reads data from a text file which cosists of 2 rows : 1st row has 2 integers, 2nd row has 500,000 integers limited within -1,000,000 to +1,000,000.



Those 500,000 integers are stored in the buckets[708][707] array ( 500,000 items divided at sqrt(500,000) rows and columns, but not as they come from the text file - I stored them like a preffix array precalculated sums( array[j]-array[i] gives me the sum from i-th integer to j-th integer as they come from the text file ), idea is not sum numbers all the time, but to have sums as O(1).

up to this moment C# performs for about 50-60ms, C++ performs bit faster.

Then I use loop to call this function for each pair of buckets row i and j, where i<j,
to make my formal checks. There is one more method in the loop, which processes single row of the table, but it works fine - when I leave him alone ( the problematic one with stacks/vectors put in comments ) C++ behaves faster as expected.

I started to investigate my Visual Studio settings, is it possible to have some king of restriction there?
What's the type of 'buckets'? Show its initialization, as well.
Also, the type of 'buckets' in the C++ version.
Last edited on
Here is the table in the C# code:
Globals.buckets = new long[Globals.N / Globals.sqrtN +(Globals.N % Globals.sqrtN >0 ? 1:0), Globals.sqrtN];

Here it is in the C++ code:
static long long buckets[708][707];
, as I wrote up, I did it also as **buckets=new ... - worked the same as this one

and it's initialization ( equivalent in C# )
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
static void InitBuckets()
{
	int k = 0;
	for (int i = 0; i < N / sqrtN + 1; ++i) // N/sqrtN + 1 rows ( buckets )
	{
		if (k < N)
		{
			long long temp = 0;
			fscanf_s(In, "%lld", &temp);
			buckets[i][0] = temp + (i == 0 ? 0 : buckets[i - 1][sqrtN - 1]);
			k++;
		}
		else
		{
			buckets[i][0] = (long long)1 << 40;  // completion of last bucket with items > 1,000,000 * 500,000
		}

		for (int j = 1; j < sqrtN; ++j) // sqrtN colums ( element in a bucket )
		{
			if (k < N)
			{
				long long temp = 0;
				fscanf_s(In, "%lld", &temp);
				buckets[i][j] = buckets[i][j - 1] + temp;
				k++;
			}
			else
			{
				buckets[i][j] = (long long)1 << 40;  // completion of last bucket with items > 1,000,000 * 500,000
			}

		}

	}
}
yes visual studio has a lot of settings that can affect performance. Did you compile it in 'release' mode? That is the first thing to check, as debug mode is often significantly slower than release mode.
I think I have tested it with release debug option in Visual Studio,
but I am still not that well familiar with that IDE tricks and treats - that was not the problem.

I thought also of StackItems improper memory management eventually -
as far as I followed it constructors/ destructors it seems ok,
but anyway, here is it's code, might be I can't see something in front of me:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
class StackItem
{
public:

	int index;
	long long sum;

	StackItem()
	{
		this->index = -1;
		this->sum = (long long)1 << 40;
		//fprintf_s(stdout, "default constructor used\n");
	}

	StackItem(int i, long long s)
	{
		this->index = i;
		this->sum = s;
		//fprintf_s(stdout, "custom constructor used\n");
	}

	~StackItem()
	{
		//fprintf_s(stdout, "destructor used\n");
	}

	StackItem(const StackItem &Other)
	{
		this->index = Other.index;
		this->sum = Other.sum;
		//fprintf_s(stdout, "copy constructor used\n");
	}


};
just give a fully working mini-example, including mini-example of the file (or just put in sample values for InitBuckets() ) and the loop that calls the function.

Something like this: https://repl.it/@icy_1/UsedCheapVendors

If you start writing destructor or copy constructor, then the compiler will no longer provide a move constructor by default --> this may make a slight difference when you emplace_back instead of push_back. There's nothing special about your StackItem class, so just the constructors are enough.

Once there's a way to reproduce expected results, it may be clearer for ways to refactor.

As others have said, I also think you didn't build in release mode. DEBUG and RELEASE_WITH_DEBUG is not quite RELEASE
Topic archived. No new replies allowed.