sorting a heap

I've been trying to get a functional Heap Sort working, and well, it is functional, but the function to sort the heap takes too long, I could really do with some help on that.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
sf::Time heapify(std::vector<int>& heapArray, size_t heapSize)
{
    heapSize--;
    int holdParentIndex = -1;
    sf::Clock clock;
    int loop = 0;

    while (isHeap(heapArray, heapSize + 1) == false)
    {
        //std::cout << "loop " << loop << std::endl;
        for (unsigned int i = heapSize;; i--)
        {
            if (i == 0) break;

            holdParentIndex = parentIndex(i);

            if (i % 2 == 0) //if right child
            {
                if (heapArray[i] > heapArray[holdParentIndex]) //if bigger than parent
                {
                    swapHeapValues(i, holdParentIndex, heapArray);
                }
                else if (heapArray[i - 1] > heapArray[holdParentIndex]) //otherwise if brother bigger than parent
                {
                    swapHeapValues(i - 1, holdParentIndex, heapArray);
                }
            }
            else if (i == heapSize) //otherwise if the last value in the heap. doesn't have a brother.
            {
                if (heapArray[i] > heapArray[holdParentIndex]) //if bigger than parent
                {
                    swapHeapValues(i, holdParentIndex, heapArray);
                }
            }
        }
        loop++;
    }
    return clock.getElapsedTime();
}


that makes a heap, this sorts it

1
2
3
4
5
6
7
8
9
10
11
12
13
sf::Time heapSort(std::vector<int>& heapArray)
{
    size_t unsortedSize = heapArray.size();
    sf::Clock clock;

    while (unsortedSize > 0)
    {
        swapHeapValues(0, unsortedSize - 1, heapArray);
        heapify(heapArray, unsortedSize - 1);
        unsortedSize--;
    }
    return clock.getElapsedTime()
}


the heapify function is quick, 12.5 seconds to heapify a random vector of 100 million ints, whereas the heapSort function is slow, 12.5 seconds to sort a heap of 18500 ints
Last edited on
You should not be heapifying in your loop. Do that only once, at the beginning of your function. After that, you only need a very inexpensive push-down/sink/apply heap property/whatever on the root node each time through the loop to maintain the heap.

Your post was moved and edited, apparently, so it no longer has your function to get a child parent.

[edit] You don't need to find the parent. You need to find children. Apply the heap property (that is, "sink" or "push down" or whatever) to a parent. The following commentary about number of operations still applies: [/edit]

As you had it, you were performing way too many operations to determine the child index. As explained with the Heapsort FAQ
http://www.cplusplus.com/faq/sequences/sequencing/sort-algorithms/heapsort/#heap-structure
you only need an inexpensive shift and add to get the left child's address:
child = (parent << 1) + 1;
(which is the same as:
child = (parent * 2) + 1;
). Since you are using a binary heap, all you need to access the children is heapArray[child] and heapArray[child+1]. You do need to make sure not to access an invalid child, BTW (that is, make sure that child < unsortedSize and (child + 1) < unsortedSize before you try to access them.

Hope this helps.
I deleted my post from that thread, as I realized I should have started a new one anyway.

1
2
3
4
5
6
7
8
9
int parentIndex(unsigned int i)
{
    if (i % 2 == 0) //if i is a rightChild(even index)
        return (i / 2) - 1;
    else if (i > 0) //if i is a leftChild(odd index) and not root
        return (i / 2);
    else
        return -1; //root has no parent
}


Because arrays start at 0, not 1, i / 2 isn't a concrete formula for getting the parent, anyway that seems to be the most optimized way of doing it.

I can't reply to the rest of your post for now, I've got to go, I'll read it and try to get a better implementation of sorting the heap going.

I appreciate it, thanks.
Last edited on
Ah, good thinking about where to post.

To calculate parent index:
1
2
3
4
inline unsigned parentIndex( unsigned i )
{
    return (i - 1) / 2;
}

This really is the most optimized, concrete way to do it in C and C++, where arrays start at zero.

There is also a std::swap() algorithm that might be useful to you:

std::swap(heapArray[i], heapArray[holdParentIndex]);

Good luck!
cool, didn't think of doing (i - 1) / 2 instead of (i / 2) - 1, that solves the problem of odd/even, I know about inline but I don't bother, the compiler will end up deciding, me typing it is a waste of 0.5940(recurring)..2 seconds.

and now using std::swap ^^

1
2
3
4
5
6
int parentIndex(unsigned int i)
{
    if (i != 0) return (i - 1) / 2;

    return -1;
}


I still want -1 for root, as it has no parent.

It's much faster, but it doesn't sort properly, there's 5-13 situations in a 100 int vector where parent < child, and I can't seem to figure out why.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
sf::Time heapify(std::vector<int>& heapArray)
{
    sf::Clock clock;
    int holdParentIndex = -1;
    size_t heapSize = heapArray.size() - 1;

    while (isHeap(heapArray) == false)
    {
        for (unsigned int i = heapSize;; i--)
        {
            if (i == 0) break;

            holdParentIndex = parentIndex(i);

            if (i % 2 == 0) //if right child
            {
                if (heapArray[i] > heapArray[holdParentIndex]) //if bigger than parent
                {
                    std::swap(heapArray[i], heapArray[holdParentIndex]);
                }
                else if (heapArray[i - 1] > heapArray[holdParentIndex]) //otherwise if brother bigger than parent
                {
                    std::swap(heapArray[i - 1], heapArray[holdParentIndex]);
                }
            }
            else if (i == heapSize && heapArray[i] > heapArray[holdParentIndex]) //otherwise if the last value in the heap. doesn't have a brother.
            {
                std::swap(heapArray[i], heapArray[holdParentIndex]);
            }
        }
    }
    return clock.getElapsedTime();
}


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
sf::Time sink(unsigned int index, std::vector<int>& heapArray, size_t heapSize)
{
    sf::Clock clock;
    heapSize--;

    if (index >= heapSize)
        return clock.getElapsedTime();

    unsigned int leftChild;

    while (true)
    {
        leftChild = index * 2 + 1;

        if (leftChild > heapSize)
        {
            break;
        }
        else if (leftChild + 1 <= heapSize && heapArray[leftChild + 1] > heapArray[leftChild])
        {
            std::swap(heapArray[index], heapArray[leftChild + 1]);
            index = leftChild + 1;
            continue;
        }
        std::swap(heapArray[index], heapArray[leftChild]);
        index = leftChild;
    }

    return clock.getElapsedTime();
}


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
sf::Time buildHeap(std::vector<int>& heapArray)
{
    sf::Clock clock;

    if (heapArray.size() == 0)
        return clock.getElapsedTime();

    size_t unsortedSize = heapArray.size() - 1;

    heapify(heapArray);
    while (unsortedSize > 0)
    {
        std::swap(heapArray[0], heapArray[unsortedSize]);
        unsortedSize--;
        sink(0, heapArray, unsortedSize + 1);
    }
    return clock.getElapsedTime();
}


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
int main()
{
    srand(time(0));

    std::vector<int> heapArray;

    for (int i = 0; i < 100; i++)
    {
        heapArray.push_back(rand() % 1000);
    }

    std::cout << "building" << std::endl;
    sf::Time buildElapsed = buildHeap(heapArray);
    std::cout << "it took " << buildElapsed.asSeconds() << " seconds to build " << heapArray.size() << " elements" << std::endl;

    std::cout << sortErrors(heapArray) << std::endl;

    if (heapArray.size() <= 100)
    {
        for (int heapData : heapArray)
        {
            std::cout << heapData << ", ";
        }

        std::cout << std::endl;
    }
}



building
it took 0.000296 seconds to build 100 elements
4 (errors, parent < child)
<prints array...>


is it not sinking properly?

Last edited on
The number of errors you get should be about random.

I haven't spent very much time looking at your code, but I am tired right now. You are not applying the heap property correctly. The heapify() function really ought to be using your sink() function instead of whatever it is you are doing.

Your sink() function needs a little more thought also. It behaves badly if there is only one child, and you are duplicating effort.

You really ought to spend some time over at the FAQ page looking at the pretty animations I spent hours creating for you, and work on the algorithm listed there.
http://www.cplusplus.com/faq/sequences/sequencing/sort-algorithms/heapsort/#heap-property

Good luck!
Okay, I've decided to try this again, but I haven't even had a chance to get into sorting the heap because heapifying it the new way is twice as slow.

this is the old way, 8 and a half seconds to heapify 10 million ints in Debug with debugging symbols and code profiling (-g; -pg)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
sf::Time heapify(std::vector<int>& heapArray)
{
    auto heapSize = heapArray.size() - 1;
    int holdParentIndex = -1;
    sf::Clock clock;
    int loop = 0;

    while (isHeap(heapArray) == false)
    {
        //std::cout << "loop " << loop << std::endl;
        for (unsigned int i = heapSize;; i--)
        {
            if (i == 0) break;

            holdParentIndex = parentIndex(i);

            if (i % 2 == 0) //if right child
            {
                if (heapArray[i] > heapArray[holdParentIndex]) //if bigger than parent
                {
                    std::swap(heapArray[i], heapArray[holdParentIndex]);
                }
                else if (heapArray[i - 1] > heapArray[holdParentIndex]) //otherwise if brother bigger than parent
                {
                    std::swap(heapArray[i - 1], heapArray[holdParentIndex]);
                }
            }
            else if (i == heapSize) //otherwise if the last value in the heap. doesn't have a brother.
            {
                if (heapArray[i] > heapArray[holdParentIndex]) //if bigger than parent
                {
                    std::swap(heapArray[i], heapArray[holdParentIndex]);
                }
            }
        }
        loop++;
    }
    return clock.getElapsedTime();
}


now this is the new code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
sf::Time heapify(std::vector<int>& heapArray)
{
    sf::Clock clock;

    //to calculate the last node of a heap which has at least 1 child, divide the heap by 2 and round down

    for (auto i = (heapArray.size() - 1) / 2;; --i)
    {
        sink(i, heapArray);

        if (i == 0) break;
    }

    return clock.getElapsedTime();
}


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
sf::Time sink(unsigned int index, std::vector<int>& heapArray)
{
    sf::Clock clock;
    unsigned int leftChild = 0;
    unsigned int biggestChild = 0;
    auto lastHeapIndex = heapArray.size() - 1;

    while (true)
    {
        //formula for finding the left child of a parent node
        leftChild = (index * 2) + 1;

        //if left child is out of bounds to the heapArray, index has no children, so cannot be sunk
        if (leftChild > lastHeapIndex)
        {
            break;
        }

        //assume the left child is the biggest
        biggestChild = leftChild;

        //if there is a right child
        if (leftChild + 1 <= lastHeapIndex)
        {
            //if the right child is bigger than the left child
            if (heapArray[leftChild + 1] > heapArray[leftChild])
            {
                biggestChild = leftChild + 1;
            }
            //otherwise, biggest is already set to the left child, no need to set it again
        }

        //check if biggest child is bigger than the parent
        if (heapArray[biggestChild] > heapArray[index])
        {
            std::swap(heapArray[biggestChild], heapArray[index]);
            index = biggestChild; //keep sinking
        }
        else
        {
            break;
        }
    }
    return clock.getElapsedTime();
}


the new code looks much better, and I honestly do not see why it is so much slower, it takes 15 seconds for it to do its thing on 10 million ints instead of the old ways 8.5 seconds, running the code::blocks code profiler doesn't reveal anything amazingly helpful either.
Last edited on
Topic archived. No new replies allowed.