Memory use in recursive tree algorithm

I have code that builds a map whose keys are doubles (the F-test value between two clusters, which requires the residual sum of squares, RSS) and whose mapped values are of type cluspair, a pair of pointers to Cluster, a class that I created. The map stores, for each cluster, its F-test value with the cluster that gives the minimum F-value for it, so that I do not have to repeat the calculation at every step. By the way, a cluster is a tree structure: every cluster contains two subclusters, and the stored values are 70-dimensional vectors.

The problem is that calculating the RSS requires a recursive function that finds the distance between every element of the cluster and the cluster mean, and this seems to consume an enormous amount of memory. When I build the same map with the keys being the simple distance between the means of two clusters, the program uses minimal memory, so I think the increase in memory use is caused by the call to the recursive function rss(). What should I do to manage the memory use in the code below? In its current implementation the system runs out of memory, and Windows closes the application, saying that the system ran out of virtual memory.

The main code:

    map<double,cluspair> createRSSMap( list<Cluster*> cluslist )
    {
            list<Cluster*>::iterator it1;
            list<Cluster*>::iterator it2;

            map<double,cluspair> rtrnmap;


            for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
            {
                it2=it1;
                ++it2;
                cout << ".";

                list<Cluster*>::iterator itc;
                double cFvalue=1e19;   // sentinel, written as a double literal (the integer form does not fit in a signed 64-bit type)
                double rIt1 = (*it1)->rss();

                for(int kk=0 ; it2!=cluslist.end(); it2++)
                {

                    Cluster tclustr ((*it1) , (*it2));
                    double r1 = tclustr.rss();
                    double r2= rIt1 + (*it2)->rss();
                    int df2 = tclustr.getNumOfVecs() - 2;

                    double fvalue = (r1 - r2) / (r2 / df2);

                    if(fvalue<cFvalue)
                    {
                        cFvalue=fvalue;
                        itc=it2;
                    }
                }


                cluspair clp;
                clp.c1 = *it1;
                clp.c2 = *itc;


                bool doesexists = (rtrnmap.find(cFvalue) != rtrnmap.end());

                // map keys must be unique, so nudge the key until it is unused
                while(doesexists)
                {
                    cFvalue+= 0.000000001;
                    doesexists = (rtrnmap.find(cFvalue) != rtrnmap.end());
                }

                rtrnmap[cFvalue] = clp;


            }

            return rtrnmap;
    }


and the implementation of the function rss:

double Cluster::rss()
{
    return rss(cnode->mean);
}

double Cluster::rss(vector<double> &cmean)
{
    if(cnode->numOfVecs==1)
    {
        return vectorDist(cmean,cnode->mean);
    }
    else
    {
        return ( ec1->rss(cmean) + ec2->rss(cmean) );       
    }
}

Many thanks in advance. I really don't know what to do at this point.
By the way, below is the code that I use to create a map whose keys are the simple Euclidean distance between two cluster means. As I said above, it is quite similar and uses minimal memory. It differs only in how the key is calculated: instead of the recursive calculation, it computes the simple distance between the means of the two clusters. I hope it helps to identify the problem.

map<double,cluspair> createDistMap( list<Cluster*> cluslist )
{
        list<Cluster*>::iterator it1;
        list<Cluster*>::iterator it2;

        map<double,cluspair> rtrnmap;


        for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
        {
            it2=it1;
            ++it2;
            cout << ".";

            list<Cluster*>::iterator itc;
            double cDist=1000000000000000;

            for(int kk=0 ; it2!=cluslist.end(); it2++)
            {
                double nDist = vectorDist( (*it1)->getMean(),(*it2)->getMean());
                if (nDist<cDist)
                {
                    cDist = nDist;
                    itc=it2;
                }
            }   

            cluspair clp;
            clp.c1 = *it1;
            clp.c2 = *itc;



            bool doesexists = (rtrnmap.find(cDist) != rtrnmap.end());

            while(doesexists)
            {
                cDist+= 0.000000001;
                doesexists  = (rtrnmap.find(cDist) != rtrnmap.end());
            }

            rtrnmap[cDist] = clp;

        }

        return rtrnmap;
}
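
A remark on the key-nudging loop used in both functions: adding 0.000000001 only changes a double when the value is small enough. For a value near the initial sentinel 1000000000000000 the sum rounds back to the same number, so if that loop were ever entered with such a key it would not terminate. A possible alternative, shown here only as a sketch, is std::multimap, which keeps entries ordered by key and accepts equal keys. PairLabel is a hypothetical stand-in for cluspair so that the example compiles on its own.

```cpp
#include <map>
#include <string>

// Stand-in for the poster's cluspair (which holds two Cluster* members);
// strings are used here only to keep the sketch self-contained.
struct PairLabel {
    std::string c1;
    std::string c2;
};

// std::multimap keeps entries sorted by key and allows equal keys,
// so no offset has to be added to make a key unique.
std::multimap<double, PairLabel> buildExample()
{
    std::multimap<double, PairLabel> m;
    m.emplace(2.5, PairLabel{"A", "B"});
    m.emplace(2.5, PairLabel{"C", "D"});   // same key, kept as a second entry
    m.emplace(1.0, PairLabel{"E", "F"});
    return m;
}
```

As with std::map, begin() still refers to the entry with the smallest key, so code that takes the best pair from the front of the container would work the same way.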
You should check this line:

for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)

If the list is even-sized, the two iterators are going to pass each other.

Say begin is 0 and end is 2. For the first run:

it1 is 0 and it is checked against 1 (because end is decremented before the check).

For the next pass, it1 is 1 and is now checked against 0, so they will pass each other and you have a problem. Plus, if begin == end, you also have a problem. You need to rethink your for loops, from what I can see.

EDIT: Scratch most of that. I was treating end() like a variable. If end() == begin() (an empty list), though, there might still be a problem.
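
To illustrate the remaining case: decrementing end() on an empty std::list is undefined behaviour, so the loop condition needs a guard if the list can ever be empty. Here is a minimal sketch of the same pairwise iteration with that guard. countPairs and the use of std::list<int> are assumptions made to keep the example self-contained; the real code iterates over list<Cluster*>.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Counts the (it1, it2) pairs that the posted double loop would visit,
// returning early for lists with fewer than two elements so that
// end() is never decremented on an empty list.
std::size_t countPairs(const std::list<int>& items)
{
    if (items.size() < 2) return 0;

    std::size_t pairs = 0;
    const auto last = std::prev(items.end());   // safe: the list is non-empty
    for (auto it1 = items.begin(); it1 != last; ++it1) {
        for (auto it2 = std::next(it1); it2 != items.end(); ++it2) {
            ++pairs;
        }
    }
    return pairs;
}
```

For a list of four elements this visits 6 pairs, and for an empty or single-element list it visits none.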
Thanks for the input. You are right: if begin is the same as end, this would be a problem, but the algorithm deals with a large amount of data, so I didn't think I needed to worry about that case. It just seemed straightforward to implement it this way and, as you said, end() is not a variable. Therefore I think the implementation of the for loop is OK.