Linked List Union and Intersection problem

I am working on a program that will have several sets of data (in files) and will have to evaluate operations given to it by a script file. The operations are union (+) and Intersection (*). Union is all the words that appear in either set, Intersection is only the words that appear in both sets.

I have gotten the program mostly done, the only problem is that, for some reason, when the sets get fairly large (10000 words or more) it seems to stop working properly, but it works fine with slightly smaller sets. I would post the sets and the output, but they are fairly large and I don't want to clog up the post.

The specific problem is that the expected output of the case that I am working on should have 2584 words, but my output has 3317. The extra words are not duplicates, they are just extras that don't belong there. I thought that it might be either my union or intersection functions, but I have tried 3 different intersection functions and 3 different union functions that all give me the same wrong answer.

Here is the script file I am reading:
1
2
3
4
5
6
7
8
read(A,'set6.txt')
read(B,'set7.txt')
read(C,'set8.txt')
read(D,'set9.txt')
R1=(A+B)*C
R2=B*(C+D)
R3=R1+R2
write(R3,'out6.txt')


My program successfully reads this, stores all the names and files, and sends the expressions to my evaluate function, then stores the answers into an array of linked lists that I have. Each text file is also stored into a linked list that it inside an array of linked lists that contains all the data sets.

Here is my favorite union and intersection functions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
template <class M>
LinkedList<string> LinkedList<M>::Intersection2(LinkedList<string> L2)
{
	   LinkedList<M> Intersection;

	   Node<M> *temp;
	   Node<M> *temp2;

	   temp = head;
	   temp2 = L2.head;

	   while(temp != NULL)
	   {
		   if(temp->data < temp2->data)
			   temp = temp->next;
		   else if(temp->data > temp2->data)
			   temp2 = temp2->next;
		   else if (temp->data == temp2->data)
		   {
			   Intersection.push_back(temp->data);
			   temp = temp->next;
			   temp2 = temp2->next;
		   }
	   }

	   return Intersection;
}

template <class M>
LinkedList<M> LinkedList<M>::Union(LinkedList<M> L2)
{
	LinkedList<string> merged;
	Node<M> *temp;
	Node<M> *temp2;
	temp = head;
	temp2 = L2.head;

	while(temp != NULL)
	{
		merged.push_back(temp->data);
		temp = temp->next;
	}

	while(temp2 != NULL)
	{
		merged.push_back(temp2->data);
		temp2 = temp2->next;
	}

	//merged.noDuplicates();
	merged.mergeSort();
	//merged.removeDuplicates();
	merged.noDups();

	return merged;
}


That intersection function seems to be my most efficient, but the problem is that I have to make sure I pass it the largest and smallest set in the right order, and i have no way to check that in my program.

Here is my noDups() function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
template <class M>
void LinkedList<M>::noDups()
{
	 Node<M> *Curr;
	 Node<M> *trail;

    Curr=head;

    while(Curr->next != NULL)
	 {
	    trail = Curr;
	    Curr = Curr->next;

	    if(trail->data == Curr->data)
	    {
	       trail->next = Curr->next;
	       delete Curr;
	       Curr = trail;
	    }
	 }

}


And here is my evaluate function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
LinkedList<string> Evaluate(string exp, LinkedList<string> set[], string fnames[], LinkedList<string> expr[], string exprNames[])
{
	cout << "Expression: " << exp << endl;
	cout << "J: " << j << endl << "M: " << m << endl << "L: " << l << endl << "O: " << o << endl;

	Stack<LinkedList<string> > temp(100);

		int size = exp.length();

		ofstream imp("testing.txt");


		//cout << "l: " << l << endl << "J: " << j << endl << "o: " << o << endl;

		for (int p = 0; p < j; p++)
		{
			if(exp.find(fnames[p]) != string::npos)
			{
				cout << "Found: " << fnames[p] << endl;
				temp.push(set[p]);
			}
		}

		for(int y = 0; y < l; y++)
		{
			if(exp.find(exprNames[y]) != string::npos)
			{
				cout << "Found: " << exprNames[y] << endl;
				temp.push(expr[y]);
			}
		}

		for (int i = 0; i < size; ++i)
		{

		    if (exp[i] == '+' || exp[i] == '*')
			{
				LinkedList<string> val1 = temp.top();
				temp.pop();
				val1.outFileF(imp);

//			    cout << endl << "Val1: " << endl;
//				val1.displayF();

				LinkedList<string> val2 = temp.top();
				temp.pop();
				//				cout << endl << "Val2: " << endl;
				//				val2.displayF();

				switch (exp[i])
				{
				  case '+':
					  cout << "==========Before union!" << endl;
					  temp.push(val1.Union(val2));
					  //temp.push(val2.getUnion(val1));
					  cout << "==========after union!" << endl;
					  break;
			      case '*':
			    	  cout << "---------before intersection" << endl;
			    	  //temp.push(val1.Intersection(val2));
			    	  temp.push(val2.Intersection2(val1));
			    	  //temp.push(val2.getIntersection(val1));
			    	  cout << "---------after intersection" << endl;
			    	  break;
		        }

			}
		}
		return temp.top();
}


I'm thinking the problem must be somewhere in the evaluate function, but the cout statements tell me it's using the correct sets and everything. The string arrays that are passed to the evaluate function are parallel arrays that contain the names of the sets. I figured that would be an easy way to search for a specific set.

I'm sorry this post is so long, but I wanted to make sure there was enough info for someone to help me. Thanks in advance!
Last edited on
I think that I found the error, but I'm not sure how to fix it.

When an expression like (A+B)*C is passed to it (as it's postfix version: AB+C*), it evaluates it as ABC+*. I believe this is because of how I check to see if a file name or expression name is in the expression at the top. It is checking the entire expression for a file name, then pushing it to the stack, THEN it looks for expression names and then operators. This also means if I pass it R1BC*+, I get BCR1* before it gives an error.

Could anyone give me any suggestions on how to fix this?
I've fixed it by adding this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
stringstream ss(expression);

	ofstream v1("val1.txt");
	ofstream v2("val2.txt");

	string element;

	string elements[100];
	int i = 0;
	while (ss >> element)
	{

		elements[i] = element;
		i++;
	}

	for(int p = 0; p < i; p++)
	{
		int ind1 = bSearch(fnames, j, elements[p]);
		int ind2 = bSearch(exprNames, l, elements[p]);

		if(ind1 != -1)
		{
			cout << "Found: " << fnames[ind1] << endl;
			temp.push(set[ind1]);
		}
		else if(ind2 != -1)
		{
			cout << "Found: " << exprNames[ind2] << endl;
			temp.push(exp[ind2]);
		}
		else if(elements[p] == "+" || elements[p] == "*")
		{


The only problem is that this depends on the expression being separated with spaces, and I can't figure out how to do that properly. I have another post that is specifically about that, so if someone could go take a look at it and help me, I would be very appreciative.
The only problem is that this depends on the expression being separated with spaces


Here is the script file I am reading:
1
2
3
4
5
6
7
8
read(A,'set6.txt')
read(B,'set7.txt')
read(C,'set8.txt')
read(D,'set9.txt')
R1=(A+B)*C
R2=B*(C+D)
R3=R1+R2
write(R3,'out6.txt')


Sorry, I didn't look in detail at the code. However, it seems you are transforming (in some way that I'm not sure of) the above input into something which looks like this:
 
"R2R1R2+*R1+"


Wouldn't it be simpler to insert the spaces during that transformation process, rather than omit them and then try to figure out afterwards where they should have gone?
It probably would be, but I am having trouble figuring out how to do that. The transformation that the expressions are going through is that they are being converted from infix notation to postfix notation.

That R2R1R2+*R1+ actually came from a different script:
1
2
3
4
5
6
7
8
read(A,'set6.txt')
read(B,'set7.txt')
read(C,'set8.txt')
read(D,'set9.txt')
R1=(A+B)*C
R2=A+B*(C+D)
R3=R2*(R1+R2)+R1
write(R3,'out7.txt')


I got my InfixToPostfix function from online, then modified it slightly to detect some errors. I can't remember where I got it....
Here it is:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
string InfixToPostfix(string expression)
{
	Stack<char> S(100);

	int balance = 0, size = expression.length();

	string postfix = "";
	for(int i = 0;i < size;i++)
	{
		if(expression[i] == ' ' || expression[i] == ',')
			continue;

		else if(IsOperator(expression[i]))
		{

			if(expression[i] != '+' && expression[i] != '*')
			{
				cout << "INVALID EXPRESSION: invalid operator = " << expression[i] << endl;
			}
			if(IsOperator(expression[i+1]))
			{
				cout << "ERROR: Invalid expression (extra operator)" << endl;
				return "ERROR: Invalid expression (extra operator)";
			}

			else
			{
				while(!S.isEmptyStack() && S.top() != '(' && HasHigherPrecedence(S.top(),expression[i]))
				{
					//postfix += " ";
					postfix+= S.top();
					//postfix += " ";
					S.pop();
				}
				S.push(expression[i]);
			}

		}

		else if(IsOperand(expression[i]))
		{
			postfix += expression[i];
		}

		else if (expression[i] == '(')
		{
			if (balance == 0)
			{
				balance++;
				S.push(expression[i]);
			}
			else
				cout << "The expression " << expression << " is invalid: Extra '('" << endl;
		}

		else if(expression[i] == ')')
		{
			balance--;
			if (balance >= 0)
			{
				while(!S.isEmptyStack() && S.top() !=  '(') {
					postfix += S.top();
					S.pop();
				}
				S.pop();
			}
			else
				cout << "The expression " << expression << " is invalid: Extra ')'" << endl;
		}
	}

	while(!S.isEmptyStack()) {
		postfix += S.top();
		S.pop();
	}

	return postfix;
}


If I pass it something like "C * D" it returns "CD*" which is the correct postfix notation, but it gets rid of the spaces.
Topic archived. No new replies allowed.