Merge Sort Using Getlines

Hey everyone,

I'm having issues with conceptualizing the sort function that I will need for my program. I am able to read both "data1.txt" and "data2.txt" into "merge.txt" However, it does so in the following manner:

data1.txt contains information as follows:

A
C
E


data2.txt contains information as follows:

B
D
F


My current merge.txt is outputting the following data:

A
C
E
B
D
F


However, merge.txt should be outputting:


A
B
C
D
E
F



My current code is as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
/*zeta4321
Program Name
Objectives:
	1. Merge two sorted text files ("data1.txt" and "data2.txt") into a single text file called "merge.txt"
*/

    #include <iostream>
    #include <string>
    #include <fstream>
    using namespace std;
    int main()
    {
    //declare two files for reading
    ifstream read1( "data1.txt");
    ifstream read2("data2.txt");
    //declare file for writing
    ofstream write ("merge.txt");
    string line;
    string line2;
    while ( getline ( read1, line, '\n' ) )
    {
    write << line << endl;
    }
    while ( getline ( read2, line2, '\n' ) )
    {
    write << line2 << endl;
    }
    read1.close();
    read2.close();
    write.close();
    }


TL;DR Could someone please help me with the sort function for this program?
So far you are immediately printing into the new file what you are reading from the first. To control the order of letters printed in the merge.txt file you'll have to add an intermediate step to keep track/organize what is read from data1 and data2.

In summary:

1-read data1.txt and store each letter into a string or array
2-read data2.txt and store each letter at the end of the same string or array
3-make a sort function with temps to order the chars as you wish
4-print the resulting string in merge.txt
I've done some re-thinking of my overall strategy, but I'm still a little confused: can I mix getline with arrays? Here's the pseudocode as to what I'm thinking right now.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
ifstream read1( "data1.txt");
ifstream read2("data2.txt");
string line [500];
string line2 [500];
string elem1 [500];
string elem2 [500];

if (!read1.eof()) && (!read2.eof())
  {if elem1[x]<elem2[x]
    getline ( read1, line[x], '\n' ) )
     elem1++

else getline ( read2, line2[x], '\n' ) )
     elem2++}

while (!read1.eof()) && (!read2.eof())
  if one of the input files still contains more elements:
    append all its remaining elements to output


Thanks in advance for your help!
This is a standard file-processing problem.

The logic goes something like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
    read line1 from file one
    read line2 from file two
    
    while (not eof file one and not eof file two)
    {
        if (line1 <= line2) 
        {
            write line1 to output
            read line1 from file one
        }
        else
        {
            write line2 to output
            read line2 from file two
        }
    }

    while (not end of file one)
    {
        write line1 to output
        read line1 from file one
    }

    while (not end of file two)
        write line2 to output
        read line2 from file two
    }


In the context of C++ streams, rather than testing .eof() it might be better to test .fail().

That's because when reading the last line, data is successfully read into the string, and eof() flag is set. On the next attempted getline, there is no more data to be retrieved, and the fail() flag is set.
Last edited on
@Chervil

I've tried to implement your logic within my program, but I'm still running into a couple of errors. I'll post my updated code below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
//declare two files for reading
ifstream read1( "data1.txt");
ifstream read2("data2.txt");
//declare file for writing
ofstream write ("merge.txt");
string line1;
string line2;

while ((!read1.eof()) && (!read2.eof()))
{
if (line1 <= line2)
{
( getline ( read1, line1, '\n' ) );
line1 << endl;
read1 >> line1;
}
else 
{
( getline ( read2, line2, '\n' ) );
line2 << endl;
read2 >> line2;
}
}

while (!read1.eof())
{
( getline ( read1, line1, '\n' ) );
line1 << endl;
read1 >> line1;
}

while (!read2.eof())
{
( getline ( read2, line2, '\n' ) );
line2 << endl;
read2 >> line2;
}

read1.close();
read2.close();
write.close();
}


On lines 20, 26, 34, and 41, my compiler is spitting out error messages that read "Error: no operator '<<' matches these operands." I'm also not sure if I'm implementing the getline functionality correctly.

Could someone please explain to me why I'm getting these errors? I'm absolutely confused, to be honest.
First the logic.
My pseudocode has lines 1 and 2:
1
2
    read line1 from file one
    read line2 from file two

I don't see this expressed in your code.

Somewhere between opening the files on lines 8-11 and the beginning of the while loop at line 15, you need to read the first line from each file, into the two strings which you defined for that purpose.

As for the compiler errors,
your lines 20 and 21:
1
2
line1 << endl;
read1 >> line1;

the first of these two lines is attempting to use the string line1 as if it was an output stream. You should use your output stream write instead.
The second of these lines is reading a single word into the string line1. It would be better to use getline() instead of that second line.

In addition, at line 19 you have a getline() too.

Notice how my pseudocode had two lines of code within that part of the if, that is:
• one output statement
• one input statement.

Your code has three lines:
• an input statement (getline)
• a miscoded output statement
• another input statement.

Take another look at the logic I presented earlier, and see if you understand how your code doesn't follow it.

As a reminder, my original logic was not quite right, you should test .fail() instead of .eof(). Apologies for misleading you there.

@Chervil

Thank you so much for your help! I really appreciate it!

One last question: my merge sort function is now working properly. However, I am having trouble outputting one key piece of information.

Here's a sample of the input data that I'm looking at:

From data1.txt

Edward Letterhead Elegant 96.00 8
Fred Note Imagine 62.50 5
Garrett Letterhead Excelsior 68.50 3
Jim Note Imagine 62.50 12
Karen Note Imagine 62.50 11


From data2.txt


Carl Letterhead Elegant 96.00 10
Dave Note Imagine 62.50 20
John Letterhead Excelsior 68.50 31
Kristen Letterhead Excelsior 68.50 55
Larry Note Imagine 62.50 48


My output file demonstrates that all of the data has been correctly merged and sorted. However, it will not print out each person's name. Instead, the output file looks something like this:


Letterhead Elegant 96.00 10
Note Imagine 62.50 20
Letterhead Elegant 96.00 8
Note Imagine 62.50 5
Letterhead Excelsior 68.50 3
Note Imagine 62.50 12
Letterhead Excelsior 68.50 31
Note Imagine 62.50 11


Here is what my code currently looks like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
//declare two files for reading
ifstream read1( "data1.txt");
ifstream read2("data2.txt");
//declare file for writing
ofstream write ("merge.txt");
string line1;
string line2;

read1 >> line1;
read2 >> line2; 

while ((!read1.fail()) && (!read2.fail()))
{
if (line1 <= line2)
{
( getline ( read1, line1, '\n' ) );
write << line1 << endl;
read1 >> line1;
}
else 
{
( getline ( read2, line2, '\n' ) );
write << line2 << endl;
read2 >> line2;
}
}

while (!read1.fail())
{
( getline ( read1, line1, '\n' ) );
write << endl;
read1 >> line1;
}

while (!read2.fail())
{
( getline ( read2, line2, '\n' ) );
write << endl;
read2 >> line2;
}

read1.close();
read2.close();
write.close();
}


I'm a bit confused: why is the program skipping over each person's name?
I'm confused.

Why do you have two input statements for every one output statement?

And what purpose do lines 37 and 44 serve here? Instead of writing out any of the input data, you just write out the newline character.

Just an example, lines 34 to 39 currently look like this:
1
2
3
4
5
6
while (!read1.fail())
{
( getline ( read1, line1, '\n' ) );
write << endl;
read1 >> line1;
}

Above, notice you have two input statements. Both lines 3 and 5 are reading data from the input file.

It should look something like this:
1
2
3
4
5
    while (!read1.fail())
    {
        write << line1;
        getline (read1, line1);
    }
Last edited on
I've made some more modifications to my code as you suggested. I believe I was also making the same mistake (two output statements for every one input statement) within my while ((!read1.fail()) && (!read2.fail())) statement.

I'm now able to get the program to read in, merge, sort, and print out all of the data in the correct order. My last hurdle is in regards to formatting.

Here are my edits to my while ((!read1.fail()) && (!read2.fail())) to emulate your previous suggestions.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
while ((!read1.fail()) && (!read2.fail()))
{
if (line1 <= line2)
{
write << line1 << endl;

read1 >> line1;
}
else 
{
write << line2 << endl;

read2 >> line2;
}
}




It now prints out each individual piece of data to a new line, which I'll post below:


Carl
Edward
Letterhead
Elegant
96.00
8
Fred
Letterhead
Elegant
96.00
10
Dave
Note
Imagine
62.50
5
Garrett
Letterhead
Excelsior
68.50
3
Jim
Note
Imagine
62.50
12
Karen
Note
Imagine
62.50
11
Note
Imagine 62.50 20
John Letterhead Excelsior 68.50 31
Kristen Letterhead Excelsior 68.50 55
Larry Note Imagine 62.50 48


When I tried removing the endl;, all of the pieces of data that meet that condition are printed out to a single, gigantic line. My while (!read1.fail()) statement prints out nicely.

Any suggestions as to how I can improve the output format?

Thank you so much for your help, by the way! I really appreciate it!
Well, as I suggested, it would be better to use getline(read1, line1); This reads the entire line (up to the next '\n' or end of file) into the string line1.

Currently, you are using read1 >> line1;. This reads just a single word at a time, and stops at the next space, or '\n', or end of file. That is the cause of your problem.

This program has come a long way since the original testing where the file contained just one letter (A, C, E) per line, to several words per line. With the original data, either version of input would work.

The way I see this at the moment, we are not concerned with any of the individual words, it's just the whole lines which need to be read, compared and output. Things might get more interesting if for example each line contained "Firstname Lastname", if the requirement was to sequence the result in order of "Lastname". That's a slightly different problem. But if you needed to do that, I would still read the entire line in one go, and then use a separate function for comparing one line with another. But I think that is going beyond the scope of this problem.

One other comment, in an earlier post above http://www.cplusplus.com/forum/beginner/85518/#msg461786
the algorithm was heading in the correct direction, and using getline(), but somewhere along the line things seem to get mixed up. I'm not sure what happened, but I hope you are back on track now.
After playing around with the code some more, I was finally able to figure out what I was doing wrong. Thank you so much for your help, Chervil! I really appreciate it!
Topic archived. No new replies allowed.