Trying to find the difference between two vectors



I have two files, a 'raw data' txt file and an 'excel' txt file. I want to compare all the unique IDs from the raw data file against the excel file and see which IDs are missing from the excel file.

At first I tried using two for loops; however, when I ran it, it gave me everything from the first file, when all I want is the difference between the two files, i.e. what is 'missing' from the second file. I then tried std::set_difference, but it gave me the same result.

The real file for my project is very long, so I wrote my own short test files to break the problem down and test it. This was the example I used:

test:
=frog
=rale
=love
=plat

test2:
rale
plat

That means the output should be 'frog' and 'love' because they don't appear in the second test file. But instead, it outputs "frog, love, plat, rale". I'm not sure what I'm doing wrong. Thank you so much for all the help thus far.
#include <iostream>
#include <algorithm>
#include <vector>
#include <fstream>
#include <string>
#include <iterator>
#include <cctype>

using namespace std;

int main()
{
    vector<string> rawList;
    vector<string> excelList;
    vector<string> missing;

    ifstream infile;
    ifstream excel;
    infile.open("test.txt");
    excel.open("test2.txt");
    if (infile.fail() || excel.fail())
    {
        cout << "Could not open file." << "\n";
        return 1;
    }
    string data;
    string excel2;

    // This gets each line from the txt file and converts it to lowercase.
    // Then it finds where the = sign is and takes the next four
    // characters, which are the unique ID.
    while (getline(infile, data))
    {
        transform(data.begin(), data.end(), data.begin(),
                  [](unsigned char c) { return tolower(c); });
        size_t pos;

        if ((pos = data.find('=', 0)) != string::npos)
        {
            rawList.push_back(data.substr(pos + 1, 4));
        }
    }

    // This takes the first four characters of each line and pushes them into the vector,
    // since the unique ID in this file is at the beginning of each line.
    while (getline(excel, excel2))
    {
        transform(excel2.begin(), excel2.end(), excel2.begin(),
                  [](unsigned char c) { return tolower(c); });
        size_t pos = excel2.find(" ");
        if (pos != std::string::npos)
        {
            excelList.push_back(excel2.substr(0, 4));
        }
    }

    sort(rawList.begin(), rawList.end());
    auto last = unique(rawList.begin(), rawList.end());
    rawList.erase(last, rawList.end());
    // for (const auto& i : rawList)
    //     cout << i << " " << endl;

    sort(excelList.begin(), excelList.end());
    auto opp = unique(excelList.begin(), excelList.end());
    excelList.erase(opp, excelList.end());
    // for (const auto& i : excelList)
    //     cout << i << " " << endl;
    // cout << "\n";

    // This is to compare the two txt files and see which values are missing from the second file.

    /* string found;
    for (int i = 0; i < sizeOfrawList; i++)
    {
        string found = " ";
        for (int k = 0; k < sizeOfexcelList; k++)
        {
            if (rawList.at(i) == excelList.at(k))
            {
                break;
            }
            if (rawList.at(i) != excelList.at(k))
            {
                found = rawList.at(i);
            }
        }
        //cout << rawList.at(i) << endl;
        missing.push_back(found);
    }
    */

    std::set_difference(rawList.begin(), rawList.end(), excelList.begin(), excelList.end(),
                        std::inserter(missing, missing.begin()));

    for (const auto& i : missing)
    {
        cout << i << " " << endl;
        cout << "\n";
    }
}
I spent a few minutes debugging this for you, and I see that these lines:

//size_t pos = excel2.find(" ");
//if(pos != std::string::npos)

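ensure that excelList stays empty, which means that everything in the other list is indeed different. The lines in your test2.txt sample contain no space, so find(" ") returns npos every time and nothing is ever pushed into excelList.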
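To illustrate, here is a rough sketch of extracting the ID without depending on a space being present. The helper name excelIdFromLine and the "skip lines shorter than four characters" rule are my own assumptions, not something from your code; the four-character ID at the start of the line is taken from your comments.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not in the original post): lowercases the line and
// returns its first four characters. Assumption: a line shorter than four
// characters cannot hold an ID, so an empty string is returned for it.
std::string excelIdFromLine(std::string line)
{
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (line.size() < 4)
        return "";              // caller should skip empty results
    return line.substr(0, 4);   // works whether or not a space follows the ID
}
```

In the read loop you would then push the result only when it is not empty, instead of testing find(" ") against npos.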

If you take those lines out, excelList is no longer empty, but I am not sure whether removing them breaks anything else.
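For what it's worth, the sort / unique / set_difference part looks fine once both lists actually have data in them. A minimal sketch with the IDs held in memory instead of read from files (missingIds is a made-up name for illustration, and I use back_inserter here rather than your inserter, which should behave the same for an empty vector):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the IDs present in `raw` but absent from `excel`.
// Both vectors are taken by value so they can be sorted and de-duplicated here;
// std::set_difference requires both input ranges to be sorted.
std::vector<std::string> missingIds(std::vector<std::string> raw,
                                    std::vector<std::string> excel)
{
    auto sortUnique = [](std::vector<std::string>& v) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    };
    sortUnique(raw);
    sortUnique(excel);

    std::vector<std::string> missing;
    std::set_difference(raw.begin(), raw.end(),
                        excel.begin(), excel.end(),
                        std::back_inserter(missing));
    return missing;
}
```

Calling missingIds({"frog", "rale", "love", "plat"}, {"rale", "plat"}) gives back frog and love, which matches what you expected from your test files.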

Yep, that was it! Thank you so much.