Same duplicate

Hello!
Can anyone tell me how to find duplicate values in a text file and print the lines that contain them to another text file?

I am talking about values like "AB", "BB", etc. in the last column. If two lines have the same value, the program should print the whole line to the other text file. How can I do that?

The text file looks like this:
J1 B1 1998 AB
asas J33 1997 BB
ff fff 1996 BC
PE L5 1995 AB
ff55 ff 1994 AB

So which of those lines would be printed to another file?

What does "If they are the same" mean? The same regarding what?
If you know there are exactly 4 std::strings in each row, you could loop through the file reading 4 std::strings at each iteration and trying to insert the fourth in a std::set.
Each time std::set::insert() fails, just save those 4 std::strings in your output file.

(If this is NOT an assignment, we can give you a working example)
Like others, I'm having trouble understanding the problem. What should the output be for the input text file you've given?
Each line of the file contains the following information:
(name, surname, date of birth, group)
For example, the file looks like this:

J1 B1 1998 AB
asas J33 1997 BB
ff fff 1996 BC
PE L5 1995 AB
ff55 ff 1994 AB

First, I need to find the group that occurs most often; in this case it's group "AB".
Second, the program should print every line that belongs to group "AB".

The result file would look like this:
J1 B1 1998 AB
PE L5 1995 AB
ff55 ff 1994 AB


Thanks for the help!
Again: is this an assignment?
You'll need a function that takes a line of text as input and returns the group:
// Return the group (4th field) from a line of space-separated fields
string getGroup(const string &line)
{
    ...
}
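One possible body for that function, as a sketch only: it assumes the fields are separated by any amount of whitespace and returns an empty string when a line has fewer than four fields (that convention is my assumption, not part of the stub above).

```cpp
#include <sstream>
#include <string>

// Return the group (4th field) from a line of whitespace-separated fields.
// Assumption: a line with fewer than four fields yields an empty string.
std::string getGroup(const std::string& line)
{
    std::istringstream iss(line);
    std::string field;
    for (int i = 0; i < 4; ++i) {
        if (!(iss >> field)) {
            return {};   // ran out of fields before reaching the 4th
        }
    }
    return field;        // holds the 4th field at this point
}
```

Using operator>> on a std::istringstream skips runs of spaces, so aligned columns with extra padding are handled the same way as single spaces.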


There are three parts to this problem. First, you have to count how many times each group appears. Second, you must select the largest count. Third, you must print the lines whose group has that largest count. The thing that trips people up with a problem like this is that you're looking up groups by value in the first part, then by frequency in the second part.

To do the first part, just use a map.
std::map<std::string, unsigned> freqMap;  // maps string to frequency
for each line {
    string group = getGroup(line);
    freqMap[group]++;
}
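A compilable version of that counting loop might look like the sketch below. It reads from any std::istream; countGroups is a name invented for the example, and the getGroup body is just one assumed implementation of the stub.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Assumed implementation of the getGroup stub: returns the 4th
// whitespace-separated field, or an empty string if there is none.
std::string getGroup(const std::string& line)
{
    std::istringstream iss(line);
    std::string field;
    for (int i = 0; i < 4; ++i) {
        if (!(iss >> field)) { return {}; }
    }
    return field;
}

// Counts how many lines belong to each group.
std::map<std::string, unsigned> countGroups(std::istream& in)
{
    std::map<std::string, unsigned> freqMap;  // maps group to frequency
    for (std::string line; std::getline(in, line); ) {
        const std::string group = getGroup(line);
        if (!group.empty()) {   // skip blank or malformed lines
            ++freqMap[group];   // operator[] creates the entry with 0 on first use
        }
    }
    return freqMap;
}
```

The trick is that operator[] on a std::map value-initializes a missing entry, so a new group starts at 0 before the increment.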


Second, find the group with the highest frequency. Note that this code is untested. I always seem to have trouble iterating through an std::map:
unsigned theMax {0};  // max frequency
string bestGroup;    // group corresponding to theMax
for (const auto & keyVal : freqMap) {
    if (keyVal.second > theMax) {
        theMax = keyVal.second;
        bestGroup = keyVal.first;
    }
}
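If you would rather not write the loop by hand, std::max_element with a comparison on the count should give the same result. This is an alternative sketch, not what the loop above does internally; mostFrequent is a made-up name.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>

// Returns the key with the highest count, or an empty string for an empty map.
// On a tie, the first key in map order wins, because std::max_element
// returns the first of several equally large elements.
std::string mostFrequent(const std::map<std::string, unsigned>& freqMap)
{
    using Entry = std::pair<const std::string, unsigned>;
    const auto it = std::max_element(
        freqMap.begin(), freqMap.end(),
        [](const Entry& a, const Entry& b) { return a.second < b.second; });
    return it == freqMap.end() ? std::string{} : it->first;
}
```

With the counts from the sample file (AB: 3, BB: 1, BC: 1) this returns "AB".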


Third, go back through the lines, printing ones that match the best group:
for (each line) {
    group = getGroup(line);
    if (group == bestGroup) {
        cout << line << '\n';
    }
}


I haven't specified how to go through each line. If the file isn't too big, you can read it into memory and do all the processing there. If the data is expected to be large, you might be better off making two passes through the file.
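For the read-it-all-into-memory variant, the three parts could be put together roughly like this. It is a sketch under the assumption that the file fits in memory; copyDominantGroup is a hypothetical name and getGroup is one assumed implementation of the stub.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Assumed implementation of the getGroup stub (4th whitespace-separated field).
std::string getGroup(const std::string& line)
{
    std::istringstream iss(line);
    std::string field;
    for (int i = 0; i < 4; ++i) {
        if (!(iss >> field)) { return {}; }
    }
    return field;
}

// Writes to `out` the lines of `in` that belong to the most frequent group
// and returns that group (empty if there were no usable lines).
std::string copyDominantGroup(std::istream& in, std::ostream& out)
{
    // Part 1: keep the lines in memory and count each group.
    std::vector<std::string> lines;
    std::map<std::string, unsigned> freqMap;
    for (std::string line; std::getline(in, line); ) {
        const std::string group = getGroup(line);
        if (group.empty()) { continue; }   // skip blank or malformed lines
        lines.push_back(line);
        ++freqMap[group];
    }

    // Part 2: find the group with the highest count.
    unsigned theMax = 0;
    std::string bestGroup;
    for (const auto& keyVal : freqMap) {
        if (keyVal.second > theMax) {
            theMax = keyVal.second;
            bestGroup = keyVal.first;
        }
    }

    // Part 3: print the lines that match the best group, in file order.
    for (const auto& line : lines) {
        if (getGroup(line) == bestGroup) {
            out << line << '\n';
        }
    }
    return bestGroup;
}
```

For the two-pass variant you would drop the vector and instead rewind the std::ifstream between part 1 and part 3, calling clear() and then seekg(0) on it before reading the lines again.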
Again: is this an assignment?
I am just trying to learn on my own.
I think the solution depends on what you want to do with the data once you have read it.
Do you want to perform further operations on it or not?
If so, perhaps you want it stored inside some sort of container; and (again, if so) perhaps the best solution is a std::multimap, where you can use equal_range() to pick out each group.
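To give an idea of the std::multimap route, here is a small sketch. The function names loadByGroup and printGroup are invented for the example, and it assumes four whitespace-separated fields per line with the group as the fourth.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Reads every well-formed line into a multimap keyed by its group (4th field).
std::multimap<std::string, std::string> loadByGroup(std::istream& in)
{
    std::multimap<std::string, std::string> byGroup;
    for (std::string line; std::getline(in, line); ) {
        std::istringstream iss(line);
        std::string name, surname, year, group;
        if (iss >> name >> surname >> year >> group) {
            byGroup.emplace(group, line);   // several lines may share one key
        }
    }
    return byGroup;
}

// Prints all lines stored under `group`.
void printGroup(const std::multimap<std::string, std::string>& byGroup,
                const std::string& group, std::ostream& out)
{
    // equal_range() returns the [first, second) range of entries with that key
    const auto range = byGroup.equal_range(group);
    for (auto it = range.first; it != range.second; ++it) {
        out << it->second << '\n';
    }
}
```

byGroup.count("AB") then gives the size of a group directly, and since C++11 entries with the same key keep their insertion order, so the lines come out in file order.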

Anyway, you could also try something weirder like this:

bambambam.dat:
J1    B1 1998 AB
asas J33 1997 BB
ff   fff 1996 BC
PE    L5 1995 AB
ff55  ff 1994 AB


#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>


// Information in the document file:
// (name, surname, date of birth, group)
struct FourData {
public:
    std::string name;
    std::string surname;
    int birthdate {};
    std::string group;

    FourData() = default;
    explicit FourData(std::string s);

    bool operator< (const FourData& rhs) const;

//friend:
    friend std::ostream& operator << (std::ostream& os, const FourData& rhs);
    friend std::istream& operator >> (std::istream& is, FourData& rhs);
};


FourData::FourData(std::string s)
{
    std::istringstream iss { s };
    iss >> *this;
}


bool FourData::operator< (const FourData& rhs) const
{
    return group < rhs.group;
}


std::istream& operator >> (std::istream& is, FourData& rhs)
{
    is >> rhs.name >> rhs.surname >> rhs.birthdate >> rhs.group;
    return is;
}


std::ostream& operator << (std::ostream& os, const FourData& rhs)
{
    os << rhs.name << ' ' << rhs.surname << ' ' << rhs.birthdate << ' ' << rhs.group;
    return os;
}


using weirdmap = std::map<FourData, std::vector<FourData>>;


weirdmap& extractAll(weirdmap& mymap);
weirdmap& extractDuplicates(weirdmap& mymap);
void printWeirdmap(const weirdmap& mymap);
int printWeirdmapToFile(const weirdmap& mymap, std::ofstream & fout);
std::string findMostFrequent(const weirdmap& mymap);


int main ()
{
    int choice {};
    while (!choice) {
        std::cout << "Would you like to extract all the occurrences of\n"
                     "duplicated lines (first occurrence and the duplicates)\n"
                     "or only the duplicates (not the first occurrence)?\n"
                     "1) all\n2) duplicates only\n >>> ";
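        if(!(std::cin >> choice)) {
            // Non-numeric input or end of input: stop instead of looping forever.
            std::cout << "Invalid input.\n";
            return 1;
        }
        if(choice < 1 || 2 < choice) { choice = 0; }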
    }
    weirdmap mymap;
    switch(choice) {
    case 1:
        extractAll(mymap);
        break;
    case 2:
        extractDuplicates(mymap);
        break;
    default:
        std::cout << "A mysterious error occurred.\n";
        return 0;
    }

    printWeirdmap(mymap);
    std::cout << "The most frequent group is " << findMostFrequent(mymap) << '\n';
}


weirdmap& extractDuplicates(weirdmap& mymap)
{
    std::string finname { "bambambam.dat" };
    std::ifstream inf(finname);
    if(!inf) {
        std::cout << "Cannot open " << finname << '\n';
        return mymap;
    }

    std::string foutname { "bambambam_duplicates.dat" };
    std::ofstream outf(foutname);
    if(!outf) {
        std::cout << "Cannot open " << foutname << '\n';
        return mymap;
    }

    int mycount {};
    for(std::string line; std::getline(inf, line); /**/) {
        FourData fd(line);

        auto myit = mymap.find(fd);
        
        // No duplicates:
        if ( myit == mymap.end()) {
            std::vector<FourData> v { fd };
            mymap.insert ( std::make_pair(fd,v) );
            continue;
        }

        // Duplicate found:
        myit->second.push_back ( fd );
        outf << line << '\n';
        ++mycount;
    }
    std::cout << finname << " has been read and " << mycount
              << " duplicates have been copied into " << foutname << '\n';
    return mymap;
}


weirdmap& extractAll(weirdmap& mymap)
{
    std::string finname { "bambambam.dat" };
    std::ifstream inf(finname);
    if(!inf) {
        std::cout << "Cannot open " << finname << '\n';
        return mymap;
    }

    for(std::string line; std::getline(inf, line); /**/) {
        FourData fd(line);

        auto myit = mymap.find(fd);
        
        if ( myit == mymap.end()) {
            // No duplicates:
            std::vector<FourData> v { fd };
            mymap.insert ( std::make_pair(fd, v) );
        } else {
            // Duplicate found:
            myit->second.push_back ( fd );
        }
    }

    std::string foutname { "bambambam_duplicates.dat" };
    std::ofstream outf(foutname);
    if(!outf) {
        std::cout << "Cannot open " << foutname << '\n';
        return mymap;
    }
    int mycount { printWeirdmapToFile(mymap, outf) };
    std::cout << finname << " has been read and " << mycount
              << " duplicates have been found.\nPlease check "
              << foutname << '\n';
    return mymap;
}


int printWeirdmapToFile(const weirdmap& mymap, std::ofstream & fout)
{
    int mycount {};
    for(const auto& a : mymap) {
        if(a.second.size() > 1) {
            fout << "Elements in the group " << a.first.group << ":\n";
            for(const auto& b : a.second) {
                fout << b << '\n';
                ++mycount;
            }
            fout << '\n';
        }
    }
    return mycount;
}


void printWeirdmap(const weirdmap& mymap)
{
    for(const auto& a : mymap) {
        std::cout << "Elements in the group " << a.first.group << ":\n";
        for(const auto& b : a.second) {
            std::cout << b << '\n';
        }
        std::cout << '\n';
    }
}


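std::string findMostFrequent(const weirdmap& mymap)
{
    // Key the temporary map by group size, so the largest group ends up last.
    // If two groups have the same size, the one inserted first is kept.
    std::map<unsigned, std::string> tmpmap;
    for(const auto& e : mymap) {
        tmpmap.insert(std::make_pair(e.second.size(), e.first.group));
    }
    // An empty map (e.g. the input file could not be opened) has no last element.
    if(tmpmap.empty()) { return {}; }
    return tmpmap.crbegin()->second;
}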

Thank you! :)