Occurrence of word pairs

Sorry, I need to count how many times each word pair occurs in my .txt and order the pairs by number of occurrences.

	map<string, int> freq;
	ifstream in("termosPares_small.txt");

	for (string s; in >> s;)
		freq[s]++;                             // frequency table

	vector<PR> pr;
	for (auto e : freq)
		pr.push_back( { e.first, e.second });   // put into data/count container

	sort(pr.begin(), pr.end(), []( PR a, PR b ) {return a.count > b.count;}); // sort descending

	ofstream print("termosPares_small_ordenado.txt");
	if (!print.is_open()) {
		cout << "Falha ao abrir o arquivo" << endl;   // "Failed to open the file"
	} else {
		for (auto p : pr) {
			print << p.data /*<< '\t' << p.count*/<< '\n';             // output
		}
	}

This code only prints single words:

rice 2
nice 2
word 1
work 1
.
.

I need it to print pairs, because my .txt is arranged in word pairs. The output should look like this:

rice nice        2 //Occurred 2 times;
great bad      1 //Once;
work word     1 //Once

jonnin (11325)

How is this different from the other post asking the same thing?

Just use the pair as the map key and increment an int counter each time you run across it.
It's exactly the same as counting word by word, except that instead of 1 word you have 2.
If the order of the 2 words does not matter, sort them alphabetically before making the key. If it does matter, just lump them together as-is.
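
A minimal sketch of that suggestion, assuming the words can be read two at a time. Instead of gluing the two words into one string, it uses a std::pair as the map key. The helper name countPairKeys and its ignoreOrder flag are hypothetical, added only to show the "sort them alphabetically first" option.

```cpp
#include <map>
#include <sstream>   // std::istream (std::istringstream is handy for testing)
#include <string>
#include <utility>   // std::pair, std::swap

// Hypothetical helper: reads the input two words at a time and counts each
// pair, using std::pair<std::string, std::string> as the map key.
// With ignoreOrder == true, "def abc" and "abc def" count as the same pair,
// because the two words are put in alphabetical order before the lookup.
std::map<std::pair<std::string, std::string>, int>
countPairKeys(std::istream& in, bool ignoreOrder)
{
    std::map<std::pair<std::string, std::string>, int> freq;
    std::string a, b;
    while (in >> a >> b) {          // false as soon as either read fails
        if (ignoreOrder && b < a)
            std::swap(a, b);        // normalise to alphabetical order
        ++freq[{a, b}];             // a missing key starts at 0
    }
    return freq;
}
```

With std::ifstream in("termosPares_small.txt"); the call would be countPairKeys(in, false). For each entry, first.first and first.second are the two words and second is the count.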

victorio (59)

How do I write it?

jonnin (11325)

Give it a try. It's very much like what you already wrote and posted. If you get stuck, we can take a look when you post your attempt.

victorio (59)

OK. I still don't have a solution, and I get this error: "could not convert '(((void)std::operator>><char, std::char_traits<char>, std::allocator<char> >(in.std::basic_ifstream<char>::<anonymous>, s)), k)' from 'std::__cxx11::string {aka std::__cxx11::basic_string<char>}' to 'bool'".

struct PR {
	string data;
	int count;
};

int main() {
	map<string, string> freq;
	ifstream in("termosPares_small.txt");

	for (string s, k; in >> s, k;) {
		string q=s+k;
		freq[q]++;                             // frequency table
	}
		vector<PR> pr;
		for (auto e : freq)
			pr.push_back( { e.first, e.second }); // put into data/count container

		sort(pr.begin(), pr.end(),
				[]( PR a, PR b ) {return a.count > b.count;}); // sort descending

		ofstream print("termosPares_small_ordenado.txt");
		if (!print.is_open()) {
			cout << "Falha ao abrir o arquivo" << endl;   // "Failed to open the file"
		} else {
			for (auto p : pr) {
				print << p.data /*<< '\t' << p.count*/<< '\n';         // output
			}
		}
		print.close();
		cout << "ARQUIVOS SALVOS" << endl;   // "FILES SAVED"
		return 0;
	}
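
For reference, the error comes from the loop condition: in in >> s, k the comma operator performs in >> s, discards that result and then uses the string k as the condition, and a string cannot be converted to bool. There is also a second problem: freq is a map<string, string>, and a string cannot be incremented. A sketch of a corrected version of the same logic, assuming each line holds exactly two words: chain the reads as in >> s >> k, count in a map<string, int>, and keep a space between the two words. The function name sortedPairs is made up for this example.

```cpp
#include <algorithm>
#include <map>
#include <sstream>   // std::istream (std::istringstream is handy for testing)
#include <string>
#include <vector>

struct PR {
    std::string data;
    int count;
};

// Reads the input two words at a time and returns the pairs sorted by
// descending count. The condition `in >> s >> k` is true only when BOTH
// reads succeed; `in >> s, k` would instead test the string k itself.
std::vector<PR> sortedPairs(std::istream& in)
{
    std::map<std::string, int> freq;           // the count must be an int
    for (std::string s, k; in >> s >> k;)
        ++freq[s + " " + k];                    // keep a space between the words

    std::vector<PR> pr;
    for (const auto& e : freq)
        pr.push_back({e.first, e.second});

    std::sort(pr.begin(), pr.end(),
              [](const PR& a, const PR& b) { return a.count > b.count; });
    return pr;
}
```

Writing p.data << '\t' << p.count for each element would then give the "rice nice 2" layout asked for at the start.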

jonnin (11325)

This is about the simplest small example I can make of it.

#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

int main()
{
  ifstream ifs("x5"); //whatever file name.  I assume you have a line with 2 words in it per line. 
  string a,b, tmp;
  int i;
  unordered_map<string, int> freq;
  while( ifs >> a) //read the first word of a line
  {
	ifs >> b;  //trust the next word is the second on this line.  
	tmp = a+" "+b;  //lump the words together as the key. 
	i = freq[tmp];  //get the current count of it.  this is zero if none. 
	freq[tmp] = i+1; //add or increment existing entry
  }
  for ( auto it = freq.begin(); it != freq.end(); ++it )  //print it out, not ordered. 
	  cout << it->first <<" " << it->second << endl;	 //yeah, it's gibberish, but when in Rome..
}


Input file:
abc def
def abc
lol xyz
wrd ctr
abc def
jlk 2k+
abc def


Output:
C:\c>a
jlk 2k+ 1
abc def 3
def abc 1
wrd ctr 1
lol xyz 1

I'll leave it to you from here to sort the output or do whatever else you need with it, but that is the gist of what you need.
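
One possible way to do that sorting step, sketched under the assumption that freq is the unordered_map<string, int> built above: copy the entries into a vector and sort by count, breaking ties alphabetically so the order is reproducible from run to run. The name byCountDescending is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Copies the counts out of the (unordered) hash map into a vector and sorts
// it: highest count first, and alphabetically among equal counts so the
// output is deterministic.
std::vector<std::pair<std::string, int>>
byCountDescending(const std::unordered_map<std::string, int>& freq)
{
    std::vector<std::pair<std::string, int>> v(freq.begin(), freq.end());
    std::sort(v.begin(), v.end(),
              [](const auto& a, const auto& b) {
                  if (a.second != b.second)
                      return a.second > b.second;   // more occurrences first
                  return a.first < b.first;         // tie-break: A..Z
              });
    return v;
}
```

For the sample input above, this would list "abc def 3" first, then the four pairs seen once in alphabetical order.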

victorio (59)

There is a problem. My .txt:

pt arroz
pt em
em o
o governo
governo publicada
publicada hoje
hoje revela
revela um
um dado
.
.
.

My result:

rabisco rabisco 1
has has 1
porangaba porangaba 1
orindiuva orindiuva 1
ms-windows ms-windows 1
consumiriam consumiriam 1
kassos kassos 1
recitando recitando 1
nemestrina nemestrina 1
.
.
.

#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <vector>
#include <algorithm>
#include <string>
#include <unordered_map>
using namespace std;


int main()
{
  ifstream ifs("termosPares_small.txt"); //whatever file name.  I assume you have a line with 2 words in it per line.
  string a,b, tmp;
  int i;
  unordered_map<string, int> freq;
  while( ifs >> a) //read the first word of a line
  {
	ifs >> b;  //trust the next word is the second on this line.
	tmp = a+" "+b;  //lump the words together as the key.
	i = freq[tmp];  //get the current count of it.  this is zero if none.
	freq[tmp] = i+1; //add or increment existing entry
  }
  ofstream print("termosPares_small_ordenado.txt");
  if (!print.is_open()) {
	cout << "Falha ao abrir o arquivo" << endl;   // "Failed to open the file"
  } else {
	for ( auto it = freq.begin(); it != freq.end(); ++it )  //print it out, not ordered.
	  print << it->first <<" " << it->second << endl;	 //yea its gibberish but when in rome..
  }
  print.close();
  cout << "ARQUIVOS SALVOS" << endl;   // "FILES SAVED"
}

jonnin (11325)

Check that ifs opened properly. If it did, print tmp on every pass through the loop.
If that matches, try it on a very small file. The output is not in order, so are the pairs it listed actually IN the file, further down?
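
A sketch of that check (tracePairLines is a hypothetical name). It reads line by line instead of word by word, so it can flag any line that does not hold exactly two words. That matters for a file like the one shown, where each line's second word starts the next line: if a single line has one word or three, reading two words at a time drifts out of step, and from then on every key pairs a line's last word with the next line's first word, i.e. the same word twice. That would be consistent with output such as "rabisco rabisco 1", but it is only a guess until the file is actually checked.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Debug sketch: reads the input line by line and reports every key that
// would be built, flagging any line that does not contain exactly two words.
// Returns the number of malformed lines, or -1 if the stream is not usable
// (for example, the file failed to open).
int tracePairLines(std::istream& in, std::ostream& report)
{
    if (!in) {
        report << "input stream is not open\n";
        return -1;
    }
    int bad = 0, lineNo = 0;
    for (std::string line; std::getline(in, line);) {
        ++lineNo;
        std::istringstream words(line);
        std::string a, b, extra;
        if (words >> a >> b && !(words >> extra)) {
            report << lineNo << ": [" << a << ' ' << b << "]\n";
        } else {
            report << lineNo << ": NOT TWO WORDS: [" << line << "]\n";
            ++bad;
        }
    }
    return bad;
}
```

Usage would be along the lines of std::ifstream ifs("termosPares_small.txt"); tracePairLines(ifs, std::cout); and a non-zero result points at the lines worth looking at.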

victorio (59)

Indeed, with a small file it runs normally, but the occurrences come out in ascending order. Any idea for a big .txt?

jonnin (11325)

If you don't like the order, you can reorder it yourself: either as you did above, with a struct and a sort, or by copying the map into a vector, or maybe by trying it with a std::set, or something along those lines.
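
A sketch of the "something along those lines" option, swapping in std::multimap (which the post does not mention): turn the table around so the count becomes the key, and let std::greater<int> put the highest count first. invertByCount is a hypothetical name, and freq is assumed to be the unordered_map<string, int> built earlier.

```cpp
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

// Inverts the word-pair -> count table into count -> word-pair. Because the
// multimap is keyed on the count and ordered with std::greater<int>,
// iterating it visits the most frequent pairs first; pairs with equal counts
// simply share a key.
std::multimap<int, std::string, std::greater<int>>
invertByCount(const std::unordered_map<std::string, int>& freq)
{
    std::multimap<int, std::string, std::greater<int>> byCount;
    for (const auto& e : freq)
        byCount.emplace(e.second, e.first);
    return byCount;
}
```

Pairs with the same count come out in whatever order the unordered_map handed them over. If that matters, build freq as a std::map instead (and adjust the parameter type): a multimap inserts equal keys after the existing ones, so ties would then stay alphabetical.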

Here is another quick example that orders the output while leaving the map itself untouched. You can adjust the magic number to whatever maximum number of repeats you expect, or just use push_back there. I find maps, sets, etc. a bit clunky; they always seem to almost do what I want but never quite exactly.

  vector<vector<string> > vs(1000); //adjust this 1000 here and in the for loop below
  for ( auto it = freq.begin(); it != freq.end(); ++it )
	  vs[it->second].push_back(it->first); //bucket index = count, so a count of 1000 or more would be out of range
  
  for(int i = 999; i>0; i--) //walk the buckets from the highest count down
  {
	if(vs[i].size())
	{
		for(size_t j = 0; j< vs[i].size(); j++)
			cout << vs[i][j] << " " << i << endl;
	}
  }
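
The bucket idea above can also be sketched without the magic number, by sizing the buckets from the largest count so that no count can index past the end. bucketByCount is a hypothetical name, and freq is assumed to be the unordered_map<string, int> from earlier.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Bucket ordering without a fixed size: the number of buckets comes from the
// largest count, so every count is a valid index.
std::vector<std::pair<std::string, int>>
bucketByCount(const std::unordered_map<std::string, int>& freq)
{
    int maxCount = 0;
    for (const auto& e : freq)
        maxCount = std::max(maxCount, e.second);

    std::vector<std::vector<std::string>> buckets(maxCount + 1);
    for (const auto& e : freq)
        buckets[e.second].push_back(e.first);   // bucket index = count

    std::vector<std::pair<std::string, int>> out;
    for (int c = maxCount; c > 0; --c)            // highest count first
        for (const auto& key : buckets[c])
            out.emplace_back(key, c);
    return out;
}
```

Within one bucket the pairs keep the unordered_map's iteration order, just as in the version above.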

victorio (59)

There is a similar problem: it prints pairs made of the same word twice.

int main()
{
  ifstream ifs("termosPares_small.txt"); //whatever file name.  I assume you have a line with 2 words in it per line.
  string a,b, tmp;
  int i;
  unordered_map<string, int> freq;
  while( ifs >> a) //read the first word of a line
  {
	ifs >> b;  //trust the next word is the second on this line.
	tmp = a+" "+b;  //lump the words together as the key.
	i = freq[tmp];  //get the current count of it.  this is zero if none.
	freq[tmp] = i+1; //add or increment existing entry
  }
  ofstream print("termosPares_small_ordenado.txt");
  for ( auto it = freq.begin(); it != freq.end(); ++it )  //print it out, not ordered.
	  print << it->first <<" " << it->second << endl;	 //yea its gibberish but when in rome..
  print.close();
  return 0;
}

Printed:

rabisco rabisco 1
has has 1
porangaba porangaba 1
orindiuva orindiuva 1
ms-windows ms-windows 1
consumiriam consumiriam 1
kassos kassos 1

dhayden (5795)

You indicated that your text file has one pair of words on each line. If that's true, then all you have to do is change line 4 of your original program to:
for (string s; getline(in,s);)
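
A sketch of that getline change, assuming one pair per line. It adds one precaution that is not in the post: the key is rebuilt from the line's words, so double spaces, tabs, or a trailing \r from a file saved with Windows line endings do not split one pair into several keys. countLines is a hypothetical name.

```cpp
#include <map>
#include <sstream>
#include <string>

// Uses each whole line as the key, but rebuilds it from the line's words so
// that extra spaces, tabs or a trailing '\r' do not turn the same pair into
// different keys. Blank lines are skipped.
std::map<std::string, int> countLines(std::istream& in)
{
    std::map<std::string, int> freq;
    for (std::string line; std::getline(in, line);) {
        std::istringstream words(line);
        std::string key, w;
        while (words >> w)
            key += (key.empty() ? "" : " ") + w;   // single space between words
        if (!key.empty())
            ++freq[key];
    }
    return freq;
}
```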

If your goal is to read free-form text and count pairs, then you can't read words two at a time. After all, the words "one two three four" contain 3 pairs (one two, two three, three four), not 2. To handle free-form text, change the first few lines of your program to:

    string prev;

    cin >> prev;
    for (string s; cin >> s;) {
        string key = prev + ' ' + s;
        freq[key]++;                             // frequency table
        prev = s;
    }
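
The same sliding-window loop, wrapped in a function that takes any std::istream so it can be fed std::cin, a std::ifstream or a string (the name countAdjacentPairs is hypothetical). It also checks that the first read succeeded, so empty input simply yields an empty table.

```cpp
#include <map>
#include <sstream>   // std::istream (std::istringstream is handy for testing)
#include <string>

// Counts overlapping pairs of consecutive words: "one two three four" gives
// "one two", "two three" and "three four".
std::map<std::string, int> countAdjacentPairs(std::istream& in)
{
    std::map<std::string, int> freq;
    std::string prev;
    if (!(in >> prev))                 // empty input: no pairs at all
        return freq;
    for (std::string s; in >> s;) {
        ++freq[prev + ' ' + s];
        prev = s;                      // slide the window by one word
    }
    return freq;
}
```

For the file in this thread that would be std::ifstream ifs("termosPares_small.txt"); auto freq = countAdjacentPairs(ifs);. Note that on a file that already lists one pair per line, this also counts the pairs spanning two lines (for example "arroz pt"), so it suits free-form text rather than the pre-paired file.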

When I make that change and feed it the opening paragraph of Moby Dick:

Call me Ishmael. Some years ago—never mind how long precisely—having
little or no money in my purse, and nothing particular to interest me
on shore, I thought I would sail about a little and see the watery
part of the world. It is a way I have of driving off the spleen and
regulating the circulation. Whenever I find myself growing grim about
the mouth; whenever it is a damp, drizzly November in my soul;
whenever I find myself involuntarily pausing before coffin warehouses,
and bringing up the rear of every funeral I meet; and especially
whenever my hypos get such an upper hand of me, that it requires a
strong moral principle to prevent me from deliberately stepping into
the street, and methodically knocking people’s hats off—then, I
account it high time to get to sea as soon as I can. This is my
substitute for pistol and ball. With a philosophical flourish Cato
throws himself upon his sword; I quietly take to the ship. There is
nothing surprising in this. If they but knew it, almost all men in
their degree, some time or other, cherish very nearly the same
feelings towards the ocean with me.

I get this output:

in my   2
I find  2
is a    2
find myself     2
quietly take    1
part of 1
particular to   1
pausing before  1
people’s hats   1
philosophical flourish  1

...

victorio (59)

OK, I changed it to this:

int main()
{
  ifstream ifs("termosPares_small.txt"); //whatever file name.  I assume you have a line with 2 words in it per line.
  //string a,b, tmp;
  for (string s; getline(ifs,s);)
  int i;
  unordered_map<string, int> freq;
  string prev;

  cin >> prev;
  for (string s; cin >> s;) {
      string key = prev + ' ' + s;
      freq[key]++;                             // frequency table
      prev = s;
  }
//  while( ifs >> a) //read the first word of a line
//  {
//	ifs >> b;  //trust the next word is the second on this line.
//	tmp = a+" "+b;  //lump the words together as the key.
//	i = freq[tmp];  //get the current count of it.  this is zero if none.
//	freq[tmp] = i+1; //add or increment existing entry
//  }
  ofstream print("termosPares_small_ordenado.txt");
  for ( auto it = freq.begin(); it != freq.end(); ++it )  //print it out, not ordered.
	  print << it->first <<" " << it->second << endl;	 //yea its gibberish but when in rome..
  print.close();
  return 0;
}

This program never stops running, and the .txt is empty.

victorio (59)

Great! Now it's running.

struct PR
{
   string data;
   int count;
};


int main()
{
   map<string,int> freq;
   ifstream ifs( "termosPares_small.txt" );

   for ( string s; getline(ifs,s); ) freq[s]++;                             // frequency table

   vector<PR> pr;
   for ( auto e : freq ) pr.push_back( { e.first, e.second } );      // put into data/count container

   sort( pr.begin(), pr.end(), []( PR a, PR b ){ return a.count > b.count; } );     // sort descending
   ofstream print("termosPares_small_ordenado.txt");
   for ( auto p : pr ) print << p.data /*<< '\t' << p.count */<< '\n';   // output
   print.close();
   return 0;
}
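
A possible refinement of the working version above, sketched as a function (toSortedVector is a hypothetical name). Because std::map hands its entries over in alphabetical order, using std::stable_sort instead of std::sort keeps pairs with equal counts in that alphabetical order; std::sort makes no such guarantee. Taking each entry by const& also avoids copying every string.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct PR {
    std::string data;
    int count;
};

// Moves the counts into a vector and orders it by descending count; among
// equal counts the map's alphabetical order is preserved by stable_sort.
std::vector<PR> toSortedVector(const std::map<std::string, int>& freq)
{
    std::vector<PR> pr;
    pr.reserve(freq.size());
    for (const auto& e : freq)                 // const& avoids copying each string
        pr.push_back({e.first, e.second});
    std::stable_sort(pr.begin(), pr.end(),
                     [](const PR& a, const PR& b) { return a.count > b.count; });
    return pr;
}
```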

Enoizat (1342)

A std::map keeps its entries sorted by key, either ascending or descending. Ascending order (std::less) is just the default, but that behaviour can easily be changed by passing a different comparator, such as std::greater.

Assuming your termosPares_small.txt is:

pt arroz
pt em
em o
o governo
governo publicada
publicada hoje
hoje revela
revela um
um dado

This code:

#include <fstream>
#include <functional>
#include <iostream>
#include <map>
#include <string>


int main()
{
   std::ifstream ifs( "termosPares_small.txt" );
   if (!ifs) {
       std::cerr << "Cannot open 'termosPares_small.txt' for reading.\nExiting now.\n";
       return 1;
   }

   std::map < std::string, int, std::greater<std::string> > freq;
   for ( std::string s; std::getline(ifs, s); /**/) {
       ++freq[s];
   }

   std::ofstream print("termosPares_small_ordenado.txt");
   if (!print) {
       std::cerr << "Cannot open 'termosPares_small_ordenado.txt' for "
                    "writing.\nExiting now.\n";
       return 1;
   }

   for ( const auto& e : freq ) {
       print << e.first << '\t' << e.second << '\n';
   }

   return 0;
}

outputs it in descending key order, i.e. reverse alphabetical by pair rather than by count (termosPares_small_ordenado.txt):

um dado	1
revela um	1
publicada hoje	1
pt em	1
pt arroz	1
o governo	1
hoje revela	1
governo publicada	1
em o	1
