Help finding most frequent co-occurring words?

Hi. I am writing pseudocode as an exercise. Given an article, I need to find the most frequently co-occurring word pair. How would I go about this? Co-occurring words are any two words that appear together in the same sentence. My method is to find all possible word pairs and add them to a vector. I am thinking of using a struct so I can store word1, word2, and frequency in the vector. I can then easily find the most frequent co-occurring pair in the vector. First of all, is this a good approach? Second, how can I find all possible word pairs in a sentence?
Problem 1: given an article, find the most frequently co-occurring word-pair.
	   The function sentenceSplitter(article) is accessible. 

Output: the most frequent co-occurring words.

Algorithm 1:
//struct for one word pair and how often it occurs
struct wordPair{
   string word1;
   string word2;
   int count;
};

vector<wordPair> pairs; //all word pairs seen so far

//split the article into sentences
sentences = sentenceSplitter(article);

for(each sentence in sentences){
   split sentence into words;
   //(a, b) and (b, a) count as the same pair
   for(each possible pair (a, b) of words in sentence){
	if(pair (a, b) is already in pairs){
		//seen before: increase its count by 1
		count of that pair += 1;
	}
	else{
		//first occurrence: store it with a count of 1
		pairs.push_back({a, b, 1});
	}
   }
}
print getMostFrequentPair(); //called once, after all sentences are processed

wordPair getMostFrequentPair(){
	//iterate through the vector to find the highest frequency
	best = pairs[0];
	for(each pair p in pairs){
	   if(p.count > best.count)
		best = p;
	}
	return best;
}
I would use a std::map<std::set<std::string>, std::size_t>. Each inner set holds the two words of a pair; the only reason they are sets is that a set keeps its elements sorted, so (a, b) and (b, a) end up as the same key and duplicates are removed. The outer map maps each word pair to its number of occurrences.
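To make that concrete, here is a minimal sketch of the map-of-sets idea. It is only one way to do it, under a few assumptions of mine: sentences arrive as plain strings (I am guessing that is what sentenceSplitter gives you), words are split on whitespace only with no punctuation or case handling, and a pair is counted once per sentence even if a word repeats. The names WordPair, PairCounts, distinctWords, countPairs and mostFrequentPair are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>

using WordPair = std::set<std::string>;              // two distinct words, kept sorted
using PairCounts = std::map<WordPair, std::size_t>;  // pair -> number of sentences containing it

// Hypothetical helper: the distinct whitespace-separated words of one sentence.
// Assumption: no punctuation stripping and no case folding.
std::set<std::string> distinctWords(const std::string& sentence) {
    std::istringstream in(sentence);
    std::set<std::string> words;
    for (std::string w; in >> w;) words.insert(w);
    return words;
}

// Adds every unordered pair of distinct words in the sentence to the counts.
void countPairs(const std::string& sentence, PairCounts& counts) {
    const std::set<std::string> words = distinctWords(sentence);
    for (auto i = words.begin(); i != words.end(); ++i) {
        for (auto j = std::next(i); j != words.end(); ++j) {
            const WordPair pair{*i, *j};
            // Two identical words would collapse into a one-element set;
            // iterating over distinct words rules that out.
            assert(pair.size() == 2);
            ++counts[pair];  // operator[] creates the entry with 0 on first use
        }
    }
}

// Returns the pair with the highest count, or an empty set if there are no pairs.
// On a tie, the pair that comes first in the map's ordering wins.
WordPair mostFrequentPair(const PairCounts& counts) {
    WordPair best;
    std::size_t bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            bestCount = entry.second;
            best = entry.first;
        }
    }
    return best;
}
```

You would call countPairs once for each sentence and then mostFrequentPair once at the end. The nested loop is also the answer to "how do I find all possible pairs": for each word, pair it with every word that comes after it. Compared with searching a vector of structs for every pair, the map lookup is logarithmic in the number of stored pairs rather than linear.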