boost regex

I want to read in an article from a text file, no shorter than 500 words. Using boost regex, I want to locate and tag locations, like <Houston>, and then print out the article with the locations tagged. I can use a list of common locations, but I have to look at the words before and after the locations to find uncommon locations. So cities and states would be on the list, but locations like "store" would be uncommon. I have never used boost regex. Any ideas at all? The program doesn't have to be perfect, it just needs a high percentage of correctness.
Is it an XML file? If so, then why not just use an XML library?
It's a text file copied from the net. It's a random file. I've got it to identify a word and highlight it, but only because I've tried the whole word, in this case California. At least I know regex search and format work. I was wondering, can regex search also use a list to search? Say I have a list of all the states; it would tag those.
So you want to search a list of words in a string, and you want "California" to match "California", but not "Californian". Is that right?
Yes, exactly. At the same time, I cannot rely solely on the list. I have to try to see if there is a way to identify locations by the words that precede them as well as the words that come after them. That's a more daunting task, but for now, searching a list to check if any of the words in it match a text file would be good. Thanks for helping.
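On the question of searching with a list: one way is to join the list into a single alternation and wrap it in word boundaries, so "California" matches but "Californian" does not. Below is a minimal sketch of that idea, not a definitive implementation. It uses std::regex from the standard library so it compiles without Boost; Boost.Regex provides the analogous boost::regex and boost::regex_replace, though I have not tested this exact code against it. The names escape_regex and tag_locations are made up for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Escape characters that are special in ECMAScript regex syntax,
// so that a name such as "St. Louis" is matched literally.
std::string escape_regex(const std::string& s) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: wrap every whole-word occurrence of a listed name in <...>.
std::string tag_locations(const std::string& text,
                          const std::vector<std::string>& names) {
    if (names.empty()) return text;  // nothing to search for
    // Build \b(?:name1|name2|...)\b from the list.
    std::string pattern = "\\b(?:";
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) pattern += '|';
        pattern += escape_regex(names[i]);
    }
    pattern += ")\\b";
    const std::regex re(pattern);
    // "$&" in the replacement format stands for the whole match.
    return std::regex_replace(text, re, "<$&>");
}
```

Calling tag_locations(article, states) with the article text and the list of states returns a copy of the article with each listed state wrapped in angle brackets. The word boundary \b treats letters, digits, and underscore as word characters, which may or may not be the definition of "word" you want.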

Here's what I'd do:
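//std::vector<std::string> list; // words to search for (assumed to be non-empty strings)
//std::string file;              // full text of the article
std::vector<std::pair<size_t,size_t> > matches; // inclusive [first,second] offsets of each whole-word match
for (size_t a=0;a<list.size();a++){
	size_t first=0,
		second;
	do {
		first=file.find(list[a],first);
		if (first==file.npos)
			break;
		second=first+list[a].size()-1;
		// Keep the match only if the characters just before and just after it are not part of a word.
		if (!(first>0 && is_part_of_word(file[first-1])) && !(second+1<file.size() && is_part_of_word(file[second+1])))
			matches.push_back(std::make_pair(first,second));
		first=second+1;
	} while (first<file.size());
}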
You'll have to define is_part_of_word() based on where you consider a word to end. For example, is "3foo" a single word? Is "#foo"? What about "afoo"?
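As a starting point, here is one possible definition of is_part_of_word(). It is only a sketch under the assumption that letters, digits, and underscore belong to a word and everything else ends one; adjust it to whatever boundary rule fits your articles.

```cpp
#include <cctype>

// Assumed rule: letters, digits, and '_' are part of a word; anything else is a boundary.
bool is_part_of_word(char c) {
    // The cast avoids undefined behavior when char is signed and c is negative.
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}
```

With this rule, "3foo" and "afoo" each count as a single word, so "foo" inside them would not be tagged, while "#foo" is a '#' followed by the word "foo", so it would be.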