Questions about Boost::Regex

Hi everyone!

My goal is to extract specific parts of a larger text.
E.g., to get all the links in an HTML page.

In a CLR Windows Forms application (.NET), it would look something like this:

using namespace System::Text::RegularExpressions;
...
Regex^ re = gcnew Regex( "<a href=\"([^\"]*)\"" ); //a managed handle must be created with gcnew
MatchCollection^ matches = re->Matches(fulls); //fulls contains html page source
for each (Match^ match in matches){
  MessageBox::Show(match->Groups[1]->Value);
}


How can I do the same thing in MFC using Boost::regex?

I think I can use regex_token_iterator for this, but how to get results in CString using regex_token_iterator?

Any help would be greatly appreciated.
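For what it's worth, here is a minimal sketch of the token-iterator idea. It uses the standard library's std::wsregex_token_iterator as a stand-in so that it compiles without Boost or MFC; as far as I know boost::regex_token_iterator takes the same arguments. The function name extract_links is made up for illustration. Each token is a sub_match, so in MFC code a CString could presumably be built from its first pointer and length().

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the first capture group (the href value) of every match.
// std::wsregex_token_iterator stands in for the Boost token iterator here.
std::vector<std::wstring> extract_links(const std::wstring& html)
{
    static const std::wregex re(L"<a href=\"([^\"]*)\"");
    std::vector<std::wstring> links;
    // The trailing 1 selects sub-expression 1 instead of the whole match.
    std::wsregex_token_iterator it(html.begin(), html.end(), re, 1), end;
    for (; it != end; ++it)
        links.push_back(it->str());
    return links;
}
```

Passing 1 as the last constructor argument is what corresponds to Groups[1] in the .NET version; passing 0 would yield the whole match instead.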
I found a solution (though I don't think it's the best one):

CString fulls=html;//the full page source
boost::tregex re(L"<a href=\"([^\"]*)\"");
boost::tregex_iterator begin(fulls.GetString(),fulls.GetString()+fulls.GetLength(),re), end;
CString sub;
for (;begin!=end;++begin){
	boost::tmatch const &what = *begin;
	sub=CString(what[0].first,what[0].length());
	if (sub!="")  //to avoid the empty results
		AfxMessageBox(sub);
}


I'm getting empty results for each character where the regex doesn't match. That's why I'm using if (sub!="") on the 8th line.

Does anyone have a better solution?
Check what[0].matched before assigning the string.
Thanks, PanGalactic.

I tried checking what[0].matched instead of if (sub!=""). Everything works fine, except that I still get one empty result at the end.

From the Boost.Regex documentation:
m[0].matched --
true if a full match was found, and false if it was a partial match (found as a result of the match_partial flag being set).


Since I'm not using the match_partial flag, I don't think there's any need to check what[0].matched.

I figured it out: I just need to pass the match_not_null flag (a match can't be empty), and then I don't need to check for empty results.
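To illustrate what that flag does, here is a small sketch. It uses std::regex so that it builds without Boost; I believe the Boost version is the same apart from passing boost::match_not_null to the boost::regex_iterator constructor. The pattern "[0-9]*" and the function name digit_runs are made up for the example: the pattern is deliberately one that can match an empty string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return every non-empty run of digits in text.
// "[0-9]*" can match the empty string, so without match_not_null the
// iterator would also report zero-length matches.
std::vector<std::string> digit_runs(const std::string& text)
{
    static const std::regex re("[0-9]*");
    std::vector<std::string> runs;
    std::sregex_iterator it(text.begin(), text.end(), re,
                            std::regex_constants::match_not_null), end;
    for (; it != end; ++it)
        runs.push_back(it->str());
    return runs;
}
```

With the flag set, the loop body no longer needs its own check for empty strings.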
One more question: I need to read everything between a tag's opening and closing tags, including any nested tags. I need a regex pattern to do that.

For example, say I have this HTML page source:

...
<div id="ReadEverythingInThisTag">
<b>blah-blah-blah</b>
<i>blah-blah-blah</i>
<div>adsasd</div>
</div>
...


I want to read everything between the opening <div id="ReadEverythingInThisTag"> tag and its closing </div>.

This is the pattern I came up with:
<div id=\"ReadEverythingInThisTag\">(.*(<[^>/]*>.*(</[^>]*>)*).*)</div>


It works fine with the source above, but it fails when the content includes line breaks (<br>) or horizontal rules (<hr>):

<div id="ReadEverythingInThisTag">
<b>blah-blah-blah</b><br>
<i>blah-blah-blah</i><hr>
<div>adsasd</div>
</div>


Any ideas?
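One idea, offered as a sketch and not as a regex answer: a single pattern has a hard time with arbitrarily nested tags, so an alternative is to find the opening tag and then count <div ...> and </div> occurrences until the depth returns to zero. The function name inner_div is made up, and the code assumes lowercase tags, no "<div" text inside comments or attribute values, and no other tag names that start with "div".

```cpp
#include <string>

// Return the text between open_tag (e.g. "<div id=\"x\">") and its matching
// "</div>", counting nested <div> elements. Returns "" if the opening tag
// is missing or the divs are unbalanced.
std::string inner_div(const std::string& html, const std::string& open_tag)
{
    std::size_t start = html.find(open_tag);
    if (start == std::string::npos) return "";
    start += open_tag.size();

    std::size_t pos = start;
    int depth = 1;                       // we are inside the outer div
    while (depth > 0) {
        std::size_t next_open = html.find("<div", pos);
        std::size_t next_close = html.find("</div>", pos);
        if (next_close == std::string::npos) return "";   // unbalanced
        if (next_open != std::string::npos && next_open < next_close) {
            ++depth;                     // a nested div opens first
            pos = next_open + 4;         // skip past "<div"
        } else {
            --depth;                     // a div closes first
            if (depth == 0) return html.substr(start, next_close - start);
            pos = next_close + 6;        // skip past "</div>"
        }
    }
    return "";
}
```

Because only div tags are counted, void elements such as <br> and <hr> inside the block need no special handling.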
Ok, I figured it out:

CString altag=L"((<img[^>]*>)*?(<br[^>]*?>)*?(<hr[^>]*?>)*?)*?";
CString Pattern=L"<div>(?<id>"+altag+L".*?(<[^>/]*>"+altag+L".*(</[^>]*>)*)*"+
         altag+L".*?)</div>";



Now I've got another one :) Could someone please help me with this problem:

I want to give names to the groups in a regex.
In .NET it would look like this:

"(?<letters>[a-zA-Z]*)(?<numbers>[0-9]*)"

I could then access those groups like this:
String^ letters=match->Groups["letters"]->Value; //match is a Match^ instance
String^ numbers=match->Groups["numbers"]->Value;


How can I do that with the Boost.Regex library?



