Multiple matches from a regex
Apr 10, 2013 at 11:47am UTC
Hi,
I use the Boost libraries since my Debian GCC (version 4.7.2) doesn't support the latest C++11 standard.
My question is: why can't I retrieve all the matches of a regex in a string? (I have read a lot about tokens but don't get it; I started coding in C++ 3 days ago.)
typedef std::istreambuf_iterator<char> iter;
string c;
std::ifstream input_file("myfile.txt");
iter file_begin(input_file);
iter file_end;
static const boost::regex first_regex("(tommy=^[\"](\w)*[\"]$)*");
boost::smatch str_matches;
for (iter i = file_begin; i != file_end; ++i)
    c += *i;
if (boost::regex_search(c, str_matches, first_regex))
{
    cout << "ok";
}
Here is the content of myfile.txt:
tommy="hello" ;tommy="byebye" ;
I should get two "ok", but I only get one.
Why?
Thanks,
Larry
Last edited on Apr 10, 2013 at 11:50am UTC
Apr 10, 2013 at 12:20pm UTC
This might help :
std::string text("coucou='abc' coucou abf lol abd");
boost::regex regex("coucou=[']*(ab[cz])[']*.");
boost::sregex_token_iterator iter(text.begin(), text.end(), regex, 0);
boost::sregex_token_iterator end;
for (; iter != end; ++iter) {
    std::cout<<regex<<'\n';
}
But I cannot retrieve only the captured group...
Apr 10, 2013 at 3:55pm UTC
#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
    std::string text("coucou='abc' coucou abf lol abd");
    boost::regex regex("coucou=[']*(ab[cz])[']*.");
    boost::sregex_token_iterator iter(text.begin(), text.end(), regex, 0);
    boost::sregex_token_iterator end;
    for (; iter != end; ++iter) {
        std::cout << *iter << '\n';
        // std::cout<<regex<<'\n';
    }
}
In your first post, you call regex_search only once, so you can get at most one "ok".
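To make the point concrete, here is a minimal sketch of visiting every match and keeping only the capture. It is written against `std::regex` so that it builds without Boost (this assumes a compiler with a working `<regex>`, for example GCC 4.9 or later); the `boost::` classes of the same names should behave the same way. The helper `all_captures` and the simplified pattern in the usage note are illustrative and not taken from the posts above. Passing 1 instead of 0 as the last constructor argument of the token iterator yields the first capture group of each match rather than the whole match. Note also that the pattern in the first post is entirely optional because of its trailing `*`, so it can match an empty string and report success without finding anything useful.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect capture group `group` from every match in `text`.
std::vector<std::string> all_captures(const std::string& text,
                                      const std::regex& re,
                                      int group)
{
    std::vector<std::string> out;
    // The token iterator repeats the search after each match, so every
    // non-overlapping match is visited, not just the first one.
    std::sregex_token_iterator it(text.begin(), text.end(), re, group);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Called with the text `tommy="hello" ;tommy="byebye" ;`, the pattern `tommy="(\w*)"` and group 1, this should return the two strings hello and byebye, one per match.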
Last edited on Apr 10, 2013 at 3:57pm UTC
Apr 10, 2013 at 5:11pm UTC
Thanks cire,
The cout << regex was a typo of mine, too much code :)
I am sharpening my sword on a real-world example: some simple HTML parsing.
<img src=\"myfirst123\"/><img src=\"mysec567\"/>
My code:
boost::regex e("\<img src\=(?=.*\"([a-zA-Z])*([0-9])*\")(?!.*(/.*>))");
string input = "<img src=\"myfirst123\"/><img src=\"mysec567\"/>";
boost::match_results<std::string::const_iterator> what;
boost::regex_search(input, what, e);
if (what[0].matched)
{
    cout << what.suffix();
}
else
    cout << "nope";
I am trying to extract myfirst123 and mysec567 only.
My code gets everything after myfirst123, hence
"myfirst123" /><img src="mysec567" />
The first one is OK, but obviously not the second one.
When I replace what.suffix() with what[1] or what[2], I only get "7"!
It is becoming difficult to manage.
Any help?
Larry
Apr 10, 2013 at 8:14pm UTC
#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
    try {
        boost::regex exp( "<img src=(\"[[:alnum:]]*\")/>" ) ;
        std::string input = "<img src=\"myfirst123\"/><img src=\"mysec567\"/>" ;
        boost::match_results<std::string::const_iterator> what;
        std::string::const_iterator start = input.begin() ;
        while ( boost::regex_search(start, input.cend(), what, exp) )
        {
            std::cout << "Sub-match : " << what[1] << " found in full match: " << what[0] << '\n' ;
            start = what[0].second ;
        }
    }
    catch ( boost::bad_expression & ex )
    {
        std::cout << ex.what() ;
    }
}
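A side note on the "7" seen in the earlier attempt. One likely contributor is where the repetition operator sits: in `([0-9])*` the `*` is outside the parentheses, so the group is captured again on every repetition and keeps only the last character. With the `*` inside the group, as in the `[[:alnum:]]*` expression above, the whole run is captured. The sketch below illustrates the difference. It uses `std::regex` so that it builds without Boost (assuming a compiler with a working `<regex>`), and the helper name `group_of` is made up for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return capture group `n` of the first match of
// `pattern` in `text`, or an empty string when there is no match.
std::string group_of(const std::string& text, const std::string& pattern, int n)
{
    std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return std::string();
    return m[n].str();
}
```

With the input `mysec567`, the pattern `([a-z])*([0-9])*` leaves only `7` in group 2, whereas `([a-z]*)([0-9]*)` leaves `567` there.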
Apr 10, 2013 at 9:24pm UTC
Wonderful cire,
I had to replace input.cend() with end, and add
std::string::const_iterator end = input.end() ;
to make it compile with my GCC version on Debian (C++11 not enabled).
Many thanks for sharing your knowledge,
Larry
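For readers making the same change: the extra `const_iterator` variable is needed, rather than simply writing `input.end()` in the call, because `regex_search` deduces a single iterator type for both ends of the range. `start` is a `const_iterator`, while `end()` on a non-const string returns a plain `iterator`. A minimal sketch of the resulting loop follows. It uses `std::regex` only so that it builds without Boost (swap in `boost::` and `<boost/regex.hpp>` for the setup in this thread), the function name `src_values` is invented for the example, and the capture group is moved inside the quotes so that the quotes themselves are not part of the result.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the src values of simple <img src="..."/> tags.
std::vector<std::string> src_values(const std::string& input)
{
    static const std::regex exp("<img src=\"([[:alnum:]]*)\"/>");
    std::vector<std::string> out;
    std::smatch what;
    // Both ends are const_iterators, so regex_search can deduce one type.
    std::string::const_iterator start = input.begin();
    std::string::const_iterator end = input.end();
    while (std::regex_search(start, end, what, exp)) {
        out.push_back(what[1].str()); // the capture, without the quotes
        start = what[0].second;       // resume right after this match
    }
    return out;
}
```

For the input used in this thread, the function should return myfirst123 and mysec567.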
Last edited on Apr 10, 2013 at 9:25pm UTC