Multiple matches from a regex

Hi,

I use the Boost libraries because my Debian GCC (version 4.7.2) isn't set up for the C++11 standard.

My question is: why can't I retrieve all the matches of a regex in a string? (I've read a lot about token iterators but don't get it; I started coding in C++ three days ago.)


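    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <boost/regex.hpp>

    int main()
    {
        typedef std::istreambuf_iterator<char> iter;
        std::string c;
        std::ifstream input_file("myfile.txt");

        iter file_begin(input_file);
        iter file_end;

        // "\\w" needs a doubled backslash inside a C++ string literal.
        // No ^ or $ here: they cannot match in the middle of the line,
        // and no trailing '*', which would let the pattern match "".
        static const boost::regex first_regex("tommy=\"(\\w*)\"");
        boost::smatch str_matches;

        // Read the whole file into c (this is the only statement in the loop)
        for (iter i = file_begin; i != file_end; ++i)
            c += *i;

        // regex_search is called once, on the whole buffer
        if (boost::regex_search(c, str_matches, first_regex))
        {
            std::cout << "ok";
        }
    }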


Here is the content of myfile.txt:
 
tommy="hello";tommy="byebye";


I expected to get "ok" twice, but I only get it once.

Why?

Thanks,

Larry
This might help:

    std::string text("coucou='abc' coucou abf lol abd");
    boost::regex regex("coucou=[']*(ab[cz])[']*.");

    boost::sregex_token_iterator iter(text.begin(), text.end(), regex, 0);
    boost::sregex_token_iterator end;

    for(; iter != end; ++iter ) {
        std::cout<<regex<<'\n';
    }


But I can't retrieve only the captured group...
#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
    std::string text("coucou='abc' coucou abf lol abd");
    boost::regex regex("coucou=[']*(ab[cz])[']*.");

    boost::sregex_token_iterator iter(text.begin(), text.end(), regex, 0);
    boost::sregex_token_iterator end;

    for(; iter != end; ++iter ) {
        std::cout << *iter << '\n' ;
        // std::cout<<regex<<'\n';
    }
}


In your first post, you only call regex_search once, so you can only get one "ok".
Thanks cire,

The cout << regex was a typo on my part; too much code :)

I'm sharpening my sword on a real-world example: some simple HTML parsing.

 
<img src=\"myfirst123\"/><img src=\"mysec567\"/> 


My code:
boost::regex e("\<img src\=(?=.*\"([a-zA-Z])*([0-9])*\")(?!.*(/.*>))") ;


   string input = "<img src=\"myfirst123\"/><img src=\"mysec567\"/>";

   boost::match_results<std::string::const_iterator> what;
   boost::regex_search(input, what, e);

   if(what[0].matched)
   {
        cout << what.suffix();
   }
   else
    cout << "nope";


I am trying to extract myfirst123 and mysec567 only.

My code returns everything from "myfirst123" onward:

 
"myfirst123"/><img src="mysec567"/>


The first one is fine, but obviously the second one isn't.

When I replace .suffix() with what[1] or what[2], I only get "7"!

This is getting really hard to manage...

Any help?

Larry
#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
    try {
        boost::regex exp( "<img src=(\"[[:alnum:]]*\")/>" ) ;

        std::string input = "<img src=\"myfirst123\"/><img src=\"mysec567\"/>";

        boost::match_results<std::string::const_iterator> what;

        std::string::const_iterator start = input.begin() ;

        while ( boost::regex_search(start, input.cend(), what, exp) )
        {
            std::cout << "Sub-match : " << what[1] << " found in full match: " << what[0] <<  '\n' ;
            start = what[0].second ;
        }
    }
    catch ( boost::bad_expression & ex )
    {
        std::cout << ex.what() ;
    }
}

Wonderful cire,

I had to replace input.cend() with end and add
 
std::string::const_iterator end = input.end() ;


to make it compile with my GCC version on Debian (which isn't C++11-enabled).

Many thanks for sharing your knowledge,

Larry