std::regex problem: why doesn't [\s\S] work?

Hi,
Why is this code not working?
#include <iostream>
#include <string>
#include <regex>
using namespace std;

int main() {
	string s = "aa  bb";
	regex pattern("aa[\\s\\S]+bb"); // \\s\\S doesn't work
//	regex pattern("aa\\s+bb"); // works fine
	smatch m;

	while(regex_search(s, m, pattern)) {
		cout << m[0] << endl;
		s = m.suffix().str();
	}
}

Thanks
[\s\S] ← doesn't that mean "any character that is or is not whitespace", so basically all characters?
I mean, what did you expect and what did you get?

Also, you can use a raw string literal, R"(<text>)", to avoid the need to escape characters: regex pattern(R"(aa[\s\S]+bb)");
http://en.cppreference.com/w/cpp/language/string_literal
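To make the raw string tip concrete, here is a small sketch showing that the escaped literal and the raw literal spell exactly the same pattern text. The function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// Returns true when the escaped literal and the raw literal
// contain exactly the same characters.
bool same_pattern_text() {
    const std::string escaped = "aa[\\s\\S]+bb";   // every backslash doubled
    const std::string raw     = R"(aa[\s\S]+bb)";  // raw literal, no doubling
    return escaped == raw;
}

// Builds the regex from the raw literal (hypothetical helper).
std::regex make_pattern() {
    return std::regex(R"(aa[\s\S]+bb)");
}
```

Since both literals produce the same string, switching to the raw form only changes readability, not what the regex engine sees.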
Thanks for the string literal tip.

[\s\S] ← doesn't that mean "any character that is or is not whitespace", so basically all characters?

Yes. In my real case there are all kinds of characters between aa and bb (including \r, \n, space, tab, and normal characters).
In that case, why not actually say "any characters" instead of using the trick you're currently using?
I do not have much experience with regular expressions, but it is probably because + is possessive. It will match all characters (because the preceding parts say so) until the end of the string. There are no characters left, so the attempt to match bb fails. A possessive quantifier does not backtrack, so it ends there. Try to use a greedy quantifier (*) or a lazy one instead of possessive, and use the dot . to match any character, like this: aa.*bb
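For what it's worth, as far as I know the default ECMAScript grammar used by std::regex has no possessive quantifiers at all: + and * are both greedy and still backtrack, while +? and *? are lazy. A minimal sketch to check this, where first_match is a hypothetical helper and not part of the library:

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`,
// or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

On a conforming implementation, first_match("aa--bb--bb", "aa.+bb") should give "aa--bb--bb" (greedy, but it backtracks far enough to find the last bb), and first_match("aa--bb--bb", "aa.+?bb") should give "aa--bb" (lazy).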
What about "aa[^b]*bb"? This will match aa, followed by any number of characters that aren't b, followed by bb. It won't match "aaccbccbb" though.
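A quick sketch of that suggestion, assuming a conforming std::regex. The helper name matches_negated_class is made up for illustration.

```cpp
#include <regex>
#include <string>

// True when `text` contains aa, then any run of non-b characters, then bb.
bool matches_negated_class(const std::string& text) {
    static const std::regex pattern("aa[^b]*bb");
    return std::regex_search(text, pattern);
}
```

matches_negated_class("aa  bb") and matches_negated_class("aa\r\n\tbb") should both be true, since a negated class also accepts line breaks, while matches_negated_class("aaccbccbb") should be false because of the single b in the middle.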
Try to use greedy quantifier (*)

I tried regex pattern("aa[\\s\\S]*bb"); and it doesn't work either.

What about "aa[^b]*bb"
I used a pattern like this to solve the problem, but I'm just curious why [\s\S]+ doesn't work.
I found out that you need newlines in your match. I think there is a way to force the dot to match really everything, but some dialects support [^] as a true "all symbols" character class.
Note that aa[^]*bb will match aaxxxxxxbbxxxxbb in xxaaxxxxxxbbxxxxbbxx
and aa[^]*?bb will match only aaxxxxxxbb in xxaaxxxxxxbbxxxxbbxx
http://regexr.com?34r01
http://regexr.com?34r04
Choose whichever you need.
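Here is a rough sketch of both points: the dot stopping at a newline, and greedy versus lazy matching. I swapped in [\s\S] for [^] because I am not certain every standard library accepts [^]; that substitution is my assumption, and find_first is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`,
// or an empty string when nothing matches.
std::string find_first(const std::string& text, const char* pattern) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str(0);
    }
    return std::string();
}
```

With a conforming library, find_first("aa\nbb", "aa.*bb") should come back empty because the dot does not match a line break, while find_first("aa\nbb", R"(aa[\s\S]*bb)") should return the whole input. On xxaaxxxxxxbbxxxxbbxx, the greedy R"(aa[\s\S]*bb)" should return aaxxxxxxbbxxxxbb and the lazy R"(aa[\s\S]*?bb)" should return aaxxxxxxbb.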
I'm just curious why the [\s\S]+ doesn't work
http://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers
some dialects support [^] as real "all symbols" character class.

The [^] seems to work fine, so I should use that.

But is this really caused by the quantifiers? I think if
regex pattern("aa[\\s]+bb");
should work, then
regex pattern("aa[\\sPUT_ANY_THING_HERE]+bb");
should also work, even
regex pattern("aa[\\sb]+bb");
works,
so why doesn't [\\s\\S]+? Here \S covers b,
and they have the same quantifier.
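That reasoning looks right to me: on a conforming implementation all three classes should match "aa  bb". If I recall correctly, libstdc++ only gained a working <regex> in GCC 4.9, so an older GCC may compile this code and still misbehave. That is my guess at the cause, not something confirmed in this thread. A small sketch to test your own toolchain, where found and all_matches are hypothetical helpers and all_matches mirrors the loop from the first post:

```cpp
#include <regex>
#include <string>
#include <vector>

// True when `pattern` matches somewhere in `text`.
bool found(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Collects successive matches the same way the original loop does:
// search, record m[0], then continue in the suffix.
// Note: this would loop forever on a pattern that can match the empty string.
std::vector<std::string> all_matches(std::string text, const std::regex& re) {
    std::vector<std::string> out;
    std::smatch m;
    while (std::regex_search(text, m, re)) {
        out.push_back(m.str(0));
        text = m.suffix().str();  // copy made before `text` is overwritten
    }
    return out;
}
```

If found("aa  bb", R"(aa[\s]+bb)") and found("aa  bb", R"(aa[\sb]+bb)") are true but found("aa  bb", R"(aa[\s\S]+bb)") is false, the library is the likely culprit rather than the quantifier. With the lazy pattern R"(aa[\s\S]+?bb)", all_matches on "aa  bb aa\nbb" should yield two separate matches.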