Need help finding a regular expression


I need a regex that matches any letters or digits (any number of them) enclosed in opening and closing parentheses, "(" and ")", with a dot in front, and with the whole thing wrapped in quotation marks.

So if the user enters ".(c|cpp)" or ".(abb|4f5)" as a command-line argument, the program should check whether that string matches ext.

string input = ".(abb|4f5)" ;

regex ext("\"\\.\([[:alpha:][:digit:]]*\)\"");


if (regex_match(input, ext))
{
// do computations
}


This did not work. I also tried building the regex without the double backslash "\\" and without escaping the parentheses, because I am not sure whether I have to escape "(" and ")". I was getting a warning (unrecognized character escape sequence) when I escaped them:

regex ext("\".([[:alpha:][:digit:]]*)\"");

Do I need to include the double backslash \\ in my regex?
And do I need to escape the opening and closing parentheses "(" and ")"?

> without the double slash "\\"
Why do you use a double backslash? In a regex, \\ matches one literal backslash. In a C++ string literal, however, every backslash must itself be escaped, so "\\." becomes the regex \. (a literal dot). That is backslash hell.
To avoid it, I recommend using raw string literals.
http://en.cppreference.com/w/cpp/language/string_literal
The regex would be regex ext(R"raw("\.\([[:alpha:][:digit:]]*\)")raw");


Now, your input does not match the regex: it is missing the quotation marks, and it contains a bar |, which the character class does not allow.


> because i am not sure if i have to escape the "(" and the ")"
Parentheses have a special meaning; you need to escape them if you want to match an actual parenthesis in the input.
But maybe you can change that behaviour.
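
To make the raw string advice concrete, here is a minimal sketch showing that the escaped literal and the raw literal spell the same pattern. The names escaped_pattern, raw_pattern and matches_quoted_extension are made up for this illustration; they are not from the original post.

```cpp
#include <regex>
#include <string>

// The same pattern written twice: once as an ordinary string literal
// (every backslash and quote escaped for C++), once as a raw string
// literal with the custom delimiter "raw".
const std::string escaped_pattern = "\"\\.\\([[:alpha:][:digit:]]*\\)\"";
const std::string raw_pattern     = R"raw("\.\([[:alpha:][:digit:]]*\)")raw";

// Hypothetical helper: true if `text` is a quote, a dot, a parenthesised
// run of letters/digits, and a closing quote.
bool matches_quoted_extension(const std::string& text)
{
    static const std::regex ext(raw_pattern);
    return std::regex_match(text, ext);
}
```

With this pattern, the original input .(abb|4f5) is rejected for the two reasons given above: it has no surrounding quotes, and | is neither a letter nor a digit.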
I am trying to get a better understanding of regex.

according to http://www.tenouk.com/ModuleY.html

a single backslash \ and quotation marks need to be escaped when entered as command-line arguments.

Therefore I entered \"\\.(cpp)\" as one of my command-line arguments in VS.

the regex i am using is:

regex ext(R"raw("\.\([[:alpha:][:digit:]]*\)")raw");

So it should match if the user enters "\.(cpp)" or "\.(txt)", but it is not working.
#include <iostream>
#include <regex>
#include <string>
#include <iomanip>

int main()
{
    const std::string test_strings[] =
    { R"(\(.cpp))", R"(\(.txt))", R"(\(.abc4d8))", R"(\(.md5))", R"(\(.0012))" } ;

    // the regex consists of:
    // \\  - a literal \
    // \(  - a literal (
    // \.  - a literal .
    // \w+ - any word character (letter, digit or underscore) (\w), one or more times (+)
    // \)  - a literal )
    const std::string regex_string = R"(\\\(\.\w+\))" ;
    const std::regex re(regex_string) ;
    std::cout << "the regular expression is: '" << regex_string << "'\n\n" ;

    for( const std::string& str : test_strings )
    {
        std::cout << "'" << str << "'  matched? " << std::boolalpha
                  << std::regex_match( str, re ) << '\n' ;
    }
}

http://coliru.stacked-crooked.com/a/c2623de7541593e6
For user input with quotes around the string:

#include <iostream>
#include <regex>
#include <string>
#include <iomanip>

int main()
{
    const std::string test_strings[] =
    { R"!("\(.cpp)")!", R"!("\(.txt)")!", R"!("\(.abc4d8)")!", R"!("\(.md5)")!", R"!("\(.0012)")!" } ;

    // the regex consists of:
    // \"  - a literal "
    // \\  - a literal \
    // \(  - a literal (
    // \.  - a literal .
    // \w+ - any word character (letter, digit or underscore) (\w), one or more times (+)
    // \)  - a literal )
    // \"  - a literal "
    const std::string regex_string = R"!(\"\\\(\.\w+\)\")!" ;
    const std::regex re(regex_string) ;
    std::cout << "the regular expression is: '" << regex_string << "'\n\n" ;

    for( const std::string& str : test_strings )
    {
        std::cout << str << "  matched? " << std::boolalpha
                  << std::regex_match( str, re ) << '\n' ;
    }
}

http://coliru.stacked-crooked.com/a/0e449072d215f539
I modified your regex from the first post, because I need the "." outside the parentheses "( )".

const std::string regex_string = R"(\\\\.(\w+\))";
const std::regex ext(regex_string);

and it is giving me this error message:

Unhandled exception at 0x7651A9F2 in fileusage.exe: Microsoft C++ exception: std::regex_error at memory location 0x003CEAD8.


Is there any reason you put the dot "." inside the parentheses, i.e.:

const std::string regex_string = R"(\\\(\.\w+\))"

Because I am looking for terms such as "\.(cpp)" or "\.(txt)". As you can see, the dot is outside the parentheses.
// const std::string regex_string = R"(\\\\.(\w+\))";
   const std::string regex_string = R"(\\\.\(\w+\))" ; // dot is outside of the parenthesis 

http://coliru.stacked-crooked.com/a/b617ce677d11bb69
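
The std::regex_error reported above can be explained by reading the four-backslash pattern piece by piece: \\\\ is two literal backslashes, . is any character, ( opens a group, and \) is a literal closing parenthesis, so the group is never closed. A minimal sketch that checks whether a pattern compiles (is_valid_regex is a hypothetical helper name, not something from the posts above):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a std::regex,
// false if the std::regex constructor throws std::regex_error.
bool is_valid_regex(const std::string& pattern)
{
    try {
        std::regex re(pattern);   // throws on a malformed pattern
        (void)re;                 // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Wrapping the construction in try/catch like this also turns the unhandled exception into something the program can report, for example by printing the exception's what() text.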
I modified your regex from the second post, because I need the "." outside the parentheses "( )".

const std::string regex_string = R"!(\"\\\.\(\w+\)\")!";

Then I typed \"\.(cpp)\" in the command-line arguments, and it worked.

But I'm still curious: why do I have to escape the quotation marks when I type command-line arguments, but not the parentheses "(" and ")" or the dot "." with a double backslash "\\"?

And what does the "!" sign at the start of the raw string mean?

const std::string regex_string = R"!(\"\\\.\(\w+\)\")!";



Is there any way I can make it work so that the user does not even have to escape the quotation marks \", so that the regex works when they just type

"\.(cpp)"


Also, how do I incorporate the OR symbol "|"? I would like the user to be able to enter something like "\.(cpp|hpp|h)". Is there a regex for that?
> why is that i have to escape the quotation marks when i type in command line arguments

How the arguments given on the command line are parsed and made available as argv is implementation defined.

In almost all the mainstream command processor implementations:

"The quote characters that were present in the original word shall be removed unless they have themselves been quoted." http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_07

and "A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (")." https://msdn.microsoft.com/en-us/library/17w5ykft.aspx
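
One way to see what the command processor actually hands to the program is to print argv unchanged. The following is only a sketch; describe_args is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats the arguments exactly as the program
// received them, one per line, so the effect of the command processor's
// quote and backslash handling becomes visible.
std::string describe_args(int argc, const char* const argv[])
{
    std::ostringstream out;
    for (int i = 0; i < argc; ++i)
        out << "argv[" << i << "] = " << argv[i] << '\n';
    return out.str();
}
```

Calling std::cout << describe_args(argc, argv); at the top of main shows the strings the regex will really be matched against, which may differ from what was typed.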


> what are the "!" sign infront of the raw string mean ??

In a raw string literal, the default delimiters are "( and )" i.e. R"(....)"
However, we are allowed to add our own delimiter between the quote and the opening parenthesis, repeated between the closing parenthesis and the quote, i.e. R"abcd(...)abcd" , R"!(...)!" , R"xy(...)xy" etc.

Here, for clarity, I had used ! as the custom delimiter.

Note that custom delimiters before and after the parentheses are required only if the raw string literal itself contains the character sequence )"

See: http://www.stroustrup.com/C++11FAQ.html#raw-strings


Here, the quote characters in the regex string are escaped, so the custom delimiter is not required. If they were not escaped (the quote character has no special meaning in a regular expression, so it need not be escaped), a custom delimiter would be necessary because the raw string literal would then contain the character sequence )"

    const std::string regex_strings[3] =
    {
        R"!(\"\\\.\(\w+\)\")!", // the ! character as the delimiter is allowed, but not required
        R"(\"\\\.\(\w+\)\")", // the ! at the beginning and end are not really required
        R"!("\\\.\(\w+\)")!", // here they are required, because the " in the regex string is not escaped
    };

http://coliru.stacked-crooked.com/a/5227afdbeb615fda
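
As a small illustration of the delimiter rule, the sketch below spells the same texts as ordinary and as raw string literals. The variable names are made up for this example.

```cpp
#include <string>

// The 9-character text "\.(cpp)" including its quotes. It contains the
// sequence )" so a raw literal with the default delimiter would end too
// early; a custom delimiter (here !) is needed.
const std::string ordinary  = "\"\\.(cpp)\"";
const std::string delimited = R"!("\.(cpp)")!";

// A pattern without the sequence )" inside it needs no custom delimiter.
const std::string plain_raw      = R"(\.\(\w+\))";
const std::string plain_ordinary = "\\.\\(\\w+\\)";
```

Each pair compares equal; the raw form only changes how the text is written in the source file, not the resulting string.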
But I still don't understand why I have to escape the quotation marks when I type command-line arguments, but not the parentheses "(" and ")" or the dot "." with a double backslash "\\".

Also, how do I incorporate the OR symbol "|"? I would like the user to be able to enter something like "\.(cpp|hpp|h)". Is there a regex for that?

There must be a regex which specifies that the user can enter any number or letter, including the single bar "|".
No idea what you are asking.
What's the purpose of your program?

> so if the user enters ".(c|cpp)" or ".(abb|4f5)" as command line argument
Do you want the user to enter a regex, and you want to validate that regex?
Just validate it but never run it?

> so if the user enters ".(c|cpp)" or ".(abb|4f5)" as command line argument it
> will check if the string input by the user matches ext.
But then you write string input = ".(abb|4f5)" ;
So is input the regex? Or is input something like `main.cpp'?


> a single slash \ and quotation marks needs to be escaped when entering as
> command line argument.
> Therefore i have entered \"\\.(cpp)\" as one of my command line arguments in VS.
Your shell defines some special characters with associated behaviour. For example, a space separates command-line arguments.
If you want to pass the character itself instead of invoking its behaviour, you need to escape it.
So if you wrote, in a POSIX shell (where the parentheses must be escaped as well),
$ ./a.out \"\\.\(cpp\)\"
then your program would have
argv[0] = ./a.out
argv[1] = "\.(cpp)"
Read argv[1] carefully: you have a quote, a backslash, a dot...
Now look at your regex validator (if that is what it is): regex ext(R"raw("\.\([[:alpha:][:digit:]]*\)")raw");
It reads: starts with a quote, then comes a dot...

Do you understand why it failed? Your test string has a backslash that is not matched by the regex.


> const std::string regex_string = R"(\\\\.(\w+\))";
What are you doing now? You've got four backslashes in a row!
What did you intend to match?


> but im still curious, why is that i have to escape the quotation marks when i
> type in command line arguments , but i don't need to escape the parentheses
> "(" and ")" and the dot "." with a double slash "\\" ?
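Because the dot has no special meaning for your shell. (Parentheses are special in a POSIX shell, but apparently not for the argument parsing you are using in VS.)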

> is there anyway i can make it work so that the user does not even have to
> escape the quotation marks \"
That's the behaviour of your shell, and your program has no way to modify it.
You could just stop using command-line arguments and read the input from std::cin.


> i would like to have the user to enter something like this "\.(cpp|hpp|h)" .
> is there an regex expression for that ?
A regex for what?


Again, what's the purpose of your program?
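
If the goal is only to check that the argument has the shape \.(ext|ext|...) and to pull out the extensions, a sketch could look like the following. This is an assumption about the intent, and parse_extension_list is a hypothetical name. If the argument is instead meant to be used as a regex against file names, it could simply be passed to std::regex as it is.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: if `arg` looks like \.(ext1|ext2|...), returns the
// extensions in order; otherwise returns an empty vector.
std::vector<std::string> parse_extension_list(const std::string& arg)
{
    // \\               a literal backslash
    // \.               a literal dot
    // \(               a literal (
    // (\w+(?:\|\w+)*)  captured: one word, then zero or more "|word"
    // \)               a literal )
    static const std::regex shape(R"(\\\.\((\w+(?:\|\w+)*)\))");
    static const std::regex word(R"(\w+)");

    std::vector<std::string> extensions;
    std::smatch m;
    if (!std::regex_match(arg, m, shape))
        return extensions;                 // wrong shape: nothing extracted

    const std::string list = m[1].str();   // e.g. cpp|hpp|h
    for (std::sregex_iterator it(list.begin(), list.end(), word), end;
         it != end; ++it)
        extensions.push_back(it->str());
    return extensions;
}
```

The bar is escaped as \| in the validating pattern because there it must match a literal | typed by the user rather than act as the regex alternation operator.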
const std::string regex_string = R"!(\"\\\(\.\w+\)\")!" ;

So why do I have to escape the opening and closing parentheses "(" and ")" in a raw string, and also escape the "\" with a double backslash \\?

I thought a raw string would make things easier by letting us type only one backslash to represent a backslash, and also that we would not need to escape the "(" and ")".
> why do i have to escape the opening and closing parenthesis
You don't.
(...) and \(...\) have different meanings: the first is a group, the second matches literal parentheses. Use the one that fits your purpose.

come back when you figure out what you want.
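
The difference between (...) and \(...\) can be sketched with two small predicates. Both function names are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Unescaped parentheses form a group, and | inside it means "or":
// this accepts file names ending in .cpp, .hpp or .h
bool has_source_extension(const std::string& file_name)
{
    static const std::regex grouping(R"(.*\.(cpp|hpp|h))");
    return std::regex_match(file_name, grouping);
}

// Escaped parentheses are ordinary characters that must appear in the
// input: this accepts text such as .(cpp) with the parentheses typed out.
bool is_literal_parenthesised(const std::string& text)
{
    static const std::regex literal(R"(\.\(\w+\))");
    return std::regex_match(text, literal);
}
```

Note that the raw string only removes the C++ level of escaping; the backslashes that remain are the ones the regex itself requires.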