C++ regex clarification

Can you please clarify if this is the correct approach to retrieve values of "x" and "y" from a given string?

#include <iostream>
#include <string>
#include <regex>

int main()
{
    int x;
    int y;
    //This regex may not be accurate yet...
    const std::regex re("[x|y]=[0-9]");
    std::string in;
    in = "<Point x="0" y="0" />";
    
    std::smatch sm;
    if (std::regex_match(in, sm, re))
    {
        //My question is, is this the correct approach to retrieve x and y values?
        x = std::stod(sm[0]);
        y = std::stod(sm[1]);
    }
    
}
#include <iostream>
#include <string>
#include <regex>

int main()
{
    int x = 0, y = 0; // initialized so the final print is well defined even if a name is missing

    // You need to use capture groups (parens) to capture the subexpressions you want
    // You need to use + to allow "one or more" digits.
    const std::regex re("([xy])=\"([0-9]+)\"");

    // double quotes within double quotes need to be escaped with a backslash.
    std::string text = "<Point x=\"0\" y=\"12\" />";

    // Use regex_search and an iterator to search for all matches.
    std::smatch sm;
    for (auto it = text.cbegin();
         std::regex_search(it, text.cend(), sm, re);
         it = sm[0].second)
    {
        // The first capture group ([1]) is our variable name, the second ([2]) is its value
        if (sm[1] == "x")
            x = std::stoi(sm[2]);
        else if (sm[1] == "y")
            y = std::stoi(sm[2]);
        else
            std::cout << "unknown variable: " << sm[1] << '\n';
    }

    std::cout << x << ' ' << y << '\n';
}

Raw string literals can help avoid serious cases of leaning toothpick syndrome:
https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome

const std::regex re(R"eos(([xy])="([0-9]+)")eos"); // internal quotes not escaped

The raw string has dubious value here, but sometimes it offers big improvements.
Raw strings were probably added mostly for regexes, since the escaping can otherwise get pretty ridiculous. For this regex, however, a normal string is simpler. The input text can benefit a little, though.

The "unknown variable" condition can never occur in the code above since only x and y are ever matched. To match general variable names you might do something like this:

#include <iostream>
#include <string>
#include <regex>
#include <limits>

const int Unset = std::numeric_limits<int>::min();

int main()
{
    int x = Unset, y = Unset;
    // Variable names start with a letter or underscore.
    // After the first character they can contain letters, underscores, and digits.
    const std::regex re("([A-Za-z_][A-Za-z0-9_]*)=\"([0-9]+)\"");
    std::string text = R"(<Point x="1" k16="2" y="34" />)";

    std::smatch sm;
    for (auto it = text.cbegin();
         std::regex_search(it, text.cend(), sm, re);
         it = sm[0].second)
    {
        if (sm[1] == "x")
            x = std::stoi(sm[2]);
        else if (sm[1] == "y")
            y = std::stoi(sm[2]);
        else
            std::cout << "unknown variable: " << sm[1] << '\n';
    }

    if (x != Unset)
        std::cout << "x=" << x << '\n';
    else
        std::cout << "x not found\n";

    if (y != Unset)
        std::cout << "y=" << y << '\n';
    else
        std::cout << "y not found\n";
}
