C++ #include <regex> Usage - regex_replace()

Let's say I have a text file with the format:
horsey"test"log

And I want the line to be:
"test"

I would use std::regex_replace() for this.

However, I'm having trouble finding (or understanding) the correct syntax for it.

Links I've been using:
- https://en.cppreference.com/w/cpp/regex/regex_replace
- https://stackoverflow.com/questions/11627440/regex-c-extract-substring
- https://stackoverflow.com/questions/2912894/how-to-match-any-character-in-regular-expression/2912904
- https://www.informit.com/articles/article.aspx?p=2064649&seqNum=3
- https://www.geeksforgeeks.org/regex_replace-in-cpp-replace-the-match-of-a-string-using-regex_replace/

My current code (minus the mess of comments) is:
        std::string word;
        while (inFile >> word) {

            std::regex format(".*\".+\".*");
            std::regex desiredFormat("\".+\""); 

            // if the format is:      <any char 0 or more times>"<any char 1 or more times>"<any char 0 or more times>
            // Then replace it to be: "<any char 1 or more times>"

            if ( std::regex_match( word, format ) ) {
                std::regex_replace(word, format, desiredFormat);

                //std::regex_replace( word.begin(), word.end(), format, desiredFormat );
                
                std::cout << "Debug: word = " << word << '\n';
            }
        }


Help would be greatly appreciated. Thanks! :)


Edit:
- Note that I'm using C++17, tried compiling with g++ v9 & Clang v5, neither worked.
- Errors in g++: https://pastebin.com/RtED9m2j

Edit 2:
- If you absolutely need to see my code you may do so with the link below. However, I recommend avoiding it if possible because it's a mess at the moment and comments may not be up to date.
https://pastebin.com/Jv9QuQ3z
> I would use std::regex_replace() for this.
it's overkill to use regex for that task
`string::find()' and `string::rfind()' should suffice


when working with regex, I'd recommend using raw string literals so you avoid backslash hell.
std::string text = R"(horsey"test"log)";
std::regex desired(R"(".+")");

std::smatch match;
std::string result1;
//get the first match in the string
if (std::regex_search(text, match, desired))
	result1 = match[0];


std::regex format( R"(.*(".+").*)" ); //note the (group)
//this reads: replace all the matching (in this case the whole string) by the
//contents of the first (group)
std::string result2 = std::regex_replace(text, format, "$1");
note that the results have the quotes in them, R"("test")", not sure if that's what you wanted
also, those two are not actually equivalent
it depends on whether you need a greedy or a lazy match for the " character,
for example, what result do you want with R"(hello"oh"brave"new"world)"?
You have turned a problem into two problems...
https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

In any case, what you are looking for is a std::regex_search, matching on a re-pattern of

R"z("([^"]*)")z"

The first sub-match is the quoted text. You can get the location and length from the smatch object. Make sure to look through the docs.

Enjoy!

[edit]
Sorry, was distracted for a bit. Here’s an example you can explore with:

#include <iostream>
#include <regex>
#include <string>

int main()
{
  std::string s;
  {
    std::cout << "s? ";
    getline( std::cin, s );
  }
  std::string replacement;
  {
    std::cout << "replacement? ";
    getline( std::cin, replacement );
  }
  
  std::regex re{ R"z("[^"]*")z" };
  
  std::cout << std::regex_replace( s, re, "\"" + replacement + "\"" ) << "\n";
}
I'm sorry for my very late reply!

Thank you Duthomhas and ne555 for your responses! Reading them both + doing some searching online allowed me to solve the issue. :)

However, I just realized that I also need to add support for integers and doubles (outside of quotes). I tried adding that in the std::regex desiredFormat line but it gives errors. Here is my current code (minus some notes):
        std::string line;
        //std::regex desiredFormat(R"("[^"]*")"); // Original line when I was only detecting "contents. like this!" and not ints or doubles outside of quotes.
        
        // Attempted Solution 1:
        std::regex r1(R"("[^"]*")");
        std::regex r2(R"([0-9]*.[0-9]*)");
        std::regex r3("[0-9]*");
        //std::regex desiredFormat(r1|r2|r3); 

        // Attempted Solution 2:
        std::regex desiredFormat(R"("[^"]*")" | R"([0-9]*.[0-9]*)" | "[0-9]*"); 

        // Attempted Solution 3:
        //std::regex desiredFormat((R"("[^"]*")") (R"([0-9]*.[0-9]*)") ("[0-9]*")); 
        
        while (std::getline(inFile, line)) {
            std::cout << "[Debug] Line = \t" << line << "\n";

            // Setup Regex Iterator:
            auto lineBegin = std::sregex_iterator(line.begin(), line.end(), desiredFormat);
            auto lineEnd = std::sregex_iterator();
            // Iterate through line using regex_search():
            for (std::sregex_iterator iter = lineBegin; iter != lineEnd; ++iter) {
                std::smatch match = *iter;
                std::string matchStr = match.str();
                std::cout << matchStr << "\n";
                alphaNums.push_back(matchStr);
            }
        }
        // 5.
        // Closes File:
        inFile.close();
}


Error:
$ g++ -std=c++17 main3.cpp -lstdc++fs -Wall -Wextra -Wshadow -Wnon-virtual-dtor -pedantic
main3.cpp: In function ‘void parseJsonFile(const std::filesystem::__cxx11::path&, std::vector<std::__cxx11::basic_string<char> >&)’:
main3.cpp:269:47: error: invalid operands of types ‘const char [8]’ and ‘const char [14]’ to binary ‘operator|’
  269 |         std::regex desiredFormat(R"("[^"]*")" | R"([0-9]*.[0-9]*)" | "[0-9]*");
      |                                  ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~
      |                                  |              |
      |                                  const char [8] const char [14]


Note: If absolutely necessary you may view my full code here but I should warn you it's probably a mess and comments may not be up-to-date:
https://pastebin.com/iFZeY5MS

Help would be greatly appreciated. Thanks!

Edit: Also, the solution I'm trying to do I found from here -
https://stackoverflow.com/a/21983196
And another source I found very helpful was:
https://www.regular-expressions.info/dot.html

Edit 2:
ne555 wrote:
> I would use std::regex_replace() for this.
it's overkill to use regex for that task
`string::find()' and `string::rfind()' should suffice

Thanks for the suggestion! Although I couldn't figure out how to get it to work with any contents inside the quote, without using regex. Some examples:
Line 1:
"Item" this "item:id" ----> "Item" and "item:id" should be stored in a vec.
Line 2:
"Hello World" apple heels seeds "this" ----> "Hello World" and "this" should be stored in a vec.
Line 3:
"Cow 32.3" ----> "Cow 32.3" should be stored in a vec.
pos1 = s.find("\"", 0); will give you the first "
then pos2 = s.find("\"", pos1+1); will give you the second
between the first and second position you've got the string that you want
size_t pos = s.find("\"");
while(pos not_eq std::string::npos){
	std::cout << pos << '\n';
	pos = s.find("\"", pos+1);
}
will show you all the positions where there is a "

> I also need to add support for integers and doubles
the dot "." is a catch-them-all
"[^"]" catches anything that is not "

there is no point in saying catch all and also catch numbers, in your case r1+r2+r3 is simply r1


> R"([0-9]*.[0-9]*)" | "[0-9]*"
the pipe | should be part of the string
R"([0-9]*.[0-9]*|[0-9]*)"