Having issues splitting/formatting a string into a vector based on multiple criteria

I need to split a series of strings that are formatted like this:

ex string 1:<verb> sigh <adverb> ; portend like <object> ; die <adverb> ;
ex string 2:<start> The <object> <verb> tonight. ;


into a vector in which the first angle-bracketed token (ex: <verb>) is the first element of the vector, and the rest of the string is then broken up into elements at the semicolons.
ex string 1 would result in the vector:
newvec.at(0)= <verb>
newvec.at(1) = sigh <adverb>
newvec.at(2) = portend like <object>
newvec.at(3) = die <adverb>

ex string 2 would result in the vector:
newvec.at(0) = <start>
newvec.at(1)=The <object> <verb> tonight.



I was provided with the following function to break up strings into vectors using regexes as delimiters. It worked fine when I previously needed it, but I'm having trouble figuring out how to use it in this case, if I even can.

#include <regex>
#include <string>
#include <vector>

/**
 * Splits the input string based on the (string) regular expression.
 *
 * @param input the string to split.
 * @param regex a string for the regular expression to use to split.
 * @param delim if true, the returned tokens are the regex matches themselves
 *              (the delimiters); if false, the matches are dropped and the
 *              text between them is returned.
 * @return a vector of strings that is the input string split based on the
 *         given regular expression.
 */
std::vector<std::string> split(const std::string &input, const std::string &regex, bool delim = true) {
    std::regex re(regex);

    std::sregex_token_iterator first, last;
    if (delim) {
        // default submatch 0: each token is a whole match of the regex
        first = std::sregex_token_iterator{input.begin(), input.end(), re};
    } else {
        // the -1 removes the delimiter
        first = std::sregex_token_iterator{input.begin(), input.end(), re, -1};
    }
    return std::vector<std::string>(first, last);
}



Calling new_vec = split(ex_string, ";", false) in my function will split the string into a vector based on the semicolons (and remove the semicolons), but I'm unsure how to make the first angle-bracketed token the first element of the vector. Any help would be greatly appreciated.
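To make the starting point concrete, here is a minimal sketch of what that call produces. The helper name split_on_semicolons is hypothetical (it is not part of the provided code); it is assumed to behave like split(input, ";", false) by using the same token iterator with -1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, assumed equivalent to split(input, ";", false):
// returns the text between semicolons, with the semicolons removed.
std::vector<std::string> split_on_semicolons(const std::string &input) {
    static const std::regex semicolon(";");
    // -1 selects the text between matches rather than the matches themselves
    std::sregex_token_iterator first(input.begin(), input.end(), semicolon, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For example string 1 this gives three elements: "<verb> sigh <adverb> ", " portend like <object> " and " die <adverb> ". The first token is still glued to the first element, and the spaces around each semicolon are kept.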




Extract the first element, and then use the given split function to split the rest of the string (after the first part).

Something like this:
#include <iomanip>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// split() is the function given in the original post; define it above main()

int main() {

    const std::string text = "<verb> sigh <adverb> ; portend like <object> ; die <adverb> ;" ;

    // \s* - zero or more white space (leading and trailing)
    // (<.+?>) - one or more characters within '<' and '>' (non-greedy); numbered capture
    const std::regex first_part( "\\s*(<.+?>)\\s*" ) ;
    std::smatch match ;
    if( std::regex_search( text, match, first_part ) ) {

        std::vector<std::string> parts { match[1] } ; // match[1] is the captured group

        // split the rest of the string using the given split function
        // match.suffix() is everything after the characters matched by first_part
        // (this stays correct even if the match does not start at the beginning of text)
        // \s*;\s* - zero or more white space, semicolon, zero or more white space
        const auto rest = split( match.suffix().str(), "\\s*;\\s*", false ) ;

        // insert the parts returned by split into the vector
        parts.insert( parts.end(), rest.begin(), rest.end() ) ;

        for( std::size_t i = 0 ; i < parts.size() ; ++i )
            std::cout << "parts[" << i << "] == " << std::quoted( parts[i] ) << '\n' ;
    }

    else std::cerr << "badly formed text\n" ;
}

http://coliru.stacked-crooked.com/a/84f5ea2c3a2391a7
http://rextester.com/OEFU31685
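If you want this inside your own function rather than in main, the same idea could be wrapped up roughly as follows. This is only a sketch: the name split_rule is an assumption, it uses the token iterator directly instead of calling the provided split, and it chooses to return an empty vector when the text does not begin with a <...> token.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper (the name split_rule is an assumption, not from the thread).
// Returns { first <...> token, then the semicolon-separated parts }.
// Returns an empty vector if the text does not start with a <...> token.
std::vector<std::string> split_rule(const std::string &text) {
    static const std::regex first_part("\\s*(<.+?>)\\s*");
    static const std::regex separator("\\s*;\\s*");

    std::smatch match;
    // match_continuous: only accept a match that begins at the start of text
    if (!std::regex_search(text, match, first_part,
                           std::regex_constants::match_continuous)) {
        return {};
    }

    std::vector<std::string> parts{match[1].str()};

    // tokenize everything after the first part;
    // -1 keeps the text between separators and drops the separators
    const std::string rest = match.suffix().str();
    std::sregex_token_iterator it(rest.begin(), rest.end(), separator, -1), end;
    for (; it != end; ++it) {
        if (it->length() != 0) parts.push_back(it->str()); // skip empty pieces
    }
    return parts;
}
```

With example string 1 this yields "<verb>", "sigh <adverb>", "portend like <object>", "die <adverb>", and with example string 2 it yields "<start>", "The <object> <verb> tonight.". Skipping empty pieces is a design choice; drop that check if empty elements between consecutive semicolons should be preserved.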