parsing string with sscanf

Hi everyone,
I'm parsing documents with sscanf, and this has been working perfectly for me. Until now :p The documents I want to parse are getting more complex. For example:

A line of the document could be something like this:

"f 1/2/3 2/3/4"

and then this would work:

sscanf(line.c_str(),"%*s %d/%d/%d %d/%d/%d", &i1, &i2, &i3, &i4, &i5, &i6);

But now I have this line:

"f 1//2 2/3/"

You can see that some values are missing. The previous code no longer works: sscanf stops when it expects an integer (%d) and finds a "/" instead. Does anyone know how I should handle this?

Greetings
genzm
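If you'd rather stay with sscanf, one workaround (a minimal sketch, not the only way) is to split the line into whitespace-separated words first and then parse each word on its own, using sscanf's return value (the number of %d conversions that succeeded) to tell the forms apart. parse_vertex is a hypothetical helper name:

```cpp
#include <cstdio>

// Hypothetical helper: parses one vertex word ("1/2/3", "1/2/", "1/2",
// "1//3" or "1") into v, t, n. Fields that are absent stay at -1.
// Returns false if the word does not start with an integer.
bool parse_vertex(const char* tok, int& v, int& t, int& n)
{
    v = t = n = -1;
    // sscanf returns how many %d conversions succeeded before it stopped
    int got = std::sscanf(tok, "%d/%d/%d", &v, &t, &n);
    if (got < 1)
        return false;
    if (got == 1) {
        // Either "v" alone or "v//n": the second %d hit a '/' or the end,
        // so retry with an explicitly empty texture slot.
        std::sscanf(tok, "%d//%d", &v, &n);
    }
    return true;
}
```

Each word after the leading "f" (read, for example, with std::istringstream and operator>>) can be passed in as word.c_str(). Note that %d is lenient: a word like "1abc" is accepted with v = 1.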
Does it have to be done in C, or can you use C++ with its streams, tokenizers, regular expressions, or even complete parsers (such as boost.spirit)?
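As an illustration of the regular-expression option, here is a minimal sketch using C++11 std::regex. The pattern and the helper names parse_with_regex and group_or_minus_one are my own, not from any library. It matches a single slash-separated group such as "1//2" or "2/3/" and reports missing fields as -1:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: -1 when an optional capture group did not match.
static int group_or_minus_one(const std::ssub_match& m)
{
    return m.matched ? std::stoi(m.str()) : -1;
}

// Accepts "v", "v/t", "v/t/", "v//n" and "v/t/n" (indices may be negative)
// and fills v, t, n; absent fields become -1. Returns false on anything else.
bool parse_with_regex(const std::string& tok, int& v, int& t, int& n)
{
    // group 1: vertex, group 2: optional texture, group 3: optional normal
    static const std::regex pattern(R"((-?\d+)(?:/(-?\d+)?(?:/(-?\d+)?)?)?)");
    std::smatch m;
    if (!std::regex_match(tok, m, pattern))
        return false;
    v = std::stoi(m[1].str());
    t = group_or_minus_one(m[2]);
    n = group_or_minus_one(m[3]);
    return true;
}
```

std::regex_match only succeeds if the whole word matches, so malformed input such as "1/x/3" is rejected rather than partially read.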
It is in C++.
Could you maybe give me an example of how this would work? Or a link to a clear explanation?
The strtok function seems to do exactly the trick I want :p
Thanks for the tip.
strtok is a pretty bad approach. The appropriate solution depends on the expected result of the parse: are you constructing an object? Populating a struct? Populating a vector<int>? What happens when you've read 4 out of 6 numbers: are you just building a vector of four ints, or are you indicating that something wasn't provided? In short, that's not enough information.
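The core problem with strtok here is that it treats a run of consecutive delimiters as a single separator, so "36//37" split on '/' gives just "36" and "37", and the empty texture slot disappears (it also modifies its input and keeps hidden state between calls). A splitter that keeps empty fields preserves each field's position. A minimal sketch using std::string::find; split_keep_empty is a hypothetical name:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s on delim, keeping empty fields, so that
// "36//37" -> {"36", "", "37"} and "22/23/" -> {"22", "23", ""}.
std::vector<std::string> split_keep_empty(const std::string& s, char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(s.substr(start));   // last field, may be empty
            return fields;
        }
        fields.push_back(s.substr(start, pos - start));
        start = pos + 1;                         // skip past the delimiter
    }
}
```

Each empty field can then be mapped to -1 and each non-empty one converted with std::stoi.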
Indeed, I've run into a problem with strtok, so here is some more information on what I'm trying to do:
I'm working on a parser that reads data from an .obj file (3D model).
The structure of this file looks like this:
f a/b/c d/e/f g/h/i        // for triangles
f a/b/c d/e/f g/h/i j/k/l  // for quads


The letters refer to the index of vertices, texture coordinates and normals. The parsing of the vertices, textures and normals works perfectly. The problem with the faces is that not all data is always included:
For example, not all files contain normal data. A line would then look like this:
f a/b/ d/e/ g/h/      // for triangles
f a/b/ d/e/ g/h/ j/k/ // for quads


or not all faces have texture coordinates:
f a//c d//f g//i      // for triangles
f a//c d//f g//i j//l // for quads


or the file might contain neither texture nor normal data:
f a d g   // for triangles
f a d g j // for quads


Or the file might contain any combination of the previous lines. All this variation makes it difficult to parse properly.

I have a struct called "face" which looks like this:
typedef struct {
    int i1,i2,i3;
    int n1,n2,n3;
    int m1,m2,m3;
    int t1,t2,t3;
} face;


Of course, I'm loading all the data from the file into this struct. What I want to achieve is that all values which aren't included in the file (like texture coordinates or normals) should be -1.

Any suggestions on how I can do this?
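One way to get "-1 for everything that isn't in the file" without having to set each field by hand is to give the members default values (C++11 default member initializers), so the parser only writes the fields it actually finds. The layout below, with corner and face4, is a hypothetical restructuring of the face struct with one entry per corner of the face:

```cpp
#include <array>

// Hypothetical layout: one entry per corner, holding the indices of "v/t/n".
struct corner {
    int v = -1;   // vertex index
    int t = -1;   // texture-coordinate index, stays -1 for "v//n" or "v"
    int n = -1;   // normal index, stays -1 for "v/t/" or "v"
};

struct face4 {
    std::array<corner, 4> corners;  // each corner is default-constructed to -1s
    int count = 0;                  // parser sets 3 for a triangle, 4 for a quad
};
```

A triangle then simply leaves corners[3] at -1. One caveat about the sentinel: the OBJ format also allows negative indices, which count back from the end of the list read so far (-1 is the most recent element), so -1 can be a legitimate index; 0 is never a valid OBJ index and would be an unambiguous "missing" marker if that matters.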

A manual parse, using nothing beyond the standard library, would look something long and boring like this:

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <tuple>

struct face {
    int i1,i2,i3;
    int n1,n2,n3;
    int m1,m2,m3;
    int t1,t2,t3;
};

// Splits one "a/b/c" group into three ints. Handles "a/b/c", "a/b/", "a//c"
// and "a"; any field that is missing or empty becomes -1.
std::tuple<int, int, int> parse_three_ints(const std::string& s)
{
    std::istringstream buf(s);
    int r1 = -1, r2 = -1, r3 = -1;
    std::string token;
    if(std::getline(buf, token, '/') && !token.empty())
        r1 = std::stoi(token);
    if(std::getline(buf, token, '/') && !token.empty())
        r2 = std::stoi(token);
    if(std::getline(buf, token, '/') && !token.empty())
        r3 = std::stoi(token);
    return std::make_tuple(r1, r2, r3);
}

// Reads the next group from the line. A triangle has no fourth group; in that
// case return -1s instead of re-parsing the previous (stale) word.
std::tuple<int, int, int> next_group(std::istream& buf)
{
    std::string word;
    if(buf >> word)
        return parse_three_ints(word);
    return std::make_tuple(-1, -1, -1);
}

int main()
{
    std::istringstream input("f 1/2/3 4/5/6 7/8/9\n"
                             "f 10/11/12 13/14/15 16/17/18 19/20/21\n"
                             "f 22/23/ 24/25/ 26/27/\n"
                             "f 28/29/ 30/31/ 32/33/ 34/35/\n"
                             "f 36//37 38//39 40//41\n"
                             "f 42//43 44//45 46//47 48//49\n"
                             "f 50 51 52\n"
                             "f 53 54 55 56\n");
    std::vector<face> result;
    std::string line;
    while(std::getline(input, line)) // process line by line
    {
        std::istringstream buf(line);
        std::string word;
        buf >> word;
        if(word != "f")
        {
            std::cout << "Parse error, line begins with " << word << '\n';
            break;
        }
        // prepare the new face: one group per corner (all -1 if absent)
        face f;
        std::tie(f.i1, f.i2, f.i3) = next_group(buf);
        std::tie(f.n1, f.n2, f.n3) = next_group(buf);
        std::tie(f.m1, f.m2, f.m3) = next_group(buf);
        std::tie(f.t1, f.t2, f.t3) = next_group(buf);
        result.push_back(f);
    }

    // output
    for(const face& f: result)
        std::cout << "{ " << f.i1 << ',' << f.i2 << ',' << f.i3 << '\n'
                  << "  " << f.n1 << ',' << f.n2 << ',' << f.n3 << '\n'
                  << "  " << f.m1 << ',' << f.m2 << ',' << f.m3 << '\n'
                  << "  " << f.t1 << ',' << f.t2 << ',' << f.t3 << " }\n";
}

online demo: http://liveworkspace.org/code/3975xd

But we have boost.spirit for this kind of thing (which is much faster, too).

It can be done more elegantly (the grammar is worth structuring properly), but here's my first attempt, which works for this test:

#include <iostream>
#include <string>
#include <vector>

#define FUSION_MAX_VECTOR_SIZE 12
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_int.hpp>
#include <boost/spirit/include/qi_no_skip.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

struct face {
    int i1,i2,i3;
    int n1,n2,n3;
    int m1,m2,m3;
    int t1,t2,t3;
};

BOOST_FUSION_ADAPT_STRUCT(
    face,
    (int, i1) (int, i2) (int, i3)
    (int, n1) (int, n2) (int, n3)
    (int, m1) (int, m2) (int, m3)
    (int, t1) (int, t2) (int, t3)
)

// have to use this macro instead of the regular auto for named micro-parsers, until Spirit V3
#define BOOST_SPIRIT_AUTO(domain_, name, expr)                                  \
    typedef BOOST_TYPEOF(expr) name##expr_type;                                 \
    BOOST_SPIRIT_ASSERT_MATCH(boost::spirit::domain_::domain, name##expr_type); \
    BOOST_AUTO(name, boost::proto::deep_copy(expr));

int main()
{
    std::string input("f 1/2/3 4/5/6 7/8/9\n"
                      "f 10/11/12 13/14/15 16/17/18 19/20/21\n"
                      "f 22/23/ 24/25/ 26/27/\n"
                      "f 28/29/ 30/31/ 32/33/ 34/35/\n"
                      "f 36//37 38//39 40//41\n"
                      "f 42//43 44//45 46//47 48//49\n"
                      "f 50 51 52\n"
                      "f 53 54 55 56\n");
    std::vector<face> result;

    BOOST_SPIRIT_AUTO(qi, optint, qi::no_skip[qi::int_] | qi::attr(-1));
    BOOST_SPIRIT_AUTO(qi, triple, qi::int_ >> ( ('/' >> optint >> '/') | qi::attr(-1) ) >> optint);

    qi::phrase_parse(input.begin(), input.end(),
                     *(   ('f' >> triple >> triple >> triple >> triple )
                        | ('f' >> triple >> triple >> triple >> qi::attr(-1) >> qi::attr(-1) >> qi::attr(-1) )
                      ),
                     ascii::space, result );

    for(face& f: result)
        std::cout << "{ " << f.i1 << ',' << f.i2 << ',' << f.i3 << '\n'
                  << "  " << f.n1 << ',' << f.n2 << ',' << f.n3 << '\n'
                  << "  " << f.m1 << ',' << f.m2 << ',' << f.m3 << '\n'
                  << "  " << f.t1 << ',' << f.t2 << ',' << f.t3 << " }\n";
}
