Converting compound literal array to C++ compliant form.

I have some C code I'm adapting to compile as C++. Currently it compiles as is only in Clang, but I need the code to be compatible with GCC and MSVC. The only complication is that the original code makes heavy use of array assignment through compound literals and designated initializers.

The code in question:
typedef char tagset[TAG_LAST]; // ends up having length of 151

typedef enum
{
	NAMESPACE_HTML,
	NAMESPACE_SVG,
	NAMESPACE_MATHML
} GumboNamespaceEnum;

#define TAG(tag) [TAG_##tag] = (1 << NAMESPACE_HTML)

InsertionLocation foo()
{
   //...
   if(nodeTagInset(retval.target, 
                   (tagset){TAG(TABLE), TAG(TBODY),
                            TAG(TFOOT), TAG(THEAD), TAG(TR)}))
       return retval;
}


Is a variadic template function my best choice for converting this to C++, at least until I can refactor the code into more of a C++ style?
The variadic function I've come up with so far is this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
	
template<typename T, typename ...Args>
tagset &Initialize_tagset(T first, Args... args)
{
	static tagset *array;
	array = &Initialize_tagset(args...);
	*array[first] = (1 << NAMESPACE_HTML);
	return *array;
}
	
template<typename T>
tagset &Initialize_tagset(T tag)
{
	static tagset *array;
	*array[tag] = (1 << NAMESPACE_HTML);
	return *array;
}


Would this end up providing the same functionality at runtime?
This line looks like gibberish to me. There is a missing preprocessor directive somewhere.
TAG(tag) [TAG_##tag] = (1 << NAMESPACE_HTML)

However, keep in mind that the static pointer declared on line 14 (in the single-argument overload)
a.) has no array storage associated with it
b.) is independent of the pointer on line 5 (in the variadic overload).

Line 6 will recurse through the argument pack, doing nothing, until a single argument remains, at which point the single-argument overload will be called and the code will cause undefined behavior.
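
For comparison, here is a minimal sketch of how a variadic version could be made to work, assuming the tag set is held in a std::array that is returned by value instead of being reached through static pointers. The names Tag, TagSet, set_tags and make_tagset are hypothetical, and the short enums are only stand-ins for the real ones, which the thread does not show in full.

```cpp
#include <array>
#include <cstddef>

// Stand-in enums (assumption): the real tag list has about 151 entries.
enum GumboNamespaceEnum { NAMESPACE_HTML, NAMESPACE_SVG, NAMESPACE_MATHML };
enum Tag { TAG_TABLE, TAG_TBODY, TAG_TFOOT, TAG_THEAD, TAG_TR, TAG_LAST };

using TagSet = std::array<char, TAG_LAST>;

// Base case: no tags left to set.
inline void set_tags(TagSet&) {}

// Recursive case: set one slot, then recurse on the rest of the pack.
// Every call writes into the same caller-owned array.
template <typename... Rest>
void set_tags(TagSet& tags, Tag first, Rest... rest)
{
    tags[first] = static_cast<char>(1 << NAMESPACE_HTML);
    set_tags(tags, rest...);
}

template <typename... Tags>
TagSet make_tagset(Tags... tags)
{
    TagSet result{};            // value-initialised: every slot starts at 0
    set_tags(result, tags...);
    return result;              // returned by value, so nothing dangles
}
```

A call such as make_tagset(TAG_TABLE, TAG_TR) would then yield an array with only those two slots set, which is roughly what the compound literal does in C.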

Try instead by list-initializing a std::vector.
http://en.cppreference.com/w/cpp/language/list_initialization
I fixed the missing define before TAG.

The problem is that the tagset array I'm converting from C to C++ needs specific values in specific slots for the rest of the code to work.

With list initialization can I assign values to specific slots of the array?

What's going on with the compound literal is
(tagset){[0] = 1, [5] = 1, [10] = 1,  [25] = 1} // These aren't the literal values being
                                                // assigned. It's just an example of what
                                                // the above code is essentially doing.


That type of syntax works in Clang, and it actually hid incompatibilities with other compilers for a while.

I thought since both pointers are the same type I'd be able to assign them to each other? I'm thinking that the above code would cause problems as well.
Something like this, perhaps:

#include <iostream>
#include <array>
#include <initializer_list>
#include <utility>
#include <stdexcept>

template < typename T >
using position_value_pairs = std::initializer_list< std::pair<std::size_t,T> > ;

template < typename T, std::size_t N >
std::array<T,N> make_tags( position_value_pairs<T> ilist )
{
    std::array<T,N> tags{} ;

    for( const auto& pair : ilist )
    {
        if( pair.first >= N ) throw std::out_of_range( "invalid position" ) ;
        else tags[pair.first] = pair.second ;
    }

    return tags ;
}

int main()
{
    const auto my_tags = make_tags<char,20>( { { 3, 'A' }, { 7, 'B' }, { 12, 'C' } } ) ;
                                            //  [3]='A'     [7]='B'     [12]='C'

    for( char c : my_tags ) std::cout << ( c == 0 ? '.' : c ) ;
    std::cout << '\n' ;
}
Thanks for the help. Now, if I wanted to I could create the array, and return it from a lambda at the method call right? Thus possibly avoiding the overhead of creating arrays for checks that might not be done.

Of course, I stopped at brace initialization, without taking advantage of the initializer_list template class.
> I wanted to I could create the array, and return it from a lambda at the method call right?
> Thus possibly avoiding the overhead of creating arrays for checks that might not be done.

Even if it is written as a function, the optimiser knows enough about what is going on; it would elide unnecessary copying of arrays and eliminate checks that need not be done as dead code.

g++ 6.3 with -O3 -march=native

#include <array>
#include <initializer_list>
#include <utility>
#include <stdexcept>

template < typename T >
using position_value_pairs = std::initializer_list< std::pair<std::size_t,T> > ;

template < typename T, std::size_t N >
std::array<T,N> make_tags( position_value_pairs<T> ilist )
{
    std::array<T,N> tags{} ;

    for( const auto& pair : ilist )
    {
        if( pair.first >= N ) throw std::out_of_range( "invalid position" ) ;
        else tags[pair.first] = pair.second ;
    }

    return tags ;
}

int foo()
{
    auto my_tags = make_tags<char,20>( { { 3, 'A' }, { 7, 'B' }, { 12, 'C' } } ) ;
    return +my_tags[3] + my_tags[5];
}

/*
foo():
        mov     eax, 65
        ret
*/

int bar()
{
    auto my_tags = make_tags<char,20>( { { 3, 'A' }, { 7, 'B' }, { 12, 'C' } } ) ;
    for( int i : { 5, 9, 13, 17 } ) if( my_tags[i] == 0 ) my_tags[i] = 1 ;
    return +my_tags[3] + my_tags[5];
}

/*
bar():
        mov     QWORD PTR [rsp-40], 0
        mov     BYTE PTR [rsp-37], 65
        movsx   eax, BYTE PTR [rsp-37]
        mov     BYTE PTR [rsp-35], 1
        movsx   edx, BYTE PTR [rsp-35]
        add     eax, edx
        ret
*/

// a really good optimiser could have generated (assuming that as before 'A' == 65):
/*
bar():
        mov     eax, 66
        ret
*/
// g++ is not terrible; copying of the array and redundant out of range checks have still been elided 

https://godbolt.org/g/BLneB8

Write simple, transparent code and the optimiser understands precisely what is going on.

Write fiendishly clever macros with shift operations and the like (typical poorly written C code), and chances are that the generated code is less efficient (the optimiser can't figure out what it is that the programmer is trying to do, so it plays safe by not attempting to optimise at all).
This is the solution I ended up with. I kind of need a lambda, since there is one function that must be 300 lines long using those compound literals.

Here it is:

typedef char tagset[TAG_LAST];
	
template <typename T>
using position_value_pairs = std::initializer_list<std::pair<std::size_t,T>>;
	
template<typename T>
char *makeTags(tagset tags, position_value_pairs<T> ilist)
{
    for(const auto &pair : ilist)
        tags[pair.first] = pair.second;
    return tags;
}

#define MAKE_TAGS(...) []{ tagset tags; return makeTags<char>(tags, {__VA_ARGS__}); }()

#define TAG(tag) { TAG_##tag, (1 << NAMESPACE_HTML) }

InsertionLocation foo()
{
   // ...
   if(nodeTagInset(retval.target, MAKE_TAGS(TAG(TABLE), TAG(TBODY))))
       return retval;
}


Now I can get on with supporting other compilers and, over time, refactor out the parts of that HTML parser I don't need, and hopefully replace the defines with a std::set.
> there is one function that must be 300 lines long using those compound literals.

Sounds like a potential candidate for code generation....

Your lambda function returns the address of a local variable. The array's lifetime is not extended; such a function will always return a dangling pointer.

Use a std::array instead of a char*; the std::array will avoid this problem with zero overhead.
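
A minimal sketch of that suggestion, assuming the existing functions can keep taking a plain pointer: the tag set is built as a std::array returned by value, and data() is handed to the pointer-taking function within the same expression. The names TagArray, makeTagArray, kHtmlBit and tagInSet are hypothetical, tagInSet is only a stand-in for the real parser functions, and the short enums stand in for the real ones.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <utility>

// Stand-in enums (assumption): the real tag list is much longer.
enum GumboNamespaceEnum { NAMESPACE_HTML, NAMESPACE_SVG, NAMESPACE_MATHML };
enum Tag { TAG_TABLE, TAG_TBODY, TAG_TFOOT, TAG_THEAD, TAG_TR, TAG_LAST };

using TagArray = std::array<char, TAG_LAST>;

constexpr char kHtmlBit = static_cast<char>(1 << NAMESPACE_HTML);

// Builds the array by value; slots not named in the list stay 0.
inline TagArray makeTagArray(
    std::initializer_list<std::pair<std::size_t, char>> ilist)
{
    TagArray tags{};
    for (const auto& pair : ilist)
    {
        if (pair.first >= tags.size())
            throw std::out_of_range("invalid position");
        tags[pair.first] = pair.second;
    }
    return tags;
}

// Hypothetical stand-in for an existing function that takes a raw pointer.
inline bool tagInSet(std::size_t tag, const char* tags)
{
    return tags[tag] != 0;
}

inline bool example()
{
    // The temporary std::array lives until the end of the full expression,
    // so the pointer from data() is valid for the whole call to tagInSet.
    return tagInSet(TAG_TR,
                    makeTagArray({{TAG_TABLE, kHtmlBit}, {TAG_TR, kHtmlBit}}).data());
}
```

The pointer must not be stored past that expression; if the set has to outlive the call, keeping the std::array in a named variable and passing its data() would be the safer choice.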

I get that the char* could cause issues if the lifetime has to exist longer than the methods using the char*. I pass it into the function and it is immediately tested against the pointer tracking the parser's current location in the HTML passed in. If it had to live longer, I'd add it to the stack allocator that the code base is using. I mainly want to avoid having to rewrite all the methods taking char* to accept std::array. Anyway, I'm going to be refactoring this out soon.
> I get that the char* could cause issues if the lifetime has to exist longer than the methods using the char*

What makes you think the array lives even that long?

It doesn't matter what the char* binds to; the array's lifetime ends at the end of the innermost containing block, i.e., the closing brace of the lambda's body.

Using the result of the function is undefined behavior, so you better "refactor" it soon. ;)
Yeah, ended up replacing it with some string manipulation. I didn't need a full html dom parser, since the html is always the same, and I can just jump through the string till I find a tag.