MinGW custom codecvt facet VTABLE error

Jan 23, 2010 at 5:17pm
Hey gurus,

I've finally found the little beast I've been looking for to play with UTF conversions: the std::codecvt facets.

I've written a custom template class in an .hpp file (that derives from std::codecvt, of course) and everything seems to compile smoothly but then I get these errors from the linker:
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x10): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_out(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, char const*, char const*, char const*&, wchar_t*, wchar_t*, wchar_t*&) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x14): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_unshift(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, wchar_t*, wchar_t*, wchar_t*&) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x18): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_in(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, wchar_t const*, wchar_t const*, wchar_t const*&, char*, char*, char*&) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x1c): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_encoding() const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x20): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_always_noconv() const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x24): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_length(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, wchar_t const*, wchar_t const*, unsigned int) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x28): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_max_length() const'
collect2: ld returned 1 exit status

This is infuriating me because I have defined all these functions, exactly as specified, but the linker doesn't seem to like it.

For example, here's my do_length() method:
        virtual int do_length(
                state_type&  state,
          const extern_type* from_begin,
          const extern_type* from_end,
                size_t       max
                )            const
          {
          size_t dist = from_end - from_begin;
          return (int)((max < dist) ? max : dist);
          }

Does anyone know what is going on? How do I get the linker to play nice with the vtable?

Thank you for your time.
Jan 24, 2010 at 9:36pm
Have you tried a different compiler? It could be a bug. Would it be possible for you to post the entire code? I'd like to try compiling it myself.
Jan 24, 2010 at 11:58pm
The only time I've gotten those kinds of errors is when I have circular inclusion somewhere, but I doubt you'd have those and not notice them.
Jan 25, 2010 at 2:12am
It appears to be a compiler problem. Borland's bcc32.exe had no problem with it.

I found something online that says GCC has problems with pure virtual functions that are only defined in header files, and I tried putting them in a source file, but as the whole class is a template class, I don't know what better to do. GCC still complains. (Hence this post.)

And no, there are no circular dependencies... :-)

I'll post the whole thing once I'm done, and helios can tell me how to make the GCC behave.
Jan 25, 2010 at 3:36pm
Silly question... and it may not be the solution to your problem. But is the codecvt a C or C++ compiled library?

If C, perhaps try to wrap the header usage and any function forward declarations for that library with:
 #ifdef __cplusplus
 extern "C" {
 #endif 

   // Library include and forward declarations here
 
 #ifdef __cplusplus
 }
 #endif  
Jan 25, 2010 at 4:24pm
Jan 26, 2010 at 2:53am
The more I play with it, the more I think it unlikely that I'll ever make it work.

Borland's C++ doesn't seem to use the codecvt class. So, while it compiles cleanly, it doesn't do anything (not even throw errors).

I can get GCC to compile now, thanks to
Reading UTF-8 with C++ streams
http://www.codeproject.com/KB/stl/utf8facet.aspx
but it seems like so much ad-hockery that I'm thoroughly disgusted. I can't make my type a template on the internal type (which may or may not be wchar_t) without all the obnoxious VTABLE crap, and even when I get it to compile when I finally say:
 
  outf << s << endl;

the program crashes with a bad_cast exception. What!? Is it because I used my own type instead of std::mbstate_t? (None of my custom codecvt class's methods are getting called, so it is happening before.)
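One way a bad_cast can show up around locale code, before any facet method runs, is facet lookup itself: std::use_facet throws std::bad_cast when the locale holds no facet of the requested type. Whether that is what happened here is not established by the thread. The sketch below only shows the mechanism; demo_facet, lookup_throws, with_demo_facet and demo_facet_lookup are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <typeinfo>

// Hypothetical facet type, used only to illustrate how facet lookup works.
class demo_facet : public std::locale::facet {
public:
    static std::locale::id id;  // every facet type needs its own id
    explicit demo_facet(std::size_t refs = 0) : std::locale::facet(refs) {}
    int value() const { return 42; }
};
std::locale::id demo_facet::id;

// True if looking up demo_facet in `loc` throws std::bad_cast.
bool lookup_throws(const std::locale& loc) {
    try {
        (void)std::use_facet<demo_facet>(loc);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}

// Returns a copy of `base` with a demo_facet installed; the locale owns it.
std::locale with_demo_facet(const std::locale& base) {
    return std::locale(base, new demo_facet);
}

// Usage: has_facet is the non-throwing way to ask the same question.
void demo_facet_lookup() {
    std::locale plain = std::locale::classic();
    assert(!std::has_facet<demo_facet>(plain));
    assert(lookup_throws(plain));

    std::locale extended = with_demo_facet(plain);
    assert(std::has_facet<demo_facet>(extended));
    assert(std::use_facet<demo_facet>(extended).value() == 42);
}
```

A stream whose traits name a custom state type looks up a codecvt keyed on that state type, so a locale that was never given such a facet would fail in this way.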

Maybe I'll just derive my own fstream class and make it do UTF-8/CESU-8/UTF-16[BE/LE]/UTF-32 conversions without all the silly nonsense.

AAARRRGGHHH!
Hence the downfall of template programming.
I guess I'll complicate my code and use the std::mbstate_t and see if I can't get it to work... Oh wait, emilio already did that! Might as well use his code. Even though I can't stand it.

What a bunch of cruft.
Jan 26, 2010 at 12:56pm
Is whatever advantage this will bring really worth all this trouble?
Jan 27, 2010 at 4:14am
A guilt-free, inline, automatic UTF filter? Of course. :-)

I just found a wonderful site:
http://www.unc.edu/depts/case/pgi/pgC++_lib/stdlibug/def_8655.htm

If the nice explanation there works, I'll post my results so you can all use it.
Jan 27, 2010 at 10:14am
Remember when I said "be clever with your algorithms, not with your syntax" (or something to that effect)? Wouldn't all this effort be better spent on a nice, simple, and robust conversion function?
Jan 27, 2010 at 12:58pm
I already have those.

The problem is that I have to know something about the stream to read it. That is, once I open it, I have to check what kind of stream it is, then everywhere in my code that I read data from said stream, I must select the appropriate transformer and use it. That's a lot of work just to read from any UTF stream.

I'd rather simply open the stream and begin reading.
Or open the stream, specify an encoding, and begin writing.
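
The "check what kind of stream it is" step is commonly done by sniffing a byte order mark. A minimal sketch, assuming the first bytes of the stream are already in a std::string; utf_encoding, starts_with, detect_bom and demo_detect_bom are hypothetical names for this illustration, and a stream without a BOM simply comes back as unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for this sketch; they do not come from any library.
enum class utf_encoding { unknown, utf8, utf16_be, utf16_le, utf32_be, utf32_le };

// True if `bytes` begins with the `n` byte values in `mark`.
static bool starts_with(const std::string& bytes, const unsigned char* mark,
                        std::size_t n) {
    if (bytes.size() < n) return false;
    for (std::size_t i = 0; i < n; ++i)
        if (static_cast<unsigned char>(bytes[i]) != mark[i]) return false;
    return true;
}

// Inspects the leading bytes for a Unicode byte order mark.
utf_encoding detect_bom(const std::string& bytes) {
    static const unsigned char bom_utf32_be[] = {0x00, 0x00, 0xFE, 0xFF};
    static const unsigned char bom_utf32_le[] = {0xFF, 0xFE, 0x00, 0x00};
    static const unsigned char bom_utf8[]     = {0xEF, 0xBB, 0xBF};
    static const unsigned char bom_utf16_be[] = {0xFE, 0xFF};
    static const unsigned char bom_utf16_le[] = {0xFF, 0xFE};

    // UTF-32 is tested first because FF FE 00 00 also starts with FF FE.
    // (UTF-16LE text beginning with U+0000 is misread as UTF-32LE here.)
    if (starts_with(bytes, bom_utf32_be, 4)) return utf_encoding::utf32_be;
    if (starts_with(bytes, bom_utf32_le, 4)) return utf_encoding::utf32_le;
    if (starts_with(bytes, bom_utf8, 3))     return utf_encoding::utf8;
    if (starts_with(bytes, bom_utf16_be, 2)) return utf_encoding::utf16_be;
    if (starts_with(bytes, bom_utf16_le, 2)) return utf_encoding::utf16_le;
    return utf_encoding::unknown;
}

// Usage: the caller still has to pick a decoder based on the result.
void demo_detect_bom() {
    assert(detect_bom(std::string("\xEF\xBB\xBF" "hi")) == utf_encoding::utf8);
    assert(detect_bom("plain ASCII") == utf_encoding::unknown);
}
```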

The C++ standard provides the codecvt facet stuff for that very purpose: transparent I/O transformation. Its structure localizes all the messy details into one spot, so you don't have to deal with it elsewhere. That's a clever algorithm.
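
To make the "one spot" idea concrete, here is a minimal sketch of a facet imbued into a file stream. It is not the UTF filter discussed in this thread: it derives from std::codecvt<char, char, std::mbstate_t> (a specialization every standard library must provide) and merely upper-cases bytes on output, so the mechanism is visible without any encoding logic. upper_codecvt, convert_out and write_upper are hypothetical names, and how faithfully a given library's file buffer honors the facet is worth checking on your own toolchain.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: upper-cases bytes on their way out to the file.
class upper_codecvt : public std::codecvt<char, char, std::mbstate_t> {
public:
    explicit upper_codecvt(std::size_t refs = 0)
        : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
    // Returning false tells the stream buffer that do_out() must be called.
    bool do_always_noconv() const noexcept override { return false; }

    result do_out(std::mbstate_t&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_next != to_end) {
            unsigned char c = static_cast<unsigned char>(*from_next++);
            *to_next++ = static_cast<char>(std::toupper(c));
        }
        return from_next == from_end ? ok : partial;
    }
};

// Runs text through a facet's out(), roughly what a file buffer does on flush.
std::string convert_out(const std::codecvt<char, char, std::mbstate_t>& cvt,
                        const std::string& text) {
    std::mbstate_t state = std::mbstate_t();
    std::string bytes(text.size() * cvt.max_length(), '\0');
    const char* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r =
        cvt.out(state, text.data(), text.data() + text.size(), from_next,
                &bytes[0], &bytes[0] + bytes.size(), to_next);
    if (r == std::codecvt_base::noconv) return text;  // facet changed nothing
    bytes.resize(static_cast<std::size_t>(to_next - &bytes[0]));
    return bytes;
}

// Writes `text` to `path`; the conversion happens inside the stream buffer,
// so the calling code contains no transformation logic at all.
void write_upper(const char* path, const std::string& text) {
    std::ofstream file;
    // Imbue before opening so the buffer sees the facet from the start.
    file.imbue(std::locale(std::locale::classic(), new upper_codecvt));
    file.open(path);
    file << text << '\n';
}
```

The point of the design is the last function: every `<<` on that stream goes through the facet without the caller selecting a transformer each time.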

I'm trying to learn how to use it.
Jan 27, 2010 at 1:58pm
Alright then.

EDIT: Oh, I just remembered. What MinGW did you try compiling with? 3.x has given me quite a few headaches, that's why I'm asking. You should try 4.x if that's the case.
Last edited on Jan 27, 2010 at 3:06pm
Jan 27, 2010 at 4:38pm
Thanks. I've been using 4.3.0 all this time, alas.
Feb 3, 2010 at 4:29am
Well, this is what I have learned.

The codecvt facet can be customized, and I figured out how to do it in GCC... I first had to provide a partial specialization of std::codecvt for my state type, derived from the libstdc++ (GLIBCXX) base class __codecvt_abstract_base, after which I can derive my own facet from it! Like so:
#ifdef __GLIBCXX__
namespace std
  {
  using duthomhas::utf::utf_state_t;

  template <typename CharType>
  class codecvt <CharType, char, utf_state_t> :
    public __codecvt_abstract_base <CharType, char, utf_state_t>
    {
    public:
      static locale::id id;

    protected:
      explicit codecvt( size_t refs = 0 ):
        __codecvt_abstract_base <CharType, char, utf_state_t> ( refs )
        { }
    };
  }
#endif 
I didn't do that initially, hence the linker was giving me the vtable errors -- meaning that essentially there was a disconnect between a (missing) base class implementation and my derived class over the templated state type utf_state_t.
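
For contrast, here is a minimal sketch of the std::mbstate_t route mentioned earlier in the thread. It gives up the custom state type and derives from std::codecvt<wchar_t, char, std::mbstate_t>, a specialization the standard requires, so the base-class virtuals exist and nothing in namespace std has to be specialized. Only output is handled, and each wchar_t is assumed to hold one whole code point (true where wchar_t is 32 bits; with a 16-bit wchar_t, surrogate pairs are rejected instead of combined). utf8_out_codecvt and to_utf8 are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical output-only facet: wchar_t code points -> UTF-8 bytes.
class utf8_out_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit utf8_out_codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }    // variable width
    int do_max_length() const noexcept override { return 4; }  // bytes per wchar_t

    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        for (; from_next != from_end; ++from_next) {
            const std::uint32_t cp = static_cast<std::uint32_t>(*from_next);
            // Assumption: surrogates and out-of-range values are errors.
            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return error;
            const int extra = cp < 0x80 ? 0 : cp < 0x800 ? 1 : cp < 0x10000 ? 2 : 3;
            if (to_end - to_next < extra + 1) return partial;
            static const unsigned char lead[] = {0x00, 0xC0, 0xE0, 0xF0};
            *to_next++ = static_cast<char>(lead[extra] | (cp >> (6 * extra)));
            for (int i = extra - 1; i >= 0; --i)
                *to_next++ = static_cast<char>(0x80 | ((cp >> (6 * i)) & 0x3F));
        }
        return ok;
    }
};

// Usage: drive the facet directly, without any stream involved.
std::string to_utf8(const std::wstring& text) {
    utf8_out_codecvt cvt(1);  // refs = 1: lifetime is managed by this scope
    std::mbstate_t state = std::mbstate_t();
    std::string bytes(text.size() * 4, '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    if (cvt.out(state, text.data(), text.data() + text.size(), from_next,
                &bytes[0], &bytes[0] + bytes.size(), to_next)
        != std::codecvt_base::ok)
        throw std::range_error("input is not encodable as UTF-8");
    bytes.resize(static_cast<std::size_t>(to_next - &bytes[0]));
    return bytes;
}
```

The cost of this route is that any conversion state has to fit inside the opaque std::mbstate_t.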


Also, I have learned that all this stuff is left implementation-defined by the C++ standard -- meaning that I can make it work with some specific compiler/standard library combination, but that it is not possible to do it portably.

Which leaves me where I started: the STL is broken. It provides no internationalization support other than that provided by incompatible (vendor-specific) extensions, and it does not let me provide my own extension (using the proper thing) in a portable way.


At this point, I'm not sure what I will do. Perhaps I will write an iostream wrapper that will do it properly.

Fooey.