What is the size of the string?

The size of a string in DEV 5.11 is 8 bytes using the sizeof() operator.
But in Microsoft Visual Studio the size of the string is 40 bytes, also using the sizeof() operator.

Can anybody tell me why this difference occurs? ;)
You're going to need to show some code that illustrates the problem.

As a guess it looks like in one instance you're getting the size of a pointer and in the other possibly the size of an array of char.
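For reference, here is a minimal sketch of the kind of code that would settle it. The helper name print_sizes is made up for illustration; it is not the original poster's code. It prints the three things that tend to get mixed up: the size of a pointer, the size of a char array, and the size of a std::string object.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper, not code from the original poster.
inline void print_sizes()
{
    const char* ptr = "hello";   // pointer to a string literal
    char arr[] = "hello";        // array of 6 chars: 5 letters plus '\0'
    std::string str = "hello";   // library string object

    static_assert(sizeof(arr) == 6, "the array includes the terminator");

    std::cout << "sizeof(ptr) = " << sizeof(ptr) << '\n'   // width of a pointer
              << "sizeof(arr) = " << sizeof(arr) << '\n'   // bytes in the array
              << "sizeof(str) = " << sizeof(str) << '\n'   // depends on the library
              << "str.size()  = " << str.size()  << '\n';  // number of characters
}
```

Note that sizeof(str) is fixed at compile time and does not change with the text stored in the string; str.size() is the call that reports the length.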

Could well be SSO. SSO, or "Short String Optimization", is a possible optimization that can be used to remove heap allocations for small strings.

On Dev 5.11, probably using 32-bit MinGW or something, the string class likely doesn't have that optimization. Hence, my guess is all it stores is two things: a pointer to an array on the heap (4 bytes), and an integer to keep track of the length of said array (another 4 bytes). (Normally I'd expect there to be a third var, for capacity, but I have no idea where that's gone...)

Visual Studio, however, may be using SSO in that circumstance. I won't go into the details, but basically the class (in addition to the size and capacity vars) has a union consisting of the pointer to the heap and a (32 byte?) array to store the string. If the length of the string is less than 32, it can store it locally in the array and get rid of the heap allocation altogether. Only once it gets longer than that does it have to use a pointer instead.
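One way to poke at this on your own compiler is to check whether the characters live inside the string object itself. The sketch below is only an experiment, with a made-up helper name (stored_inline); it relies on nothing but data() and sizeof, and what it reports for short strings depends entirely on the library.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true when s.data() points into the bytes of the
// string object itself, i.e. the characters are not in a separate heap block.
inline bool stored_inline(const std::string& s)
{
    const char* first = reinterpret_cast<const char*>(&s);
    const char* last  = first + sizeof(s);
    const char* data  = s.data();
    std::less<const char*> before;   // well-defined order even for unrelated pointers
    return !before(data, first) && before(data, last);
}

inline void demo_stored_inline()
{
    // 100 characters cannot fit inside any of the string objects discussed
    // in this thread, so this one has to live on the heap.
    std::string big(100, 'x');
    assert(!stored_inline(big));
}
```

Calling stored_inline on a short string such as "hi" should return true on a library that uses SSO and false on one that does not.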

Keep in mind I've probably completely fluffed the explanation/gotten it wrong somehow; look up SSO to get a better idea (I'm just doing this off of vague memories on the top of my head). Still, it's the first explanation that comes to mind.
Anatomy of a C++ string is a great question.

In general, a string is effectively a union between
* an array of char and a length
* vector<char> (allocator, begin pointer, end pointer or size, capacity pointer or size)

Some of the four elements of the vector usually stick out of the union, so that the code can tell whether a given string object is using the array or the vector; but see below for a hack used by libc++.

There are only three non-obsolete C++ standard library implementations; here's what they do:

MSVC 2015 for x64 target, Release mode: 32 bytes
0 bytes std::allocator<char> (this allocator is stateless)
16 byte union between:
* a 16-byte char array (to hold small strings, max small string length is 15)
* an 8-byte pointer (to point to long strings)
8 bytes for current length of the string
8 bytes for current capacity of the string

16+8+8 = 32

MSVC 2015 x64 target, Debug mode: 40 bytes (sounds like what you saw)
same as above, except there is an extra 8-byte pointer after the allocator and before the union. This pointer points to something MS calls "container proxy", which is used to do bounds checking on iterators.
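To make the arithmetic concrete, here is a stand-in struct with the same field sizes as listed above. The type and member names are invented for illustration and are not Microsoft's; the size checks only apply when pointers are 8 bytes wide.

```cpp
#include <cstddef>

// Illustrative stand-in only: mirrors the field sizes listed above,
// not the names or the code of the real implementation.
struct MsvcLikeString {
    union {
        char  small[16];    // short strings: up to 15 chars plus terminator
        char* heap;         // long strings: pointer to heap storage
    } storage;
    std::size_t size;       // current length
    std::size_t capacity;   // current capacity
};

// Debug-mode variant: one extra pointer (the "container proxy") in front.
struct MsvcLikeDebugString {
    void*          proxy;
    MsvcLikeString rest;
};

// Checked only on targets with 8-byte pointers.
static_assert(sizeof(void*) != 8 || sizeof(MsvcLikeString) == 32,
              "16 + 8 + 8 = 32");
static_assert(sizeof(void*) != 8 || sizeof(MsvcLikeDebugString) == 40,
              "8 + 32 = 40");
```

The empty allocator does not show up here because it takes no space in the real object either.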

GNU libstdc++ 7.1, x64 target: 32 bytes
0 bytes std::allocator<char> (this allocator is stateless)
8 bytes pointer to the beginning of the string (for small strings, it points just 16 bytes down)
8 bytes for current length of the string
16 bytes union between
* a 16-byte char array (to hold small strings, max small string length is 15)
* 8 bytes for current capacity of the string

8+8+16 = 32
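The same layout as a stand-in struct, again with invented names rather than libstdc++'s own. It also shows the "points just 16 bytes down" detail: for a short string the data pointer simply refers to the local buffer inside the same object.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in only, not the real libstdc++ code.
struct GnuLikeString {
    char*       data;            // points at u.local_buf for short strings
    std::size_t size;            // current length
    union {
        char        local_buf[16];   // short strings: up to 15 chars plus terminator
        std::size_t capacity;        // only meaningful when data is on the heap
    } u;

    GnuLikeString() : data(u.local_buf), size(0) { u.local_buf[0] = '\0'; }

    // Copying would leave data pointing into the other object, so the
    // sketch forbids it instead of implementing it.
    GnuLikeString(const GnuLikeString&) = delete;
    GnuLikeString& operator=(const GnuLikeString&) = delete;

    bool is_local() const { return data == u.local_buf; }
};

// Checked only on targets with 8-byte pointers.
static_assert(sizeof(void*) != 8 || sizeof(GnuLikeString) == 32, "8 + 8 + 16 = 32");
static_assert(sizeof(void*) != 8 || offsetof(GnuLikeString, u) == 16,
              "the local buffer starts 16 bytes into the object");

inline void demo_gnu_like()
{
    GnuLikeString s;
    assert(s.is_local());   // a fresh, empty string uses the in-object buffer
}
```

Because the pointer always refers to valid characters, reading the string never needs a short/long branch; only growing it does.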

LLVM libc++ 4.0, x64 target: 24 bytes
0 bytes std::allocator<char> (this allocator is stateless)
24 bytes union between
* short string representation, consisting of
** one byte size multiplied by 2(!)
** a 23-byte char array (to hold small strings, max small string length is 22)
* long string representation, consisting of
** 8-byte capacity with the least significant bit always set
** 8-byte size
** 8-byte pointer to the start of the string

(to check if the string is long or short, libc++ looks at the least significant bit of the first byte of the union: short string size is always recorded doubled, so for short strings that bit is always clear, while long string capacity always has that bit purposefully set)

1+23 == 8 + 8 + 8 == 24
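The bit trick is easier to see in code. This is a sketch of the encoding as described above, with made-up names; it assumes a little-endian target, so that the first byte of the union is the low byte of the capacity field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for the 24-byte union, not the real libc++ code.
union LibcxxLikeRep {
    struct { unsigned char size_x2; char data[23]; } s;            // short form
    struct { std::size_t cap; std::size_t size; char* data; } l;   // long form
};
static_assert(sizeof(void*) != 8 || sizeof(LibcxxLikeRep) == 24,
              "1 + 23 == 8 + 8 + 8 == 24");

// Short form: the size is stored doubled, so the low bit is always 0.
inline std::uint8_t encode_short_size(std::size_t size)
{
    assert(size <= 22);
    return static_cast<std::uint8_t>(size << 1);
}

inline std::size_t decode_short_size(std::uint8_t first_byte)
{
    return first_byte >> 1;
}

// Long form: the capacity is stored with the low bit forced to 1.
inline std::size_t encode_long_capacity(std::size_t capacity)
{
    return capacity | 1u;
}

// The short/long test only needs the first byte of the union.
inline bool is_long(std::uint8_t first_byte)
{
    return (first_byte & 1u) != 0;
}
```

Spending one bit on the flag is what lets the short form keep 22 usable characters in a 24-byte object.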


PS "DEV 5.11" is not a compiler, but "8 bytes using sizeof() operator" sounds like it's using a pre-C++11 GNU library, where it was quite different.
All that this shows is that the implementation of std::string is up to the library developer. You cannot rely on the size. You can only rely on the properties that are published in the API.
You cannot rely on the size

of course, but you can tell MS and gcc that they are being wasteful