How to emulate a string with char array without oversizing?

closed account (42TXGNh0)
I am sure it is possible, but I can't think up of a better idea than mallocing recursively, without declaring an oversized array.
Can anyone please tell me a better idea than that?
Last edited on
Well, "recursive" isn't the word, but using dynamic allocation (and deallocation) and an oversized array is probably the best you can do.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#include <iostream>
#include <string>
using namespace std;

int main()
{
   string s = "Hello, world!";
   cout << "String size is " << s.size() << '\n';
   cout << "String capacity is " << s.capacity() << '\n';

   for ( int i = 0; i < 20; i++ )
   {
      s += 'a';
      cout << s.size() << "  " << s.capacity() << '\n';
   }
}


I don't know how representative it is (or even if it can be controlled) but on my system this gives successive size and capacity as:
String size is 13
String capacity is 15
14  15
15  15
16  30
17  30
18  30
19  30
20  30
21  30
22  30
23  30
24  30
25  30
26  30
27  30
28  30
29  30
30  30
31  60
32  60
33  60


On CPP.sh it gives
String size is 13
String capacity is 13
14  26
15  26
16  26
17  26
18  26
19  26
20  26
21  26
22  26
23  26
24  26
25  26
26  26
27  52
28  52
29  52
30  52
31  52
32  52
33  52
Last edited on
I meant that, as std::string is already oversizing by default, then if you were to emulate it with a char array then you expect to be oversizing and reallocating when necessary.

I'm surprised that std::string seems to use something as exorbitant as double-in-size at each reallocation, though. A smaller growth factor seems more reasonable.
closed account (42TXGNh0)
Whoops, at first I thought that string does not allocate more space than the text needs but it seems that it has proven my assumption wrong.
These points come to mind on this subject:

A few strings with over-allocated storage is of no concern. It matters when there are lots of strings.

There has to be some maximum. RAM itself is not unlimited, but more practically the vast majority of strings come from sources with some sense of size ranges if not exact size.

The most common sources of strings provide one string which can be tested for length before storage allocation, for which reserve (or one malloc, if that's the approach) should suffice.

Note that string's response to a reserve request may not match the input. On a quick test in debug mode a reserve(50) gives a string capacity of 63.

Strings are nasty anyway. We tend to think naively about string length in terms of bytes, but if UTF-8 encoding is the source there can be multi-byte encodings, meaning that the number of bytes of storage will not always equal the number of characters in the string. Some operating systems GUI and system calls default to UTF-16 (two bytes per character) or UTF-32 (4 bytes per character).

The worst case situation is when concatenating to build a string from source where there really is no way to predict the ultimate length of the string (no matter what the locale might impose).

In all cases, though, there must be some absolutely maximum size to be processed. I suggest a process based on that point, assuming a source for string without specification (implied by the loop in @lastchance's post, though instead of < 20 I assume that would be whatever length the user requested).

Simply put, use a scratchpad to build the string. I would suggest a large buffer matching the maximum possible string to take in, and building the string in that scratchpad. The primary goal is to finally get a length from which a "trimmed" storage can be allocated once, and the result copied.

The scratchpad should persist for re-use, which means in threaded use it should be thread local. This avoids several re-allocations associated with dynamic expansion (of any string storage approach), and obviously depends on a practical absolute maximum string size to process.

Where the priority is to avoid wasting space (by overallocation) and/or avoiding the copying associated with frequent re-allocation for dynamic expansion, a scratchpad workplace makes sense to me.


Last edited on
Topic archived. No new replies allowed.