Allocating too frequently? C string needed?

Objective: learn about parsing in general and strtok() in particular.

Progress: I started with the example on http://www.cplusplus.com/reference/cstring/strtok/ and modified it to read file input and write file output. On that same page I read, "On a first call, the function [strtok()] expects a C string as argument for str...". So I looked up how to convert to a C string and I found http://www.cplusplus.com/reference/string/string/c_str/ then I "wrote" or "copied and modified" the following little program to parse words. Later I intend to expand it to parse sentences and paragraphs.

My concern/question: the program works but I'm thinking that allocating memory (line 19) and deallocating it (line 29) every time I read a line is probably an inefficient practice. What leads me to do it is that I don't know how much memory to allocate until I see how big the line is. Is there a better way to do this? It also seems awkward to have to convert to a C string. Can that be avoided? Thanks, Dane

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
 1  // ParsePlay02
 2  #include <iostream>
 3  #include <cstring>
 4  #include <fstream>  
 5  
 6  int main()
 7  {
 8      std::string stringIn;
 9      char * tok;
10      std::ifstream inText;
11      std::ofstream outText;
12  
13      inText.open("_inText.txt");
14      outText.open("_outText.txt");
15      if (inText.is_open())
16      {
17          while(getline(inText,stringIn))
18          {
19              char * cstringIn = new char [stringIn.length()+1];
20              strcpy (cstringIn, stringIn.c_str());
21              // cstringIn now contains a c-string copy of stringIn
22              tok = strtok (cstringIn," ,.-()");
23              while (tok != NULL)
24              {
25                  outText << tok << std::endl;
26                  tok = strtok (NULL," ,.-()");
27              }
28              outText << "stringIn==>" << stringIn << "<=\n";
29              delete[] cstringIn;
30          }
31      }
32      else
33      {
34          std::cout << "_inText.txt did not open\n";
35      }
36      inText.close();
37      outText.close();
38      return 0;
39  }


Output:

In
the
standard
library
both
representations
for
strings
C
strings
and
library
strings
coexist
and
most
functions
requiring
strings
are
overloaded
to
support
both
stringIn==>In the standard library, both representations for strings (C-strings and library strings) coexist, and most functions requiring strings are overloaded to support both.<=
For
example
cin
and
cout
support
null
terminated
sequences
directly
allowing
them
to
be
directly
extracted
from
cin
or
inserted
into
cout
just
like
strings
stringIn==>For example, cin and cout support null-terminated sequences directly, allowing them to be directly extracted from cin or inserted into cout, just like strings.<=
Yes, it is excess allocation.
You might either store size of current buffer and reallocate only if you need larger storage, allocate large enough buffer for all strings (and use previous solution too, just in case) or use internal storage of std::string itself (guaranteed to work in C++11):
char* cstring = &stringIn[0];
I would question your use of a caracter pointer altogether, especially since you're reading into a std::string to begin with. Look at the member function std::string::c_str ( http://en.cppreference.com/w/cpp/string/basic_string/c_str ).

In any case, if you must allocate, then there is no way to avoid the above code. The alternative is realloc, which essentially does the same thing. The only thing I would change is move the pointer outside the loop. Initialize it to null, and allocate it when you need to use it.
Kudos for thinking about performance.

Yes, it's excess allocation, but that isn't the end of the world. The optimizations aren't worth the extra complexity and chance of introducing a bug, but that's just my opinion.

A couple of suggests:
- use strdup() to duplicate a C string:
1
2
char * cstringIn = strdup(stringIn.c_str());
... free(cstringIn);  // strdup allocates with malloc, so deallocate with free 


Also, consider using a for loop any time you have initialization, test, and increment:
1
2
3
for (tok = strtok (cstringIn," ,.-()"); tok; tok = strtok(NULL, " ,.-()")) {
       outText << tok << std::endl;
}


Finally, be aware that strtok isn't thread safe, meaning that you can't use it in a multi-threaded program. Use strtok_r() instead if it's available.
Thanks to all three of you. I'm glad I asked. I learned much from your replies. I was hesitant to submit this, my first question. I've seen some harsh responses and I'm relieved that yours were a different "h" word: helpful. Best regards, Dane
Topic archived. No new replies allowed.