Split a string into substrings and return them as a 2D char array

Hello,

My goal is to write a function that takes a string and some delimiters, splits the string into substrings, stores them in a 2D char array, and finally returns a pointer to that 2D array.

Sorry but for some reason code formatting is not working!


This is what I managed to write in 3 hours; I am still not very experienced.

char **splitString(char str[], const char delimiter[])
{
uint8_t len = strlen(str);
char str_copy[len];
strncpy(str_copy, str, len);

char **keysChar = (char **)malloc(sizeof(char *) * 10);

uint8_t counter = 0;
char *token = strtok(str_copy, delimiter);
while (token != nullptr)
{
DEBUG_PORT << token << "\n";
keysChar[counter] = (char *)malloc(strlen(token));
strcpy(keysChar[counter],token);
token = strtok(NULL, delimiter);
counter++;
}
return keysChar;
}

int main()
{
char *data = "#M=1:T=259:S=31:P=5:A=45:D=78:C=99";

char **tokens = NULL;
tokens = splitString(data, ":=");

for (uint8_t n = 0; n <= sizeof(tokens); n++)
{
DEBUG_PORT << n << ": " << tokens[n] << "\n";
}
}

The output I get is:

#M
1
T
259
S
31
P
5
A
45
D
78
C
99M
0: ;.?.1
1: 1
2: T


I am quite sure that the problem is in how I allocate memory for the array:
char **keysChar = (char **)malloc(sizeof(char *) * 10);

How can I correct this? Do you think there are other problems?
Your problem is the terminating 0, or rather the lack thereof:

uint8_t len = strlen(str);
char str_copy[len]; // Variable length arrays are usually not a good idea (stack overflow)
strncpy(str_copy, str, len); // No terminating 0 is appended 


char *token = strtok(str_copy, delimiter); // This requires the terminating 0


keysChar[counter] = (char *)malloc(strlen(token)); // It should be strlen(token) + 1
strcpy(keysChar[counter],token); // strcpy(...) appends the terminating 0 --> out of bounds write
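
To make the off-by-one concrete, here is a minimal sketch of copying a C string with room for the terminating 0. The helper name copyString is made up for this illustration and does not appear in the thread.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not from the thread): returns a heap copy of src,
// or nullptr if the allocation fails. The caller releases it with free().
char *copyString(const char *src)
{
    assert(src != nullptr);
    size_t len = strlen(src);               // length WITHOUT the terminating 0
    char *copy = (char *)malloc(len + 1);   // +1 leaves room for the terminating 0
    if (copy == nullptr)
        return nullptr;
    memcpy(copy, src, len + 1);             // copies the characters AND the terminating 0
    return copy;
}
```

For the token "259", strlen returns 3, so the buffer must hold 4 bytes: '2', '5', '9' and the terminating 0.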
Thanks a lot, coder777, your input was spot on. The function now works far better, though it still has a few small errors to resolve:

#include <iostream>
#include <string.h>

#define LEN(arr) ((int) (sizeof (arr) / sizeof (arr)[0]))

char **splitString(char str[], const char delimiter[])
{
    uint8_t len = strlen(str)+1; // add null terminator
    std::cout << "len: " << (int) len << "\n";
    char str_copy[len];
    strncpy(str_copy, str, len);

    char **keysChar = (char **)malloc(sizeof(char *) * len);

    uint8_t counter = 0;
    char *token = strtok(str_copy, delimiter);
    while (token != nullptr)
    {
        std::cout << token << "\n";
        keysChar[counter] = (char *)malloc(strlen(token)+1); // add null terminator
        strcpy(keysChar[counter],token);
        token = strtok(NULL, delimiter);
        counter++;
    }
    
    keysChar = (char **)realloc(keysChar, sizeof(char *) * counter);
    
    return keysChar;
}

int main()
{
    char *data = "#M=1:T=259:S=31:P=5:A=45:D=78:C=99";
   
    char **tokens = NULL;
    tokens = splitString(data, ":=");
    
    uint8_t size = sizeof (tokens) / sizeof (tokens[0]);
    std::cout << "size: " << (int)size << "\n";
    

    for (uint8_t n = 0; n <= 14; n++)
    {
        std::cout << (int) n  << ": " << tokens[n] << "\n";
    }
}


output:


len: 35
#M
1
T
259
S
31
P
5
A
45
D
78
C
99
size: 1
0: #M
1: 1
2: T
3: 259
4: S
5: 31
6: P
7: 5
8: A
9: 45
10: D
11: 78
12: C
13: 99
14: 


So, about the function: there is still a problem with the size of the array.
I can set the array size to the maximum possible length at the beginning:
char **keysChar = (char **)malloc(sizeof(char *) * len);

and then shrink it to the value of the counter:
keysChar = (char **)realloc(keysChar, sizeof(char *) * counter);
Do you think this is a good idea? Or should I work harder to find a way to allocate the exact size in the first malloc?


Then the second problem: I am not able to find a way to get the size of the resulting array. I have followed a few answers on Stack Overflow but none of them seems to work.

From https://stackoverflow.com/questions/10274162/how-to-find-2d-array-size-in-c this should work:

uint8_t size = sizeof (tokens) / sizeof (tokens[0]);

but it doesn't.
Actually on line 11 (the strncpy call) you do not copy the 0. Use strcpy(...) instead of strncpy(...).

While realloc(...) is OK for a pure C program, in your case line 26 (the realloc outside the loop) is wrong. You need to reallocate as soon as counter > len, within the loop after line 23 (counter++).

this should work:
It does not work for dynamically allocated arrays: sizeof(tokens) is the size of the pointer itself, not of the memory it points to.
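
A small sketch of the difference. The function names are made up for this illustration, and the second result assumes a platform where all object pointers have the same size, which is the usual case.

```cpp
#include <stddef.h>

// A true array: its type carries the length, so the division gives the element count.
size_t countFixed()
{
    const char *fixed[3] = { "a", "b", "c" };
    return sizeof(fixed) / sizeof(fixed[0]);     // 3 elements
}

// A pointer to malloc'ed memory: the division only compares two pointer sizes.
size_t countThroughPointer(char **tokens)
{
    return sizeof(tokens) / sizeof(tokens[0]);   // sizeof(char **) / sizeof(char *)
}
```

On such a platform countThroughPointer yields 1 no matter how many tokens were allocated, which matches the "size: 1" printed in the output above. Some compilers warn about exactly this pattern.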

To determine the end of the array you might allocate one more line than necessary and set it to null. Similar to strings.

Or

you may use a struct like
struct result
{
  std::size_t size;
  char** data;
};
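
One way the struct could be used, as a rough sketch rather than a drop-in replacement. The names splitToResult and freeResult are invented here, and counting the tokens in a first pass with strspn/strcspn is only one possible way to learn the exact size before allocating.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct result
{
    size_t size;   // number of valid entries in data
    char **data;   // heap array of heap-allocated strings
};

// Hypothetical function: splits str on any character in delimiters.
result splitToResult(const char *str, const char *delimiters)
{
    assert(str != nullptr && delimiters != nullptr);
    result r = { 0, nullptr };

    // First pass: count the tokens so the array gets its exact size.
    size_t count = 0;
    const char *p = str + strspn(str, delimiters);   // skip leading delimiters
    while (*p)
    {
        p += strcspn(p, delimiters);                  // skip the token
        p += strspn(p, delimiters);                   // skip the delimiters after it
        count++;
    }
    if (count == 0)
        return r;

    r.data = (char **)malloc(sizeof(char *) * count);
    if (r.data == nullptr)
        return r;

    // Second pass: copy each token into its own block.
    p = str + strspn(str, delimiters);
    while (*p)
    {
        size_t len = strcspn(p, delimiters);
        char *token = (char *)malloc(len + 1);        // +1 for the terminating 0
        if (token == nullptr)
            break;                                    // size still counts only valid entries
        memcpy(token, p, len);
        token[len] = '\0';
        r.data[r.size++] = token;
        p += len;
        p += strspn(p, delimiters);
    }
    return r;
}

// Releases everything splitToResult() allocated.
void freeResult(result *r)
{
    for (size_t i = 0; i < r->size; i++)
        free(r->data[i]);
    free(r->data);
    r->data = nullptr;
    r->size = 0;
}
```

The caller then loops with i < r.size and never needs sizeof. Because the source string is only read, no working copy and no strtok are required in this variant.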
Do everything in your power to minimize calls to realloc. Do not call it on every pass through a loop, for example; call it once before the loop and ask for as many elements as you will loop over (if you know this, and if not, can you find out?). realloc can be slow because it may have to copy the whole block, and the more often you call it, the more it will affect your program's speed. Overalloc the first time (make your best guess at a reasonable starting size), or find out how much you need before you get memory when you can. The exact size isn't needed, but if you must guess, guess bigger than what you expect to need.

size of the array:
you should always know this. Always. Just track it.
It was 10. You realloc and add 3, now it's 13. You know this because you called realloc right there in your code; just save the resulting size and keep it around. Even if it's a variable that comes from reading a file in a loop, you can still count and track it.
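
When the final count really cannot be known up front, a common compromise is to track both the used size and the allocated capacity and to double the capacity whenever it runs out, so realloc runs only a handful of times. This is a sketch of that doubling technique, which is not spelled out in the advice above; TokenList, appendToken and the starting capacity of 8 are assumptions made for the illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical container: the size is tracked, never recomputed.
struct TokenList
{
    char **items;      // heap array of pointers (may be nullptr while empty)
    size_t size;       // entries in use
    size_t capacity;   // entries allocated
};

// Appends token, growing the array geometrically when it is full.
// Returns false if realloc fails; the list is left unchanged in that case.
bool appendToken(TokenList *list, char *token)
{
    assert(list != nullptr);
    if (list->size == list->capacity)
    {
        size_t newCapacity = list->capacity ? list->capacity * 2 : 8;
        char **grown = (char **)realloc(list->items, sizeof(char *) * newCapacity);
        if (grown == nullptr)
            return false;              // the old block is still valid
        list->items = grown;
        list->capacity = newCapacity;
    }
    list->items[list->size++] = token;
    return true;
}
```

Starting from an empty list, 20 appends trigger only three realloc calls (capacity 8, 16, 32), and list.size always holds the current element count.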

Actually on line 11 you do not copy the 0.


My bad, I didn't notice it!

You need to reallocate as soon as counter > len within the loop


Hopefully, if I'm not wrong (and today that has happened a few times), this can never happen, because the initial malloc reserves as many elements as the input string has characters. I will follow jonnin's advice to
do everything in your power to minimize calls to realloc.... Overalloc the first time
This way I'm over-allocating and then shrinking the array to fit the number of substrings.

It does not work for dynamically allocated arrays


I discovered it now!

To determine the end of the array you might allocate one more line than necessary and set it to null



I am not sure I understood. If I do something like arr[counter++] = "\0", would I then be able to use sizeof()?
I think jonnin's idea:
size of the array:
you should always know this. Always. Just track it.... keep it around

is quite good. I rewrote the function to save the size, but now I don't know how to get the 2D array out of it:

void splitString(char** dest_arr, uint8_t* len_dest_arr, char *str, const char delimiter[])
{
    uint8_t str_len = strlen(str) + 1; // add null terminator
    char str_copy[str_len];
    strcpy(str_copy, str); // we work on a copy

    char **sub_string = (char **)malloc(sizeof(char *) * str_len); // over size

    uint8_t counter = 0;
    char *token = strtok(str_copy, delimiter);
    while (token != nullptr)
    {
        sub_string[counter] = (char *)malloc(strlen(token) + 1); // add null terminator
        strcpy(sub_string[counter], token);
        token = strtok(NULL, delimiter);
        counter++;
    }

    sub_string = (char **)realloc(sub_string, sizeof(char *) * counter); // reallocate the right memory
    *len_dest_arr = counter;
}

int main()
{
    char *data = "#M=1:T=259:S=31:P=5:A=45:D=78:C=99";
    uint8_t sz;
    char **tokens = NULL;
    splitString(tokens, &sz, data, ":=");
    
    for (uint8_t n = 0; n <= sz; n++)
    {
        std::cout << (int) n  << ": " << tokens[n] << "\n";
    }
}
Here’s my version of a cstring split function.

This differs from yours (aster’s) in two significant ways:

  • it does not modify the source string
  • it does not use strtok()

split.h
#ifndef DUTHOMHAS_SPLIT_CSTRING_H
#define DUTHOMHAS_SPLIT_CSTRING_H

#ifdef __cplusplus
extern "C" {
#endif

char** split( const char* s, const char* delimiters );
/*
  function
    Split a string into substrings by delimiters.
    Consecutive delimiters are treated as a single delimiter.
    Leading and trailing delimiters do not introduce empty substrings.

  arguments
    s          - The string to split into substrings.
                 Must not be NULL.
                 Must be null-terminated.

    delimiters - Null-terminated list of characters used to split the string.
                 May be NULL, in which case the substrings are delimited by
                 whitespace.

  returns
    A NULL-terminated array of char*, one for each substring in s.

    Each substring is duplicated so s is not modified or encumbered
    by references.

    The result must be passed to free() when you are done with it.

  example
    split( "  A  B  C  ", NULL )  →  { "A", "B", "C", NULL }
    split( "", ... )              →  { NULL }
*/

#ifdef __cplusplus
}
#endif

#endif 


split.c
#include <stdlib.h>
#include <string.h>

char** split( const char* s, const char* delimiters )
{
  char** result;
  char** r;
  char*  p = (char*)s;
  size_t n = 0;

  if (!delimiters) delimiters = " \f\n\r\t\v";  // defaults to whitespace

  // First pass: count delimited segments
  p += strspn( p, delimiters );
  while (*p)
  {
    p += strcspn( p, delimiters );  // skip NOT delimiters
    p += strspn ( p, delimiters );  // skip delimiters
    n += 1;
  }

  // Allocate the result in a single block:
  // first an array of n + 1 char* (the last one is the NULL terminator),
  // immediately followed by a copy of the source string s, which we will tokenize.
  result = r = (char**)malloc( sizeof(char*) * (n + 1) + (p - s + 1) );
  if (!result) return NULL;  // allocation failed
  for (size_t k = 0; k < n + 1; k++) result[k] = NULL;
  p = (char*)(result + n + 1);
  strcpy( p, s );

  // Second pass: build the result[] array and separate the substrings
  p += strspn( p, delimiters );
  while (1)
  {
    if (!*p) break;  *r++ = p;     p += strcspn( p, delimiters );  // skip NOT delimiters
    if (!*p) break;  *p++ = '\0';  p += strspn ( p, delimiters );  // skip delimiters
  }

  return result;
}


And an example of use:

#include <stdio.h>
#include <stdlib.h>

#include "split.h"

int main( int argc, char** argv )
{
  FILE*  f;
  char*  s;
  long   size;
  size_t n;
  char** lines;

  if (argc == 1)
  {
    fprintf( stderr, "%s\n", "You must provide a file name!" );
    return 1;
  }

  f = fopen( argv[1], "rb" );
  if (!f)
  {
    fprintf( stderr, "Could not open \"%s\"\n", argv[1] );
    return 1;
  }
  fseek( f, 0, SEEK_END );
  size = ftell( f );
  fseek( f, 0, SEEK_SET );

  s = (char*)malloc( size + 1 );
  if (s)
  {
    fread( s, size, 1, f );
    s[size] = '\0';

    n = 0;
    lines = split( s, "\r\n" );
    if (lines)
    {
      for (char** line = lines; *line; ++line)
        printf( "%lu: \"%s\"\n", (unsigned long)n++, *line );
      free( lines );
    }

    free( s );
  }
  fclose( f );
}

Heh heh heh... :O)

You can easily modify this to do things like:

  • modify the source string (as your code does) instead of a copy
  • preserve empty substrings
  • split on a specific delimiter sequence instead of any character in the delimiter string

Enjoy. :O)
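
As an illustration of the "preserve empty substrings" idea, here is a rough sketch in the same single-block style. It is not the code from split.c: the name splitKeepEmpty is invented, delimiters must not be NULL here, and an empty input yields one empty substring rather than an empty list.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical variant: every delimiter character ends a field, so "a,,b"
// gives { "a", "", "b", NULL }. The result is one block; release it with free().
char **splitKeepEmpty(const char *s, const char *delimiters)
{
    assert(s != nullptr && delimiters != nullptr);

    // First pass: the number of fields is one more than the number of delimiters.
    size_t n = 1;
    for (const char *q = s; *q; q++)
        if (strchr(delimiters, *q))
            n++;

    // One block: n + 1 pointers, immediately followed by a copy of s.
    size_t len = strlen(s);
    char **result = (char **)malloc(sizeof(char *) * (n + 1) + len + 1);
    if (result == nullptr)
        return nullptr;
    char *p = (char *)(result + n + 1);
    memcpy(p, s, len + 1);

    // Second pass: terminate each field in the copy and record where the next starts.
    size_t k = 0;
    result[k++] = p;
    for (; *p; p++)
    {
        if (strchr(delimiters, *p))
        {
            *p = '\0';
            result[k++] = p + 1;
        }
    }
    result[k] = nullptr;   // NULL terminator, as in split()
    return result;
}
```

With "a,,b," and "," as input this produces the fields "a", "", "b" and "", followed by the terminating NULL pointer.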
now i don't know how to take out the 2D array from it
You can still return it:
char ** splitString(char** dest_arr, uint8_t* len_dest_arr, char *str, const char delimiter[])
{
...
    *len_dest_arr = counter;
    return (char **)realloc(sub_string, sizeof(char *) * counter); // reallocate the right memory
}
Yes, I had thought about that (in fact I am doing it for now, until I find a better solution), but I would like to understand how to copy sub_string to dest_arr.

dest_arr = sub_string didn't work.
how to copy sub_string to dest_arr
The same way you do it with len_dest_arr: You provide a pointer to the data:
void splitString(char*** dest_arr, uint8_t* len_dest_arr, char *str, const char delimiter[]) // Note ***
{
...
    *len_dest_arr = counter;
    *dest_arr = (char **)realloc(sub_string, sizeof(char *) * counter); // reallocate the right memory
}

int main()
{
...
    char **tokens = NULL;
    splitString(&tokens, &sz, data, ":="); // Note &tokens
...
}
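
Stripped down to the bare mechanism, the difference looks roughly like this. Both function names are made up for the illustration; setByValue mirrors the dest_arr = sub_string attempt that had no effect.

```cpp
#include <assert.h>

// Assigning to the parameter only changes the function's local copy of the pointer;
// the caller's variable is untouched.
void setByValue(char **dest, char **source)
{
    dest = source;
}

// Writing through a pointer to the caller's variable changes the caller's pointer.
void setByPointer(char ***dest, char **source)
{
    assert(dest != nullptr);
    *dest = source;
}
```

After setByValue(tokens, arr) the caller's tokens is still whatever it was before; after setByPointer(&tokens, arr) it points at arr.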




Though I recommend that you try to understand what @Duthomhas showed...
Though I recommend that you try to understand what @Duthomhas showed...


I did, even though I didn't understand every step; sometimes there is too much "pointer magic" for my level. I am still learning and this discussion helped me a lot. Many thanks to all of you :D

These are the final functions:

#include <iostream>
#include <string.h>

#define print std::cout

void splitString(char*** dest_arr, int* len_dest_arr, char *str, const char *delimiters)
{
    int str_len = strlen(str) + 1; // add null terminator
    char str_copy[str_len];
    strcpy(str_copy, str); // we work on a copy

    char **sub_string = (char **)malloc(sizeof(char *) * str_len); // over size

    uint8_t counter = 0;
    char *token = strtok(str_copy, delimiters); // split until first token
    while (token != nullptr)
    {
        sub_string[counter] = (char *)malloc(strlen(token) + 1); // add null terminator
        strcpy(sub_string[counter], token); // copy token to dest_array
        token = strtok(NULL, delimiters); // continue splitting
        counter++;
    }

    sub_string = (char **)realloc(sub_string, sizeof(char *) * counter); // reallocate the right memory
    *dest_arr = (char **)realloc(sub_string, sizeof(char *) * counter);
    *len_dest_arr = counter;
    //free (sub_string); can't do this because dest_arr point to it
}

int main()
{
    char *data1 = "xM=1:T=259:S=31:P=5:A=45:D=78:C=99";
    int sz1;
    char **sub1 = nullptr;
    
    splitString(&sub1, &sz1, data1, ":=");
    
    print << "\nsize: " << sz1 << "\n";
    for (int n = 0; n < sz1; n++)
    {
        print << n << ": " << sub1[n] << "\n";
    }
    
    print << "\nthanks guys\n";
}


and its output:

size: 14
0: xM
1: 1
2: T
3: 259
4: S
5: 31
6: P
7: 5
8: A
9: 45
10: D
11: 78
12: C
13: 99

thanks guys


Still, I am quite sure I can live without sub_string. I have spent the last 30 minutes trying to get rid of it, without much success. At my level, a three-level pointer is a headache.

Inside the function I know that **dest_arr gives me the first element of the array, but I am not able to get to the next one.
Still, I am quite sure I can live without sub_string. I have spent the last 30 minutes trying to get rid of it, without much success.
Actually, using sub_string is fine. Only line 24 (the first realloc, the one assigned back to sub_string) is unnecessary and should be removed.

If you don't want to use sub_string you can replace it with (*dest_arr). Note the parentheses. They are needed because the subscript operator[] has higher precedence than the dereference operator*. See:

http://www.cplusplus.com/doc/tutorial/operators/

Enclosing all sub-expressions in parentheses (even where precedence makes them unnecessary) improves code readability.
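
A short sketch of what the parentheses change. The function names are invented for the illustration; the parameter mirrors dest_arr from the code above.

```cpp
#include <assert.h>

// With parentheses: dereference first to reach the caller's char**, then index it.
char *tokenAt(char ***dest_arr, int i)
{
    assert(dest_arr != nullptr && *dest_arr != nullptr);
    return (*dest_arr)[i];
}

// Without parentheses the expression parses as *(dest_arr[i]).
// Only i == 0 is safe: dest_arr points at ONE char** variable, so dest_arr[1]
// would read past that variable, which is undefined behaviour.
char *firstTokenNoParens(char ***dest_arr)
{
    assert(dest_arr != nullptr && *dest_arr != nullptr);
    return *dest_arr[0];   // same as **dest_arr
}
```

This is why **dest_arr reaches the first element but there is no way to step to the next one without writing (*dest_arr)[1].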
It is beyond doubt that I wouldn't have been able to do it without your help, and that I really need to put my hands on a good C/C++ book

I am posting the final version of the function in the hope that it helps others who find this discussion. It should be free of memory leaks (I hope), as long as the caller eventually frees every substring and then the array itself.

void splitString(char*** dest_arr, size_t * len_dest_arr, const char *str, const char *delimiters)
{
    int str_len = strlen(str) + 1; // add null terminator
    char str_copy[str_len];
    strcpy(str_copy, str); // we work on a copy

    (*dest_arr) = (char **)malloc(sizeof(char *) * str_len); // over size

    uint8_t counter = 0; // limited to 255 sub strings
    char *token = strtok(str_copy, delimiters); // split until first token
    while (token != nullptr)
    {
        (*dest_arr)[counter] = (char *)malloc(strlen(token) + 1); // add null terminator
        strcpy((*dest_arr)[counter], token); // copy token to dest_array
        token = strtok(NULL, delimiters); // continue splitting
        counter++;
    }

    (*dest_arr) = (char **)realloc((*dest_arr), sizeof(char *) * counter); // reallocate the right amount of memory
    *len_dest_arr = counter; // save size
}
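
Since every substring and the array itself come from malloc, the caller has to release them in that order. A minimal sketch of such a cleanup helper follows; the name freeSubstrings is hypothetical and not part of the posted code.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical counterpart to splitString(): frees each substring, then the
// array of pointers, and resets the caller's pointer so it cannot be reused.
void freeSubstrings(char ***arr, size_t len)
{
    assert(arr != nullptr);
    if (*arr == nullptr)
        return;                  // nothing was allocated
    for (size_t i = 0; i < len; i++)
        free((*arr)[i]);         // each substring was malloc'ed separately
    free(*arr);                  // then the array of pointers itself
    *arr = nullptr;
}
```

A caller would pair the two, for example splitString(&sub1, &sz1, data1, ":=") followed later by freeSubstrings(&sub1, sz1).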
really need to put my hands on a good C/C++ book

Books, as in more than one.

https://isocpp.org/wiki/faq/how-to-learn-cpp#buy-several-books

Good online tutorials would help as well:

http://www.cplusplus.com/doc/tutorial/

The tutorial here at cplusplus is outdated, and not likely to ever be updated.

A tutorial that is more up-to-date:

https://www.learncpp.com/