What is a C-string?

What is a C-string? Is it possible to show mock code for a C-string?
closed account (3qX21hU5)
If you are using C++ don't even bother using them. They are very error prone and lack a lot of features.

And no, we won't show mock code for a c-string, because (1) this sounds awfully like a homework question and (2) a simple Google search would turn up hundreds of examples.
char const *MyVeryOwnDangerousAndUglyCString {"Hello, World!"};
A C-string is a character array whose last element is the terminating zero.

In the example shown by @L B

char const *MyVeryOwnDangerousAndUglyCString {"Hello, World!"};

it is "Hello, World!" that is the C-string, not MyVeryOwnDangerousAndUglyCString.
MyVeryOwnDangerousAndUglyCString is a pointer to char.
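
The difference between the literal (an array) and the pointer to it can be checked with sizeof and strlen. This is only a minimal sketch; the function names literal_storage_size and literal_length are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* The literal itself is an array of 14 chars: 13 visible characters plus the
   terminating '\0'. sizeof applied to the literal sees that array type. */
size_t literal_storage_size(void)
{
    return sizeof "Hello, World!";
}

/* A pointer to the first character carries no size information; strlen has
   to walk the characters until it reaches the terminating '\0'. */
size_t literal_length(void)
{
    const char *p = "Hello, World!";
    return strlen(p);
}
```

The first function reports the storage of the array including the terminator, the second only the number of characters before it.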

It's more of a concept than a language element. char const * is just a pointer to a single constant character, but it is almost always associated with C-Strings or raw data.
To quote the C language standard (C99/C11 §7.1.1/1)

A string is a contiguous sequence of characters terminated by and including the first null character.
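
A small sketch of what "terminated by and including the first null character" means in practice. The helper string_extent and the sample buffer two_strings are hypothetical names, not anything from the standard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample data: eight bytes, 'a' 'b' 'c' '\0' 'd' 'e' 'f' '\0'.
   The array holds two strings back to back. */
static const char two_strings[] = "abc\0def";

/* Number of bytes occupied by the string that starts at s, counting the
   terminating null character that the definition says is included. */
size_t string_extent(const char *s)
{
    return strlen(s) + 1;
}
```

Here the string that starts at two_strings ends at the first null character, so it occupies four of the array's eight bytes; a second string starts at two_strings + 4.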
@Cubbi

To quote the C language standard (C99/C11 §7.1.1/1)

A string is a contiguous sequence of characters terminated by and including the first null character.


By the way, in my opinion it is an incorrect definition because it omits the words "character array". :) An array is indeed "a contiguous sequence of characters", but not every "contiguous sequence of characters" is necessarily an array. For example, it can be a structure. :)
No, it can't, because padding could make it not contiguous.
Yes, it can be a structure. All that C requires is what Cubbi quoted (what the C language standard specifies).

What vlad from moscow put forth (the language standard is incorrect due to omitting the words character array) is an opinion. An opinion that is incorrect; the standard is correct.

All that C requires is a "contiguous sequence of characters terminated by a null character". The sequence must be contiguous; whether that sequence happens to be elements of an array or members of a struct or something else is irrelevant.

The compiler may generate a few warnings, but this is conforming C code.
And a variable-sized contiguous sequence, as in struct B, is a standard technique used by (knowledgeable) programmers who program in C.

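#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct A
{
    char a, b, c, d, e, f ;
};

#define DYNAMICALLY_SIZED 1

struct B
{
    int sz ;
    char name[DYNAMICALLY_SIZED] ;
};

struct B* make_b( const char* name )
{
    /* sizeof( struct B ) already counts one char of name; that leaves room for the terminator */
    struct B* pb = malloc( sizeof( struct B ) + strlen(name) ) ;
    if(pb) { pb->sz = (int)strlen(name) ; strcpy( pb->name, name ) ; }
    return pb ;
}

int main(void)
{
    struct A a = { 'H', 'e', 'l', 'l', 'o', 0 } ;
    /* explicit cast: a struct A* does not convert to const char* implicitly */
    const char* s = (const char*)&a ;
    printf( "%s [%zu]\n", s, strlen(s) ) ; /* %zu is the conversion for size_t */

    struct B* ptr = make_b( s ) ;
    if(ptr) puts( ptr->name ) ; /* malloc may have failed */
    free(ptr) ;

    return 0 ;
}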


http://ideone.com/LNEndF
Ok, I stand corrected. Padding still scares me though.
@JLBorges
What vlad from moscow put forth (the language standard is incorrect due to omitting the words character array) is an opinion. An opinion that is incorrect; the standard is correct.


If you read further in that paragraph, you would find that

"A pointer to a string is a pointer to its initial (lowest addressed) character."

First of all, pointers have types. :) Secondly, you can speak about an "initial (lowest addressed) character" only if the notion of "initial character" is defined. Only character arrays have initial characters. All other memory that has no type has no "initial character". Thirdly, only arrays can be addressed sequentially. For example, to determine the length of a string, incrementing the pointer and dereferencing it must be allowed. :)

And you are talking nonsense about structures. Structures are not strings. They can be cast to strings.
I advise you to read the following definition

1 object
region of data storage in the execution environment, the contents of which can represent
values
2 NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

Also please read Section 7.23 of the Standard

1 The header <string.h> declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.


@JLBorges
What vlad from moscow put forth (the language standard is incorrect due to omitting the words character array) is an opinion. An opinion that is incorrect; the standard is correct.


And please do not associate your personal opinion with what is said in the standard. :)
> Padding still scares me though.

There is nothing scary about the 'contiguous sequence of characters' (that the C standard talks about).

1. A char has an alignment requirement of one, so in practice compilers add no padding to a struct in which every member is a char. Padding comes into play only when types having alignment requirements greater than one are involved.

2. Therefore, the strict aliasing rules of both C and C++ allow any object to be inspected as a sequence of contiguous bytes; a pointer to any object can be safely used/cast as a pointer to const char (Hence std::memcpy(), std::ostream::write() etc.)
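
Both points can be checked rather than assumed. The sketch below repeats struct A so that it stands alone; bytes_before_null is a hypothetical helper that walks the object representation through a pointer to unsigned char, and offsetof shows where the compiler actually placed the last member.

```c
#include <stddef.h>

struct A
{
    char a, b, c, d, e, f;
};

/* Counts the bytes before the first zero byte in the object representation
   of *obj, looking at no more than size bytes. Returns size if no zero byte
   is found. */
size_t bytes_before_null(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t i = 0;
    while (i < size && bytes[i] != 0)
        ++i;
    return i;
}
```

For a struct A initialised with { 'H', 'e', 'l', 'l', 'o', 0 }, bytes_before_null(&a, sizeof a) gives 5 when the members are laid out without padding, which is what offsetof(struct A, f) == 5 confirms on the usual compilers.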


struct B is only a little bit more involved; it relies on this specification in the C standard: within a struct, members (C++ adds 'with the same access control') declared later have higher addresses than members declared earlier.

So the layout of
struct B
{
    int sz ;
    char name[DYNAMICALLY_SIZED] ;
};

is: the sz member at an offset of zero, the single char in name after that.

If we allocate more memory, as in
malloc( sizeof( struct B ) + strlen(name) ), the sequence of bytes starting with the single character in name up to the end of the allocation is a contiguous sequence of chars. And if we put a null character in one of those bytes, we have a C-style string: a contiguous sequence of characters terminated by and including the first null character. And after strcpy( pb->name, name ), pb->name is a C-style string.
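
For comparison, C99 added flexible array members, which express the same over-allocation idea without the one-element placeholder array. The following is a sketch of that alternative, not the code from the post; struct B_flex and make_b_flex are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of struct B using a C99 flexible array member. */
struct B_flex
{
    size_t sz;
    char name[];   /* no storage of its own; it comes from the malloc below */
};

struct B_flex *make_b_flex(const char *name)
{
    size_t len = strlen(name);
    /* sizeof *pb does not count name[], so add len bytes plus the terminator */
    struct B_flex *pb = malloc(sizeof *pb + len + 1);
    if (pb)
    {
        pb->sz = len;
        memcpy(pb->name, name, len + 1);   /* copies the '\0' as well */
    }
    return pb;
}
```

After a successful make_b_flex("Hello"), pb->name is again a contiguous sequence of characters ending in a null character; the caller releases it with free(pb).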


> "Structures are not strings. They can be cast to strings" - vlad from moscow

A C-style string is not a type; it is a concept. Till you understand that, you will continue to have these kinds of problems.
Only character arrays have initial characters. All other memory that has no type has no "initial character".
So you are basically saying that using raw memory from malloc to store c-strings is illegal? Or that several chars allocated sequentially using placement new in C++ cannot be used as a c-string? As JLBorges said, a string in C is a concept, not some solid type.

Point out where the arrays and specific types are here:
int32_t* s; /* I use it to represent 32-bit chars */
s = (int32_t*)malloc(sizeof(int32_t) * 32); /* Where are the arrays here? */

/*---*/

void mstrcpy(void* dest, const void* src, size_t size); /* Generic multibyte string copying */
/*...*/
const size_t char_width = 2;
void* str = malloc(char_width * 32); /* Where is the type here? */
mstrcpy(str, some_other_string, char_width); /* some_other_string is defined elsewhere */
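
The post only declares mstrcpy. One way such a function could work is sketched below, under the assumption that its third argument is the width of one character in bytes and that a character consisting entirely of zero bytes ends the string. The name mstrcpy_sketch and the return value are inventions for this illustration.

```c
#include <stddef.h>

/* Copies a string whose characters are width bytes each, up to and including
   the first all-zero character (the terminator).
   Returns the number of characters copied, not counting the terminator.
   dest must be large enough; nothing here checks that. */
size_t mstrcpy_sketch(void *dest, const void *src, size_t width)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t count = 0;
    for (;;)
    {
        int is_terminator = 1;
        for (size_t i = 0; i < width; ++i)
        {
            d[i] = s[i];
            if (s[i] != 0)
                is_terminator = 0;
        }
        if (is_terminator)
            return count;
        d += width;
        s += width;
        ++count;
    }
}
```

The function never learns what type the characters have. It sees only bytes and a width, which is the point being made about raw memory.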
Allocating functions allocate raw memory for objects. Every object has an address, so allocating functions return the address where an object will be placed. This has nothing in common with strings, except that an array of characters is allocated.
Once more, read this phrase from the standard

The header <string.h> declares one type and several functions, and defines one
macro useful for manipulating arrays of character type and other objects treated as arrays of character type.

and further

but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array.

So you may speak about a string only in the context of a character array, or when other objects are interpreted as character arrays. Structures are never character arrays. But they can be interpreted as character arrays by using casts or string functions where the "void * argument points to the initial (lowest addressed) character of the array."

To make this clearer with respect to allocating functions, I will cite another phrase from the standard

The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.


So all allocating functions allocate an array of characters. But because the type of the resulting object is not known at the moment of allocation, the functions return void *. That is, any object representation is in fact a character array.
Looks like we are speaking of different things here. Array != array type. The size of an array type must be known at compile time, the size of an array type can be obtained with the sizeof operator, array types allow you to use the subscript operator on multidimensional arrays, etc.
You cannot create an array type with malloc(). It gives you a memory area which you can use like an array type and which behaves similarly. Try to create on the heap a multidimensional array type that you can access with several chained subscript operators (not an array of pointers to other arrays) and you will see this.
If an array in your mind is "several consecutive elements in memory", then yes, a c-string is an array. But it does not have to be of an array type.
I think that I have already explained clearly enough that strings are character arrays that contain a terminating zero. You may speak about an entity only if that entity (in this particular case, an object) exists. Strings are objects. Each object has a type. Without types you cannot speak about objects. Strings are objects of character array type. String literals are objects of const character array type.

Structures are not strings and never were strings. :)
The C standard guarantees that the sizeof operator returns the size of an array in bytes (where a byte is the size of a char). Please find the size of the following memory area, which contains a c-string (you are saying that it is an array):
int x = (rand() % 10) + 5;
char* str = (char*)malloc(x);
x = 0;
strcpy(str, "Hi");
//sizeof(str) ??? 
You can acknowledge either that str is not a c-string (which means that thousands of programmers are using not c-strings but something completely different) or that str is not an array, which means that c-strings need not be of array type.
sizeof( x ) == 4 (usually, on 32-bit systems)
sizeof( char * ) == 4 (usually, on 32-bit systems)
sizeof( str ) == 4.

So it looks like you do not understand how the sizeof operator works and what objects are. :)
> I think that I already explained clear enough that ...

That is what you think. It is your right to think what you want, and you can continue to wallow in your ignorance. As far as you are concerned, UTF-8, UCS-2 and UCS-4 strings just do not exist. Fine.

Fortunately for the rest of the world, the C language is careful not to specify the concept of a string in terms of what vlad from moscow 'thinks' it should be - arrays. According to C:
A string is a contiguous sequence of characters terminated by and including the first null character.
C can take care of localization requirements, it has mbstowcs() (converts a narrow multibyte character string to wide string), wcsrtombs() (converts a wide string to narrow multibyte character string, given state) etc. Thank you very much.
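
As an illustration of the first of these functions, here is a minimal sketch of a wrapper around mbstowcs(). The wrapper name to_wide and its truncation policy are assumptions made for this example; the conversion itself depends on the current locale.

```c
#include <stdlib.h>
#include <wchar.h>

/* Converts the narrow multibyte string src to a wide string in dest, which
   has room for dest_len wide characters. Returns the number of wide
   characters written, not counting the terminating L'\0', or (size_t)-1 if
   src contains an invalid multibyte sequence. */
size_t to_wide(wchar_t *dest, size_t dest_len, const char *src)
{
    size_t n = mbstowcs(dest, src, dest_len);
    if (n == dest_len && dest_len > 0)
    {
        /* mbstowcs filled the buffer without writing a terminator:
           give up the last character to make room for one */
        dest[dest_len - 1] = L'\0';
        n = dest_len - 1;
    }
    return n;
}
```

The result is again a string in the standard's sense, only with wide characters: a contiguous sequence ending at the first L'\0'.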
char x[10]; //<- an array of char
int y[10]; //<- array of int
sizeof(x); //<- 10
sizeof(y); //<- 40 on my machine
char* z; //<- not an array
z = malloc(10); //z is still not an array, it merely points to allocated memory
                //which can be safely used in fashion similar to arrays
sizeof(z); //<- 4 on my machine (the size of a pointer), because z is not an array


And yes, I did not think about multibyte strings. JLBorges has explained enough, so I will not speak about it again.