Is there a problem with declaring an empty char array?


There is no "Standard C++ Assembly". It depends on the hardware.
Compiler Explorer (https://godbolt.org/) is a great tool for checking what assembly various compilers generate. Don't forget to pass optimization flags (e.g. -O2); otherwise the generated assembly will be far from optimal.


But where do I learn how to read the assembly ;p
You can get a description of each instruction in Compiler Explorer by hovering over the instruction name. If something is not clear you can probably find more information by searching for it on the web. If you want to know how the instructions are stored as bytes in memory you can use the 11010 checkbox.

I don't think you need to understand everything in detail. Personally, I don't.
> So using string literals costs you twice the RAM,
> so what would somebody who is really restricted on RAM do (and I mean really really)?

Focus the attention on using appropriate data structures and algorithms that can minimise memory usage.

Use a good compiler and let it do the low level optimisation;
the "as-if" rule gives the compiler a lot of leeway to optimise things.
https://en.cppreference.com/w/cpp/language/as_if

For example, both foo() and bar() below may generate identical code:
int foo( int a )
{
    int b = 2 ;
    int c = 3 ;
    const int& d = b*c ;
    const int& e = b*2 ;

    const int* f = &d ;
    const int* g = &e ;

    const char* h = "abcd4567efgh" ;
    const char* i = h+4 ;
    
    int* j = new int{*i} ;
    const int k = *j - '4' ;
    delete j ;
    
    if( k == 100 ) a += 1234 ;
    else b += k ;

    return a*b + a - a*c + *f - *g - h[4] + h[6] + k ;
}

int bar()
{
    return 4 ;
}

https://gcc.godbolt.org/z/gdtH4d
Going back to an earlier topic in the thread, on how to allocate space for a string when you don't know how big it will be: first, start with a reasonable length that most strings in your program would fall under. The exact starting size doesn't really matter. Because standard input is buffered, you can read it in a bit at a time and resize and copy the array as you go. Normally, people suggest doubling the size of the array each time (this is similar to how std::vector works), which makes the number of re-allocations logarithmic in the final length.

Note: This might not be "good" C code, because I haven't done this before.
I have not exhaustively tested this, so I don't know if it's 100% safe or how it handles invalid stdin.
///
/// $ gcc -Wall main.c -o main
///

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

///
/// Return value must be free'd by caller if not null
///
/// I HAVE NO IDEA IF THIS IS 100% SECURE,
/// don't use it for serious purposes without more research/testing.
/// feof not handled.
///
char* get_input(void)
{
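    int size = 4; // just for demonstration, probably want to increase this to 256 or more.
    int offset = 0;
    char* buffer = malloc(size);
    if (!buffer)
        return NULL;
    buffer[0] = '\0'; // so an immediate EOF still returns a valid empty string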
    
    // http://www.cplusplus.com/reference/cstdio/fgets/
    while (fgets(buffer + offset, size - offset, stdin))
    {        
        // "A terminating null character is automatically appended after the characters copied to str."
        size_t len = strlen(buffer);
        if (len == size - 1)
        {
            //
            // Possible there's more input, need to check for newline.
            //
            if (buffer[len-1] == '\n')
            {
                // at the end of input (user entered newline)
                // replace the newline with a null character
                buffer[len-1] = '\0';
                break;
            }
            else
            {
                //
                // There's more input, because we haven't reached a newline yet.
                // We need to double the size of our buffer and accept more
                //
                // realloc keeps the old contents, including the null character
                // that fgets left at index size - 1.
                char* new_buffer = realloc(buffer, 2 * size);
                if (!new_buffer)
                {
                    free(buffer);
                    return NULL;
                }
                buffer = new_buffer;
                
                // The -1 here is to handle the fact that fgets
                // always leaves a null character in the last spot it filled.
                // We need to overwrite this next iteration.
                offset = size - 1;
                size = 2 * size;
            }
        }
        else
        {
            // at the end of input (wasn't enough data to fill up the buffer).
            // Usually the user entered a newline, but input can also end at EOF
            // without one, so only strip a newline that is really there.
            if (len > 0 && buffer[len-1] == '\n')
                buffer[len-1] = '\0';
            break;
        }
    }
    
    // TODO optional: Re-allocate with a size of strlen + 1 to free unused space.

    return buffer;
}

int main(void)
{
    char* input = get_input();
    if (!input)
    {
        fprintf(stderr, "could not read input\n");
        return 1;
    }
    printf("input obtained: ");
    printf("%s\n", input);
    printf("characters read (not including newline): %lu\n", (unsigned long)strlen(input));
    free(input);
}


>main
Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes
input obtained: Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes
characters read (not including newline): 79
> But where do I learn how to read the assembly

Take a class in it? Or learn yourself by writing a few small assembly programs?
Assembly language is tedious and difficult but you usually only need to look at 10 or so lines at any given time when looking at spew for a function.

An example: to add 2 numbers, you have something like this:
move number 1 from RAM to a CPU register // the RAM location could be a variable name or its direct address
move number 2 from RAM to a CPU register
activate the CPU add instruction
move the result from the CPU register back to RAM

That's 4 lines of code, at least, just to say x = a+b;
If you had pointers or arrays in there, it would go up by another 4 or so instructions to locate the actual value by its offset.

Most of them are not hard to read, but the instructions and their side effects have to be fully understood, and in some cases they are really weird. You may have to look up exactly what some of the instructions do to see what is really happening. You also need to be prepared to think outside the box: shifting is faster than multiplication, so if you multiply by 4 the compiler may decide to shift instead. But a single shift can't multiply by 5, of course, since it's not a power of 2 (the compiler may combine a shift with an add instead), so if you multiply by 4 and then later by 5 the instructions could be different. Fun, eh?
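
A small sketch to try this out in Compiler Explorer with -O2. The function names are made up for illustration, and the exact instructions depend on the compiler and target; on x86-64 you will typically see a shift or an lea rather than a multiply instruction.

```cpp
// Multiplying by a power of 2: compilers usually turn this into a shift
// (or an equivalent address calculation).
int times4(int x) { return x * 4; }

// 5 is not a power of 2, so one shift is not enough; compilers usually
// compute x * 4 + x in some form instead of using a multiply.
int times5(int x) { return x * 5; }

// The "load, load, add, store" example from above, with the operands
// reached through pointers so they really do have to come from memory.
int sum(const int* a, const int* b) { return *a + *b; }
```

Compare the output with and without -O2 to see how much the optimizer changes.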
Note that just being able to read assembly isn't enough; as Jonnin says, you need to understand something about how the specific CPU works. You need to understand what registers are, which value is used to keep track of where in the code we are (the instruction pointer), how function parameters are passed around in your specific case (they can go in registers, they can go in memory, they can do both).

Basically, just understanding the meaning of each individual assembly instruction isn't enough.
closed account (42TXGNh0)
Re: The last example given in the post http://www.cplusplus.com/forum/beginner/248142/3/#msg1094100 from Peter87.

Does that mean '123' still exists but there is no way to get it back?

char a[] = "123";
char b[] = "123";


Do both 'a' and 'b' point to the same location in memory?

{
  char a[] = "123";
}
char b[] = "123";


(general question not addressing anybody)
closed account (E0p9LyTq)
> Do both 'a' and 'b' point to the same location in memory?

No.

Unless the compiler decided it could optimize things so the memory used is shared.
> Do both 'a' and 'b' point to the same location in memory?
No. a and b are independent arrays both initialized with 4 characters ('1' to '3' and 0).

The string "123" does not exist in this scenario.
Oops I meant
const char *a = "123";
const char *b = "123";
Since the OP's last question seems to have gotten ignored due to Grime's hijacking:

ArnoldLai wrote:
Re: The last example given in the post http://www.cplusplus.com/forum/beginner/248142/3/#msg1094100 from Peter87.

Does that mean '123' still exists but there is no way to get it back?


No, not at all. Even though the array str has gone out of scope and been destroyed, the pointer p still exists and keeps its value:

// p points to the location where "abc" is stored (it has to be stored somewhere)
const char* p = "abc";

{
	// str will be a copy of the string literal "123"
	char str[] = "123";

} // str goes out of scope and is destroyed, but the string literal "123"
  // is still stored in memory (it has to be, otherwise how could you 
  // initialize str next time this code is executed?) 

std::cout << "The value of p is " << p;


And even if p has dropped out of scope and been destroyed, you can easily assign any other pointer to point to the same literal:

{
  // p points to the location where "abc" is stored (it has to be stored somewhere)
  const char* p = "abc";

} // p goes out of scope and is destroyed

const char* q = "abc";

std::cout << "The value of q is " << q;


char foo [N];
char bar [N];

Two distinct blocks of memory, allocated on the stack. They cannot overlap.

{
  char foo [N];
}
char bar [N];

Two arrays, allocated on the stack.
However, they might not exist at the same time.
If bar is allocated after foo has been deallocated (and the stack "unwound"), then the same memory might be reused.

const char * foo = "1234";
const char * bar = "34";

More than two distinct blocks of memory.
Both foo and bar are separate pointer objects allocated on the stack.

String literals are typically stored in a separate, read-only memory area. At least 5 bytes of it will be used, for the values
{ '1', '2', '3', '4', 0 }
It is up to the compiler to decide whether it assigns the address of that literal's '3' to bar, or adds the three bytes { '3', '4', 0 } separately to the read-only memory area.


Edit: it is quite likely that the compiler does compress the list of literals.
Easy to test too:
lit.cpp
#include <iostream>

int main()
{
  const char * foo = "Slartibartfast";
  const char * bar = "bartfast";
  std::cout << foo << bar;
}

$ g++ -O2 -Wall -Wextra lit.cpp -o lit
$ strings lit | grep bart
Slartibartfast
keskiverto: I don't understand that last example. Surely the output of strings should be "Slartibartfastbartfast". Why would grep display only "Slartibartfast"?
Also, how could you tell if the strings are being pooled just by looking at strings, rather than the pointers themselves?
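#include <cstdint>  // uintptr_t
#include <cstring>  // strlen

// true if one string is stored as the tail of the other (or at the same address)
bool pooled(const char *a, const char *b){
    auto A = (uintptr_t)a;
    auto B = (uintptr_t)b;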
    if (A > B)
        return pooled(b, a);
    auto delta = strlen(a) - strlen(b);
    return A == B - delta;
}

//...

std::cout << (pooled(foo,bar) ? "pooled" : "not pooled");
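
Whether two literals are pooled is up to the compiler, so the result above can differ between builds. As a sanity check of the address arithmetic itself, here is a self-contained sketch that runs the same test on buffers whose layout is known. The arrays text and other and the function self_check are assumptions added for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Same idea as pooled() above: true when one string is stored as the
// tail of the other (or both pointers are the same address).
bool pooled(const char* a, const char* b)
{
    auto A = reinterpret_cast<std::uintptr_t>(a);
    auto B = reinterpret_cast<std::uintptr_t>(b);
    if (A > B)
        return pooled(b, a);
    auto delta = std::strlen(a) - std::strlen(b);
    return A == B - delta;
}

// Hypothetical check on known layouts: a pointer into the middle of one
// array shares storage with it, while a separate array never does.
bool self_check()
{
    static const char text[]  = "Slartibartfast";
    static const char other[] = "bartfast";
    const char* tail = text + 6; // points at "bartfast" inside text

    return pooled(text, tail) && pooled(tail, text) && !pooled(text, other);
}
```

Here self_check() should return true regardless of whether the compiler pools literals, because no literal addresses are compared.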
Last question on this topic I promise.
Is the assembly for the two declarations below different or the same?

char a[] = {'h', 'i', '\0'};
char a[] = "hi";

godbolt.org shows exactly the same assembly for both examples ;'(

I have too little knowledge of computers and C++ in general, so I got confused (a lot, and some things I still find hard to understand). I know that this is a small detail that doesn't even friggin matter, but hey, we're 90% through now, so answer my one last question and that's it for this topic.

And sorry for hijacking this guy's thread ayyyeeee
helios: keskiverto wasn't running the program, he was using the strings command (view the human-readable characters within any file).

Grime:
> godbolt.org shows the exact assembly for both examples ;'(
If the assembly is the same, it's the same. In your particular case, I believe they are defined to be the same (the string literal is just the short-hand way of doing it.)
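
A quick way to convince yourself without reading assembly is to compare the two arrays directly. This is a minimal sketch; the variable names braces and literal and the function same_bytes are made up for illustration.

```cpp
#include <cstring>

char braces[]  = {'h', 'i', '\0'};
char literal[] = "hi";

// Both declarations produce an array of 3 chars: 'h', 'i' and the terminator.
static_assert(sizeof(braces) == 3, "two characters plus the terminator");
static_assert(sizeof(literal) == sizeof(braces), "same size either way");

// Compares the stored bytes of the two arrays.
bool same_bytes()
{
    return std::memcmp(braces, literal, sizeof(braces)) == 0;
}
```

Both arrays are separate objects with identical contents, so same_bytes() returns true.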

But also, in general, remember the as-if rule (I think it was already linked):
https://en.wikipedia.org/wiki/As-if_rule
The standard for the C++ programming language allows compilers for this language to apply any optimizing transformation to a program during compilation, provided that such optimizations make no change in the "observable behavior" of the program, as specified in the standard
Ah.
closed account (42TXGNh0)
It's okay la, we all here just came for knowledge, that's not a big deal for "hijacking". :P

lit.cpp is resulting in
Slartibartfastbartfast
in http://www.cpp.sh/ ...
Yes, that's correct. The lit.cpp program shown a few posts above is 100% defined behavior.

Whether the address foo + strlen("Slarti") is equal to bar is what is implementation-defined. Thus, you should not rely on logic that assumes this (which you shouldn't need to anyway).
You shouldn't even rely on the same string literal always giving you the same array object.

for (int i = 0; i < 2; ++i)
{
	std::cout << (void*) "abc" << '\n';
}

This is not guaranteed to print the same address twice, even though it most likely will.
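
The practical consequence is to compare strings by content, never by address. A minimal sketch, with same_text as a made-up helper name:

```cpp
#include <cstring>

// Compares the characters, so it does not matter where each string is stored.
bool same_text(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}
```

same_text("abc", "abc") is true whether or not the two literals share storage, whereas comparing the pointers themselves with == could go either way.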