Classes and rules

Hi all,

I'm rather new to this topic and am struggling to understand it well enough to use it in my programs.

My first question is about the rule-of-five example at https://en.cppreference.com/w/cpp/language/rule_of_three

class rule_of_five
{
    char* cstring; // raw pointer used as a handle to a dynamically-allocated memory block
 public:
    rule_of_five(const char* s = "")
    : cstring(nullptr)
    { 
        if (s) {
            std::size_t n = std::strlen(s) + 1;
            cstring = new char[n];      // allocate
            std::memcpy(cstring, s, n); // populate 
        } 
    }
 
    ~rule_of_five()
    {
        delete[] cstring;  // deallocate
    }
 
    rule_of_five(const rule_of_five& other) // copy constructor
    : rule_of_five(other.cstring)
    {}
 
    rule_of_five(rule_of_five&& other) noexcept // move constructor
    : cstring(std::exchange(other.cstring, nullptr))
    {}
 
    rule_of_five& operator=(const rule_of_five& other) // copy assignment
    {
         return *this = rule_of_five(other);
    }
 
    rule_of_five& operator=(rule_of_five&& other) noexcept // move assignment
    {
        std::swap(cstring, other.cstring);
        return *this;
    }
 
// alternatively, replace both assignment operators with 
//  rule_of_five& operator=(rule_of_five other) noexcept
//  {
//      std::swap(cstring, other.cstring);
//      return *this;
//  }
};


My question is: why do we need this code?

if (s) {
            std::size_t n = std::strlen(s) + 1;
            cstring = new char[n];      // allocate
            std::memcpy(cstring, s, n); // populate 
        } 


Why not:

if (s) cstring = s;
instead?

If both pointers (here s and cstring) point to the same zero-terminated string after the assignment above, then they both have the same value, don't they?

PS: I'm not an expert in C++, so please use simple language. :)


Hi,

On line 6, cstring is initialized to nullptr so that it has a defined value; at that point no memory has been allocated yet.

Line 9 works out how much room is needed; the +1 is for the null character at the end of a C-style string.
Line 10 allocates an array of n bytes of memory.
Line 11 copies the contents of s into cstring.

The thing is: one can't copy a C-style string by just assigning the address. That copies only the pointer, so cstring would point at memory the class doesn't own. As written, your line would not even compile, because s is a const char* and cstring is a char*; if you forced it with a cast, the destructor would later call delete[] on memory that was never allocated with new[].

This code looks like the absolute bare bones of a C++ string class, so it has its own dynamic memory management and uses C-style char arrays. I hope you are not going to write ordinary C++ code like this? I mean: don't focus on how it does the memory management; just think about when to provide 0, 3, or 5 of the special member functions. The main idea of providing these special functions is to make your class behave like the ones in the STL do.
Think about this code:

int main()
{
    {
        rule_of_five rof("Hello World!");
    }
}


If your suggestion were valid, when line 4 (the construction of rof) completes, rof.cstring (a non-const char*) would point to a string literal. If your class then tried to modify cstring (say, cstring[0] = 'a'), that would be undefined behavior and would most likely crash.

Then consider what happens when rof goes out of scope on line 5. The destructor would be called and would delete[] rof.cstring. Again, this is a string literal that was never allocated with new[], so this too would result in a fault.

Try this one:

int main()
{
    char* my_c_string = new char[100];
    strcpy(my_c_string, "Hello World");
    {
        rule_of_five rof(my_c_string);
    }
    delete[] my_c_string;
}


Now, main thinks that rof will treat my_c_string as a const char* (because that's what the header file says it will do), but with your change rule_of_five would store it as a char*.

Also, when leaving scope, rof will (correctly, from its own point of view) delete[] cstring. Then main() deletes my_c_string, which is a double deletion, causing all sorts of other problems.
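
To make the difference concrete, here is a minimal sketch of what the allocate/populate steps buy you. The helper names duplicate_c_string and copy_is_independent are made up for this illustration and are not part of the cppreference example. Because the copy lives in its own block of memory, changing the original does not affect it, and there is exactly one delete[] for each new[].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the cppreference example): returns a newly
// allocated copy of s, using the same steps as the constructor.
char* duplicate_c_string(const char* s)
{
    assert(s != nullptr);               // the caller must pass a valid string
    std::size_t n = std::strlen(s) + 1; // +1 for the terminating '\0'
    char* copy = new char[n];           // the copy owns this block
    std::memcpy(copy, s, n);
    return copy;
}

// Returns true when changing the original leaves the copy untouched.
bool copy_is_independent()
{
    char original[] = "Hello";
    char* copy = duplicate_c_string(original);
    original[0] = 'J';                  // modify the source only
    bool independent = (std::strcmp(copy, "Hello") == 0);
    delete[] copy;                      // exactly one delete[] per new[]
    return independent;
}
```

Had the class stored the caller's pointer instead, both names would refer to the same characters and both sides would believe they are responsible for freeing them.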
Why not:

if (s) cstring = s;
instead?

to put it simply, this is array/pointer syntax and the 'why' is that there is no good way to make the compiler understand what you wanted here.

you effectively have
char cs[somesize];
char s[someothersize];

what does s = cs mean?
does it mean copy ALL of the bytes in cs to s? What if cs is bigger? what you wanted was 'copy from the start of cs until the first zero valued byte in cs into s'. But maybe you didn't ... maybe this is binary data, not text strings? How would it know? And it gets even worse with char*s because then it is unclear if you want the pointer copied or the data copied with such a statement.
C's solution to this difficult problem was to not allow it: you cannot copy arrays with an assignment operator. You can do it with a loop. (memcpy is an optimized loop, as is strcpy). When using C pointers and arrays, you copy the data with a loop, one way or another.

C++ strings allow the assignment operator, which, as you can probably guess by now, is hiding a loop from you :) ALL the STL containers hide some (complex) operations in their assignment operators; this is why making copies of them without thinking can be a performance hit.

there are some complexities behind what I am saying, but you asked for a simple explanation.
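
A small sketch of that hidden loop in action; the function name is invented for this example. After the assignment, the two std::string objects hold separate copies of the characters, so changing one does not change the other.

```cpp
#include <cassert>
#include <string>

// Assigning one std::string to another copies the characters, not a pointer.
bool string_assignment_copies_characters()
{
    std::string a = "Hello";
    std::string b;
    b = a;                  // the hidden loop: b gets its own copy
    assert(b == "Hello");
    a[0] = 'J';             // changing a ...
    return a == "Jello" && b == "Hello"; // ... does not change b
}
```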
Thanks to all.

@TheIdeasMan:
No, I never try to use C-style strings; C++ strings are much better and easier to use, as you know. I was just reviewing the code. :)
Thanks for your answer. I got it.

@jonnin:
One question on:
maybe this is binary data, not text strings

Isn't binary data all 0s and 1s? And isn't any value, be it text, a number, etc., eventually stored in binary, as 0s and 1s?

Thank you for your other explanations too. Yeah, I got them because they are put simply. :)


My other question:
On line 30, where the copy assignment operator calls the constructor, rule_of_five(other), other is not a const char* but a const reference to an object of the class! These two don't match, do they? (Different argument types)

Also, a constructor doesn't return anything that could then be assigned to *this on that line!

frek wrote:
@TheIdeasMan:
No, I never try to use C-style strings; C++ strings are much better and easier to use, as you know. I was just reviewing the code. :)
Thanks for your answer. I got it.


Also, about the use of new: I notice you have it in other code too (your topic about the Shapes). You should try to avoid using new; instead use: 1. An STL container; 2. A smart pointer like std::unique_ptr or std::shared_ptr. They all put their data on the heap already and probably use new or a smart pointer internally. The other thing to consider is that if one puts instances of your class into a standard container, they will all be on the heap.

The biggest problem with a raw new shows up when an exception is thrown before the matching delete: the delete is never reached, so the memory leaks.

Read up about RAII :+)
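
As a minimal sketch of the RAII idea, assuming C++14 for std::make_unique: the Tracked type and the function names below are invented for this illustration. The object is owned by a std::unique_ptr, so it is destroyed while the exception unwinds the stack, even though no delete appears anywhere.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Owns a Tracked through std::unique_ptr and then throws.
void throw_while_owning()
{
    auto owner = std::make_unique<Tracked>();
    assert(Tracked::alive == 1);
    throw std::runtime_error("something went wrong");
    // no delete needed: owner's destructor runs during stack unwinding
}

// Returns the number of live objects after the exception has been handled.
int alive_after_exception()
{
    try {
        throw_while_owning();
    } catch (const std::runtime_error&) {
        // the Tracked object has already been destroyed at this point
    }
    return Tracked::alive;
}
```

With a raw new in place of std::make_unique, the count would stay at 1 after the exception, which is exactly the leak described above.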

frek wrote:
Isn't binary data all 0s and 1s? And isn't any value, be it text, a number, etc., eventually stored in binary, as 0s and 1s?


What jonnin is saying is that binary data may not have a terminating null character the way a C string does, so nothing in the data itself says where it ends.



Regards :+D
One question on:
maybe this is binary data, not text strings

Isn't binary data all 0s and 1s? And isn't any value, be it text, a number, etc., eventually stored in binary, as 0s and 1s?

Thank you for your other explanations too. Yeah, I got them because they are put simply. :)

-------------
well, technically binary is zeros and ones, yes.
but in a computer, the smallest thing you can address is a byte, on all standard models that is.
a byte is a group of 8 bits, and just *happens* to be the same size as an extended ascii char, so c++'s char type is a byte. So when reading a binary file like a jpg image, you may read it into a vector (advanced array) of char type (often, we use unsigned char type here, but that isn't necessary). You will note that when you look at a file at the operating system level, its size is given in *bytes*! Now, if you did read a jpg file into a block of chars, odds are extremely high that before you reached the end of the file, at least one byte in there would have been zero. If the code assumed it was a text string, it would stop there, and you would lose a chunk of your file. The compiler can't guess here.
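
A short sketch of that problem, using four made-up bytes rather than a real image file; the names are invented for this example. A C string function stops at the first zero byte, while a container that stores its size separately keeps all of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Four bytes of made-up "binary" data with a zero in the middle.
const unsigned char sample_bytes[] = { 0x4A, 0x46, 0x00, 0x49 };

// Length as a C string function sees it: it stops at the first zero byte.
std::size_t length_as_text()
{
    return std::strlen(reinterpret_cast<const char*>(sample_bytes));
}

// Length when the size is tracked separately, as std::vector does.
std::size_t length_as_binary()
{
    std::vector<unsigned char> data(sample_bytes,
                                    sample_bytes + sizeof sample_bytes);
    assert(data[2] == 0x00); // the zero byte is ordinary data here
    return data.size();
}
```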

you can get to the individual bits of a byte (or a group of them like an int) using bitwise operations ( int x = 3; if(x & 2) cout << "second bit is 1";) (NOTE & not && in the if!) and c++ has some bitwise tools (vectors of booleans are usually stored compacted, and we have bitset, and bitfields, and more). You won't use the bit tools too often, but understand that they are really splitting up bytes into bits FOR you, to make life easier and code more readable. Memory itself is addressed in bytes, not in single bits.
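
The same test written out as a small sketch; bit_is_set and bits_of are names chosen for this example. It uses the bitwise & with a mask and std::bitset to show a byte as text.

```cpp
#include <bitset>
#include <cassert>
#include <limits>
#include <string>

// True when bit number `position` (0 = least significant) of value is 1.
bool bit_is_set(unsigned value, unsigned position)
{
    // keep the shift within the width of unsigned
    assert(position < static_cast<unsigned>(std::numeric_limits<unsigned>::digits));
    return (value & (1u << position)) != 0; // bitwise &, not logical &&
}

// The eight bits of a byte as text, most significant bit first.
std::string bits_of(unsigned char byte)
{
    return std::bitset<8>(byte).to_string();
}
```

For the value 3, bit 0 and bit 1 are set and bit 2 is not, and bits_of(3) gives "00000011".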
On line 30, where the copy assignment operator calls the constructor, rule_of_five(other), other is not a const char* but a const reference to an object of the class! These two don't match, do they? (Different argument types)
I had to run your code in a debugger to figure this one out:
    rule_of_five& operator=(const rule_of_five& other) // copy assignment
    {
         return *this = rule_of_five(other);  // recursive call to the assignment operator??
    }


It turns out that it isn't recursive:
    rule_of_five& operator=(const rule_of_five& other) // copy assignment
    {
         return *this = rule_of_five(other); // calls operator=(rule_of_five &&) below
    }
 
    rule_of_five& operator=(rule_of_five&& other) noexcept // move assignment
    {
        std::swap(cstring, other.cstring);
        return *this;
    }
 


The move assignment struck me as a little odd too. I was under the impression that a move assignment should move the data, leaving the right-hand-side argument in some legal state equivalent to newly initialized. This one leaves it in a good state but with unexpected data. I can see the performance advantage: the source data won't get deleted until it must.
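
Why the move assignment is chosen can be shown with a small stand-in class. Probe is invented for this sketch and only records which operator ran. The expression rule_of_five(other) calls the copy constructor and produces a temporary object; a temporary is an rvalue, and for an rvalue argument overload resolution prefers operator=(rule_of_five&&) over operator=(const rule_of_five&). A named object is an lvalue and selects the copy assignment.

```cpp
#include <cassert>

// Hypothetical stand-in for rule_of_five that records which assignment ran.
struct Probe {
    enum class Last { none, copy_assign, move_assign };
    Last last = Last::none;

    Probe() = default;
    Probe(const Probe&) {}              // copy constructor; leaves last at none
    Probe& operator=(const Probe&)      // chosen for named objects (lvalues)
    { last = Last::copy_assign; return *this; }
    Probe& operator=(Probe&&) noexcept  // chosen for temporaries (rvalues)
    { last = Last::move_assign; return *this; }
};

// Assign from a temporary, as `*this = rule_of_five(other)` does.
Probe::Last assign_from_temporary()
{
    Probe source, target;
    assert(target.last == Probe::Last::none);
    target = Probe(source);   // Probe(source) is a temporary
    return target.last;
}

// Assign from a named object.
Probe::Last assign_from_named_object()
{
    Probe source, target;
    target = source;          // source is an lvalue
    return target.last;
}
```

So the copy assignment in the original class first makes a copy with the copy constructor and then hands that temporary to the move assignment, which swaps the pointers; the temporary's destructor then frees the old string.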
@TheIdeasMan, I understood your answers very well. Thanks. Some more things to learn from you. :)

2. A smart pointer like std::unique_ptr or std::shared_ptr.

Where should each one be used, in simple language? I personally think the shared one is for times when we pass it from one part of the code to another, e.g. as a function argument or a return value, and the unique one is for occasions when we want the object to be deleted when the pointer goes out of scope. Right?

it will all be on the heap

I also suppose the difference between "heap" and "stack" is just the way the memory is allocated: either automatically by declaring a variable (stack) or manually by new (heap), right?

But when we have an infinite loop in a program and after a while the program stops with a "stack overflow" error, does it mean "all" accessible memory (say, we have 3 GB out of 4 GB free RAM to use) is used up and there is no more memory to allocate for the loop, and therefore we get that error?

What jonnin is saying is that binary data may not have a terminating null character like a c string does, so where does it end?

Thanks, I got it too. :)
So normally we have two types of data, namely binary data and literal string data, where the string data ends in a null character while binary data doesn't. Right?
@dhayden
Thank you for the answer. I followed your first assumption, that the call was recursive (although it turned out not to be). But I still can't comprehend what makes that statement call the move assignment and not the copy assignment.
frek wrote:
Where should each one be used, in simple language? I personally think the shared one is for times when we pass it from one part of the code to another, e.g. as a function argument or a return value, and the unique one is for occasions when we want the object to be deleted when the pointer goes out of scope. Right?


Unique pointers are for single ownership, and you should use these where possible. Shared pointers are for shared ownership.

https://stackoverflow.com/questions/7657718/when-to-use-shared-ptr-and-when-to-use-raw-pointers
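
A minimal sketch of the two kinds of ownership, assuming C++14 for std::make_unique; the function names are invented for this example. A unique_ptr can only be moved, so there is always exactly one owner. A shared_ptr can be copied, and the object lives until the last owner is gone.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Single ownership: after the move, only the new owner holds the int.
bool unique_ownership_moves()
{
    std::unique_ptr<int> first = std::make_unique<int>(42);
    std::unique_ptr<int> second = std::move(first); // ownership is transferred
    assert(second != nullptr);
    return first == nullptr && *second == 42;
}

// Shared ownership: every copy is an owner of the same int.
long shared_owner_count()
{
    std::shared_ptr<int> a = std::make_shared<int>(42);
    std::shared_ptr<int> b = a;  // copying adds a second owner
    return a.use_count();        // number of owners right now
}
```

Both kinds delete the object automatically; the difference is only in how many owners there may be.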

frek wrote:

I also suppose the difference between "heap" and "stack" is just the way the memory is allocated: either automatically by declaring a variable (stack) or manually by new (heap), right?


Stack Overflow is a very good resource to get answers:

https://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap

Although you should note that they still include examples using new, and malloc from C, both of which are discouraged for general programming these days.

Note that one can do a tremendous amount just by using the STL. The STL was invented so that the programmer doesn't have to worry about all the low level stuff, just use the containers, classes and algorithms that are the STL.

frek wrote:
But when we have an infinite loop in a program and after a while the program stops with a "stack overflow" error, does it mean "all" accessible memory (say, we have 3 GB out of 4 GB free RAM to use) is used up and there is no more memory to allocate for the loop, and therefore we get that error?


The amount of memory in the stack is fixed and usually much smaller than what is available on the heap. A stack overflow simply means one has exhausted that small resource. If one had written the code differently (for example by using an STL container like std::vector or a class like std::string), the overflow wouldn't happen, unless the code has an infinite loop, in which case it will fail eventually when all the heap memory is exhausted too.

Note that one doesn't have to allocate every variable on the heap: for example small variables (POD types, not a container ) in a function that go out of scope would be fine on the stack. Read the pros and cons in the links above.

frek wrote:
So normally we have two types of data, namely binary data and literal string data, where the string data ends in a null character while binary data doesn't. Right?


With POD (Plain Old Data) types like char, int, long, short, float, double etc., the size is fixed. A C string is null terminated, as is a string literal (an array of const char). A C++ std::string typically has two things: a pointer to the data on the heap, and a size - this happens with the containers too. So when doing std::cout on a std::string, it knows where the end is.
Thanks, I got it too. :)
So normally we have two types of data, namely binary data and literal string data, where the string data ends in a null character while binary data doesn't. Right?

100% correct.

Text data is a subset of binary data. That isn't made clear enough in most books/classes/etc on the subject. Inside the box, it's all just a bunch of bytes. The language lets you tell the computer what to do with those bytes, depending on which context you wanted. An easy example: strcpy and memcpy are nearly identical, but strcpy is looking for the zero terminator and memcpy is told a size. Of course you can use memcpy on a c-string; it's one way to get a substring :)

char hw[] = "Hello world";
char h[100] = {0};
memcpy(h, hw, 5);
cout << h; // prints Hello; h already ends in a zero because of the {0} initializer

I just treated string data as if it were binary data. Since you won't be using c-strings for long, it's of little value, of course.

@frek

This whole guideline is awesome, here is the part about RAII:

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-resource
@TheIdeasMan
Thanks. I read the prominent answers and comments there. They all, like your answer, revolve around some advanced topics related to pointers, such as ownership, whether it is single or shared, and how ownership attaches to objects and pointers. If there were a simple example illustrating the basics, I could figure out the rest of the topic. One thing I learnt is that unique_ptr is used considerably more often than shared_ptr.

And on stack and heap, I got both, but one question is why we don't store large amounts of data on the stack. Because it's small? Why is it small? Because contiguous memory in RAM is scarce and most of the memory is sparsely populated? OK. One naive question: what percentage of free (available) RAM is normally assigned to the stack, on Windows for instance? I guess less than 10 percent. What if we have just restarted the machine and 75% of RAM is available to use? That is, we have many contiguous blocks of memory in RAM.

Thanks to you too, jonnin.
You are correct about the size of the stack - it is typically small compared to the heap. I think a few MB is typical (the default is 1 MB with MSVC on Windows and commonly 8 MB on Linux). The stack size is usually a linker or operating system setting. I usually don't worry about it, even though I work with software that typically has dozens of functions/methods on the call stack. You just have to be aware of it in case you try to put something really big on the stack.

The problem gets much worse with multi-threaded programs because the threads share the stack. So each thread has a much smaller stack to work with.

Regarding pointers to the heap, it always boils down to a simple question: who owns the data? You have to be clear about this in your design. Document it in the code using comments. Then enforce the policy in the code. Usually a single object owns the memory - that's why unique_ptr is used more than shared_ptr, which expresses shared ownership of the memory.
@frek,

The stack is chosen by a combination of platform and compiler settings. In the era when 32 MBytes (not GBytes) of RAM was over $2,000, the stack would be merely a few KBytes (maybe 16K).

In some modern platforms the stack can be expanded at runtime.

The stack's primary function is to support function calls. The stack is how the CPU knows where to return when the call is finished. The second purpose is to store data local to the function, which gives that data very local scope to the function, supports recursion and automatically frees the memory upon function return. Without recursion the depth of the stack due to function calls should be quite low (usually less than 50 function calls in depth, often less than 20). Depending on the CPU's atomic size, that may require less than a few KBytes of RAM. It can be difficult to predict how much stack is really required to run an application, but since a large stack is likely to "waste" RAM that will never be used, it is wise not to rely on a large stack without well-considered reasons.

@dhayden, I don't think it can be said that threads share the stack. The stack must be private, lest thread action on the stack would interfere between threads. There would be a strict limit on thread count if stack sharing were implemented.

Niccolo wrote:
There would be a strict limit on thread count if stack sharing were implemented.
of course each thread has its own, independent stack, but anecdotally, that may lead to a limit on thread count: when I worked a job that compiled 32-bit, we couldn't make more than roughly 500 threads because we had set the stack size to 8 MB (why an 8 MB stack means a limit of 500 is an exercise for the reader.. in practice it was even a bit smaller, but the fix was the same, drop to 4 MB)
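
A rough sketch of the arithmetic behind that limit, under the assumption that the bound comes from the 4 GiB (2^32 bytes) of address space a 32-bit process can address; the function and constant names are invented here. Every thread reserves address space for its whole stack, so the address space divided by the stack size is an upper bound on the thread count.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t address_space_32bit = 4096 * MiB; // 2^32 bytes

// Upper bound on the number of threads when each one reserves stack_bytes.
// Real limits are lower: code, heap and libraries need address space too.
std::uint64_t max_threads_by_stack(std::uint64_t address_space_bytes,
                                   std::uint64_t stack_bytes)
{
    assert(stack_bytes > 0);
    return address_space_bytes / stack_bytes;
}
```

With 8 MiB stacks the bound is 512 threads, which matches "roughly 500"; with 4 MiB stacks it doubles to 1024.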
Thanks to you all. I understood almost everything. To summarize: for the majority of programs we don't need pointers, either smart or raw, because we can use the stack. Since the stack is small, for big programs we ought to use pointers, and among them only smart ones, mostly unique_ptr. Is that right?

The only remaining item is this code snippet:

    rule_of_five& operator=(const rule_of_five& other) // copy assignment
    {
         return *this = rule_of_five(other); // calls operator=(rule_of_five &&) below
    }
 
    rule_of_five& operator=(rule_of_five&& other) noexcept // move assignment
    {
        std::swap(cstring, other.cstring);
        return *this;
    }



I still can't comprehend what makes that statement call the move assignment and not the copy assignment.


Another thing that is unclear to me is = default, as in:

class base_of_five_defaults
{
 public:
    base_of_five_defaults(const base_of_five_defaults&) = default;
    base_of_five_defaults(base_of_five_defaults&&) = default;
    base_of_five_defaults& operator=(const base_of_five_defaults&) = default;
    base_of_five_defaults& operator=(base_of_five_defaults&&) = default;
    virtual ~base_of_five_defaults() = default;
};


When we declare = default, we tell the compiler to generate that specific method for us automatically. But the compiler also creates all of these methods when we declare none of them, so why do we still need = default?

Isn't there any reply for the above questions!? :(
I just want to reach a conclusion and keep that in mind for uses in C++.
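
On the = default question, a sketch of one case where the explicit declarations matter; Tracker, only_destructor, five_defaults and member_is_moved are names invented for this example. Once a class declares a destructor (here a virtual one, as a base class usually needs), the compiler no longer generates the move constructor and move assignment on its own, so "moving" such an object silently falls back to copying. Declaring the operations with = default asks the compiler to generate them again.

```cpp
#include <cassert>
#include <utility>

// Hypothetical member that records whether it was copy- or move-constructed.
struct Tracker {
    bool was_moved = false;
    Tracker() = default;
    Tracker(const Tracker&) : was_moved(false) {}
    Tracker(Tracker&&) noexcept : was_moved(true) {}
    Tracker& operator=(const Tracker&) = default;
    Tracker& operator=(Tracker&&) = default;
};

// Declares only a virtual destructor: no implicit move operations exist,
// so constructing from an rvalue uses the copy constructor.
struct only_destructor {
    Tracker member;
    only_destructor() = default;
    virtual ~only_destructor() = default;
};

// Declares all five, as base_of_five_defaults does.
struct five_defaults {
    Tracker member;
    five_defaults() = default;
    five_defaults(const five_defaults&) = default;
    five_defaults(five_defaults&&) = default;
    five_defaults& operator=(const five_defaults&) = default;
    five_defaults& operator=(five_defaults&&) = default;
    virtual ~five_defaults() = default;
};

// True when constructing a T from std::move(source) really moves the member.
template <class T>
bool member_is_moved()
{
    T source;
    assert(!source.member.was_moved); // freshly default-constructed
    T target(std::move(source));
    return target.member.was_moved;
}
```

For only_destructor the member is copied; for five_defaults it is moved. A class that declares none of the five gets all of them implicitly, which is the "rule of zero" case.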