Polymorphism clarification.

Reading the tutorials on this website. Breeze through friendship and inheritance. Get to polymorphism. Read about halfway through the article. Get to my ABCs (abstract base classes, lol). At this point I realize I've encountered something multiple times that isn't elaborated upon:

(from the beginning of http://www.cplusplus.com/doc/tutorial/polymorphism/ )

int main () {
  CRectangle rect;
  CTriangle trgl;
  CPolygon * ppoly1 = &rect;
  CPolygon * ppoly2 = &trgl;
  ppoly1->set_values (4,5);
  ppoly2->set_values (4,5);
  cout << rect.area() << endl;
  cout << trgl.area() << endl;
  return 0;
}


The use of the reference operator (&) in the two italic lines is what confuses me in particular

The lesson goes on to say:


In function main, we create two pointers that point to objects of class CPolygon (ppoly1 and ppoly2). Then we assign references to rect and trgl to these pointers, and because both are objects of classes derived from CPolygon, both are valid assignment operations.


I understand why they're valid assignments (they are objects of a class derived from the base class, which we use for the pointer type), but what I don't understand is why we're assigning the addresses of rect and trgl to the pointers, and not the objects themselves, like so:


  CPolygon * ppoly1 = rect;
  CPolygon * ppoly2 = trgl;




How is it at all relevant that I assign the addresses to the pointers? Isn't the address just an int value corresponding to some location in memory?

How does this achieve my being able to set values on rect and trgl with the two following statements?

  ppoly1->set_values (4,5);
  ppoly2->set_values (4,5);



I know the arrow operator (->) refers to the member(s) of the object pointed by ppoly1 and ppoly2, which in this case are supposedly rect and trgl's width and height members, but if the object being pointed is actually an address value of type int, and not the CRectangle and CTriangle objects themselves, then why is the behavior of the base class properly applied to the derived classes?
I don't understand is why we're assigning the addresses of rect and trgl to the pointers, and not the objects themselves

The & operator constructs a temporary nameless pointer, which is then used to initialize the pointers ppoly1 and ppoly2. Pointers are objects in their own right, separate from what they are pointing to. You could use references to bind to the objects directly:

  CPolygon * ppoly1 = &rect;
  CPolygon * ppoly2 = &trgl;
  ppoly1->set_values (4,5);
  ppoly2->set_values (4,5);
is functionally the same as
  CPolygon & rpoly1 = rect;
  CPolygon & rpoly2 = trgl;
  rpoly1.set_values (4,5);
  rpoly2.set_values (4,5);


if the object being pointed is actually an address value of type int, and not the CRectangle and CTriangle objects themselves

The object that's pointed to by ppoly1 is a CRectangle. The object pointed to by ppoly2 is a CTriangle. Don't get hung up on the concept of memory address, it is an implementation detail that doesn't participate in program semantics (it's also not of type int).
Just got home from work, lol. Was gone by the time the initial reply came in.

Thanks in advance for your speedy reply, and any further patience you deign to show me. I've got followup questions.

(Also sorry in advance for how long-winded this post turned out to be, I'd rather have too many questions than not enough)

the operator & constructs a temporary nameless pointer, which is then used to initialize the pointers ppoly1 and ppoly2.


Nameless? So rect and trgl aren't the names of the pointer objects? Then why their juxtaposition?

Does "initializing" in this case mean the same thing as explicitly assigning a value for the first time, to a newly created object? Or is there a subtler context I'm missing? Why I ask is, the documentation for pointers says that:

The address that locates a variable within memory is what we call a reference to that variable. This reference to a variable can be obtained by preceding the identifier of a variable with an ampersand sign (&), known as reference operator, and which can be literally translated as "address of". For example:


ted = &andy;


This would assign to ted the address of variable andy, since when preceding the name of the variable andy with the reference operator (&) we are no longer talking about the content of the variable itself, but about its reference (i.e., its address in memory).


So, for confirmation: Does this mean I'm initializing the ppoly pointer objects (as opposed to the ppoly Cpolygon objects, which you go on to tell me are separate objects) with the address of rect and trgl, and not any of its member values or return values?

Pointers are objects in their own right, separate from what they are pointing to. You could use references to bind to the objects directly:


  CPolygon * ppoly1 = &rect;
  CPolygon * ppoly2 = &trgl; //assigns the address of trgl to the pointer object,
                             //not the CPolygon object.
  ppoly1->set_values (4,5);
  ppoly2->set_values (4,5);

is functionally the same as
  CPolygon & rpoly1 = rect;
  CPolygon & rpoly2 = trgl; //assigns trgl's contents to the reference object, 
//not the CPolygon object
  rpoly1.set_values (4,5);
  rpoly2.set_values (4,5);




Interesting. Tell me whether the comments I added are correct assumptions, and represent a clear understanding of what you're trying to tell me.

The object that's pointed to by ppoly1 (the pointer ppoly1, not the CPolygon, right?) is a CRectangle. The object pointed to by ppoly2 (the pointer ppoly2, not the CPolygon, right?) is a CTriangle.


Questions in bold.


Don't get hung up on the concept of memory address, it is an implementation detail that doesn't participate in program semantics


Not to be nitpicky, but I thought understanding low-level memory stuff was key to optimization and other things that would separate me as a professional (assuming I am to ever become one). I kinda want to get hung up on this stuff because it's important, and it pertains to general programming principles that go beyond just c++.

Good principles will more easily allow me to become multilingual in programming, when the time comes.

Plus, I don't like the idea of using things I don't fully understand. Stroustrup calls it "believing in magic." Others call it "Cargo cult programming." I desire to be seen as neither, here.

(it's also not of type int).


I'm gonna take your word for it, you've got a lot of experience. Certainly more than me. But I need one last bit of clarification:

I can't help but notice by virtue of this little program I wrote (after reading your reply) that address values have a lot of similar features to integers, namely the standard amount of space they take up in memory (4 bytes) and the fact that their return values are exclusively what would be called "integers" in mathematics (even if they aren't of that type technically in C++).

My example/test/thingy:

int peggy = 3;
int hank = 6576;
int *bobby = &hank;
int *louanne = &peggy;

int main()
{
    cout << sizeof(peggy)<<'\n';
    cout << sizeof(hank)<<'\n';
    cout << sizeof(bobby)<<'\n';
    cout << sizeof(louanne)<<'\n';

    cout<< peggy <<'\n';
    cout<< hank <<'\n';
    cout<< bobby <<'\n';
    cout<< louanne <<'\n';

    return 0;
}


The first four statements of main(), when compiled and built, all return 4, meaning 4 bytes. Which is the standard size of an integer. This is obvious for the first two, but when I did the same thing for bobby and louanne, which represent hank and peggy's addresses, I get 4 also.

Could you tell me why this is?

The second four statements of main() return 3, 6576, and two 0x (hexadecimal) values representing peggy and hank's respective locations in memory. Aren't hexadecimals technically integers, even though they are in a different base (base 16)?
I'm not cubbi, but I'll try to answer these anyway:

NullInfinity wrote:
Nameless? So rect and trgl aren't the names of the pointer objects? Then why their juxtaposition?


I think cubbi was being a bit hypertechnical, and it might have thrown you off of what's conceptually happening (which is simple).

rect and trgl are objects.
&rect and &trgl are pointers to those objects.

ppoly1 and ppoly2 are not CPolygon objects, but rather are just pointers.
Once they have had &rect assigned to them, they point to the rect object.

Does "initializing" in this case mean the same thing as explicitly assigning a value for the first time, to a newly created object? Or is there a subtler context I'm missing?


There are subtleties in the difference between "initialization" and "construction" which still trip me up after well over a decade. In practice, they come up extremely rarely. Unless you're planning on writing a C++ compiler it likely will not matter to you what those differences are.

So yeah... "initialization" here just means assigning the value for the first time.

So, for confirmation: Does this mean I'm initializing the ppoly pointer objects (as opposed to the ppoly Cpolygon objects, which you go on to tell me are separate objects) with the address of rect and trgl, and not any of its member values or return values?


For starters... there IS NO ppoly CPolygon object. ppoly1 is a pointer, not a polygon. It only contains an address. At that address, there is the CRectangle object (and since CRectangle is derived from CPolygon, there is also a CPolygon object at that address).

So to recap, there are only 2 polygon objects here: rect and trgl. ppoly1 and ppoly2 are not polygons... they merely point to existing objects.

CPolygon * ppoly2 = &trgl; //assigns the address of trgl to the pointer object,


Your comment above is correct.


CPolygon & rpoly2 = trgl; //assigns trgl's contents to the reference object,


Your comment above is incorrect.

References are similar to pointers in that they hold no polygon of their own, but simply refer to existing objects. This code is practically identical to the previous code with only some minor syntactic differences.

References are conceptually "aliases" or "alternative names" for an existing object. In this case... the object is trgl... and by creating a reference rpoly2, we are saying "rpoly2 is just another name for the trgl object".

The object that's pointed to by ppoly1 (the pointer ppoly1, not the CPolygon, right?) is a CRectangle. The object pointed to by ppoly2 (the pointer ppoly2, not the CPolygon, right?) is a CTriangle.

Questions in bold.


There are ONLY pointers. Remember that ppoly1 and ppoly2 do not have CPolygon objects themselves... they are merely pointing to the rect and trgl objects.


Not to be nitpicky, but I thought understanding low-level memory stuff was key to optimization and other things that would separate me as a professional (assuming I am to ever become one). I kinda want to get hung up on this stuff because it's important, and it pertains to general programming principles that go beyond just c++.


The concepts are more important than the details. Especially since the details can vary from system to system and compiler to compiler.

The first four statements of main(), when compiled and built, all return 4, meaning 4 bytes. Which is the standard size of an integer. This is obvious for the first two, but when I did the same thing for bobby and louanne, which represent hank and peggy's addresses, I get 4 also.

Could you tell me why this is?


You're looking at it at too low of a level, and it's obstructing the conceptual view. This is why you need to take a step back and focus less on the details and more on the concepts. Just because sizeof(int) and sizeof(int*) are both 4 does not mean they are the same. They are conceptually very different. And the language also treats them very differently.

Case in point: sizeof(float) will probably give you the same result as sizeof(int) (both 4).. but clearly floats and ints are different. So using sizeof to determine whether or not two types are the same (or even similar) is nonsense.

Computers are digital devices. This means EVERYTHING is represented as a series of 0s and 1s.

- A single 'bit' can have a value of either 0 or 1
- A byte consists of 8 bits
- Variables consist of one or more bytes

Even complex types like strings are just a bunch of bytes strung together.

So by that definition... anything could be called an "integer", but if you do that, the term loses all meaning because it literally defines any and all possible types on a digital machine.
There are subtleties in the difference between "initialization" and "construction" which still trip me up after well over a decade. In practice, they come up extremely rarely. Unless you're planning on writing a C++ compiler it likely will not matter to you what those differences are.


And that, right there, really put things into perspective for me. I'm wading into this stuff, not diving, and I didn't even know it.

Thank you for your wisdom and your time.

Since the rest of your post does an excellent job at clearing up my follow-up questions, I won't quote it. Just note that you were successful at penetrating this thick skull of mine.

Bless you, bliss you.
:)
Cubbi wrote:
the operator & constructs a temporary nameless pointer,
NullInfinity wrote:
Nameless? So rect and trgl aren't the names of the pointer objects?
rect is the name of a CRectangle object. &rect is an expression (executable code), and it tells the computer to build a new, nameless pointer to rect, and to return it to the caller of this expression. The caller, in that case, was the declaration statement CPolygon * ppoly1 = ...;, which used the pointer returned from &rect: it copied its contents into the named pointer ppoly1 which it constructed.

Yes, I am being fairly technical (could be more, I didn't bring up value categories even once), but you did say "I don't like the idea of using things I don't fully understand".

In machine code, since you seem to want to learn that as well, &rect may compile to the commands that load the address of rect into a CPU register, and CPolygon* ppoly1=... may compile to a stack frame adjustment that allocates memory for the pointer ppoly1 and a store from that CPU register into the memory location of ppoly1. Or it may all compile to a one-step operation. Or it may be optimized out completely, since nothing in your program depends on runtime information, and your compiled main() will have only the compiled equivalent of cout << 20 << endl; (that's what it did for me). Whatever happens, the result will be as if the program executed every step how it was defined by the language spec. And the language spec does not deal with memory addresses.

NullInfinity wrote:
Does "initializing" in this case mean the same thing as explicitly assigning a value for the first time, to a newly created object?

initializing means providing the initial value.
assignment always *replaces* a value that is already present in the object.

NullInfinity wrote:
the documentation for pointers says that:
The address that locates a variable within memory is what we call a reference to that variable. This reference to a variable can be obtained by preceding the identifier of a variable with an ampersand sign (&), known as reference operator, and which can be literally translated as "address of".

This is probably the source of your confusion; the terminology is incorrect: the word 'reference' in C++ means something different (it's a special non-object type used to introduce named aliases for already-existing objects). Preceding an identifier with an ampersand (the address-of operator, it's not known as 'reference operator') creates a pointer.