Is data hiding effective in C++?

Hi all,

In my Python class yesterday we were covering classes and how "__" is used before a variable to make it 'private' - in a VERY loose sense - and my professor went on a rant about how, like in Python, data hiding does not completely work in C++. I was wondering if anyone could elaborate on what he meant and how one could access a private data member outside of the class. Would you get the address of the instantiation of the class and play around with the pointer? Even at that, isn't this what unique_ptrs are used to prevent?
If done correctly, data hiding in C++ is VERY effective.

my professor went on a rant about how ... data hiding does not completely work in C++

No clue what he was talking about. The only thing I can think of is blindly writing getters and setters for private members, rather than writing an interface that changes the state of the class in known ways. If you're going to blindly write public getters and setters for private members, you might as well make the members public, although there is a debugging advantage to using getters and setters instead of making the members public.

Would you get the address of the instantiation of the class and play around with the pointer?

That's quite unsafe and requires guilty knowledge of the layout of the class, which is a no-no.

isn't this what unique_ptrs are used to prevent?

No. unique_ptrs are used for something else.
http://www.cplusplus.com/reference/memory/unique_ptr/
As far as I know, C++ data hiding is definitely very effective. This is why I was pretty confused when he said it was more effective than Python at it (which is not very effective) but not flawless. I was just wondering where the flaw was or if he was just spewing nonsense, although I doubt that is the case.

When you say playing around with the pointer is 'unsafe,' what exactly do you mean? It's discouraged or not allowed?

As to the unique_ptrs, I thought it would throw an exception or something if another pointer tried to point to the same address? Admittedly, I have no experience with them so I'm not too sure and any explanations would be greatly appreciated.

I will mess around with some of this when I get home and can access a compiler and update if I answer my own question.
The only way you can get around the data hiding mechanism in C++ (correct me if I'm wrong) is by writing a public member function that returns a reference to a private data member.

Calling this member function and saving that reference would then provide an outside function with complete and unlimited access to the private data member of an object.

Of course, it's easy to prevent this by returning const references always from your member functions.

Joe
sparkprogrammer@gmail.com
Unless you use the pimpl idiom, data hiding is imperfect in C++ because the user of the class must be able to see the private members at compile time to correctly generate calls into the class and access public members, so changing private implementation details can require recompilation of user code. In other words, all parts of the class are part of the interface, even if the language does not allow some parts to be used at all.
Compare this to for example C#, where only the public members of the class are part of its interface, and so only when such members change is recompilation necessary.

When you say playing around with the pointer is 'unsafe,' what exactly do you mean? It's discouraged or not allowed?
An unsafe operation is one that may fail[1] and it's not possible to know ahead of time if it will. Often, there are procedures that the programmer is expected to follow that can ensure that an otherwise unsafe operation becomes safe. Some examples of unsafe operations:
* Dereferencing an invalid pointer.
* Returning a pointer or reference to a local.
* Accessing elements outside the bounds of an array.
* Simultaneously modifying a container in two separate threads without synchronization.
* Non-atomically incrementing a variable in two separate threads.

It's not necessarily true that accessing private members of an object by directly using the object's memory is unsafe. If you happen to know what kind of code the compiler generates for a given class, it's perfectly safe. The code will not mutate while the program is running (if it does, it's a different story).

As to the unique_ptrs, I thought it would throw an exception or something if another pointer tried to point to the same address?
This is not possible, because a pointer is just an integer whose value happens to be used by the program in a specific way (to access memory). An integer or even a float could by chance be assigned the same value as a real memory location, and there would be no way to prevent someone from using that value to access that location. Doing that would require scanning the entire address space and all CPU state every time the program state changes. This is of course impossible from a program that's running on the same processor.


[1] In this case, an operation is said to "fail" if it doesn't alter the program state in a manner that is consistent with the operation's postconditions. For example, ++x; is expected to cause x@after == x@before + 1 to be true. The operation has failed if it's not.
It may help to consider the question, "who are you hiding the data from?"

- You aren't hiding it from yourself. You know what you have written.

- You aren't hiding it from other programmers. All they have to do is to look at your source code and they will see what you have done.

- You are hiding it from the code that makes up the other classes/modules in your program. You are preventing that code from holding explicit dependencies on the hidden declarations. That means that, if the hidden declarations change, that code in the other modules doesn't need to change as well.

Without a mechanism for data hiding, such dependencies can be created, intentionally or accidentally, and then forgotten until a change to should-have-been-hidden-data breaks tons of outside code.

Is it perfect? No. Can you get around it? Sure. But the very fact that you can't defeat it easily or without engaging in arcane magic means that most people will take the easy route and restrict their code to work through the public interface. And that's enough to keep the inter-module dependencies manageable.
Thank you guys for the answers!

@helios
helios wrote:
..the user of the class must be able to see the private members at compile time...



My professor was showing us the name mangling that occurs in Python with 'private' class members (Ex: "__<varName>" turns into "_C__<varName>") and said that C++ compilers do something similar; do you think this is what he was referring to regarding the safety of data hiding in C++?

Again, thank you all for the explanations!
No. Name mangling has nothing to do with data hiding. The compiler may not even need to output mangled names for private members into the final executable. Mangled names are only needed to look up the locations of functions when calling functions across modules, but since private members can only be called from the same module (obviously), no other module can possibly use those names, so there's no need to create them. A C++ function can exist unmarked in the middle of an executable and other code within the same executable will be able to call it if it just knows its address.

When I say that the user of the class must be able to see private members, what I mean is this.
Imagine this class in module A:
class Foo{
    int bar;
public:
    int baz;
};
And imagine there's some other code in a separate module B that does this:
Foo foo;
foo.baz = 42;
If at some point the definition of Foo changes, for example:
class Foo{
    int bar;
    int snafu;
public:
    int baz;
};
Even though the change only affects private members, this change may break module B, because the relative location of Foo::baz within the object may have changed. If you try to run module B without recompiling it, the statement foo.baz = 42; is liable to do something unexpected, such as change the value of a private variable, which should never have happened.
OK, I think it's starting to make sense now.

In an oversimplified recap:
class Foo {
    int bar; // Stored at location x
public:
    int baz; // Stored at location x + 4
};


Then Foo gets updated to become:
class Foo {
    int bar; // Stored at location x
    int snafu; // Stored at location x + 4
public:
    int baz; // Stored at location x + 8
};


So when module B runs foo.baz = 42; without recompiling module B, it is still accessing the memory at location (x + 4), the old location of Foo::baz, which is actually now the location of Foo::snafu. I'm sure it's much more complicated than that in implementation, but this is the gist of what you're saying, correct?
Yes, basically. Before C++23, the compiler was not required to keep members with different access levels (here the private bar and snafu versus the public baz) in declaration order relative to each other. This could in some cases be used to prevent that breakage, which is why I qualified my statements with "may".
Things become more complicated when the class uses more complex features, such as virtual functions, multiple inheritance, etc. Usually breakage becomes inevitable at that point.
A common criticism of C++ and the 'impurity' of its data hiding in relation to 'pure' languages is C++'s use of friends, which is seen as breaking down the principle. I think it is with Java or whatever, the advocates of that language use the criticism as a badge of superiority. The non-availability of C++-style operator overloading in those languages is another rant source sometimes. C++ users see both as a strength, not a weakness.
Thank all of you for the insight! Python is the second language I am formally learning and it's strangely helping me understand the subtleties I didn't really think about when initially learning C++.