Write cross-platform file IO.

Hi.

I discussed how to write cross-platform file IO code with a member named Disch about a year ago. Since I am a beginner, I am not sure whether the "rules" for doing this have changed within C++ since then.

He taught me that different CPUs use different endianness when reading and writing to files. However, why can't the C++ standard file IO functions detect what endianness should be used on the machine that is running the program? Why didn't the developers who created the standard library make the file IO functions cross-platform from the beginning? Have the rules changed since last year?

What I learned is that if you need to store data in files that will be read and written on different machines, you have to define in the program what endianness should be used. For example, if I needed to store 4 bytes, I had to do this manually with my own functions and decide in those which endianness to use.

Thanks for your help!
Best regards, Zerpent.
> He taught me that different CPUs use different endianness when reading and writing to files.
This isn't actually true, and I'm sure that's not what Disch said.
Endianness is a property of multi-byte values stored in memory*. If you try to read these values normally from within the program, you'll find that you can do so transparently, because the CPU knows what endianness was used to put those values there.
The problem comes when you try something like this:
unsigned i = 42;
// writes the raw bytes of i exactly as they sit in memory, in this CPU's native byte order
file.write((const char *)&i, sizeof(i));
All a file IO operation sees is an array of bytes, since that's what a file is. The above has simply copied the literal contents of memory to a file. If you try to read this file back on the same computer by copying its literal contents to memory, it'll work just fine. If you try it on a computer with a different endianness, you might get i == 704643072.
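To see what's going on, here's a quick sketch (my own illustration, not from the original discussion) that just prints the raw bytes of that same value. On a little-endian machine you'd see 2a 00 00 00; on a big-endian one, 00 00 00 2a. Take either dump to a machine of the other kind, copy it straight into memory, and you get 0x2A000000 == 704643072 instead of 42.

#include <cstdio>

int main()
{
    unsigned i = 42;
    const unsigned char *p = (const unsigned char *)&i;

    // Print the bytes of i in the order they actually sit in memory.
    for (unsigned n = 0; n < sizeof i; ++n)
        std::printf("%02x ", (unsigned)p[n]);
    std::printf("\n");
}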

> However, why can't the C++ standard file IO functions detect what endianness should be used
> on the machine that is running the program? Why didn't the developers who created the standard
> library make the file IO functions cross-platform from the beginning?
The file IO functions are endianness-independent, since they only deal with byte buffers. If your question is why there isn't a way to portably serialize data, the answer is likely that you have to draw the line at some point about what to include in your standard library and what not to include in it.
IMO, minimalism isn't a bad thing.

> What I learned is that if you need to store data in files that will be read and written on different
> machines, you have to define in the program what endianness should be used. For example, if I needed
> to store 4 bytes, I had to do this manually with my own functions and decide in those which endianness to use.
Yup.
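For completeness, here's roughly what those manual functions tend to look like (a sketch of the usual shift-and-mask approach, not Disch's exact code). Pick one byte order for the file format, little-endian here, and build or split the value byte by byte, so the result is the same no matter what the CPU's native order is:

#include <cstdint>
#include <istream>
#include <ostream>

// Write a 32-bit value to the stream as 4 bytes, least significant byte first.
void write_u32_le(std::ostream &out, std::uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v & 0xFF),
        (unsigned char)((v >> 8) & 0xFF),
        (unsigned char)((v >> 16) & 0xFF),
        (unsigned char)((v >> 24) & 0xFF)
    };
    out.write((const char *)b, 4);
}

// Read 4 bytes back and reassemble the value with shifts, ignoring the CPU's own order.
std::uint32_t read_u32_le(std::istream &in)
{
    unsigned char b[4] = {};
    in.read((char *)b, 4);
    return std::uint32_t(b[0])
         | (std::uint32_t(b[1]) << 8)
         | (std::uint32_t(b[2]) << 16)
         | (std::uint32_t(b[3]) << 24);
}

A file written with write_u32_le reads back as the same value through read_u32_le on any machine, because the byte order in the file is fixed by the functions rather than by the hardware.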



* In a more general context, a file can be considered memory and have endianness. Bare-bones C/C++ treats files and memory as fundamentally different things, so for the purposes of this discussion, files are simply arrays of bytes. But do note that a more abstract interface could in principle deal with files and/or memory transparently.
Ok thank you.
> Why didn't the developers who created the standard library develop file IO functions
> that are cross-platform from the beginning?

They (the designers of both C and C++ standard library i/o facilities) did that right from the beginning - by emphasizing the importance of being textual. The primary model for output in C++ is to convert the internal representation of an object into a sequence of human readable characters. Correspondingly, input is the conversion of a sequence of human readable characters into the internal representation of an object.
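To make that concrete, here's a small sketch (my own, using a made-up Point type) of that model with iostreams: operator<< converts the internal representation into characters, operator>> converts the characters back, and the resulting file means the same thing on any machine, whatever its endianness:

#include <fstream>

struct Point { int x, y; };   // hypothetical type, just for illustration

// internal representation -> sequence of human-readable characters
std::ostream &operator<<(std::ostream &out, const Point &p)
{
    return out << p.x << ' ' << p.y;
}

// sequence of human-readable characters -> internal representation
std::istream &operator>>(std::istream &in, Point &p)
{
    return in >> p.x >> p.y;
}

int main()
{
    {
        std::ofstream out("point.txt");
        out << Point{640, 480} << '\n';   // file contains the text "640 480"
    }
    Point p{};
    std::ifstream in("point.txt");
    in >> p;                              // reads the same values back anywhere
}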

HTTP, PNG, XML, SOAP, RESTful web services ... all provide evidence of the validity of that design decision.


> What I learned is that if you need to store data in files that will be read and written on different machines,
> you have to define in the program what endianness should be used.

No. Most often, what you need to do is store the data in (ideally a self-describing) textual format.
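"Self-describing" can be as simple as labelling each value. A made-up example of what such a file might contain (just a sketch, not any standard format):

width: 640
height: 480
depth: 32

Anyone can open it in a text editor and see what it means, any machine can read it regardless of endianness, and new fields can be added later without breaking readers that only know about the old ones.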

ESR on the importance of being textual (emphasis added):
Interoperability, transparency, extensibility, and storage or transaction economy: these are the important themes in designing file formats and application protocols. Interoperability and transparency demand that we focus such designs on clean data representations, rather than putting convenience of implementation or highest possible performance first. Extensibility also favors textual protocols, since binary ones are often harder to extend or subset cleanly. Transaction economy sometimes pushes in the opposite direction — but we shall see that putting that criterion first is a form of premature optimization that it is often wise to resist. ...
...
When you feel the urge to design a complex binary file format, or a complex binary application protocol, it is generally wise to lie down until the feeling passes. ...
...
Designing a textual protocol tends to future-proof your system. ...


Perhaps you should read the whole chapter: http://www.faqs.org/docs/artu/textualitychapter.html