Can I avoid reinterpret_cast in this example?

I'm trying to avoid adopting bad habits as I learn to program in C++. I've read that reinterpret_cast is best avoided because it turns one type directly into another, which may cause unexpected results, especially when compiling and running the program on a system with a different architecture that might internally store the affected types differently. Maybe there are other reasons too, but that seemed to be the danger most often called out in the references I found.

In any case, I'm using it right now in the following situation. I'd like to know whether there is a way to do this without reinterpret_cast, and also whether it is known to be dangerous in this use case.

The translate function works with unsigned char *; it treats each character of a buffer that is passed to it as an index in the range of 0 to 255 to access a 256 byte array of translated characters.

As far as I know, I have to define the buffer as unsigned char * and reinterpret_cast when I use it in the fstream read statement, or define it as char * type and reinterpret_cast to unsigned char * when I use it in the translate function.

This is a simplified code fragment for illustrative purposes, not a complete working example.

void translate(size_t, unsigned char *, unsigned char *);

int main() {
  fstream fs;                     // an fstream
  unsigned char *fsbuf;           // pointer to char buffer
  unsigned char *table;           // pointer to translation table
  size_t buflen;                  // length of data to translate

  ... (code here to open file in binary mode)

  fs.read(reinterpret_cast<char*>(fsbuf), 4096);
  buflen = fs.gcount();          
  
  translate(buflen, fsbuf, table);

  ... (more code)
}

void translate(size_t len, unsigned char *s, unsigned char *trt)
{ // translate text of length len in buffer s using table trt.
  register unsigned char *buf;
  register int j;
  for (buf=s, j=len; j; j--, buf++)
    *buf = trt[*buf];
  return;
}


Since a value of type char and one of type unsigned char both occupy one byte, does this use of reinterpret_cast pose a problem? What if a char and an unsigned char are not the same length on some systems?

Is there some way I can avoid the reinterpret_cast in my example?
closed account (Dy7SLyTq)
a) I don't think reinterpret_cast is even needed in this case. It should still work fine.
b) I wouldn't use register.
c) I think the standard says they occupy the same amount of space, but even if it didn't, most modern systems make them the same size.
Thank you DTSCode.

a) I'm not sure what the standard says about the implementation of char, whether it may be either signed or unsigned, but my translate function requires unsigned char because it needs the value to have a range of 0 to 255. If it were passed a buffer of type char and char was implemented as a signed one-byte integer, it would fail whenever a byte with a value greater than 127 was passed to it.

b) Why not use register? Shouldn't it be faster if the pointer and the loop index are kept in a register? Or is it preferred to let the compiler optimize as it sees fit?

c) Ok. I'm only compiling for X86-64 architecture at the moment, but I'd also prefer to avoid dependence on the architecture.
closed account (Dy7SLyTq)
I think the compiler will just convert between unsigned and signed for you, so you can have:
unsigned char x = 'a';
signed char y = x;

and it will work fine.
And yeah, it's generally best to let the compiler optimize it for you, because compilers are generally designed by super smart people who understand memory really well.
1. Just use the reinterpret_cast. That's what it's for. Both signed and unsigned char are the same size. The reinterpret_cast does not "convert" the underlying byte. It just causes a reinterpretation by the compiler of what that byte means.
If you have a byte of 0xFF, then as an unsigned char it is 255, but as a signed char it is -1.
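As a small sketch of that point (the helper name view_as_signed is made up for illustration), the same byte can be read through a signed char pointer without changing its bits. The value -1 for 0xFF assumes a two's complement representation, which C++20 requires and which earlier compilers use in practice.

```cpp
// Hypothetical helper: reads the byte stored in an unsigned char
// through a signed char pointer. The bits are not modified; only
// the type used to interpret them changes.
signed char view_as_signed(unsigned char byte)
{
    return *reinterpret_cast<const signed char *>(&byte);
}
```

Reading an unsigned char object through a signed char pointer is one of the cases the aliasing rules allow, which is why this particular cast is considered safe.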

2. Use of register. The section of compiler code that handles your request to keep a variable in a register, assuming it's contained in a function called processRequestToKeepVariableInRegister() would probably look something like this:

void processRequestToKeepVariableInRegister()
{
   // Ha! Ha! Ha! This poor programmer thinks that we, the compiler writers, actually
   // care about his opinion as to which variables should be kept in a register. Ha! Ha! Ha!

   return;
}


Compiler writers long ago figured out that their fellow programmers were clueless about which variables belonged in registers and so they will ignore your request.


@DTScode, if I don't use reinterpret_cast, I get compile errors because the char* and unsigned char* are different types, and they don't match. I have to use reinterpret_cast to satisfy the compiler.

If you meant that I could just define the buffer as type char and pass a char* to the function, I tried that and it didn't work.

To test it, I changed the buffer type from unsigned char to char, and the pointer type from unsigned char* to char*. That's the second parameter in the translate function up above. It compiles fine, but when it executes, all of the characters that have a value between 0x00 and 0x7F get translated properly, while characters with a value between 0x80 and 0xFF retain their original value, untranslated. That was surprising. I expected they would have been translated to garbage.
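A likely explanation, assuming char is signed on this compiler: bytes from 0x80 to 0xFF become negative values, so trt[*buf] indexes memory before the start of the table, which is out of bounds. One possible way to keep a char buffer and still avoid reinterpret_cast is to convert each value rather than the pointer. The sketch below uses a made-up name, translate_chars, and is not the original function.

```cpp
#include <cstddef>

// Hypothetical variant of translate() that accepts a plain char buffer.
// Each byte is converted to unsigned char before it is used as an index,
// so the index is always in the range 0 to 255.
void translate_chars(std::size_t len, char *s, const unsigned char *trt)
{
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char index = static_cast<unsigned char>(s[i]);
        // Converting a value above 127 back to a signed char is
        // implementation-defined before C++20; common compilers wrap it.
        s[i] = static_cast<char>(trt[index]);
    }
}
```

This trades one pointer cast for a value cast per byte, so whether it is actually nicer than the single reinterpret_cast is a matter of taste.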

@Alrededor, thank you. I removed the register keywords. I see that it was treated only as a suggestion to the compiler, and in C++11 it has been deprecated. And thanks for confirming that reinterpret_cast is OK in this case.
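For reference, a sketch of what the function might look like without the register keywords (which C++17 went on to remove entirely). The const on the table parameter and the size_t loop index are my own additions, not something suggested in the replies.

```cpp
#include <cstddef>

// Translate len bytes in buffer s in place, using the 256-entry table trt.
// The table is only read, so it is taken as a pointer to const.
void translate(std::size_t len, unsigned char *s, const unsigned char *trt)
{
    for (std::size_t i = 0; i < len; ++i)
        s[i] = trt[s[i]];
}
```

Using size_t for the index also avoids narrowing len to an int, as the original loop did.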

closed account (Dy7SLyTq)
Sorry, my mistake. Pointers are different from built-in types. So to answer your question: it seems a cast is unavoidable, but in this case I don't think it will hurt much.