On fread() and arrays, big/little endian integers.

A function in my program reads integer values from a file. Each value may take up 1 to 4 bytes, as indicated by a parameter. My issue is with fread().


int read_val (FILE *handle, int len)
{
  unsigned char bytes_read[4];
  int *pi; /* pointer to integer, not PI.  */
  if (len == 1)
  {
    return fgetc (handle);
  }
  else if (len < 1 || len > 4)
  {
    return 0;
  }
  else
  {
    pi = (int *) bytes_read; /* explicit cast: bytes_read is an unsigned char array */
    *pi = 0;
    fread (bytes_read, sizeof (char), len, handle);
    return *pi;
  }
}


fread() here should put len bytes into bytes_read, should it not? I'm working with a 4-byte int, so I expected that when len is 2, *pi would hold the value I need times 65536 (shifted two bytes left, no?). I originally had

fread (bytes_read + (4 - len), sizeof (char), len, handle);

but that didn't work; it works as printed above.

Could somebody explain this? (I'm compiling with MinGW on Windows XP SP3. A big/little-endian difference between the file and my OS wouldn't seem to explain it either; if there were a difference, wouldn't it also break for len=4?)
Hi,

I don't know if this is any different with Win XP, but let me see if I can help...

 
fread (bytes_read, sizeof (char), len, handle);


reads in "len" characters and puts them in "bytes_read". If on disk you have:

<byte 1><byte 2><byte 3><byte 4>

then if len=2, only the first two bytes are read. Bytes 3 and 4 remain unread, and the next call to fread will start at byte 3. I think what you're thinking of is:

unsigned int bytes_read = 0;
fread (&bytes_read, sizeof (unsigned int), 1, handle);


In this case, all four bytes (1-4) are read. If the value stored in them is small enough to fit in two bytes, the other two will be 0's. Just keep in mind that a char is a byte and len is how many bytes you are gobbling from the file. Similarly, an unsigned int is typically 4 bytes long (it depends on the platform, of course).
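A quick way to check the file-position behaviour described above is a small sketch like the following. It is an illustration, not code from the thread: demo_two_reads is a hypothetical name, and tmpfile() stands in for a real WAVE file.

```c
#include <stdio.h>

/* Hypothetical helper: writes four known bytes to a temporary file, then
   reads them back with two 2-byte fread calls. Returns 0 on success,
   -1 on any failure. */
int demo_two_reads(unsigned char first[2], unsigned char second[2])
{
    const unsigned char data[4] = { 0x01, 0x02, 0x03, 0x04 };
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    if (fwrite(data, 1, sizeof data, f) != sizeof data) {
        fclose(f);
        return -1;
    }
    rewind(f);                       /* back to byte 1 */

    /* Each call consumes exactly the bytes it reads, so the second call
       starts where the first one stopped (at byte 3). */
    if (fread(first, 1, 2, f) != 2 || fread(second, 1, 2, f) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

After a successful call, first holds 01 02 and second holds 03 04.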

Hope this is correct :-) and that it helps! BTW, if you ever return bytes_read itself (rather than the value) to the calling function, you'd need to allocate it dynamically with malloc (or new in C++), since a local array no longer exists once the function returns...

Ray


It may help to know that I'm specifically reading RIFF WAVE files. I read here

https://ccrma.stanford.edu/courses/422/projects/WaveFormat/

that a RIFF header indicates little-endian data and RIFX indicates big-endian.
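As a sketch, assuming the file position is at the very start of the file, the byte order could be chosen from the first four bytes like this (read_riff_order and enum byte_order are hypothetical names):

```c
#include <stdio.h>
#include <string.h>

/* Byte orders a WAVE file can declare in its first four bytes. */
enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE, ORDER_BIG };

/* Reads the 4-byte chunk ID at the current file position and maps it to a
   byte order: "RIFF" means little-endian, "RIFX" means big-endian. */
enum byte_order read_riff_order(FILE *f)
{
    char id[4];
    if (fread(id, 1, sizeof id, f) != sizeof id)
        return ORDER_UNKNOWN;        /* short read or end of file */
    if (memcmp(id, "RIFF", 4) == 0)
        return ORDER_LITTLE;
    if (memcmp(id, "RIFX", 4) == 0)
        return ORDER_BIG;
    return ORDER_UNKNOWN;
}
```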

In the format chunk the data are written back to back in a predefined order. If you look at the file it would be like

<A Byte 0> <A Byte 1>|<B Byte 0> <B Byte 1>|<C Byte 0> <C Byte 1> <C Byte 2> <C Byte 3>|<D Byte 0> <D Byte 1> <D Byte 2> <D Byte 3>|<E Byte 0> <E Byte 1>|<F Byte 0> <F Byte 1>.

If I called fread() to read a single int, I'd suck up both the A and B data and the result would be useless.
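Rather than one fread into an int, each field can be decoded at its own offset with its own width. A minimal sketch, assuming the 16-byte PCM form of the fmt chunk body has already been read into memory and that the file is little-endian (RIFF); struct wav_fmt, parse_fmt, le16 and le32 are hypothetical names, and uint16_t/uint32_t from <stdint.h> are used so the widths are exact:

```c
#include <stdint.h>

/* Fields of the WAVE "fmt " chunk body, in file order. */
struct wav_fmt {
    uint16_t audio_format;    /* 2 bytes */
    uint16_t num_channels;    /* 2 bytes */
    uint32_t sample_rate;     /* 4 bytes */
    uint32_t byte_rate;       /* 4 bytes */
    uint16_t block_align;     /* 2 bytes */
    uint16_t bits_per_sample; /* 2 bytes */
};

/* Little-endian decoders: byte 0 is the least significant. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the 16-byte PCM fmt body; each field is taken at its own offset
   with its own width, so no field swallows its neighbour. */
void parse_fmt(const unsigned char body[16], struct wav_fmt *out)
{
    out->audio_format    = le16(body + 0);
    out->num_channels    = le16(body + 2);
    out->sample_rate     = le32(body + 4);
    out->byte_rate       = le32(body + 8);
    out->block_align     = le16(body + 12);
    out->bits_per_sample = le16(body + 14);
}
```

Fed the bytes 01 00 02 00 44 AC 00 00 10 B1 02 00 04 00 10 00, it yields 1, 2, 44100, 176400, 4 and 16, the values reported later in the thread.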


As I understand it, in RAM,

 
fread (bytes_read, sizeof (char), len, handle);


would put into my array <A Byte 0> <A Byte 1> <0> <0>, and that

 
fread (bytes_read + (4 - len), sizeof (char), len, handle);


would put <0> <0> <A Byte 0> <A Byte 1>.

My int pointer points at byte 0 of the array, so depending on the byte order Windows uses, *pi is either

(A0) + (256 * A1) + (256 * 256 * 0) + (256 * 256 * 256 * 0)
or
(256 * 256 * 256 * A0) + (256 * 256 * A1) + (256 * 0) + (0).

The value should be and really is 256*A0 + A1, so apparently I'm misunderstanding something.
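One point that may resolve the puzzle: in a RIFF file, byte 0 of each field is the least significant byte, so a 2-byte field's value is A0 + 256*A1, and x86, the platform MinGW on Windows XP compiles for, is also little-endian. The sketch below (as_little_endian is a hypothetical name) mimics how a little-endian CPU reads four bytes as one 32-bit integer, which is what *pi does in the code above:

```c
#include <stdint.h>

/* Interprets four bytes in memory the way a little-endian CPU (such as
   x86) does when it loads a 32-bit int: b[0] is the least significant. */
uint32_t as_little_endian(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For Bits Per Sample = 16, stored in the file as 10 00, the buffer {0x10, 0x00, 0, 0} gives 16, while {0, 0, 0x10, 0x00} (the bytes_read + (4 - len) version) gives 16 * 65536 = 1048576.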

Hi,

Honestly, I'm not much of an endianness person, so I can't answer that part; hopefully someone else can. I was only answering your question about fread.

If your data is in this format:

<A Byte 0> <A Byte 1>|<B Byte 0> <B Byte 1>|<C Byte 0> <C Byte 1> <C Byte 2> <C Byte 3>|<D Byte 0> <D Byte 1> <D Byte 2> <D Byte 3>|<E Byte 0> <E Byte 1>|<F Byte 0> <F Byte 1>

How about for A, you do this instead:

unsigned short int bytes_read = 0;
fread (&bytes_read, sizeof (unsigned short int), 1, handle);


I think you're doing it this way because you want the same function to read both 2-byte and 4-byte numbers? You might save yourself a headache by reading unsigned short ints for one and unsigned ints for the other.
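Ray's suggestion reads straight into an unsigned short or unsigned int, which gives the right answer only when the host's byte order matches the file's. A variant of the same separate-functions idea, sketched here with the hypothetical names read_u16le and read_u32le, swaps in explicit shifting so it works the same on either kind of host, assuming a little-endian (RIFF) file:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads a 2-byte little-endian unsigned field. Returns 0 on success. */
int read_u16le(FILE *f, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return -1;
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 0;
}

/* Reads a 4-byte little-endian unsigned field. Returns 0 on success. */
int read_u32le(FILE *f, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}
```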

I think endianness matters when the computer that generated the data uses a different byte order from the machine reading it. If you have four bytes <A, B, C, D>, the reversed order is <D, C, B, A> (I believe).
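Reversing the byte order, as described above, can be sketched with masks and shifts. swap16 and swap32 are hypothetical names; a file whose byte order differs from the host's, such as a RIFX file read on x86, would need this after a direct fread into an integer:

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value: <A,B,C,D> becomes <D,C,B,A>. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Reverses the byte order of a 16-bit value: <A,B> becomes <B,A>. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```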

Also, in your code:

 
fread (bytes_read + (4 - len), sizeof (char), len, handle);


If len=2 and bytes_read is an unsigned char array, you are writing bytes 0 and 1 of A into bytes_read[2] and bytes_read[3] (from what I can tell), but:

* the first two bytes of bytes_read are uninitialized, unless you initialized them earlier
* if you had declared bytes_read as unsigned int bytes_read instead, you would need to pass an address, e.g. (unsigned char *) &bytes_read + (4 - len).

I hope this helps or that someone else corrects me if I am wrong...

Ray

Thanks. I'll probably end up calling different functions depending on whether len is 1, 2 or 4, although I wanted one function that could handle all of those and the wacky possibility of 3 bytes too, lol. Anyway, I did initialize it with *pi = 0, and your last comment points at exactly the problem I have: I have something that works (except that when len < 4, values that should be negative come out positive, understandably), and I can't explain why.
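The negative-number issue comes from the high bytes being zero: a 2-byte value such as 0xFFFF is -1 as a 16-bit signed number but 65535 once zero-extended into a 4-byte int. A minimal sketch of sign extension, assuming the raw len-byte value has already been assembled as an unsigned number; sign_extend is a hypothetical name, and the caller must pass len between 1 and 4:

```c
#include <stdint.h>

/* Sign-extends the low `len` bytes (1..4) of `raw` to a 32-bit signed
   value, treating bit (8*len - 1) as the sign bit. Works in 64-bit
   arithmetic so no step relies on implementation-defined conversions. */
int32_t sign_extend(uint32_t raw, int len)
{
    int bits = 8 * len;                            /* 8, 16, 24 or 32 */
    uint64_t mask = ((uint64_t)1 << bits) - 1;     /* keep only len bytes */
    int64_t sign_bit = (int64_t)1 << (bits - 1);
    int64_t v = (int64_t)(raw & mask);

    /* Flip the sign bit and shift the range down: values with the sign
       bit set come out negative, the rest are unchanged. */
    return (int32_t)((v ^ sign_bit) - sign_bit);
}
```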

I'm going to flag as solved to reduce the headaches, but here is the befuddling output:

When using

fread (bytes_read + (4 - len), sizeof (char), len, handle);

  Format           65536
  Channels         131072
  Sample Rate      44100
  Byterate         176400
  Block Align      262141
  Bits Per Sample  1048576



When using

fread (bytes_read, sizeof (char), len, handle);

  Format           1
  Channels         2
  Sample Rate      44100
  Byterate         176400
  Block Align      4
  Bits Per Sample  16


Only the 2-byte fields are affected, and they look shifted two bytes left (multiplied by 65536) rather than what was intended. I checked a million times to make sure I wasn't running a build of the wrong version, too. Maybe it'll click in my head some day.

Anyway thanks for your interest. Happy 'grammin!

Hi,

Glad I helped out in some way, and sorry I can't help more. Endianness is something that, when I need it, I can dig up a web page about, but literally days later I'll forget what I read.

I will say that I had a similar problem where I had to read in a sequence of 2-byte or 4-byte values, and from a maintenance point of view it would be nice to have one function to do it all, kind of like what you're doing with a len parameter. I never figured out how, and in the end I guess it wasn't worth it. The computer doesn't know the difference... only I do, wondering if I could have done it better! :-)

Ray
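For what it's worth, one function with a len parameter is possible if the bytes are combined with shifts instead of being read through an int pointer. A sketch, assuming a little-endian (RIFF) file and lengths of 1 to 4 bytes; read_le is a hypothetical name, and for a big-endian RIFX file the shift would be 8 * (len - 1 - i) instead:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads a `len`-byte (1..4) little-endian unsigned integer from `f` into
   *out. Each byte is shifted into place explicitly, so the result does not
   depend on the host's byte order or on int alignment. Returns 0 on
   success, -1 on a bad length or a short read. */
int read_le(FILE *f, int len, uint32_t *out)
{
    unsigned char b[4];
    uint32_t v = 0;

    if (len < 1 || len > 4)
        return -1;
    if (fread(b, 1, (size_t)len, f) != (size_t)len)
        return -1;

    /* b[0] is the least significant byte in a RIFF (little-endian) file. */
    for (int i = 0; i < len; i++)
        v |= (uint32_t)b[i] << (8 * i);

    *out = v;
    return 0;
}
```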
