outputting in binary

I'm putting together a program that compresses the frequencies of a Huffman code into binary, but I am unsure exactly how to do this. I have an array of Nodes, defined as:
class Node
{
  public:
    Node(unsigned char let, long unsigned int freq, bool tree, bool Char, Node *root, Node *l, Node *r);  
    Node();
    unsigned char letter;
    long unsigned int weight;
    bool inTree;
    bool isChar;
    Node *parent;
    Node *left;
    Node *right;
    char *bin;
};


We are supposed to output the frequencies as 4-byte unsigned ints. We have an example of a compressed file, and it is unreadable by humans, which obviously means that I'm not just looking for how to make a number like 5 into 101. We take information in through cin and are supposed to output it through cout. Thanks in advance!
Your professor wants you to do binary I/O on the standard streams?!

First, you must make sure your standard streams are actually doing binary I/O -- which on Windows they are not by default:
#if defined(__WIN32__) || defined(_WIN32_) || defined(__WIN32) || defined(_WIN32) || defined(WIN32) || defined(__WINDOWS__) || defined(__TOS_WIN__)
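  #ifdef __cplusplus
    #include <cstdio>
    #include <iostream>
  #else
    #include <stdio.h>
  #endif
  #include <fcntl.h>  /* _O_BINARY */
  #include <io.h>     /* _setmode() */
  void std_binary_io()
    {
    /* Turn off newline translation on all three standard streams */
    _setmode( _fileno( stdin  ), _O_BINARY );
    _setmode( _fileno( stdout ), _O_BINARY );
    _setmode( _fileno( stderr ), _O_BINARY );
    #ifdef __cplusplus
      /* Keep the C++ streams synchronized with the C streams just modified */
      std::ios::sync_with_stdio();
    #endif
    }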
#else
  void std_binary_io() { }
#endif 


Next, you need a way of properly packing and unpacking multi-byte values to and from a file. You must choose whether you want your file to store things in Big-Endian or Little-Endian (or some other oddball Mixed-Endian) format.
#include <iostream>

namespace little_endian_io
  {
  template <typename Word>
  std::ostream& write_word( std::ostream& outs, Word value )
    {
    for (unsigned size = sizeof( Word ); size; --size, value >>= 8)
      outs.put( static_cast <char> (value & 0xFF) );
    return outs;
    }

  template <typename Word>
  std::istream& read_word( std::istream& ins, Word& value )
    {
    value = 0;  // clear the caller's variable (not a new loop-local one)
    for (unsigned size = 0; size < sizeof( Word ); ++size)
      value |= static_cast <Word> ( ins.get() & 0xFF ) << (8 * size);
    return ins;
    }
  }

namespace big_endian_io
  {
  template <typename Word>
  std::ostream& write_word( std::ostream& outs, Word value )
    {
    for (unsigned size = sizeof( Word ); size; )
      outs.put( static_cast <char> ( (value >> (8 * --size)) & 0xFF ) );
    return outs;
    }

  template <typename Word>
  std::istream& read_word( std::istream& ins, Word& value )
    {
    value = 0;  // clear the caller's variable (not a new loop-local one)
    for (unsigned size = sizeof( Word ); size; --size)
      value = (value << 8) | static_cast <Word> ( ins.get() & 0xFF );
    return ins;
    }
  }
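
To make the difference between the two byte orders concrete, here is a small sketch that serializes one 32-bit value both ways into an in-memory string. The helper names to_little_endian and to_big_endian are made up for this illustration; they are not part of the namespaces above.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (illustration only): the 4 bytes of value,
// least significant byte first.
std::string to_little_endian( std::uint32_t value )
  {
  std::ostringstream outs;
  for (unsigned i = 0; i < 4; ++i)
    outs.put( static_cast <char> ((value >> (8 * i)) & 0xFF) );
  return outs.str();
  }

// Hypothetical helper (illustration only): the 4 bytes of value,
// most significant byte first.
std::string to_big_endian( std::uint32_t value )
  {
  std::ostringstream outs;
  for (unsigned i = 4; i; )
    outs.put( static_cast <char> ((value >> (8 * --i)) & 0xFF) );
  return outs.str();
  }
```

For the value 0x12345678, the little-endian string holds the bytes 78 56 34 12 and the big-endian string holds 12 34 56 78. Whichever order the file format specifies, the reader and the writer must agree on it.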


Finally, you need to review your professor's documentation on the file format of your compressed file, and make sure you read and write it exactly.


Oh, before I forget, here is how to do it:
#include <cstdint>

...

using namespace std;
using namespace big_endian_io;  // If your file stores multi-byte words in big endian

int main()
  {
  std_binary_io();  // switch to binary I/O on the standard streams

  ...

  uint32_t value;  // size-specific types
  read_word( cin, value );

  value = ~value;  // do something to value... (you'd actually want to compress or decompress the stream here...)

  write_word( cout, value );
  }


Hope this helps.
If the binary file is Huffman-coded data, then it's probably best to read the data section a single byte at a time, so endianness will not be a problem. Even the header section will be a byte (character) followed by its encoding, which will also fit into a byte. So the whole thing may be readable a byte at a time.

What is the format of the file?

What do you mean by compressing the freqs into binary?


(BTW, the extra long line above can be split across lines like so:
#if defined(__WIN32__) || defined(_WIN32_) || defined(__WIN32)     \
 || defined(_WIN32)    || defined(WIN32)   || defined(__WINDOWS__) \
 || defined(__TOS_WIN__) 
)
> it's probably best to read the data section a single byte at a time

Agreed. I was going to comment on this, but the OP specifically stated that:

> We are supposed to output the frequencies as 4-byte unsigned ints

Knowing nothing more about the assignment than what he has told us, I suspect that his professor isn't too worried about bit-packing the data just yet... only about outputting the Huffman tree...


I also always make sure that if I am programming on Windows that __WIN32__ is #defined... but I haven't any clue what compiler he's using and that line covers most of them...

LOL. ;-)