Bitstacks Quest

I'm aiming to write some unsigned numbers into a network packet.
The most compact way I've found to write this information is what I call "bitstacks". This may be a term already coined.
Basically, I have a variable-size vector of unsigned integers. Each time an (unreliable) network packet of a certain type is received, it contains one of those unsigned integers, which is then deleted from the vector.
After all the unreliable packets have been received, there's a good chance some numbers are left over - a different message type is then formed and sent containing all of them. I need to copy all the numbers in the vector to a memory address, copying only the bits that can actually differ per number.
I calculate this by finding the smallest power of 2 that covers the maximum of the range (i.e. the highest unsigned integer expected).
For example, if the packets are limited to 0-NumPackets...
unsigned char NearestSize = (unsigned char)ceil(log((double)NumPackets)/(8*log(2.0)));
...where 8 is the number of bits in a byte (so NearestSize comes out in whole bytes), and the division by log(2.0) turns the natural log into a base-2 log, since each extra bit doubles the range (the cast to double avoids an ambiguous log() overload).

If the maximum number is 31, the bits can reach 00011111, so there are always 3 excess bits. By shifting the packet numbers to "fill the gap", I save 3 bits per number, and since potentially thousands of numbers are sent back, this quickly multiplies. In fact, after 3 numbers you've already saved more than a whole byte - the packed data is only 62.5% of its original size :)
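For reference, the same per-number bit count can be worked out with integer shifts instead of floating-point log(). A minimal sketch (MaxValue is just a stand-in for the highest expected number):

// How many bits are needed to hold any value in the range 0..MaxValue?
// Integer-only version of the log2 calculation above.
unsigned BitsNeeded(unsigned MaxValue)
{
  unsigned bits=0;
  while (MaxValue) { MaxValue>>=1; ++bits; }
  return bits ? bits : 1;   //the value 0 still occupies one bit
}

//e.g. BitsNeeded(31)==5, so 3 numbers pack into (3*5+7)/8 == 2 bytes instead of 3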


So I know how many bits to write and where to read them from. That's all I need to read the variable properly; but how do I write it? My best attempt is annoyingly close to what I want, but still falls short.

See the full draft code here:
http://paste.pocoo.org/show/0TnEFukAsPJntrZhVADf/

Sorry about the format of the code. I appreciate this is a challenge even for seasoned C++ers.
Thanks in advance, SC.





Is this question too hard to understand, or too hard to answer? :)
See here for a fast and reliable log2 for integers:
http://graphics.stanford.edu/~seander/bithacks.html
("Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup")
Applied to the largest possible value it gives you the number of required bits per value.

You can then append each value to a temporary buffer and write the output byte by byte.
Something like this:
const uint bitsPerValue=log2(maxValue);
uint64_t bitBuffer=0; //a 32 bit value works too if bitsPerValue<=25 is guaranteed
uint bitsUsed=0;
foreach(val,values)
{
  bitBuffer|=(uint64_t)val<<bitsUsed; //cast so the shift is done in 64 bits
  bitsUsed+=bitsPerValue;
  while (bitsUsed>=8)
  {
    writeByte(bitBuffer&255); //emit the lowest complete byte
    bitBuffer>>=8;
    bitsUsed-=8;
  }
}


Do not forget to flush the remaining bits when you're done (if any).
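For example (assuming the same writeByte() and variables as in the snippet above):

if (bitsUsed>0)writeByte(bitBuffer&255); //pad the final partial byte with zero bits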
http://paste.pocoo.org/show/8KveVeqAv5fcRjNut3NX/
The code outputs

Outputting current char_array:
01000010000100001000010000100001000100001000010000100001000010000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000
result: 0
Converting back: Cleaning variables.
#0: 134217728   00001000000000000000000000000000
#1: 138412032   00001000010000000000000000000000
#2: 134217728   00001000000000000000000000000000
#3: 138412032   00001000010000000000000000000000
#4: 134217728   00001000000000000000000000000000
#5: 134217728   00001000000000000000000000000000
#6: 138412032   00001000010000000000000000000000
#7: 134217728   00001000000000000000000000000000
#8: 134217728   00001000000000000000000000000000
#9: 138412032   00001000010000000000000000000000
#10: 134217728  00001000000000000000000000000000
#11: 138412032  00001000010000000000000000000000
#12: 134217728  00001000000000000000000000000000
#13: 0  00000000000000000000000000000000

So out of the 14 ints that should be written, the 1st does not appear, and the 2nd & 9th are mis-aligned.
Still, it's closer, and a lot cleaner. I appreciate the effort :)
You access the destination buffer via a pointer to unsigned int, so you basically set the surrounding 24 bits to zero every time you write a byte.
You also don't flush at the end, thus losing the last few bits:
if (bitsUsed>0)writeByte(bitBuffer);
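If it helps, writeByte() can be as simple as appending to an unsigned char buffer, so only that one byte is ever touched (buffer and pos are placeholder names):

unsigned char buffer[1500]; //placeholder size
unsigned pos=0;

void writeByte(unsigned char b)
{
  buffer[pos++]=b; //writes exactly one byte; the neighbouring bytes stay untouched
}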

The main problem is with the back conversion/output routine. I haven't looked at what exactly you're doing there, but it seems overly complicated and not correct. Chances are that something goes wrong during all the casting - no part of this problem actually requires any casts.

Basically you just have to do the reverse to read the packed values:
while (bitsUsed<bitsToRead)
{
  bitBuffer|=(uint64_t)readByte()<<bitsUsed; //cast so the shift is done in 64 bits
  bitsUsed+=8;
}
uint val=bitBuffer&(((uint64_t)1<<bitsToRead)-1); //mask off exactly one value
doSomethingWithValue(val);
bitBuffer>>=bitsToRead;
bitsUsed-=bitsToRead;
This seems like a lot of effort to go to in order to get roughly a 1/3 size reduction.

If it is *really* worth it to you, you could implement a bitset and manually set the bits so as to record your numbers in 5-bit chunks. (You can create a dynamic bitset yourself or use the one provided in the Boost library.) That would seem better than trying to set them with pointers (which you could do - the problem Athar pointed out could be worked around with plenty of bit shifting and/or bitwise operations, but I would not advise it!).

But the memory savings are going to come at a huge speed cost. I can't see many uses for such a strategy.
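A rough sketch of how that might look with boost::dynamic_bitset (untested; the 5-bit width and the sample values are just illustrative):

#include <boost/dynamic_bitset.hpp>
#include <iostream>
#include <vector>

int main()
{
  const unsigned bitsPerValue=5;     //example width from above
  std::vector<unsigned> values;      //placeholder data
  for (unsigned i=0;i<10;i++)values.push_back(i);

  //one bit per position, values.size()*bitsPerValue bits in total
  boost::dynamic_bitset<> bits(values.size()*bitsPerValue);

  for (size_t i=0;i<values.size();i++)
    for (unsigned b=0;b<bitsPerValue;b++)
      bits[i*bitsPerValue+b]=((values[i]>>b)&1)!=0; //least significant bit first

  //reading the values back is the reverse
  for (size_t i=0;i<values.size();i++)
  {
    unsigned v=0;
    for (unsigned b=0;b<bitsPerValue;b++)
      if (bits[i*bitsPerValue+b])v|=1u<<b;
    std::cout << v << std::endl;
  }
}

Getting the packed bytes out of the bitset and into the packet still needs a block-by-block copy (Boost provides to_block_range for that, if I recall correctly), so it doesn't save much work over the manual approach.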
The integer sizes can vary from 1 to 32 bits, chosen at runtime, so my idea was to write whole unsigned ints, since 32 bits is the largest possible size. You've approached it differently by writing a byte at a time.

I changed the unsigned int* pointer to unsigned char* to match that approach, and also added your suggested if() code inside the for() loop (I also tried it outside; no difference in the output).
http://paste.pocoo.org/show/in4zX4ubUbHIh6YJ7eGN/

The result has a single bit changed to 1, seen several bytes after the rest of the bitstack.
exiledAussie, I would be interested in seeing how the Boost library would handle this; I did a bit of reading up, but I'm not entirely certain how it would be used.
also added your suggested if() code inside

That's incorrect, but that should be rather obvious if you understood what the code is doing.
Your output function is still broken as well.

Here are the two code snippets put together in a full example:

#include <iostream>
#include <vector>
#include <stdint.h>

typedef unsigned char byte;
typedef unsigned int uint;

//the original foreach macro wasn't shown; a C++11 range-for works in its place
#define foreach(var,container) for (auto& var : container)

template <class I> I roundUpToBoundary(I val,I boundary)
{
  return (val+(boundary-1))/boundary*boundary;
}

template <class T> void printBits(const T& obj)
{
  const byte* ptr=reinterpret_cast<const byte*>(&obj);
  for (size_t i=0;i<sizeof(obj);i++,ptr++) //step through the object one byte at a time
  {
    for (int b=7;b>=0;b--)std::cout << (*ptr&(1<<b)? '1' : '0');
    std::cout << ' ';
  }
}

int main()
{
  using std::vector;

  const int valueCount=30;
  const int bitsPerValue=5;

  vector<uint> values;
  for (int i=0;i<valueCount;i++)values.push_back(i);

  //packed size in bytes, rounded up to a whole byte
  vector<byte> packed(roundUpToBoundary<size_t>(values.size()*bitsPerValue,8)/8);
  byte* outp=&packed[0];
  uint64_t bitBuffer=0;
  uint bitsUsed=0;
  foreach(val,values)
  {
    bitBuffer|=(uint64_t)val<<bitsUsed; //cast so the shift is done in 64 bits
    bitsUsed+=bitsPerValue;
    while (bitsUsed>=8)
    {
      *outp++=bitBuffer&255;
      bitBuffer>>=8;
      bitsUsed-=8;
    }
  }
  if (bitsUsed>0)*outp++=bitBuffer&255; //flush the last partial byte

  std::cout << "Buffer contents: ";
  foreach(b,packed)printBits(b);
  std::cout << std::endl;

  byte* inp=&packed[0];
  bitBuffer=0;
  bitsUsed=0;
  for (int i=0;i<valueCount;i++)
  {
    while (bitsUsed<bitsPerValue)
    {
      bitBuffer|=(uint64_t)*inp++ << bitsUsed;
      bitsUsed+=8;
    }
    uint val=bitBuffer&((1u<<bitsPerValue)-1);
    std::cout << val << std::endl;
    bitBuffer>>=bitsPerValue;
    bitsUsed-=bitsPerValue;
  }
}


Now all you need to do is to put that into a stream class and you're done.

And if you plan on using that log2 function, I'd have a look at your variant of it again if I were you.
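A minimal sketch of what such a stream class might look like (names invented, no error handling; the reading side would mirror it using the loop from earlier):

#include <stdint.h>
#include <vector>

//hypothetical bit-stream writer wrapping the packing loop above
class BitWriter
{
  std::vector<unsigned char> buf;
  uint64_t bitBuffer;
  unsigned bitsUsed;
public:
  BitWriter() : bitBuffer(0), bitsUsed(0) {}

  void write(uint32_t value,unsigned bits) //append the 'bits' lowest bits of value
  {
    bitBuffer|=(uint64_t)value<<bitsUsed;
    bitsUsed+=bits;
    while (bitsUsed>=8)
    {
      buf.push_back((unsigned char)(bitBuffer&255));
      bitBuffer>>=8;
      bitsUsed-=8;
    }
  }

  void flush() //pad the last partial byte with zero bits
  {
    if (bitsUsed>0)buf.push_back((unsigned char)(bitBuffer&255));
    bitBuffer=0;
    bitsUsed=0;
  }

  const std::vector<unsigned char>& data() const {return buf;}
};

Usage would then just be writer.write(value,bitsPerValue) for each number, writer.flush() at the end, and writer.data() handed off to whatever sends the packet.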
http://paste.pocoo.org/show/CjbrFDsidJcLOp9Ylzt1/
Your code reads what it writes correctly, but the output of bits is somewhat iffy.

For debug purposes, I set all the values to 1 in the vector.
00100001 10000100

I'm expecting:
00001000 01000010

I note that when we read the bits, we use decrementing of i instead of incrementing.

I'm not sure what you mean by "variant" of log2(), I copied it from the page you recommended.
EDIT: Ah I see, using v when I should be using the parameter :P
I'm expecting:
00001000 01000010

The first value is in the lower bits of the first byte, followed by three more bits of the next value.
This is the first value (the five rightmost printed bits of the first byte):
00100001 10000100
   ^^^^^
And this is the second one (the three leftmost bits of the first byte plus the two rightmost of the second):
00100001 10000100
^^^            ^^
And the third (the middle five bits of the second byte):
00100001 10000100
          ^^^^^

I note that when we read the bits, we use decrementing of i instead of incrementing.

What do you mean?
Interesting, that explains why I couldn't work out the formula myself, if it's supposed to be laid out that way.

for (int b = 7; b >= 0; --b)
is what I meant by decrementing :)
Ah yeah, that is because the least significant digits come last in our numeral system.
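If you'd rather the debug printout follow the packing order, you could print each byte least significant bit first instead (a small variation on printBits, just for illustration):

//prints the bits of one byte starting with the least significant bit,
//so consecutive packed values appear left to right in the dump
void printBitsLsbFirst(unsigned char b)
{
  for (int bit=0;bit<8;bit++)std::cout << ((b>>bit)&1 ? '1' : '0');
  std::cout << ' ';
}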