Can anyone suggest an algorithm to reduce memory required to store uint16_t data?

I am working on a program that sends data via mmap from one process to another. A buffer is used which holds sequential pairs of uint16_t values. Each pair contains x and y coordinates of an array of h264 macroblocks in a frame and will have values from 0 to 1920 for x and 0 to 1080 for y.

My thought was to bit shift the uint16_t to the Right by 4 in order to divide by 16. If the value is 16 or below, then change the sign. On the other end, decompress by bit shifting to the Left by 4 and if the value is negative, just multiply by -1. The values that are not multiples of 16 will obviously not decompress to the original value.

Is there a better algorithm or library that is fast and provide greater precision?

How do I decomress the last two values to their original and still have the type as uint16_t?

Thanks,
Chris

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
/#include <array>
#include <vector>
#include <iostream>
#include <string>
#include <bitset>
using namespace std;

template <typename ValueType>
inline void print( const char* name, ValueType value )
  {
  cout << name << " = "
       << bitset <8> ( value ).to_string()
       << " = "
       << (int)value << endl;
  }


int main()
{
  
  std::array<uint16_t,5> intarray{1920,55,140,16,4};
  std::vector<int8_t> resarray;
  
 std::cout << "Original" << std::endl; 
 for(const auto& s: intarray) {
    print( "usigned uint16_t original", s );
 } 

 
 std::cout << "Compressing" << std::endl;
 for(const auto& s: intarray) {
  int8_t res=0;
  res=s>>4;
  if (res <=1)
    res=-s;
  print( "signed int8_t  >> 4", res );
  resarray.push_back(res);
 } 
 
 
 std::cout << "Decompressing" << std::endl;
 for(const auto& s: resarray) {
 
  uint16_t res=0;
  res= s<<4;
  if (res <0)
    res=s*-1;
  print( "unsigned uint16_t  << 4", res );

 } 
 
}


Output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
Original
usigned uint16_t original = 10000000 = 1920
usigned uint16_t original = 00110111 = 55
usigned uint16_t original = 10001100 = 140
usigned uint16_t original = 00010000 = 16
usigned uint16_t original = 00000100 = 4
Compressing
signed int8_t  >> 4 = 01111000 = 120
signed int8_t  >> 4 = 00000011 = 3
signed int8_t  >> 4 = 00001000 = 8
signed int8_t  >> 4 = 11110000 = -16
signed int8_t  >> 4 = 11111100 = -4
Decompressing
unsigned uint16_t  << 4 = 10000000 = 1920
unsigned uint16_t  << 4 = 00110000 = 48
unsigned uint16_t  << 4 = 10000000 = 128
unsigned uint16_t  << 4 = 00000000 = 65280
unsigned uint16_t  << 4 = 11000000 = 65472





Last edited on
I'm probably missing something but I don't understand what the compression is for. 1080 and 1920 is way below the upper limit of uint16_t so a pair of uint16_t should be able to store the x, y coordinates just fine.
Sorry, should have been more clear. A buffer is used to transfer data between two processes via mmap. There could be up to 4 pairs of these processes communicating this way. Right now a pair of coordinates takes up 4 bytes ( 2 uint16_t). If I can use a pair of uint8_t instead then I can halve the size of the buffer. I am working on the raspberry pi and don't have a lot of system RAM to work with.

Thanks,
Chris

Are you willing to lose the last 3 bits? Then you can simply shift it by 3 to the right for "compressing" and shift it to the left by 3 for "uncompressing". The low 3 bits of the uncompressed values be zeroes.

The whole "negating" idea simply doesn't work.

If you can live with saving 25% instead of 50%, that would be pretty easy. Just put each value into 12 bits. You can pack 8 of those into 12 bytes (instead of 16 bytes for uint16_t). The values can actually fit into 11 bits (which, if packed, could save 30%), but that's quite a bit more work to compress/uncompress for only a little more savings.
Last edited on
If all values (0 ≤ x ≤ 1920 ; 0 ≤ y ≤ 1080) are possible, and you don't want to lose precision, then you need to use at least 21 bits for each x y coordinate pair.
Last edited on
you can run a zip compression over a block of them and reduce it by 50 to 90% or so. The standard gz library, maybe? This costs time to pack and unpack, of course, and again to bundle a group, so it may not be ideal for a stream or real time use? Compression will handle the extra bits so you don't have to squeeze it to 21 if you go this route.

are the numbers correlated at all? you may be able to exploit that (eg if you can compute one from the other). Do you know anything about the numbers (statistics, etc?).



Using a memory map is really fast. Are you sure that this is the bottleneck? I would think that you could transfer the data faster than you could do anything with it.
I am working on the raspberry pi and don't have a lot of system RAM to work with.

If the data won't fit in the memory? This device probably uses a flash disk, not a rotating hard drive, right? Its probably almost as fast as ram to write to a file, if that is the case... might be all you need is a disk file...?

@Peter87, How do you pack 1919,1079 into 21 bits? It seems it might be possible, but I'm not sure how. 10 bits can hold a max value of 1023 (56 less than we need for 1079), but 11 can hold 2047 (128 more than we need for 1919) so I suppose there's a little wiggle room there, but how to use it?

(I'm assuming the OP's mistaken that he needs to hold up to 1920 and 1080, but if that's the case it would obviously be the same problem.)
Last edited on
How do you pack 1919,1079 into 21 bits?
encode(x, y) = x + y * 1920
decode(n) = (n % 1920, n / 1920)

(1920*1080 / 2^21 ~= 0.99)
There you go. So at 21 bits per pair you can compress to about 66%. But packing all those 21-bit units would be kind of a drag. How best to do that?
Yes, it kind of sucks.
OP has said losing the 4 least significant bits of precision of each dimension is acceptable, and both 1920 and 1080 are divisible by 8. 1920 * 1080 / 64 < 2^15. That's more easily packable.

encode(x, y) = x / 8 + y * 1920 / 8
decode(n) = (n % (1920 / 8) * 8, n / (1920 / 8) * 8)
Thanks guys.
What am I doing wrong here. It seems the transitional variable needs to be saved as 32 bit.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#include <array>
#include <vector>
#include <iostream>
#include <string>
#include <bitset>
using namespace std;

template <typename ValueType>
inline void print( const char* name, ValueType value )
  {
  cout << name << " = "
       << bitset <32> ( value ).to_string()
       << " = "
       << (int)value << endl;
  }


int main()
{
  
  std::array<uint16_t,5> intarray{1920,54,140,16,4};
  std::vector<uint32_t> encodearray;
  
 std::cout << "Original" << std::endl; 
 for(const auto& s: intarray) {
    print( "usigned uint16_t original", s );
 } 

 
 std::cout << "Compressing" << std::endl;
 for(const auto& s: intarray) {
 
  uint32_t encode = ((s/8)  + ((1080/8) * 1920));
  print( "encode ", encode );
  encodearray.push_back(encode);
 } 
 
 
 std::cout << "Decompressing" << std::endl;
 for(const auto& s: encodearray) {
 
  uint16_t res=0;
  
  res=(s % 1920)*8;
  print( "unsigned uint16_t", res );

 } 
 
}


Output :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Original
usigned uint16_t original = 00000000000000000000011110000000 = 1920
usigned uint16_t original = 00000000000000000000000000110110 = 54
usigned uint16_t original = 00000000000000000000000010001100 = 140
usigned uint16_t original = 00000000000000000000000000010000 = 16
usigned uint16_t original = 00000000000000000000000000000100 = 4
Compressing
encode  = 00000000000000111111010101110000 = 259440
encode  = 00000000000000111111010010000110 = 259206
encode  = 00000000000000111111010010010001 = 259217
encode  = 00000000000000111111010010000010 = 259202
encode  = 00000000000000111111010010000000 = 259200
Decompressing
unsigned uint16_t = 00000000000000000000011110000000 = 1920
unsigned uint16_t = 00000000000000000000000000110000 = 48
unsigned uint16_t = 00000000000000000000000010001000 = 136
unsigned uint16_t = 00000000000000000000000000010000 = 16
unsigned uint16_t = 00000000000000000000000000000000 = 0
 


Thanks,
Chris
Your code doesn't match what you said earlier.
I am working on a program that sends data via mmap from one process to another. A buffer is used which holds sequential pairs of uint16_t values. Each pair contains x and y coordinates of an array of h264 macroblocks in a frame and will have values from 0 to 1920 for x and 0 to 1080 for y.
Why does intarray contain an odd number of elements?

I would have expected your input to look like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
uint16_t encode(const std::pair<uint16_t, uint16_t> &p){
    return p.first / 8 + p.second / 8 * 1920;
}

void test(){
    std::array<std::pair<uint16_t, uint16_t>, 3> pairsarray = {
        {1920, 54}, // See note 1, below.
        {140, 16},
        {4, 0},
    };

    for (auto &p : pairsarray)
        std::cout << "encoded(" << p.first << ", " << p.second << ") = "
            << encode(p) << std::endl;
}
Note 1: Keep in mind that the code I wrote earlier assumes that the input range is [0; 1920) for the x axis and [0; 1080) for the y axis. Using values outside these ranges will produce incorrect results, if you use the constants I used.
I tried this test code. It seems to work for smaller values but not if both the x and y coordinates are close to the maximum values. What's wrong with the code?

Thanks,
Chris

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
#include <array>
#include <vector>
#include <iostream>
#include <string>
#include <bitset>
using namespace std;

template <typename ValueType>
inline void print( const char* name, ValueType value )
  {
  cout << name << " = "
       << bitset <16> ( value ).to_string()
       << " = "
       << (int)value << endl;
  }


uint16_t encode(const std::pair<uint16_t, uint16_t> &p){
    return p.first / 8 + p.second / 8 * 1920;
}

std::pair<uint16_t, uint16_t>  decode(const uint16_t &p){
    std::pair <uint16_t, uint16_t> e;
   
    e.first = ( p % 1920) * 8;
    e.second =( p / 1920) * 8;
    return e;
}


int main()
{
  
  std::vector<uint16_t> encodearray2;
  std::array<std::pair<uint16_t, uint16_t>, 3> pairsarray ={ {
        {1920, 1080}, /
        {140, 16},
        {4, 0},
 }};
    
    
    
  for (auto &p : pairsarray) {
        std::cout << "encoded(" << p.first << ", " << p.second << ") = "
            << encode(p) << std::endl;
    encodearray2.push_back(encode(p)); 
    print( "encoded value", encode(p) );
  }            
              
   for (auto &p : encodearray2) {
       std::pair< uint16_t, uint16_t> o;
       o=decode(p);
        std::cout << "decoded(" << o.first << ", " << o.second << ")"
            <<  std::endl;
                     
   }            

 
}



Output

1
2
3
4
5
6
7
8
9
encoded(1920, 1080) = 62832
encoded value = 1111010101110000 = 62832
encoded(140, 16) = 3857
encoded value = 0000111100010001 = 3857
encoded(4, 0) = 0
encoded value = 0000000000000000 = 0
decoded(11136, 256)
decoded(136, 16)
decoded(0, 0)

Last edited on
looks like way, way too much going into all this.

encode:
unsigned char imauint16[2];
imauint16[0]= the1080var*0.23611111111111111111111111111111;
imauint16[1] = the1920var*0.1328125;
uint16_t result = *((uint16_t*)(imauint16));

decode is the reverse, pull the 16 bit thing apart into 2 bytes and multiply by the conversion factor.
unsigned char * cp = (unsigned char *)result;
first = cp[0]/0.23611111111111111111111111111111;
second = cp[1]/0.1328125;

right?

edit
the pointer gibberish is a bit much here ... a working example of it
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

uint16_t encode(int x, int y)
{
unsigned char imauint16[2];
imauint16[0]= x*0.23611111111111111111111111111111;
imauint16[1] = y*0.1328125;
uint16_t result = *((uint16_t*)(imauint16));
return result;
}

void decode(uint16_t x)
{
unsigned char * cp = (unsigned char *)(&x);
uint16_t first = cp[0]/0.23611111111111111111111111111111;
uint16_t second = cp[1]/0.1328125;
cout << first << endl << second<< endl;
}
int main()
{
  int x,y;
  cin >> x >> y;
  unsigned short z = encode(x,y);   
   decode(z);  
    return 0;    
}


this would be even easier if you are open to the union hack instead of DIY with pointer mischief.
Last edited on

1
2
3
uint16_t encode(const std::pair<uint16_t, uint16_t> &p){
    return p.first / 8 + p.second / 8 * 1920;
}

The main problem with this function is that multiplying with 1920 is too wasteful so the result doesn't always fit in an uint16_t. You should actually scale this value too, by dividing by 8, like you do for the other values.

Another problem is that the code assumes 1919 is the highest value that you will be using. If you want to be able to use 1920 you'll have to make it one bigger.

1920 / 8 + 1 = 241

So, if you replace 1920 with 241 in these functions it should work.

1
2
3
4
5
6
7
8
9
10
11
uint16_t encode(const std::pair<uint16_t, uint16_t> &p){
    return p.first / 8 + p.second / 8 * 241;
}

std::pair<uint16_t, uint16_t>  decode(const uint16_t &p){
    std::pair <uint16_t, uint16_t> e;
   
    e.first = ( p % 241) * 8;
    e.second =( p / 241) * 8;
    return e;
}
Last edited on
The main problem with this function is that multiplying with 1920 is too wasteful so the result doesn't always fit in an uint16_t. You should actually scale this value too, by dividing by 8, like you do for the other values.
Good eye! I moved the division by 8 because I thought p.second * 1920 / 8 would not be properly reversible by dividing by (1920/8). I forgot the correct expression was p.second / 8 * (1920 / 8).
Thanks a lot guys. I will be looking at this and evaluating some other strategies. Jonnin put in the idea of exploiting how the numbers are correlated and I think I will try to pursue that avenue as well as I am dealing with two groups of data. This works well for the first group where not having done any detailed statistics but just looking at the numbers, I don't see a specific pattern that I could exploit. The second group of data represents vectors with coordinates that come in ordered representing 16x16 blocks of pixels left to right and top to bottom. I switched my code for the second group of data for now and got it working but I am wondering if my implementation is efficient. I will post the question in a different thread.

Chris
Topic archived. No new replies allowed.