Store a double/float (e.g. 1.2) in a 2 byte 2's complement format

Which approach could be taken to store a double/float value like 1.2 into 16 bit 2's complement format?

Thanks in advance.
Just wondering why you would want to do that?

And what knowledge do you have of how FP is stored?

16 bit will give very little precision, and would be almost unusable as a FP.
I am dealing with a piece of equipment and reading that value from it. The documentation says the measurement is between +-1.2 and that it is 2 bytes.

So I think +1.2 will be 0111 1111 1111 1111 and -1.2 will be 1111 1111 1111 1111.

In fact I don't understand how to store that value into 2 Bytes nor even how to read it back.
Does the equipment say that it uses 16-bit 2's complement format? And are you converting a decimal value to 16 bits to send to the equipment, or do you want to take a 16-bit value from the equipment and convert it to decimal?
Does the documentation say anything about the FP standard used?

Could you quote the relevant part of the documentation verbatim?

There could be various ways of doing this, so it is easiest to find out how the system does it, otherwise one is just guessing.

Btw, I think your reference to two's complement might be an assumption on your part.

If the FP number is only 1 dp and there is no exponent, then I can see how using only 16 bit might work.

What happens if you read the value in as an ordinary float?

It might work if a C float stores its exponent and associated sign at the end. I can't remember the exact format or which endianness is used, so it's just a random idea. Although I have a vague idea that the mantissa sign comes first, then the mantissa, then the exponent sign, and the exponent last.

If this is right any bits in excess of 16 might not matter. You might need to pad the value to be the same size as a float or double so you don't get type mismatch errors.

Anyway, this is all random speculation and could be totally wrong. As I said, best to find out exactly how it works.

Hope all goes well.
If you have a known range that's [-1.2, +1.2] this could be a simple problem of scaling.

If you need 0 to match up to being actually 0... then you could do something like this:

#include <cstdint>

// Scale [-1.2, +1.2] linearly onto [-0x7FFF, +0x7FFF]; the cast truncates toward zero.
int16_t to16Bit(float f)
{
    return static_cast<int16_t>( f * 0x7FFF / 1.2 );
}


This will convert the floating point value of 1.2 to 0x7FFF (16-bit maximum), and a value of -1.2 to -0x7FFF (close to, but not quite 16-bit minimum).

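The problem is that this breaks if the value given is above 1.2 or below -1.2. Once the scaled result no longer fits in an int16_t, converting it from floating point to the integer type is undefined behavior, so the input would need to be clamped first.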
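
A clamped variant, plus the reverse conversion for reading a value back, might look like the sketch below. It assumes the device really uses a linear scale where +-0x7FFF corresponds to +-1.2, which the documentation has not confirmed; kFullScale, to16BitClamped and from16Bit are made-up names for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Assumed full-scale value, taken from the +-1.2 range quoted in this thread.
constexpr float kFullScale = 1.2f;

// Clamp first so the scaled result always fits in an int16_t,
// then round to the nearest step instead of truncating.
std::int16_t to16BitClamped(float f)
{
    const float clamped = std::max(-kFullScale, std::min(f, kFullScale));
    return static_cast<std::int16_t>(std::lround(clamped * 0x7FFF / kFullScale));
}

// Inverse mapping: raw 16-bit reading back to a float.
float from16Bit(std::int16_t raw)
{
    return static_cast<float>(raw) * kFullScale / 0x7FFF;
}
```

With this scaling one step is 1.2 / 32767, roughly 0.0000366. A raw value of -0x8000 would map to slightly below -1.2, so a real device may well use 0x8000 as the divisor instead; that is another thing only the documentation can settle.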
> The documentation says the measurement is between +-1.2 and that it is 2 bytes.
> So I think +1.2 will be 0111 1111 1111 1111 and -1.2 will be 1111 1111 1111 1111.

That is unlikely.
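
One reason for doubt: in 16-bit two's complement, 1111 1111 1111 1111 is -1, not the most negative value. A small sketch for checking bit patterns (toSigned16 is a made-up helper name, not something from the device documentation):

```cpp
#include <cstdint>

// Interpret a raw 16-bit pattern as a two's complement signed value.
// Done with plain arithmetic so it does not depend on how the compiler
// converts an out-of-range unsigned value to a signed type.
std::int32_t toSigned16(std::uint16_t bits)
{
    const std::int32_t value = bits;
    return (bits & 0x8000u) ? value - 0x10000 : value;
}
```

Here toSigned16(0x7FFF) is 32767, toSigned16(0xFFFF) is -1, and the most negative pattern is 0x8000, which gives -32768. So if the format were scaled two's complement, -1.2 would be expected near 0x8000 or 0x8001, not 0xFFFF.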

Chances are that the representation is in IEEE 754 half-precision (binary16) format.
https://en.wikipedia.org/wiki/Half-precision_floating-point_format


If it is IEEE 754 half-precision, use a library. For instance: http://half.sourceforge.net/
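
If pulling in a library is not an option, decoding by hand is short enough to sketch. This assumes the two bytes have already been assembled into a uint16_t in the right byte order and that the device really sends binary16, neither of which is confirmed here; halfBitsToFloat is a made-up name.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 bit pattern:
// 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
float halfBitsToFloat(std::uint16_t bits)
{
    const int sign = (bits >> 15) & 0x1;
    const int exponent = (bits >> 10) & 0x1F;
    const int fraction = bits & 0x3FF;

    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: no implicit leading 1, value = fraction * 2^-24.
        magnitude = std::ldexp(static_cast<float>(fraction), -24);
    } else if (exponent == 0x1F) {
        // All-ones exponent: infinity if the fraction is 0, otherwise NaN.
        magnitude = (fraction == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal number: (1 + fraction / 1024) * 2^(exponent - 15).
        magnitude = std::ldexp(static_cast<float>(fraction + 1024), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}
```

Note that 1.2 is not exactly representable in binary16. The nearest pattern is 0x3CCD, which decodes to 1.2001953125, so a half-precision reading of "1.2" would not look like 0111 1111 1111 1111 at all.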
