Uncomfortable with my BCD Int class

Hello friends, I have written a class that represents a binary-coded decimal integer with 64-bit range, but since I'm relatively new to C++, I don't feel like I have done a very good job. If some of you could have a look and point out possible mistakes and improvements, I would be really happy.

#include <cstdint>
class BCDInt64 {
	uint16_t _data[5];

	int constexpr Mask(uint16_t value, int shiftSteps) const {
		return (value >> (shiftSteps * 4)) & 0xF;
	}

public:
	BCDInt64() = default;

	explicit BCDInt64(uint64_t value) : BCDInt64() {
		if (ByteOrder::IsLittleEndian()) {
			for (int int16 = 4; int16 >= 0 && value != 0; int16--) {
				for (int shift = 12; shift >= 0; shift -= 4) {
					_data[int16] |= (value % 10) << shift;
					value /= 10;
				}
			}
			return;
		}
		for (int int16 = 0; int16 < 5 && value != 0; int16++) {
			for (int shift = 12; shift >= 0; shift -= 4) {
				_data[int16] |= (value % 10) << shift;
				value /= 10;
			}
		}
	}

	uint64_t constexpr Value() const {
		uint64_t value = 0ULL;
		for (int int16 = 0; int16 < 5; int16++) {
			value *= 10000;
			if (ByteOrder::IsLittleEndian())
				value += (Mask(_data[int16], 0) * 1000) 
				+ (Mask(_data[int16], 1) * 100) 
				+ (Mask(_data[int16], 2) * 10) 
				+ Mask(_data[int16], 3);
			else
				value += (Mask(_data[int16], 3) * 1000)
				+ (Mask(_data[int16], 2) * 100)
				+ (Mask(_data[int16], 1) *10)
				+ Mask(_data[int16], 0);
		}
		return value;
	}

	int constexpr operator[](int Digit) const {
		int index = 4 - (Digit / 4);
		int shift = 3 - (Digit % 4);
		return Mask(_data[index], shift);
	}

	explicit operator const uint64_t () const { return Value(); };
};
Well, for starters you could comment some of it. It's just a soup of magic numbers, and I *get* it, that is how this kind of low-level stuff needs to be done, but you should either name the values as constants or put in some comments saying what you are doing.

Also, consider:
Byte swapping is a single CPU instruction if you can call assembly and are on Intel boxes (not all CPUs have it available). If not, swap it once outside of your class, rather than handling it in each place inside your class, and swap it back if needed when your tool is done with its internal processing. Visual Studio has intrinsic functions to do this for you, possibly because inline asm was dropped for 64-bit builds.
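Something along these lines is what I mean by factoring it out; a rough, untested sketch using the compiler intrinsics (assuming GCC/Clang or MSVC; the fallback is plain shifts, and the name ByteSwap64 is just made up for the example):

#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>   // _byteswap_uint64
#endif

// Do the swap in exactly one place; everything else works on plain values.
inline uint64_t ByteSwap64(uint64_t v) {
#if defined(_MSC_VER)
	return _byteswap_uint64(v);
#elif defined(__GNUC__) || defined(__clang__)
	return __builtin_bswap64(v);
#else
	// Portable fallback: rebuild the value byte by byte in reverse order.
	uint64_t r = 0;
	for (int i = 0; i < 8; ++i) {
		r = (r << 8) | (v & 0xFF);
		v >>= 8;
	}
	return r;
#endif
}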

In some places log10 may be useful for the base-10 digit count or other operations.

value != 0 as a loop condition is the same as just value without the comparison.
int16 clashes with a type name in some IDEs/systems, so it may not be a good variable name.

There are multiple BCD formats.
If yours is the simplest one, a lookup table of the digits 0-9 in binary can be appended together with some <bitset> hand-waving. I think that would greatly reduce your code and effort. If you need speed, avoid it, but if not, turning the number into a string may simplify it even more.
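For the string idea, something like this rough, untested sketch (assuming the simple packed format; the name ToPackedBCD is made up for the example):

#include <cstdint>
#include <string>

// Turn a value into simple packed BCD by walking its decimal string.
// Note: a uint64_t result only holds 16 BCD digits, so this caps out there.
uint64_t ToPackedBCD(uint64_t value) {
	std::string digits = std::to_string(value);
	uint64_t bcd = 0;
	for (char c : digits)
		bcd = (bcd << 4) | static_cast<uint64_t>(c - '0');  // one nibble per digit
	return bcd;
}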
Comments will be added, but this was a test class, so I thought that leaving out comments at first on such a small class would be more efficient for development time (it's for a larger project).

For the byte-order CPU instruction, I want this to be as portable as possible without changing code, so I will look into that later. Assembly itself shouldn't be a problem, but I think staying C++-only might be better for now (not sure about that, though).

For log10, I don't think it will work here, but I will figure that out too.

If "value != 0" works the same here as "value", i may consider it. Would it be bad to leave it like i did for readability?

And last: strings are normally not an option because the program reads a file in which, for reasons unknown to me, some numbers are saved in BCD format, so I will read the file in binary mode. Maybe I can implement string support afterwards.

Thank you for your time; I will now take some time to change things according to your suggestions.
Even if you want it portable and DIY the byte order, I would still factor it out to simplify everything. Assembly wouldn't be portable; every compiler does it a little differently and every CPU is different, so no go on that.

log10 just gives you the number of digits in a base-10 number (add 1 to it); it may help reduce the number of iterations if you only had to encode, say, '1'. It's not required.
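Just for the record, the digit count is about one line (untested sketch; note that floating-point rounding can be off by one near exact powers of ten for very large values):

#include <cmath>
#include <cstdint>

// Number of decimal digits in n: floor(log10(n)) + 1, with 0 special-cased.
int DigitCount(uint64_t n) {
	return (n == 0) ? 1 : static_cast<int>(std::log10(static_cast<double>(n))) + 1;
}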

Readability on the condition is subjective. Some folks will say it's bloated and harder to read with the extra characters, and others will say the explicit logic looks better. It's fine either way. No matter what you do, someone will always be butthurt over your style :)

Converting to a string when the data comes in binary would be pointless, so your approach is fine.

So from your feedback, the only real change I would make is to factor out the byte swapping and otherwise leave it alone. If I did it again from scratch, I would at least consider doing it via bitsets or tables, but you have already done too much work to redo it unless you need to do billions per second.


OK, just one last question: should I expect the byte order to be big-endian or little-endian by default?
Here is a two-cent example of how I would have approached it; not in a class yet, just a quick hack on the idea:

#include <bitset>
#include <cstdint>
#include <iostream>
using namespace std;

//use a table to get a byte sized chunk rather than trying to work in nibbles.
unsigned char bcdtbl[]  //use a program to generate this table, I am not THAT into it to
//do this by hand or anything :)   this is 0 to 99 in bcd
{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 
80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 
96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 
128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 
144, 145, 146, 147, 148, 149, 150, 151, 152, 153		
};

int main()
{
   uintmax_t u {7654};
   uintmax_t bcd {};
   unsigned char*cp = (unsigned char*)(&bcd);
   //whether you go 0-7 or 7 down to 0 for byte order is up to you ... 
   cp[0] = bcdtbl[u%100];   //first byte is 54 in bcd 4 bit nibbles
   cp[1] = bcdtbl[u/100 %100]; //second byte is 76 in bcd nibbles
   //... insert more of the above for additional digits. 
   bitset<64> b = bcd;  //unnecessary, used to print output / debug
   cout << b << endl;
}

output: (byte order may affect this)
0000000000000000000000000000000000000000000000000111011001010100

And it's not better, just different, as far as that goes.
The compiler already knows the machine's byte order. Therefore testing it yourself is a waste of time.
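If you ever actually do need the byte order in code, C++20 exposes it as a compile-time constant (assuming a C++20 compiler):

#include <bit>

// Known at compile time; no runtime test needed.
constexpr bool kLittleEndian = (std::endian::native == std::endian::little);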
@mbozzi that may be true. But am I getting this wrong?

The result of this loop will change depending on the endianness, or does it not?
for (int int16 = 0; int16 < 5; int16++) {
    value *= 10000;
    value += (Mask(_data[int16], 0) * 1000) 
          + (Mask(_data[int16], 1) * 100) 
          + (Mask(_data[int16], 2) * 10) 
          + Mask(_data[int16], 3);
}


If not, I'm stupid and wasted a lot of time because of that.

@jonnin thank you for the code example. Lookup tables are indeed something I might use in the future, but I don't think it is worth the memory usage when I don't really need that much speed for the calculation. The file I receive contains only a few of these BCD integers, so the execution time of this parsing is not really worth optimizing for.
Thank you both anyway :)
the result of this loop will change depending on the endianness, or does it not?
[...]
if not, im stupid and wasted a lot of time because of that.

Bit shifts are specified in terms of multiplications by powers of two, so it doesn't. But that definitely doesn't mean you're stupid.
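A quick way to convince yourself (untested sketch): the shift always produces the same value; only reinterpreting the object's bytes exposes the byte order.

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
	uint32_t v = 1u << 8;              // the shift gives 256 on every machine
	unsigned char bytes[4];
	std::memcpy(bytes, &v, sizeof v);  // only this byte-level view depends on byte order
	std::cout << v << '\n';                          // always 256
	std::cout << static_cast<int>(bytes[1]) << '\n'; // 1 on little-endian, 0 on big-endian
}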
So I removed the endian check and added comments the way I normally do (it looks a bit bulky because it is not yet separated into header and source files). If you have final thoughts on anything, let me know; otherwise I will mark this as solved.

And thank you for your feedback :)

PS: StdTypes is a header I created that contains alias typedefs for all the numeric types I use, like ULong:
typedef uint64_t ULong, UInt64;

current code:
#include <StdTypes>

class BCDInt64 {
	// BCD integer data stored in 16-bit container
	UShort _data[5] = { 0 };

	/// <summary>
	/// Masks a 4-bit digit out of a 16-bit container at the given position
	/// </summary>
	/// <param name="iValue">16-bit container to mask</param>
	/// <param name="iShiftSteps">Amount of shifts (in steps of 4) needed</param>
	/// <returns>Returns the digit as int</returns>
	const Int constexpr Mask(UShort iValue, Int iShiftSteps) const {
		// Right shift the bits by 4 per shift step and mask the last 4 bits
		return (iValue >> (iShiftSteps * 4)) & 0xF;
	}

public:
	// Default constructor
	BCDInt64() = default;

	/// <summary>
	/// [Explicit] Conversion constructor
	/// </summary>
	/// <param name="iValue">64-bit integer to convert from</param>
	explicit BCDInt64(ULong iValue) : BCDInt64() {
		// Iterate through the data array as long as value is not 0
		for (int i = 4; iValue != 0; i--) {
			// Iterate through the next 4 digits
			for (int shift = 12; shift >= 0; shift -= 4) {
				// Saves the last digit in the BCD integer data and cuts it out
				_data[i] |= (iValue % 10) << shift;
				iValue /= 10;
			}
		}
	} // .ctor BCDInt64(ULong)

	/// <summary>
	/// Getter function for the 64-bit integer value
	/// </summary>
	/// <returns>Returns the non BCD 64-bit integer value</returns>
	const ULong constexpr Value() const {
		ULong value = 0ULL;
		// Iterate through the data array
		for (int i = 0; i < 5; i++) {
			// Decimal left shift return value by 4 digits
			value *= 10000;
			// Add the next 4 digits to the return value
			value += (Mask(_data[i], 0) * 1000) 
			      +  (Mask(_data[i], 1) * 100) 
			      +  (Mask(_data[i], 2) * 10) 
			      +   Mask(_data[i], 3);
		}
		return value;
	} // ULong Value()

	/// <summary>
	/// [Readonly] Array index operator accessing the digits from given position (RtL)
	/// </summary>
	/// <param name="iDigitIndex"></param>
	/// <returns>Returns the digit as Int</returns>
	const Int constexpr operator[](Int iDigitIndex) const {
		Int index = 4 - (iDigitIndex / 4);
		Int shift = 3 - (iDigitIndex % 4);
		return Mask(_data[index], shift);
	} // Int operator[](Int)

	/// <summary>
	/// [Explicit] Conversion operator to convert to ULong
	/// </summary>
	explicit operator const ULong() const { return Value(); };
}; // class BCDInt64 
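For reference, a quick check of how it behaves (not part of the class, just an example):

BCDInt64 n(12345);
// n.Value() == 12345
// n[0] == 5   (rightmost digit; the index runs right to left)
// n[4] == 1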
That is looking a lot better: a lot smaller and neater.

Mine was never about the speed; I suspect yours is PDQ. It was about the table (which is a whopping 100 bytes of RAM, in a 64+ billion byte computer these days) doing all the ugly bits for you. My approach is literally 8 lines of code (the assignment of the 8 bytes, of which I only showed 2) plus the table, for a 16-digit number packed into an 8-byte int; it is spelled out in the sketch below. The 8 lines are admittedly ugly as you move through the modulus/division chains, but the effort was near zero. Again, not saying it's better, just explaining the real reason for the approach (sheer laziness / simple code). If it also happens to be faster, that's a win-win (not sure it would be; doing 2 nibbles at a time gives it an edge, but bit twiddles are super fast too).
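Spelled out (still just an untested sketch using the bcdtbl table from above; the function name ToBCD is made up for the example), the whole conversion is about this much:

#include <cstdint>
#include <cstring>

// Pack a value into BCD two decimal digits (one byte) at a time via the table.
// An 8-byte result holds at most 16 digits.
uint64_t ToBCD(uint64_t u) {
	unsigned char out[8] = {};
	for (int i = 0; i < 8 && u != 0; ++i) {
		out[i] = bcdtbl[u % 100];  // low two decimal digits -> one packed BCD byte
		u /= 100;
	}
	uint64_t bcd;
	std::memcpy(&bcd, out, sizeof bcd);  // byte order decides which end holds the low digits
	return bcd;
}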
@jonnin,

Why would you ever write your table using decimal values? Why not write them in hex (BCD) in the first place? Then you don't need to scratch your head and ask "why is 50 at index 32 of my table?"

Sure, it takes more keystrokes to do this, but it is a whole lot more readable.

//use a table to get a byte sized chunk rather than trying to work in nibbles. 
unsigned char bcdtbl[]  //use a program to generate this table, I am not THAT into it to 
//do this by hand or anything :)   this is 0 to 99 in bcd 
{
  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09,
  0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,
  0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29,
  0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39,
  0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49,
  0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,
  0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69,
  0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79,
  0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,
  0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99
};
I generated the table with a drive-by loop and did not put hex on the print statement. No real reason; I barely looked at the table, it was just a data dump.