Uintvar character input as hex to a decimal integer

Hi all. I'm very new to C++: I've been coding in Java for my whole programming career and only started on C++ recently. I've been tasked with creating a program that accepts input from the command line and prints results to the standard output. Each uintvar that is input is assumed to be in hexadecimal, and the output should display the decimal integer. An example would be

./a.out 8a5c 8102

which should print to the standard output the lines

8a5c: 1372
8102: 130

I've managed to wrap my head around how a hexadecimal uintvar converts to a decimal, but I'm unsure how to write this in code. So far, my program reads the input and converts it to the binary representation of the hexadecimal input. What I would now like the program to do is divide the binary string into 8-bit segments, knock the top (8th) bit off each segment, then do the base-128 conversion of the remaining 7-bit values. This should give the decimal integer. Any ideas? Any help is greatly appreciated. Code below.

#include <iostream>  // Header for stream I/O.
#include <sstream>   // Header for string stream.
#include <cmath>
#include <cstdlib>

using namespace std;

string BinaryFromHex (string hex);

int main (int argc, char *argv[])
{
	for(int i = 1; i < argc; i++){
		cout << argv[i] << ": ";  // echo the input as typed; atoi() would stop at the first non-digit
		string input = BinaryFromHex(argv[i]);
		cout << input << "\n";

	}
	return 0;
}

string BinaryFromHex (string hex)
{
	string binary = "";
	for (int i = 0; i < (int)hex.length (); ++i){
		switch (hex [i]){
			case '0': binary.append ("0000");
				break;
			case '1': binary.append ("0001");

...all the other binary representations continue...
Isn't 0x8A5C 35420 in decimal, and 0x8102 33026?
Yea sure, but the values that are input aren't hexadecimal, they are just in hexadecimal representation. The user actually inputs uintvars in hex representation. From the brief: "For each uintvar supplied in the command-line the program prints the corresponding integer values. Inputs in the command-line are assumed to be in hexadecimal representation. The output integer is in decimal representation."
the values that are input aren't hexadecimal, they are just in hexadecimal representation.


They are what exactly in hexadecimal representation?
Not ordinary numbers. Instead they are....?
They are uintvars. A uintvar representation uses a variable number of bytes to represent an unsigned integer. Only the lower 7 bits of a byte are used for the number. This means the maximum value a single byte can hold is 127. Values greater than 127 need more than one byte to represent them.

The most-significant bit (MSB) of the byte is used as a continuation marker. If MSB is 0, that means no further bytes to follow, and an MSB of 1 means there is a byte to follow. This allows chaining an arbitrary number of bytes.

Here is the only page I can find on them. http://en.wikipedia.org/wiki/Variable-length_quantity
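To make the format described above concrete, here is a minimal sketch of decoding one uintvar from raw bytes. The function name `decodeUintvar` and the choice to return 0 when no terminating byte is present are assumptions for illustration, not part of the brief.

```cpp
#include <cstdint>
#include <vector>

// Decode one uintvar from a sequence of bytes, most significant group first.
// Returns 0 if the bytes run out before a byte with a clear top bit is seen
// (using 0 as an error value is an assumption made for this sketch).
std::uint32_t decodeUintvar(const std::vector<std::uint8_t>& bytes)
{
    std::uint32_t result = 0;
    for (std::uint8_t b : bytes) {
        result = (result << 7) | (b & 0x7Fu);  // append the lower 7 bits
        if ((b & 0x80u) == 0) {                // top bit clear: last byte
            return result;
        }
    }
    return 0;  // every byte had its continuation bit set
}
```

Calling `decodeUintvar({0x8a, 0x5c})` yields 1372 and `decodeUintvar({0x81, 0x02})` yields 130, matching the expected output in the first post. Values that need more than 32 bits would overflow in this sketch.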
Basically what I want the program to do is this. Once the user inputs whatever they want, e.g. 8a5c, the program should convert this into binary.

8A5C = 1000101001011100 if I'm not mistaken. Then return this binary string back to the main function where it takes the length and divides it by 8 so that we get 8 bit segments.

The result will be 10001010:01011100. Then check each byte's leading bit (the MSB). A leading 1 means another 8-bit segment follows; a leading 0 marks the last byte of the uintvar. Finally, knock off that leading bit so there are 7-bit segments, and read those as base-128 digits. For the above example this gives 10 : 92, and 10 * 128^1 + 92 * 128^0 = 1372, which is the answer in my first post... I'm stuck at the point of changing the binary string into 8-bit segments.
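If the binary string is kept, one possible way to cut it into 8-bit segments is `std::string::substr`. The sketch below is only an illustration: `bitsToValue` and `uintvarFromBinary` are hypothetical helper names, and returning 0 when no terminating segment is found is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn a string of '0'/'1' characters into a number.
unsigned long bitsToValue(const std::string& bits)
{
    unsigned long value = 0;
    for (char c : bits) {
        value = value * 2 + (c == '1' ? 1 : 0);
    }
    return value;
}

// Walk the binary string 8 characters at a time. In each segment the first
// character is the continuation flag and the other 7 form a base-128 digit.
unsigned long uintvarFromBinary(const std::string& binary)
{
    unsigned long result = 0;
    for (std::size_t pos = 0; pos + 8 <= binary.length(); pos += 8) {
        const std::string segment = binary.substr(pos, 8);
        result = result * 128 + bitsToValue(segment.substr(1));
        if (segment[0] == '0') {   // flag clear: this was the last segment
            return result;
        }
    }
    return 0;  // assumed error value: no segment had a clear flag
}
```

For the string 1000101001011100 the segments contribute 10 and 92, so the function returns 10 * 128 + 92 = 1372.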
I understand the representation now, thank you.

8a5c = (0*16+10)*128 + (5*16+12) = 1372
8102 = (0*16+1)*128 + (0*16+2) = 130
I'd say that converting the hex string into a binary string is just adding an unnecessary step, without getting closer to a result. All that's required is to process the input two characters at a time.

For each pair of characters as a substring named "pair", use the cstdlib function strtol() to convert to a long integer.

    char pair[3];                        // null-terminated string of two hex digits
    char * endptr    = 0;                // Unused                     
    long val = std::strtol(pair, &endptr, 16);   // convert to integer 


We are only interested in the bottom 8 bits of this value.
Then use the standard c/c++ bit operations to check the continuation flag, and mask the required 7 bits to get the value.
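Putting those steps together, a minimal sketch might look like the following. The function name `uintvarFromHex` is made up for illustration, the input is assumed to contain an even number of valid hex digits, and returning 0 when no terminating byte is found is an assumption.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Decode a uintvar given as hex text such as "8a5c".
// Assumes an even number of valid hex digits. Returns 0 if no byte with a
// clear top bit is found (an assumed error convention for this sketch).
unsigned long uintvarFromHex(const std::string& hex)
{
    unsigned long result = 0;
    for (std::size_t pos = 0; pos + 2 <= hex.length(); pos += 2) {
        // Copy two digits into a null-terminated buffer for strtol().
        const char pair[3] = { hex[pos], hex[pos + 1], '\0' };
        const long val = std::strtol(pair, nullptr, 16) & 0xFF;  // bottom 8 bits

        // Keep the lower 7 bits and make room for them in the result.
        result = (result << 7) | static_cast<unsigned long>(val & 0x7F);

        if ((val & 0x80) == 0) {   // continuation flag clear: finished
            return result;
        }
    }
    return 0;
}
```

With this sketch, `uintvarFromHex("8a5c")` gives 1372 and `uintvarFromHex("8102")` gives 130.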
Thank you Chervil, your responses have all been a great help. It seems to be more or less working now, with just one bug. Some inputs, such as EFFA or 9A81, should return 0 instead of a number, because no byte has its most significant bit set to 0. Do you know what I mean? EFFA would be 1110111111111010 in binary. There is no byte with an MSB of 0, so it should return 0, which represents an error. How would one check for this?
Glad you're making progress.

You could have two booleans, both set to 'false' at the start.

When a MSB is found 'on', set "continuation_required" to true.
When a MSB is found 'off', set "continuation_found" to true.

Just before returning the result, check the two flags. If continuation is expected but not found, set the result to zero.
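The two-flag check described above could be sketched roughly like this. The flag names come from the post; the function name `uintvarWithFlags`, the use of `substr`, and stopping at the first terminating byte are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Decode hex text such as "8a5c", returning 0 when a continuation was
// announced but no terminating byte (MSB off) ever arrived.
unsigned long uintvarWithFlags(const std::string& hex)
{
    bool continuation_required = false;  // some byte had its MSB on
    bool continuation_found    = false;  // some byte had its MSB off
    unsigned long result = 0;

    for (std::size_t pos = 0; pos + 2 <= hex.length(); pos += 2) {
        const std::string pair = hex.substr(pos, 2);
        const long value = std::strtol(pair.c_str(), nullptr, 16) & 0xFF;

        result = (result << 7) | static_cast<unsigned long>(value & 0x7F);

        if (value & 0x80) {
            continuation_required = true;
        } else {
            continuation_found = true;
            break;  // assumption: stop at the first terminating byte
        }
    }

    // Continuation expected but never satisfied: report the error value.
    if (continuation_required && !continuation_found) {
        result = 0;
    }
    return result;
}
```

Here EFFA and 9A81 both come back as 0, while 8a5c still gives 1372.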
All done! Thanks for all the help Chervil.
static rage, how did you swap the bits after you tested the conditions?
Let's say the input was 8a5c, which is 10001010 01011100. How did you change the MSB and LSB? I tried using (value of the last byte) ^ 10000000 if the LSB of the previous byte = 1, and 00000000 if the LSB = 0, but I get some enormous values that don't make logical sense.

Here's my code. I tried just testing things first, which is why I haven't finished the odd input etc.

#include <iostream>  // Header for stream I/O.
#include <sstream>   // Header for string stream.
#include <cmath>
#include <cstdlib>

using namespace std;

void convert (string hex);  // prototype must match the function defined below

int main (int argc, char *argv[])
{
	for(int i = 1; i < argc; i++){
		//cout << atoi(argv[i]) << ": ";
		//string input = convert(argv[i]);
        convert(argv[i]);
		//cout << input << "\n";
	}
	return 0;
}

void convert (string hex){
    cout << "Input: " << hex << endl;
    int length = (int)hex.length(), count = 0, LSB = 1;
    if((length % 2) == 0) { //even input
        for(int i = 0; i < ((length/2)); i++){
            char pair[3]; pair[0] = hex[count]; count++; pair[1] = hex[count]; count++;
            char* endptr = 0;
            long value = strtol(pair, &endptr, 16);
            value = (value & 0xFF);
            if((value & 10000000) == 128){ //if MSB = 1
                if(count == length){ //if MSB = 1 and the byte is the last byte, output 0
                    value = 0;
                    cout << value << endl;
                } else {
                    //here
                }
            } else { //last byte, MSB = 0
                if(count == 2){
                    cout << value << endl;
                    break;
                } else {
                    cout << pair << endl;
                    cout << value << endl;
                }
            }

        }
    } else { //odd input
        
    }
}
@magic magic

Please use code tags, <> on the right.

Look at this code:
1
2
3
    value = (value & 0xFF);

    if ((value & 10000000) == 128){      //    if MSB = 1  


Line 1 is correctly using 0xFF to represent binary 11111111.
Note, 0xFF = decimal 255.

But line 3 is using the decimal number 10000000.
You need to use either 0x80 which is binary 10000000
or you might use decimal 128.

The hexadecimal style is recommended, like this:
if ((value & 0x80) == 0x80){ // if MSB = 1

Since this is just a yes/no test, you could simplify the code like this:
if (value & 0x80) {

One more comment. Function strtol() requires a null-terminated character string. I don't think your code assigns any particular value to pair[2], so results might be unpredictable.
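As a small illustration of both points, the pair buffer can be terminated explicitly and the MSB test written with the hexadecimal mask. The helper names `byteFromHexPair` and `hasContinuation` are hypothetical, and the sketch assumes `pos + 1` is a valid index into the string.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Convert the two hex digits starting at `pos` into a byte value (0..255).
// Hypothetical helper; assumes pos + 1 is a valid index into `hex`.
long byteFromHexPair(const std::string& hex, std::size_t pos)
{
    char pair[3];
    pair[0] = hex[pos];
    pair[1] = hex[pos + 1];
    pair[2] = '\0';                    // terminator that strtol() relies on
    return std::strtol(pair, nullptr, 16) & 0xFF;
}

// True when the continuation flag (the MSB of the byte) is set.
bool hasContinuation(long value)
{
    return (value & 0x80) != 0;
}
```

For the input 8a5c, `byteFromHexPair(hex, 0)` is 0x8a with the flag set, and `byteFromHexPair(hex, 2)` is 0x5c with the flag clear.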
@Chervil

thank you for the tips.
I spent a while last night studying the arts of masking and bit manipulation, so the task would become possible to do. I took a different approach to this task: working from the last byte, adding the value to the Result variable, and then left shifting 8 bits to start the process again.

What I am having difficulties with is, first, taking out the MSB of each byte. Let's say I use 8080805c, which is 10000000 10000000 10000000 01011100. I would like to get rid of all those leading ones, as they will not be used in the result.

Secondly, I need to somehow shift the result variable each time so my last result won't override my previous one. Let's say I use 8a5c, which is 10001010 01011100. At first value will be set to 92 (5c), and then I want to add the value of a (1010) to the start of the result. Does this make sense? This is where I'm at.

void convert(string input){
    int length = input.size(), result = 0;
    long start = 0;
    stringstream(input) >> hex >> start;
    long shiftval = start;
    
    for(int i = 0; i < 2; i++){
        int temp = (start & 0xFF); //mask the last byte
        temp = (temp & 0x7f); //mask 7 bits
        result = (temp | result);
        shiftval >>= 1; 
        temp = (shiftval & 0x80); //add the MSB to 
        result = (temp | result);
        shiftval >>= 7;
        start = shiftval;
    }
    
    cout << "Result: " << result << endl;
}
Wow, that's a different approach than I'd thought of.

I played around with the idea for a bit, I think the work fields need to be of type unsigned long (or perhaps unsigned int) just in case the input is something like "abcdef70".

My idea is not to shift the result field at all.

Instead, set up a mask like this: unsigned int mask = 0x7f;. This is used (with the "and" operator) to obtain the numeric part of the byte we are interested in.

Each time around the for loop, shift the mask left by 7 bits. At the same time the input data (start or shiftval) is shifted right by 1 bit.

In addition you will need another mask to test the MSB, which I set initially to 0x8000. Again this is shifted left by 7 bits each time around the loop.

Those are just some ideas to play with, I'm not entirely comfortable with this approach, but it does have some merits.

    unsigned int result = 0;
    unsigned int start;
    unsigned int mask    = 0x7f;
    unsigned int msbmask = 0x8000; 
@Chervil

Thank you so much for the support. You are helping me out a lot, as I am very new to this whole style of programming. However, I still get the same result. With 8a5c, I still get 92 as my result output, no matter how large the loop is.

It seems as though it's only processing the last byte and doesn't add the value of the first. I know how to do the tests that are required, but can't manage to change the value with the other bytes.

This is where I'm at with your approach:

void convert(string input){
    cout << "Input: " << input << endl;
    
    unsigned long result = 0;
    unsigned long start;
    unsigned long mask = 0x7f;
    unsigned long msbmask = 0x8000;
    stringstream(input) >> hex >> start;
    unsigned long shiftval = start;
    
    for(int i = 0; i < 2; i++){
        long temp = (start & 0xFF); //mask the last byte
        temp = (temp & mask); //mask 7 bits
        result = (temp | result);
        mask <<= 7;
        shiftval >>= 1;
        temp = (shiftval & msbmask); //add the MSB to
        result = (temp | result);
        msbmask <<= 7;
        start = shiftval;
    }
    
    cout << "Result: " << result << endl;
}


is this what you meant?
That's partly what I meant:
This is closer (but still not 100% right):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void convert(string input){
    cout << "Input: " << input << endl;

    unsigned long result = 0;
    unsigned long start;
    unsigned long mask = 0x7f;
    unsigned long msbmask = 0x8000;
    stringstream(input) >> hex >> start;

    for(int i = 0; i < 4; i++){
        long temp = (start & mask); //mask the current byte
        result = (temp | result);
        mask <<= 7;
        start >>= 1;
        // temp = (shiftval & msbmask); //add the MSB to
        // result = (temp | result);
        msbmask <<= 7;
    }

    cout << "Result: " << result << endl;
}


Note, in this version the "for" loop goes around 4 times.
What it does NOT currently do is test the MSB. You don't need to add or combine this bit with the result. Remember the MSB is used to indicate whether there is a continuation byte. If there is not, you need to break out of the loop.
That's why I suggested using 0x8000. This will test (with the "and" operator) the next byte. If the bit is not set, the loop can end.
(I deliberately made lines 15 and 16 comment lines, as they were incorrect).

Another thing missing in this version, the first byte (starting from the right, as you are doing here) should always have a MSB of zero - because this is the end of the string.

Also, the code could be shortened a bit. These two lines:
        long temp = (start & mask); //mask the current byte
        result = (temp | result);


can be merged into one:
 
        result |= (start & mask); //mask the current byte 
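For reference, here is one way the right-to-left pieces discussed so far might fit together, including the MSB test. It is only a sketch under some assumptions: the name `uintvarRightToLeft` is made up, 64-bit work fields are used so the masks have room to shift, and the continuation test is done before `start` is shifted so that 0x8000 lines up with the next byte's MSB.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Right-to-left decoding with a sliding mask, handling up to four bytes.
// Returns 0 when the rightmost byte has its MSB set (no terminating byte).
std::uint64_t uintvarRightToLeft(const std::string& input)
{
    std::uint64_t start = 0;
    std::istringstream stream(input);
    stream >> std::hex >> start;       // whole input parsed as one hex number

    if (start & 0x80u) {               // the last byte must have MSB = 0
        return 0;
    }

    std::uint64_t result  = 0;
    std::uint64_t mask    = 0x7f;      // selects the current 7 value bits
    std::uint64_t msbmask = 0x8000;    // selects the MSB of the next byte

    for (int i = 0; i < 4; ++i) {
        result |= (start & mask);      // mask the current byte
        if ((start & msbmask) == 0) {  // next byte is not a continuation
            break;
        }
        mask    <<= 7;
        msbmask <<= 7;
        start   >>= 1;                 // close the gap left by the dropped MSB
    }
    return result;
}
```

With this sketch, 8a5c gives 1372, 8102 gives 130, and 9a81 gives 0.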

Here's why I have reservations.

Consider this input: "ff00"
Result: 16256

Now what about this: "ff0000"?

If you start processing from the left, you will get 16256 as before, and ignore the last "00".

On the other hand, if you start processing from the right, the "next" byte is "00", there is no continuation bit so the process terminates and the result is zero.

Which one is correct?
Well, when I run the code, I get ff0000: 2080768 and ff00: 16256,
but that is because I haven't yet finished the tests for the last byte.

On paper, I get 0x3f80, which is 16256 in decimal. So that is the correct answer.

I am now trying to set the first byte with MSB = 0 to be the last byte.

I also did
if(x & 0x80){ //if there is no byte with MSB = 0, return 0
        result = 0;
}
in case of inputs like 9a81, which have no byte with MSB = 0.