Parsing Bytes with Varying Fields

I have a byte stream that represents a message in my application. For demonstration, the message has 5 fields. The first byte of the stream is a bit mask that indicates which fields are present in the current message. For instance, 0x02 in byte 0 means that only Field-1 is present.

The mask field can take 2^5 = 32 different values. To parse this variable-width message, I wrote the example structure and parser below. My question is: is there any other way to parse such dynamically changing fields? If the message had 64 fields, I would have to write 64 cases, which is cumbersome.

#include <cstddef>  // std::size_t
#include <cstdint>  // uint8_t
#include <cstring>  // memcpy
#include <iostream>

typedef struct
{
    uint8_t iDummy0;
    int iDummy1;
}__attribute__((packed, aligned(1)))Field4;

typedef struct
{
    int iField0;
    uint8_t ui8Field1;
    short i16Field2;
    long long i64Field3;
    Field4 stField4;
}__attribute__((packed, aligned(1)))MessageStream;

char* constructIncomingMessage()
{
    char* cpStream = new char[1+sizeof(MessageStream)](); // Demonstrative message byte array (zero-initialized)
                                                            // 1 byte for Mask, 20 bytes for messageStream

    cpStream[0] = 0x1F; // the 0-th byte is a mask marking
                                // which fields are present for the messageStream
                                // all 5 fields are present for the example
    return cpStream;
}

void deleteMessage( char* cpMessage)
{
    delete[] cpMessage; // matches the new[] in constructIncomingMessage
}

int main() {
    MessageStream messageStream; // Local storage for messageStream
    uint8_t ui8FieldMask; // Mask to indicate which fields of messageStream
                            // are present for the current incoming message
    const uint8_t ui8BitIsolator = 0x01;
    uint8_t ui8FieldPresent; // ANDed result of Mask and Isolator

    std::size_t szParsedByteCount = 0; // Total number of parsed bytes

    const std::size_t szMaxMessageFieldCount = 5; // There can be maximum 5 fields in
                                                    // the messageStream

    char* cpMessageStream = constructIncomingMessage();
    ui8FieldMask = (uint8_t)cpMessageStream[0];
    szParsedByteCount += 1;

    for(std::size_t i = 0; i<szMaxMessageFieldCount; ++i)
    {
        ui8FieldPresent = ui8FieldMask & ui8BitIsolator;

        if(ui8FieldPresent)
        {
            switch(i)
            {
                case 0:
                {
                    memcpy(&messageStream.iField0, cpMessageStream+szParsedByteCount, sizeof(messageStream.iField0));
                    szParsedByteCount += sizeof(messageStream.iField0);
                    break;
                }
                case 1:
                {
                    memcpy(&messageStream.ui8Field1, cpMessageStream+szParsedByteCount, sizeof(messageStream.ui8Field1));
                    szParsedByteCount += sizeof(messageStream.ui8Field1);
                    break;
                }
                case 2:
                {
                    memcpy(&messageStream.i16Field2, cpMessageStream+szParsedByteCount, sizeof(messageStream.i16Field2));
                    szParsedByteCount += sizeof(messageStream.i16Field2);
                    break;
                }
                case 3:
                {
                    memcpy(&messageStream.i64Field3, cpMessageStream+szParsedByteCount, sizeof(messageStream.i64Field3));
                    szParsedByteCount += sizeof(messageStream.i64Field3);
                    break;
                }
                case 4:
                {
                    memcpy(&messageStream.stField4, cpMessageStream+szParsedByteCount, sizeof(messageStream.stField4));
                    szParsedByteCount += sizeof(messageStream.stField4);
                    break;
                }
                default:
                {
                    std::cerr << "Undefined Message field number: " << i << '\n';
                    break;
                }
            }
        }
        ui8FieldMask >>= 1; // shift the mask
    }

    deleteMessage(cpMessageStream);
    return 0;
}
Yeah, I misunderstood.

If you have 64 fields, you need 64 extracts, yes.
The other option, if you have something like 10-20 messages, is to do it 'message-wise' instead of field-wise. That is, if message 1 has fields 11, 12, 32, and 63, just group those up; if message 2 has fields 1-23 and 37, group that set up. You then have 10-20 cases instead of 64, which is less flexible but more practical.

Another option is to not 'name' the fields. That is, if field 1 is a double named whatever and field 2 is another double named something else, don't do that; share the storage and put something in the message to indicate both what it is (type) and where it goes (location key). I don't know if you are doing anything like that or not?

Also, will a loop work? For example (pseudocode):
for (each field number)
{
    if (field is present)
    {
        switch (field type, looked up by field number)
        {
            case double: extract_double(); break;
            case int: extract_int(); break;
            // ...etc.
        }
    }
}
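
To make the loop idea concrete, here is a minimal sketch that goes one step further and replaces the switch with a lookup table. It reuses the packed MessageStream layout from the question (GCC/Clang attribute syntax assumed); FieldDesc, kFields and parseMessage are hypothetical names I made up for illustration, and the buffer is assumed to use the same byte order as the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same layout as in the question (packed attribute is GCC/Clang specific).
struct __attribute__((packed)) Field4 { std::uint8_t iDummy0; int iDummy1; };
struct __attribute__((packed)) MessageStream {
    int iField0;
    std::uint8_t ui8Field1;
    short i16Field2;
    long long i64Field3;
    Field4 stField4;
};

// Hypothetical descriptor: where a field lives in the struct and how wide it is.
struct FieldDesc { std::size_t offset; std::size_t size; };

// One row per mask bit, in bit order. A new field is a new row, not a new case.
static const FieldDesc kFields[] = {
    { offsetof(MessageStream, iField0),   sizeof(MessageStream::iField0)   },
    { offsetof(MessageStream, ui8Field1), sizeof(MessageStream::ui8Field1) },
    { offsetof(MessageStream, i16Field2), sizeof(MessageStream::i16Field2) },
    { offsetof(MessageStream, i64Field3), sizeof(MessageStream::i64Field3) },
    { offsetof(MessageStream, stField4),  sizeof(MessageStream::stField4)  },
};
static const std::size_t kFieldCount = sizeof(kFields) / sizeof(kFields[0]);
static_assert(sizeof(kFields) / sizeof(kFields[0]) <= 8,
              "a one-byte mask covers at most 8 fields");

// Copies every field flagged in the mask byte into 'out'.
// Returns the number of bytes consumed, or 0 if the buffer is too short.
std::size_t parseMessage(const unsigned char* buf, std::size_t len, MessageStream& out)
{
    assert(buf != nullptr);
    if (len < 1) return 0;

    std::uint8_t mask = buf[0];
    std::size_t pos = 1; // byte 0 was the mask
    unsigned char* dst = reinterpret_cast<unsigned char*>(&out);

    for (std::size_t i = 0; i < kFieldCount; ++i, mask >>= 1) {
        if (!(mask & 0x01)) continue;              // field i is absent
        if (pos + kFields[i].size > len) return 0; // truncated message
        std::memcpy(dst + kFields[i].offset, buf + pos, kFields[i].size);
        pos += kFields[i].size;
    }
    return pos;
}
```

With something like this, a message starting with mask 0x06 followed by three payload bytes would fill ui8Field1 and i16Field2 and leave the other members untouched. For 64 fields the mask would have to become a wider integer, but the loop body would stay the same.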

finally you can combine the ideas... possibly make it message based with a presence within that to drop unused fields.
It is not possible to group the variables that are of the same type, because I will be actively using the variables themselves, and that requires readability within the code. I should look for alternative solutions, I suppose.
You can use the variant/visit mechanism.
1. The variant will be the possible records you expect to receive.
2. Use a factory to create the correct type of variant from the stream.
3. Use visit to process.
3.1. You can have common processing, in which case a single handler does the common processing.
3.2. You can have distinct processing, in which case the handler will be like your switch.
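
The three steps above can be sketched roughly as follows (C++17). This is only one reading of the suggestion: Heartbeat, Reading, makeRecord and Describe are hypothetical names, and the leading type-tag byte is an assumption made for the example, not the bit mask from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <variant>

// Hypothetical record types; real ones would mirror the actual messages.
struct Heartbeat { std::uint8_t seq; };
struct Reading   { std::int16_t value; };

// 1. The variant lists every record we expect; monostate means "unrecognized".
using Record = std::variant<std::monostate, Heartbeat, Reading>;

// 2. Factory: byte 0 is assumed to be a type tag (1 = Heartbeat, 2 = Reading).
Record makeRecord(const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr);
    if (len >= 2 && buf[0] == 1) {
        return Heartbeat{ buf[1] };
    }
    if (len >= 3 && buf[0] == 2) {
        Reading r{};
        std::memcpy(&r.value, buf + 1, sizeof(r.value)); // host byte order assumed
        return r;
    }
    return std::monostate{};
}

// 3. Visitor with one overload per record type (the "distinct processing" case).
struct Describe {
    std::string operator()(std::monostate) const { return "unknown"; }
    std::string operator()(const Heartbeat& h) const {
        return "heartbeat " + std::to_string(h.seq);
    }
    std::string operator()(const Reading& r) const {
        return "reading " + std::to_string(r.value);
    }
};

std::string describe(const Record& rec)
{
    return std::visit(Describe{}, rec);
}
```

The compiler rejects the std::visit call if Describe is missing an overload for one of the alternatives, which is the main advantage over a hand-written switch. For the common-processing case (3.1), a single generic lambda could replace the three overloads.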
Nothing I suggested would be unreadable if you took some time to implement it cleanly.

grouping variables of the same type, one really dumb and simple but readable way.

double data[3];
enum msgtype1{field1, field2, field3, mt1max};
enum msgtype2{red, green, blue, mt2max};
enum msgtype3{heading, pitch, roll, mt3max};
...
// working with message type 2
data[red] = something;
...
// working with message type 3
data[pitch] = 3.14;
// or iterate all the items in a message of that type:
for(int i = 0; i < mt3max; i++)
    foo(data[i]);

Crude, but it's readable if you use good enum names. There are other ways to do the above; it's just a basic approach: 3 storage slots represent 9 things. It may not work for your code, and that is fine. But it does not need to be a mess like making double_number_1 and using it for 20 different meanings. You can also use references to rename an array location if all the enums seem clunky: double &pitch = data[1];
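
A compilable version of the same idea for one message type might look like the sketch below. AttitudeField, AttitudeMessage, setField and sumFields are hypothetical names chosen for the example; only the enum-as-index trick comes from the snippet above.

```cpp
#include <cassert>
#include <cstddef>

// Names for the slots of one message type; the last enumerator is the slot count.
enum AttitudeField { heading, pitch, roll, attitudeFieldCount };

// All fields of the same type share one array, zero-initialized.
struct AttitudeMessage {
    double data[attitudeFieldCount] = {};
};

// Writes one slot, with a bounds check in debug builds.
void setField(AttitudeMessage& msg, std::size_t index, double value)
{
    assert(index < attitudeFieldCount);
    msg.data[index] = value;
}

// Iterates over every slot, which named members would not allow directly.
double sumFields(const AttitudeMessage& msg)
{
    double total = 0.0;
    for (std::size_t i = 0; i < attitudeFieldCount; ++i) {
        total += msg.data[i];
    }
    return total;
}
```

Call sites stay readable, for example setField(msg, pitch, 3.14), or double& r = msg.data[roll]; when a reference alias is preferred over the enum.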


@kbw

I didn't know about the variant/visit mechanism. I am going to look into it and try to understand your suggestions. Thank you for the recommendations.

@jonnin

I see: for a specific type of message, place the fields of the same type into a single array and use enums to get the respective variable out of the array. Then I will have a lookup table that gives the type of the n-th element in the message stream. I will see if I can arrange it, because, to keep the code consistent, parsing a message field and using it should look roughly the same. Thank you for the elaboration.