memcpy to a data struct not working

Pages: 12
GPS sends well-defined structured data out, for example, Applanix (POS TNG3 and POS LV5) sends Group 1, where latitude and latitude ... count to 132 bytes are defined.
"ftp://nic2b.whoi.edu/pub/instruments/posmv/info_from_knorr/PUBS-ICD-000027_POS_TNG3_Core_User_ICD_Rev9.pdf"

Follow the specified fields order and types, struct Data {} is implemented.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
        struct Data {
            struct TimeDistance {
                double time1;     // seconds
                double time2;     // seconds
                double distance;  // meters
                char timeType; 
                char distType; 
            } td; // 26 bytes
            
            double lat; // degrees
            double lng; // degrees
            double alt; // meters

            struct Velocity { // meters/second
                float North;
                float East;
                float Down;
            } velocity;

            struct Vehicle
            {
                double roll;  // degrees
                double pitch;   // degrees
                double yaw;//heading; // degrees

                double WanderAngle; // degrees
                float TrackAngle; // degrees
                float Speed; // meters/second

                struct AngularRateAboutAxis { // degrees/second
                    float Longitudinal;
                    float Transverse;
                    float Down;
                } angularRateAbout;

                struct Acceleration { // meters/second^2
                    float Longitudinal;
                    float Transverse;
                    float Down;
                } acceleration;
            } vehicle;
            char AlignmentStatus;
            char pad; // 1
            unsigned short checksum;
            char end[2];
        } d;

Add up all bytes which is exactly 132.
Upon receiving the data (char *), memcpy to d seems the simplest way to get values of every fields.
 
        memcpy((char *)(&d), data, 132);

No, it doesn't work because sizeof(d) is 144, "structural internal paddings" dislocate the fields.
What is your suggestion to get values of every fields?
You need to:
1. Not use char, short, int, etc. Use <stdint.h>/<cstdint> types.
2. Disable padding for that particular struct. For example, in MSVC it's
1
2
3
4
5
6
7
8
9
#pragma pack(push, 1)

struct SomeStruct{
    //...
};

//(Other structs that should not be padded.)

#pragma pack(pop) 
Check your compiler documentation.

Keep in mind that not padding structures has a run time cost, so you should only do this when your struct defines a storage or transmission data format.

EDIT: It's unclear to me if you're using C or C++, but if it's C++, static_assert() can help you find these kinds of problems:
static_assert(sizeof(Data) == 132, "You forgot to disable padding!");
Last edited on
You can get the size of the struct and memcpy that many bytes. No need to count by hand.

memcpy(&d, data, sizeof(Data));
padding can be good or bad. It wastes space in files on the disk, but it may be more efficient in memory (more space but reduced effort to access it correctly, depending on the CPU's instructions). I generally don't mess with the defaults for the system unless I see an issue, and go with the sizeof approach.

also remember that if the struct has any sort of pointer (including stl containers) memcpy WILL cause problems. Its easy to forget that :)
OP needs to implement a data format they have no control over. Without disabling padding the struct's members will be in the wrong places for the GPS data.
Last edited on
Thanks, helios. Disable padding works, also static_assert helps.
This is embedded system (Critical Link ARM, SOM)
Run time cost is my major concern, because GPS real-time broadcasts group data at 200 Hz, meanwhile two frame grabbers run at 100 Hz, and I don't want loose any frames.
Is there a solution not sacrifice efficiency?

Thanks Repeater, jonnin and helios for replies.
You're not really sacrificing efficiency, because that would imply that there's a faster way to do what you want to do. There's not.

The fastest way to copy a buffer to a struct is via memcpy(), and if you want to use the destination struct's members in a meaningful way you need it to be unpadded.
The struct being unpadded means that some memory accesses into the struct may be slightly slower than if the struct had been padded, because the CPU may need to do unaligned accesses for some of the members. But this is irrelevant because you can't pad the struct.

If I decide to walk home from work I'm missing out on the efficiency I could get from using the wheels on my legs to move instead. But I don't have wheels on my legs, so it was never an option in the first place!
Run time cost is my major concern


If you care about speed, let the compiler do whatever padding it likes. The padding is there to make using the struct's data faster.
Based on enlightenment from both helios and Repeater, I come up with 4 solutions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#define DATA_FIELDS                  \
    struct TimeDistance {            \
        double time1;                \
        double time2;                \
        double distance;             \
        char timeType;               \
        char distType;               \
    } td;                            \
                                     \
    double lat;                      \
    double lng;                      \
    double alt;                      \
                                     \
    struct Velocity {                \
        float North;                 \
        float East;                  \
        float Down;                  \
    } velocity;                      \
                                     \
    struct Vehicle                   \
    {                                \
        double roll;                 \
        double pitch;                \
        double yaw;                  \
                                     \
        double WanderAngle;          \
        float TrackAngle;            \
        float Speed;                 \
                                     \
        struct AngularRateAboutAxis {\
            float Longitudinal;      \
            float Transverse;        \
            float Down;              \
        } angularRateAbout;          \
                                     \
        struct Acceleration {        \
            float Longitudinal;      \
            float Transverse;        \
            float Down;              \
        } acceleration;              \
    } vehicle;                       \
    char AlignmentStatus;            \
    char pad;                        \
    unsigned short checksum;         \
    char end[2];

#pragma pack(push, 1)
struct Data {
    DATA_FIELDS
};
#pragma pack(pop)

struct Data1 {
    DATA_FIELDS
};
1
2
3
4
5
6
7
8
9
void solution1(char *source, Data &d) { // no padding
    memcpy(&d, source, sizeof(Data));
}

void solution2(char *source, Data1 &d) { 
    memcpy((char *)(&d), source, 26); // followed by 6 un-named paddings
    memcpy((char *)(&d.lat), source+26, 36); // by 4 paddings
    memcpy((char *)(&d.vehicle.roll), source+26+36, 70); // by 2
}

or let's use cast instead of copy:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#define REINTERPRET int i = 0;                                                              \
    d.td.time1                              = *reinterpret_cast<double *>(source+i); i += 8;\
    d.td.time2                              = *reinterpret_cast<double *>(source+i); i += 8;\
    d.td.distance                           = *reinterpret_cast<double *>(source+i); i += 8;\
    d.td.timeType                           =                             source[i]; i += 1;\
    d.td.distType                           =                             source[i]; i += 1;\
    d.lat                                   = *reinterpret_cast<double *>(source+i); i += 8;\
    d.lng                                   = *reinterpret_cast<double *>(source+i); i += 8;\
    d.alt                                   = *reinterpret_cast<double *>(source+i); i += 8;\
    d.velocity.North                        = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.velocity.East                         = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.velocity.Down                         = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.roll                          = *reinterpret_cast<double *>(source+i); i += 8;\
    d.vehicle.pitch                         = *reinterpret_cast<double *>(source+i); i += 8;\
    d.vehicle.yaw                           = *reinterpret_cast<double *>(source+i); i += 8;\
    d.vehicle.WanderAngle                   = *reinterpret_cast<double *>(source+i); i += 8;\
    d.vehicle.TrackAngle                    = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.Speed                         = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.angularRateAbout.Longitudinal = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.angularRateAbout.Transverse   = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.angularRateAbout.Down         = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.acceleration.Longitudinal     = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.acceleration.Transverse       = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.vehicle.acceleration.Down             = *reinterpret_cast<float  *>(source+i); i += 4;\
    d.AlignmentStatus                       =                             source[i];
1
2
3
4
5
6
7
void solution3(char *source, Data &d) {
    REINTERPRET
}

void solution4(char *source, Data1 &d) {
    REINTERPRET
}

Four solutions give the same output. Which one is better, more efficient?
well since you coded them, why not run them all on your system and see which one is fastest?

Best, most efficient code is often hard to zen out in c++. You can make some highly educated guesses, but the CPU type, compiler optimization settings and capability, memory management, and a whole lot of other stuff go into real time coding.

When speed matters, write the code a few ways and time it to see. Then pick the best one and try to understand why it was better and what could be done better in that version to make it faster still.

I am about 99% sure that solutions 3 and 4 are going to be slower. Memcpy is very good at what it does. Assignment operators are less good at what they do when done this way with a bunch of them. The only way I can see this as faster is if you have some sort of internally parallel cpu like a DSP that can do several of these statements at once.

if I had to guess, I would still say the padded struct will be a little faster, but on that I am *totally guessing*. Heres the deal. If your machine cpu is set up so that dealing with fixed sized (word sized?) data elements is easier for it to do, then padded will be faster most likely for such a small struct. However if the alignment has little to no effect on your hardware, then the extra wasted bytes are still being copied, and that makes memcpy a tad slower, so it may be faster to use unpadded. Most embedded systems are probably going to like the padding more.

Basically, what Helios said, except I recall compiler flags on some systems that let you govern how things are padded/aligned. So it may be that you can control this, but its almost always best to use the defaults, I never found monkeying with this to be an improvement.

is float even sufficient to hold gps data? Some systems may be microscopically faster using doubles anyway, for the same reasons (bus width / alignment, and promotion / reduction in and out of the FPU).

There may also be a practical aspect here. Do you really need *float precision sloppy value gps* at that rate? IMHO you could read the gps at 50hz and not lose much just because of the low resolution of the data values, unless you are moving at some multiple of the speed of sound or something? Am I missing something here? For reference we landed a 2 square foot unmanned aircraft on a roughly 10*20 square foot unmanned watercraft rigged as a 'carrier' using a 1 meter resolution gps @ 60 hz, both vehicles moving at 20 or 30 or so MPH (its been a long time, don't remember the speeds well).



Last edited on
Solutions #3 and #4 are the worst.

Solution #2 is marginally slower than #1 during copy from the buffer to the struct. Solution #1 may be marginally slower than #2 during access of the struct members, but it might not be.
Overall which one is slower is difficult to predict because it depends both on the particular platform (does it have a cache? Does it optimize for aligned accesses?) and on your particular code (how often does it need to move data from a buffer to a struct vs. how often will it need to perform unaligned accesses?)

Personally, I strongly discourage solution #2. In order to implement it you're adding magic numbers to your code (26, 36, and 70) without any explanation of where they come from or how to derive them, not to mention that the copy of data is no longer trivial.
Suppose I'm the maintenance programmer than comes along after you. How long before I screw up making the copy because there's only one way to do it right and many ways to do it wrong? What if later on I need to support the next revision of the GPS hardware that adds a field somewhere in the middle? How do I figure out the new magic numbers?
Solution #2 is inherently more brittle.

is float even sufficient to hold gps data?
It depends. float has 24 bits of mantissa, which is enough for a minimum resolution (at the equator) of approximately 2.4 m West-East and roughly twice as accurate North-South. Most civilian GPS struggle to get sub-10 m resolution, although a few can reach centimeter resolution.
There is no magic number. The inline comments tell exactly where and how many paddings are.

TimeDistance has 4 fields, bytes count to 26, followed by 6 un-named paddings. Then, 36 more bytes followed by 4 un-named paddings, and 70 more bytes to finished the copy: 26+36+70=132.

To figure out exactly where and how many paddings,
1) define a field equal operator,
2) make a byte change. If equal remains, this byte is a padding byte.

Do for ENVIRONMENT64:
There are 12 structal paddings inside Data struct of Group 1
00000000000000000000000000111111000...
Do for ENVIRONMENT32:
There are 4 structal paddings inside Data struct of Group 1
00000000000000000000000000110000000...
1
2
3
4
5
6
7
    static_assert(sizeof(Data) == 132, "You forgot to disable padding!");

#ifdef ENVIRONMENT64
    static_assert(sizeof(Data1) == 144, "Enable padding");
#else
    static_assert(sizeof(Data1) == 136, "Enable padding");
#endif 

Otherwise, the code cannot compile.
1
2
3
4
5
6
7
8
9
10
void solution2(const char *source, Data1 &d) {
#ifdef ENVIRONMENT64
    memcpy((char *)(&d), source, 26);
    memcpy((char *)(&d.lat), source+26, 36);
    memcpy((char *)(&d.vehicle.roll), source+26+36, 70);
#else
    memcpy((char *)(&d), source, 26);
    memcpy((char *)(&d.lat), source+26, 132-26);
#endif
}


GPS have pretty stable data structure. For example, from POS TNG3 to POS LV5 release, about 10 years passed, quality improved a lot, but the data structure remains unchanged.

The code will be finally run in an embedded system. Before I reach the hardware, here is an early result on two development machines:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
    size_t N = 9999999999;
    {
        Timeit timeit;
        double sum = 0;
        for (size_t i = 0; i < N; ++ i) {
            Data d;
            solution1(source, d);
            sum += d.alt;
        }
        cout << "1: alt: "  << sum/N << endl;
    }
    {
        Timeit timeit;
        double sum = 0;
        for (size_t i = 0; i < N; ++ i) {
            Data1 d;
            solution2(source, d);
            sum += d.alt;
        }
        cout << "2: alt: "  << sum/N << endl;
    }
    {
        Timeit timeit;
        double sum = 0;
        for (size_t i = 0; i < N; ++ i) {
            Data d;
            solution3(source, d);
            sum += d.alt;
        }
        cout << "3: alt: "  << sum/N << endl;
    }
    {
        Timeit timeit;
        double sum = 0;
        for (size_t i = 0; i < N; ++ i) {
            Data1 d;
            solution4(source, d);
            sum += d.alt;
        }
        cout << "4: alt: "  << sum/N << endl;
    }

1
2
3
4
5
6
7
8
9
10
11
Windows(64bit)
1: alt: -86.9947818556557	Time elapsed 8125
2: alt: -86.9947818556557	Time elapsed 8123
3: alt: -86.9947818556557	Time elapsed 8118
4: alt: -86.9947818556557	Time elapsed 8122

Fedora (32bit)
1: alt: -86.99479215760162276 Time elapsed 18222
2: alt: -86.99479215760162276 Time elapsed 15894
3: alt: -86.99479215760162276 Time elapsed 10429
4: alt: -86.99479215760162276 Time elapsed 10435


About GPS accuracy: 10 centimeter results in 0.000001 degree at latitude 40.47028.
so lat,lng,alt are in double precision, but IMU data can be float.
Those timings don't sound right. Are you sure the compiler didn't optimize away some assignments for solutions #3 and #4? You're only reading a single member from the struct, so that would be a valid optimization.
Also, you should make these measurements on the microcontroller. The performance on x86(-64) doesn't tell you a lot about the performance on the microcontroller.
Last edited on
Add two more fields to sum:
 
        sum += d.alt + d.vehicle.roll + d.vehicle.angularRateAbout.Down;

Result in:
1
2
3
4
1: alt: -86.995925526952106570	Time elapsed 17884
2: alt: -86.995925526952106570	Time elapsed 17273
3: alt: -86.995925526952106570	Time elapsed 10965
4: alt: -86.995925526952106570	Time elapsed 10171

It looks consistent.
You should just check the generated code to see what the compiler did.
Finally I get a harware, which has two ARMv7 Processors:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
root@mitysom-5csx:/# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 0 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0

processor       : 1
model name      : ARMv7 Processor rev 0 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0

The first three solutions gives:
1
2
3
4
looping	N = 999999999
1: alt: 185.1183579319253	Time elapsed 102561
2: alt: 185.1183579319253	Time elapsed 138829
3: alt: 185.1183579319253	Time elapsed 296412

The 4th solution crashed on "Bus error" -- does anyone know why?
You could try running the x86 version through Valgrind to see if there's any invalid accesses.
It crashed at line 7 posted in "cast instead of copy" , where i = 26,
 
 d.lat                                   = *reinterpret_cast<double *>(source+i); i += 8;

while solution 3 went through this line of code without any problem, but solution 4 got crashed at this same line!
Last edited on
The relevant bit of the cast operation looks like this:
1
2
    d.td.distType                           =                             source[i]; i += 1;
    d.lat                                   = *reinterpret_cast<double *>(source+i); i += 8;

Note that you didn't disable padding for Data1, so you're likely reading padding bytes following the structure member d.td.distType. Using that junk as a pointer explains the bus error.
Edit: wrong. the buffer source is not a Data1 struct.

The source of the data is a byte stream that is not padded. You must account for that.
Depending on how often you must access each raw frame, it might be worth it to copy it into a padded struct once.

I wouldn't suggest bothering with that unless its really necessary.
Last edited on
Thank you, mbozzi.
1) The relevant bit of the cast operation, as you listed, is correct.
2) You are right that Data1 has paddings, and
3) you are also right that the source is compact (not padded).
but I don't understand which field you said "Using that junk as a pointer explains the bus error." ?

Here is the source which run into a crash ONLY in ARMv7, an embedded system.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
    string group1DataInBase64 = "XMgWgsB3AEFcyBaCwHcAQQAAAAAAAAAAEQGyhrk9MjxE"
                                "QFVFVa2qv1XA/HbYmskjZ0DlnMe9Bwa3PfkrzjylMYtE"
                                "hAp3v9dWNq5OS+6/xYR97LtgdkAUpnkwnUkUv4d7CUPU"
                                "aAc+llaUunMQhT7XhBi90V1Vvnf7+j2zGxQ+BACqEyQj";
    string raw = base64::decode(group1DataInBase64);
    size_t size = raw.size();
    if (size != 132) {
        cout << "Group 1 Data size is not as expected!" << endl;
        return false;
    }
    char *source = new char [size + 1];
    memcpy(source, raw.c_str(), size);
    cout << "sizeof(Data):" << sizeof(Data)
         << ", sizeof(Data1):" << sizeof(Data1) << endl;

    Data d;
    solution1(source, d);
    cout << setprecision(numeric_limits<long double>::digits10 + 1) << d << endl;
    if (fabs(d.alt - 185.1183599689319) > 1e-12) {
        cout << "Valadate altititude number failed!" << endl;
        return false;
    } else {
        cout << "alt:" << d.alt << endl;
    }

    Data1 d1;
    solution2(source, d1);
    if (d1 != d) {
        cout << "solution1 and 2: d1 != d" << endl;
        return false;
    } else {
        cout << "yes, d1 == d as expected." << endl;
    }

    solution3(source, d);
    if (d1 != d) {
        cout << "solution 3: d1 != d" << endl;
        return false;
    } else {
        cout << "solution 3 is correct." << endl;
    }

    solution4(source, d1); // embedded crashed here!!!
    if (d1 != d) {
        cout << "solution 4: d1 != d" << endl;
        return false;
    } else {
        cout << "solution 4 is correct." << endl;
    }
Pages: 12