Memory alignment

I need to understand memory alignment so that I can use SSE instructions, which need data to be 16-byte aligned.

Say I have:

struct Vec3
{
    float x, y, z;
};

__declspec(align(16))
struct AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};


Q1:
I need to put __declspec(align(16)) in front of my struct if i want to create AABBs on the stack?

 
AABB aaBoxes[7];


Q2:
Now address of each aaBoxes is divisible by 16?

(uintptr_t)&aaBoxes[3] % 16 == 0
...
(uintptr_t)&aaBoxes[5] % 16 == 0


Q3:
There is a 8 bytes dummy data padded to AABB struct?

Q4:
If i want to create AABBs on the heap only i need to use _aligned_malloc function? So i don't need __declspec(align(16)) thingy in front of my struct in this case?

Thank you for your time.
Q1: I take it you're using a Microsoft compiler with that __declspec thing, but I've only used #pragma. If it does what I think it does, then you're doing what you want to do.

Q2: It sets the boundary at which records start, not specifically the overall size of the structure. To confirm it, you'd need to check the addresses of the members.

Q3: Dunno.

Q4: I thought malloc was already aligned. I don't see why you need that strange variant.
http://pubs.opengroup.org/onlinepubs/007908799/xsh/malloc.html
http://msdn.microsoft.com/en-gb/library/6ewkz86d(v=vs.80).aspx
Q1. Yes, I use the Microsoft compiler.

Q4.

I get a segfault in "debug mode" unless I use the aligned version, yet it somehow passes in "release mode". Test code:

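#include <stdlib.h>     // malloc, free
#include <malloc.h>     // _aligned_malloc, _aligned_free (Microsoft-specific)
#include <xmmintrin.h>  // SSE intrinsics
//#define USE_ALIGNED

// _mm_load_ps and _mm_store_ps move 4 floats (16 bytes), so allocate 4, not 3.
#ifdef USE_ALIGNED
    #define TEST_MALLOC (float*)_aligned_malloc(4 * sizeof(float), 16)
    #define TEST_FREE(X) _aligned_free(X)
#else
    #define TEST_MALLOC (float*)malloc(4 * sizeof(float))
    #define TEST_FREE(X) free(X)
#endif

int main()
{
    float* some_data = TEST_MALLOC;
    some_data[0] = 0.1f;
    some_data[1] = 0.2f;
    some_data[2] = 0.3f;
    some_data[3] = 0.0f; // unused fourth lane

    // Aligned load: faults if some_data is not 16-byte aligned.
    __m128 mdata = _mm_load_ps(some_data);

    float* some_data2 = TEST_MALLOC;

    // Aligned store: same 16-byte requirement on the destination.
    _mm_store_ps(some_data2, mdata);

    TEST_FREE(some_data);
    TEST_FREE(some_data2);

    return 0;
}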


I need to be sure that I am using the "right thing" for the job.
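For anyone reproducing the crash, here is a minimal sketch of the difference between the aligned and unaligned SSE loads. It assumes an x86 target with SSE available; IsAligned16, LoadFour and SumFour are made-up helper names, not library functions. Both loads read four floats (16 bytes), so the buffer must hold at least four.

```cpp
#include <cstdint>      // std::uintptr_t
#include <xmmintrin.h>  // _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps

// True if p sits on a 16-byte boundary.
inline bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16u == 0;
}

// Loads four floats starting at p. The aligned load is used only when the
// address allows it; the unaligned load works for any address.
inline __m128 LoadFour(const float* p)
{
    return IsAligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}

// Adds up the four floats at p (demonstration helper only).
inline float SumFour(const float* p)
{
    float out[4];
    _mm_storeu_ps(out, LoadFour(p)); // unaligned store, safe for any buffer
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling SumFour on a pointer returned by plain malloc should then work in both debug and release builds, at the cost of a branch per load.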
closed account (zb0S216C)
morando wrote:
"I need to put __declspec(align(16)) in front of my struct if i want to create AABBs on the stack?"

No. A structure can be aligned to any boundary and the CPU won't care (but the compiler will align the structure to an even boundary anyway). However, depending on the CPU, accessing misaligned addresses can cause various issues, such as hardware exceptions. Regardless of alignment, an array of "AABB" objects can be created without too much trouble from the compiler (except a warning perhaps). If you want to explicitly specify the boundary to which all data-members are aligned then you can use any of the following specifiers:

1) Microsoft Compiler: "__declspec( align( X ) )"
2) GNU Compiler: "__attribute__(( aligned( X ) ))"
3) C++11: "alignas( X )" (now the standard way of aligning data-structures)
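As a rough sketch of how the three spellings can sit behind one name: ALIGN16 below is a hypothetical macro (not something any compiler provides), and the version checks are assumptions about which compilers accept alignas.

```cpp
// Hypothetical macro: pick a spelling the current compiler understands.
#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1900)
    #define ALIGN16 alignas(16)
#elif defined(_MSC_VER)
    #define ALIGN16 __declspec(align(16))
#else
    #define ALIGN16 __attribute__((aligned(16)))
#endif

struct Vec3
{
    float x, y, z;
};

// All three spellings are accepted between "struct" and the type name.
struct ALIGN16 AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};

// Requires C++11; fails to compile if the specifier had no effect.
static_assert(alignof(AABB) == 16, "AABB should be 16-byte aligned");
```

With this in place, an AABB declared on the stack or as a global gets a 16-byte aligned address without further work.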

morando wrote:
"Now address of each aaBoxes is divisible by 16?"

The start of each "AABB" object will be aligned to a 16-byte boundary, and because the compiler pads the structure size up to a multiple of its alignment, every element of the array will be aligned as well. The individual data-members are not each aligned to 16 bytes, though. Note that the compiler can still insert additional padding bytes at the end of the structure and between data-members (so long as the data-members remain aligned) if it wants, further increasing the size of the structure.
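One way to check Q2 on a given compiler is to test the addresses directly. The sketch below uses the C++11 alignas spelling so it is not tied to one compiler; AllElementsAligned is a made-up helper name. Since C++ requires sizeof(T) to be a multiple of alignof(T), every element of the array should pass.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t

struct Vec3
{
    float x, y, z;
};

struct alignas(16) AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};

// Returns true if every element of the array starts on the given boundary.
inline bool AllElementsAligned(const AABB* boxes, std::size_t count, std::size_t boundary)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        if (reinterpret_cast<std::uintptr_t>(&boxes[i]) % boundary != 0)
            return false;
    }
    return true;
}
```

For the array from the question this would be called as AllElementsAligned(aaBoxes, 7, 16).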

morando wrote:
"There is a 8 bytes dummy data padded to AABB struct?"

Can you expand on this.

morando wrote:
"If i want to create AABBs on the heap only i need to use _aligned_malloc function?"

You don't have to use the "_aligned_malloc( )" function as "std::malloc( )" is required to return a pointer which is aligned; usually to a 16-byte boundary. It's possible that "std::malloc( )" invokes "_aligned_malloc( )" internally anyway.

morando wrote:
"So i don't need __declspec(align(16)) thingy in front of my struct in this case?"

Yes you need the "__declspec( )" specifier to tell the compiler that each data-member must be aligned to a X-byte boundary. Removing the specifier will not guarantee the data-member's alignment to the boundary of your choice, but the compiler will choose the boundary that best fits its [the compiler's] optimisation scheme if no explicit alignment boundary is specified by the programmer.

Wazzak
I am having a hard time understanding. Please treat me like a baby :)

You said

Yes you need the "__declspec( )" specifier to tell the compiler that each data-member must be aligned to a X-byte boundary


If i use my example above:
struct Vec3
{
    float x, y, z;
};

__declspec(align(16))
struct AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};


it is actually this:
__declspec(align(16))
struct AABB // axis aligned bounding box
{
    Vec3 center;
    float dummy1;
    Vec3 extent;
    float dummy2;
};

since Vec3 takes 12 bytes, right?
That's what I meant by "8 bytes dummy data".

I said

I have segfault in "debug mode" with malloc


Thank you for your time.
closed account (zb0S216C)
"Vec3" isn't aligned to a 16-byte boundary so the compiler will choose a boundary for you, which may not be 16-bytes. You must explicitly tell the compiler that "Vec3", too, must be 16-byte aligned. Also, those dummy data-members are not doing anything useful, so you may as well remove them.

As for your segmentation-fault, "std::malloc( )" is guaranteed to return an aligned pointer, but the boundary to which the allocated memory is aligned is system-specific; therefore, it's not safe to assume memory allocated with "std::malloc( )" is 16-byte aligned. What you could do, with a little work, is allocate the memory and then find the first 16-byte aligned address within the allocated block yourself, like so:

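#include <cstdint> // std::uintptr_t
#include <cstdlib> // std::malloc, std::free

void *Memory_( std::malloc( ( 3 * sizeof( float ) ) + 16 ) ); // 16 is the alignment

// Round the address up to the next multiple of 16.
std::uintptr_t Address_( ( reinterpret_cast< std::uintptr_t >( Memory_ ) + 15u ) & ~std::uintptr_t( 15u ) );
float *AlignedMemory_( reinterpret_cast< float * >( Address_ ) );

std::free( Memory_ ); // always free the original pointer, never the aligned one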

I haven't tested this code, so your mileage may vary wildly. Note that "uintptr_t" is a C++11 addition so your current compiler may not define it. The code is pretty simple: enough memory is allocated to store at least 3 "float"s plus an additional 16 bytes. The additional bytes are used to ensure that the first "float" can be safely aligned to a 16-byte boundary. Then, the initial address pointed-to by "Memory_" is rounded up to the next 16-byte boundary where the first "float" will be stored. We keep "Memory_" around so that we know where the allocated memory began.
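The bookkeeping of the original pointer can be hidden inside a pair of functions. The following is only a sketch of that common pattern; AlignedAlloc and AlignedFree are hypothetical names, and the pointer returned by malloc is stored just in front of the aligned block so the caller does not have to keep it.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t
#include <cstdlib>  // std::malloc, std::free

// Hypothetical helper: returns size bytes aligned to alignment, or nullptr.
inline void* AlignedAlloc(std::size_t size, std::size_t alignment)
{
    // The alignment must be a power of two and large enough that the
    // stashed pointer in front of the block is itself properly aligned.
    assert(alignment >= alignof(void*) && (alignment & (alignment - 1)) == 0);

    // Room for the payload, worst-case padding, and the saved pointer.
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (raw == nullptr)
        return nullptr;

    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (start + mask) & ~mask; // round up

    // Remember what malloc returned so AlignedFree can release it.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical helper: releases a block obtained from AlignedAlloc.
inline void AlignedFree(void* p)
{
    if (p != nullptr)
        std::free(static_cast<void**>(p)[-1]);
}
```

Usage would look like float* data = static_cast<float*>(AlignedAlloc(4 * sizeof(float), 16)); followed later by AlignedFree(data);. Blocks from AlignedAlloc must never be passed to std::free directly.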

Wazzak
1. So I am better off using this:
__declspec(align(16))
struct Vec3
{
    float x, y, z;
};

struct AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};


instead of this:
struct Vec3
{
    float x, y, z;
};

__declspec(align(16))
struct AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};


or do I need it on both:
__declspec(align(16))
struct Vec3
{
    float x, y, z;
};

__declspec(align(16))
struct AABB // axis aligned bounding box
{
    Vec3 center;
    Vec3 extent;
};


?

2.

Also, those dummy data-members are not doing anything useful, so you may as well remove them.


I didn't add them, i am simply asking if compiler is adding that if i use __declspec(align(16))?

3. No disrespect but why would i use that "hacky" way with malloc if _aligned_malloc does the same job? Does it?
closed account (zb0S216C)
1) Align only the "Vec3" structure. When "Vec3" is in-lined inside of "AABB", both "AABB::center" and "AABB::extent" will start on a 16-byte boundary, and "AABB" itself picks up the 16-byte alignment from its members.

morando wrote:
"i am simply asking if compiler is adding that if i use __declspec(align(16))?"

Yes, the compiler will add additional padding-bytes to align each data-member of the structure. Though, the number of bytes required to align each data-member to a specific boundary depends on the size of each data-member and the selected boundary size.
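Where the padding actually ends up can be read off with sizeof and offsetof. The sketch below puts the two variants from the question into namespaces (the namespace names are invented for this example) and uses alignas so it compiles outside the Microsoft compiler; the numbers in the comments assume a 4-byte float.

```cpp
#include <cstddef>  // offsetof
#include <cstdio>   // std::printf

namespace align_aabb_only
{
    struct Vec3 { float x, y, z; };                        // 12 bytes
    struct alignas(16) AABB { Vec3 center; Vec3 extent; }; // extent at offset 12, 8 bytes of tail padding
}

namespace align_vec3
{
    struct alignas(16) Vec3 { float x, y, z; };            // padded to 16 bytes
    struct AABB { Vec3 center; Vec3 extent; };             // extent at offset 16
}

// Prints size and member offset for both variants.
inline void PrintLayouts()
{
    std::printf("AABB aligned: sizeof(Vec3)=%u offsetof(extent)=%u sizeof(AABB)=%u\n",
                static_cast<unsigned>(sizeof(align_aabb_only::Vec3)),
                static_cast<unsigned>(offsetof(align_aabb_only::AABB, extent)),
                static_cast<unsigned>(sizeof(align_aabb_only::AABB)));
    std::printf("Vec3 aligned: sizeof(Vec3)=%u offsetof(extent)=%u sizeof(AABB)=%u\n",
                static_cast<unsigned>(sizeof(align_vec3::Vec3)),
                static_cast<unsigned>(offsetof(align_vec3::AABB, extent)),
                static_cast<unsigned>(sizeof(align_vec3::AABB)));
}
```

Both variants come out at 32 bytes per AABB; the difference is whether "extent" starts at byte 12 or byte 16, which matters if each Vec3 is to be loaded with an aligned SSE load.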

morando wrote:
"No disrespect but why would i use that "hacky" way with malloc if _aligned_malloc does the same job? Does it?"

It may seem like a strange way to acquire an aligned address (I always align my pointers that way), but it was actually quite common before "std::align( )" came along. Also, "_aligned_malloc( )" is Microsoft specific and is not available on any other compiler.
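For comparison, a minimal sketch of the same idea with "std::align( )" from the C++11 header <memory>. FirstAligned is a made-up wrapper name; std::align itself takes the pointer and the remaining space by reference and returns a null pointer when the request does not fit.

```cpp
#include <cstddef>  // std::size_t
#include <memory>   // std::align

// Returns the first address inside [buffer, buffer + space) that is aligned
// to alignment and still has room for size bytes, or nullptr if none exists.
inline void* FirstAligned(void* buffer, std::size_t space, std::size_t alignment, std::size_t size)
{
    // std::align adjusts its pointer and space arguments in place; the
    // by-value parameters here serve as the scratch copies it modifies.
    return std::align(alignment, size, buffer, space);
}
```

With a block from std::malloc( n + 15 ), FirstAligned( block, n + 15, 16, n ) gives the aligned start, and the original block pointer is still the one to pass to std::free( ).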

Wazzak
I think its better for me to use alignment on AABB only because i reuse Vec3 struct in other places that doesn't benefit from 16 bytes alignment, and it might produce unwanted behavior like with vertex formats and memcpy-ing data:
struct Vertex
{
    Vec3 pos;
    Vec3 normal;
}; // 24 bytes

Vertex* pVertices;
mesh->LockVB(&pVertices);
...


Thank you.
closed account (zb0S216C)
morando wrote:
"I think its better for me to use alignment on AABB only because i reuse Vec3 struct in other places that doesn't benefit from 16 bytes alignment"

You should have said that at the beginning. Anyway, you could create a variation of "Vec3" that's aligned to a 16-byte boundary, like so:

struct Vec3
{
  float x, y, z;
};

__declspec( align( 16 ) ) struct Vec3Align16
{
  float x, y, z;
};

Though, I don't know if this is acceptable for your project/program.
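If a second vector type is not acceptable, another possibility is to leave "Vec3" at 12 bytes and spell the padding out inside "AABB". This is only a sketch: pad0 and pad1 are invented member names, alignas is used in place of "__declspec( align( 16 ) )" so it compiles elsewhere too, and the asserts assume a 4-byte float.

```cpp
#include <cstddef>  // offsetof

struct Vec3
{
    float x, y, z;  // stays 12 bytes, so vertex formats are unaffected
};

struct alignas(16) AABB // axis aligned bounding box
{
    Vec3  center;
    float pad0;     // unused; pushes extent to offset 16
    Vec3  extent;
    float pad1;     // unused; makes the tail padding explicit
};

static_assert(sizeof(Vec3) == 12, "Vec3 should stay tightly packed");
static_assert(offsetof(AABB, extent) == 16, "extent should start on a 16-byte boundary");
static_assert(sizeof(AABB) == 32, "AABB should occupy two 16-byte blocks");
```

The trade-off is that the layout is now written by hand, so the static_asserts are worth keeping next to the struct.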

Wazzak