It would appear that the Microsoft compiler vectorises the floating point moves, but does not vectorise the integer moves.
Note: With a single instruction, we can move four adjacent 32-bit values into a 128-bit register (say, xmm0), and then move those 128 bits to four adjacent 32-bit locations with another single instruction. Both int and float are TriviallyCopyable 32-bit types on the platform in question, and the same set of machine instructions can be used for both.
See:
https://godbolt.org/g/4npW0V
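As a rough sketch of what such a 128-bit move looks like when written by hand, assuming an x86 target with SSE2. The helper names copy4x32 and copies_four are made up for illustration and are not part of the program in this post; on targets without SSE2 the sketch simply falls back to std::memcpy.

```cpp
#include <cstring>
#include <type_traits>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    #include <emmintrin.h> // SSE2 intrinsics
    #define HAVE_SSE2 1
#else
    #define HAVE_SSE2 0
#endif

// copy 16 bytes (four adjacent 32-bit values) from src to dest
// hypothetical helper, for illustration only
inline void copy4x32( void* dest, const void* src )
{
#if HAVE_SSE2
    // one unaligned 128-bit load into an xmm register ...
    const __m128i v = _mm_loadu_si128( static_cast< const __m128i* >(src) ) ;
    // ... and one unaligned 128-bit store to the destination
    _mm_storeu_si128( static_cast< __m128i* >(dest), v ) ;
#else
    std::memcpy( dest, src, 16 ) ; // portable fallback (assumption: no SSE2 available)
#endif
}

// the same routine is used for any TriviallyCopyable 32-bit type:
// copy four values of type T and report whether the bytes arrived intact
template < typename T > bool copies_four()
{
    static_assert( sizeof(T) == 4, "sketch assumes a 32-bit type" ) ;
    static_assert( std::is_trivially_copyable<T>::value, "type must be TriviallyCopyable" ) ;

    const T src[4] = { T(1), T(2), T(3), T(4) } ;
    T dest[4] = {} ;
    copy4x32( dest, src ) ;
    return std::memcmp( dest, src, sizeof(dest) ) == 0 ;
}
```

Calling copies_four<int>() and copies_four<float>() goes through exactly the same load/store pair; the element type makes no difference to the bytes being moved.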
When the copy is vectorised, both int and float would take the same amount of time; TriviallyCopyable 64-bit types would take roughly twice the time and so on.
#include <iostream>
#include <vector>
#include <ctime>
#include <cstddef>

template < typename T > class Point
{
    public:
        T x, y;
        Point() : x(0), y(0) {}
        Point( T x, T y ) : x(x), y(y) {}
};

// time how long it takes to copy 2*n values of type T into n Point<T> objects
template < typename T > void time_it( std::size_t n )
{
    std::vector< Point<T> > arr(n) ;
    std::vector<T> xy(2*n) ;
    for( std::size_t i = 0 ; i < 2*n ; ++i ) xy[i] = (T)i ;

    const auto start = std::clock() ;
    for( std::size_t i = 0, j = 0 ; i < n ; ++i )
    {
        const T x = xy[j++] ;
        const T y = xy[j++] ;
        arr[i] = Point<T>(x, y) ;
    }
    const auto end = std::clock() ;

    std::cout << (end-start) * 1000.0 / CLOCKS_PER_SEC << " millisecs.\n" ;
}

int main()
{
    const std::size_t n = 9'999'999 ;

    // 32-bit types
    std::cout << " float: " ;
    time_it<float>(n) ;
    std::cout << " int: " ;
    time_it<int>(n) ;
    std::cout << "unsigned int: " ;
    time_it<unsigned int>(n) ;

    // 64-bit types
    std::cout << "\n double: " ;
    time_it<double>(n) ;
    std::cout << " std::size_t: " ;
    time_it<std::size_t>(n) ;
    std::cout << " long long: " ;
    time_it<long long>(n) ;
}
clang++
float: 25.469 millisecs.
int: 24.453 millisecs.
unsigned int: 24.224 millisecs.
double: 44.985 millisecs.
std::size_t: 44.815 millisecs.
long long: 44.88 millisecs.
g++
float: 22.427 millisecs.
int: 22.549 millisecs.
unsigned int: 22.619 millisecs.
double: 45.922 millisecs.
std::size_t: 50.013 millisecs.
long long: 50.109 millisecs.
http://coliru.stacked-crooked.com/a/0038cfc5cfbc024c
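The doubling seen in the clang++ and g++ timings above is consistent with the number of bytes stored by the timed loop. A small sketch of that arithmetic, assuming Point<T> has no padding (so sizeof(Point<T>) == 2*sizeof(T)) and that float is 4 bytes and double is 8 bytes on the platform; bytes_written is a made-up name used only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// same members as the Point<T> in the test program
template < typename T > struct Point
{
    T x, y ;
    Point() : x(0), y(0) {}
    Point( T x, T y ) : x(x), y(y) {}
};

// number of bytes the timed loop stores into arr for n points
// hypothetical helper, for illustration only
template < typename T > constexpr std::size_t bytes_written( std::size_t n )
{
    static_assert( std::is_trivially_copyable< Point<T> >::value,
                   "Point<T> can be copied as raw bytes" ) ;
    static_assert( sizeof( Point<T> ) == 2 * sizeof(T),
                   "assumption: no padding in Point<T>" ) ;
    return n * sizeof( Point<T> ) ;
}
```

Under those assumptions, n = 9'999'999 gives about 80 MB for Point<float> or Point<int> and about 160 MB for Point<double> or Point<long long>, so a copy that moves a fixed number of bytes per instruction would be expected to take roughly twice as long for the 64-bit types.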
The non-vectorised copy (copy 32 bits at a time for 32-bit types) would take longer than the vectorised copy (copy 128 bits, that is four values, in one go).
Microsoft:
float: 25 millisecs.
int: 97 millisecs.
unsigned int: 116 millisecs.
double: 48 millisecs.
std::size_t: 121 millisecs.
long long: 121 millisecs.
http://rextester.com/HIFR26998