"I know that C performs faster than C++."
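There are many situations where C++ performs faster than C. The canonical example is std::sort() vs. qsort(): std::sort() is a template, so the compiler sees the comparison and can inline it, while qsort() has to call the comparison through a function pointer every time it compares two elements.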
"Do you think using a 1D array as a 2D array will be better or not?"
That is how matrix and higher-dimensional array classes are usually implemented: the underlying data structure is almost always a 1D vector, a 1D valarray, or some custom 1D indexed object.
Now, comparing your C program with your C++ program ("3rd definition"), they are indeed pretty much the same. The only real difference is that your C example allocates the arrays in the static data section, while your C++ example puts them on main()'s stack.
Given enough stack space, this should indeed not matter, except that in the stack-allocated case the compiler knows that clock() (or, in your case, start_calculate_MFLOPS() and stop_calculate_MFLOPS()) cannot access the matrices, so it is free to move the calls to clock() relative to the rest of the code in main(). As a sanity check, I timed the total runtime of each program with external tools: it was not very different from the self-reported clock() time, but this reordering explains the slight edge C++ shows in my timings below.
Here are the tests exactly as I just compiled and ran them, with two changes:
I increased your 1000x1000 to 2000x2000, because otherwise it runs way too fast on Intel (with the Intel compiler).
I replaced your dummy trick with an honest use of the result, because one of my compilers (IBM XL) saw right through the trick.
My C test:
#include <stdio.h>
#include <time.h>

#define INDEX 2000

/* Global arrays: static storage duration, placed in the static data section */
float matrixa[INDEX][INDEX], matrixb[INDEX][INDEX], mresult[INDEX][INDEX];

int main(void)
{
    /* Initialize the matrix arrays row by row, so no row index runs out of bounds */
    for (int i = 0; i < INDEX; i++)
        for (int j = 0; j < INDEX; j++) {
            mresult[i][j] = 0.0f;
            matrixa[i][j] = matrixb[i][j] = 10.01f; // rand() * (float) 1.1;
        }

    clock_t time_start = clock();

    /* Matrix-Matrix multiply */
    for (int i = 0; i < INDEX; i++)
        for (int j = 0; j < INDEX; j++)
            for (int k = 0; k < INDEX; k++)
                mresult[i][j] = mresult[i][j] + matrixa[i][k] * matrixb[k][j];

    clock_t time_end = clock();

    /* Honest use of the result, so the compiler cannot discard the multiply */
    double result = 0;
    for (int i = 0; i < INDEX; i++)
        for (int j = 0; j < INDEX; j++)
            result += mresult[i][j];

    printf("CPU time %lf sec.\nresult = %lf\n", (time_end - time_start) / (double)CLOCKS_PER_SEC, result);
    return 0;
}
My C++ test:
#include <cstdio>
#include <ctime>

const int matrixSize = 2000;
const int loopSteps = 1;

int main()
{
    // Local arrays live on main()'s stack: together they need about 48 MB
    // (3 * 2000 * 2000 * sizeof(float) with 4-byte floats), more than many
    // default stack limits, so the limit may need raising (e.g. ulimit -s on Linux).
    float A[matrixSize][matrixSize];
    float B[matrixSize][matrixSize];
    float C[matrixSize][matrixSize];

    for (int i = 0; i < matrixSize; i += loopSteps) {
        for (int j = 0; j < matrixSize; j += loopSteps) {
            A[i][j] = B[i][j] = 10.01f;
            C[i][j] = 0.0f;
        }
    }

    std::clock_t time_start = std::clock();

    /* Matrix-Matrix multiply */
    for (int i = 0; i < matrixSize; i++)
        for (int j = 0; j < matrixSize; j++)
            for (int k = 0; k < matrixSize; k++)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];

    std::clock_t time_end = std::clock();

    // Honest use of the result, so the compiler cannot discard the multiply
    double result = 0;
    for (int i = 0; i < matrixSize; i++)
        for (int j = 0; j < matrixSize; j++)
            result += C[i][j];

    std::printf("CPU time %lf sec.\nresult = %lf\n", (time_end - time_start) / (double)CLOCKS_PER_SEC, result);
}
Results from 5 runs (min - max, with average):

Intel platform
  Intel icc 13.0.0      1.08 - 1.93 sec (avg 1.38)    -Ofast -xHost
  Intel icpc 13.0.0     0.98 - 1.29 sec (avg 1.05)    -Ofast -xHost
  GNU gcc 4.7.2        18.73 - 20.12 sec (avg 19.32)  -O3 -march=native
  GNU g++ 4.7.2        18.71 - 20.21 sec (avg 19.11)  -O3 -march=native

IBM platform
  IBM XL C 11.1         5.67 - 5.74 sec (avg 5.72)    -O5
  IBM XL C++ 11.1       5.58 - 5.76 sec (avg 5.65)    -O5
  GNU gcc 4.7.2         100+ sec (I got bored waiting, sorry)