Code Optimisation

I need to identify the problems with the code below and fix them so that the code runs faster, but I'm not sure where to start! The code outputs to a file, so obviously the end result needs to produce the same file with the same content. Thanks in advance.

#include <iostream>
#include <fstream>
#include <math.h>

using namespace std;

int main(int argc, char* argv[]) {
    int nx = 10000;
    int ny = 200;
    int nt = 200;
    double** vi = new double*[nx];
    double** vr = new double*[nx];
    double pi = (4.*atan(1.));
    for (int i = 0; i < nx; i++) {
        vi[i] = new double[ny];
        vr[i] = new double[ny];
    }
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < ny; j++) {
            vi[i][j] = double(i*i)*double(j)*sin(pi/double(nx)*double(i));
            vr[i][j] = 0.;
        }
    }

    ofstream fout("data_out_new");
    for (int t = 0; t < nt; t++) {
        cout << "\n" << t; cout.flush();
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                if (i>0 && i<nx-1 && j>0 && j<ny-1) {
                    vr[i][j] = (vi[i+1][j]+vi[i-1][j]+vi[i][j-1]+vi[i][j+1])/4.;
                } else if (i==0 && i<nx-1 && j>0 && j<ny-1) {
                    vr[i][j] = (vi[i+1][j]+10.+vi[i][j-1]+vi[i][j+1])/4.;
                } else if (i>0 && i==nx-1 && j>0 && j<ny-1) {
                    vr[i][j] = (5.+vi[i-1][j]+vi[i][j-1]+vi[i][j+1])/4.;
                } else if (i>0 && i<nx-1 && j==0 && j<ny-1) {
                    vr[i][j] = (vi[i+1][j]+vi[i-1][j]+15.45+vi[i][j+1])/4.;
                } else if (i>0 && i<nx-1 && j>0 && j==ny-1) {
                    vr[i][j] = (vi[i+1][j]+vi[i-1][j]+vi[i][j-1]-6.7)/4.;
                }
            }
        }
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                if (fabs(fabs(vr[i][j])-fabs(vi[i][j])) < 1e-2)
                    fout << "\n" << t << " " << i << " " << j << " "
                         << fabs(vi[i][j]) << " " << fabs(vr[i][j]);
            }
        }
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) vi[i][j] = vi[i][j]/2. + vr[i][j]/2.;
        }
    }
}
Remove the line cout<<"\n"<<t; cout.flush();

That will slice a lot of time off.
Get one block of memory at the top and pull pointers out of it instead of calling new so much.
I'm very new to C++; how would I go about doing that?
how much total memory do you new?
It looks like (I only glanced, double-check me) a pair of 10000 x 200 arrays, so 10000 x 200 x 2 doubles...

double * mem = new double[10000*200*2];

use that in a little memory manager function

double* new2(double* mem, int howmany)
{
    static int index = 0;
    double* dp = &mem[index];
    index += howmany;
    return dp;
}

...
for(int i = 0; i < nx; i++)
{
    vi[i] = new2(mem, 200);
    vr[i] = new2(mem, 200);
}

That should be close. I'm doing this off the cuff and haven't debugged it, but hopefully you see the technique. Ideally you could peel the pointers off a vector and get rid of the new entirely.

new is sluggish, so this should net you some notable time savings, and it makes deallocation much easier.
After these easy wins, no more guessing: the only way to know for sure is to measure, i.e. to profile your code. If you're using VS Community edition, it comes with a surprisingly solid performance profiler.
Agreed. There are a couple of microscopic items that might help a small bit..

The multiple-fabs loop: can it be rewritten to use only one fabs, or can you square everything more cheaply? I don't know... fabs used to be slower, but that was an eternity ago. You can also union-hack some bit logic on doubles, but this is generally frowned upon in every possible way. If the fabs loop shows up as a problem in the profiler, we can look at it if you really want to; it's not pretty.

Can you combine all the loops into one? This may not be doable everywhere, but the last two look like good candidates: the same iteration over the same data, just doing different things.

Finally, that if/else group looks suspect. It looks like there are too many comparisons and too many conditions. Consider a form of:

thing = default
if (cond)
    thing = x
else if (other)
    thing = y

The assignment is cheaper than the comparisons, so you can eliminate one grouping, whichever is most likely to occur, if you know it. As for the conditions: if you string them the other way, can you manage a

if (x < value)
    ...
else if (x < next value)  // it's already bigger than the first value, no need to check that here

format?








How much faster do you need it? How long does it take now?

Is this code any faster? Remember to compile with maximum optimization (which you should also do with your code).
#include <iostream>
#include <fstream>
#include <cmath>
using namespace std;

int main(void) {
    int nx = 10000, ny = 200, nt = 200;
    double pi = 4 * atan(1);

    double **vi = new double*[nx];
    double **vr = new double*[nx];
    vi[0] = new double [nx * ny];
    vr[0] = new double [nx * ny];
    for(int i = 1; i < nx; i++) {
        vi[i] = vi[i - 1] + ny;   // each row starts ny doubles after the previous one
        vr[i] = vr[i - 1] + ny;
    }

    for (int i = 0; i < nx; i++) {
        double x = sin(pi / nx * i) * i * i;
        for (int j = 0; j < ny; j++) {
            vi[i][j] = x * j;
            vr[i][j] = 0;
        }
    }

    ofstream fout("data_out_new");

    for (int t = 0; t < nt; t++) {
        cout << '\n' << t;

        for (int j = 1; j < ny - 1; j++) {
            vr[0][j] = (vi[1][j] + 10. + vi[0][j-1] + vi[0][j+1]) / 4;
            vr[nx-1][j] = (5. + vi[nx-2][j] + vi[nx-1][j-1] + vi[nx-1][j+1]) / 4;
        }
        for (int i = 1; i < nx - 1; i++) {
            vr[i][0] = (vi[i+1][0] + vi[i-1][0] + 15.45 + vi[i][1]) / 4;
            vr[i][ny-1] = (vi[i+1][ny-1] + vi[i-1][ny-1] + vi[i][ny-2] - 6.7) / 4;
            for (int j = 1; j < ny - 1; j++)
                vr[i][j] = (vi[i+1][j] + vi[i-1][j] + vi[i][j-1] + vi[i][j+1]) / 4;
        }

        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                if (fabs(fabs(vr[i][j]) - fabs(vi[i][j])) < 1e-2)  // keep the original's test so the output file matches
                    fout << '\n' << t << ' ' << i << ' ' << j << ' '
                         << fabs(vi[i][j]) << ' ' << fabs(vr[i][j]);

        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                vi[i][j] = (vi[i][j] + vr[i][j]) / 2;
    }

    return 0;
}

Okay. I've just run the code I rewrote above. I see now that your algorithm probably needs to be sped up at least 100 (if not 1000 or more) times. That's probably not going to happen.

You probably need a new algorithm. What is it that you are trying to compute? If there's a link to a description, post it.
cout << '\n' << t;
Seriously, when I gave it a quick profile, OVER HALF the run-time was being spent on cout.
Most of the time is spent writing to file.

You're only solving Laplace's equation by Jacobi's method - just write the final solution.

You are also under-relaxing it (the update vi[i][j] = vi[i][j]/2. + vr[i][j]/2.). This equation is so smooth that you can over-relax it: go google "point-successive over-relaxation". There are also much better algorithms for this equation (alternating-direction implicit). You'll notice it when you have a 10000 by 200 array: your method only propagates changes a distance of 1 per t iteration, so they take an age to propagate to the boundaries.

Use a better algorithm and stop writing intermediate results to file.

Given the potential size of i, the initialisation line vi[i][j] = double(i*i)*double(j)*sin(pi/double(nx)*double(i)) also looks "unusual". It is also just setting initial conditions for something that will eventually take the boundary values 10, 5, 15.45 or -6.7 from the if/else chain, so it looks rather unlikely. Are you sure you have coded it correctly?
I need to identify the problems with the code

Did you write the original code and were told it has problems?
OR
Was this code given to you with request to "make it right"?


There is new without delete, and there are stylistic issues like variable names that aren't necessarily the most readable. There is repeated/redundant code that may have no correctness or performance effect, but it at least hurts readability. Zero-initialization of vr does not require assignments in modern C++. It is laborious to check whether all the border cases are handled correctly.
By far the biggest improvement I found was in turning on optimizations (7.55s to 3.16s).

I also tried the following, with the indicated differences. Note that sometimes the changes don't reflect previous optimizations:
- don't flush cout (7.55s -> 7.6s)
- use fprintf instead of C++ streams (3.16s -> 3.22s)
- break into functions (about the same)
- remove redundant conditions, e.g. i==0&&i<nx-1 -> i==0 (3.22s -> 3.07s)

I profiled the code after breaking the main part into functions compute(), output() and update(), corresponding to the three loops in the OP (the stencil pass, the file output, and the vi update). The results were:
46.4% in compute()
25.9% in output()
27.6% in update()
No big surprise there.

Do 5 loops in compute() rather than 5 conditions in 1 loop, similar to what tpb did (3.07s -> 2.88s).

Change update() to do (vr+vi)/2 instead of vr/2+vi/2 (2.88s -> 2.83s).

Use file-scope arrays (2.83s -> 2.50s).

With all of the above optimizations that helped, I get a runtime of 2.2s.
@lastchance Off-topic, but: Impressive that you knew what algorithm it was just by the code. I guess the sin(pi/double(nx) ...) was a big hint, but it's been a while since I looked at that (back in my linear algebra class).
Hi @Ganado. It's actually the interior update vr[i][j] = (vi[i+1][j]+vi[i-1][j]+vi[i][j-1]+vi[i][j+1])/4. that gives away the algorithm: the value at a point is the average of those around it. vr[] is the (temporary) updated value; once the whole iteration is complete it is put back in vi[] (in this case, unnecessarily, using an effective under-relaxation factor of 0.5, which would definitely slow down convergence). A more common approach would be Gauss-Seidel, which just updates vi[] in situ and has half the memory requirements.

This is weird code. The boundary values are the numbers in that long sequence of else-ifs. The usual technique would be to set those as the edges of the array and then loop only over the interior elements. The lines with sin() in them are just initialising the array, bizarrely here, because the values are nowhere near the average boundary value. This looks suspiciously like code cobbled together from multiple sources without real understanding.

What's being written to file every iteration is completely pointless; something is clearly awry here. After all iterations are complete, write the solution to file, not a fairly arbitrary collection of the smaller residual errors.

Never mind; I had fun coding it with Gauss-Seidel and then (my preferred) ADI for myself yesterday. Let's see if the OP can enlighten us as to the true application of the code. If I felt I was solving the actual problem I'd post some code for others to criticise, but as I'm not convinced the OP is clear about what he/she is solving, I'll hold off for clarification.