Tips on making code run faster?

Hi, this is my first time posting, but I've been teaching myself C++ for a bit and I've created a program which plots the path an electron will take if it's shot into a field filled with stationary electrons. It pretty much works fine, but it essentially involves running a loop, in which the program does some basic algebra based on input, prints the results to a file, and then uses the results as the input for the next loop. The main loop has 10^20th iterations, and is itself inside a loop which runs 50 times.. And i'd like to make the inner loop do a couple more orders of magnitude still.. But it takes half of a day for the program to complete. Are there any basic things I could change about the code which could make the program run faster?

Like, currently, each iteration requires about 100 (quite small) function calls. How much faster would the program be if I condensed that number into something more like 10 or 5? It would be pretty complicated and hard to test so i'd rather not unless it'd help substantially.

Also, I use "fout" to print to the file, and once I think I accidentally told it not to print to file and it took only 4 hours to complete instead of the 12 it usually took. So is there something faster than fout?

And I cant copy paste all the code in here, so is there any other information I can give you that would help?

Sorry for the plethora of questions:p, thank you for any replies
I am by no means an expert, but without seeing your code I would look for any assignment operators doing a deep copy unnecessarily. When possible try passing by reference or using pointers

and example:
1
2
3
void function(int x){}//this created a new variable x and stores whatever value you feed it
void functionRef(int& x){}//this pass by reference tells the program look at whatever variable you feed it
void functionPoint(int * x){}//this takes in an int pointer as it argument 


void functionRef and functionPoint are both faster than void function because they do not create a new copy of int x, instead they point to a specific address of memory.

Let's see the code! Help us help you.
Thank you RadWayne, I'll try that.
But the code's 2000 characters too long to paste in here (and thats with barely any comments, which im sure i should put in).. Is there a way to attach it or something?
K i made it shorter, thank you though Codeez
I tried to add in some comments but theres to much to do all at once, hopefully it still makes some degree of basic sense..
Thank you all


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
#include <iostream>
#include <cmath>
#include<iomanip>
#include<fstream>
#include<string>
#include<cstdlib>
#include<ctime>

using namespace std;

ofstream fout;
ifstream fin;

string file= "ElectrostaticData";

    const long double pi = 3.14159265;
    const long double Ke= 8.98756e9;
    const long double Q1 = 1.60237657e-19;
    const long double Q2 = 1.60237657e-19;
    const long double m = 9.10938291e-31;
    const long double k = Ke*Q1*Q2/(m);


long double particleMatrix[27][2]={
                                {1E-3 , 2.00E-11},
                                {1E-3 , 1.95E-11},
                                {1E-3 , 1.90E-11},
                                {1E-3 , 1.85E-11},
                                {1E-3 , 1.80E-11},
                                {1E-3 , 1.75E-11},
                                {1E-3 , 1.70E-11},
                                {1E-3 , 1.65E-11},
                                {1E-3 , 1.60E-11},
                                {1E-3 , 1.55E-11},
                                {1E-3 , 1.50E-11},
                                {1E-3 , 1E-12},
                                {1E-3 , 0.5E-12},
                                {1E-3 , 0},
                                {1E-3 , -0.5E-12},
                                {1E-3 , -1E-12},
                                {1E-3 , -1.50E-11},
                                {1E-3 , -1.55E-11},
                                {1E-3 , -1.60E-11},
                                {1E-3 , -1.65E-11},
                                {1E-3 , -1.70E-11},
                                {1E-3 , -1.75E-11},
                                {1E-3 , -1.80E-11},
                                {1E-3 , -1.85E-11},
                                {1E-3 , -1.90E-11},
                                {1E-3 , -1.95E-11},
                                {1E-3 , -2.E-11}
                                            };

long double thetaArray [27];
void fThetaArrayGen(long double x, long double y);

long double accelGen(long double x, long double y);

long double fTheta(long double x, long double y);

long double distancesMatrix[27][2];
void distancesMatrixGen(long double x, long double y);

long double accelSum(char xy);

long double accelMagArray[27];
void accelMagArrayGen();
long double accelVectorGen(long double a,long double r);

int mainfunc(long double i7);

int main()
{
    fout.open(file.c_str());

    fout <<"t \tx \ty \tvx \tvy \tax \tay"<<endl;

    int counter=0;
    for (int i=50; i>=0; i--)
    {
            mainfunc(i*1E-9);
    }
    return 0;
}


int mainfunc(long double i7)
{
    srand( time( NULL ) );

    long double dt=1E-10, dt2=1E-10, x=0, y=0, v=3E8, theta1=i7;

    long double vx = v*cos(theta1);
    long double vy = v*sin(theta1);

    distancesMatrixGen(x,y);
    fThetaArrayGen(x, y);

    long double a;
    long double ax=accelSum('x');
    long double ay=accelSum('y');
    int counter=0;

int breaker=0;
for (long double t=0;  ; t+=dt)
    {
        counter++;
        x += (vx*dt)*dt2+(1/2*ax*dt*dt)*dt2*dt2;
        y += (vy*dt)*dt2+(1/2*ay*dt*dt)*dt2*dt2;

        vx += (ax*dt)*dt2;
        vy += (ay*dt)*dt2;

        fThetaArrayGen(x,y);
        distancesMatrixGen(x,y);

        ax=accelSum('x');
        ay=accelSum('y');

        v=sqrt(vx*vx+vy*vy);
        a=sqrt(ax*ax+ay*ay);

        if (x>9.995E-4&&counter%6000==0)
            {
                fout <<t<<"\t"<<x<<"\t"<<y<<"\t"<<vx<<"\t"<<vy<<"\t"<<ax<<"\t"<<ay<<endl;
            }
        if (y>5E-11)//edited, pretty much its "if the electron leaves the area I want it in"
            {
                return 0;
            }

        if (x>5E-3) //if the particle gets to the other side of the field
        {
            return 1;
        }
    }///end for loop
}

void fThetaArrayGen(long double x, long double y)///recalc
{
    for (int i=0;i<27;i++)
    {
        thetaArray[i]=fTheta(distancesMatrix[0][i],distancesMatrix[1][i]);
    }
}

long double accelGen(long double x, long double y)///called func
{
    return k/(x*x+y*y);
}

long double fTheta(long double x, long double y)///called func
{
    return atan(y/x);
}

void distancesMatrixGen(long double x, long double y)///recalc
{
    for (int i=0; i<27; i++)
    {
        distancesMatrix[i][0]=x-particleMatrix[i][0];
        distancesMatrix[i][1]=y-particleMatrix[i][1];
    }
}
long double accelVectorGen(long double a,long double r)
{

    if (r>0)
        {
            return a;
        }
    else if (r<0)
        {
            return -a;
        }
    else
        {
            return 0;
        }
}

long double accelSum(char xy)///recalc
{
    accelMagArrayGen();
    long double sum=0;
    if (xy=='x')
    {
        for (int i=0; i<27; i++)
        {
            sum+=accelVectorGen(abs(cos(thetaArray[i])*accelMagArray[i]),distancesMatrix[i][0]);
        }
    }

    else if (xy=='y')
    {
        long double tempSum=0;
        for (int i2=0; i2<27; i2++)
        {
            sum+=accelVectorGen(abs(sin(thetaArray[i2])*accelMagArray[i2]),distancesMatrix[i2][1]);
        }
    }
    return sum;
}

void accelMagArrayGen()///called func
{
    long double x;
    long double y;

    for (int i=0; i<27; i++)
    {
        x=distancesMatrix[i][0]*1E10;
        y=distancesMatrix[i][1]*1E10;
        accelMagArray[i]=1E20*accelGen(x,y);
    }
}
The only optimization I can think of is to make make the for loop in this function into 2 for-loops:

1
2
3
4
5
6
7
8
void distancesMatrixGen(long double x, long double y)///recalc
{
    for (int i=0; i<27; i++)
    {
        distancesMatrix[i][0]=x-particleMatrix[i][0];
        distancesMatrix[i][1]=y-particleMatrix[i][1];
    }
}


See this SO question for reason why: http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops
Another idea would be to replace all floating point divisions with multiplication, however, there's the possibility that your compiler already magic'd that for you . Even so, how effective the optimization would be is architecture dependent.

EDIT* This source (blog, whatever) claims that replacing divisions with multiplications will take its toll on the accuracy of your calculations... hmmmm.
http://simpletonprogrammer.blogspot.com/2012/01/assumed-optimization-floating-point.html
Last edited on
http://www.agner.org/optimize/
3 great books about optimization (free)
Last edited on
Ok so last night I told the code to start running, hoping itd be done by the end of today or so, but when I got up this morning the computer was off, blinking red lights and beeping, and a google search revealed one of the memory cards somehow died.. is it possible my program did that?
And ok, free is always good:P I'll check those out.
If at all possible I'd get rid of the fout call in a loop, try using something like this

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
string fout_buf = "";
for (long double t=0;  ; t+=dt)
    {
        counter++;
        x += (vx*dt)*dt2+(1/2*ax*dt*dt)*dt2*dt2;
        y += (vy*dt)*dt2+(1/2*ay*dt*dt)*dt2*dt2;

        vx += (ax*dt)*dt2;
        vy += (ay*dt)*dt2;

        fThetaArrayGen(x,y);
        distancesMatrixGen(x,y);

        ax=accelSum('x');
        ay=accelSum('y');

        v=sqrt(vx*vx+vy*vy);
        a=sqrt(ax*ax+ay*ay);

        if (x>9.995E-4&&counter%6000==0)
            {
                fout_buf  += t+'\t'+x+'\t'+y+'\t'+vx+'\t'+vy+'\t'+ax+'\t'+ay+'\n';
            }
        if (y>5E-11)//edited, pretty much its "if the electron leaves the area I want it in"
            {
                return 0;
            }

        if (x>5E-3) //if the particle gets to the other side of the field
        {
            return 1;
        }
    }///end for loop
    fout << fout_buf;
    fout_buf = "";
}



Re memory card:
As in a Flash memory stick/Camera card, or do you mean RAM?

If it's flash then yes, your writes __COULD__ have done that, if it's RAM then no... some more details on your system and what you mean by memory card could help us answer that one.
Last edited on
> is it possible my program did that?

No, the memory must have been failing anyway. High processor usage might have caused temperature to be high (though not dangerously high) and accelerated the process.


> I accidentally told it not to print to file and it took only 4 hours to complete instead of the 12 it usually took.

Get rid of the endl on every line. Flushing an output stream once per line is a very expensive operation. The output stream is buffered, characters are held in a large buffer in memory, and the buffer would be automatically flushed when it becomes full.

And print a single tab character as a char rather than as a string containing one char.

Just change just this one line, and then measure the performance.
1
2
3
// fout <<t<<"\t"<<x<<"\t"<<y<<"\t"<<vx<<"\t"<<vy<<"\t"<<ax<<"\t"<<ay<<endl;
const char TAB = '\t' ;
fout << t << TAB << x << TAB<< y << TAB << vx << TAB << vy << TAB << ax << TAB << ay << '\n' ;



> each iteration requires about 100 (quite small) function calls.
> How much faster would the program be if I condensed that number into something more like 10 or 5?

If the code is being compiled with optimization enabled, it would make no perceptible difference in performance.
Last edited on
I'll certainly change the "\t"'s, but the endl's are in there because I open the txt files in excel, so I can create a graph of what's happening. The tabs put each output into a different box, and then the endl puts each new iteration on a new line. Is there a way of accomplishing this without using endl? Like would using '\n' fix the problem?
And ok. I'll try ValliusDax's sollution next.
And as for the memory I think I mean RAM. the computer stopped giving the error once I removed one of the memory chip things from inside the pc. It certainly wasn't flash memory, nor the hard drive, so i guess that makes it RAM. Sorry for the non-technical verbage:p
Thank you:)
Last edited on
Just changed out the endls for newlines and it is noticeably faster, it calculated one iteration in probably half the time as previously. But looking again at ValliusDax's advice, making a really really long string and printing that out once as opposed to printing out a whole bunch of little bits of data- There would be somewhere around 2000 lines (~10000 chars) of output per iteration, can strings hold that much?
any data structure can hold as much as information as the amount of memory available to your program which is your entire memory. Even at that, 10000 chars = 10000 bytes == 10kb. That is nothing in today's standards
Yeah it'll handle it, just be careful with where you return.

It's strange that RAM would go bad. Is this a laptop or desktop?

My code is broken on second look, you need to store your return value in a variable, break from the loop and then return.

Like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
int return_code = 1;
...
...
 if (x>9.995E-4&&counter%6000==0)
            {
                fout_buf  += t+'\t'+x+'\t'+y+'\t'+vx+'\t'+vy+'\t'+ax+'\t'+ay+'\n';
            }
        if (y>5E-11)//edited, pretty much its "if the electron leaves the area I want it in"
            {
                return_code = 0;
                break;
            }

        if (x>5E-3) //if the particle gets to the other side of the field
        {
            return_code = 1;
            break;
        }
    }///end for loop
    fout << fout_buf;
    fout_buf = "";
    return return_code;
Is there any particular reason why you wish to use long doubles as opposed to ordinary doubles? A double has 15 or 16 significant figures, while a long double has only 2 extra - 17 or 18. Do you really need the extra precision?

Btw, you should never use floating point types in for loops like you have on line 105. There is always a way to calculate how many times the loop should run as an int. OR use a while loop.

With the fTheta function, it just calls a library math function, could you just call the math function direct? If you have lots of calcs, the extra function call can become quite a cost.

Hope all goes well

> making a really really long string and printing that out once
> as opposed to printing out a whole bunch of little bits of data-
> There would be somewhere around 2000 lines (~10000 chars) of output per iteration,
> can strings hold that much?

Yes, a string can hold that much; but if you do that, it will degrade performance. Streams are buffered, and this would be needlessly buffering writes into a buffer. If the stream has a buffer size of say 4K, and you needlessly perform double buffering with a string of size of 10K, the number of flushes of the stream buffer (expensive) would not be any less. Creating and writing into the string first would just become an extra overhead.

If you want to control buffering, adjust the streams buffer size. There is typically a sweet spot for the buffer size on a particular implementation; measure performance. An unnecessarily large buffer size is a bad idea.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <fstream>
#include <iostream>
#include <memory>
#include <chrono>

int main()
{
    const char TAB = '\t' ;
    // values to be written into one line in the file
    // in the actual program, these would be the calculated values
    // volatile, because we do not want any super-smart optimization
    volatile double x[] = { 1.2345, 1.2345, 1.2345,  1.2345, 1.2345, 1.2345, 1.2345 };
    
    using clock = std::chrono::steady_clock ;
    using msecs = std::chrono::milliseconds ;
    using  std::chrono::duration_cast ;

    // for the actual tests, make this larger, say 1024*1024
    const int NLINES = 1024 * 64 ; 

    std::size_t sz = 1024 ;
    for( int i = 0 ; i < 10 ; ++i, sz *= 2 )
    {
        // allocate memory for the stream buffer
        std::unique_ptr< char[] > buf( new char[sz] ) ; 
        std::cout << "buffer size: " << sz ;

        std::ofstream file ; 
        file.rdbuf()->pubsetbuf( buf.get(), sz ) ; // set the buffer
        file.open( "test.txt" ) ; // and then open the file

        auto start = clock::now() ;
        for( int i = 0 ; i < NLINES ; ++i )
        {
            for( auto d : x ) file << d << TAB ;
            file << '\n' ;
        }
        file << std::flush ;
        auto end = clock::now() ;

        std::cout << " elapsed: " << duration_cast<msecs>(end-start).count()
                  << " milliseconds.\n" ;
    }
}

http://coliru.stacked-crooked.com/a/8be78a8962c2deaa

There is a lot of merit in TheIdeasMan's suggestion of using double instead of long double
http://www.cplusplus.com/forum/beginner/112257/#msg613439

And none at all in the claim that removing the fTheta function and calling the math function direct would improve performance in any way. The optimizer knows what you are doing.

EDIT: If you still want to take the roundabout (first data to string and then string to file) route, turn of buffering for the stream with file.rdbuf()->pubsetbuf( 0, 0 ) ; (before any i/o is performed on the stream).
Last edited on
Thank you all, I've been dealing with other computer problems for a bit so I haven't gotten to reply.
It was a desktop but upon taking the RAM cards out and putting them back in, the computer worked again.
I will try TheIdeaMan's suggestion, and I'll try to work through JLBorges's. I'll post back when I get the chance to do so.
Not sure if this is beyond the scope of a beginner, but I think your program would lend itself to multi-threading. This would give you a big boost.

I believe that C++ 2011 has provision for threading in the library, I've not used it yet myself, but perhaps it would be a good advanced projects for you to research.

Topic archived. No new replies allowed.