Valgrind: segmentation fault error

fftw_malloc() allocates the buffer with alignment suitable for SIMD, which FFTW can make use of.
that negatively impact the propagation and accuracy of GPS signals.

interesting ... for a while (and many years ago) they were looking at *using* generated plasma/wig fields to steer or reduce air resists and such in planes (it takes more energy than made sense back then, dunno if it ever got anywhere). I don't know that anyone thought about gps reception since it was never practical when I was involved...
@lastchance

Why don't you simply use std::vector rather than all those dynamic arrays? Then you will never have to use a malloc or a free (which will reduce the line count in your code by 20% or so to start with, and the likelihood of bugs by a similar amount).
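
A minimal sketch of that suggestion, assuming the row-major `j + ny*i` indexing used elsewhere in this thread. The names `make_grid` and `at` are made up for illustration and are not from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocate an nx-by-ny grid as one contiguous block, every element set to `value`.
// The vector owns the memory, so there is no malloc/free pair to get wrong.
std::vector<double> make_grid(std::size_t nx, std::size_t ny, double value)
{
    return std::vector<double>(nx * ny, value);
}

// Same convention as the thread's code: element (i, j) lives at j + ny*i.
double& at(std::vector<double>& grid, std::size_t ny, std::size_t i, std::size_t j)
{
    assert(j < ny && j + ny * i < grid.size());
    return grid[j + ny * i];
}
```

One caveat: a plain std::vector does not promise the SIMD-friendly alignment that fftw_malloc() gives, so buffers handed directly to FFTW may still be worth allocating with fftw_malloc() or a custom allocator.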

Yeah, I am starting to look into this. Probably will save me a lot of pain.


BTW, there is no point in timing your code when it is built with just the -g option (which puts all the debug information in; without an -O flag the compiler also does no optimisation). When you do come to timing, turn -O3 on.
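
For the timing itself, something along these lines could work once the build uses -O3. This is only a sketch: `time_seconds` and `sum_of_squares` are hypothetical names, and the workload is a stand-in for the real solver step:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Run `work` once and return the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical stand-in workload: sum of x*x over a vector.
double sum_of_squares(const std::vector<double>& v)
{
    double total = 0.0;
    for (std::size_t k = 0; k < v.size(); ++k) {
        total += v[k] * v[k];
    }
    return total;
}
```

Calling it as `time_seconds([&] { result = sum_of_squares(v); })` and keeping `result` in use afterwards helps stop the optimiser from discarding the work being measured.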


Thanks! I am doing this now.

@jonnin

interesting ... for a while (and many years ago) they were looking at *using* generated plasma/wig fields to steer or reduce air resists and such in planes (it takes more energy than made sense back then, dunno if it ever got anywhere). I don't know that anyone thought about gps reception since it was never practical when I was involved...

Wow that's interesting that you were involved. Yeah there's so much plasma/atmospheric research out there now concerned with GPS accuracy, especially within the military. I am obviously just starting into this field, but it's really fascinating and pretty cool.

yea military R&D decades ago. I was mostly on the sidelines for the plasma side, helped out with a couple of computations and some animations, minimal actual contributions.

If they ever get the ability to project it along a surface consistently without hauling a city sized power supply .. it will be a game changer for commercial aircraft and maybe land/watercraft too. The fuel saved and costs cut would be astonishing. But it's like supercomputing at room temp... we know it would be nice to have, but can't seem to get there ..
ne[j + ny*i] = (a * tanh(b*(XX[j + ny*i] + c)) + d + a2 * tanh(b*(XX[j + ny*i] + c2)) + d2 + .02*cos(4*M_PI * YY[j + ny*i]/Ly) *( exp( - bg * pow(XX[j + ny*i] - xg,2)) + exp(-bg* pow( XX[j + ny*i] - Lx + xg,2) ) ) ) * 1.E11;


Since time is a major issue for you, realize that pow(x, 2) is much slower than x*x. So, I would consider something like this:

int index = j + ny*i;
double xxValue1 = XX[index] - xg;
double xxValue2 = XX[index] - Lx + xg;
ne[index] = (a * tanh(b*(XX[index] + c)) + d +
             a2 * tanh(b*(XX[index] + c2)) + d2 +
             .02*cos(4*M_PI * YY[index]/Ly) *
             (exp(-bg * xxValue1 * xxValue1) + exp(-bg * xxValue2 * xxValue2))) * 1.E11;

@jonnin

yea military R&D decades ago. I was mostly on the sidelines for the plasma side, helped out with a couple of computations and some animations, minimal actual contributions.

If they ever get the ability to project it along a surface consistently without hauling a city sized power supply .. it will be a game changer for commercial aircraft and maybe land/watercraft too. The fuel saved and costs cut would be astonishing. But it's like supercomputing at room temp... we know it would be nice to have, but can't seem to get there ..


man that's awesome! this is exactly what I am interested in and just started learning about. There are a lot of possibilities in this field that I keep getting introduced to.

@doug4

Since time is a major issue for you, realize that pow(x, 2) is much slower than x*x.


I read about this and started implementing it in my code; I'm just not sure yet whether it will make things considerably faster. Thanks!

I would also like to apologize for not keeping up with my posts here. Being a student with multiple summer jobs is exhausting, unfortunately. I also started using the profiler "gprof" (not sure if people here are aware of it), and it helps a lot with figuring out where to start in order to make things run faster. I might start a new thread with the results I got and my attempts. Thanks again everyone!
@helios
Also, I have been following your GitHub updates and will read through the code and run both for comparison. Thanks a lot, I am trying to learn how to write things more efficiently and this will help a lot.
Oops! I deleted my local copy about an hour ago. I fucked up the refactor somewhere and ran into consistent numerical instability issues. My mistake made the whole thing much more effort than I originally anticipated, so I gave up because I thought you weren't really interested (plus I had other things to work on).
I've pushed the latest version I had in my local Git server. It doesn't work, but I think it conveys fairly well how (IMO) you should have structured the code to avoid common resource management errors.

Main changes:
* All matrices are now instances of a Matrix<T> class, which always correctly initializes and cleans up memory.
* Pairs of matrices of doubles were refactored into instances of Matrix<point2d>.
* Pairs of matrices of complex numbers were refactored into instances of Matrix<complex2d> (I guess they'd be more correctly named quaternions).
* Fourier-real conversion functions are type-safe and take and return the appropriate type instances of Matrix<T>. Also, they leverage FFTW to perform FFTs of matrices of points simultaneously.
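
To give a rough idea of what such a class can look like, here is a minimal sketch. It is not the code from the repository: the member names and the layout of `point2d` are assumptions, and it uses std::vector for storage rather than anything FFTW-specific:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type, standing in for the point2d mentioned above.
struct point2d {
    double x = 0.0;
    double y = 0.0;
};

// Minimal owning matrix. Storage is a std::vector, so construction always
// initialises every element and destruction always releases the memory.
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Row-major access: element (i, j) lives at j + cols*i.
    T& operator()(std::size_t i, std::size_t j)
    {
        assert(i < rows_ && j < cols_);
        return data_[j + cols_ * i];
    }
    const T& operator()(std::size_t i, std::size_t j) const
    {
        assert(i < rows_ && j < cols_);
        return data_[j + cols_ * i];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Raw pointer to the contiguous storage, for C-style library calls.
    T* data() { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

Because copying, moving, and destruction are all inherited from std::vector, there is no hand-written cleanup code to forget.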

Changes I would have implemented if I had managed to make the thing run correctly:
* FFTW plans are constantly created and destroyed, even though they operate on matrices with only a few possible combinations of parameters. I haven't timed it but I suspect, since FFTW gives an interface to separate plan-making from FFT computation, that making all those plans consumes a sizeable portion of the run time. I was going to add a plan manager that would return a previously-created plan that matched the parameters the function needed. The function would return the plan to the manager when it was done with it.
* Overall, the code is very difficult to parallelize. This is inherent to the problem being solved and not something that can be fixed by just restructuring the code. IIRC there were possibly two independent streams of computations inside the main iteration loop which would be susceptible to parallelization for a modest speed increase, but that's about it.
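
The plan manager idea could be sketched roughly as below. This is an illustration only and was never part of the pushed code: `PlanManager`, `PlanKey`, and the choice of key fields are all hypothetical, and the plan type is a template parameter so the sketch does not depend on FFTW. With FFTW, `Plan` would presumably be `fftw_plan` and the factory would call one of the planner functions:

```cpp
#include <functional>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key: rows, columns, and transform direction.
using PlanKey = std::tuple<int, int, int>;

template <typename Plan>
class PlanManager {
public:
    // `make` is called only when no idle plan matches the requested key.
    explicit PlanManager(std::function<Plan(const PlanKey&)> make)
        : make_(std::move(make)) {}

    // Hand out an idle plan for these parameters, creating one if none is idle.
    Plan acquire(const PlanKey& key)
    {
        std::vector<Plan>& idle = idle_[key];
        if (idle.empty()) {
            ++created_;
            return make_(key);
        }
        Plan plan = std::move(idle.back());
        idle.pop_back();
        return plan;
    }

    // Give a plan back so that a later acquire() with the same key can reuse it.
    void release(const PlanKey& key, Plan plan)
    {
        idle_[key].push_back(std::move(plan));
    }

    // Number of plans actually created so far (useful for checking reuse).
    int created() const { return created_; }

private:
    std::function<Plan(const PlanKey&)> make_;
    std::map<PlanKey, std::vector<Plan>> idle_;
    int created_ = 0;
};
```

Two things this sketch leaves out: it never destroys the idle plans (with FFTW that would mean calling fftw_destroy_plan on each at shutdown), and an FFTW plan is tied to the arrays it was created with unless the new-array execute functions are used, so the key may need to account for that.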

Where I fucked up: I underestimated how difficult it would be to locate any mistakes I might make. I tried to refactor the whole thing at once instead of doing it in small increments, so when I realized I was getting NaNs, I had the entire codebase to debug. Had I performed a more gradual refactor I would've been able to more easily detect the source of the error and correct it because the places where that error might have hidden would have been much fewer. On the other hand, this might have required a conversion function to convert between the old matrix format and the new one.

Hope this helps.
@helios
Thanks a lot! I have your code and have been poking it for a while. Thanks again and apologies for taking some of your important time. Obviously, I am aware you have other priorities but I greatly appreciate this.
No worries. Sorry for making you deal with my broken code.