Parallel MPI code in C++ memory errors

Dear all,
I am trying to understand why the C++ code I am working with gives out-of-memory errors. This is a scientific code with several flag variables that turn various code functionalities on or off. The code works fine when a couple of functions are turned off. However, when these routines are active, the job runs out of memory.

The error file created by qsub says:
Exit status : -4

job terminated due to one or more nodes running out of memory.

The function I am talking about used to work fine until I made some additions. I basically create some pointers, initialize them to NULL, allocate a chunk of memory for each one, store a quantity of interest in it, and later delete []*p

I am trying hard to figure out the source of the problem. I believe it's some C++ programming error (which I am overlooking due to my inexperience with C++). Is there a way to figure out what the bug is, where it is, and how to resolve it?


Some thoughts that ran through my mind:
- Use try { } catch { }
- Run some memory program to track the memory usage on the system (in real time)
- Any other efficient way of debugging an MPI/C++ code for such situations?

- I read something about stacks and heaps and how memory is stored. What is the safest way to declare a 2D array or a 1D array on the fly: pointer based or array definition based?



Please educate me with your thoughts.

thanks,
RSK

What is the safest way to declare a 2D array or a 1D array on the fly: pointer based or array definition based?

Neither. Use std::vector and avoid any use of new[] and delete[].
When a program crashes, use the debugger to find the problem.
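
To make the std::vector suggestion concrete, here is a minimal sketch of a run-time-sized 1D array and a 2D array kept in one contiguous vector. The names Grid2D and sum_of_squares are made up for illustration and are not from the original code; the element type double is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original code): a 2D array stored in a
// single contiguous std::vector, indexed in row-major order.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols, double init = 0.0)
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    // Element access: g(i, j) instead of p[i][j].
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }

    // Pointer to the contiguous storage, usable as a send/receive buffer.
    double* data() { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// 1D array sized at run time. The vector releases its memory when it goes
// out of scope, so there is no delete[] to forget or to get wrong.
double sum_of_squares(std::size_t n) {
    std::vector<double> v(n);
    for (std::size_t k = 0; k < n; ++k) v[k] = static_cast<double>(k);
    double s = 0.0;
    for (double x : v) s += x * x;
    return s;
}
```

Because the storage is one block, g.data() together with g.size() can be handed to an MPI call as a buffer and a count, and there is no per-row new[]/delete[] pair that can leak.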
Have you estimated how much memory you're actually allocating? (I've seen people try to allocate 10+ TB of RAM because they hadn't done the maths.)
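
A back-of-the-envelope estimate like this can be written down in a few lines. The sketch below is only an illustration: the function names are made up, the array dimensions are hypothetical, and it does not check for integer overflow.

```cpp
#include <cstdint>

// Bytes needed for a rows x cols array whose elements are elem_size bytes.
// (Hypothetical helper; no overflow check.)
constexpr std::uint64_t array_bytes(std::uint64_t rows, std::uint64_t cols,
                                    std::uint64_t elem_size) {
    return rows * cols * elem_size;
}

// Total for one node when every MPI rank on that node allocates the same
// amount. ranks_per_node depends on how the job was submitted.
constexpr std::uint64_t node_bytes(std::uint64_t bytes_per_rank,
                                   std::uint64_t ranks_per_node) {
    return bytes_per_rank * ranks_per_node;
}

// Convert bytes to GiB (1 GiB = 2^30 bytes) for comparison with node RAM.
constexpr double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For example, a 100000 x 100000 array of double (8 bytes each) is array_bytes(100000, 100000, 8) = 80,000,000,000 bytes, roughly 74.5 GiB, and that is per rank. If every rank allocates the full array instead of only its share, the node total grows with the number of ranks on the node.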

When you submit the job, have you requested enough memory? There may be an argument to qsub to specify a minimum amount of memory per node - ask your sysadmin.

Can you create some test code that runs your functions on a single node, and then use a malloc logging library and/or a debugger of your choice to make sure the memory allocation/deallocation is correct?
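
If no logging library is at hand, the idea can be approximated by hand in a single-node test. The class below is a hypothetical debugging aid written for this post, not a real library: the suspect routine is changed to allocate and free through it, and it reports blocks that were never released along with the peak usage.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <new>

// Hypothetical allocation log (illustration only). Every block handed out is
// recorded with its size and a caller-supplied tag until it is released.
class AllocLog {
public:
    void* allocate(std::size_t bytes, const char* tag) {
        void* p = std::malloc(bytes ? bytes : 1);
        if (!p) throw std::bad_alloc();
        live_[p] = Entry{bytes, tag};
        live_bytes_ += bytes;
        if (live_bytes_ > peak_bytes_) peak_bytes_ = live_bytes_;
        return p;
    }

    // Returns false and frees nothing if p is not a live block from this log,
    // which flags double frees and pointers that were never allocated here.
    bool release(void* p) {
        auto it = live_.find(p);
        if (it == live_.end()) return false;
        live_bytes_ -= it->second.bytes;
        live_.erase(it);
        std::free(p);
        return true;
    }

    std::size_t live_blocks() const { return live_.size(); }
    std::size_t live_bytes() const { return live_bytes_; }
    std::size_t peak_bytes() const { return peak_bytes_; }

    // Print every block that is still live, then a one-line summary.
    void report(std::FILE* out) const {
        for (const auto& kv : live_)
            std::fprintf(out, "not freed: %zu bytes, tag '%s'\n",
                         kv.second.bytes, kv.second.tag);
        std::fprintf(out, "live: %zu bytes in %zu blocks, peak: %zu bytes\n",
                     live_bytes_, live_.size(), peak_bytes_);
    }

private:
    struct Entry {
        std::size_t bytes;
        const char* tag;
    };
    std::map<void*, Entry> live_;
    std::size_t live_bytes_ = 0;
    std::size_t peak_bytes_ = 0;
};
```

Calling report(stderr) after each pass through the suspect function shows whether the live byte count keeps growing from call to call, which is the usual signature of a missing or mismatched delete[].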