Dear all,
I am trying to understand why the C++ code I am working with gives out-of-memory errors. It is a scientific code with several flag variables that turn a number of functionalities on or off. The code works fine when a couple of functions are turned off; however, when those routines are active, it runs into 'Out of memory' situations.
The error file created by qsub says:
Exit status : -4
job terminated due to one or more nodes running out of memory
The function I am talking about used to work fine until I made some additions. Basically, I created some pointers, initialized them to NULL, allocated a memory chunk for each, stored a quantity of interest in it, and later freed it with delete []*p.
I am trying hard to figure out the source of the problem. I believe it's some C++ programming error that I am overlooking due to my inexperience with C++. Is there a way to figure out what the bug is, where it is, or how to resolve it?
Some thoughts that ran through my mind:
- Use try { } catch { } around the allocations.
- Run some memory-monitoring program to track memory usage on the system in real time.
- Any other efficient way of debugging an MPI/C++ code in such situations?
- I read something about stacks and heaps and how memory is stored. What is the safest way to declare a 1D or 2D array on the fly: pointer-based or array-definition-based?
Have you estimated how much memory you are actually allocating? (I have seen people try to allocate 10+ TB of RAM because they hadn't done the maths.)
When you submit the job, have you requested enough memory? There may be an argument to qsub to specify a minimum amount of memory per node; ask your sysadmin.
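For a PBS-style qsub it usually looks something like the following, though the exact resource names vary between scheduler flavours and sites, so do check with your sysadmin first:

```shell
# hypothetical PBS directives -- resource names differ between PBS variants
#PBS -l nodes=4:ppn=8      # 4 nodes, 8 processes per node
#PBS -l mem=16gb           # memory request (some sites use pmem or vmem instead)
```

If the job exceeds what was requested, many schedulers kill it with exactly the kind of "node ran out of memory" message you quoted.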
Can you create some test code that runs your functions on a single node, and then use a malloc-logging library and/or a debugger of your choice to verify that the memory allocation/deallocation is correct?
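For that single-node test, valgrind is the usual tool (assuming it is installed on your system): memcheck finds leaks and mismatched delete/delete[], and massif shows heap usage over time. A possible workflow, where the source and binary names are just placeholders:

```shell
# hypothetical workflow -- file names and compiler wrapper are assumptions
mpicxx -g -O0 test_functions.cpp -o test_functions   # build with debug symbols
valgrind --leak-check=full ./test_functions          # leaks, mismatched delete[]
valgrind --tool=massif ./test_functions              # heap profile over time
ms_print massif.out.<pid>                            # inspect the massif output
```

Run it on a small problem size first; memcheck in particular slows the program down considerably, but a leak or a wrong delete[] will show up even on tiny inputs.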