Actual memory a program is using

I made a program that uses long lists and keeps on reserving and freeing memory.
So the memory I am actually using varies from a couple of MB to a couple of GB.

What puzzles me is that when I compile and run my program on Linux
and watch its memory consumption with either "top" or "ps -eo vsz,cmd",
it looks like the program is always using its peak memory (i.e. a couple of GB).

I don't have this problem on OS X for example (there the memory keeps varying).

So, my question is: why does this occur? Is there a way I can see the actual consumption of memory?


Edit: the reason why I'm wondering is that I'm trying to optimize memory usage and it would be good to know when, during the process, memory consumption peaks occur.
Which linux distro are you using?
Are you using Objective-C and relying on the garbage collector?
closed account (D4S8vCM9)
Did you check out Valgrind's Massif tool? There is also a KDE app to visualize the data.
I'm on CentOS and using C++.

Well, on OS X the visualization tool works fine; there are ups and downs in memory usage. But like I said, on CentOS it seems like the process holds on to its peak memory. It's no problem performance-wise; I would just like to understand what's behind it.

I would understand it if std::list held on to the peak memory I fill it with, but I'm storing pointers to objects that are around 500 bytes each - and I'm taking care of memory leaks, so I'm deleting and creating the objects as I go.
Memory fragmentation?

Sorry, in manual memory management with paging, this is normal. If your app ever reserves 1 GB of memory in small objects, it is very, very unlikely that memory goes back to the OS, even if your program freed/deleted it. Of course, everything depends on the quality of the malloc/free implementation in glibc. Usually the ones that are fast suffer heavily from fragmentation problems, and those that do not are quite slow (= hundreds to thousands of CPU cycles per malloc/free). A single 4-byte object may keep a whole 4 kB page alive in your process. That is roughly 1000x overhead.
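
To make the effect visible, here is a minimal sketch (assumptions: Linux, because it reads /proc/self/status, and glibc, because malloc_trim() is a glibc extension): allocate a pile of small blocks, free them all, and watch VmSize/VmRSS stay near the peak until malloc is explicitly asked to trim.

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <malloc.h> // malloc_trim() - glibc extension
#include <string>
#include <vector>

static void print_mem( const char* tag )
{
    std::ifstream status( "/proc/self/status" ) ;
    std::string line ;
    while( std::getline( status, line ) )
        if( line.compare( 0, 6, "VmSize" ) == 0 || line.compare( 0, 5, "VmRSS" ) == 0 )
            std::cout << tag << "  " << line << '\n' ;
}

int main()
{
    print_mem( "start:    " ) ;

    std::vector<char*> blocks( 1024 * 1024 ) ;
    for( auto& p : blocks ) p = static_cast<char*>( std::malloc(32) ) ; // many small blocks
    print_mem( "allocated:" ) ;

    for( auto p : blocks ) std::free(p) ; // free every one of them...
    print_mem( "freed:    " ) ; // ...yet VmSize/VmRSS typically stay near the peak

    malloc_trim(0) ; // glibc-specific: ask malloc to give memory back to the OS
    print_mem( "trimmed:  " ) ;
}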

What you can do:
1. Try a different malloc allocator with LD_PRELOAD.
2. Try to better organize your new/deletes. Use region allocators (a minimal sketch follows after this list).
3. Switch to Java. Memory usage is much more tunable there (and it is not true that Java's memory overhead is higher than C++'s; it is only higher for trivial applications). Seriously, Java is much more memory friendly for programs that sporadically need a huge amount of memory but need to return it to the OS after use.
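
Re point 2, a minimal region (arena) allocator sketch; the class name and interface here are made up for illustration. Every object is carved out of one large block, so releasing memory is a single free of the whole region and per-object fragmentation never accumulates.

#include <cstddef>
#include <cstdlib>
#include <new>

// minimal bump-pointer region: allocation is just a pointer increment,
// and the whole block goes back in a single free when the region dies
class region
{
    public:
        explicit region( std::size_t size )
            : base_( static_cast<char*>( std::malloc(size) ) ), size_(size)
        { if( !base_ ) throw std::bad_alloc() ; }

        ~region() { std::free(base_) ; } // one free returns everything

        region( const region& ) = delete ;
        region& operator=( const region& ) = delete ;

        void* allocate( std::size_t n, std::size_t align = alignof(std::max_align_t) )
        {
            std::size_t p = ( used_ + align - 1 ) / align * align ; // align up
            if( p + n > size_ ) throw std::bad_alloc() ;
            used_ = p + n ;
            return base_ + p ;
        }

    private:
        char* base_ ;
        std::size_t size_ ;
        std::size_t used_ = 0 ;
};

struct node { int value ; node* next ; };

int main()
{
    region r( 1024 * 1024 ) ; // one contiguous megabyte
    node* head = nullptr ;
    for( int i = 0 ; i < 1000 ; ++i )
        head = new ( r.allocate( sizeof(node) ) ) node{ i, head } ; // placement new
    // no per-node delete: the (trivially destructible) nodes vanish with the region
}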


Are you using Objective-C and relying on the garbage collector?

Garbage collectors suffer from such problems much less than manual general-purpose allocators.
closed account (S6k9GNh0)
Rapidcoder, could you please provide some sort of proof for number 3?
What puzzles me is that when I compile and run my program on Linux
and watch its memory consumption with either "top" or "ps -eo vsz,cmd",
it looks like the program is always using its peak memory (i.e. a couple of GB).

This isn't the case for me (on Ubuntu 12.01). I use Conky to watch memory consumption, and it shows that allocation and deallocation give and take what you would expect to and from the OS. In my case, I am watching allocation and deallocation of up to a gigabyte at a time.
Proof of what?
I made a few statements there:
1. GC in Java defragments memory and malloc in C++ does not. So by definition GC doesn't suffer from memory fragmentation problems.

2. GC is much more tunable. What parameters can you tune for malloc/new? Nothing. At best you can swap in a different implementation with LD_PRELOAD. And with a GC you can configure things like the memory-performance trade-off.

3. I use a Java IDE all the time. When I load a few huge projects, the memory usage goes up. If I close them, it drops back to the previous level. I can't see this behaviour in e.g. Firefox. Open 50 tabs, close 50 tabs, and memory usage goes through the roof. Also, simply use Firefox for a week without closing it: it eats more and more, while IDEA stays at the same level of around 200-400 MB (depending on what I actually do).
closed account (S6k9GNh0)
You act like using a GC in C++ is impossible, even though it's common in large projects.
3. I use a Java IDE all the time. When I load a few huge projects, the memory usage goes up. If I close them, it drops back to the previous level. I can't see this behaviour in e.g. Firefox. Open 50 tabs, close 50 tabs, and memory usage goes through the roof. Also, simply use Firefox for a week without closing it: it eats more and more, while IDEA stays at the same level of around 200-400 MB (depending on what I actually do).


Memory leaks from browsers are actually not so simple.

There are 4 places a browser can leak memory:

The web page

In modern browsers this is fully up to the web developer. Garbage-collected environments don't collect memory that is still being referenced, and there are a lot of ways to keep referencing memory without meaning to (e.g. create a closure to attach as an event handler and accidentally capture a bunch of variables in that closure's scope). A web developer can solve these leaks completely by properly handling variable references in their code. A page reload typically frees up the memory.

Add-ons

If add-ons are also written in a garbage-collected language (like JavaScript), then they suffer from the same issue. However, a page reload will typically not free up this memory, so it appears as if the browser is leaking memory whereas it's actually the add-on developer's fault. To my knowledge this is the biggest cause of browser leaks (which is why the default recommendation is to test whether the leak occurs without add-ons).

Browser engine

All modern browser engines are written in C++. C++ is not garbage-collected, but uses explicit memory allocation instead. If developers allocate memory and then forget to deallocate it, the engine leaks memory. To my knowledge all the browser makers do a lot of testing and code review to find and solve these kinds of leaks. It's not 100% fixed, and never will be, but it's not a huge problem anymore.

Non-leaks

Finally, there are a range of caching features that mean the browser's process will grow while you use it. These aren't leaks; they're intended to make optimal use of available RAM. Typically the memory footprint grows to a maximum and then hovers there.


http://programmers.stackexchange.com/questions/173627/why-do-browsers-leak-memory


The above pattern will leak due to the circular reference created between a DOM node and a JS object.

Since the JScript garbage collector is a mark and sweep GC, you may think that it would handle circular references. And in fact it does. However this circular reference is between the DOM and JS worlds. DOM and JS have separate garbage collectors. Therefore they cannot clean up memory in situations like the above.


http://www.codeproject.com/Articles/12231/Memory-Leakage-in-Internet-Explorer-revisited

It would seem that it's not because browser engines are written in C++ instead of Java that they leak or retain memory.

My reading is that memory leaks from failing to call delete in C++ are not one of the main problems anymore.

You said that opening 50 tabs and closing them causes memory usage to skyrocket, but that may be intentional caching.

It seems that the GC, and the programmer's interaction with it, is a main part of the issue.
All of the above is true, but memory fragmentation is also a huge problem:

http://pavlovdotnet.wordpress.com/2007/11/10/memory-fragmentation/

It is better now as they switched to the Doug Lea allocator, but the problem still remains, especially if you need to suddenly release a huge amount of memory. The chances that the released memory forms huge contiguous blocks of at least page size are very low. That's why, even with no memory leaks, it is very hard to give memory back to the OS.
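
A small sketch of why the chances are low (illustrative numbers; run it and watch the process in top): free every other one of a million 32-byte allocations and half the memory is nominally free, yet nearly every 4 kB page still holds live objects, so nothing page-sized can be handed back.

#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

int main()
{
    const std::size_t N = 1024 * 1024 ;
    std::vector<void*> blocks(N) ;
    for( auto& p : blocks ) p = std::malloc(32) ; // densely packed small blocks

    std::cout << "memcheck 1: all " << N << " blocks live " ;
    std::cin.get() ;

    // free half the memory - but every *other* block, so almost every
    // 4 kB page still holds live 32-byte objects and cannot be released
    for( std::size_t i = 0 ; i < N ; i += 2 ) { std::free( blocks[i] ) ; blocks[i] = nullptr ; }

    std::cout << "memcheck 2: ~16 MB freed, yet VSZ/RSS typically stay put " ;
    std::cin.get() ;

    for( auto p : blocks ) std::free(p) ; // free(nullptr) is a no-op
}
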
> Switch to Java. Memory usage is much more tunable there ... Seriously ...

>> ...
>> It would seem that it's not because browser engines are written in C++ instead of Java that they leak or retain memory.

>>> All of the above is true.

All the above, except the 'switch to Java' rant quoted at the top.
Or else, by now, every major browser engine would have been re-written in Java.


But then, what about this?
> but memory fragmentation is also a huge problem ...
> especially if you need to suddenly release a huge amount of memory
> The chances that the released memory forms huge contiguous blocks of at least page size are very low.
> ... it is very hard to give memory back to the OS.

Is it? It is fairly trivial to place objects that are to be released together in contiguous chunks of memory. This happens to be something that is done as a matter of routine in a lot of production C++ code.

#include <boost/pool/pool.hpp>
#include <iostream>
#include <random>

struct small_object
{
    int i[8] {0} ;
};

int main()
{
    constexpr std::size_t N = 1024 * 128 ;
    std::cout << N << " * sizeof(small_object) == " << N * sizeof(small_object) << '\n' ;

    {
        boost::pool<> pool(  sizeof(small_object) ) ;

        static small_object* seq[N] ;

        for( auto& p : seq )
        {
            // allocate a large number of small_object instances
            // (overloading operators new and delete would make this code more elegant)
            p = new ( pool.malloc() ) small_object ;

            // and, just to be nasty, allocate another object of a random size,
            // which is deliberately leaked; this is just a snippet to prove a point
            static std::mt19937 rng ;
            static std::uniform_int_distribution<> distr( 4, 64 ) ;
            new char [ distr(rng) ] ;
        }

        std::cout << "memcheck 1: after allocating " << N << " small objects and "
                   << N << " random objects " ;
        std::cin.get() ;

        for( auto& p : seq ) { p->~small_object() ; pool.free(p) ; p = nullptr ; }
    }
    std::cout << "memcheck 2: after deallocating the " << N << " small objects" ;
    std::cin.get() ;
}


A more sophisticated C++11 solution (which would enable the decision to be made late, at the time of releasing memory) would be to use smart pointers in conjunction with move semantics. This is more involved; for instance, if concurrency is involved, atomic operations would be required. Though from a C++ perspective, even this wouldn't merit the epithet: 'it is very hard'.
(This is not yet a solution seen in production C++ code; move semantics arrived only recently, with C++11.)
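
A minimal sketch of that 'decide late' idea (illustrative, not production code): keep the objects in one contiguous arena, hand out aliasing shared_ptrs as handles, and the whole block is released in one piece the moment the last handle goes away, wherever it was moved to. The shared reference count is atomic, which is the concurrency cost mentioned above.

#include <cstddef>
#include <memory>
#include <vector>

struct small_object { int i[8] {0} ; };

int main()
{
    constexpr std::size_t N = 1024 * 128 ;

    // one contiguous arena, owned by a shared_ptr
    auto arena = std::make_shared< std::vector<small_object> >(N) ;

    // aliasing shared_ptrs: each handle points at one object,
    // but shares ownership of the whole arena
    std::vector< std::shared_ptr<small_object> > handles ;
    handles.reserve(N) ;
    for( std::size_t i = 0 ; i < N ; ++i )
        handles.emplace_back( arena, &(*arena)[i] ) ;

    arena.reset() ; // the handles alone now keep the block alive
    // ... move individual handles around, across scopes or threads ...

    handles.clear() ; // last owner gone: the entire arena is freed in one piece
}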


> It is better now as they switched to Doug-Lea allocator

The default allocator that Firefox uses is jemalloc.
http://www.canonware.com/jemalloc/
Mozilla switched to jemalloc on all platforms some three years or so back, and have continued to use it since then; efforts are now underway to upgrade to jemalloc 3.0.



> It seems that the GC, and the programmer's interaction with it, is a main part of the issue.

That programmers have to deal with multiple garbage collectors doesn't help either.
http://developer.mozilla.org/en/docs/Choosing_the_right_memory_allocator

Is it? It is fairly trivial to place objects that are to be released together in contiguous chunks of memory.


Yes, it is, but it is not trivial to know which objects are to be released together and which are not. Things are only that simple in forum code snippets, not in production code, where you may have to call through 10 abstraction layers in 5 different libraries, with no control over how they manage memory internally. I agree you can reduce fragmentation by careful use of pools and custom region allocators, but it is not completely fixable. If you pushed really far to fight fragmentation, you'd end up with an informally specified, probably quite buggy and slow, half of a Java GC.

BTW: smart pointers have higher memory and CPU overhead than pointers in Java, especially in multithreaded code, so I guess this disqualifies them as an "efficient" solution to the fragmentation problem.
rapidcoder wrote:
in production code, where you may have to call through 10 abstraction layers in 5 different libraries, with no control over how they manage memory internally

Sounds like they need to revisit their architecture if this became a business-impacting concern.

To give a personal perspective, the production code I deal with includes a dozen or so custom C++ allocators, polished over at least a decade of use, and providing full control and more options than I can remember for low-level memory management through 50+ layers of abstraction.

Granted, smaller places have smaller needs, but C++ allocators (especially pools) were pretty much everywhere I've been, so suggestions that the C++ approach to memory management is inflexible sound at the very least surprising.

rapidcoder wrote:
What parameters can you tune for malloc/new? Nothing.

For me, there are six malloc() models to choose from at runtime: binary tree, red/black, per-thread heap, per-thread pool, pre-allocated buckets, thread-cached, with various fallbacks to other models under configurable conditions. No LD_PRELOADs needed (but they can certainly be used). Not everyone is using the same OS, just like not everyone is using the same JVM, I imagine.
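
Even stock glibc malloc has a few knobs, for that matter. A hedged sketch (glibc-specific, with arbitrary threshold values): mallopt() controls, among other things, when large allocations are served by mmap and when the heap top is trimmed back to the kernel.

#include <cstdlib>
#include <malloc.h> // mallopt() and the M_* constants - glibc

int main()
{
    // allocations above 64 kB are served by private mmaps,
    // which are returned to the OS immediately on free
    mallopt( M_MMAP_THRESHOLD, 64 * 1024 ) ;

    // trim the top of the heap back to the kernel once 128 kB
    // of contiguous free space accumulates there
    mallopt( M_TRIM_THRESHOLD, 128 * 1024 ) ;

    void* p = std::malloc( 256 * 1024 ) ; // mmap-ed under this configuration
    std::free(p) ;                        // and unmapped right away
}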

Look ma, see what all Rapidcoder has taught me in just one week:

1. Often you see code using the prefix increment of an int; all such code should simply be changed to use the postfix increment, which avoids a pipeline stall and *is* faster.

2. All memory allocation in all C++ programs uses the general-purpose malloc and free in the GNU C library. Of course, everything depends on the quality of the malloc/free implementation in glibc. ... This is like 1000x overhead. Switch to Java; memory usage is much more tunable there. Seriously, Java is much more memory friendly for programs that ...

3. (Therefore,) Firefox switched to the Doug Lea allocator.

4. In production code, in which you may have to call through 10 abstraction layers, in 5 different libraries, the allocation of memory for objects (of the same type) is done in a round-robin manner by these different layers and libraries.

5. There can be only one implementation of a smart pointer in all of C++, and it has higher memory and CPU overhead than pointers in every implementation of Java. Especially in multithreaded code. This disqualifies it (the one and only C++ smart pointer) as an "efficient" solution.


Rapidcoder, I'm extremely grateful. What else is new?