Shameful Java memory leak

I'm working on an assignment that requires parsing a very large text file. Switching from Scanner's token methods (next(), nextInt(), etc.) to reading with nextLine() and parsing the lines myself almost doubled the memory consumption of the entire program (almost 200 MB extra).
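To make the two parsing styles concrete, here is a minimal sketch that parses the same input both ways. The `"<name> <count>"` line format and the method names `viaTokens`/`viaLines` are hypothetical; the thread doesn't show the assignment's actual file format. The line-based version is the one that calls substring().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerVsNextLine {
    // Token style: let Scanner split on whitespace and convert the number.
    static List<String> viaTokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                String name = sc.next();
                int count = sc.nextInt();
                out.add(name + "=" + count);
            }
        }
        return out;
    }

    // Line style: read whole lines, then cut the fields out with substring().
    static List<String> viaLines(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                int space = line.indexOf(' ');
                if (space < 0) {
                    continue; // skip blank or malformed lines
                }
                String name = line.substring(0, space);
                int count = Integer.parseInt(line.substring(space + 1).trim());
                out.add(name + "=" + count);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "apple 3\nbanana 12\n";
        System.out.println(viaTokens(input)); // [apple=3, banana=12]
        System.out.println(viaLines(input));  // [apple=3, banana=12]
    }
}
```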

So I tracked down the issue, and it turns out that String.substring() is the cause.

The fix is this:
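String line;   // one line of the input, read with nextLine()
String s;
int a, b;      // start and end index of the field being extracted
// strangely, since substring() is supposed to return a new String anyway
s = new String(line.substring(a, b));
// instead of
s = line.substring(a, b);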
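
A minimal sketch of the same workaround inside a line-reading loop, assuming a hypothetical fixed-width field at columns [begin, end) of each line; `KeepSmallFields` and `extractFields` are illustrative names. In the older implementation, where substring() shared the parent string's character array, the String(String) constructor copied just the needed range, so a stored field stopped keeping the whole line reachable. Where substring() already copies (as in the source quoted later in the thread), the wrap only adds one extra String object.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class KeepSmallFields {
    // Collects the characters at [begin, end) of every line long enough to hold them.
    // The new String(...) wrap is the workaround from the post: on implementations
    // where substring() shared the line's array, it copied just the field so the
    // stored value no longer kept the whole line reachable.
    static List<String> extractFields(BufferedReader in, int begin, int end) {
        List<String> fields = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.length() < end) {
                    continue; // line too short to contain the field
                }
                fields.add(new String(line.substring(begin, end)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String data = "ID001 long trailing payload\nID002 more payload\nX\n";
        try (BufferedReader in = new BufferedReader(new StringReader(data))) {
            System.out.println(extractFields(in, 0, 5)); // [ID001, ID002]
        }
    }
}
```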


And as bad as it is that the Java standard API has such an egregious memory leak, what's worse is that it isn't even mentioned in the Java documentation; according to the standard, this isn't even incorrect usage.

It really makes you think about the consequences of a language where you aren't controlling the memory usage.

And what's especially shameful is that this bug was reported in 2001. According to the report in their bug database, they decided on a fix in 2012, 11 years later. However, I'm using javaSE-1.7.0_21, and the issue has definitely not been solved in this implementation.
closed account (1yR4jE8b)
Which version of Java are you using? I seem to recall some kind of String/substring memory leak getting fixed in an update a while back.
1.7.0_21
closed account (1yR4jE8b)
[removed because updated previous post]
> And as bad as it is that the Java standard API has such an egregious memory leak issue,
> what's worse is that it's not even documented in Java documentation

AFAIK, even if it may not be documented, this has been fairly widely known.
https://www.google.com/search?q=Java+substring+memory+leak
gives me 44,100 results.

I remember reading somewhere that this has been fixed in Java 7: the String.substring implementation was changed to behave as it does in C++ and return a new String (instead of a tuple <reference to containing string, offset, count>).
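To picture the difference, here is a toy model of the <array, offset, count> representation. `SharedSlice` is a hypothetical class, not `java.lang.String`: `viewSubstring` mimics the old sharing behavior, `copySubstring` mimics the copying behavior, and `retainedChars()` reports how large an array each result keeps reachable.

```java
import java.util.Arrays;

// Hypothetical model of the two substring strategies; NOT java.lang.String.
public final class SharedSlice {
    private final char[] value;  // backing array, possibly shared with other slices
    private final int offset;    // index of this slice's first char in value
    private final int count;     // number of chars in this slice

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    private void checkRange(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException(begin + ".." + end);
        }
    }

    // Old style: O(1), but the result keeps the entire parent array reachable.
    SharedSlice viewSubstring(int begin, int end) {
        checkRange(begin, end);
        return new SharedSlice(value, offset + begin, end - begin);
    }

    // New style: O(n) copy, and the result holds only the chars it needs.
    SharedSlice copySubstring(int begin, int end) {
        checkRange(begin, end);
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SharedSlice(copy, 0, copy.length);
    }

    // Size of the array this slice keeps alive.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice line = new SharedSlice("a very long line of input text");
        SharedSlice view = line.viewSubstring(2, 6);
        SharedSlice copy = line.copySubstring(2, 6);
        System.out.println(view + " retains " + view.retainedChars()); // very retains 30
        System.out.println(copy + " retains " + copy.retainedChars()); // very retains 4
    }
}
```

The view is cheap to create, which is why the old design was attractive, but a few short views cut from huge lines keep all of those lines in memory.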

closed account (1yR4jE8b)
This actually seems like a new bug. After checking the standard library source for the methods used, everything seems to check out at first glance, so it's probably a new bug.

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

// which calls this constructor

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

// which calls this method in java.util.Arrays

public static char[] copyOfRange(char[] original, int from, int to) {
    int newLength = to - from;
    if (newLength < 0)
        throw new IllegalArgumentException(from + " > " + to);
    char[] copy = new char[newLength];
    System.arraycopy(original, from, copy, 0,
                     Math.min(original.length - from, newLength));
    return copy;
}


He should report it.
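A small check of two details visible in the quoted source, assuming an OpenJDK-style implementation like the one above (these are implementation details, not guarantees of the `String` specification): a full-range substring returns the same object, and `new String(...)` always creates a distinct object with equal contents. It can't show memory use directly; it only confirms which objects get created.

```java
public class SubstringIdentity {
    public static void main(String[] args) {
        String line = "12345 some record text";

        // Full range: the quoted source returns `this`, so no new object is made.
        String whole = line.substring(0, line.length());
        System.out.println(whole == line);         // true on the quoted implementation

        // Partial range: a new String built from a copied array.
        String field = line.substring(0, 5);
        System.out.println(field.equals("12345")); // true
        System.out.println(field == line);         // false

        // The workaround from the first post: `new` always yields a distinct
        // object, even though the contents are equal.
        String wrapped = new String(field);
        System.out.println(wrapped == field);      // false
        System.out.println(wrapped.equals(field)); // true
    }
}
```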

Here is what I have found after some more experimentation:

Not using new consistently results in 370 to 390 MB of total memory use.

Using new usually results in 190 to 210 MB; however, about 1 out of 6 times, it still ends up using 370 to 390 MB.

Not using new, but adding a print statement for every scanned line, consistently results in about 210 to 250 MB of total usage.

Strange. I guess it's probably not the same bug after all, but rather the garbage collector deciding whether or not to collect yet. The memory is released when the application terminates, but if the program does retain an extra 190 MB or so at the start, it holds on to it for the rest of its execution.
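To repeat measurements like these from inside the program, one rough approach is to read the JVM's own heap counters through `Runtime`. The workload below (short fields cut from padded lines) and the `HeapSnapshot` name are hypothetical. `System.gc()` is only a request the JVM may ignore, and used heap includes garbage that hasn't been collected yet, so readings can vary between runs in much the way described above.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapSnapshot {
    // Bytes of heap currently in use according to the JVM.
    // This includes unreachable objects that have not been collected yet.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // System.gc() is only a hint, so this can still include uncollected garbage.
    static long usedBytesAfterGcHint() {
        System.gc();
        return usedBytes();
    }

    public static void main(String[] args) {
        // Build a 200-character padding string once (hypothetical line payload).
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            pad.append('x');
        }
        String padding = pad.toString();

        long before = usedBytesAfterGcHint();

        // Hypothetical workload: keep a short field cut from each long line.
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String line = "rec" + (i % 10) + " " + padding;
            fields.add(line.substring(0, 4));
        }

        long after = usedBytesAfterGcHint();
        System.out.printf("kept %d fields, heap delta about %d KB%n",
                fields.size(), (after - before) / 1024);
    }
}
```

Running with the standard `-verbose:gc` flag logs each collection, which can help tell whether a high reading simply coincides with a collection not having run yet.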