Explaining OutOfMemoryError on Overhead Limit Exceeded

There are several reasons for an application to fail with OutOfMemoryError. Most variants of OutOfMemoryError carry meaningful messages such as “Java heap space” or “Requested array size exceeds VM limit.” However, the message “GC overhead limit exceeded” often puzzles even experienced developers.

The JVM throws OutOfMemoryError with this message if an application spends more than 98% of its time collecting garbage while reclaiming less than 2% of the heap. The root cause is usually that the garbage collector disposes of objects at a lower rate than the application creates them, yet the JVM still manages to wade through for a while.
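
In HotSpot, the 98% threshold (and the matching 2% free-heap threshold) is configurable for the Parallel collector; the flags below show the defaults, and the last one turns the check off entirely. They are listed here only to illustrate where the limit comes from, not as a tuning recommendation:

-XX:GCTimeLimit=98
-XX:GCHeapFreeLimit=2
-XX:-UseGCOverheadLimit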

In this article, we’ll dig deeper into why this might occur and how we can account for this problem.

Reproducing Overhead Limit Exceeded Error

One of the main reasons for exceeding the overhead limit is a mismatch between the object creation rate and the collection rate. This type of OutOfMemoryError is usually a subtle and hard-to-reproduce problem in a production environment.

Reproducing this problem even for demonstration purposes is tricky. The main difficulty is striking a balance between mutators and collectors. If the creation rate is significantly higher, we immediately hit the heap boundary and get a plain “Java heap space” OutOfMemoryError. If, on the other hand, the garbage collector keeps up with the garbage, throughput drops but the application manages to slog through.

Another thing to consider is the type of garbage collector we’re using. ParallelGC reproduces “OutOfMemoryError: GC overhead limit exceeded” more reliably for our case. Because ParallelGC is a stop-the-world collector, it’s easier to depress throughput and grind the application to a halt, and it’s harder to run past the heap boundary first.

Let’s consider the following example:

public static void main(String[] args) {
    List<Integer> list = new ArrayList<>();
    int i = 0;
    while (true) {
        // every element stays reachable from the list, so nothing is ever reclaimed
        list.add(i);
        ++i;
    }
}

This code produces “OutOfMemoryError: Java heap space.” The reason is that although we add a single element at a time, under the hood ArrayList allocates increasingly longer arrays. Every time we hit the current capacity, the backing array grows by roughly half its previous length (for the default capacity of 10, that’s 10 → 15 → 22 → 33 → …):

private Object[] grow(int minCapacity) {
    int oldCapacity = elementData.length;
    if (oldCapacity > 0 || elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        int newCapacity = ArraysSupport.newLength(oldCapacity,
          minCapacity - oldCapacity, /* minimum growth */
          oldCapacity >> 1           /* preferred growth */);
        return elementData = Arrays.copyOf(elementData, newCapacity);
    } else {
        return elementData = new Object[Math.max(DEFAULT_CAPACITY, minCapacity)];
    }
}        

Thus, the JVM stops abruptly because it cannot allocate one of these ever-larger arrays. Getting “OutOfMemoryError: GC overhead limit exceeded” with ArrayList is still possible but requires more setup and tuning. It’s easier with a LinkedList:

public static void main(String[] args) {
    List<Integer> list = new LinkedList<>();
    int i = 0;
    while (true) {
        // each add() allocates a small Node (plus a boxed Integer) on the heap
        list.add(i);
        ++i;
    }
}

An ArrayList keeps its data in an internal array. When it runs out of space, the default behavior is to grow that array by about half of its current length, claiming more memory. This way, we approach the heap limit in ever-increasing leaps and eventually fail with “Java heap space” because a single large allocation cannot be satisfied.

With a LinkedList, by contrast, we allocate one small Node per iteration and creep up on the heap limit in tiny steps. Near the limit, the garbage collector thrashes, running back-to-back full collections that reclaim almost nothing, and we get “OutOfMemoryError: GC overhead limit exceeded.”
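
Roughly speaking, each add() on a LinkedList boils down to allocating one new node and rewiring a couple of references. The sketch below is a simplified paraphrase of the JDK’s LinkedList.linkLast(), not the exact source:

void linkLast(E e) {
    final Node<E> last = this.last;
    final Node<E> newNode = new Node<>(last, e, null); // one small allocation per add()
    this.last = newNode;
    if (last == null) {
        first = newNode;   // the list was empty
    } else {
        last.next = newNode;
    }
    size++;
}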

To get OutOfMemoryError faster, we can also decrease the size of the heap. Our VM options might look like this:

-Xmx100m -XX:+UseParallelGC        

Root Causes

In real life, a system such as a retail website might perform well on regular days but struggle on weekends when the number of customers increases. Request spikes can push the allocation rate past what the garbage collector can keep up with and trigger this problem.

We can also cause this by implementing finalizers. Extra logic in a finalizer prolongs the object’s lifetime, and reclaiming such an object takes at least two garbage collection cycles. However, as mentioned previously, both scenarios need a very tight balance.
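
Here is a minimal sketch of the pattern (the class name and payload size are ours, purely for illustration). Any instance of this class must first be queued for finalization and survive one collection before a later cycle can reclaim it:

class SlowToDie {
    private final byte[] payload = new byte[4096]; // kept alive until finalization completes

    @Override
    @SuppressWarnings("deprecation")
    protected void finalize() {
        // Any work here runs on the JVM's finalizer thread; while objects
        // wait in that queue, their memory cannot be reclaimed.
        expensiveCleanup();
    }

    private void expensiveCleanup() {
        // stand-in for logging, closing handles, etc.
    }
}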

Apart from the message in the error logs, we can recognize this problem by a particular garbage collector behavior. GCeasy helps us to analyze the garbage collector logs, and we might see the following picture:


We can notice a specific pattern of repeated garbage collection cycles: although the collector runs constantly, it cannot reclaim space and just keeps trying. Spotting this pattern is the first step in identifying the issue.

1. Plain Old Memory Leak

Our examples used a simple memory leak to replicate this behavior. However, as mentioned previously, this requires careful setup: the object creation rate must be tuned, the garbage collector has to behave in a certain way, and we need to approach the heap limit steadily.

We can analyze a heap dump to get more information about the heap’s state at the moment of failure. The VM option -XX:+HeapDumpOnOutOfMemoryError creates a dump when OutOfMemoryError is thrown, providing us with valuable information. It’s good practice to configure automatic heap dumps: they help with troubleshooting and cost us next to nothing.
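
For our demo, the full set of VM options might look like this; the dump path is our own choice (without -XX:HeapDumpPath, the dump lands in the working directory as java_pid<pid>.hprof):

-Xmx100m -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps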

In the case of a memory leak, we can analyze the heap dump using HeapHero. We might see a bunch of objects accumulating in the heap:


Careful analysis of these objects usually helps to find the problem. After identifying the culprit object, we can track the creation-reclamation process and find the bug in our system. Luckily for us, memory leaks are generally more straightforward to reproduce.

2. Overridden Finalizers

Another possible reason for exceeding the overhead limit is a problem with finalizers. In this case, the issue arises not from running out of heap space outright, but from the gap between the production and collection rates.

HeapHero helps us with this as well. We just need to check a different section:


If many objects are waiting in the finalizer queue, it usually means something in our application is preventing the garbage collector from doing its job quickly. Overridden finalizers might be the culprit.

Unlike with a memory leak, we would see many unreachable objects waiting for collection. Thus, we don’t have a memory leak per se; a slower reclamation rate keeps the objects in memory. We can think of this situation as a traffic jam: cars are moving, but very slowly.

Often, just reading through the implementation of the suspect classes is enough to identify the issue. Linters and static analysis tools can help as well, and IDEs can also draw our attention to it, since the finalize() method has been deprecated since Java 9.
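
If cleanup logic is genuinely needed, java.lang.ref.Cleaner (available since Java 9) is the usual replacement. Below is a minimal sketch (class names are ours) in which the cleanup action holds no reference back to the resource, so it cannot delay reclamation the way a finalizer does:

import java.lang.ref.Cleaner;

class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action is a static nested class so it cannot capture
    // the NativeResource instance and keep it reachable.
    private static final class ReleaseAction implements Runnable {
        @Override
        public void run() {
            // release native memory, close handles, etc.
        }
    }

    private final Cleaner.Cleanable cleanable = CLEANER.register(this, new ReleaseAction());

    @Override
    public void close() {
        cleanable.clean(); // deterministic cleanup; the Cleaner acts only as a safety net
    }
}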

3. Slow Threads

Another reason for such a problem might be slow garbage-collection-related threads. Finalizers, for instance, run on a dedicated finalizer thread that does not run at maximum priority, so when it cannot keep up, fewer CPU cycles effectively go into reclaiming memory. To identify such a problem, we need additional tools, such as fastThread, which show us the state and number of threads working in our application:


However, there is no built-in way to create a thread dump automatically on OutOfMemoryError. Luckily, we can use the yCrash 360° tool to monitor the health of the application throughout its lifespan:


Technically, we can combine the yCrash 360° tool with -XX:OnOutOfMemoryError, but while the application is dying, it’s sometimes challenging to collect meaningful information from it.
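
As a hedged illustration of the flag itself (independent of yCrash): the JVM can run an arbitrary command when the error is first thrown, substituting %p with its own process id. Sending SIGQUIT this way makes the JVM print a thread dump to its standard output, provided it is still responsive:

-XX:OnOutOfMemoryError="kill -3 %p"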

Conclusion

There are several types of OutOfMemoryError, and each points to a different problem in our application. Each type therefore requires its own tools, approaches, and solutions to track down the root cause.

yCrash provides various tools to help us identify and resolve the problems we might face with application performance and memory management. Good practices, benchmarking, and monitoring help us to avoid hard-to-debug issues, missing SLAs, and it-worked-in-dev situations.
