9 Tips to Reduce Long Garbage Collection Pauses
Automatic garbage collection is one of the core features of the Java programming language. It lets developers focus on writing business logic instead of de-allocating objects, which they had to do in predecessor languages (C, C++). However, automatic garbage collection pauses your application to remove unreferenced objects from memory.
During this pause period, no customer transactions will be processed. These pauses can range anywhere from a few milliseconds to several seconds (sometimes even minutes). When you reduce this garbage collection pause time, your application’s overall throughput and response time improve significantly. Besides that, reducing GC pause time offers several other benefits. In this post, I share 9 tips that will help reduce your GC pause times.
Let’s review these tips and their benefits.
1. Start Tuning from Scratch
Have you looked at the JVM arguments that are configured for your application? When you do, the following questions will arise: What are they? What do they do? Are they even relevant? Be advised that there are 600+ JVM arguments related to JVM memory and garbage collection. Your application has probably accumulated its share of arguments over the years. In several applications, contrary to common belief, these unfamiliar arguments turn out to be counterproductive, degrading application performance instead of enhancing it. These arguments might have been relevant when your application first went live (5-10 years ago). Since then, your traffic volume has likely changed, some of the arguments might have been deprecated, and their default values could have changed. Thus, carrying old JVM arguments can result in counterproductive GC behavior.
It’s highly recommended to remove all old arguments and start tuning from scratch. The JVM itself has many heuristics and enough intelligence to auto-tune itself. Start with the -Xmx (heap size) argument, study the JVM’s performance behavior, and only then add new JVM arguments.
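Before removing anything, it helps to see which arguments the running JVM actually received. The following is a minimal sketch, not part of the original case study: it uses the standard RuntimeMXBean to list the startup arguments and keeps those beginning with -X (which covers both -X and -XX flags). The class name JvmArgumentAudit and the helper tuningFlags are hypothetical names chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists the -X / -XX flags the running JVM was started
// with, so that each one can be reviewed (and justified) individually.
final class JvmArgumentAudit {

    // Keeps only arguments that look like JVM tuning flags (-X... or -XX:...).
    static List<String> tuningFlags(List<String> allArguments) {
        return allArguments.stream()
                .filter(arg -> arg.startsWith("-X"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main().
        List<String> inputArguments =
                ManagementFactory.getRuntimeMXBean().getInputArguments();

        List<String> flags = tuningFlags(inputArguments);
        System.out.println("Tuning flags in effect: " + flags.size());
        flags.forEach(flag -> System.out.println("  " + flag));
    }
}
```

Running this inside the application (or a copy of it started with the same launch script) gives a concrete list to question one flag at a time.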
Here is a real case study from CloudBees, the parent company behind Jenkins, on how they started GC tuning from scratch and improved GC performance by 3500%.
2. Resize Heap
In most applications, heap size is either under-allocated or over-allocated. When heap size is under-allocated, GCs will run more frequently, degrading the application’s performance.
Here is a real case study of an insurance application, which was configured to run with an 8GB heap size (-Xmx). This heap size wasn’t sufficient to handle the incoming traffic, so the garbage collector was running back-to-back. As we know, whenever a GC event runs, it pauses the application. Thus, when GC events ran back-to-back, pause times got stretched and the application became unresponsive in the middle of the day. Upon observing this behavior, the heap size was increased from 8GB to 12GB. This change reduced the frequency of GC events and significantly improved the application’s overall availability.
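To judge whether a heap is under-allocated, one rough signal is how close heap occupancy sits to the -Xmx limit. The sketch below is an illustration under simple assumptions, not the method used in the case study: it reads the limits from the standard Runtime API, and the class name HeapHeadroom and the helper usedPercent are hypothetical.

```java
// Hypothetical helper: reports how much of the maximum heap (-Xmx) is in use.
final class HeapHeadroom {

    // Percentage of the maximum heap currently occupied.
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        long max = runtime.maxMemory();                          // upper bound set by -Xmx
        long used = runtime.totalMemory() - runtime.freeMemory(); // occupied part of the committed heap

        System.out.println("Heap used: " + (used >> 20) + " MB of " + (max >> 20)
                + " MB (" + Math.round(usedPercent(used, max)) + "%)");
    }
}
```

A single reading says little; sampling it periodically alongside the GC log shows whether occupancy stays near the limit even right after collections, which is the situation described above.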
3. Right GC Algorithm
The garbage collection algorithm plays a pivotal role in influencing GC pause times. As of now, there are 7 GC algorithms in OpenJDK: Serial GC, Parallel GC, CMS GC, G1 GC, Shenandoah GC, ZGC, Epsilon. This raises the question: ‘How do I choose the right GC algorithm for my application?’
The above flow chart will help you to identify the right GC algorithm for your application. You may also refer to this detailed post which highlights the capabilities, advantages and disadvantages of each GC algorithm.
Here is a real-world case study of an application that was used in warehouses to control the robots for shipments. This application was running with the CMS GC algorithm and suffered from long GC pause times of up to 5 minutes. Yes, you read that correctly: it’s 5 minutes, not 5 seconds. During this 5-minute window, robots weren’t receiving instructions from the application, which caused a lot of chaos. When the GC algorithm was switched from CMS GC to G1 GC, the pause time instantly dropped from 5 minutes to 2 seconds. This GC algorithm change made a big difference in improving the warehouse’s delivery.
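Before switching algorithms, it is worth confirming which collector the JVM is actually running, since the default depends on the JDK version and the machine. The sketch below is a hedged illustration using the standard GarbageCollectorMXBean API; the class name ActiveCollectors and the helper collectorNames are hypothetical. The bean names are implementation-specific (HotSpot with G1, for example, reports "G1 Young Generation" and "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows which collectors are active in this JVM and how
// much work they have done so far.
final class ActiveCollectors {

    // Names of the collectors as reported by the JVM (implementation-specific).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```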
4. Adjust Internal Memory Regions Size
JVM memory has the following internal memory regions:
a. Young Generation
b. Old Generation
c. MetaSpace
d. Others
You can visit this video post to learn about the different JVM memory regions. Changing the sizes of the internal memory regions can also improve GC pause times. Here is a real case study of an application that was suffering from an average GC pause time of 12.5 seconds. This application’s Young Generation size was configured at 14.65GB, and the Old Generation size was also configured at 14.65GB. Upon reducing the Young Generation size to 1GB, the average GC pause time dropped to 138 ms, which is a 98.9% improvement.
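To see how the regions are currently sized before changing them, the JVM exposes each region as a memory pool. The following is a minimal sketch using the standard MemoryPoolMXBean API; the class name MemoryRegions and the helper describe are hypothetical, and the pool names printed depend on the GC algorithm in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints the current and maximum size of each memory pool.
final class MemoryRegions {

    // Formats one pool; a negative max means the JVM reports no defined limit.
    static String describe(String name, long usedBytes, long maxBytes) {
        String max = maxBytes < 0 ? "undefined" : (maxBytes >> 20) + " MB";
        return name + ": used=" + (usedBytes >> 20) + " MB, max=" + max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(describe(pool.getName(), usage.getUsed(), usage.getMax()));
        }
    }
}
```

Comparing the young and old generation pools in this output against the pause times in the GC log is one way to spot a lopsided split like the one in the case study.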
5. Tune GC Algorithm Settings
Garbage collection pause time is influenced by the specific JVM arguments you configure. As we mentioned in ‘Tip #1 Start Tuning from Scratch’, there are 600+ JVM arguments related to memory and GC settings. It’s a tedious task for anyone to choose the right arguments from this lengthy, poorly documented list. Thus, we have curated fewer than a handful of JVM arguments for each GC algorithm and given them below. Use the arguments pertaining to your GC algorithm to optimize the GC pause time.
6. Address the Causes of GC Events
GC events are triggered by various causes, such as Allocation Failure, Promotion Failure, Evacuation Failure,… These causes are reported in the GC log file. When you analyze the GC log file using tools like GCeasy, it will present you with a consolidated summary of the causes, as shown in the figure below.
By studying these reasons, you can tune the GC settings accordingly. In fact, the GCeasy tool provides recommendations for GC settings that need to be adjusted based on these GC causes. You can try implementing those recommended settings.
Here is a real case study of an application that was suffering from long GC pauses due to Allocation Failures. Allocation Failures occur when there isn’t sufficient memory in the young generation to create new objects. Thus, the team adjusted the young generation size and saw a dramatic reduction in their GC pause time.
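For a quick look at the causes without a dedicated tool, the cause labels can be tallied directly from the log. The sketch below is a rough illustration, not how GCeasy works: it assumes pause lines in the style of JDK unified logging (-Xlog:gc), where the cause appears in parentheses just before the heap sizes. The class name GcCauseTally is hypothetical, the sample lines are made-up values, and causes that themselves contain parentheses are not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how often each GC cause appears in pause lines.
final class GcCauseTally {

    // Captures the last parenthesised label before the "before->after" heap sizes.
    private static final Pattern PAUSE_CAUSE =
            Pattern.compile("Pause .*\\(([^)]+)\\)\\s+\\d+[KMG]->");

    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher matcher = PAUSE_CAUSE.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> sample = Arrays.asList(
                "[0.015s][info][gc] Using Parallel",
                "[1.204s][info][gc] GC(0) Pause Young (Allocation Failure) 62M->9M(245M) 7.310ms",
                "[2.871s][info][gc] GC(1) Pause Young (Allocation Failure) 71M->12M(245M) 8.022ms",
                "[4.015s][info][gc] GC(2) Pause Full (Metadata GC Threshold) 30M->10M(245M) 40.100ms");

        tally(sample).forEach((cause, count) -> System.out.println(cause + ": " + count));
    }
}
```

A cause that dominates the tally, such as Allocation Failure in the case study above, points at the region or setting to examine first.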
7. Disable Explicit GC
Your own application code or third-party libraries/frameworks that are running in your application can invoke the System.gc() API call. When this API is invoked, a Full GC event is triggered in your application. Such explicit GC calls are not advisable as they add needless overhead to the application.
Consider this scenario: say your memory fills up and the JVM triggers a GC event, and right after that your application code calls System.gc(). Now two GC events are triggered back-to-back. The second GC event (triggered by the application code) would reclaim few, if any, objects because the first event has already reclaimed the unreferenced objects. Still, the second GC event pauses your application unnecessarily.
To prevent this unnecessary overhead, you can silence explicit System.gc() calls by passing either one of the following JVM arguments:
a. -XX:+DisableExplicitGC: This JVM argument will silence all System.gc() calls invoked anywhere in your application stack.
b. -XX:+ExplicitGCInvokesConcurrent: This JVM argument allows System.gc() calls to trigger GC; however, the GC event will run concurrently with the application threads, minimizing the impact on GC pause times and maintaining application responsiveness.
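To check whether explicit calls are actually producing collections in your environment, you can compare the JVM's collection counters before and after a System.gc() call. This is a small probe written for illustration, with a hypothetical class name ExplicitGcProbe; the counters come from the standard GarbageCollectorMXBean API. When the JVM is started with -XX:+DisableExplicitGC, the reported difference is expected to be 0; otherwise it is typically 1 or more, though an unrelated collection can coincide with the call.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: measures how many collections a System.gc() call causes.
final class ExplicitGcProbe {

    // Sum of collection counts across all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // a request; the JVM may ignore it depending on its flags
        long after = totalCollections();
        System.out.println("Collections triggered by System.gc(): " + (after - before));
    }
}
```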
8. Allocate Sufficient System Capacity
Garbage collection performance can sometimes suffer due to insufficient system-level resources such as threads, CPU, and I/O. GC log analysis tools like GCeasy identify these limitations by examining the following two patterns in your GC log files:
a. Sys time > User Time: This pattern indicates that the GC event is spending more time on kernel-level operations (system time) compared to executing user-level code. This could be a sign that your application is facing high contention for system resources, which can hinder GC performance. For more details, you can refer to this article.
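b. Real Time > Sys Time + User Time: This pattern means that the elapsed wall-clock time exceeds the combined CPU time (system time plus user time). Because GC work is normally spread across several threads, combined CPU time usually exceeds wall-clock time. When the reverse holds, the GC threads spent part of the pause waiting rather than running, which suggests the system is overburdened, possibly due to insufficient CPU resources, heavy I/O activity, or a lack of GC threads. You can find more information about this pattern.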
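Both patterns are built from the same three figures that the JVM prints for each GC event. The sketch below is a hedged illustration, not GCeasy's implementation: it assumes the JDK 8 style "[Times: user=... sys=..., real=... secs]" suffix, extracts the three values, and exposes the two quantities being compared (whether sys exceeds user, and the combined CPU time to set against real time). The class name GcTimes and the sample line are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts user/sys/real times from one GC log line.
final class GcTimes {

    // Assumes the JDK 8 style suffix: [Times: user=0.10 sys=0.30, real=0.25 secs]
    private static final Pattern TIMES = Pattern.compile(
            "user=(\\d+\\.\\d+)\\s+sys=(\\d+\\.\\d+),\\s*real=(\\d+\\.\\d+)");

    final double user;
    final double sys;
    final double real;

    private GcTimes(double user, double sys, double real) {
        this.user = user;
        this.sys = sys;
        this.real = real;
    }

    // Returns the parsed times, or empty when the line carries no Times section.
    static Optional<GcTimes> parse(String line) {
        Matcher matcher = TIMES.matcher(line);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(new GcTimes(
                Double.parseDouble(matcher.group(1)),
                Double.parseDouble(matcher.group(2)),
                Double.parseDouble(matcher.group(3))));
    }

    // Pattern (a): more time spent in the kernel than in user-level code.
    boolean sysExceedsUser() {
        return sys > user;
    }

    // Combined CPU time, to be compared against real (wall-clock) time.
    double cpuTime() {
        return user + sys;
    }

    public static void main(String[] args) {
        // Made-up sample line for illustration only.
        String line = "[Times: user=0.10 sys=0.30, real=0.25 secs]";
        parse(line).ifPresent(t -> System.out.println(
                "sys > user: " + t.sysExceedsUser()
                        + ", cpu=" + t.cpuTime() + "s, real=" + t.real + "s"));
    }
}
```

Scanning a whole log with this and counting how often each condition holds gives a rough idea of whether the issue is occasional or systematic.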
To address these system-level limitations, consider taking one of the following actions:
9. Reduce Object Creation Rate
There is a famous saying from the Chinese classic ‘The Art of War’: ‘The greatest victory is that which requires no battle’. Similarly, instead of focusing on tuning the GC events, it is more efficient to prevent GC events from running in the first place. The amount of time spent in garbage collection is closely tied to the number of objects the application creates. If the application creates more objects, GC events are triggered more frequently. Conversely, if the application creates fewer objects, fewer GC events will be triggered.
By profiling your application’s memory using tools like HeapHero, you can identify the memory bottlenecks and fix them. Reducing memory consumption will, in turn, reduce the GC impact on your application. Reducing the object creation rate is a tedious and time-consuming process, as it involves studying your application, identifying the bottlenecks, refactoring the code, and thoroughly testing it. Still, it’s well worth the effort in the long run, as it leads to significant improvements in application performance and more efficient resource usage.
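As one small, well-known example of the kind of refactoring involved, accumulating into a boxed Long can allocate a new object on most iterations, while a primitive long allocates nothing. The sketch below is illustrative only, with a hypothetical class name AllocationExample; it is not taken from any of the case studies, and the JIT compiler may optimize away some of these allocations in practice.

```java
// Hypothetical example: two ways to compute the same sum with very different
// allocation behavior.
final class AllocationExample {

    // Autoboxing: each "+=" unboxes, adds, and boxes the result again, which
    // can create a new Long object for values outside the small-value cache.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: no objects are created inside the loop.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("boxed:     " + sumBoxed(n));
        System.out.println("primitive: " + sumPrimitive(n));
    }
}
```

Both methods return the same result; the difference only shows up in the allocation rate, which is what a memory profiler would surface.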
Conclusion
Tuning GC performance often provides more significant rewards than tuning other aspects of your application. It’s a lightweight, non-intrusive approach to improving your application’s performance. Hopefully the tips shared in this post are helpful to you. If you have additional tips to share or any interesting GC tuning experience, please mention them in the comments section.