Mastering JVM Garbage Collection Threads: VM Options Simplified
The garbage collection phase is crucial to any Java application. The main goal is maintaining the ever-evasive balance of high throughput and low latency. By configuring the garbage collector, we can increase performance or at least nudge the application in a particular direction.
Shorter garbage collection cycles are better. Thus, more resources allocated to the garbage collector would make it work faster and improve our application overall. Allocating more resources to a garbage collector is a reasonable idea, but it’s not as simple as it sounds.
In this article, we’ll learn about the effect of the number of threads on application performance. This article concentrates on applications that run directly on a host. However, we can also apply the insights to containerized applications.
Garbage Collection Threads
Usually, a garbage collector uses several threads to facilitate the correct and timely collection process. However, we’re generally interested in two types of threads connected to it: parallel and concurrent threads.
Please note that some algorithms might use more threads for different purposes. For example, the G1 garbage collector uses additional threads for some internal tasks.
1. Concurrent Threads
The names of these threads are self-explanatory. Garbage collectors use concurrent threads during concurrent phases. For example, CMS would use them to clean the old generation. Providing more threads can help reduce the application’s latency.
At the same time, fully STW (stop-the-world) garbage collectors won’t use them because they don’t have concurrent phases.
2. Parallel Threads
These threads come into play during the STW phases. Despite the name, they don’t run in parallel with an application. Instead, they run in parallel with each other to reduce the garbage collection pauses.
Thus, more parallel threads should decrease the STW time. However, this is a hypothesis we need to verify by running appropriate benchmarks. The JVM uses reasonable heuristics to identify the default number of garbage collection threads based on the number of CPU cores. The idea is quite simple: more cores means more threads:
On a machine with N hardware threads where N is greater than 8, the parallel collector uses a fixed fraction of N as the number of garbage collector threads. The fraction is approximately 5/8 for large values of N. At values of N below 8, the number used is N. On selected platforms, the fraction drops to 5/16.
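Thus, on an eight-core machine, we’ll have eight collector threads. On a machine with thirty-two cores, the 5/8 fraction suggests about twenty threads, although HotSpot applies the fraction only to the cores beyond the eighth, 8 + (32 - 8) × 5/8, which gives twenty-three.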
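The heuristic can be sketched in a few lines of Java. The formulas below are assumptions modeled on HotSpot’s ergonomics (the 5/8 fraction applied only to cores beyond the eighth, and G1 deriving its concurrent workers from the parallel ones). The class and method names are hypothetical, and other JDK versions or platforms may compute different values.

```java
// A sketch of the default GC thread heuristics. The formulas are assumptions
// based on HotSpot's behavior and may differ between JDK versions and platforms.
class GcThreadDefaults {

    // Parallel (STW) workers: one per core up to 8 cores,
    // then 5/8 of a worker for every additional core.
    static int parallelThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    // G1 concurrent workers: roughly a quarter of the parallel workers, at least one.
    static int g1ConcurrentThreads(int parallelThreads) {
        return Math.max((parallelThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelThreads(cpus);
        System.out.printf("cpus=%d parallel=%d concurrent=%d%n",
                cpus, parallel, g1ConcurrentThreads(parallel));
    }
}
```

Comparing the printed values with the output of -XX:+PrintFlagsFinal on the same machine is a quick way to check whether the sketch matches a particular JVM.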
Manual Configuration
While the JVM provides reasonable heuristics regarding the default number of threads, sometimes we want to override them. We can configure the number of threads manually using VM options. Explicit configuration provides more control over an application and allows us to replace the defaults with more appropriate values.
We can use a couple of flags to configure the number of threads. As we discussed, the garbage collection process usually uses two types of threads: concurrent and parallel; we have separate parameters to control them:
-XX:ParallelGCThreads=<number>
-XX:ConcGCThreads=<threads>
The main problem with manual configuration is that we should deeply understand our application: not exactly the domain logic and the connection between classes, but resource consumption and processes that affect the heap.
We also must understand the JVM and the garbage collection algorithm our application uses. A poor understanding might result in decreased performance or in settings that do nothing. For example, adding more concurrent threads to SerialGC won’t affect it at all.
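To see which values a running JVM actually settled on, we can read the flags at runtime. The following is a minimal sketch, assuming a HotSpot-based JVM where the com.sun.management.HotSpotDiagnosticMXBean interface is available. The class name GcThreadFlags is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Reads the effective GC thread flags from the running JVM (HotSpot only).
class GcThreadFlags {

    // Looks up a VM option by name; throws IllegalArgumentException for unknown flags.
    static VMOption flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ParallelGCThreads", "ConcGCThreads"}) {
            VMOption option = flag(name);
            // The origin tells us whether the value came from the command line
            // or was chosen by the JVM itself.
            System.out.printf("%s=%s (origin: %s)%n",
                    name, option.getValue(), option.getOrigin());
        }
    }
}
```

Running it with and without -XX:ParallelGCThreads=<number> shows how the reported origin changes when we override the default.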
Thread Count
Let’s check garbage collection threads using a simple application. However, we won’t see all the threads on an idle application. Thus, we need to stress the garbage collector so it will try to utilize all the resources:
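import java.util.LinkedList;

public class GcThreadsOnOutOfMemoryErrorBenchmark {
    public static void main(String[] args) {
        // Keep every string reachable so the heap fills up and the GC is stressed.
        LinkedList<String> strings = new LinkedList<>();
        while (true) {
            strings.add(new String("Hello World!!!!"));
        }
    }
}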
This code would result in an OutOfMemoryError and crash the application. However, this is what we need to force it to use all available garbage collection threads. To create a thread dump at the moment of OutOfMemoryError, we can use the following VM option, which sends SIGQUIT to the JVM process:
-XX:OnOutOfMemoryError="kill -3 %p"
Let’s change the number of concurrent threads to two and the number of parallel threads to eight. We can use -XX:+PrintCommandLineFlags to verify the parameters:
-XX:ConcGCThreads=2 -XX:G1ConcRefinementThreads=8 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=268435456 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=268435456 -XX:MinHeapSize=6815736 -XX:OnOutOfMemoryError=kill -3 %p -XX:ParallelGCThreads=8 -XX:+PrintCommandLineFlags -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
According to the previous listing of VM flags, we’ll be using G1GC, which lets us see both concurrent and parallel threads. Also, we can see additional garbage collection threads that are specific to this algorithm:
Threads class SMR info:
_java_thread_list=0x0000600002d0ae80, length=12, elements={
0x0000000142016c00, 0x000000014380dc00, 0x000000014380b800, 0x00000001420d1600,
0x00000001420cf600, 0x000000014180b000, 0x00000001420d3400, 0x00000001420d6800,
0x000000014180d800, 0x0000000143009800, 0x0000000141812a00, 0x0000000141813000
}
"main" #1 [8451] prio=5 os_prio=31 cpu=68.74ms elapsed=1.41s tid=0x0000000142016c00 nid=8451 runnable [0x000000016d7d6000]
java.lang.Thread.State: RUNNABLE
"Reference Handler" #8 [31747] daemon prio=10 os_prio=31 cpu=0.43ms elapsed=1.40s tid=0x000000014380dc00 nid=31747 runnable [0x000000016e742000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #9 [22531] daemon prio=8 os_prio=31 cpu=0.14ms elapsed=1.40s tid=0x000000014380b800 nid=22531 in Object.wait() [0x000000016e94e000]
java.lang.Thread.State: WAITING (on object monitor)
"Signal Dispatcher" #10 [31235] daemon prio=9 os_prio=31 cpu=0.07ms elapsed=1.40s tid=0x00000001420d1600 nid=31235 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" #11 [23043] daemon prio=9 os_prio=31 cpu=0.15ms elapsed=1.40s tid=0x00000001420cf600 nid=23043 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Monitor Deflation Thread" #12 [30723] daemon prio=9 os_prio=31 cpu=0.03ms elapsed=1.40s tid=0x000000014180b000 nid=30723 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #13 [23555] daemon prio=9 os_prio=31 cpu=18.32ms elapsed=1.40s tid=0x00000001420d3400 nid=23555 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"C1 CompilerThread0" #16 [24067] daemon prio=9 os_prio=31 cpu=18.20ms elapsed=1.40s tid=0x00000001420d6800 nid=24067 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"Sweeper thread" #17 [30211] daemon prio=9 os_prio=31 cpu=0.04ms elapsed=1.40s tid=0x000000014180d800 nid=30211 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Common-Cleaner" #18 [29699] daemon prio=8 os_prio=31 cpu=0.20ms elapsed=1.40s tid=0x0000000143009800 nid=29699 waiting on condition [0x000000016f7a2000]
java.lang.Thread.State: TIMED_WAITING (parking)
"Monitor Ctrl-Break" #19 [24835] daemon prio=5 os_prio=31 cpu=14.32ms elapsed=1.37s tid=0x0000000141812a00 nid=24835 runnable [0x000000016f9ae000]
java.lang.Thread.State: RUNNABLE
"Notification Thread" #20 [25347] daemon prio=9 os_prio=31 cpu=0.04ms elapsed=1.37s tid=0x0000000141813000 nid=25347 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"VM Thread" os_prio=31 cpu=16.53ms elapsed=1.41s tid=0x000000014170e060 nid=19971 runnable
"GC Thread#0" os_prio=31 cpu=619.54ms elapsed=1.41s tid=0x0000000141706da0 nid=12547 runnable
"GC Thread#1" os_prio=31 cpu=672.72ms elapsed=1.36s tid=0x000000014161b6d0 nid=28931 runnable
"GC Thread#2" os_prio=31 cpu=559.43ms elapsed=1.36s tid=0x000000014161bb70 nid=25859 runnable
"GC Thread#3" os_prio=31 cpu=914.80ms elapsed=1.36s tid=0x000000014161c010 nid=26115 runnable
"GC Thread#4" os_prio=31 cpu=613.24ms elapsed=1.36s tid=0x000000014161c4b0 nid=28163 runnable
"GC Thread#5" os_prio=31 cpu=491.98ms elapsed=1.36s tid=0x000000014161c950 nid=27907 runnable
"GC Thread#6" os_prio=31 cpu=844.52ms elapsed=1.36s tid=0x000000014161cdf0 nid=27395 runnable
"GC Thread#7" os_prio=31 cpu=662.01ms elapsed=1.36s tid=0x000000014161d290 nid=27139 runnable
"G1 Main Marker" os_prio=31 cpu=0.13ms elapsed=1.41s tid=0x0000000141707470 nid=14339 runnable
"G1 Conc#0" os_prio=31 cpu=17.27ms elapsed=1.41s tid=0x0000000141707d10 nid=13827 runnable
"G1 Conc#1" os_prio=31 cpu=17.35ms elapsed=1.09s tid=0x000000014161dc30 nid=43267 runnable
"G1 Refine#0" os_prio=31 cpu=1.00ms elapsed=1.41s tid=0x0000000141709ba0 nid=16643 runnable
"G1 Service" os_prio=31 cpu=0.41ms elapsed=1.41s tid=0x000000014170a4d0 nid=21507 runnable
"VM Periodic Task Thread" os_prio=31 cpu=0.22ms elapsed=1.37s tid=0x000000014161a930 nid=25603 waiting on condition
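While manual configuration might be beneficial, it also can lead to significant problems. We can avoid mistakes by combining automatic calculation with limitations. The following flags, which are Eclipse OpenJ9 options, let the JVM calculate the number of threads automatically while allowing us to set an upper limit. On HotSpot, the closest equivalent is -XX:+UseDynamicNumberOfGCThreads, with -XX:ParallelGCThreads acting as the upper bound:
-XX:+AdaptiveGCThreading
-XX:ParallelGCMaxThreads=<number>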
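Whichever flags we choose, we can confirm their effect by counting the worker threads in a thread dump like the one shown earlier in this section. The following is a minimal sketch, assuming the dump was saved to a text file and that thread names follow the G1 naming seen above ("GC Thread#" for parallel workers and "G1 Conc#" for concurrent ones). The class name GcThreadCounter is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Counts GC worker threads in a saved thread dump by their name prefix.
class GcThreadCounter {

    // Counts dump lines whose quoted thread name starts with the given prefix.
    static long count(List<String> dumpLines, String namePrefix) {
        String quotedPrefix = "\"" + namePrefix;
        return dumpLines.stream()
                .filter(line -> line.startsWith(quotedPrefix))
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GcThreadCounter.java <thread-dump-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("parallel workers:   " + count(lines, "GC Thread#"));
        System.out.println("concurrent workers: " + count(lines, "G1 Conc#"));
    }
}
```

For the dump above, the counts should match the configured values: eight parallel workers and two concurrent ones.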
Thread Dump Analysis
We can take thread dumps of a running application, as they don’t create much overhead. It’s better to take several thread dumps with a short pause between them, for example, ten seconds.
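Taking several dumps with a pause between them can be sketched from inside the application using the standard ThreadMXBean. Note the assumption and its limit: this API reports Java threads only, so the GC worker threads still require an external dump such as kill -3 or jstack. The class name PeriodicThreadDumps is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Captures a series of in-process thread dumps with a fixed pause between them.
class PeriodicThreadDumps {

    static List<ThreadInfo[]> capture(int count, long pauseMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<ThreadInfo[]> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // false, false: skip lock details to keep each snapshot cheap.
            dumps.add(threads.dumpAllThreads(false, false));
            if (i < count - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        // Three dumps, ten seconds apart.
        List<ThreadInfo[]> dumps = capture(3, 10_000);
        for (int i = 0; i < dumps.size(); i++) {
            System.out.printf("dump %d: %d Java threads%n", i, dumps.get(i).length);
        }
    }
}
```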
While thread dumps are generally readable, it’s more convenient to visualize and compare the information over time. Standard tools can help us visualize, filter, and search, but they don’t provide a convenient way to compare the dumps.
yCrash provides tools that can help analyze and test an application’s performance. One of them is fastThread, which creates a comprehensive report from thread dumps:
These reports can then be compared and reviewed:
The yCrash dashboard provides access to all the reports for the same application.
Also, we can check the thread pools on the same report page:
Conclusion
The default configurations for the garbage collection threads are generally good. However, the JVM allows manual configuration that we can use to fine-tune an application and improve its performance. This process should include monitoring and profiling.
Thread dumps don’t create much overhead, so we can run them regularly. This way, we can monitor the changes in our application in different phases, which can help identify hot spots.
However, to better understand an application’s problems, it’s essential to take several thread dumps with short pauses between them.