Controlling Garbage Collection Threads in Docker: Best Practices
Containerized environments have grown dramatically in popularity and have become integral to application development. Because we usually run containers in cloud environments, resource management is crucial.
Allocating more resources than required or misconfiguring JVMs can increase infrastructure costs. At the same time, improper configuration can affect performance and SLAs.
In this article, we’ll discuss garbage collection threads in a containerized environment. Garbage collection optimization is an interesting challenge, as neither the behavior nor the thread configuration is transparent.
Garbage Collection Threads
Even a simple Java application requires several JVM threads to work correctly. These threads handle internal VM processes, synchronization, notifications, communication with external debugging tools, etc. Unfortunately (or fortunately), we have little control over them.
However, we can control the garbage collection threads, or at least their number. Generally, there are two types of garbage collection threads: parallel and concurrent. They work in different garbage collection cycles: parallel threads run during stop-the-world (STW) events, while concurrent threads, as the name suggests, run during concurrent cycles.
Some garbage collector implementations might use additional threads to handle internal processes. Thus, during optimization, we should take into account which garbage collector we’re using. Ignoring this aspect might lead to misconfiguration and degraded performance.
The JVM provides a reasonable method for calculating the number of garbage collection threads based on the number of CPU cores. The general rule is that more CPU cores means more threads. However, this method isn’t sufficient when we use containers.
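One quick way to see the numbers the JVM has computed for a particular machine is to print the final flag values and filter for the GC thread counts:
$ java -XX:+PrintFlagsFinal -version | grep GCThreads
On an eight-core machine running G1, this typically reports ParallelGCThreads=8 and ConcGCThreads=2, since the default heuristic sizes the concurrent pool at roughly a quarter of the parallel one.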
Containerized Environments
Containers provide an excellent way to encapsulate the environment and configuration so an application can run consistently on different hosts. We’ll consider an application running with Docker and Docker Compose. However, we can get the same results running it with Docker Swarm and Kubernetes.
To start an application inside a container, we can use CLI/UI tools, and the official documentation provides a comprehensive explanation. However, learning Docker isn’t the goal of this article, so we’ll simplify our task and use the toolset from IntelliJ IDEA to run the application inside a container.
This way, we can avoid unnecessary complexity in our examples. Learning Docker and being comfortable with the CLI are essential, but as mentioned previously, that isn’t the primary goal of this article.
1. Application
The fastest way to check the number of allocated garbage collection threads is to stress the JVM. We can use a simple application that saturates the heap and causes an OutOfMemoryError:
import java.util.LinkedList;

public class GcThreadsOnOutOfMemoryErrorBenchmark {

    public static void main(String[] args) {
        // Hold a strong reference to every string so nothing can be collected.
        LinkedList<String> strings = new LinkedList<>();
        while (true) {
            // new String(...) forces a fresh heap allocation on every iteration.
            strings.add(new String("Hello World!!!!"));
        }
    }
}
Please note that idle applications without garbage collection activity won’t use all available threads. We’re creating a memory leak to force the garbage collector to utilize all the resources.
To capture a thread dump when our application crashes, we can use the kill -3 command; this gives us the dump at the moment of maximum stress. To trigger it automatically, we can add the following VM option:
-XX:OnOutOfMemoryError="kill -3 %p"
We can use -XX:+PrintCommandLineFlags to see the configuration and VM options:
-XX:G1ConcRefinementThreads=8 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=268435456 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=268435456 -XX:MinHeapSize=6815736 -XX:OnOutOfMemoryError=kill -3 %p -XX:+PrintCommandLineFlags -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:+UnlockDiagnosticVMOptions -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
This flag helps to verify the configuration when we start our application.
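Putting these flags together, a local (non-containerized) run of the benchmark might look like this; the single-file launch works on Java 11+, so no explicit compilation step is needed:
$ java -Xmx256m "-XX:OnOutOfMemoryError=kill -3 %p" -XX:+PrintCommandLineFlags GcThreadsOnOutOfMemoryErrorBenchmark.java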
Docker
When we run this application in Docker using IntelliJ IDEA, we don’t need to do anything else: the thread dump lands directly in our output.
While running our application with a memory leak, we’ll see output similar to the following:
Threads class SMR info:
_java_thread_list=0x00006000028d8520, length=12, elements={
0x000000013c008200, 0x000000015c80fc00, 0x000000015c80ca00, 0x000000015c030000,
0x000000014c8a3c00, 0x000000015c030600, 0x000000014c8a4200, 0x000000014c010600,
0x000000014c010c00, 0x000000014c011c00, 0x000000014c063800, 0x000000012d014800
}
"main" #1 [8707] prio=5 os_prio=31 cpu=76.79ms elapsed=1.60s tid=0x000000013c008200 nid=8707 runnable [0x000000016bbce000]
java.lang.Thread.State: RUNNABLE
"Reference Handler" #8 [32259] daemon prio=10 os_prio=31 cpu=0.48ms elapsed=1.59s tid=0x000000015c80fc00 nid=32259 waiting on condition [0x000000016cb3a000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #9 [32003] daemon prio=8 os_prio=31 cpu=0.10ms elapsed=1.59s tid=0x000000015c80ca00 nid=32003 in Object.wait() [0x000000016cd46000]
java.lang.Thread.State: WAITING (on object monitor)
"Signal Dispatcher" #10 [31491] daemon prio=9 os_prio=31 cpu=0.05ms elapsed=1.59s tid=0x000000015c030000 nid=31491 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" #11 [30979] daemon prio=9 os_prio=31 cpu=0.15ms elapsed=1.59s tid=0x000000014c8a3c00 nid=30979 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Monitor Deflation Thread" #12 [23299] daemon prio=9 os_prio=31 cpu=0.04ms elapsed=1.59s tid=0x000000015c030600 nid=23299 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #13 [30211] daemon prio=9 os_prio=31 cpu=25.28ms elapsed=1.59s tid=0x000000014c8a4200 nid=30211 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"C1 CompilerThread0" #16 [23811] daemon prio=9 os_prio=31 cpu=20.88ms elapsed=1.59s tid=0x000000014c010600 nid=23811 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"Sweeper thread" #17 [29699] daemon prio=9 os_prio=31 cpu=0.02ms elapsed=1.59s tid=0x000000014c010c00 nid=29699 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Common-Cleaner" #18 [29187] daemon prio=8 os_prio=31 cpu=0.28ms elapsed=1.58s tid=0x000000014c011c00 nid=29187 waiting on condition [0x000000016db9a000]
java.lang.Thread.State: TIMED_WAITING (parking)
"Monitor Ctrl-Break" #19 [24067] daemon prio=5 os_prio=31 cpu=15.48ms elapsed=1.56s tid=0x000000014c063800 nid=24067 runnable [0x000000016dda6000]
java.lang.Thread.State: RUNNABLE
"Notification Thread" #20 [28675] daemon prio=9 os_prio=31 cpu=0.04ms elapsed=1.56s tid=0x000000012d014800 nid=28675 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"VM Thread" os_prio=31 cpu=20.68ms elapsed=1.60s tid=0x000000014be05f50 nid=18179 runnable
"GC Thread#0" os_prio=31 cpu=782.47ms elapsed=1.60s tid=0x000000015be08490 nid=14595 runnable
"GC Thread#1" os_prio=31 cpu=847.37ms elapsed=1.55s tid=0x000000014be225f0 nid=27651 runnable
"GC Thread#2" os_prio=31 cpu=611.90ms elapsed=1.55s tid=0x000000015be0b1d0 nid=24579 runnable
"GC Thread#3" os_prio=31 cpu=763.61ms elapsed=1.55s tid=0x000000015be0b4a0 nid=27139 runnable
"GC Thread#4" os_prio=31 cpu=535.76ms elapsed=1.55s tid=0x0000000105304470 nid=25091 runnable
"GC Thread#5" os_prio=31 cpu=772.13ms elapsed=1.55s tid=0x0000000105304d00 nid=26371 runnable
"GC Thread#6" os_prio=31 cpu=868.92ms elapsed=1.55s tid=0x000000012be06af0 nid=26115 runnable
"GC Thread#7" os_prio=31 cpu=624.98ms elapsed=1.55s tid=0x000000012bf04c50 nid=25603 runnable
"G1 Main Marker" os_prio=31 cpu=0.13ms elapsed=1.60s tid=0x000000014be04e20 nid=14339 runnable
"G1 Conc#0" os_prio=31 cpu=16.38ms elapsed=1.60s tid=0x000000014be056c0 nid=13571 runnable
"G1 Conc#1" os_prio=31 cpu=16.44ms elapsed=1.30s tid=0x000000015bf08040 nid=43267 runnable
"G1 Refine#0" os_prio=31 cpu=0.02ms elapsed=1.60s tid=0x000000014bf05a10 nid=21507 runnable
"G1 Service" os_prio=31 cpu=0.40ms elapsed=1.60s tid=0x000000012be04080 nid=16643 runnable
"VM Periodic Task Thread" os_prio=31 cpu=0.30ms elapsed=1.56s tid=0x000000012be06540 nid=28163 waiting on condition
In addition to the concurrent and parallel threads, we have several threads that are specific to the G1 garbage collector:
"G1 Main Marker" os_prio=31 cpu=0.13ms elapsed=1.60s tid=0x000000014be04e20 nid=14339 runnable
"G1 Conc#0" os_prio=31 cpu=16.38ms elapsed=1.60s tid=0x000000014be056c0 nid=13571 runnable
"G1 Conc#1" os_prio=31 cpu=16.44ms elapsed=1.30s tid=0x000000015bf08040 nid=43267 runnable
"G1 Refine#0" os_prio=31 cpu=0.02ms elapsed=1.60s tid=0x000000014bf05a10 nid=21507 runnable
"G1 Service" os_prio=31 cpu=0.40ms elapsed=1.60s tid=0x000000012be04080 nid=16643 runnable
As mentioned previously, different garbage collectors might use additional threads for internal processing. At the same time, a fully concurrent collector might use only concurrent threads, and a fully STW collector only parallel ones.
We ran the containers on an eight-core machine, and, as we can see, the container uses all the cores. Let’s run several containers using Docker Compose. First, however, we need to create an image for our application, a step we happily avoided until now. For simplicity, we’ll place all the Docker-related files in the directory with our Java class:
FROM openjdk
COPY GcThreadsOnOutOfMemoryErrorBenchmark.java .
CMD ["java", "-Xmx256m", "-XX:OnOutOfMemoryError=kill -3 %p", "GcThreadsOnOutOfMemoryErrorBenchmark.java"]
We don’t need to compile the class: since Java 11, the java launcher can run a single source file directly, which reduces the number of required steps. After that, we can prepare our docker-compose.yaml:
version: '3.3'
services:
  oom-application:
    build: .
This is a simple Compose file that defines a single service built from our Dockerfile. Now, let’s run several replicas at the same time:
$ docker-compose up --scale oom-application=3
As we can see, all the containers use the same number of garbage collection threads. We won’t reproduce the logs here, as they’re verbose and contain the same information. These three replicas allocated more than twenty garbage collection threads in total, and more than thirty if we count the additional G1 threads.
Thread Control
Should we limit the number of garbage collection threads? Should a memory leak in one application affect the rest of the containers on a host? Obviously, we would like more control over performance and resources.
We should pay especially close attention to containerized environments. Some JVM parameters are derived from the overall host resources rather than from the container’s limits. For example, each container may start with an excessive heap on hosts with a significant amount of RAM.
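For instance, rather than letting the maximum heap scale with whatever machine the container lands on, we can pin it relative to the memory the JVM sees; -XX:MaxRAMPercentage (available since JDK 10) is one such knob, shown here with an illustrative value:
$ java -XX:MaxRAMPercentage=50.0 -XX:+PrintCommandLineFlags GcThreadsOnOutOfMemoryErrorBenchmark.java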
Although we can control CPU time, this doesn’t give us control over the cores. As mentioned previously, the JVM determines the number of GC threads based on the number of cores it sees as available.
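To see exactly how many cores the JVM believes it has inside a container, we can run a tiny probe; CpuProbe is a hypothetical helper written for this check, not part of the benchmark:
public class CpuProbe {

    public static void main(String[] args) {
        // The same value the GC ergonomics take as input when sizing thread pools.
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
    }
}
Running it under the CPU limits discussed below shows the number the GC heuristics will work from.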
We can limit the garbage collection threads in our containers in several ways. Let’s review some of them.
1. Docker Daemon
First, we can limit the number of cores in the Docker daemon configuration. The location of the configuration file depends on the system, but the setup is the same on every platform. The only thing we need to do is set the limit on the cores it can use:
{
  "cpus": 2
}
If we have Docker Desktop installed, we can also do this through the settings UI.
This way, we set the limit for every container we run on the machine. It’s useful when we don’t want containers to use all the cores. However, we cannot allocate more cores to an individual container if we need to. Let’s check other options.
2. Allocating CPUs Directly
Docker provides several flags to control CPU consumption. While it’s possible to control CPU time, in our case we’re more interested in limiting the number of CPUs. We can use the --cpus option to set the number of CPUs, which we can configure directly in the IntelliJ IDEA run configuration.
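Outside the IDE, the equivalent plain docker run invocation might look like this, assuming we’ve tagged our image as oom-application (the tag is our choice, not something Docker assigns):
$ docker run --cpus=2 oom-application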
We can use the same flag in docker-compose.yaml; however, there we need to set it in the file directly:
version: '3.3'
services:
  oom-application:
    build: .
    cpus: "2"
We can also allocate a fraction of a core by passing a fractional number. For example, --cpus=0.5 restricts our container to no more than half of a core.
Another way is to allocate specific cores to a container, ensuring that each container uses a separate core. To achieve this, we can use the --cpuset-cpus option.
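On the command line, pinning a container to the first two cores might look like this (again assuming a hypothetical oom-application image tag):
$ docker run --cpuset-cpus="0-1" oom-application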
We can achieve the same using docker-compose.yaml:
version: '3.3'
services:
  oom-application:
    build: .
    cpuset: "0-1"
However, we need to ensure that we don’t have VM flags that override the default heuristics and explicitly set the number of threads. Let’s talk about these parameters in more detail.
3. VM Flags for Garbage Collection Threads
Sometimes, we want more control over the number of garbage collection threads. One benefit of the default heuristics is that containers adjust the number of threads more flexibly on more performant hosts. However, if we need to override them, we can set the parallel and concurrent thread counts explicitly:
-XX:ParallelGCThreads=<n>
-XX:ConcGCThreads=<n>
Please note that these options set the thread counts explicitly, regardless of how many CPU cores are available. This solution isn’t as flexible as the previous ones, but it gives us direct control over the threads.
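For example, to cap the benchmark at four parallel and two concurrent GC threads (values chosen purely for illustration):
$ java -Xmx256m -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -XX:+PrintCommandLineFlags GcThreadsOnOutOfMemoryErrorBenchmark.java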
Conclusion
Optimizing the number of garbage collection threads in a container can improve the performance of an individual application. More importantly, it can improve the overall performance of the host.
If we run such containers in cloud environments, the number of threads and the resulting CPU usage can dramatically affect our costs. Even for medium-sized applications, this can add up to a significant sum of money.
As usual, any optimization or change to the JVM configuration requires testing and monitoring. fastThread is an excellent tool for validating our hypotheses and ensuring that an optimization won’t backfire.