Java JVM Performance Tuning
Understanding the JVM memory model is essential for Java developers who build, deploy, monitor, test, and tune Java applications. This article first covers how the JVM organizes memory, then turns to garbage collector (GC) tuning, where small, targeted changes can have a large effect on the system under certain circumstances.
JVM Memory Model
First, take a look at the JVM architecture.
You have probably already used JVM configurations like this:
JAVA_OPTS="-server -Xms2560m -Xmx2560m -XX:NewSize=1536m -XX:MaxNewSize=1536m -XX:MetaspaceSize=768m -XX:MaxMetaspaceSize=768m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:MaxTenuringThreshold=5"
But have you ever wondered how your JVM resides in memory? The JVM consumes available memory on the host OS.
Inside the JVM, however, there are separate memory spaces (Heap, Non-Heap, Cache) that store runtime data and compiled code.
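To see these spaces on a running JVM, a minimal sketch using the standard java.lang.management API is shown here. The class name MemoryAreas and the helper describePools are made up for illustration, and the pool names printed depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class MemoryAreas {
    /** Returns one line per memory pool: "<HEAP|NON_HEAP> <pool name> used=<n> KiB". */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getType().name() + " " + pool.getName()
                    + " used=" + usage.getUsed() / 1024 + " KiB");
        }
        return lines;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("Heap:     " + memory.getHeapMemoryUsage());
        System.out.println("Non-heap: " + memory.getNonHeapMemoryUsage());
        describePools().forEach(System.out::println);
    }
}
```

On a HotSpot JVM the non-heap pools usually include Metaspace and the code cache segments, which roughly correspond to the Non-Heap and Cache spaces mentioned above.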
Heap Memory
The following are general guidelines regarding heap sizes for server applications:
Young Generation
Old Generation
Non-Heap Memory
Cache Memory
What is GC?
Java provides automatic memory management through a component called the garbage collector.
"Remove objects that are not used anymore."
All of this happens inside the heap, the space for dynamic memory allocation at runtime that holds all Java objects. Alongside the heap there are stacks, one per thread, which hold the local variables and method call frames that support thread execution.
How Java Garbage Collection Really Works
Many people think garbage collection collects and discards dead objects. In reality, Java garbage collection does the opposite! Live objects are tracked and everything else is designated garbage. As you'll see, this fundamental misunderstanding can lead to many performance problems.
Let's start with the heap, which is the area of memory used for dynamic allocation. In most configurations the operating system allocates the heap in advance to be managed by the JVM while the program is running. This has a couple of important ramifications:
Figure 2.1: New objects are simply allocated at the end of the used heap.
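The idea in Figure 2.1 can be sketched as a toy "bump pointer" allocator. This is only a simulation over integer offsets, not how HotSpot is implemented, and the class name BumpAllocator is made up for illustration.

```java
// Toy model of allocating at the end of the used heap (hypothetical class).
class BumpAllocator {
    private final int capacity; // total simulated heap size in bytes
    private int top = 0;        // offset of the first free byte

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the new object, or -1 if the heap is exhausted. */
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1; // a real JVM would trigger a garbage collection here
        }
        int address = top;
        top += size; // "bump" the pointer past the new object
        return address;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(64);
        System.out.println(heap.allocate(16)); // 0
        System.out.println(heap.allocate(24)); // 16
        System.out.println(heap.allocate(32)); // -1, only 24 bytes are left
    }
}
```

Allocation is a bounds check plus one addition, which is why it is so cheap as long as the free space is contiguous.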
Once an object is no longer referenced and therefore is not reachable by the application code, the garbage collector removes it and reclaims the unused memory. As simple as this sounds, it raises a question: what is the first reference in the tree?
Garbage-Collection Roots—The Source of All Object Trees
Every object tree must have one or more root objects. As long as the application can reach those roots, the whole tree is reachable. But when are those root objects considered reachable? Special objects called garbage-collection roots (GC roots; see Figure 2.2) are always reachable and so is any object that has a garbage-collection root at its own root.
There are four kinds of GC roots in Java:
Figure 2.2: GC roots are objects that are themselves referenced by the JVM and thus keep every other object from being garbage-collected.
Therefore, a simple Java application has the following GC roots:
Marking and Sweeping Away Garbage
Marking Reachable Objects
To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. As you might intuit, it's a straightforward, two-step process:
Live objects are represented as blue on the picture above. When the marking phase finishes, every live object is marked. All other objects (grey data structures on the picture above) are thus unreachable from the GC roots, implying that your application cannot use the unreachable objects anymore. Such objects are considered garbage and GC should get rid of them in the following phases.
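The marking phase can be sketched as a graph traversal that starts at the GC roots. The code below is a toy model: Node stands in for a heap object, and the names MarkPhase, Node, and mark are made up for illustration rather than taken from any JVM.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy marking phase over a simulated object graph (hypothetical class).
class MarkPhase {
    /** A simulated heap object that holds references to other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns every node reachable from the given roots (the "marked" set). */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) { // first visit: mark it and follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        b.refs.add(a); // cycle between live objects
        c.refs.add(d);
        d.refs.add(c); // cycle that no root reaches
        Set<Node> live = mark(List.of(a));
        for (Node n : List.of(a, b, c, d)) {
            System.out.println(n.name + (live.contains(n) ? " live" : " garbage"));
        }
    }
}
```

Note that c and d reference each other but are still garbage, because nothing reachable from a root points to them. The work done is proportional to the number of live objects, not to the size of the heap.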
There are important aspects to note about the marking phase:
Removing Unused Objects
Removal of unused objects is somewhat different for different GC algorithms but all such GC algorithms can be divided into three groups: sweeping, compacting and copying.
Sweep
Mark and Sweep algorithms take conceptually the simplest approach to garbage: they just ignore such objects. After the marking phase has completed, all space occupied by unvisited objects is considered free and can be reused to allocate new objects.
The approach requires a so-called free list that records every free region and its size. Managing the free list adds overhead to object allocation. The approach has another built-in weakness: there may be plenty of free regions, but if no single region is large enough to accommodate the allocation, the allocation still fails (with an OutOfMemoryError in Java).
It is often referred to as the mark-sweep algorithm.
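The fragmentation weakness is easy to reproduce in a toy free-list allocator. The sketch below uses a first-fit policy, which is an assumption made for illustration (the text does not say which policy a given collector uses), and the class name FreeListHeap is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy first-fit free-list allocator (hypothetical class).
class FreeListHeap {
    /** A free region covering the addresses [start, start + size). */
    static final class Region {
        int start;
        int size;

        Region(int start, int size) {
            this.start = start;
            this.size = size;
        }
    }

    private final List<Region> freeList = new ArrayList<>();

    void addFreeRegion(int start, int size) {
        freeList.add(new Region(start, size));
    }

    int totalFree() {
        int total = 0;
        for (Region r : freeList) {
            total += r.size;
        }
        return total;
    }

    /** First-fit allocation; returns the start address, or -1 when no single region fits. */
    int allocate(int size) {
        for (int i = 0; i < freeList.size(); i++) {
            Region r = freeList.get(i);
            if (r.size >= size) {
                int address = r.start;
                r.start += size;
                r.size -= size;
                if (r.size == 0) {
                    freeList.remove(i); // region fully consumed
                }
                return address;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FreeListHeap heap = new FreeListHeap();
        heap.addFreeRegion(0, 8);
        heap.addFreeRegion(32, 8);
        heap.addFreeRegion(64, 8);
        System.out.println(heap.totalFree());  // 24 bytes free in total
        System.out.println(heap.allocate(16)); // -1, no single region holds 16 bytes
    }
}
```

Every allocation has to walk the list, which is the bookkeeping overhead described above.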
Compact
Mark-Sweep-Compact algorithms solve the shortcomings of Mark and Sweep by moving all marked, and thus live, objects to the beginning of the memory region. The downside of this approach is a longer GC pause, because all live objects must be copied to a new place and all references to them updated. The benefits over Mark and Sweep are also clear: after compaction, new object allocation is again extremely cheap via pointer bumping. The location of the free space is always known, and no fragmentation issues arise.
It is often referred to as the mark-sweep-compact algorithm.
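A minimal sketch of the compaction step, assuming a simulated heap in which every slot holds one object id and 0 means "free or dead". The forwarding table it returns is what a collector would use to update references. The class name SlidingCompactor is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy sliding compaction over an int[] heap (hypothetical class).
class SlidingCompactor {
    /**
     * Slides live slots (non-zero ids) to the front of the heap, preserving their order.
     * Returns a forwarding table mapping old index to new index for each live object.
     */
    static Map<Integer, Integer> compact(int[] heap) {
        Map<Integer, Integer> forwarding = new HashMap<>();
        int free = 0; // next slot to fill at the front of the heap
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0) {
                forwarding.put(i, free);
                heap[free] = heap[i];
                if (free != i) {
                    heap[i] = 0; // the old location becomes free space
                }
                free++;
            }
        }
        return forwarding;
    }

    public static void main(String[] args) {
        int[] heap = {1, 0, 2, 0, 0, 3};
        Map<Integer, Integer> forwarding = compact(heap);
        System.out.println(Arrays.toString(heap)); // [1, 2, 3, 0, 0, 0]
        System.out.println(forwarding.size());     // 3 live objects, free space starts at index 3
    }
}
```

After the call, all free space is one contiguous block, so pointer-bump allocation can resume at index forwarding.size().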
Copy
Mark and Copy algorithms are very similar to Mark and Compact in that they too relocate all live objects. The important difference is that the target of relocation is a different memory region, which becomes the new home for survivors. The Mark and Copy approach has the advantage that copying can occur simultaneously with marking, during the same phase. The disadvantage is the need for one more memory region, which must be large enough to accommodate the surviving objects.
It is often referred to as the mark-copy algorithm.
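To show how marking and copying can happen in the same pass, here is a toy copying collector. It uses a Cheney-style scan in which the to-space doubles as the work queue. That is one classic way to implement the idea, chosen here for illustration and not necessarily what any particular JVM does. The names CopyingCollector, Obj, collect, and forward are made up.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy copying collector over a simulated object graph (hypothetical class).
class CopyingCollector {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    /** Copies everything reachable from the roots into a fresh to-space in one traversal. */
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarded = new IdentityHashMap<>();
        for (Obj root : roots) {
            forward(root, toSpace, forwarded);
        }
        // Objects before 'scan' are fully processed; objects after it still
        // hold references into the old space that must be rewritten.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.size(); i++) {
                copy.refs.set(i, forward(copy.refs.get(i), toSpace, forwarded));
            }
        }
        return toSpace;
    }

    /** Returns the copy of 'original', creating it on the first visit. */
    private static Obj forward(Obj original, List<Obj> toSpace, Map<Obj, Obj> forwarded) {
        Obj copy = forwarded.get(original);
        if (copy == null) {
            copy = new Obj(original.name);
            copy.refs.addAll(original.refs); // old-space references, fixed during the scan
            forwarded.put(original, copy);
            toSpace.add(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), dead = new Obj("dead");
        a.refs.add(b);
        b.refs.add(a);
        dead.refs.add(a);
        for (Obj survivor : collect(List.of(a))) {
            System.out.println(survivor.name); // a, then b; "dead" is never copied
        }
    }
}
```

Dead objects are never touched, so there is no separate sweep: whatever remains in the old region is simply discarded.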
Stop-the-world (STW)
All minor garbage collections are "Stop the World" events. This means that all application threads are stopped until the operation completes.
The Old Generation is used to store long-surviving objects. Typically, an age threshold is set for young generation objects, and when an object reaches that age, it is moved to the old generation. Eventually the old generation needs to be collected. This event is called a major garbage collection.
Major garbage collections are also Stop the World events. A major collection is often much slower because it involves all live objects, so for responsive applications major garbage collections should be minimized. Also note that the length of the Stop the World event for a major garbage collection depends on the kind of garbage collector used for the old generation space.
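To check how often collections happen in your own application, the standard GarbageCollectorMXBean exposes a count and an accumulated time per collector. A minimal sketch follows. The class name GcActivity is made up, and the collector names printed (and which of them maps to minor or major collections) depend on the JVM and the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of any library.
class GcActivity {
    /** Sums the collection counts of all collectors, skipping undefined (-1) values. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", approx. time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total collections so far: " + totalCollections());
    }
}
```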
Visual GC
The application starts and allocates objects in Eden space. Blue marks a live object and grey marks a dead (unreachable) object. When Eden is full and the application tries to create another object, the allocation fails, and that failure triggers a minor GC.
After the first minor GC, all live objects are moved to Survivor 1 with age 1, and the dead objects are deleted.
The application keeps running, and new objects are allocated in Eden space again. Some objects in both Eden and Survivor 1 become unreachable.
After the second minor GC, all live objects are moved to Survivor 2 (those from Eden with age 1, those from Survivor 1 with age 2), and the dead objects are deleted.
The application is still running, and new objects are allocated in Eden space. After a while, some objects in both Eden and Survivor 2 become unreachable.
After the third minor GC, all live objects are moved from both Eden and Survivor 2 to Survivor 1, their ages are incremented, and the dead objects are deleted.
An object that lives long enough in the Survivor spaces is promoted to the old generation (Tenured) once its age is greater than -XX:MaxTenuringThreshold.
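The walk-through above can be condensed into a small simulation. It follows the simplified rules of this section only (age grows by one per survived minor GC, survivors alternate between the two survivor spaces, promotion once the age exceeds the threshold). A real HotSpot JVM also adjusts the threshold dynamically and can promote objects early when a survivor space fills up, which this sketch ignores. The names TenuringSimulation and locationAfter are made up.

```java
// Toy model of object aging across minor GCs (hypothetical class).
class TenuringSimulation {
    /** Returns where an object lives after surviving the given number of minor GCs. */
    static String locationAfter(int survivedCollections, int maxTenuringThreshold) {
        if (survivedCollections == 0) {
            return "Eden"; // freshly allocated, not yet collected
        }
        int age = survivedCollections; // age increases by one per survived minor GC
        if (age > maxTenuringThreshold) {
            return "Old";
        }
        // Survivors are copied back and forth between the two survivor spaces.
        return (age % 2 == 1) ? "Survivor 1" : "Survivor 2";
    }

    public static void main(String[] args) {
        int threshold = 5; // matches -XX:MaxTenuringThreshold=5 from the JAVA_OPTS example
        for (int gc = 0; gc <= 6; gc++) {
            System.out.println("after " + gc + " minor GCs: " + locationAfter(gc, threshold));
        }
    }
}
```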
We can use VisualGC, a VisualVM plugin that attaches to an instrumented HotSpot JVM and collects and graphically displays garbage collection, class loader, and HotSpot compiler performance data.
Performance Basics
Typically, when tuning a Java application, the focus is on one of two main goals: responsiveness or throughput. We will refer back to these concepts as the tutorial progresses.
Responsiveness
Responsiveness refers to how quickly an application or system responds with a requested piece of data. Examples include:
For applications that focus on responsiveness, large pause times are not acceptable. The focus is on responding in short periods of time.
Throughput
Throughput focuses on maximizing the amount of work by an application in a specific period of time. Examples of how throughput might be measured include:
High pause times are acceptable for applications that focus on throughput. Since high throughput applications focus on benchmarks over longer periods of time, quick response time is not a consideration.
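One common way to put a number on throughput from a GC perspective is the share of elapsed time that was not spent in garbage collection. The sketch below assumes you already have the two durations, for example from GC logs. The class name GcThroughput and the sample figures are made up for illustration.

```java
// Hypothetical helper for expressing GC overhead as throughput.
class GcThroughput {
    /** Fraction of elapsed time not spent in GC, in the range [0, 1]. */
    static double throughput(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 1.0; // nothing measured yet
        }
        double gcFraction = (double) gcMillis / elapsedMillis;
        gcFraction = Math.min(1.0, Math.max(0.0, gcFraction)); // guard against bad input
        return 1.0 - gcFraction;
    }

    public static void main(String[] args) {
        // Example figures: 1.5 s of GC during a 60 s measurement window.
        System.out.printf("throughput = %.1f%%%n", 100 * throughput(1_500, 60_000));
    }
}
```

A throughput-oriented application would try to push this figure up, even if individual pauses get longer. A responsiveness-oriented one cares more about the length of the longest pause.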
What types of GC are there?
Concurrent mark sweep (CMS) garbage collection
CMS garbage collection is essentially an upgraded mark and sweep method. It scans heap memory using multiple threads. It was modified to take advantage of faster systems and had performance enhancements.
It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads. It uses the parallel stop-the-world mark-copy algorithm in the Young Generation and the mostly concurrent mark-sweep algorithm in the Old Generation.
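To use CMS GC, pass the JVM argument below. Note that CMS was deprecated in JDK 9 and removed in JDK 14, so the flag only works on older JVMs: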
-XX:+UseConcMarkSweepGC
Serial garbage collection
This algorithm uses mark-copy for the Young Generation and mark-sweep-compact for the Old Generation. It works on a single thread. When executing, it freezes all other threads until garbage collection operations have concluded.
Due to the thread-freezing nature of serial garbage collection, it is only feasible for very small programs.
To use Serial GC, pass the JVM argument below:
-XX:+UseSerialGC
Parallel garbage collection
Similar to Serial GC, it uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation. Multiple threads run the marking and copying/compacting phases in parallel while the application is stopped. You can configure the number of threads with the -XX:ParallelGCThreads=N option.
The Parallel garbage collector is suitable for multi-core machines when your primary goal is to increase throughput through efficient use of existing system resources. With this approach, GC cycle times can be considerably reduced.
To use Parallel GC, pass the JVM argument below:
-XX:+UseParallelGC
G1 garbage collection
The G1 (Garbage First) garbage collector has been available since Java 7 and is designed to be the long-term replacement for the CMS collector. The G1 collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector.
This approach segments the heap into many small regions (typically around 2048). Each region is marked as either young generation (further divided into eden regions and survivor regions) or old generation. This allows the GC to avoid collecting the entire heap at once and instead approach the problem incrementally: only a subset of the regions is considered at a time.
G1 keeps track of the amount of live data that each region contains. This information is used to determine the regions that contain the most garbage, so they are collected first. That's why it is named garbage-first collection.
Unfortunately, just as in other algorithms, the compacting operation is a Stop the World event. But in line with its design goal, you can set specific performance goals for it. You can configure pause durations, e.g. no more than 10 milliseconds in any given second. Garbage-First GC will do its best to meet this goal with high probability (but not with certainty; that would be hard real-time, which OS-level thread management rules out).
To use G1 on Java 7 or Java 8, pass the JVM argument below (since Java 9, G1 is the default collector on server-class machines):
-XX:+UseG1GC
G1 Optimization Options
-XX:G1HeapRegionSize=16m Size of the heap region. The value will be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.
-XX:MaxGCPauseMillis=200 Sets a target value for desired maximum pause time. The default value is 200 milliseconds. The specified value does not adapt to your heap size.
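-XX:G1ReservePercent=5 Sets the percentage of the heap to keep free as a reserve, which reduces the risk of running out of space when objects are copied during a collection.
-XX:G1ConfidencePercent=75 This is the confidence coefficient for the pause prediction heuristics.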
-XX:GCPauseIntervalMillis=200 This is the pause interval time slice per MMU in milliseconds.
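The region-size rule described for -XX:G1HeapRegionSize (a power of two between 1MB and 32MB, aiming for around 2048 regions) can be sketched as follows. This only mirrors the description above. The exact heuristic inside the JVM may differ between versions, and the names G1RegionSize and regionSizeBytes are made up.

```java
// Sketch of the "about 2048 regions" sizing rule (hypothetical class).
class G1RegionSize {
    static final long MB = 1024L * 1024L;

    /** Largest power of two not above heap/2048, clamped to the range 1 MB to 32 MB. */
    static long regionSizeBytes(long heapBytes) {
        long target = Math.max(heapBytes / 2048, MB);
        long size = Long.highestOneBit(target); // round down to a power of two
        return Math.min(size, 32 * MB);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 2560, 4096, 16384, 131072};
        for (long heapMb : heapsInMb) {
            System.out.println(heapMb + " MB heap -> "
                    + regionSizeBytes(heapMb * MB) / MB + " MB regions");
        }
    }
}
```

Under this rule a 4 GB heap ends up with 2 MB regions, so setting -XX:G1HeapRegionSize=16m on such a heap would leave far fewer than 2048 regions.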
Suggestion
G1 Config
-XX:+UseG1GC \
-XX:+UseStringDeduplication \
-XX:+ParallelRefProcEnabled \
-XX:+AlwaysPreTouch \
-XX:+DisableExplicitGC \
-XX:ParallelGCThreads=8 \
-XX:GCTimeRatio=9 \
-XX:MaxGCPauseMillis=25 \
-XX:MaxGCMinorPauseMillis=5 \
-XX:ConcGCThreads=8 \
-XX:InitiatingHeapOccupancyPercent=70 \
-XX:MaxTenuringThreshold=10 \
-XX:SurvivorRatio=6 \
-XX:-UseAdaptiveSizePolicy \
-XX:MaxMetaspaceSize=256M \
-Xmx4G \
-Xms2G
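After changing startup flags like these, it is worth confirming that the JVM actually received them. A minimal sketch using the standard RuntimeMXBean follows. The class name JvmFlags and the helper flagsWithPrefix are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
class JvmFlags {
    /** Returns the JVM arguments that start with the given prefix, e.g. "-XX:" or "-Xm". */
    static List<String> flagsWithPrefix(List<String> jvmArgs, String prefix) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not the arguments of main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        flagsWithPrefix(jvmArgs, "-X").forEach(System.out::println);
    }
}
```

Running it with the configuration above should list each -XX and -Xm flag once. A flag that is missing from the output never reached the JVM, for example because JAVA_OPTS was not picked up by the start script.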
Optimize Result