Java JVM Performance Tuning

Understanding the JVM memory model is essential for Java developers who develop, deploy, monitor, test, and tune the performance of Java applications. This article first walks through how the JVM organizes memory, then focuses on garbage collector (GC) tuning as a way to make small, meaningful changes that can have a large effect on the system under certain circumstances.

JVM Memory Model

First, take a look at the main components of the JVM architecture:

  • Classloader: Classloader is a subsystem of JVM which is used to load class files. Whenever we run the java program, it is loaded first by the classloader.
  • Class(Method) Area: Class(Method) Area stores per-class structures such as the runtime constant pool, field and method data, the code for methods.
  • Heap: It is the runtime data area in which objects are allocated.
  • Stack: Java Stack stores frames. It holds local variables and partial results, and plays a part in method invocation and return. Each thread has a private JVM stack, created at the same time as the thread. A new frame is created each time a method is invoked. A frame is destroyed when its method invocation completes.
  • Program Counter Register (PC): PC (program counter) register contains the address of the Java virtual machine instruction currently being executed.
  • Native Method Stack: It holds the stack frames for native (non-Java) methods called by the application, typically one stack per thread.
  • Execution Engine: It contains a virtual processor, an interpreter that reads the bytecode stream and executes the instructions, and the Just-In-Time (JIT) compiler.
  • Just-In-Time (JIT) compiler: It is used to improve performance. At run time, the JIT compiles frequently executed ("hot") bytecode into native machine code, so that code no longer has to be interpreted on every call. Here, the term "compiler" refers to a translator from the instruction set of a Java virtual machine (JVM) to the instruction set of a specific CPU.
  • Java Native Interface: Java Native Interface (JNI) is a framework that provides an interface to communicate with another application written in another language like C, C++, Assembly, etc. Java uses the JNI framework to send output to the Console or interact with OS libraries.

You have probably already used JVM options like these:

JAVA_OPTS="-server -Xms2560m -Xmx2560m -XX:NewSize=1536m -XX:MaxNewSize=1536m -XX:MetaspaceSize=768m -XX:MaxMetaspaceSize=768m -XX:InitialCodeCacheSize=64m -XX:ReservedCodeCacheSize=96m -XX:MaxTenuringThreshold=5"

  • -server - Enables “Server Hotspot VM”; this parameter is used by default in 64 bit JVM.
  • -Xms - Initial space for heap.
  • -Xmx - Maximum space for heap.
  • -XX:NewSize - Initial new space. Setting the new size to half of the total heap typically provides better performance than using a smaller new size.
  • -XX:MaxNewSize - Maximum new space.
  • -XX:MetaspaceSize - Initial space for class metadata (the threshold at which a Metaspace-triggered GC first occurs).
  • -XX:MaxMetaspaceSize - Maximum space for class metadata.
  • -XX:InitialCodeCacheSize - Initial space for JIT-compiled code. Too small a code cache reduces performance, because the JIT stops compiling new methods once the cache is full. The default reserved size depends on the JVM version and settings (48m on older JVMs, 240m with tiered compilation on Java 8 and later).
  • -XX:ReservedCodeCacheSize - Maximum space for JIT-compiled code.
  • -XX:MaxTenuringThreshold - Keeps survivors in the survivor space for up to the given number of garbage collections (5 in the example above; 15 is the maximum) before promotion to the old generation space.
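
To check which of these options a running JVM actually received, one possible approach is to read the start-up arguments through the standard java.lang.management API. The sketch below is illustrative only; the class name JvmFlagsReport and the helper flagsWithPrefix are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class JvmFlagsReport {
    // Hypothetical helper: keeps only the options that start with the given prefix.
    static List<String> flagsWithPrefix(List<String> jvmArgs, String prefix) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Options passed to this JVM at start-up (for example via JAVA_OPTS).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap size flags: " + flagsWithPrefix(jvmArgs, "-Xm"));
        System.out.println("-XX flags:       " + flagsWithPrefix(jvmArgs, "-XX:"));
    }
}
```

Running it with the JAVA_OPTS shown earlier should list -Xms/-Xmx under the heap flags and the remaining options under -XX flags; a JVM started without options prints empty lists.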

But have you ever wondered how your JVM resides in memory? The JVM consumes part of the available memory of the host OS.

However, inside JVM, there exist separate memory spaces (Heap, Non-Heap, Cache) in order to store runtime data and compiled code.

Heap Memory

  • Heap is divided into 2 parts: Young Generation and Old Generation
  • Heap is allocated when JVM starts up (Initial size: -Xms)
  • Heap size increases/decreases while the application is running
  • Maximum size: -Xmx

The following are general guidelines regarding heap sizes for server applications:

  • Unless you have problems with pauses, try granting as much memory as possible to the virtual machine. The default size is often too small.
  • Setting -Xms and -Xmx to the same value increases predictability by removing the most important sizing decision from the virtual machine. However, the virtual machine is then unable to compensate if you make a poor choice.
  • In general, increase the memory as you increase the number of processors because allocation can be made parallel.
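
Before changing -Xms or -Xmx, it can help to see what the running JVM reports about its heap. This is a minimal sketch using the standard Runtime API; the class name HeapSizeProbe and the helper toMiB are hypothetical.

```java
class HeapSizeProbe {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // upper limit the heap may grow to (roughly -Xmx)
        long committed = rt.totalMemory(); // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();

        System.out.println("Max heap:       " + toMiB(max) + " MiB");
        System.out.println("Committed heap: " + toMiB(committed) + " MiB");
        System.out.println("Used heap:      " + toMiB(used) + " MiB");
    }
}
```

When -Xms and -Xmx are set to the same value, the committed and maximum figures should stay close to each other for the lifetime of the process.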

Young Generation

  • This is reserved for containing newly allocated objects
  • Young Gen includes three parts: Eden Memory and two Survivor Memory spaces (S0, S1)
  • Most newly created objects go to Eden space.
  • When Eden space fills up, a Minor GC (a.k.a. Young Collection) is performed and all surviving objects are moved to one of the survivor spaces.
  • Minor GC also checks the objects already in that survivor space and moves the live ones to the other survivor space. So at any time, one of the survivor spaces is always empty.
  • Objects that survive many GC cycles are moved to the Old Generation memory space. Usually, this is done by setting an age threshold that young generation objects must reach before they become eligible for promotion to the Old Generation (-XX:MaxTenuringThreshold).

Old Generation

  • This is reserved for containing long-lived objects that could survive after many rounds of Minor GC
  • When Old Gen space is full, a Major GC (a.k.a. Old Collection) is performed, which usually takes longer than a Minor GC

Non-Heap Memory

  • This includes the Permanent Generation (replaced by Metaspace since Java 8)
  • Perm Gen stores per-class structures such as the runtime constant pool, field and method data, and the code for methods and constructors. Before Java 7 it also held interned Strings, which now live in the heap
  • Perm Gen size can be changed using -XX:PermSize and -XX:MaxPermSize; since Java 8, use -XX:MetaspaceSize and -XX:MaxMetaspaceSize instead

Cache Memory

  • This includes Code Cache
  • Stores compiled code (i.e. native code) generated by JIT compiler, JVM internal structures, loaded profiler agent code, and data, etc.
  • When Code Cache exceeds a threshold, it gets flushed (and objects are not relocated by the GC).
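
The split between heap and non-heap areas (including Metaspace and the code cache) can be inspected at run time through the standard memory pool MXBeans. The following is a small sketch; the class name MemoryPoolReport and the helper describe are invented for illustration, and the pool names printed depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class MemoryPoolReport {
    // Hypothetical helper: formats one pool as "name  heap|non-heap  used KiB".
    static String describe(String name, MemoryType type, long usedBytes) {
        String area = (type == MemoryType.HEAP) ? "heap" : "non-heap";
        return String.format("%-35s %-8s %d KiB", name, area, usedBytes / 1024);
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long used = (usage == null) ? 0L : usage.getUsed();
            System.out.println(describe(pool.getName(), pool.getType(), used));
        }
    }
}
```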

What is GC?

Java provides automatic memory management through a program called Garbage Collector.

"Remove objects that are not used anymore."

Garbage collection takes place inside the Heap, a space for dynamic memory allocation at run time that contains all Java objects. Alongside the Heap, each thread has a Stack that holds the local variables and method calls supporting its execution.

How Java Garbage Collection Really Works

Many people think garbage collection collects and discards dead objects. In reality, Java garbage collection is doing the opposite! Live objects are tracked and everything else is designated garbage. As you'll see, this fundamental misunderstanding can lead to many performance problems.

Let's start with the heap, which is the area of memory used for dynamic allocation. In most configurations the operating system allocates the heap in advance to be managed by the JVM while the program is running. This has a couple of important ramifications:

  • Object creation is faster because global synchronization with the operating system is not needed for every single object. An allocation simply claims some portion of a memory array and moves the offset pointer forward (see Figure 2.1). The next allocation starts at this offset and claims the next portion of the array.
  • When an object is no longer used, the garbage collector reclaims the underlying memory and reuses it for future object allocation. This means there is no explicit deletion and no memory is given back to the operating system.

Figure 2.1: New objects are simply allocated at the end of the used heap.
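
The pointer-bumping idea behind Figure 2.1 can be sketched in a few lines. This is a toy model, not how HotSpot is implemented: the class BumpAllocator is hypothetical, the "heap" is a plain byte array, and a failed allocation simply returns -1 where a real JVM would trigger a garbage collection.

```java
class BumpAllocator {
    private final byte[] memory; // stands in for the pre-allocated heap
    private int top = 0;         // offset of the first unused byte

    BumpAllocator(int capacity) {
        memory = new byte[capacity];
    }

    // Claims 'size' bytes and returns their start offset, or -1 if they do not fit.
    int allocate(int size) {
        if (size < 0 || size > memory.length - top) {
            return -1;
        }
        int start = top;
        top += size; // move the offset pointer forward
        return start;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(16);
        System.out.println(heap.allocate(4)); // first object starts at offset 0
        System.out.println(heap.allocate(8)); // next object starts right after it
        System.out.println(heap.allocate(8)); // does not fit any more
    }
}
```

Each allocation is just a bounds check and an addition, which is why allocation in a compacted heap is so cheap.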

Once an object is no longer referenced and therefore is not reachable by the application code, the garbage collector removes it and reclaims the unused memory. As simple as this sounds, it raises a question: what is the first reference in the tree?

Garbage-Collection Roots—The Source of All Object Trees

Every object tree must have one or more root objects. As long as the application can reach those roots, the whole tree is reachable. But when are those root objects considered reachable? Special objects called garbage-collection roots (GC roots; see Figure 2.2) are always reachable and so is any object that has a garbage-collection root at its own root.

There are four kinds of GC roots in Java:

  • Local variables are kept alive by the stack of a thread. The stack is not an object reference in the usual sense, so this link does not appear as one. For all intents and purposes, local variables are GC roots.
  • Active Java threads are always considered live objects and are therefore GC roots. This is especially important for thread local variables.
  • Static variables are referenced by their classes. This fact makes them de facto GC roots. Classes themselves can be garbage-collected, which would remove all referenced static variables.
  • JNI References are Java objects that the native code has created as part of a JNI call. Objects thus created are treated specially because the JVM does not know if it is being referenced by the native code or not.

Figure 2.2: GC roots are objects that are themselves referenced by the JVM and thus keep every other object from being garbage-collected.

Therefore, a simple Java application has the following GC roots:

  • Local variables in the main method
  • The main thread
  • Static variables of the main class
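
The effect of a GC root can be observed with a WeakReference, which does not keep its target alive on its own. In the sketch below (class and member names are made up), an object held by a static field stays reachable across a GC request. What happens after the static reference is cleared is only printed, not guaranteed, because System.gc() is merely a hint to the JVM.

```java
import java.lang.ref.WeakReference;

class GcRootDemo {
    // A static field is reachable through its class, so it acts as a GC root.
    static Object staticRoot;

    // Returns true if an object referenced from the static field survives a GC request.
    static boolean survivesWhileRooted() {
        staticRoot = new Object();
        WeakReference<Object> ref = new WeakReference<>(staticRoot);
        System.gc(); // only a request; the object is strongly reachable either way
        return ref.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("Rooted object survived: " + survivesWhileRooted());

        WeakReference<Object> ref = new WeakReference<>(staticRoot);
        staticRoot = null; // drop the only strong reference
        System.gc();
        // Usually true after a collection, but the JVM gives no guarantee here.
        System.out.println("Unrooted object collected: " + (ref.get() == null));
    }
}
```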

Marking and Sweeping Away Garbage

Marking Reachable Objects

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. As you might intuit, it's a straightforward, two-step process:

  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
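
The two steps above can be sketched on a toy object graph. This is a simplified illustration, not the JVM's implementation: Node, mark, sweep, and runDemo are hypothetical names, and the "heap" is just a list of nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyMarkSweep {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        Node(String name) { this.name = name; }
    }

    // Step 1: traverse all references starting at the roots and mark what is found.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {      // true only the first time a node is seen
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    // Step 2: everything that was not marked is reclaimed; returns the number freed.
    static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(node -> !marked.contains(node));
        return before - heap.size();
    }

    // a -> b is reachable from the root; c <-> d reference each other but are unreachable.
    static int runDemo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        return sweep(heap, mark(Collections.singletonList(a)));
    }

    public static void main(String[] args) {
        System.out.println("Objects reclaimed: " + runDemo());
    }
}
```

Note that c and d are reclaimed even though they reference each other: only reachability from a root counts, which is why cyclic garbage is not a problem for this kind of collector.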

In the original illustration, live objects are shown in blue. When the marking phase finishes, every live object is marked. All other objects (shown in grey) are thus unreachable from the GC roots, which means your application can no longer use them. Such objects are considered garbage, and the GC should get rid of them in the following phases.

There are important aspects to note about the marking phase:

  • The application threads need to be stopped for the marking to happen as you cannot really traverse the graph if it keeps changing under your feet all the time. Such a situation when the application threads are temporarily stopped so that the JVM can indulge in housekeeping activities is called a safe point resulting in a Stop The World pause. Safe points can be triggered for different reasons but garbage collection is by far the most common reason for a safe point to be introduced.
  • The duration of this pause depends neither on the total number of objects in heap nor on the size of the heap but on the number of alive objects. So increasing the size of the heap does not directly affect the duration of the marking phase.
  • When the mark phase is completed, the GC can proceed to the next step and start removing the unreachable objects.

Removing Unused Objects

Removal of unused objects is somewhat different for different GC algorithms but all such GC algorithms can be divided into three groups: sweeping, compacting and copying.

Sweep

Mark and Sweep algorithms use conceptually the simplest approach to garbage by just ignoring such objects. What this means is that after the marking phase has completed all space occupied by unvisited objects is considered free and can thus be reused to allocate new objects.

This approach requires a so-called free list that records every free region and its size. Managing the free list adds overhead to object allocation. The approach has another built-in weakness: there may be plenty of free regions, but if no single region is large enough to accommodate an allocation, the allocation still fails (with an OutOfMemoryError in Java).

It is often referred to as the mark-sweep algorithm.
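
The fragmentation weakness can be shown with a toy free list. In this sketch the free list is simply an array of region sizes and allocation uses a first-fit search; both the class FreeListDemo and the first-fit policy are assumptions made for illustration.

```java
class FreeListDemo {
    // Total free space across all regions of the free list.
    static int totalFree(int[] freeRegions) {
        int total = 0;
        for (int size : freeRegions) {
            total += size;
        }
        return total;
    }

    // First-fit search: index of the first region that can hold the request, or -1.
    static int firstFit(int[] freeRegions, int request) {
        for (int i = 0; i < freeRegions.length; i++) {
            if (freeRegions[i] >= request) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] freeRegions = {3, 2, 3}; // three separate holes left behind by a sweep
        System.out.println("Total free: " + totalFree(freeRegions));
        System.out.println("Region for a request of 4: " + firstFit(freeRegions, 4));
    }
}
```

Here 8 units are free in total, yet a request for 4 units finds no region, because no single hole is large enough.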

Compact

Mark-Sweep-Compact algorithms solve the shortcomings of Mark and Sweep by moving all marked (and thus alive) objects to the beginning of the memory region. The downside of this approach is a longer GC pause, because all live objects must be copied to a new place and all references to them updated. The benefits over Mark and Sweep are also clear: after compaction, new object allocation is again extremely cheap via pointer bumping. The location of the free space is always known, and no fragmentation issues arise.

It is often referred to as the mark-sweep-compact algorithm.
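
Compaction can be pictured as sliding the live objects to the front of the region. The sketch below models the heap as an array in which null marks a free cell; ToyCompactor is a hypothetical name, and a real collector would also have to update every reference to the moved objects, which this toy omits.

```java
import java.util.Arrays;

class ToyCompactor {
    // Slides all live (non-null) cells to the front and returns the index of the
    // first free cell, which becomes the new allocation pointer.
    static int compact(String[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                String obj = heap[i];
                heap[i] = null;
                heap[next++] = obj;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = {"A", null, "B", null, null, "C"};
        int allocationPointer = compact(heap);
        System.out.println(Arrays.toString(heap));
        System.out.println("Next allocation at index " + allocationPointer);
    }
}
```

After compaction the free space is one contiguous block, so allocation can go back to simple pointer bumping.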

Copy

Mark and Copy algorithms are very similar to Mark and Compact, as they too relocate all live objects. The important difference is that the target of relocation is a different memory region, which becomes the new home for survivors. The Mark and Copy approach has the advantage that copying can occur simultaneously with marking, during the same phase. The disadvantage is the need for one more memory region, which must be large enough to accommodate the surviving objects.

It is often referred to as the mark-copy algorithm.

Stop-the-world (STW)

All minor garbage collections are "Stop the World" events. This means that all application threads are stopped until the operation completes.

The Old Generation is used to store long surviving objects. Typically, a threshold is set for young generation object and when that age is met, the object gets moved to the old generation. Eventually the old generation needs to be collected. This event is called a major garbage collection.

Major garbage collections are also Stop the World events. A major collection is often much slower because it involves all live objects. So for responsive applications, major garbage collections should be minimized. Also note that the length of the Stop the World event for a major garbage collection depends on the kind of garbage collector used for the old generation space.

Visual GC

When the application starts, it allocates memory in Eden space. In the original illustrations, a blue box is a live object and a grey box is a dead (unreachable) object. When Eden is full and the application tries to create another object, the JVM's allocation in Eden fails. That failure triggers a minor GC.

After the first minor GC, all live objects are moved to Survivor 1 with an age of 1, and the dead objects are deleted.

While the application keeps running, new objects are allocated in Eden space again. Some objects become unreachable in both Eden space and Survivor 1.

After the second minor GC, all live objects are moved to Survivor 2 (those from Eden with age 1, those from Survivor 1 with age 2), and the dead objects are deleted.

The application is still running and new objects are allocated in Eden space. After a while, some objects in both Eden and Survivor 2 become unreachable.

After the third minor GC, all live objects are moved from both Eden and Survivor 2 to Survivor 1, their age increases, and the dead objects are deleted.

An object that lives long enough in the Survivor spaces is promoted to the old generation (Tenured) once its age is greater than -XX:MaxTenuringThreshold.
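
The walkthrough above can be replayed with a small simulation. This is a toy model under simplifying assumptions, not HotSpot's actual logic: the class ToyYoungGen, its fields, and the reachable flag (set by hand instead of being derived from GC roots) are all hypothetical, and survivor spaces never overflow.

```java
import java.util.ArrayList;
import java.util.List;

class ToyYoungGen {
    static final class Obj {
        final String name;
        int age = 0;              // number of minor GCs survived
        boolean reachable = true; // set by hand in this toy model
        Obj(String name) { this.name = name; }
    }

    final int maxTenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // survivor space that is kept empty
    final List<Obj> old = new ArrayList<>();

    ToyYoungGen(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    Obj allocate(String name) {
        Obj obj = new Obj(name);
        eden.add(obj);
        return obj;
    }

    // Copies live objects out of Eden and the occupied survivor space, then swaps
    // the survivor roles so that one survivor space is empty again.
    void minorGc() {
        evacuate(eden);
        evacuate(from);
        eden.clear();
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private void evacuate(List<Obj> space) {
        for (Obj obj : space) {
            if (!obj.reachable) {
                continue; // dead objects are simply left behind
            }
            obj.age++;
            if (obj.age > maxTenuringThreshold) {
                old.add(obj); // promoted to the old generation
            } else {
                to.add(obj);
            }
        }
    }

    public static void main(String[] args) {
        ToyYoungGen gen = new ToyYoungGen(2);
        Obj longLived = gen.allocate("longLived");
        gen.allocate("shortLived").reachable = false;
        for (int i = 1; i <= 3; i++) {
            gen.minorGc();
            System.out.println("GC " + i + ": age=" + longLived.age
                    + " survivor=" + gen.from.size() + " old=" + gen.old.size());
        }
    }
}
```

With a threshold of 2, the long-lived object spends two collections in the survivor spaces and is promoted on the third, while the unreachable object disappears at the first collection.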

We can use VisualGC, a plugin for VisualVM, which attaches to an instrumented HotSpot JVM and collects and graphically displays garbage collection, class loader, and HotSpot compiler performance data.
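
If a graphical tool is not available, similar counters can be read from inside the application through the standard GarbageCollectorMXBean API. The following is a minimal sketch; the class name GcStats and the helper totalCollections are made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Sums the collection counts of all collectors; undefined counts (-1) are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("All collections so far: " + totalCollections());
    }
}
```

The collector names in the output also hint at which collector is active; the exact names vary with the JVM version and the selected GC.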

Performance Basics

Typically, when tuning a Java application, the focus is on one of two main goals: responsiveness or throughput. We will refer back to these concepts as the tutorial progresses.

Responsiveness

Responsiveness refers to how quickly an application or system responds with a requested piece of data. Examples include:

  • How quickly a desktop UI responds to an event
  • How fast a website returns a page
  • How fast a database query is returned

For applications that focus on responsiveness, large pause times are not acceptable. The focus is on responding in short periods of time.

Throughput

Throughput focuses on maximizing the amount of work by an application in a specific period of time. Examples of how throughput might be measured include:

  • The number of transactions completed in a given time.
  • The number of jobs that a batch program can complete in an hour.
  • The number of database queries that can be completed in an hour.

High pause times are acceptable for applications that focus on throughput. Since high throughput applications focus on benchmarks over longer periods of time, quick response time is not a consideration.
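
One common way to put a number on throughput from a GC point of view is the share of run time that was not spent in garbage collection. The sketch below computes that percentage; the class GcThroughput and the helper throughputPercent are hypothetical, and the figure is an approximation because the reported collection time of concurrent collectors does not map exactly to pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughput {
    // Percentage of the elapsed time that was NOT spent in garbage collection.
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            throw new IllegalArgumentException("uptime must be positive");
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        long uptimeMillis = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("Approximate GC throughput: %.2f%%%n",
                throughputPercent(gcMillis, uptimeMillis));
    }
}
```

For example, 100 ms of GC time during 1000 ms of run time gives a throughput of 90%.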

What types of GC are there?

Concurrent mark sweep (CMS) garbage collection

CMS garbage collection is essentially an upgraded mark-and-sweep method. It scans heap memory using multiple threads and was designed to take advantage of machines with multiple processors.

It attempts to minimize the pauses due to garbage collection by doing most of the garbage collection work concurrently with the application threads. It uses the parallel stop-the-world mark-copy algorithm in the Young Generation and the mostly concurrent mark-sweep algorithm in the Old Generation.

To use CMS GC, use the JVM argument below (note that CMS was deprecated in Java 9 and removed in Java 14):

-XX:+UseConcMarkSweepGC        

Serial garbage collection

This algorithm uses mark-copy for the Young Generation and mark-sweep-compact for the Old Generation. It works on a single thread. When executing, it freezes all other threads until garbage collection operations have concluded.

Due to the thread-freezing nature of serial garbage collection, it is only feasible for very small programs.

To use Serial GC, use the JVM argument below:

-XX:+UseSerialGC        

Parallel garbage collection

Similar to serial GC, it uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation, but multiple threads run the marking and copying/compacting phases in parallel. You can configure the number of threads using the -XX:ParallelGCThreads=N option.

Parallel Garbage Collector is suitable for multi-core machines in cases where your primary goal is to increase throughput by using existing system resources efficiently. With this approach, GC cycle times can be reduced considerably.

To use parallel GC, use the JVM argument below:

-XX:+UseParallelGC        

G1 garbage collection

The G1 (Garbage First) garbage collector became available in Java 7 and is designed to be the long-term replacement for the CMS collector; it has been the default collector since Java 9. The G1 collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector.

This approach segments the heap into many small regions (the target is around 2048). Each region is marked as either young generation (further divided into eden regions and survivor regions) or old generation. This allows the GC to avoid collecting the entire heap at once and instead approach the problem incrementally: only a subset of the regions is considered at a time.

G1 keeps track of the amount of live data that each region contains. This information is used to determine the regions that contain the most garbage, so that they are collected first. That is why it is named garbage-first collection.

As with the other algorithms, unfortunately, the compacting operation is a Stop the World operation. But in line with its design goal, you can set specific performance goals for it. You can configure pause durations, e.g. no more than 10 milliseconds in any given second. Garbage-First GC will do its best to meet this goal with high probability (but not with certainty; that would be hard real-time, which OS-level thread management rules out).

To use G1 on Java 7 or Java 8, use the JVM argument below:

-XX:+UseG1GC        

G1 Optimization Options

-XX:G1HeapRegionSize=16m - Size of a heap region. The value must be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.

-XX:MaxGCPauseMillis=200 - Sets a target value for the desired maximum pause time. The default value is 200 milliseconds. The specified value does not adapt to your heap size.

-XX:G1ReservePercent=5 - The minimum percentage of the heap to keep in reserve.

-XX:G1ConfidencePercent=75 - The confidence coefficient for the pause prediction heuristics.

-XX:GCPauseIntervalMillis=200 - The pause interval time slice per MMU (minimum mutator utilization) in milliseconds.
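
To get a feel for the region-size rule described for -XX:G1HeapRegionSize, the arithmetic can be sketched as follows. This is a simplified approximation based only on the description above (about 2048 regions, a power of two between 1MB and 32MB), not the exact HotSpot heuristic; the class G1RegionSizeEstimate and its method are hypothetical.

```java
class G1RegionSizeEstimate {
    static final long MB = 1024L * 1024L;

    // Aims for roughly 2048 regions, clamps the result to 1MB..32MB,
    // and rounds down to a power of two.
    static long estimateRegionSize(long heapBytes) {
        long target = heapBytes / 2048;
        long clamped = Math.max(MB, Math.min(32 * MB, target));
        return Long.highestOneBit(clamped); // largest power of two <= clamped
    }

    public static void main(String[] args) {
        long[] heapSizesInMb = {512, 2560, 4096, 65536};
        for (long heapMb : heapSizesInMb) {
            long regionMb = estimateRegionSize(heapMb * MB) / MB;
            System.out.println(heapMb + " MB heap -> about " + regionMb + " MB regions");
        }
    }
}
```

Under these assumptions a 4GB heap ends up with regions of about 2MB, while very large heaps are capped at 32MB regions and therefore have more than 2048 of them.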

Suggestion

G1 Config

-XX:+UseG1GC \
-XX:+UseStringDeduplication \
-XX:+ParallelRefProcEnabled \
-XX:+AlwaysPreTouch \
-XX:+DisableExplicitGC \
-XX:ParallelGCThreads=8 \
-XX:GCTimeRatio=9 \
-XX:MaxGCPauseMillis=25 \
-XX:MaxGCMinorPauseMillis=5 \
-XX:ConcGCThreads=8 \
-XX:InitiatingHeapOccupancyPercent=70 \
-XX:MaxTenuringThreshold=10 \
-XX:SurvivorRatio=6 \
-XX:-UseAdaptiveSizePolicy \
-XX:MaxMetaspaceSize=256M \
-Xmx4G \
-Xms2G
