登录查看更多内容

Log4J Odyssey: Unraveling the Mysteries of Code, ClassLoaders, and the Fast Lane to the Slow Path

Aliaksei Dubrouski

Sr Staff Software Engineer at LinkedIn

发布日期: 2024年3月10日

Last year, we observed a decline in the performance of our systems in specific scenarios under heavy load. Despite the automated systems effectively preventing damage, our engineers were tasked with identifying the root cause. This investigation unfolded as one of the most extensive and captivating endeavors of the year.

Considerable time was dedicated to exploring the issue alongside Dave Messink and other engineers. Eventually, we traced the root cause to a version change in the Log4J library, specifically from 2.17.1 to versions beyond. Research revealed that the default behavior of Log4J loggers had shifted, with the "includeLocation" flag becoming true by default, resulting in significant overhead. In this context, including location meant that Log4J loggers would print stack traces with not only class names but also method names and exact line numbers, necessitating extended class information. Since all loggers were expected to include location in versions 2.17.2 and above, this was identified as the core issue.

Dave Messink submitted a patch (LOG4J2-3487) aimed at mitigating the problem, and internally, we implemented additional safeguards. To our surprise, performance bottlenecks persisted with identical symptoms. Subsequently, a workgroup was formed to delve deeper into the problem.

Troubleshooting a performance bottleneck in a real application can be challenging due to the sheer complexity of business logic and the specific conditions required to trigger the problem. In this instance, stress testing the system while logging a high number of exceptions was necessary.

To streamline the investigation process, a demo project was created, minimizing the overhead of setting up, building, and running the actual application. The demo comprised a single Java class and a Log4J configuration file. This class threw an exception and logged it using a logger similar to what we use in our production applications.

Surprisingly, every recent version of Log4J in the demo application exhibited the same problem, despite all configuration changes. The symptom was clear: every time Log4J needed to log a class, it used a slow path in the code, attempting to load a class via classloader and cache it. Older versions were able to immediately match class names and use a fast path, avoiding the expensive class loading operation.

developrec 4 个月前

LeetCode 2801 (Hard, Acceptance Level 14.5%). Count…

Sergey Leschev ?? 1 年前

C# 9.0 Features

Muhammad Mazhar 6 个月前

The importance of avoiding class loading became evident due to the synchronized nature of the loadClass method in the classloader's architecture. Multiple threads attempting to load the same class would result in one proceeding while others waited. Additionally, loading classes could involve reading jar files from disk, making the operation exceptionally costly.

The behavior of the demo project was perplexing, given the initial belief that the root cause lay in a complex Log4J configuration and a change in the default behavior of loggers. After thorough investigation, it became evident that recent commits merged to the release branch (CH1 and CH2), coinciding with the default behavior change in version 2.17.2, were responsible. These commits aimed to enhance performance by swapping the fully synchronized Stack data structure with a non-synchronized one, ArrayDeque, and reducing memory allocation. However, the differing logic of these two data structures meant that the same methods of the Deque interface would return different data.

As a result, we consistently entered the slow path, disregarding the current thread's stack and loading classes via the classloader.

We also took a proactive step by developing a Java Microbenchmarking Harness (JMH) benchmark specifically tailored to test the previously mentioned scenario. This benchmark was designed to replicate the conditions leading to the significant drop in throughput observed in our systems under heavy load.

A pull request was submitted and merged, rectifying the logic of accessing elements in the Deque and introducing unit tests to prevent similar issues in the future.

One of the most enlightening takeaways from this experience is the importance of not expecting a single release to have only one major change that might affect performance.

Log4J Odyssey: Unraveling the Mysteries of Code, ClassLoaders, and the Fast Lane to the Slow Path

Aliaksei Dubrouski

Sr Staff Software Engineer at LinkedIn

领英推荐

更多精彩文章

社区洞察

其他会员也浏览了

Mutex ,RwLock, Poison Error

Summary of 'Clean code' by Robert C. Martin

Go Beyond Nil: The Power of Options for Robust?Code

Kubernetes Dictionary

Change coupling: visualize the cost of change

Exploring Spring Boot's Auto-Configuration

SOLID-Principles

Variable Definitions inside the Body of a Switch in C++

ASCII to UTF

领英推荐

Elusive Java Exception

2024年6月17日

How to overflow an integer in a jiffy.

2024年5月12日

Vectorized Quick Sort Part 2

2024年5月5日

Vectorized Quick Sort In JDK21

2024年4月28日

Pitfalls Of Code Generation

2024年4月21日

Diagnosing Performance After Linux Kernel Upgrade

2024年4月14日

JSSE vs BoringSSL for Java

2024年4月7日

Hunting Down Elusive Memory Issues in a Java Applications

2024年3月31日

Digging Inside the JVM

2024年3月24日

Peeking Inside the JVM

2024年3月17日