Pitfalls Of Code Generation
The Fast Avro framework is the fastest serialization framework available for Java (at least in terms of deserialization speed). Originally developed by RTB House as avro-fastserde, it also has one of the smallest message payloads, second only to Kryo. LinkedIn engineers further enhanced Fast Avro by introducing limited compatibility with various vanilla Avro versions and by implementing several memory allocation optimizations (some details were presented in my QCon talk on optimizing Venice DB performance).
The secret to Fast Avro's deserialization speed lies in its ability to dynamically generate and compile a specialized SerDe class for each unique Avro schema it encounters. These classes are tailored to the specific data structures defined in the schema, which yields performance gains in several ways.
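To make this concrete, here is a hypothetical sketch of what schema-specialized deserialization looks like. This is not Fast Avro's actual generated code; the SimpleDecoder and UserEvent types are stand-ins I made up for illustration. The point is that field order and field types are baked in at generation time, so there is no per-record schema walking, reflection, or type dispatch:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a schema-specialized deserializer is essentially
// straight-line code, because the schema is known at generation time.
public class SpecializedDeserializerSketch {

    // Stand-in for org.apache.avro.io.Decoder with only the calls we need.
    static final class SimpleDecoder {
        private final ByteBuffer buf;
        SimpleDecoder(ByteBuffer buf) { this.buf = buf; }
        long readLong() { return buf.getLong(); }
        double readDouble() { return buf.getDouble(); }
    }

    // Record matching a hypothetical schema {id: long, score: double}.
    record UserEvent(long id, double score) {}

    // No loop over schema fields, no reflection, no type switch:
    // each field read is a direct, monomorphic call in a fixed order.
    static UserEvent deserialize(SimpleDecoder in) {
        long id = in.readLong();        // field 0, known to be a long
        double score = in.readDouble(); // field 1, known to be a double
        return new UserEvent(id, score);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(42L).putDouble(0.5).flip();
        UserEvent e = deserialize(new SimpleDecoder(buf));
        System.out.println(e.id() + " " + e.score());
    }
}
```

A generic deserializer, by contrast, must consult the schema for every field of every record, which costs branches and virtual dispatch on the hot path.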
While these optimizations sound impressive on paper, significant effort was invested behind the scenes to develop and integrate these techniques effectively. Even with these advancements, challenges can still arise.
Today I would like to describe an interesting performance issue we faced while using Fast Avro. Since SerDe classes are generated on the fly, they are compiled into bytecode with the javac compiler and then further optimized by the Just-In-Time (JIT) compiler. Although most Java developers never have to dive into the specifics of this process, it hides some interesting pitfalls.
The JIT compiler's primary unit of compilation is the method. Additionally, there is a strict limit on the size of the methods it will compile (8000 bytes of bytecode by default). This threshold exists for several reasons.
These factors can create a challenge for Fast Avro. If a particularly complex Avro schema necessitates a large SerDe class with extensive methods, those methods can exceed the JIT compilation size limit. In that case the JIT compiler skips them entirely, and the expected performance gains evaporate.
Following a recent deployment with a new Avro schema version, one of our development teams encountered a surge in CPU usage and latency. Profiling revealed the culprit: the JVM was spending a significant amount of time in Interpreted-to-Compiled (I2C) and Compiled-to-Interpreted (C2I) adapters.
These adapters, inserted by the JVM, act as bridges between compiled and interpreted code. Their presence on a frequently executed code path (a hot path) is a red flag, as interpreted code is significantly slower than compiled code.
Fast Avro dumps the generated bytecode to disk. This allowed me to take a closer look using javap:
javap -c GenericDeserializer_2322729746181669945_207929154085253998.class | less
The culprit was the deserialize0 method of the generated GenericDeserializer class, whose bytecode was 8044 bytes long, just over the 8000-byte limit:
public class GenericDeserializer_2322729746181669945_207929154085253998
public org.apache.avro.generic.IndexedRecord deserialize0(java.lang.Object, org.apache.avro.io.Decoder) throws java.io.IOException;
Code:
0: aload_1
....
8043: aload_3
8044: areturn
(The numbers in the Code column represent byte offsets into the method's bytecode.)
In human-written code, we limit the size of the methods to improve readability and maintainability. However, auto-generated code often prioritizes logic over aesthetics, resulting in methods that mirror the complexity of the underlying schema.
There are two possible solutions to this issue:
The -XX:-DontCompileHugeMethods flag instructs the JIT compiler to bypass its size threshold for compilation. While this might solve the immediate performance bottleneck, it is a global setting that affects every method in the JVM and can have unforeseen consequences.
Fast Avro offers a configuration option, fast.avro.field.limit.per.method, that allows you to set a limit on the number of fields processed within a single method. This effectively splits the large deserialization method into smaller, more manageable chunks, ensuring the JIT compiler can optimize each one effectively.
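A minimal sketch of applying this limit, assuming the option is read as a JVM system property (the exact configuration mechanism may differ across Fast Avro versions, so check the documentation for the version you use; the value 100 here is an arbitrary example):

```java
// Hypothetical usage: cap the number of fields handled per generated method
// so each generated deserialize helper stays under the JIT's 8000-byte limit.
// The property name comes from Fast Avro; whether it is consumed as a system
// property (and what value fits your schema) must be verified for your setup.
public class FastAvroConfigSketch {
    public static void main(String[] args) {
        System.setProperty("fast.avro.field.limit.per.method", "100");
        System.out.println(System.getProperty("fast.avro.field.limit.per.method"));
    }
}
```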
Thanks Gaojie Liu for your help.