Classloaders to the Rescue!
How I used three esoteric, obscure features of Java to solve a real-world problem at Amazon
[Cross-post of my original 10/26/21 blog post, since that blog may sometimes be behind a paywall.]
Sometimes I read about a weird, esoteric feature of the JVM and think: OK, sure, but when the heck is this ever actually used? Back in 2013, I was facing a tricky little problem at Amazon, and I ended up using three weird, esoteric features of the JVM to solve it.
It all started when I needed to load test my service, so I wrote a bunch of load generation code. It was responsible for executing transactions at a desired rate (so that I could answer questions like “what happens when my system receives 250,000 transactions per second?”). This sounds easy, but it gets tricky at high throughput and requires careful thread management (hint: ScheduledExecutorService on steroids, but also distributed across hundreds or thousands of machines). It also allowed for blending transaction types: if you wanted to load test an RPC service with three APIs, X, Y, and Z, you could write one piece of code to simulate a customer calling X, one to simulate a customer calling Y, and one to simulate a customer calling Z. Then you could ask the platform to execute your code not just at a desired rate, but also with a desired blend of operations, e.g., run at 10,000 transactions per second for 30 minutes with 70% X, 20% Y, and 10% Z. Lastly, the core engine aggregated metrics for the run, such as throughput, error rate, and percentile latency for all these RPC calls over time, and created dashboards from them.
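The rate-plus-blend idea above can be sketched in a few dozen lines of Java. This is a minimal illustration under stated assumptions, not the platform’s actual engine: TransactionBlend, pick, runOne, errorCount, and startAtRate are hypothetical names, weights are integer percentages, and a single ScheduledExecutorService thread stands in for the heavily tuned, distributed scheduling the post alludes to.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a weighted blend of transaction types driven at a fixed rate.
final class TransactionBlend {
    private final Map<String, Runnable> transactions; // name -> code simulating one customer call
    private final String[] names;
    private final int[] cumulative; // running totals of the weights, e.g. 70, 90, 100 for X/Y/Z
    private final int total;
    private final AtomicLong errors = new AtomicLong();

    // weights should have a stable iteration order (e.g. a LinkedHashMap).
    TransactionBlend(Map<String, Integer> weights, Map<String, Runnable> transactions) {
        this.transactions = transactions;
        this.names = weights.keySet().toArray(new String[0]);
        this.cumulative = new int[names.length];
        int running = 0;
        for (int i = 0; i < names.length; i++) {
            running += weights.get(names[i]);
            cumulative[i] = running;
        }
        this.total = running;
    }

    // Maps r in [0, total) onto the transaction whose weight bucket contains it.
    String pick(int r) {
        for (int i = 0; i < cumulative.length; i++) {
            if (r < cumulative[i]) {
                return names[i];
            }
        }
        throw new IllegalArgumentException("r must be in [0, " + total + "): " + r);
    }

    // Runs one randomly chosen transaction; failures are counted instead of propagated.
    void runOne() {
        String name = pick(ThreadLocalRandom.current().nextInt(total));
        try {
            transactions.get(name).run();
        } catch (RuntimeException e) {
            errors.incrementAndGet();
        }
    }

    long errorCount() {
        return errors.get();
    }

    // One scheduler thread firing every (1 s / tps); a real engine needs many threads and machines.
    ScheduledExecutorService startAtRate(int transactionsPerSecond) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long periodNanos = TimeUnit.SECONDS.toNanos(1) / transactionsPerSecond;
        scheduler.scheduleAtFixedRate(this::runOne, 0, periodNanos, TimeUnit.NANOSECONDS);
        return scheduler;
    }
}
```

Counting failures inside runOne matters because ScheduledExecutorService suppresses all later runs of a periodic task once one run throws.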
I realized that most of the code I had written to load test my service was reusable, so others could leverage it. My framework was generic: engineers could write their product-specific code and have the framework execute it. So I decided to clean it up, refactor it, and offer it more broadly as a platform for other Amazonians (I didn’t know it back then, but this would eventually become the platform that tens of thousands of services use today to ensure they’re ready for peak traffic). The desire to open up the platform got me into the business of vending my software to other engineers at Amazon.
Some software companies, like Google, keep their code in a giant monorepo. Everybody works in the same repository, with the same dependencies, building off the same version (head). Other companies, like Amazon, work at a finer granularity, often per team. Amazon calls these units versionsets (“VS”): isolated build and runtime closures (in Java terms, your classpath). You create a versionset for your product, initially empty, and you add Java packages (JARs) to it. Those JARs bring their dependencies into your VS, because they need them to operate. Those dependencies bring their own dependencies, which bring theirs, and so forth, so adding a single JAR to your VS could pull in hundreds of transitive JARs. Your VS could get bloated fast (and unless you regularly cleaned out unused packages, they would live in it forever). To keep things stable, a VS only accepted specific versions of Java packages. For example, if you brought Log4j 2.14 into your VS and Log4j 2.15 came out later, you had to bring 2.15 in explicitly. This kept your VS from breaking when dependencies changed their APIs. That’s where “versionset” got its name: an isolated set of Java JARs and versions, used as the build and runtime closure for a product.
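To make the versionset model concrete, here is a toy sketch in Java. It assumes a versionset can be modeled as a map from package name to its single pinned version plus a dependency graph; VersionSet and resolve are hypothetical, and the package names (MyService, LibA, LibB) are illustrative only. Resolving a package walks its transitive dependencies and fails if one of them was never brought into the versionset.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a versionset: exactly one pinned version per package,
// plus the direct dependencies each package declares.
final class VersionSet {
    private final Map<String, String> pinned;           // package -> the one allowed version
    private final Map<String, List<String>> directDeps; // package -> packages it needs

    VersionSet(Map<String, String> pinned, Map<String, List<String>> directDeps) {
        this.pinned = pinned;
        this.directDeps = directDeps;
    }

    // Returns the transitive closure of root, each package resolved to its pinned version.
    Map<String, String> resolve(String root) {
        Map<String, String> closure = new TreeMap<>();
        Deque<String> toVisit = new ArrayDeque<>(List.of(root));
        while (!toVisit.isEmpty()) {
            String pkg = toVisit.pop();
            if (closure.containsKey(pkg)) {
                continue; // already resolved via another path
            }
            String version = pinned.get(pkg);
            if (version == null) {
                throw new IllegalStateException(pkg + " is not in this versionset");
            }
            closure.put(pkg, version);
            toVisit.addAll(directDeps.getOrDefault(pkg, List.of()));
        }
        return closure;
    }
}
```

Upgrading to Log4j 2.15 in this model means changing the pinned entry explicitly; nothing moves on its own.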
In a monorepo world, you just say to your customers, “hey! grab my code from this path in the repository.” Done. In a versionset world, those customers need to import your Java package into their VS explicitly. That was the status quo for vending software at Amazon: just say “hey, bring my Java package version x.y into your versionset and start using it!”
[ If you’re wondering: why didn’t I just build a service? That would have solved this. There were other, more complicated reasons I couldn’t vend this as a service at that time. ]
I did not love that software vending model, for many reasons.
I disliked everything about the status quo. I needed to think outside the box.
Classloaders to the Rescue!
The basic problem was that if you had a piece of code built in one VS (my framework’s) and you tried to use a JAR from a different VS (the customer’s), the JVM wasn’t happy: the two had different build and runtime closures, often with different versions of the same dependencies, and a single classpath can hold only one version of each class. I was chatting with one of my mentors, Cary (a Principal at Amazon), and he casually mentioned classloaders. I didn’t know anything about classloaders, but my friend Trevor did, and he helped me bootstrap the thing.
What are Java classloaders? “The Java Class Loader is a part of the Java Runtime Environment that dynamically loads Java classes into the Java Virtual Machine. Usually classes are only loaded on demand. The Java run time system does not need to know about files and file systems as this is delegated to the class loader.”
Normally, you give classloaders very little thought. Classes are just magically loaded when you need them. How? Not my problem. Most programs have a single, straightforward classpath, so you don’t normally think about how those classes are actually being loaded.
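If you are curious which loaders are involved in an ordinary program, a small sketch can walk the parent chain. LoaderChain is a hypothetical helper; it relies on Class.getClassLoader() returning null for classes loaded by the bootstrap loader, which the Javadoc permits and HotSpot does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks from the loader that defined a class up through its parents.
final class LoaderChain {
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader loader = type.getClassLoader(); loader != null; loader = loader.getParent()) {
            chain.add(loader.getClass().getName());
        }
        // A null loader/parent stands for the bootstrap loader, which loads core classes like String.
        chain.add("bootstrap");
        return chain;
    }
}
```

On JDK 9 and later, an application class typically yields the application loader, then the platform loader, then bootstrap.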
It turns out you can have multiple classloaders running in the same JVM, which gives you classpath isolation. I could also have achieved that by running separate processes and communicating via sockets, but this was code that needed to execute hundreds of thousands of times per second, so performance was very important. I profiled it and verified that two classloaders in the same JVM were significantly faster than two JVMs.
Trevor was right: this looked like a good solution to my little problem. It would let me have multiple versions of the same dependency, one in the classloader responsible for the platform code and one in the classloader responsible for the customer code, happily coexisting with each other.
My platform booted in the primary, default classloader. It then created a child classloader, and it loaded the customer JAR (and its dependencies) into the child classloader. It worked!
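The parent/child setup described above might look roughly like this with the JDK’s URLClassLoader. CustomerSandbox and childLoaderFor are hypothetical names, and the JAR list stands in for the customer’s closure; the post does not say which loader class the platform actually used.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: a child classloader over the customer's JARs, parented by the platform.
final class CustomerSandbox {
    static URLClassLoader childLoaderFor(List<Path> customerJars, ClassLoader platformLoader)
            throws MalformedURLException {
        URL[] urls = new URL[customerJars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = customerJars.get(i).toUri().toURL();
        }
        // URLClassLoader delegates parent-first: a class present in both closures
        // is taken from the platform, not from the customer's JARs.
        return new URLClassLoader(urls, platformLoader);
    }
}
```

A platform would then load customer classes by name, e.g. child.loadClass("com.example.CheckoutLoadTest") (a made-up class name). Because URLClassLoader delegates parent-first, this version still has the problem described next.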
I did run into one little problem. Java’s default classloader delegation strategy is parent-first: whenever a class needs to be loaded, the classloader first searches the parent context, and only if the class isn’t found there does it search the child context. This generally makes sense, but in my case it meant my customers’ code could end up using my dependencies instead of theirs, which led to subtle runtime failures that were very hard to debug and understand. So Daniel and I ended up writing our own classloader that switched to a child-first strategy (good article about parent-first vs. child-first classloader delegation). There was the occasional weird class that had its own classloader logic (Log4j, I’m looking at you…), so we even had to add some custom logic to the classloader to treat those classes differently.
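A child-first loader is usually built by overriding loadClass(String, boolean). The sketch below is a generic illustration, not the production loader: it tries the loader’s own URLs first, falls back to the normal parent-first path, and always delegates java.* classes. ChildFirstClassLoader and the parentFirstPrefixes list are hypothetical; the list is a hook for the kind of special cases mentioned above.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical child-first loader: searches its own URLs before asking the parent.
class ChildFirstClassLoader extends URLClassLoader {
    // Extra prefixes that must stay parent-first (hypothetical hook for special-case libraries).
    private final List<String> parentFirstPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, List<String> parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = List.copyOf(parentFirstPrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    c = findClass(name); // the customer's own JARs win
                } catch (ClassNotFoundException notInChild) {
                    // Not bundled by the customer; fall through to the parent.
                }
            }
            if (c == null) {
                return super.loadClass(name, resolve); // regular parent-first path
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private boolean isParentFirst(String name) {
        // Only bootstrap/platform loaders may define java.* classes, so always delegate them.
        if (name.startsWith("java.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Resource lookups (getResource) stay parent-first in this sketch; a fuller version would override them too.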
I had a bigger problem, though. The JVM identifies a class by its name together with the classloader that defined it, so it does not consider a class loaded by classloader A to be castable to the exact same class loaded by classloader B. Sure, you, the human, know they’re the same class. But if they live in different classloaders, the JVM thinks they’re different.
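The identity rule above is easy to reproduce. In this sketch, a hypothetical IsolatingLoader defines a second copy of a hypothetical Payload class from the very same .class bytes, and a cast to the application’s Payload then fails. It assumes the compiled class files are readable as resources from the application loader, which holds for ordinary classpath runs.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class used only to demonstrate class identity.
class Payload {
    String greet() {
        return "hello";
    }
}

// Hypothetical loader: defines its own copy of one named class from the parent's
// .class bytes and delegates every other class to the parent.
class IsolatingLoader extends ClassLoader {
    private final String isolatedName;

    IsolatingLoader(String isolatedName, ClassLoader parent) {
        super(parent);
        this.isolatedName = isolatedName;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(isolatedName)) {
            return super.loadClass(name, resolve); // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes();
                    c = defineClass(name, bytes, 0, bytes.length); // same bytes, new defining loader
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Both Class objects report the same name, yet instanceof and casts treat them as unrelated types. Calling greet() on the isolated instance has to go through reflection instead.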
Reflection to the Rescue!
I had used one obscure, esoteric feature of the JVM to solve the first problem (create a secondary classloader to load customer code with clashing dependencies). So I turned to another obscure, esoteric feature of the JVM to solve my second problem (call code from one classloader in a different classloader): reflection.
What is Java reflection? From here, “Reflection is an API which is used to examine or modify the behavior of methods, classes, interfaces at runtime.”
Reflection is ugly to look at, but incredibly powerful. Say I wanted to call a method doSomething on an object foo loaded by my secondary classloader. Instead of doing this:
foo.doSomething();
I could do this:
Method method = foo.getClass().getMethod("doSomething");
method.invoke(foo);
This is hideous, verbose code, but it works very well across classloaders! It gets even uglier when parameters are involved in those method calls.
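When parameters get involved, the lookup itself becomes the hard part: getMethod needs the parameter types as Class objects from the target’s classloader, so a parameter type defined by the customer has to be obtained through the customer’s loader. One way to sidestep that is to match methods by name and argument count. ReflectiveCall is a hypothetical helper, and name-plus-arity matching is a simplification that cannot tell apart overloads with the same number of parameters.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper: invokes a public method by name and argument count, so the caller
// never needs Class objects for parameter types living in another classloader.
final class ReflectiveCall {
    private ReflectiveCall() {}

    static Object invoke(Object target, String methodName, Object... args) throws Exception {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.length) {
                try {
                    return m.invoke(target, args); // boxed arguments are unboxed as needed
                } catch (InvocationTargetException e) {
                    // Unwrap so callers see the exception thrown by the target method itself.
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            }
        }
        throw new NoSuchMethodException(target.getClass().getName() + "." + methodName
                + " with " + args.length + " argument(s)");
    }
}
```

For example, ReflectiveCall.invoke("hello", "concat", " world") returns "hello world", and the same call shape works unchanged on an object whose class came from a different classloader.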
Annotations to the Rescue!
I had one more decision to make. Traditionally, to write code that executes within a framework, the framework exposes a Java interface, and you create a Java class that implements that interface. This is a tried-and-true approach, but I disliked a few things about it. First, some customers already had test code that behaved like transactions X, Y, and Z, and I didn’t want to force them to refactor it or keep multiple copies of it around. Second, I wasn’t 100% certain I had ironed out the exact interface, and I suspected many new cases would surface as my platform gained more customers, so I wanted to harden the API but leave the door open for growth. Instead of Java interfaces, I turned to yet another somewhat esoteric feature of the JVM: annotations.
What are Java annotations? From here: Java annotations are used to provide metadata for your Java code. These can be runtime instructions that tell others what to do with your code.
Annotations turned out to be an elegant solution to the problem of how to evolve my platform and keep it flexible while still having a reasonably hardened API. The annotations could encode all kinds of interesting metadata that customers could pass to my platform. You could annotate your initialization code, your termination code, and your transactions. TestNG and JUnit both do a nice job with this, with @BeforeClass, @BeforeTest, @Test, @DataProvider, etc., and I took a lot of my inspiration from TestNG. You could annotate pre-existing code in a couple of minutes and have a working load test!
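Here is a small sketch of annotation-driven discovery, loosely modeled on TestNG. The annotation names @Initialize and @Transaction, the CheckoutLoadTest class, and AnnotationScanner are all hypothetical; the post does not list the real annotations. The scanner compares annotation types by name rather than by Class object, on the assumption that the annotation type seen by customer code may have been loaded by a different classloader than the platform’s.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotations; RUNTIME retention is required for reflection to see them.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Initialize {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Transaction {
    String name();
}

// Hypothetical customer code: pre-existing methods, annotated instead of implementing an interface.
class CheckoutLoadTest {
    @Initialize public void login() {}
    @Transaction(name = "X") public void browseCatalog() {}
    @Transaction(name = "Y") public void addToCart() {}
    public void helper() {} // not annotated, so the platform ignores it
}

final class AnnotationScanner {
    // Matches by fully qualified annotation name, since Class objects from different
    // classloaders never compare equal.
    static List<Method> annotatedWith(Class<?> type, String annotationName) {
        List<Method> found = new ArrayList<>();
        for (Method m : type.getMethods()) {
            for (Annotation a : m.getAnnotations()) {
                if (a.annotationType().getName().equals(annotationName)) {
                    found.add(m);
                }
            }
        }
        return found;
    }
}
```

From there, a platform could instantiate the class and call the discovered methods reflectively. Reading an attribute such as name() across classloaders would itself go through reflection, because the platform cannot cast the annotation to its own copy of the type.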
And miraculously it all worked!
In my final solution, I vended my software via an interface package that contained annotations. Customers brought the interface package into their VS. Because it was just an interface, it brought no dependencies into their VS, just itself. And because the interface was hardened, I didn’t have to worry about customers refreshing the interface regularly. The interface offered a bunch of annotations, so it afforded me flexibility to grow it in the future. And lastly, since I loaded my closure into the primary classloader and my customer’s closure into a secondary classloader, incompatible classes could happily coexist.
To be honest, the code necessary to make all this happen is probably the ugliest code I’ve ever written. The APIs for dealing with classloaders, reflection, and annotations are powerful, but they aren’t particularly elegant or beautiful, and neither was my code. But it was effective: it has survived a decade in production and is being executed millions of times per second right now, as you’re reading this story. The tradeoff was more product complexity for me or a more complicated onboarding story for my customers, and I chose the former. Customer Experience always wins. I think that choice had a lot to do with the product growing to tens of thousands of services at Amazon that use it every day to validate they can scale. The fact that it’s still running today is a testament to those design choices holding up over time.
Next time you see an esoteric feature, just think: it may end up being just what you need some day!