TL;DR: Java dependency management is mostly broken. But we can fix it together!
Photo by https://unsplash.com/@jontyson

Recently I’ve spent considerable time working on dependency management, and I’ve concluded that it is a miracle of modern software engineering that dependency management works even some of the time. I started writing Java software professionally at a time when dependency management meant copying jars into the lib folder manually and then updating the -cp argument by hand. This friction may have been one of the factors that limited software reuse. The Java ecosystem has come a long way since then, with the introduction of open-source, community-built and maintained dependency management tools like Maven and Gradle. Maven Central has also made it simpler to find reusable libraries. Additionally, the Java open-source ecosystem has flourished and is now several orders of magnitude larger than when I started building software professionally.

This is all great news: it means we can build more complicated software from reusable components and solve new problems, because we don’t need to solve every problem again from scratch! At the same time, the job of managing dependencies has become more difficult. There isn’t a single root cause that we can fix to make it better. It is a combination of multiple factors that makes this such a challenging task.

Technical Challenges

Let’s start with the support for dependency management in Java itself. The problem here is that it doesn’t really exist. Java 9 introduced Java modules, which added some building blocks to the Java toolchain that can be used to build better dependency management tools. Other than Java modules, all we have to work with is the classpath and the class loader hierarchy. It is remarkable how much we have been able to do with so little. Both Maven and Gradle rely primarily on the classpath as the mechanism for managing dependencies in Java projects. The dependency data you provide in a pom or build file eventually gets translated into the classpath argument passed to the java process. That’s it. After that, it’s all up to the class loader to resolve each class name your code references to a class file in one of the jar files on the classpath. Java doesn’t support the notion of a version: the class files produced by the compiler don’t record which version of a class the code was compiled against, because that notion is not supported by the Java toolchain. OSGi, the Java module system used by the Eclipse IDE, uses class loader hierarchies to support a more complex dependency management system.
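
To make the "first match on the classpath" behavior concrete, here is a minimal sketch. It is a toy model, not a real class loader: the Jar record, the jar file names, and the org.example class names are all hypothetical, and the only thing it demonstrates is that lookup order, not version, decides which copy of a class is used.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model of first-match class lookup on a classpath. Not a real class loader. */
public class ClasspathSketch {

    /** A hypothetical jar: a file name plus the fully qualified class names it contains. */
    record Jar(String fileName, Set<String> classNames) {}

    /** Returns the first jar, in classpath order, that contains the requested class. */
    static Optional<String> resolve(List<Jar> classpath, String className) {
        for (Jar jar : classpath) {
            if (jar.classNames().contains(className)) {
                return Optional.of(jar.fileName());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two versions of the same (made-up) library end up on the classpath.
        List<Jar> classpath = List.of(
            new Jar("commons-1.0.jar", Set.of("org.example.Util")),
            new Jar("commons-2.0.jar", Set.of("org.example.Util", "org.example.NewApi")));

        // Util comes from the 1.0 jar because it is listed first; no warning is raised.
        System.out.println(resolve(classpath, "org.example.Util"));
        // NewApi only exists in 2.0, so the application silently mixes classes from both versions.
        System.out.println(resolve(classpath, "org.example.NewApi"));
        System.out.println(resolve(classpath, "org.example.Missing"));
    }
}
```

Swapping the order of the two jars changes which Util is picked, which is why the order of the generated classpath matters so much.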

Next, let’s talk a little bit about solving version constraints, something that both Maven and Gradle need to do to convert the information you provide in the metadata files into the classpath argument. “Dependency Solving Is Still Hard, but We Are Getting Better at It” is an optimistic look at the challenges in solving dependency constraints. When the authors say hard, what they mean is NP-complete: converting all those version constraints into a selection of jar files to put on the classpath is an instance of the boolean satisfiability problem (SAT), which has been proven to be NP-complete. SAT solvers such as Sat4j, which has been used in the Eclipse OSGi tooling, are available; however, package managers apply additional heuristics when a solution can’t be found. The most famous problem is the diamond dependency conflict, where two dependencies in your application depend on different versions of a common dependency. In this case there is only one fully correct solution, keep both versions, but that violates the Java classpath highlander rule: there can be only one! That rule is not really enforced by any Java tooling. You won’t even get a warning if you have two versions of the same class on the classpath; the class loader will simply load the one it finds first. Different package managers resolve this conflict in different ways, but each of them ends up picking one version somewhat arbitrarily: sometimes it is the “latest” version, and sometimes it is the version closest to the root of the dependency graph. You as a Java developer need to know the algorithm to predict the outcome! How many of us can work out in our heads which dependencies Maven will pick?
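
The two strategies mentioned above can be sketched on a small diamond. This is a simplified illustration under stated assumptions: the graph, the "name:version" coordinate strings, and the helper names are hypothetical, versions are plain dotted integers, and real resolvers handle many more cases (ranges, exclusions, scopes). "Nearest wins" follows the mediation rule Maven documents (closest to the root, first declaration on ties), and "highest wins" follows Gradle's default of choosing the highest requested version.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy comparison of two conflict resolution strategies on a dependency graph. */
public class ConflictResolutionSketch {

    static String name(String coordinate) {
        return coordinate.substring(0, coordinate.lastIndexOf(':'));
    }

    static String version(String coordinate) {
        return coordinate.substring(coordinate.lastIndexOf(':') + 1);
    }

    /** Compares dotted numeric versions such as "1.10" and "1.9". */
    static int compareVersions(String left, String right) {
        String[] l = left.split("\\.");
        String[] r = right.split("\\.");
        for (int i = 0; i < Math.max(l.length, r.length); i++) {
            int a = i < l.length ? Integer.parseInt(l[i]) : 0;
            int b = i < r.length ? Integer.parseInt(r[i]) : 0;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return 0;
    }

    /** Breadth-first "nearest wins": the first version seen for a name is kept. */
    static Map<String, String> nearestWins(Map<String, List<String>> graph, String root) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>(graph.getOrDefault(root, List.of()));
        while (!queue.isEmpty()) {
            String coordinate = queue.removeFirst();
            if (chosen.containsKey(name(coordinate))) {
                continue; // a nearer (or earlier declared) version already won
            }
            chosen.put(name(coordinate), version(coordinate));
            queue.addAll(graph.getOrDefault(coordinate, List.of()));
        }
        return chosen;
    }

    /** "Highest wins": visit every reachable coordinate, keep the largest version per name. */
    static Map<String, String> highestWins(Map<String, List<String>> graph, String root) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(graph.getOrDefault(root, List.of()));
        while (!queue.isEmpty()) {
            String coordinate = queue.removeFirst();
            if (!visited.add(coordinate)) {
                continue; // already processed this exact coordinate
            }
            chosen.merge(name(coordinate), version(coordinate),
                (a, b) -> compareVersions(a, b) >= 0 ? a : b);
            queue.addAll(graph.getOrDefault(coordinate, List.of()));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Diamond: app -> a -> c:1.0 and app -> b -> c:2.0
        Map<String, List<String>> graph = Map.of(
            "app", List.of("a:1.0", "b:1.0"),
            "a:1.0", List.of("c:1.0"),
            "b:1.0", List.of("c:2.0"));
        System.out.println(nearestWins(graph, "app")); // {a=1.0, b=1.0, c=1.0}
        System.out.println(highestWins(graph, "app")); // {a=1.0, b=1.0, c=2.0}
    }
}
```

The same graph yields a different version of c depending on the strategy, and under "nearest wins" simply reordering the declarations of a and b flips the result.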

Social Challenges

So far we’ve covered the technical challenges, but they are only part of the picture. Since we work in a socio-technical ecosystem, we need to consider the social challenges as well. The participants in the Java software ecosystem fall into three groups: Producers, who publish reusable components into the ecosystem; Consumers, who use the components available in the ecosystem; and Maintainers, who oversee the overall ecosystem. Most folks who interact with the Java ecosystem are Consumers; there are far fewer Producers, and even fewer Maintainers.

The first challenge is versioning. Dependency managers support the notion of Semantic Versioning (SemVer), a system that uses three numbers to encode a lot of information about the changes in a library. The challenge with SemVer is that in the Java open-source ecosystem it is not actually followed. The authors of “Semantic versioning and impact of breaking changes in the Maven repository” found that breaking changes are widespread even in non-major releases. Producers struggle to follow the semantic versioning rules: it’s difficult to predict whether a particular change will break a consumer, especially when you don’t know who your consumers are. Consumers in the Maven ecosystem also struggle to use semantic versioning. According to “Dependency Versioning in the Wild”, a tiny minority of dependency version specifications, less than 1%, use semantic versioning correctly; most use other formats like ranges or wildcards. The Maven ecosystem doesn’t have built-in standard tooling to check for semantic versioning compliance in version constraint definitions.
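
As a reminder of what the three numbers are supposed to promise, here is a minimal sketch of the SemVer rules. The class and method names are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH strings, ignoring pre-release and build metadata. Note that it only encodes what a version number claims; as the studies above show, the actual code may not honor that claim.

```java
/** Toy model of what a MAJOR.MINOR.PATCH version bump is supposed to promise. */
public class SemVerSketch {

    record Version(int major, int minor, int patch) {
        /** Parses "MAJOR.MINOR.PATCH"; pre-release and build metadata are not supported. */
        static Version parse(String text) {
            String[] parts = text.split("\\.");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH: " + text);
            }
            return new Version(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
        }
    }

    enum Change { MAJOR, MINOR, PATCH, NONE, DOWNGRADE }

    /** Classifies the step from one version to another by its most significant differing part. */
    static Change classify(Version from, Version to) {
        if (to.major() != from.major()) {
            return to.major() > from.major() ? Change.MAJOR : Change.DOWNGRADE;
        }
        if (to.minor() != from.minor()) {
            return to.minor() > from.minor() ? Change.MINOR : Change.DOWNGRADE;
        }
        if (to.patch() != from.patch()) {
            return to.patch() > from.patch() ? Change.PATCH : Change.DOWNGRADE;
        }
        return Change.NONE;
    }

    /**
     * True when SemVer promises that code written against "from" keeps working on "to":
     * same major version, not a downgrade, and not a 0.x release (where anything may change).
     */
    static boolean promisesCompatibility(Version from, Version to) {
        Change change = classify(from, to);
        boolean stable = from.major() > 0;
        return stable
            && (change == Change.MINOR || change == Change.PATCH || change == Change.NONE);
    }

    public static void main(String[] args) {
        Version current = Version.parse("1.4.2");
        for (String candidate : new String[] {"1.4.3", "1.5.0", "2.0.0", "1.3.9"}) {
            Version next = Version.parse(candidate);
            System.out.println(candidate + " -> " + classify(current, next)
                + ", compatible by contract: " + promisesCompatibility(current, next));
        }
    }
}
```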

The next challenge, which is related to versioning, is backward compatibility. The authors of “Historical and Impact Analysis of API Breaking Changes: A Large-Scale Study” found that out of 500K+ API changes by producers in the Java ecosystem, 27% were breaking changes for consumers. They also noted that the frequency of breaking changes in a particular library increased over time. Thankfully, the observed impact on consumers was not large: at the median, only 2.54% of client applications were impacted by a particular breaking API change. Because of the potential for breaking changes, consumers tend not to upgrade their dependency versions. “Do Developers Update Their Library Dependencies?” asks exactly this question, and the answer is that most consumers rarely update the dependencies of their systems, with 81.5% choosing to remain with a “popular” older version of their dependencies. Consumers don’t even upgrade when there is a known security vulnerability, mostly because they don’t know about it. Sixty-nine percent of the developers whom the authors of the paper notified about a security vulnerability said they had not been aware of the vulnerability and took action to mitigate it.

Finally, the dependency ecosystem is large, complicated, and bloated. “A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem” found that unnecessary artifacts account for up to 75% of dependencies, and that 57% are transitive dependencies. As a consumer you might only be using a small part of the full API you depend on; however, you are still on the hook to maintain the large, complicated dependency graph, where even a small unrelated change can break your application. It is a wonder that this whole thing works at all.

Moving forward together

We all have a role to play in improving the state of Java dependency management, and there is a growing number of tools that can help us. Whether you participate in this ecosystem as a consumer, producer, or maintainer, you can make changes in the way you work to help improve the situation.

Consumers can take steps to reduce the risk of a dependency version upgrade breaking their system. Use automated dependency upgrade tools like Dependabot. According to the findings in “Can Automated Pull Requests Encourage Software Developers to Upgrade Out-of-Date Dependencies?”, these tools help you upgrade more often than you would otherwise. You will learn sooner about upgrades that might break you, you’ll know which security vulnerabilities you are exposed to, and you’ll take advantage of bug fixes and performance improvements without doing a ton of work or subscribing to a lot of newsletters. You can only take advantage of these tools if you write automated tests that exercise the libraries you depend on and run them during continuous integration. You can limit how much of your code uses a dependency directly by adding an adapter between your code and the dependency, which gives you a target for your unit tests and reduces the cost of handling breaking changes. Additionally, you can leverage tools like DepClean to remove unused and unnecessary dependencies. You can also use RIDDLE to help you identify potential version conflicts in your dependency graph.
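
The adapter idea can be sketched as follows. All names here (TextEncoder, Base64TextEncoder, makeToken) are hypothetical, and java.util.Base64 from the JDK merely stands in for a third-party library so that the example stays self-contained. The point is that only one class touches the dependency, so a breaking change in the library is absorbed in one place and the adapter's tests tell you whether an upgrade is safe.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Toy example of isolating a dependency behind an adapter. */
public class AdapterSketch {

    /** The only type the rest of the application depends on. */
    interface TextEncoder {
        String encode(String plainText);

        String decode(String encoded);
    }

    /**
     * The single place that calls the library directly.
     * java.util.Base64 is a stand-in for a third-party dependency.
     */
    static final class Base64TextEncoder implements TextEncoder {
        @Override
        public String encode(String plainText) {
            return Base64.getEncoder()
                .encodeToString(plainText.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public String decode(String encoded) {
            return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        }
    }

    /** Application code sees only the TextEncoder interface, never the library. */
    static String makeToken(TextEncoder encoder, String user) {
        return encoder.encode("user:" + user);
    }

    public static void main(String[] args) {
        TextEncoder encoder = new Base64TextEncoder();
        String token = makeToken(encoder, "alice");
        System.out.println(token);
        System.out.println(encoder.decode(token));
    }
}
```

A round-trip test against the adapter, run in continuous integration, is the kind of test that lets an automated upgrade pull request pass or fail on its own.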

Producers can take steps to minimize the impact of their changes on their consumers, which will in turn make it more likely that consumers upgrade frequently. The Google Best Practices for Java Libraries guide has a lot of great advice for you to follow. Additionally, you can use tools like the Java API Compliance Checker to make sure that the changes you’re making are binary and source compatible. Adhere to Semantic Versioning and do your best to limit breaking changes to major versions. Finally, I would highly recommend that you implement a compatibility testing process for your project. It is your best tool for validating behavioral compatibility. You can use Google’s Open Source Insights tool to find your consumers and use their tests to validate that your changes will not break them.

Maintainers can modernize the tools available to the community by looking at progress made in other language ecosystems. The Java language maintainers can continue to improve the Java module system and learn from newer language toolchains like Go and its implementation of modules. Maven and Gradle can improve their support for dependency management by introducing features like the lock files available in more modern tools like yarn, npm, and cargo, and by encouraging better adherence to semantic versioning. There is also work to be done to improve how we implement adherence to semantic versioning, as outlined in “Putting the semantics into semantic versioning”. It is impossible to express all the nuance and complexity of software changes with three integers.

Together we can move toward making software reuse simpler and safer so that we can all benefit from a healthy and growing Java software ecosystem. What do you think we can all do to make dependency management simpler in Java? (Leave your suggestion in the comments below.)

Chris Krycho

I help teams adopt TypeScript and Rust, develop technical strategy, and solve thorny software problems. Previously LinkedIn.com tech lead; Ember TypeScript & Framework team emeritus. Theologian, writer, composer.

2 years ago

We should sync up sometime: we’ve been working on this same problem domain in parallel in the JS/TS ecosystem, and the details are very different because of different language affordances, history, and cultural norms, but a lot of the fundamentals (especially around SemVer) overlap a bunch. I would definitely love to pick your brain on the Java side at a minimum, and maybe bounce some ideas off of you about language-agnostic problems and possible approaches.
