Kafka Client Library Comparison
Rob Golder
Director & Co Founder at Lydtech Consulting - Consultant Lead Engineer & Architect
Introduction
In most cases the decision on which client library to use when interacting with Kafka simply comes down to the programming language used by the organisation. However, it is important to understand that not every client is equal, and this choice can have ramifications that may not be appreciated until late into a project.
If the system requires a resilient messaging backbone with no message loss, guaranteed message ordering, and managed duplicate messages, then the client library plays a huge role in delivering this. Changing the library once the project is well underway is expensive.
This blog post looks at what the client library brings to the table, why it is such an important part of the technology stack, and compares two of the most commonly used libraries in today’s Kafka-backed messaging systems.
The Role Of The Client Library
The Kafka client library is a thick client, meaning it has more responsibilities and so plays a larger role in the application than a typical thin client would. As a smart client it shifts some of the usual boundaries of responsibility. It helps with aspects of high availability, it has responsibilities related to exactly-once semantics, and it knows much more about the overall topology of the system. This is all necessary in order to deliver the resiliency that Kafka guarantees.
The client library is bundled with the application, allowing the developer to use its APIs to interact with Kafka, typically to consume from or produce to a Kafka topic. The developer can often defer aspects such as consumer offset management to the library. This is recommended, as offset management is complex and error prone, and it is always best to rely on a battle-hardened library that is proven in Production. The developer can instead concentrate on the business logic of the application.
Client Library Options
Overview
The main client library options are the official Apache Java client library if using a JVM language, or, if using a language that can link to C (such as Python, Go, Ruby, Node.js), a language binding library built on the C/C++ librdkafka client library.
The responsibilities of the client library mean that writing a new client from the ground up is an extremely non-trivial task, as it involves writing driver-level aspects of the system. However this is what the development team at Klarna have done, writing the popular KafkaJS library in pure JavaScript.
Apache Java Client
The Java client library is at version 3.0.0 at the time of writing, and will be the library most developers using Kafka will be familiar with.
If writing a Spring framework based Java application then the Spring Kafka module can be used providing a higher level of abstraction for the developer to code against, and offering many rich features.
librdkafka
Selecting a binding library for the C/C++ librdkafka library is a popular (and often the only) option for non-JVM languages. At the time of writing the librdkafka library is at version 1.8.
There are then many language specific libraries that provide a binding to librdkafka:
Each binding library has its own characteristics, with a set of supported features, missing features, different levels of maturity, and size of user community, so must be carefully evaluated before selection.
For example, a popular selection if developing a Node.js application would be to use node-rdkafka, which provides Node.js bindings to librdkafka:
The current version of this library is 2.11.0, which uses librdkafka version 1.6.1.
This highlights that, as with any core library or framework, when new features and fixes are added to librdkafka there is a lag before any binding library upgrades to that version, if indeed it does. Troubleshooting issues when using a binding library will typically mean debugging librdkafka itself.
KafkaJS
For development with Node.js, the main binding library for librdkafka is node-rdkafka. However for this language there is an additional choice, KafkaJS. This is a complete implementation of a Kafka client in native JavaScript, rather than a binding library that wraps librdkafka.
At the time of writing this is at version 1.15.0.
KafkaJS has effectively superseded the node-rdkafka library as the primary choice of Kafka client library for Node.js development.
Apache Java Client & KafkaJS Comparison
Overview
Given the importance of the role the thick client plays, it is important to understand the ramifications of the library selected.
The Apache Java Client is the primary library used in many thousands of organisations, given so many distributed systems being Java based.
For many reasons Node.js has become a popular platform for writing enterprise applications too. One such reason is the increase in popularity of serverless development, as Node.js is a natural fit for writing lambdas given its fast spin-up time. It is also an enabler for JavaScript professionals to use their expertise in Back End development. KafkaJS is now the most popular client library selected for Node.js.
To that end a comparison of these two libraries highlights some of the considerations, trade-offs and possible pitfalls to be aware of when selecting the library.
Comparison Summary
Library Maturity
The KafkaJS feature support is not as comprehensive as that of the Java client library. Features are typically considered to be around two years behind the Java client. While some features are almost at parity, others have not been looked at. One example of this is Kafka Connect. To take advantage of this API, which is built on top of the Consumer/Producer APIs for streaming integration between two data stores, the Java client is the only option.
Most of the big new features in KafkaJS are no longer written by the maintainers but by the community, with the original developers playing more of an oversight role. It is a community project, not supported by any company, so expectations must be set accordingly.
In comparison, the Java client is considered fully mature, with official backing from Confluent (founded by the original Apache Kafka developers), which adds further community and commercial features. It is proven to be battle-hardened in Production at thousands of organisations.
A further advantage of the Java library that cannot be overstated is the option to use Spring Kafka. This framework applies Spring’s core concepts to the development of Kafka-based messaging solutions. It provides a high-level abstraction that makes developing with the client much easier, while still allowing the developer to interact at a low level when necessary.
Spring Kafka adds features such as fully configurable error handling and stateful retry, and all the usual benefits Spring brings such as dependency injection and the removal of boilerplate code.
Testing
Testing an application is an essential aspect of software development, so features that facilitate more thorough tests, and tests that are more straightforward to write, are a huge differentiator.
Treating the application as a black box with component testing ensures that applications based on either library would receive the same level of testing. However the advantage of using the Java library becomes apparent with local integration testing with Spring Kafka. Spring Kafka provides an embedded Kafka broker for in-memory testing, giving the same kind of benefit that the H2 in-memory database brings for testing database interactions. Tests can be run with Kafka consumers and producers interacting with the embedded broker from within the developer’s IDE as they write the code, giving immediate feedback on any breakages or issues.
For local testing, Node.js applications would typically use the Jest mocking library to mock the broker. But here the developer is writing the behaviour they require the mock to have, based on their expectation of how it should work, which might not be correct. Only once the code is deployed to run against a real Kafka broker will they discover whether their code is working as it should. This means a longer turnaround time in the iterative develop / test / fix cycle.
Documentation & Resources
Comprehensive documentation and a large active user group are important differentiators in any library selection.
The Java client library behaviour is fully documented, with extensive blog posts, tutorials, example videos and user forums. This is not least due to the backing of Confluent, which has many excellent tutorial videos on all aspects of Kafka development. A web search on a complex question usually brings back a wealth of information on any topic, issue or bug. This is likewise the case for Spring Kafka.
KafkaJS documentation is in comparison relatively brief and often does not go into the detail necessary to fully understand the behaviour. From experience it has proven necessary to debug the KafkaJS source code in order to determine actual behaviour. Examples include understanding the retry, batching, polling and timeout behaviours, which are comprehensively documented for the Java library. A web search on a complex question will not bring back the same wealth of useful, pertinent information as it would for the Java library.
Library Behaviour
Two different implementations for the thick client unsurprisingly result in different behaviours and feature support, and understanding whether limitations could impact an application is vital.
This section looks at a couple of identified differences that have proven significant in recent experience.
Idempotent Producer
In order to preserve message order and stop duplicates being written to the topic partition by the Producer when retrying due to a transient error, the producer should be configured to be idempotent.
While fully supported in the Apache Java library, in KafkaJS (version 1.15.0) the flag to enable an idempotent Producer is marked as 'Experimental'. It would not be sensible to rely on an experimental flag for behaviour required in Production, as there is no guarantee of its correctness or that it is free of unexpected side-effects.
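As an illustration of what enabling idempotence involves on the Java side, the following is a minimal sketch of the producer settings, built with plain java.util.Properties so it runs without the Kafka dependency. The configuration key names are the standard Kafka producer keys. The class and method names, the broker address and the choice of String serializers are assumptions made for the example, not taken from any particular project.

```java
import java.util.Properties;

// Hypothetical helper class for illustration only.
class IdempotentProducerConfigSketch {

    // Builds producer settings using the standard Kafka configuration key names.
    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        // Broker address is supplied by the caller (placeholder in main below).
        props.put("bootstrap.servers", bootstrapServers);
        // String serializers are an assumption; use whatever suits the payload.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Ask the producer to be idempotent so that retries do not write duplicates.
        props.put("enable.idempotence", "true");
        // Idempotence requires acknowledgement from all in-sync replicas.
        props.put("acks", "all");
        // Idempotence requires at most 5 in-flight requests per connection.
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = idempotentProducerProps("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application these properties would be passed to the KafkaProducer constructor from the kafka-clients library, which also provides ProducerConfig constants for the same keys.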
Consumer Retry
Consumer retry is an important area to get right in any Kafka-based messaging system as there are common pitfalls to avoid.
Stateless retry means that all retries take place within the processing of a single poll. If the total time to retry exceeds the consumer poll timeout, the message will be re-delivered, resulting in duplicate messages.
Stateful retry means that the consumer re-polls the message from the broker on each retry. To avoid message re-delivery, only the longest individual retry backoff period then needs to be less than the poll timeout.
KafkaJS and the Java library both offer stateless retry.? However Java coupled with Spring Kafka offers stateful retry.
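The difference between the two retry styles can be shown with a small piece of arithmetic. The sketch below is not code from either library. It is a simplified model that ignores message processing time and assumes the poll timeout is a single fixed value (in the Java client this would correspond to a setting such as max.poll.interval.ms). The class name, method names and backoff values are hypothetical.

```java
import java.util.List;

// Hypothetical model for illustration only; not part of any Kafka client.
class RetryTimeoutSketch {

    // Stateless retry: every retry happens within one poll, so the summed
    // backoff time must stay below the poll timeout.
    static boolean statelessRetryIsSafe(List<Long> backoffsMs, long pollTimeoutMs) {
        long total = 0;
        for (long backoff : backoffsMs) {
            total += backoff;
        }
        return total < pollTimeoutMs;
    }

    // Stateful retry: the message is re-polled on each retry, so only the
    // longest single backoff must stay below the poll timeout.
    static boolean statefulRetryIsSafe(List<Long> backoffsMs, long pollTimeoutMs) {
        long longest = 0;
        for (long backoff : backoffsMs) {
            longest = Math.max(longest, backoff);
        }
        return longest < pollTimeoutMs;
    }

    public static void main(String[] args) {
        // Assumed values: three retries backing off 1, 2 and 4 minutes,
        // against a 5 minute poll timeout.
        List<Long> backoffsMs = List.of(60_000L, 120_000L, 240_000L);
        long pollTimeoutMs = 300_000L;

        System.out.println("stateless safe: " + statelessRetryIsSafe(backoffsMs, pollTimeoutMs));
        System.out.println("stateful safe: " + statefulRetryIsSafe(backoffsMs, pollTimeoutMs));
    }
}
```

With these assumed values the total backoff of 7 minutes exceeds the 5 minute poll timeout, so stateless retry would risk re-delivery, whereas the longest single backoff of 4 minutes fits within it, so stateful retry would not.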