Kafka Client Library Comparison

Rob Golder

Director & Co Founder at Lydtech Consulting - Consultant Lead Engineer & Architect

Published: 6 November 2021

Introduction

The decision on which client library to use when interacting with Kafka will, in the majority of cases, simply come down to the programming language used by the organisation. However, even when this is the case, it is important to understand that not every client is equal, and this choice can have ramifications that may not be appreciated until late into a project.

If the system requires a resilient messaging backbone with no message loss, guaranteed message ordering, and managed duplicate messages, then the client library plays a huge role in delivering this. Changing the library once the project is well underway is expensive.

This blog post looks at what the client library brings to the table, why it is such an important part of the technology stack, and compares two of the most commonly used libraries in today's Kafka-backed messaging systems.

The Role Of The Client Library

The Kafka client library is a thick client, meaning it has more responsibilities, and so plays a larger role in the application, than a typical thin client would. As a smart client it shifts some of the usual boundaries of responsibility: it helps with aspects of high availability, it has responsibilities related to exactly-once semantics, and it knows much more about the overall topology of the system. All of this is necessary to deliver the resiliency that Kafka guarantees.

The client library is bundled with the application, allowing the developer to use its APIs to interact with Kafka, typically to consume from or produce to a Kafka topic. The developer can often delegate aspects such as consumer offset management to the library, and can instead concentrate on the business logic of the application. This is recommended, as offset management is complex and error-prone, and it is always best to rely on a Production-proven, battle-hardened library for it.
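The following is a minimal sketch of what delegating offset management can look like with the Apache Java client. It enables the client's auto-commit, so the offsets of records returned by `poll()` are committed periodically without any application code. The property keys are standard Java consumer configuration names. The class and method names, the broker address and the group id are hypothetical. Only `java.util.Properties` is used, so the sketch runs without Kafka on the classpath. In a real application these properties would be passed to a `KafkaConsumer`.

```java
import java.util.Properties;

// Sketch only: consumer settings that leave offset commits to the Java client.
// Names such as ConsumerConfigSketch and "orders-service" are hypothetical.
final class ConsumerConfigSketch {

    // Hypothetical helper: builds consumer properties for the given group.
    static Properties autoCommitConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // Let the client periodically commit offsets of records returned by poll().
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = autoCommitConsumerProps("localhost:9092", "orders-service");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Auto-commit is only one option. Frameworks such as Spring Kafka typically disable it and commit offsets from the listener container instead, which is another way of leaving this responsibility to well-tested code.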

Client Library Options

Overview

The main client library options are the official Apache Java client library if using a JVM language or, if using a language that can link to C (such as Python, Go, Ruby or Node.js), a language binding library built on the librdkafka C/C++ client library.

The responsibilities the client library carries mean that writing a new client from the ground up is an extremely non-trivial task, as it involves writing driver-level aspects of the system. However, this is what the development team at Klarna have done, writing the popular KafkaJS library in pure JavaScript.

Apache Java Client

The Java client library is at version 3.0.0 at the time of writing, and is the library most developers using Kafka will be familiar with.

https://kafka.apache.org/

If writing a Spring Framework-based Java application, the Spring Kafka module can be used, providing a higher level of abstraction for the developer to code against and offering many rich features.

https://spring.io/projects/spring-kafka

librdkafka

Selecting a binding library for the librdkafka C/C++ library is a popular (and often the only) option for non-JVM languages. At the time of writing, the librdkafka library is at version 1.8.

https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md

There are then many language specific libraries that provide a binding to librdkafka:

https://github.com/edenhill/librdkafka#language-bindings

Each binding library has its own characteristics: its own set of supported and missing features, its level of maturity, and the size of its user community. Each must therefore be carefully evaluated before selection.

For example, a popular selection if developing a Node.js application would be to use node-rdkafka, which provides Node.js bindings to librdkafka:

https://github.com/Blizzard/node-rdkafka

The current version of this library is 2.11.0, which uses librdkafka version 1.6.1.

This highlights that, as with any core library or framework, when new features and fixes are added to librdkafka there is a lag before a binding library upgrades to that version, if indeed it does. Troubleshooting issues when using a binding library will also typically mean debugging librdkafka itself.

KafkaJS

For Node.js development, the main binding library for librdkafka is node-rdkafka. However, for this language there is an additional choice: KafkaJS. Rather than a binding library that wraps librdkafka, this is a complete Kafka client reimplemented in native JavaScript.

https://kafka.js.org/

At the time of writing, this is at version 1.15.0.

KafkaJS has effectively superseded node-rdkafka as the primary choice of Kafka client library for Node.js development.

Apache Java Client & KafkaJS Comparison

Overview

Given the central role the thick client plays, it is important to understand the ramifications of the library selected.

The Apache Java Client is the primary library used in many thousands of organisations, given that so many distributed systems are Java-based.

For many reasons Node.js has also become a popular language for writing enterprise applications. One such reason is the rise of serverless development, as Node.js is a natural fit for writing lambdas given its fast start-up time. It also enables JavaScript professionals to apply their expertise to back-end development. KafkaJS is now the most popular client library selected for this language.

To that end, a comparison of these two libraries highlights some of the considerations, trade-offs and possible pitfalls to be aware of when selecting a library.

Comparison Summary

Library Maturity

The KafkaJS feature support is not as comprehensive as that of the Java client library. Features are typically considered to be around two years behind the Java client. While some features are almost at parity, others have not yet been addressed. One example is Kafka Connect: to take advantage of this API, which is built on top of the Consumer/Producer APIs for streaming integration between two data stores, the Java client is the only option.

Most of the big new features in KafkaJS are no longer written by the maintainers but by the community, with the original developers playing more of an oversight role. It is a community project, not supported by any company, so expectations must be set accordingly.

In comparison, the Java client is considered fully mature, with official backing from Confluent (founded by the original Apache Kafka developers), which adds further community and commercial features. It has proven battle-hardened in Production at thousands of organisations.

A further advantage of the Java library that cannot be overstated is the option to use Spring Kafka. This framework applies Spring's core concepts to the development of Kafka-based messaging solutions. It provides a high-level abstraction that makes developing with the client much easier, while still allowing the developer to interact at a low level when necessary.

Spring Kafka adds features such as fully configurable error handling and stateful retry, and all the usual benefits Spring brings such as dependency injection and the removal of boilerplate code.

Testing

Testing an application is an essential aspect of software development, so features that facilitate more thorough tests, and tests that are more straightforward to write, are a huge differentiator.

Treating the application as a black box with component testing ensures that applications based on either library can receive the same level of testing. However, the advantage of the Java library becomes apparent in local integration testing with Spring Kafka. Spring Kafka provides an embedded Kafka broker for in-memory testing, giving the same kind of benefit that the H2 in-memory database brings to testing database interactions. Tests can run with Kafka consumers and producers interacting with the embedded broker from within the developer's IDE as the code is written, giving immediate feedback on any breakages or issues.

For local testing, Node.js applications would typically use Jest's mocking support to mock the broker. But here the developer is writing the behaviour they require the mock to have, based on their expectation of how the broker works, which might not be correct. Only once the code is deployed to run against a real Kafka broker will they discover whether it behaves as it should. This means a longer turnaround time in the iterative develop / test / fix cycle.

Documentation & Resources

Comprehensive documentation and a large, active user community are important differentiators in any library selection.

The behaviour of the Java client library is fully documented, with extensive blog posts, tutorials, example videos and user forums. This is not least due to the backing of Confluent, which has produced many excellent tutorial videos on all aspects of Kafka development. A web search on a complex question usually returns a wealth of information on any topic, issue or bug. The same is true of Spring Kafka.

In comparison, KafkaJS documentation is relatively brief and often does not go into the detail necessary to fully understand the behaviour. From experience, it has proven necessary to debug the KafkaJS source code to determine actual behaviour, for example to understand the retry, batching, polling and timeout behaviours, which are so comprehensively documented for the Java library. A web search on a complex question will not return the same wealth of useful, pertinent information as it would for the Java library.

Library Behaviour

Two different implementations for the thick client unsurprisingly result in different behaviours and feature support, and understanding whether limitations could impact an application is vital.

This section looks at a couple of identified differences that have proven significant in recent experience.

Idempotent Producer

In order to preserve message order and stop duplicates being written to the topic partition by the Producer when retrying due to a transient error, the producer should be configured to be idempotent.

While fully supported in the Apache Java library, in KafkaJS (version 1.15.0) the flag to enable an idempotent Producer is marked as 'Experimental'. It would not be sensible to rely on an experimental flag to provide the behaviour required in Production, as there is no guarantee of its correctness and it may have unexpected side effects.
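For comparison, below is a sketch of an idempotent producer configuration for the Apache Java client. The Kafka documentation for `enable.idempotence` states that it requires `acks` to be `all`, `retries` to be greater than 0, and `max.in.flight.requests.per.connection` to be at most 5. The hypothetical `isValidIdempotentConfig` helper simply checks those documented rules against a `Properties` object. Class and method names are illustrative. In a real application the properties would be passed to a `KafkaProducer`, which performs its own validation.

```java
import java.util.Properties;

// Sketch only: an idempotent producer configuration plus a hypothetical check
// of the documented requirements for enable.idempotence.
final class IdempotentProducerSketch {

    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("enable.idempotence", "true");
        props.put("acks", "all");                                   // must be "all" (or "-1")
        props.put("retries", String.valueOf(Integer.MAX_VALUE));    // must be > 0
        props.put("max.in.flight.requests.per.connection", "5");    // must be <= 5
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    // Hypothetical helper: true only if idempotence is on and the related settings are compatible.
    static boolean isValidIdempotentConfig(Properties props) {
        if (!"true".equals(props.getProperty("enable.idempotence"))) {
            return false;
        }
        String acks = props.getProperty("acks");
        String retries = props.getProperty("retries");
        String maxInFlight = props.getProperty("max.in.flight.requests.per.connection");
        if (acks == null || retries == null || maxInFlight == null) {
            return false; // this sketch insists every related setting is explicit
        }
        boolean acksAll = acks.equals("all") || acks.equals("-1");
        return acksAll && Integer.parseInt(retries) > 0 && Integer.parseInt(maxInFlight) <= 5;
    }

    public static void main(String[] args) {
        Properties props = idempotentProducerProps("localhost:9092");
        System.out.println("valid idempotent config: " + isValidIdempotentConfig(props));
    }
}
```

Setting these properties explicitly, rather than relying on defaults, keeps the intent visible and avoids surprises if defaults differ between client versions.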

Consumer Retry

Consumer retry is an important area to get right in any Kafka-based messaging system as there are common pitfalls to avoid.

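Stateless retry means the consumer retries the message in memory, within the same poll. If the total time spent retrying exceeds the consumer poll timeout (`max.poll.interval.ms` in the Java client), the consumer is considered to have failed, its partitions are reassigned, and the message is re-delivered, resulting in duplicate messages.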

Stateful retry means that the consumer re-polls the message from the broker on each retry. To avoid unintended re-delivery, only the maximum single retry backoff period needs to be less than the poll timeout, rather than the total time spent retrying.

KafkaJS and the Java library both offer stateless retry. However, only the Java library coupled with Spring Kafka also offers stateful retry.
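The difference between the two retry styles comes down to simple arithmetic. The sketch below assumes an exponential backoff and compares it with the Java consumer's default `max.poll.interval.ms` of 300,000 ms. The initial interval, multiplier, cap and retry count are illustrative values only. For stateless retry, the sum of all backoffs must fit within the interval. For stateful retry, only the largest single backoff must fit. The sketch ignores the time spent processing each attempt, which in practice also counts against the interval. Helper names are hypothetical.

```java
import java.util.Arrays;

// Sketch only: compares stateless and stateful retry timing against max.poll.interval.ms.
// Per-attempt processing time is ignored for simplicity.
final class RetryTimingSketch {

    // Hypothetical helper: backoff before retry i is initialMs * multiplier^i, capped at maxIntervalMs.
    static long[] exponentialBackoffs(long initialMs, double multiplier, long maxIntervalMs, int retries) {
        long[] backoffs = new long[retries];
        double next = initialMs;
        for (int i = 0; i < retries; i++) {
            backoffs[i] = Math.min((long) next, maxIntervalMs);
            next *= multiplier;
        }
        return backoffs;
    }

    // Stateless retry: every attempt happens inside the same poll, so the TOTAL backoff must fit.
    static boolean statelessAvoidsRedelivery(long[] backoffs, long maxPollIntervalMs) {
        return Arrays.stream(backoffs).sum() < maxPollIntervalMs;
    }

    // Stateful retry: the record is re-polled between attempts, so only the LARGEST backoff must fit.
    static boolean statefulAvoidsRedelivery(long[] backoffs, long maxPollIntervalMs) {
        return Arrays.stream(backoffs).max().orElse(0L) < maxPollIntervalMs;
    }

    public static void main(String[] args) {
        long maxPollIntervalMs = 300_000L; // Java consumer default for max.poll.interval.ms (5 minutes)
        long[] backoffs = exponentialBackoffs(10_000L, 2.0, 120_000L, 6);
        System.out.println("backoffs (ms): " + Arrays.toString(backoffs));
        System.out.println("stateless avoids redelivery: " + statelessAvoidsRedelivery(backoffs, maxPollIntervalMs));
        System.out.println("stateful avoids redelivery:  " + statefulAvoidsRedelivery(backoffs, maxPollIntervalMs));
    }
}
```

With these values the backoffs are 10, 20, 40, 80, 120 and 120 seconds. That is 390 seconds in total, which exceeds the five-minute interval under stateless retry. Under stateful retry, the largest single backoff of 120 seconds fits comfortably.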
