Kafka vs. Pulsar
Kafka has been around for a long time. Perhaps too long...
I bumped into an article titled "Pub/sub messaging: Apache Kafka vs. Apache Pulsar" the other day, and it got me thinking about adding a few words from my own experience to this battlefield.
So, let's get down to business...
Kafka pros:
- It's very mature, with rich and useful documentation.
- As it has been around for a long time, it has a mature and extensive community of active users.
- Kafka Streams.
- It seems simpler to operate in production: there are fewer components, as the broker nodes also provide storage.
- A form of transactions.
- Offsets are exposed, so you have flexibility in where you fetch messages from (yet you can't fetch one specific message on its own).
Kafka cons:
- A consumer can't acknowledge a message from a different thread.
- No multi-tenancy.
- No robust multi-datacenter replication, although it is offered in Confluent Enterprise.
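The usual workaround for the single-threaded Kafka consumer is to hand records to worker threads and let only the polling thread commit. Below is a minimal sketch of that hand-off using only the JDK, so it runs without a broker. `OffsetTracker` is a hypothetical helper, not part of the Kafka client. It assumes one tracker per partition and that every offset in the range actually reaches a worker; compacted topics or transaction markers can leave gaps in offsets, which this sketch would wait on forever.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical helper (not part of the Kafka client): tracks ONE partition.
// Worker threads mark offsets as processed; only the polling thread, which owns
// the KafkaConsumer, asks for the position to commit.
class OffsetTracker {
    private final ConcurrentSkipListSet<Long> processed = new ConcurrentSkipListSet<>();
    private long nextToCommit; // touched only by the polling thread

    OffsetTracker(long startOffset) {
        this.nextToCommit = startOffset;
    }

    // Safe to call from any worker thread.
    void markProcessed(long offset) {
        processed.add(offset);
    }

    // Call only from the polling thread. Kafka commits the NEXT offset to read,
    // so advance past the contiguous run of processed offsets and stop at the first gap.
    // The polling thread would then do something like:
    //   consumer.commitSync(Map.of(partition, new OffsetAndMetadata(position)));
    long commitPosition() {
        while (processed.remove(nextToCommit)) {
            nextToCommit++;
        }
        return nextToCommit;
    }

    public static void main(String[] args) throws InterruptedException {
        OffsetTracker tracker = new OffsetTracker(0);
        Thread w1 = new Thread(() -> { tracker.markProcessed(0); tracker.markProcessed(2); });
        Thread w2 = new Thread(() -> tracker.markProcessed(1));
        w1.start();
        w2.start();
        w1.join();
        w2.join();
        System.out.println(tracker.commitPosition()); // prints 3
    }
}
```

Offsets finished out of order simply wait in the set until the gap before them is filled, so a commit never skips past a record that is still being processed.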
Pulsar pros:
- Feature-rich: persistent and non-persistent topics, multi-tenancy, ACLs, multi-datacenter replication, and more.
- A more flexible client API that includes CompletableFutures, fluent interfaces, and more.
- For those working with multiple threads, the Java client components are thread-safe: a consumer can acknowledge messages from different threads.
- It seems a bit easier to use. In Kafka, the broker is dumb and the consumers do the job of structuring communication as they see fit. This flexibility comes at a price: Kafka users have to understand how to make the pieces fit together.
- You can do things that are not easily done, or maybe impossible, in Kafka, such as multi-tenancy (for security and isolation), resource management (topic throttling and quotas), and geo-replication.
- It has some features that Kafka currently lacks, like seeking to a particular message via its MessageId (yet you lack offsets).
- Pulsar scales to millions of topics, whereas Kafka is limited by the way it structures data in ZooKeeper.
- Easier deployment: a standalone Pulsar starts its own local ZooKeeper, so there is no need to start it manually.
- It's written in Java; Kafka, on the other hand, is a mix of Scala and Java code.
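To give a feel for the CompletableFuture style and for acknowledging from another thread, here is a small sketch. The real Pulsar Java client's `Consumer` offers `receiveAsync()` and `acknowledgeAsync(...)`, both returning `CompletableFuture`s. To keep the example runnable without a broker, a hypothetical in-memory `FakeConsumer` stands in for it, so treat this as an illustration of the programming shape, not of the client's actual behavior.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncPipeline {
    // receive -> process on a worker pool -> acknowledge from that worker thread,
    // chained as CompletableFutures in the style of the Pulsar Java client.
    static CompletableFuture<Void> handleOne(FakeConsumer consumer, ExecutorService workers) {
        return consumer.receiveAsync()
                .thenComposeAsync(msg -> {
                    // Real processing would happen here, on a worker thread.
                    return consumer.acknowledgeAsync(msg); // ack from the worker, not the receiving thread
                }, workers);
    }

    public static void main(String[] args) {
        FakeConsumer consumer = new FakeConsumer("a", "b", "c");
        ExecutorService workers = Executors.newFixedThreadPool(3);
        CompletableFuture<?>[] inFlight = new CompletableFuture<?>[3];
        for (int i = 0; i < inFlight.length; i++) {
            inFlight[i] = handleOne(consumer, workers);
        }
        CompletableFuture.allOf(inFlight).join();
        workers.shutdown();
        System.out.println(consumer.acked.size()); // prints 3
    }
}

// Hypothetical in-memory stand-in for a Pulsar Consumer<String>; NOT the real client.
class FakeConsumer {
    private final ConcurrentLinkedQueue<String> backlog = new ConcurrentLinkedQueue<>();
    final Set<String> acked = ConcurrentHashMap.newKeySet(); // thread-safe set

    FakeConsumer(String... messages) {
        for (String m : messages) {
            backlog.add(m);
        }
    }

    // Same shape as Consumer.receiveAsync(). The real one waits for a message;
    // this fake fails immediately once its backlog is empty.
    CompletableFuture<String> receiveAsync() {
        String next = backlog.poll();
        return next != null
                ? CompletableFuture.completedFuture(next)
                : CompletableFuture.failedFuture(new IllegalStateException("backlog empty"));
    }

    // Same shape as Consumer.acknowledgeAsync(msg); safe to call from any thread.
    CompletableFuture<Void> acknowledgeAsync(String msg) {
        acked.add(msg);
        return CompletableFuture.completedFuture(null);
    }
}
```

The point of the shape is that receive, process, and acknowledge compose into one future per message, and the acknowledgement can run on whichever thread finished the work.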
Pulsar cons:
- In terms of documentation, the Java client has little to none.
- A small community, with plenty of room to grow.
- Though very useful (for instance, a MessageId can also be stored outside Pulsar and used to roll back to a specific message), the MessageId concept is heavily tied to BookKeeper: consumers cannot easily position themselves on the topic, compared to Kafka offsets, which are a continuous sequence of numbers.
- A Reader cannot easily read the last message in a topic; it needs to go through all the messages to the end.
- At the moment, no transactions are offered.
- More complexity, as ZooKeeper, broker nodes, and BookKeeper are all involved.
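To illustrate why a BookKeeper-style position is harder to move around than a Kafka offset, here is a simplified model in plain Java. It assumes a position is just a (ledger ID, entry ID) pair, where entry IDs restart at 0 in every ledger and a topic's ledger IDs need not be consecutive; Pulsar's real `MessageId` also carries a partition index and, for batched messages, a batch index. `LedgerPosition`, `Positioning`, and the ledger metadata array are hypothetical names for this sketch.

```java
import java.nio.ByteBuffer;

class Positioning {
    // Kafka-style: offsets in a partition are a continuous sequence,
    // so "go back n messages" is plain arithmetic (clamped at the start).
    static long rewindOffset(long offset, long n) {
        return Math.max(0, offset - n);
    }

    // Ledger-style: the same question needs ledger metadata.
    // ledgers[i] = {ledgerId, entryCount}, in topic order (hypothetical metadata).
    static LedgerPosition rewind(LedgerPosition pos, long n, long[][] ledgers) {
        int i = 0;
        while (i < ledgers.length && ledgers[i][0] != pos.ledgerId()) i++;
        if (i == ledgers.length) throw new IllegalArgumentException("unknown ledger " + pos.ledgerId());
        long entry = pos.entryId();
        long remaining = n;
        // Step back across ledger boundaries until the target falls inside ledger i.
        while (remaining > entry) {
            remaining -= entry + 1; // walk past entries entry..0 of this ledger
            i--;
            if (i < 0) return new LedgerPosition(ledgers[0][0], 0); // clamp at the first message
            entry = ledgers[i][1] - 1; // last entry of the previous ledger
        }
        return new LedgerPosition(ledgers[i][0], entry - remaining);
    }

    public static void main(String[] args) {
        long[][] ledgers = {{7, 3}, {12, 2}}; // ledger 7: entries 0..2, ledger 12: entries 0..1
        System.out.println(rewindOffset(4, 2));                             // prints 2
        System.out.println(rewind(new LedgerPosition(12, 1), 2, ledgers)); // prints LedgerPosition[ledgerId=7, entryId=2]
    }
}

// Simplified, hypothetical model of a BookKeeper-backed position; NOT Pulsar's MessageId.
record LedgerPosition(long ledgerId, long entryId) implements Comparable<LedgerPosition> {
    // Positions are ordered (ledger first, then entry) but cannot simply be subtracted.
    public int compareTo(LedgerPosition o) {
        int c = Long.compare(ledgerId, o.ledgerId);
        return c != 0 ? c : Long.compare(entryId, o.entryId);
    }

    // Storing the position outside the broker, similar in spirit to MessageId.toByteArray().
    byte[] toBytes() {
        return ByteBuffer.allocate(16).putLong(ledgerId).putLong(entryId).array();
    }

    static LedgerPosition fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new LedgerPosition(buf.getLong(), buf.getLong());
    }
}
```

Serializing the pair, as `toBytes()` does here, mirrors the idea of keeping a MessageId outside Pulsar; the real client offers `MessageId.toByteArray()` and `MessageId.fromByteArray(...)` for that.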
What you take out of this is up to you... As for me, I'm planning to give Pulsar a fair chance on my next project.
Staff Customer Success Technical Architect at Confluent
No multi-tenancy in Kafka? Not sure I follow. Data center replication? How about stretch clusters and MirrorMaker? I think you were only considering Confluent Replicator?