Kafka vs. Pulsar

Kafka has been around for a long time. Perhaps too long...

I came across this article (titled: Pub/sub messaging: Apache Kafka vs. Apache Pulsar) the other day, and I thought I'd add a few words from my own experience to this battlefield.

So, let's get down to business...

 Kafka pros:

  1. -      It's very mature, with rich and useful documentation.
  2. -      Because it has been around for so long, it has a large, mature community of active users.
  3. -      Kafka Streams.
  4. -      It seems simpler to operate in production: there are fewer components, because the broker nodes also provide the storage.
  5. -      Transactions, of a kind.
  6. -      Offsets are exposed, so you have flexibility in where you start fetching messages (yet you can't fetch one specific message on its own).

 Kafka cons:

  1. -      A consumer can't acknowledge a message from a different thread, because the Java consumer is not thread-safe.
  2. -      No multitenancy.
  3. -      No robust multi-datacenter replication, although it is offered in Confluent Enterprise.
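
The first point above is usually worked around by letting worker threads report the offsets they have finished, while only the polling thread commits. Here is a minimal sketch of that hand-off using nothing but the JDK. `OffsetHandoff`, `markDone`, and `drainCommittable` are hypothetical names, not part of the Kafka client, and the "commit" is just a returned number that the polling thread would pass on to the real consumer.

```java
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper for illustration; not part of the Kafka client API.
final class OffsetHandoff {
    // Worker threads push finished offsets here; this queue is thread-safe.
    private final ConcurrentLinkedQueue<Long> finished = new ConcurrentLinkedQueue<>();
    // Touched only by the polling thread, so it needs no locking.
    private final PriorityQueue<Long> pending = new PriorityQueue<>();
    // Next offset to read, which is the value a Kafka commit records.
    private long nextToCommit;

    OffsetHandoff(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    // Safe to call from any worker thread.
    void markDone(long offset) {
        finished.add(offset);
    }

    // Call from the polling thread only. Returns one past the highest
    // offset for which every earlier offset has also been finished.
    long drainCommittable() {
        Long done;
        while ((done = finished.poll()) != null) {
            pending.add(done);
        }
        while (!pending.isEmpty() && pending.peek() <= nextToCommit) {
            // Offsets below nextToCommit are duplicates and are dropped.
            if (pending.poll() == nextToCommit) {
                nextToCommit++;
            }
        }
        return nextToCommit;
    }
}

final class OffsetHandoffDemo {
    public static void main(String[] args) throws InterruptedException {
        OffsetHandoff handoff = new OffsetHandoff(0);
        // Two workers finish messages out of order.
        Thread w1 = new Thread(() -> { handoff.markDone(2); handoff.markDone(0); });
        Thread w2 = new Thread(() -> handoff.markDone(1));
        w1.start();
        w2.start();
        w1.join();
        w2.join();
        // The polling thread decides what is safe to commit.
        System.out.println("commit up to: " + handoff.drainCommittable());
    }
}
```

The point of the design is that an offset is only committed once everything before it is done, so a slow worker never causes a message to be skipped after a restart.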

 Pulsar pros:

  1. -      Feature-rich: persistent and non-persistent topics, multitenancy, ACLs, multi-datacenter replication, and more.
  2. -      A more flexible client API that includes CompletableFutures, fluent interfaces, and more.
  3. -      For those who work multi-threaded, the Java client components are thread-safe: a consumer can acknowledge messages from different threads.
  4. -      It seems a bit easier to use. In Kafka, the broker is dumb and the consumers do the job of structuring communications as they see fit. This flexibility comes at a price: the Kafka user has to understand how to make the pieces fit together.
  5. -      You can do things that are hard, or maybe impossible, in Kafka, such as multitenancy (for security and isolation), resource management (topic throttling and quotas), and geo-replication.
  6. -      It has some features that Kafka currently lacks, like seeking to a particular message via MessageId (yet you are lacking offsets).
  7. -      Pulsar scales to millions of topics, whereas Kafka is limited by the way it structures data in ZooKeeper.
  8. -      Easier deployment. A standalone Pulsar starts its own local ZooKeeper; there is no need to start one manually.
  9. -      It's written in Java; Kafka, on the other hand, is a mix of Scala and Java code.

 Pulsar cons:

  1. -      In terms of documentation, the Java client has little to none.
  2. -      A small community, with plenty of room to grow.
  3. -      Though very useful (for instance, a MessageId can be stored outside Pulsar and used to roll back to a specific message), the MessageId concept is heavily tied to BookKeeper: a consumer cannot position itself on a topic as easily as with a Kafka offset, which is a continuous sequence of numbers.
  4. -      A reader cannot easily read the last message in a topic; it needs to go through all the messages to the end.
  5. -      At the moment, no transactions are offered.
  6. -      More complexity, as ZooKeeper, broker nodes, and BookKeeper are all involved.
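
The third point is easier to see with a small model. Within a partition, a Kafka offset is a single number, so the position after N is N + 1 (ignoring gaps left by compaction or transaction markers). A ledger-based ID names a BookKeeper ledger plus an entry inside it, so IDs can be compared but not simply incremented. In the sketch below, `LedgerEntryId` is a hypothetical stand-in and not Pulsar's `MessageId` class, and the ledger numbers are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a ledger-based message ID; not Pulsar's MessageId.
final class LedgerEntryId implements Comparable<LedgerEntryId> {
    final long ledgerId;
    final long entryId;

    LedgerEntryId(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    // Order by ledger first, then by entry within the ledger.
    @Override
    public int compareTo(LedgerEntryId other) {
        int byLedger = Long.compare(ledgerId, other.ledgerId);
        return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
    }

    @Override
    public String toString() {
        return ledgerId + ":" + entryId;
    }
}

final class PositionDemo {
    // Offset model: the next position is plain arithmetic.
    static long nextOffset(long offset) {
        return offset + 1;
    }

    // ID model: the next position has to be looked up among the known,
    // sorted IDs, because the entry after 12:1 may be 15:0 rather than 12:2.
    // Returns null when there is no later ID.
    static LedgerEntryId nextId(List<LedgerEntryId> sortedIds, LedgerEntryId current) {
        int found = Collections.binarySearch(sortedIds, current);
        int next = found >= 0 ? found + 1 : -found - 1;
        return next < sortedIds.size() ? sortedIds.get(next) : null;
    }

    public static void main(String[] args) {
        List<LedgerEntryId> ids = Arrays.asList(
                new LedgerEntryId(12, 0),
                new LedgerEntryId(12, 1),
                new LedgerEntryId(15, 0)); // the topic rolled over to a new ledger
        System.out.println("after offset 41: " + nextOffset(41));
        System.out.println("after id 12:1: " + nextId(ids, ids.get(1)));
    }
}
```

This is why storing a MessageId externally works well for returning to a known message, while "skip ahead by ten" style positioning is more natural with offsets.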

What you take from this is up to you... My current thinking is to give Pulsar a fair chance on my next project.

Anthony Davis

Staff Customer Success Technical Architect at Confluent

5 years ago

No multi tenancy in Kafka? Not sure I follow. Data center replication? How about stretch clusters and mirror maker? I think you were only considering confluent replicator?

