Kafka Alternatives
Apache Kafka is an open source software messaging bus that uses stream processing. Because it’s a distributed platform known for its scalability, resilience, and performance, Kafka has become very popular with large enterprises. In fact, 80% of the Fortune 500 use Kafka.
However, there’s no such thing as a one-size-fits-all solution. What’s best for Uber or PayPal may not be ideal for your application. Fortunately, there are several alternative messaging platforms available.
Knowing which platform is right for you requires understanding the pros and cons of each. To help you make the right choice, we’ll take an in-depth look at Apache Kafka and four popular Kafka alternatives.
Summary of Kafka alternatives
Four of the most popular Kafka alternatives are Google Pub/Sub, RabbitMQ, Apache Pulsar, and Macrometa streams.
Before taking a detailed look at Kafka and those four alternatives, let’s start with a high-level overview of features.
Understanding stream processing and other messaging types
Understanding why performant messaging is essential requires taking a step back and understanding how modern systems scale. Today, applications generate data at a much higher rate than ever before. As a result, systems need to scale to meet the data processing requirements. There are two basic approaches to achieve the required scaling: scaling up (vertical scaling), which adds resources such as CPU and memory to a single machine, and scaling out (horizontal scaling), which adds more machines to a cluster.
Scaling up is efficient from a performance perspective but very expensive. On the other hand, scaling out is cheap since it leverages commodity hardware, but it introduces communication challenges in distributed systems.
In recent years, scaling out has become a popular choice for modern applications. To accommodate scaling out, different machines in a cluster need to coordinate control and data messages efficiently. Because of this, messaging systems are the backbone of any distributed system.
Message queues connect different systems, for example by streaming data between microservices. Multiple senders (producers) can send messages to the queue. However, each message can only be consumed by a single consumer. Some message queue implementations provide a mechanism for a consumer to acknowledge that a message was processed successfully. Message queues are easily scalable because systems can add more producers and consumers independently.
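The queue semantics described above can be sketched with Python’s standard library. This is only an in-process illustration, not a real broker: the function name `run_queue_demo` and the message labels are made up for the example, and `task_done()` stands in for an acknowledgment.

```python
import queue
import threading

def run_queue_demo(num_producers=2, num_consumers=3, per_producer=5):
    """Each message put on the queue is handed to exactly one consumer."""
    work = queue.Queue()
    processed = []              # (consumer_id, message) pairs
    lock = threading.Lock()

    def producer(pid):
        for i in range(per_producer):
            work.put(f"p{pid}-m{i}")

    def consumer(cid):
        while True:
            msg = work.get()
            if msg is None:     # sentinel: no more work for this consumer
                work.task_done()
                return
            with lock:
                processed.append((cid, msg))
            work.task_done()    # acknowledge successful processing

    consumers = [threading.Thread(target=consumer, args=(c,)) for c in range(num_consumers)]
    producers = [threading.Thread(target=producer, args=(p,)) for p in range(num_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        work.put(None)          # one sentinel per consumer
    work.join()                 # wait until every item has been acknowledged
    for t in consumers:
        t.join()
    return processed

if __name__ == "__main__":
    results = run_queue_demo()
    print(len(results), "messages processed")
```

Producers and consumers are added independently here by changing two numbers, which is the scaling property the paragraph describes. Which consumer handles which message is not deterministic.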
To summarize, the basic concepts in the messaging systems we’ve reviewed are:
An overview of Kafka and Kafka alternatives
Apache Kafka is a well-known open source platform for data ingestion and processing in real-time. More than just a message broker, Kafka is a distributed streaming platform. Kafka’s three main features are:
Kafka is written in Scala and Java and combines both queueing and pub/sub messaging patterns. Queueing enables higher scalability because processing is spread across multiple consumers, with each message delivered to only one of them. Kafka improves on traditional queues, which don’t let multiple consumers read the same data.
On the other hand, the pub/sub model sends each message to each subscriber, which isn’t the same as distributing the work. Hence, Kafka uses partitioned transaction logs to overcome that challenge and remain scalable. Each log (topic) is a set of ordered messages broken into partitions, and each consumer in a group is assigned one or more partitions.
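A toy model may make the partitioned-log idea more concrete. The sketch below is not Kafka’s implementation: `PartitionedLog` and `assign_partitions` are hypothetical names, and CRC32 is used only as a stand-in for a stable key hash. It shows how records with the same key land in the same partition, in order, and how partitions can be divided among consumers.

```python
import zlib
from collections import defaultdict

class PartitionedLog:
    """Toy topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        # (CRC32 is an assumption for this sketch, not Kafka's partitioner.)
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def read(self, partition, offset):
        """Return all records in a partition from offset onward."""
        return self.partitions[partition][offset:]

def assign_partitions(num_partitions, consumers):
    """Round-robin assignment: each partition has exactly one owner."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)

if __name__ == "__main__":
    log = PartitionedLog(4)
    for i in range(3):
        print(log.append("user-1", f"event-{i}"))
    print(assign_partitions(4, ["c1", "c2"]))
```

Because a consumer tracks only an offset per partition, it can re-read older records by asking for an earlier offset, which is what distinguishes a log from a queue that deletes messages on delivery.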
Kafka uses partitioned transaction logs at the storage layer for streaming messages. This approach enables Kafka to handle trillions of events per day. The default retention period is seven days, but it is configurable, and retention is ultimately bounded by the available disk space.
To manage cluster metadata (and, in older versions, consumer offsets), Kafka has traditionally relied on ZooKeeper. Unfortunately, setting up Kafka is complex. Setup requires two components: Kafka brokers and ZooKeeper nodes. Additionally, on-prem infrastructure requires domain expertise and significant operational effort. While it is possible to use a managed Kafka service, it can be very expensive.
The ZooKeeper requirement is the biggest bottleneck to Kafka’s scalability. Fortunately, newer Kafka versions remove the ZooKeeper dependency by replacing it with a built-in consensus mode (KRaft).
Google Pub/Sub
Google Pub/Sub is a service for messaging that leverages the pub/sub messaging pattern. Setting up an instance to run an application using Google Pub/Sub is easy because it’s a fully-managed cloud service. That means there can be less complexity with Google Pub/Sub than with Kafka (which requires machines, brokers, and ZooKeeper configuration).
With Google Pub/Sub, topics differentiate messages. Consumers use subscriptions to receive message notifications. Once they receive a message for processing, they send back an acknowledgment, as shown in the diagram below.
Google Pub/Sub supports “at least once” delivery, and by default it doesn’t offer any ordering guarantees. On the other hand, Kafka provides ordering guarantees per partition.
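The practical consequence of “at least once” delivery is that a message may arrive more than once if it is not acknowledged. The sketch below simulates that behavior in memory. It does not use the Google Cloud client library, and `Subscription` and `consume` are hypothetical names chosen for the illustration.

```python
class Subscription:
    """Toy subscription: a message is redelivered until it is acknowledged."""

    def __init__(self):
        self._pending = {}      # message_id -> payload, not yet acknowledged
        self._next_id = 0

    def publish(self, payload):
        self._pending[self._next_id] = payload
        self._next_id += 1

    def pull(self):
        """Return every unacknowledged message as (message_id, payload)."""
        return list(self._pending.items())

    def ack(self, message_id):
        self._pending.pop(message_id, None)

def consume(subscription, handler, max_rounds=5):
    """Pull repeatedly; only acknowledge messages the handler accepted."""
    deliveries = 0
    for _ in range(max_rounds):
        batch = subscription.pull()
        if not batch:
            break
        for message_id, payload in batch:
            deliveries += 1
            if handler(payload):
                subscription.ack(message_id)
    return deliveries

if __name__ == "__main__":
    sub = Subscription()
    for payload in ["a", "b", "c"]:
        sub.publish(payload)
    print(consume(sub, lambda payload: True), "deliveries")
```

Since a handler can see the same payload twice, consumers in an at-least-once system are usually written to be idempotent.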
Google Pub/Sub has durable storage and real-time message delivery. Users can configure the retention policy, but the max is seven days. It’s cheap to use for smaller projects since the first 10GB is free.
Google handles pub/sub operations, and other Google Cloud services can use Pub/Sub APIs for integration. Expansion into new regions is straightforward since Google already has data centers across the globe. Comparatively, Kafka requires a lot of operational effort to scale across regions. More importantly, cross data center replication happens using the Google network, which provides robust performance.
Google Pub/Sub has good performance, and it scales quickly. However, the more we scale, the more expensive it gets.
Google Pub/Sub offers a lite version that can be less expensive with lower availability and durability. However, users need to manually manage resources because it doesn’t scale automatically, and storage also needs to be provisioned manually.
Overall, if you’re already using GCP for other services, Google Pub/Sub is a much easier integration than Kafka.
RabbitMQ
RabbitMQ is one of the most commonly used multi-purpose messaging tools. It’s a “distributed message broker” and supports background tasks. It is written in Erlang and has commercial support available. It uses both message queueing and pub/sub.
RabbitMQ is recommended for communication or integration among long-running tasks or background jobs, whereas Kafka is primarily used to stream, store, and re-read data.
RabbitMQ uses the message exchange concept where the publisher sends the messages to the exchange, and each consumer creates a queue bound to the exchange. It lets users define routing rules and filter the messages based on their specific needs. Kafka lacks this ability and doesn’t have a mechanism to filter the messages. With Kafka, a subscriber will receive all the messages published on a particular topic.
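The exchange-and-queue routing described above can be illustrated with a small in-memory model. This sketch does not talk to RabbitMQ and is not its API: `DirectExchange`, the queue names, and the routing keys are all hypothetical, and only exact-match routing is modeled.

```python
from collections import defaultdict, deque

class DirectExchange:
    """Toy exchange: copies a message to every queue bound with a matching key."""

    def __init__(self):
        self._bindings = defaultdict(list)   # routing_key -> [queue_name]
        self.queues = defaultdict(deque)     # queue_name -> messages

    def bind(self, queue_name, routing_key):
        self._bindings[routing_key].append(queue_name)

    def publish(self, routing_key, body):
        targets = self._bindings.get(routing_key, [])
        for queue_name in targets:
            self.queues[queue_name].append(body)
        return len(targets)                  # number of queues reached

if __name__ == "__main__":
    exchange = DirectExchange()
    exchange.bind("billing", "order.paid")
    exchange.bind("audit", "order.paid")
    exchange.bind("audit", "order.cancelled")
    exchange.publish("order.paid", "order-1")
    exchange.publish("order.cancelled", "order-2")
    print(dict(exchange.queues))
```

Here the billing queue never sees cancelled orders, which is the kind of broker-side filtering the paragraph contrasts with a Kafka topic, where every subscriber receives every message.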
RabbitMQ is designed for vertical scaling. Horizontal scaling can hurt performance because of the coordination required among the nodes. On the other hand, Kafka is designed for horizontal scaling.
RabbitMQ provides message ordering guarantees only for messages published on one channel, passing through one exchange and one queue. It supports retries for messages that aren’t acknowledged, but acknowledged messages are removed once they are consumed. This makes it less resilient than Kafka (which uses the available disk space to retain old messages) when it comes to recovering from an outage.
RabbitMQ is a mature platform that has been on the market since 2007. As a result, there is plenty of documentation and a large user base. You can find a lot of case studies and best practices online to help optimize performance.
Apache Pulsar
Apache Pulsar, an open source distributed messaging system, is a recent addition to the available messaging technology choices. Pulsar started as a queuing system but evolved to support event streaming. It leverages the approaches used by several other messaging systems in a single platform.
Apache Pulsar uses a tiered architecture, with Apache BookKeeper providing storage. Adding dedicated BookKeeper storage is easy. Pulsar has a stateless broker that can connect to multiple BookKeeper nodes. Since the broker is stateless, it can scale up and down based on requirements. This loosely coupled architecture makes Pulsar highly scalable. However, repartitioning and replication are required once a broker is added to a cluster, and those tasks take time.
Apache Pulsar allows storage to scale without a fixed limit, though keeping everything in BookKeeper can be expensive. To reduce that cost, it uses tiered storage where older messages are offloaded from BookKeeper to cheaper storage, e.g., S3 (Amazon Simple Storage Service), GCS (Google Cloud Storage), or a similar file system. This architecture allows unlimited cost-effective storage scaling.
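The offloading idea can be sketched in a few lines. This is a conceptual model only, not Pulsar’s or BookKeeper’s behavior or API: `TieredLog` is a hypothetical class, the segment-count threshold is an assumption, and a plain dictionary stands in for object storage such as S3 or GCS.

```python
class TieredLog:
    """Toy log: recent segments stay 'hot', older ones move to a cheap tier."""

    def __init__(self, max_hot_segments):
        self.max_hot_segments = max_hot_segments
        self.hot = []    # list of (segment_id, records), oldest first
        self.cold = {}   # segment_id -> records, stands in for object storage
        self._next_segment = 0

    def add_segment(self, records):
        self.hot.append((self._next_segment, list(records)))
        self._next_segment += 1
        # Offload the oldest segments once the hot tier is over its limit.
        while len(self.hot) > self.max_hot_segments:
            segment_id, old = self.hot.pop(0)
            self.cold[segment_id] = old

    def read_segment(self, segment_id):
        """Readers see one log; the tier a segment lives in is hidden."""
        for sid, records in self.hot:
            if sid == segment_id:
                return records
        return self.cold[segment_id]

if __name__ == "__main__":
    log = TieredLog(max_hot_segments=2)
    for batch in (["a"], ["b"], ["c"]):
        log.add_segment(batch)
    print(sorted(log.cold), log.read_segment(0))
```

The point of the design is that readers address segments the same way regardless of tier, so the expensive tier only has to hold recent data.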
Pulsar provides some features Kafka lacks, such as tiered storage and geo-replication. However, most of the features in Pulsar are also supported by Kafka. On the other hand, Kafka has quite a few features that Pulsar lacks. These include long-term storage, reduced infrastructure requirements (number of servers), and a single write to disk for data.
Apache Pulsar provides support for both message queueing and event streaming in a single solution. However, the feature set is limited compared to what Kafka provides, such as exactly-once delivery, fault-tolerant state management, event-based processing message XA transactions, or message filtering.
Setting up Apache Pulsar is complex, even compared to Kafka. It requires setting up four different components: brokers, Apache BookKeeper, RocksDB, and Apache ZooKeeper. That means there are two additional components Kafka doesn’t need. As a result, Pulsar requires more work to set up, debug, and maintain.
Apache Pulsar has potential, but it will take some time to mature and capture significant market share.
Macrometa streams
Macrometa GDN (Global Data Network) enables the building of real-time applications and APIs instantly across the globe without the operational hassle of infrastructure management. It also supports message queueing and pub/sub via streams to build stateful low latency applications and data pipelines. In addition, it supports stateful event processing.
Macrometa streams are straightforward to set up compared to Kafka. With a few clicks, you can have your applications running in different regions of the world and close to your clients. It doesn’t require extra operational effort to set up replication. Streams support both message queues and pub/sub messaging patterns.
Macrometa streams have persistent storage that retains messages the consumer has not acknowledged for up to three days. Once processed, messages are removed unless configured for retention. Streams also support time to live (TTL) for messages that haven’t been acknowledged. Additionally, streams balance load automatically across consumers.
Macrometa streams support both synchronous and asynchronous modes for consumers and producers. As shown in the diagram below, pub/sub supports three different subscription modes: exclusive, shared, and failover.
The exclusive mode supports only a single consumer, while the shared mode distributes messages across multiple consumers in a round-robin fashion. Finally, failover mode allows multiple consumers, with a master consumer and failover consumers who will receive messages only if the master consumer disconnects.
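The three subscription modes can be compared with a small dispatch function. This is a simplified model, not the Macrometa client API: the function `dispatch`, the mode strings, and the rule that the first connected consumer acts as the master are assumptions made for the sketch.

```python
from itertools import cycle

def dispatch(messages, consumers, mode, connected=None):
    """Return {consumer: [messages]} for a given subscription mode."""
    connected = set(consumers) if connected is None else set(connected)
    live = [c for c in consumers if c in connected]
    delivered = {c: [] for c in consumers}
    if not live:
        return delivered
    if mode == "exclusive":
        if len(consumers) != 1:
            raise ValueError("exclusive mode allows a single consumer")
        delivered[live[0]] = list(messages)
    elif mode == "shared":
        # Round-robin across all connected consumers.
        for msg, consumer in zip(messages, cycle(live)):
            delivered[consumer].append(msg)
    elif mode == "failover":
        # Assumption: the first connected consumer in list order is the master.
        delivered[live[0]] = list(messages)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return delivered

if __name__ == "__main__":
    msgs = ["m1", "m2", "m3", "m4"]
    print(dispatch(msgs, ["a", "b"], "shared"))
    print(dispatch(msgs, ["a", "b"], "failover", connected=["b"]))
```

Shared mode trades ordering for throughput, while failover keeps a single active reader and uses the others as standbys.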
Compared to Kafka, Macrometa is a relatively newer technology with a more limited user base, but documentation for developers is robust.
Conclusion
Apache Kafka is one of the most widely used messaging systems, but it is far from your only option. Apache Pulsar provides similar capabilities and, with its tiered architecture, enhanced scalability. RabbitMQ is more of a traditional messaging system, built for communication rather than for storing messages. Google Pub/Sub provides a pub/sub messaging pattern with almost no setup effort but works best when used with other Google services. Finally, Macrometa offers similar messaging features integrated into an event stream processing platform with built-in geo-replication.
This article was originally published on https://www.macrometa.com/event-stream-processing/kafka-alternatives