Is Kafka good only for big data streaming, and should event-driven systems use only Azure Service Bus?
Piyush Porwal
Tech Enthusiast & Learner| Engineering Manager at Microsoft | Career Growth Mentor | Architect | Sharing Food For Thoughts
Recently, a colleague and I were discussing a problem where we had to decide between Kafka and Azure Service Bus, and we learned some good concepts while trying to answer that question. I'm sharing them here to explain which one fits which use case, and why.
Let's first learn the basics and architecture of both.
When you work at Microsoft, Service Bus tends to become the default technology for event-based programming. It supports the pub-sub approach, where a publisher can send messages to different topics and each subscriber can opt to receive only the ones it is interested in. This lets two services within one application communicate with low coupling while each service plans its own scaling needs, which is a huge win when building an event-driven distributed application. I talked about communication designs in one of my videos here as well; feel free to check it out for other options.
High-level architecture:
Messages in queues are ordered and timestamped on arrival. Once the broker accepts a message, it is held durably in triple-redundant storage, spread across availability zones if the namespace is zone-enabled. Service Bus never leaves messages in memory or volatile storage after they have been reported to the client as accepted.
Features:
On top of being a basic queue, Service Bus supports various features: not only topic-based message delivery, but also capabilities such as auto-forwarding and dead-lettering.
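To make the topic-based delivery concrete, here is a minimal pub-sub sketch using the same Microsoft.Azure.ServiceBus package as the queue sample later in this article. The topic name, subscription name, and connection string are placeholder assumptions, not values from a real setup:

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class TopicPubSubSketch
{
    static async Task Main()
    {
        string connectionString = "YOUR_SERVICE_BUS_CONNECTION_STRING";
        string topicName = "orders";                 // hypothetical topic
        string subscriptionName = "billing-service"; // hypothetical subscription

        // Publisher: send a message to the topic
        var topicClient = new TopicClient(connectionString, topicName);
        await topicClient.SendAsync(new Message(Encoding.UTF8.GetBytes("order created")));
        await topicClient.CloseAsync();

        // Subscriber: receive only the messages routed to this subscription
        var subscriptionClient = new SubscriptionClient(connectionString, topicName, subscriptionName);
        subscriptionClient.RegisterMessageHandler(
            async (message, token) =>
            {
                Console.WriteLine($"Received: {Encoding.UTF8.GetString(message.Body)}");
                // Complete the message so it is removed from the subscription
                await subscriptionClient.CompleteAsync(message.SystemProperties.LockToken);
            },
            new MessageHandlerOptions(exArgs =>
            {
                Console.WriteLine($"Handler error: {exArgs.Exception.Message}");
                return Task.CompletedTask;
            })
            { AutoComplete = false, MaxConcurrentCalls = 1 });

        Console.WriteLine("Listening... press any key to exit.");
        Console.ReadKey();
    }
}

Each subscription gets its own copy of the messages published to the topic, which is what lets two services consume the same event stream without being coupled to each other.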
Designing a Scalable Architecture with Service Bus
Now, how does this benefit me as an application developer looking to build a scalable architecture? Let's consider a scenario where a web application needs to communicate with a backend service to process orders. We'll use Azure Service Bus to facilitate this communication.
using System;
using Microsoft.Azure.ServiceBus;
using System.Text;
using System.Threading.Tasks;

class ServiceBusMessageSender
{
    static async Task Main(string[] args)
    {
        string serviceBusConnectionString = "YOUR_SERVICE_BUS_CONNECTION_STRING";
        string queueName = "YOUR_QUEUE_NAME";

        IQueueClient queueClient = new QueueClient(serviceBusConnectionString, queueName);

        // Create a new message
        string messageBody = "Your order message data";
        Message message = new Message(Encoding.UTF8.GetBytes(messageBody));

        try
        {
            // Send the message to the queue
            await queueClient.SendAsync(message);
            Console.WriteLine("Message sent to the queue successfully.");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"An error occurred: {ex.Message}");
        }
        finally
        {
            // Close the queue client
            await queueClient.CloseAsync();
        }
    }
}
With this setup, Azure Service Bus gives us reliable, asynchronous communication between the components of our architecture, ensuring seamless order processing and scalability for the application. The sending side is only half the picture, though; the backend service also needs to receive and process these messages, as sketched below.
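On the receiving side, the backend order-processing service could register a message handler so that Service Bus pushes each message to it as it arrives. This is only a sketch, assuming the same placeholder connection string and queue name as the sender above:

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class ServiceBusOrderProcessor
{
    static void Main()
    {
        string serviceBusConnectionString = "YOUR_SERVICE_BUS_CONNECTION_STRING";
        string queueName = "YOUR_QUEUE_NAME";

        IQueueClient queueClient = new QueueClient(serviceBusConnectionString, queueName);

        // Service Bus invokes this handler for every message that arrives on the queue
        queueClient.RegisterMessageHandler(
            async (message, cancellationToken) =>
            {
                string orderData = Encoding.UTF8.GetString(message.Body);
                Console.WriteLine($"Processing order: {orderData}");

                // Complete the message so it is not redelivered after the lock expires
                await queueClient.CompleteAsync(message.SystemProperties.LockToken);
            },
            new MessageHandlerOptions(exArgs =>
            {
                Console.WriteLine($"Message handler error: {exArgs.Exception.Message}");
                return Task.CompletedTask;
            })
            { AutoComplete = false, MaxConcurrentCalls = 2 });

        Console.WriteLine("Listening for order messages. Press any key to exit.");
        Console.ReadKey();
    }
}

Because the web application only enqueues the order and returns, the backend can scale its processing independently, which is exactly the decoupling we were after.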
Moving to Apache Kafka
Kafka is a well-known name in the IT industry, so it may not need much of an introduction here, but let's understand some of the key areas that make it shine for building real-time streaming data pipelines compared to RabbitMQ or Amazon Kinesis:
Events in Kafka represent occurrences in the world or in a business and are composed of a key, a value, a timestamp, and optional metadata headers. Producers publish events to Kafka, while consumers subscribe to and process them. Kafka's design allows for high scalability by decoupling producers and consumers, ensuring no waiting time for either party. Events are stored in topics, akin to folders in a filesystem, and are durably retained based on configurable settings. Topics are partitioned across Kafka brokers for scalability, with events written to partitions based on their keys. Replication ensures fault tolerance and high availability by maintaining multiple copies of data across brokers.
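To make the partitioning and replication part concrete, here is a minimal sketch that creates a topic spread across several partitions with multiple replicas using Confluent.Kafka's admin client. The broker address, topic name, and the partition and replica counts are illustrative assumptions (the topic name matches the trading example that follows):

using System;
using System.Threading.Tasks;
using Confluent.Kafka;
using Confluent.Kafka.Admin;

class CreatePartitionedTopic
{
    static async Task Main()
    {
        var adminConfig = new AdminClientConfig { BootstrapServers = "localhost:9092" };

        using (var adminClient = new AdminClientBuilder(adminConfig).Build())
        {
            try
            {
                // Spread the topic across 6 partitions and keep 3 copies of each
                // partition on different brokers for fault tolerance.
                await adminClient.CreateTopicsAsync(new[]
                {
                    new TopicSpecification
                    {
                        Name = "trading_topic",
                        NumPartitions = 6,
                        ReplicationFactor = 3
                    }
                });
                Console.WriteLine("Topic created.");
            }
            catch (CreateTopicsException e)
            {
                Console.WriteLine($"Topic creation failed: {e.Results[0].Error.Reason}");
            }
        }
    }
}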
Now let’s use it to write a trading app:
using Confluent.Kafka;
using Newtonsoft.Json;
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Kafka broker configuration
        var config = new ProducerConfig
        {
            BootstrapServers = "localhost:9092"
        };

        // Initialize Kafka producer
        using (var producer = new ProducerBuilder<Null, string>(config).Build())
        {
            // Sample data for trading
            var tradeData = new
            {
                Symbol = "AAPL",
                Price = 150.20,
                Quantity = 100
            };

            // Serialize trade data to JSON
            var tradeJson = JsonConvert.SerializeObject(tradeData);

            // Publish a trade message
            await producer.ProduceAsync("trading_topic", new Message<Null, string> { Value = tradeJson });
        }

        // Initialize Kafka consumer configuration
        var consumerConfig = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",
            GroupId = "trading_group",
            AutoOffsetReset = AutoOffsetReset.Earliest
        };

        // Initialize Kafka consumer
        using (var consumer = new ConsumerBuilder<Ignore, string>(consumerConfig).Build())
        {
            // Subscribe to the trading topic
            consumer.Subscribe("trading_topic");

            // Consume trade messages indefinitely
            while (true)
            {
                try
                {
                    var consumeResult = consumer.Consume(CancellationToken.None);
                    Console.WriteLine($"Received trade message: {consumeResult.Message.Value}");
                }
                catch (ConsumeException e)
                {
                    Console.WriteLine($"Error occurred: {e.Error.Reason}");
                }
            }
        }
    }
}
Which one should you use as the messaging system for your use case?
The choice is tough to make, because both provide really good foundations for scalable, fault-tolerant, event-driven designs. Earlier, I used to believe Kafka was a good fit only for log streaming or big data streaming use cases, but that turns out to be inaccurate. While ASB is a popular choice for the applications I work on, and it is indeed a great solution, there is no doubt that Kafka supports a lot of those use cases too.
One place where Kafka has a huge benefit is the large community using it in the industry. Service Bus, on the other hand, has a really good .NET ecosystem and Azure-grade documentation. Kafka suits big data needs really well, whereas Service Bus preserves message ordering, which Kafka guarantees only at the partition level.
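That said, if the ordering you need is per entity rather than global, Kafka's partition-level ordering is often enough: give every message a key, and all messages with the same key land on the same partition and are consumed in the order they were produced. A minimal sketch, reusing the trading topic and broker address assumed in the example above, with the stock symbol as the key:

using System;
using System.Threading.Tasks;
using Confluent.Kafka;

class KeyedTradeProducer
{
    static async Task Main()
    {
        var config = new ProducerConfig { BootstrapServers = "localhost:9092" };

        using (var producer = new ProducerBuilder<string, string>(config).Build())
        {
            // Messages that share a key always go to the same partition,
            // so all AAPL trades are consumed in the order they were produced.
            await producer.ProduceAsync("trading_topic",
                new Message<string, string> { Key = "AAPL", Value = "buy 100 @ 150.20" });
            await producer.ProduceAsync("trading_topic",
                new Message<string, string> { Key = "AAPL", Value = "sell 50 @ 151.05" });
        }
    }
}

This keeps per-symbol ordering intact while still spreading the overall load across partitions.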