How Kafka achieves its design goals (Part II)

Hoan Tran Viet

DevOps Engineer - A Soldier's Son

Published: December 25, 2024

Following on from the previous article, we continue to explore the key features of Kafka's design that help it achieve its goals.

This section will focus on Kafka's solutions for message brokers, distribution, and replication.

As a reminder, Kafka's design requirements are:

High throughput
Graceful handling of large data backlogs
Low-latency delivery
Fault-tolerance guarantees

Publish-Subscribe model

Kafka producers batch messages and send them asynchronously and directly to the target brokers, with no intermediate routing tier, which reduces delivery latency. Consumers fetch messages in batches from an offset they control, so they can set their own processing rate when working through large volumes of data without losing messages.
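
The batching idea can be illustrated with a small, self-contained Python sketch. It is a toy model rather than Kafka's client code: the `BatchingProducer` class, the `send_fn` callback, and the `batch_size` value are all hypothetical names chosen for illustration, and the background sender thread and time-based flushing of a real client are left out.

```python
from collections import defaultdict


class BatchingProducer:
    """Toy producer that buffers records per partition and sends them in batches."""

    def __init__(self, send_fn, batch_size=3):
        self.send_fn = send_fn          # hypothetical transport callback (assumption)
        self.batch_size = batch_size    # records per request, illustrative value
        self.buffers = defaultdict(list)

    def send(self, partition, record):
        """Buffer a record; flush that partition's buffer once it is full."""
        buf = self.buffers[partition]
        buf.append(record)
        if len(buf) >= self.batch_size:
            self._flush_partition(partition)

    def flush(self):
        """Send whatever is still buffered, for example on shutdown."""
        for partition in list(self.buffers):
            self._flush_partition(partition)

    def _flush_partition(self, partition):
        batch = self.buffers.pop(partition, [])
        if batch:
            self.send_fn(partition, batch)


# Record every "network request" so the effect of batching is visible.
requests = []
producer = BatchingProducer(lambda p, batch: requests.append((p, batch)), batch_size=3)
for i in range(7):
    producer.send(0, f"msg-{i}")
producer.flush()
# Seven records were sent in three requests instead of seven.
```

Fewer, larger requests are where the throughput gain comes from; the trade-off is that a record may wait in the buffer for a short time before it is sent.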

Kafka Producer and Brokers Communication

Producer routing: Producers send messages directly to the broker that leads the target partition. A producer can request metadata from any broker to stay up to date on which servers are alive and which broker leads each partition.
Asynchronous send: Producers accumulate messages in memory and send them in larger requests, improving throughput and efficiency.
Pull-model consumption: Consumers issue "fetch" requests to the brokers leading the target partitions, so they control when and how much data to fetch. The pull model prevents consumers from being overwhelmed and enables efficient batched consumption.
Consumer offset tracking: The consumer specifies its offset in the log with each request and receives a chunk of the log starting from that position. The consumer therefore has significant control over its position and can rewind it to re-consume data if need be.
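
The pull model and consumer-owned offsets described above can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions: `PartitionLog`, `PullConsumer`, `poll`, and `seek` are hypothetical names for this example and do not reproduce any real client API.

```python
class PartitionLog:
    """Toy append-only log for a single partition."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1    # offset of the appended record

    def fetch(self, offset, max_records):
        """Return up to max_records starting at the given offset."""
        return self.records[offset:offset + max_records]


class PullConsumer:
    """Toy consumer that owns its position in the log."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=2):
        batch = self.log.fetch(self.offset, max_records)
        self.offset += len(batch)       # advance only past what was received
        return batch

    def seek(self, offset):
        """Rewind (or skip ahead) to consume from another position."""
        self.offset = offset


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

consumer = PullConsumer(log)
first = consumer.poll(max_records=2)    # the consumer decides how much to take
second = consumer.poll(max_records=2)
consumer.seek(1)                        # rewind to re-read from offset 1
replayed = consumer.poll(max_records=3)
```

Because the broker side only serves ranges of the log, a slow consumer simply polls less often, and re-processing after a failure is a matter of seeking back to an earlier offset.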

Replication and Partition

Kafka topics are divided into partitions, and each partition is replicated across different Kafka brokers (a single leader and zero or more followers). This architecture helps Kafka achieve parallelism and load balancing, enabling high throughput and real-time processing.
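
To make the layout concrete, here is a hedged Python sketch of how partitions and their replicas might be spread over brokers. The round-robin placement, the broker names, and the helpers `assign_replicas` and `choose_partition` are assumptions made for illustration, and CRC32 merely stands in for whatever hash a real client's partitioner uses.

```python
import zlib


def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin placement: the first replica of each partition acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
    return assignment


def choose_partition(key, num_partitions):
    """Map a key to a partition; CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


brokers = ["broker-1", "broker-2", "broker-3"]   # hypothetical broker names
assignment = assign_replicas(num_partitions=6, brokers=brokers, replication_factor=2)

# Metadata a producer would need: which broker leads each partition.
leaders = {p: replicas[0] for p, replicas in assignment.items()}

# Records with the same key always map to the same partition, hence the same leader.
partition = choose_partition("user-42", 6)
leader_for_key = leaders[partition]
```

With six partitions over three brokers, each broker leads two partitions, so writes and reads are spread evenly, and adding brokers or partitions widens that spread.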

To ensure fault tolerance, Kafka replicates data across multiple brokers and automatically elects a new leader when a broker fails.

Partition: Kafka topics are split into partitions, which can be distributed across multiple brokers. A topic can be processed in parallel to improve throughput. Moreover, this allows horizontal scaling by adding more brokers and more partitions.
Replication: Kafka replicates each partition across multiple brokers. This means that even if a broker fails, the data is still available from the replicas on other brokers.
Leader-Follower Model: Each partition has one leader replica and zero or more follower replicas. By default, the leader handles all read and write operations for that partition. If the leader fails, one of the followers is promoted to be the new leader, so the partition remains available with minimal downtime.
In-Sync Replicas (ISR) and Broker Liveness: An in-sync replica is a replica that is up to date with the partition's leader. Using heartbeats and replication lag, Kafka detects when a follower fails, gets stuck, or falls behind; the leader then removes it from the ISR list and adds it back once it has caught up.
Election algorithm: Any replica in the ISR can become the new leader when the current one fails. This model tolerates the same number of failures with fewer replicas than a majority-quorum approach. If all the replicas of a partition die, Kafka either waits for a replica in the ISR to come back or, when unclean leader election is enabled, picks the first replica to come back even if it was not in the ISR, at the risk of losing data.
Committed Messages: A message is considered committed once all in-sync replicas have it. Producers can choose how long to wait for acknowledgments (no wait, the leader only, or all ISR replicas). Waiting for "all" increases durability but can reduce throughput or availability.
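
The interplay of ISR tracking, acknowledgments, and leader election can be sketched as a toy Python model. Everything here is a simplification under stated assumptions: the `Partition` class and its methods are hypothetical, lag is counted in records against an illustrative `max_lag` threshold, and replication is modeled as a synchronous copy instead of followers fetching from the leader.

```python
class Partition:
    """Toy model of one partition: a leader, its followers, and the ISR set."""

    def __init__(self, replicas, max_lag=2):
        self.replicas = list(replicas)
        self.leader = self.replicas[0]
        self.logs = {r: [] for r in self.replicas}   # per-replica copy of the log
        self.alive = set(self.replicas)
        self.isr = set(self.replicas)
        self.max_lag = max_lag                       # illustrative lag threshold

    def produce(self, record, acks="all"):
        """Append on the leader; with acks='all', wait for every ISR member."""
        self.logs[self.leader].append(record)
        if acks == "all":
            for r in self.isr - {self.leader}:
                self.replicate(r)
        return len(self.logs[self.leader]) - 1

    def replicate(self, follower):
        """A live follower copies everything it is missing from the leader."""
        if follower in self.alive:
            self.logs[follower] = list(self.logs[self.leader])

    def refresh_isr(self):
        """Keep only live replicas whose lag is within the threshold."""
        end = len(self.logs[self.leader])
        self.isr = {
            r for r in self.replicas
            if r in self.alive and end - len(self.logs[r]) <= self.max_lag
        }

    def fail(self, broker):
        """Mark a broker dead; if it led the partition, promote an ISR member."""
        self.alive.discard(broker)
        self.isr.discard(broker)
        if broker == self.leader:
            candidates = [r for r in self.replicas if r in self.isr]
            if not candidates:
                raise RuntimeError("no in-sync replica available")
            self.leader = candidates[0]


part = Partition(["b1", "b2", "b3"], max_lag=2)
part.produce("m0", acks="all")          # b1, b2, and b3 all hold m0
for i in range(1, 4):
    part.produce(f"m{i}", acks="1")     # only the leader b1 holds m1..m3
part.replicate("b2")                    # b2 catches up; b3 stays stuck
part.refresh_isr()                      # b3 lags by 3 records and leaves the ISR
part.fail("b1")                         # the leader dies; b2 is promoted
```

Because only ISR members are eligible, the promoted replica already holds everything the old leader had acknowledged with "all", which is why no majority vote is needed in this model.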

Key takeaways

In summary, Kafka’s design thoughtfully addresses the challenges of high-throughput, low-latency data streaming and fault tolerance.

By combining producer-side batching and asynchronous sends with a consumer-driven pull model, Kafka ensures that data is delivered efficiently without overwhelming consumers. Consumer groups and offsets let consumers know and control where to start or resume reading and processing messages after a failure.

Partitioning topics and implementing a leader-follower replication strategy allows Kafka to scale seamlessly and maintain availability despite broker failures. The in-sync replica mechanism further reinforces fault tolerance by dynamically managing the replication state and promoting new leaders with minimal disruption.
