How partition and consumer groups works in kafka internally and how a consumer determines which partitions to read from?
?? Sai Ashish
Full Stack Engineer | Ex- Tekion, VideoVerse, LambdaTest, Paytm, ?????????????? MintTea (SwiftRobotics, Stewards, Velvet), ???? SigIQ.AI (PadhAI.ai) | 10 internships | System Design | Problem Solver
Kafka operates on a pull-based approach, where the consumer drives the process, unlike RabbitMQ, which is producer-driven.
In Kafka, the producer sends messages to partitions of topic, and consumers pull messages from these partitions based on their availability. This asynchronous implementation highlights Kafka's decoupling mechanism, ensuring efficient message processing and distribution.
Learn more about how kafka implements asynchronous communication:
Apache Kafka is a distributed streaming platform that uses partitions and consumer groups to scale and manage message consumption efficiently. Here’s a breakdown of how partitions and consumer groups work internally and how a consumer determines which partitions to read from:
Partitions: 1. Data Distribution:
- Each Kafka topic is divided into partitions. Partitions allow Kafka to parallelize message processing and increase throughput.
- Each partition is an ordered, immutable sequence of messages that is continually appended to. Messages are identified by their offset within the partition.
2. Replication:
- Partitions are replicated across multiple brokers to ensure fault tolerance. Each partition has one leader and multiple followers. The leader handles all read and write operations, while followers replicate the data.
Consumer Groups:
1. Group Coordination:
- A consumer group is a collection of consumers that work together to consume messages from a set of partitions. Each consumer in a group is assigned a subset of the topic’s partitions.
- Kafka’s coordinator (usually one of the brokers) manages the assignment of partitions to consumers within the group.
2. Partition Assignment:
- Kafka uses several strategies to assign partitions to consumers within a group, such as range, round-robin, and sticky assignment strategies. This ensures that each partition is assigned to only one consumer within the group.
- The assignment can be dynamic and may change as consumers join or leave the group, or as partitions are added or removed.
3. Offset Management: - Consumers within a group keep track of their offsets (the position in a partition) to know which messages have been processed. Kafka can store these offsets in a special internal topic or manage them using external systems.
Consumer Behavior:
1. Fetching Metadata:
- When a consumer starts or joins a consumer group, it queries the Kafka broker to get metadata about the topic. This metadata includes information about partitions and their leaders.
2. Partition Assignment:
- The consumer receives partition assignments from the Kafka coordinator. This tells the consumer which partitions it is responsible for. Consumers poll these partitions for new messages.
3. Message Reading:
- Consumers read messages from their assigned partitions based on the last committed offset. They continuously fetch messages from the leader of each partition they are assigned to.
4. Rebalancing:
- When there are changes in the consumer group (e.g., a new consumer joins or an existing consumer leaves), Kafka triggers a rebalance. During this process, partitions may be reassigned among consumers to maintain a balanced workload.
How partitions and consumer groups work in Kafka?
Scenario:
Imagine we have a Kafka topic named orders that is used to process customer orders in an e-commerce platform. This topic is configured with 4 partitions (P0, P1, P2, P3). We also have a consumer group named order-processors with 2 consumers (C1 and C2).
Initial Setup
- Topic orders with 4 partitions:
- P0
- P1
- P2
- P3
- Consumer Group order-processors with 2 consumers:
- C1
- C2
Steps:
1. Metadata Fetching:
- When C1 and C2 start, they connect to the Kafka broker to fetch metadata about the orders topic, including the number of partitions and their leaders.
2. Partition Assignment:
- Kafka's coordinator assigns partitions to consumers in the group. The assignment might look like this:
- C1 is assigned P0 and P1.
- C2 is assigned P2 and P3.
This ensures that each partition is read by only one consumer in the group.
3. Reading Messages:
- C1 will read messages from partitions P0 and P1.
- C2 will read messages from partitions P2 and P3.
- Each consumer keeps track of the last processed offset in each of its partitions to ensure it knows where to resume reading in case of a restart.
Example Message Flow
- Partition P0: Receives messages with offsets 0, 1, 2, ...
- Partition P1: Receives messages with offsets 0, 1, 2, ...
- Partition P2: Receives messages with offsets 0, 1, 2, ...
- Partition P3: Receives messages with offsets 0, 1, 2, ...
Suppose messages arrive in the following order:
- Message A to P0 (offset 0)
- Message B to P1 (offset 0)
- Message C to P2 (offset 0)
- Message D to P3 (offset 0)
- Message E to P0 (offset 1)
- Message F to P1 (offset 1)
- Message G to P2 (offset 1)
- Message H to P3 (offset 1)
Consumer Processing
- C1 will process:
- Message A from P0 (offset 0)
- Message E from P0 (offset 1)
- Message B from P1 (offset 0)
- Message F from P1 (offset 1)
- C2 will process:
- Message C from P2 (offset 0)
- Message G from P2 (offset 1)
- Message D from P3 (offset 0)
- Message H from P3 (offset 1)
Rebalancing
Now, let's say a new consumer, C3, joins the order-processors group. Kafka will trigger a rebalance to distribute the partitions among the three consumers.
- New Partition Assignment:
领英推荐
- C1 is assigned P0.
- C2 is assigned P1 and P2.
- C3 is assigned P3.
After rebalancing:
- C1 will process messages from P0.
- C2 will process messages from P1 and P2.
- C3 will process messages from P3.
How the Kafka coordinator assigns partitions to consumers within a consumer group?
Scenario
Let's use the same orders topic with 4 partitions (P0, P1, P2, P3) and a consumer group order-processors.
We'll start with 2 consumers (C1 and C2) and then add a third consumer (C3) to see how the assignment changes.
Initial Setup
- Topic orders with 4 partitions:
- P0
- P1
- P2
- P3
- Consumer Group order-processors with 2 consumers:
- C1
- C2
Partition Assignment by Kafka Coordinator: 1. Initial Partition Assignment:
When C1 and C2 join the consumer group, the Kafka coordinator assigns partitions to each consumer. One common strategy is the round-robin assignment. Here's how it could work:
- C1 is assigned P0 and P1.
- C2 is assigned P2 and P3.
The assignments might look like this:
C1: [P0, P1]
C2: [P2, P3]
2. Reading Messages:
- C1 will read messages from partitions P0 and P1.
- C2 will read messages from partitions P2 and P3.
3. Adding a New Consumer (C3):
Now, let's say C3 joins the order-processors group. The Kafka coordinator will trigger a rebalance to redistribute the partitions among the three consumers. Using a round-robin strategy, the new assignments might look like this:
- C1 is assigned P0.
- C2 is assigned P1 and P2.
- C3 is assigned P3.
The new assignments:
C1: [P0]
C2: [P1, P2]
C3: [P3]
4. Handling Rebalance:
During the rebalance, consumers will temporarily stop fetching messages, commit their current offsets, and then receive their new partition assignments.
5. Reading Messages Post-Rebalance:
After the rebalance:
- C1 will read messages from P0.
- C2 will read messages from P1 and P2.
- C3 will read messages from P3.
Example Message Flow Before and After Rebalance
Before Rebalance:
- C1 reads:
- P0: [M1, M2, M3]
- P1: [M4, M5, M6]
- C2 reads:
- P2: [M7, M8, M9]
- P3: [M10, M11, M12]
After Rebalance:
- C1 reads:
- P0: [M13, M14]
- C2 reads:
- P1: [M15, M16]
- P2: [M17, M18]
- C3 reads:
- P3: [M19, M20]
Partition Assignment Steps:
1. Consumers Join Group:
- Each consumer sends a join request to the Kafka coordinator.
2. Coordinator Determines Group Membership:
- The coordinator collects the list of consumers and partitions.
3. Assignment Strategy:
- The coordinator uses an assignment strategy (e.g., round-robin, range) to allocate partitions to consumers.
4. Assignment Distribution:
- The coordinator sends the assignment to each consumer.
- Consumers acknowledge the assignment and begin reading from their designated partitions. This example demonstrates how Kafka dynamically manages partition assignments to balance the workload across consumers in a group, ensuring efficient and fault-tolerant message processing.
Key Points:
#softwaredevelopment #softwareengineering #applicationdevelopment #kafka #systemdesign