Internal Architecture of Kafka
Deboshree Choudhury
LinkedIn Top Algorithms Voice'24 | Senior Software Engineer @ Tesco | Educator | Ex Informatica
Apache Kafka is a powerful event streaming platform that enables developers to process and respond to data events in real time.
Its architecture has two layers: the storage layer and the compute layer, designed together to handle large-scale, distributed systems. The storage layer is optimized for efficient data retention and scales horizontally, making it easy to expand storage capacity as needed.
The compute layer, on the other hand, handles data processing and interaction, comprising four key components: the Producer, Consumer, Streams, and Connect APIs. These components work together so that Kafka can scale distributed applications while managing real-time data streams effectively.
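As a quick illustration of the compute layer's Producer API, here is a minimal sketch that publishes a single record. The broker address localhost:9092 and the topic name "events" are placeholders assumed for this example.

```java
// Minimal Producer API sketch: publish one record to a topic.
// Assumes a broker at localhost:9092 and a topic named "events" (both illustrative).
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each send() eventually becomes part of a produce request to the partition leader.
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            producer.flush();
        }
    }
}
```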
Architecture Diagram
This Kafka architecture diagram outlines the separation between the Compute Layer and the Storage Layer of the Kafka system, highlighting the interaction between the core components used for data streaming and processing.
1. Compute Layer
This layer is responsible for creating, processing, and consuming data streams in real time, interacting with the Kafka cluster and the storage layer.
2. Storage Layer
The storage layer in Kafka ensures data durability, fault tolerance, and scalability. This is where Kafka's distributed system comes into play.
Internal Architecture of Kafka Broker
This diagram represents the internal architecture of a Kafka Broker and outlines how client requests are processed within the system.
When the Kafka Client sends a request to the broker, the request is first received by the Socket Receive Buffer. This is a memory buffer that temporarily holds the incoming data from clients before it is processed.
A Network Thread then picks it up and processes the request. The thread reads the data from the buffer and determines whether it is a produce request (writing data) or a fetch request (retrieving data). The request can be of two types:
- Produce Requests: These are requests to write a batch of data to a specific Kafka topic.
- Fetch Requests: These are requests to read data from a Kafka topic.
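On the client side, fetch requests are issued by the Consumer API's poll loop. The sketch below shows a consumer whose fetch.min.bytes and fetch.max.wait.ms settings tell the broker how much data to accumulate (or how long to wait) before answering a fetch request; the topic name, group id, and tuning values are assumptions for illustration.

```java
// Minimal Consumer API sketch: a single poll that results in fetch requests to the brokers.
// Broker address, topic, group id, and fetch settings are illustrative assumptions.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class FetchSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // Ask the broker to wait for at least 1 KB of data, or 500 ms, before replying.
        props.put("fetch.min.bytes", "1024");
        props.put("fetch.max.wait.ms", "500");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```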
The Request Queue holds incoming produce requests until they are picked up by I/O threads for further processing. The queue preserves the order in which requests were received, ensuring fairness and maintaining data integrity.
The I/O Threads are responsible for reading and writing data to and from disk. After picking up a produce request from the request queue, an I/O thread validates the batch and appends it to the partition's log.
Kafka uses an in-memory Page Cache to buffer data before it is written to disk. This helps reduce the number of disk I/O operations and improves performance by keeping frequently accessed data in memory.
Once data is written into the page cache, it is eventually flushed to disk for persistent storage. Kafka organizes its on-disk storage as commit logs: each partition has its own log, which is split into segment files.
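Segment size and roll time can be tuned per topic. The following sketch creates a topic whose segments roll at roughly 64 MB or after seven days, whichever comes first, using the Java AdminClient; the topic name, sizes, and broker address are illustrative assumptions.

```java
// Sketch: create a topic with explicit segment-roll settings to illustrate how a
// partition's commit log is split into segments. All names and values are illustrative.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SegmentConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("events", 3, (short) 3)
                    // Roll to a new log segment every ~64 MB or every 7 days, whichever comes first.
                    .configs(Map.of(
                            "segment.bytes", String.valueOf(64 * 1024 * 1024),
                            "segment.ms", String.valueOf(7 * 24 * 60 * 60 * 1000L)));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```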
Tiered Fetch Threads handle reading data from different layers of storage (in-memory cache, on-disk storage, or even cloud-based object stores) and serve it back to the client. These threads help in responding to fetch requests efficiently by managing data across different storage tiers.
Kafka employs a purgatory structure to handle requests that cannot be processed immediately. This typically occurs with produce requests waiting for replication across brokers or fetch requests waiting for sufficient data to be available.
Kafka ensures fault tolerance and data consistency using replication. The broker coordinating the produce request must ensure that data is replicated to other Kafka brokers. Until replication is completed across the necessary brokers, the produce request remains in purgatory. Once completed, the broker acknowledges the client.
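Whether a produce request waits in purgatory for replication depends on the producer's acks setting. The sketch below uses acks=all, so the broker responds only after the full in-sync replica set has the batch; the broker address and topic name are assumptions for illustration.

```java
// Sketch: a producer that waits for acknowledgement from the leader and all in-sync
// replicas. Broker address and topic name are illustrative assumptions.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class AcksAllSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // acks=all: the broker holds the produce request in purgatory until the
        // in-sync replicas have replicated the batch, then acknowledges.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata meta = producer
                    .send(new ProducerRecord<>("events", "order-1", "created"))
                    .get(); // blocks until the broker's acknowledgement arrives
            System.out.println("written to partition " + meta.partition()
                    + " at offset " + meta.offset());
        }
    }
}
```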
After processing a client request, a response (such as acknowledgement of a write or delivering fetched records) is added to the Response Queue. Each network thread maintains its own response queue to manage client responses efficiently.
Kafka can integrate with external Object Stores (like AWS S3) for long-term storage of large data sets. The Tiered Fetch Threads handle requests that need data from these external stores, providing an efficient mechanism to scale Kafka storage beyond the capacity of the local disk.
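With tiered storage (KIP-405), offloading to an object store can be enabled per topic, assuming the cluster already has a remote storage plugin configured. The sketch below is a hypothetical setup; the topic name and retention values are purely illustrative.

```java
// Sketch: enable tiered storage on a topic so older segments can be offloaded to an
// object store. Assumes a cluster with a remote storage plugin already configured
// (KIP-405); topic name and retention values are illustrative.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TieredTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("events-archive", 3, (short) 3)
                    .configs(Map.of(
                            "remote.storage.enable", "true",   // offload closed segments
                            "local.retention.ms", "86400000",  // keep ~1 day on local disk
                            "retention.ms", "2592000000"));    // ~30 days overall
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```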
Kafka Data Replication and Leader-Follower Dynamics
By distributing data across multiple brokers, Kafka maintains fault tolerance and high availability, even in the case of broker failures.
Replication is configured at the topic level. When creating a topic, users can define the number of replicas for each partition (the replication factor).
A replication factor of "N" allows Kafka to withstand up to "N-1" broker failures without losing data or sacrificing availability.
Once replicas are established for each partition of a topic, one of them is designated as the leader replica, and the broker that holds this replica is responsible for managing reads and writes to the partition. The remaining replicas are referred to as followers, which replicate data from the leader to stay in sync.
The In-Sync Replica (ISR) set includes the leader and all followers that are fully caught up with the leader’s data. Ideally, all replicas remain part of the ISR to ensure data consistency and availability.
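The leader and ISR of each partition can be inspected with the AdminClient, which is a handy way to observe these dynamics on a live cluster. The topic name and broker address below are assumptions for illustration.

```java
// Sketch: print the leader and in-sync replica (ISR) set for each partition of a topic.
// Topic name and broker address are illustrative assumptions.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.List;
import java.util.Properties;

public class IsrSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            TopicDescription description = admin.describeTopics(List.of("events"))
                    .allTopicNames().get()
                    .get("events");
            for (TopicPartitionInfo partition : description.partitions()) {
                System.out.printf("partition %d: leader=%s isr=%s%n",
                        partition.partition(), partition.leader(), partition.isr());
            }
        }
    }
}
```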
Kafka Consumer Groups will be covered in the next article.
Special thanks to Jun Rao for explaining the concepts so well in confluent.io architecture documentation.
To receive new posts and support my work, subscribe to the newsletter.