Understanding Kafka System Design: Diving into Kafka Persistence
Lavakumar Thatisetti
Senior Software Engineer @ Atlassian | Ex-SSE @ Arcesium (D.E. Shaw) | Algorithms & Problem Solving | 1 US Patent | Mentor | System Design | Exploring AI
Hello everyone! In this article, we're about to embark on an exciting journey exploring the inner workings of Kafka system design. Our primary focus will be Kafka Persistence - an essential component of Apache Kafka that ensures fault tolerance, durability, and high data availability.
To truly appreciate the genius behind Kafka's persistence, we'll first need to understand what it is and why it matters. Kafka Persistence ensures no data is lost during a system failure, making it a cornerstone of Kafka's robust architecture.
We'll delve into the mechanisms that make Kafka Persistence so reliable.
"Don't fear the filesystem!" - a line straight from the official Kafka documentation
Yes, you read that correctly: Kafka stores its data on the filesystem!
A common belief is that "disks are slow," which often leads to skepticism about the performance of any persistent structure. In reality, disk performance varies enormously with the access pattern: sequential access can be orders of magnitude faster than random access. A properly designed disk structure can often be as fast as the network.
Let's focus on how Kafka leverages disk storage and caching to ensure data durability and high performance.
Understanding Disk Performance:
One of the key principles underpinning Kafka's high performance is its efficient use of disk storage. Modern hard drives are optimized for linear reads and writes - operations that access data in a sequential, continuous manner. This contrasts with random reads and writes, which access data scattered across different sections of the disk, resulting in higher latency due to increased disk seek time. Kafka takes full advantage of this by writing new messages to disk in an append-only fashion, resulting in linear writes. Similarly, when consuming messages, Kafka reads them in the order they were written, resulting in linear reads.
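To make this concrete, here is a minimal append-only log sketch in Java. This is not Kafka's actual storage code; the class, file name, and record format are invented for illustration. The point is simply that every write lands at the current end of the file, so the disk sees purely sequential writes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal append-only log: every write goes to the end of the file,
// so the disk only ever performs sequential (linear) writes.
public class AppendOnlyLog {
    private final FileChannel channel;

    public AppendOnlyLog(Path file) throws IOException {
        // APPEND guarantees each write lands at the current end of the file.
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Returns the byte position (a crude "offset") at which the record was written.
    public long append(String record) throws IOException {
        long position = channel.size();
        channel.write(ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8)));
        return position;
    }

    public static void main(String[] args) throws IOException {
        AppendOnlyLog log = new AppendOnlyLog(Path.of("segment-00000.log"));
        System.out.println("wrote at position " + log.append("order-created:42"));
        System.out.println("wrote at position " + log.append("order-shipped:42"));
    }
}
```

Kafka's real log segments follow the same principle, just with a binary record format and one active segment per partition.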
Optimizing Disk Reads and Writes:
Operating systems employ techniques like read-ahead and write-behind to optimize disk operations. Read-ahead preemptively loads data blocks that will likely be requested next into memory, speeding up linear reads. Write-behind, or write-back caching, groups smaller logical writes into larger physical writes before they are written to the disk. This process reduces the time spent on disk seek, making write operations more efficient. Both these techniques play a crucial role in Kafka's ability to handle large volumes of data with low latency.
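Read-ahead and write-behind happen inside the kernel, so there is no Kafka API to show, but the sketch below illustrates the same batching idea at the application level, with an invented file name and buffer size: many small logical writes accumulate in memory and are issued to the disk as fewer, larger physical writes.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Application-level analogue of write-behind: small logical writes are
// buffered in memory and handed to the disk as larger physical writes.
public class BatchedWriter {
    public static void main(String[] args) throws IOException {
        try (BufferedOutputStream out = new BufferedOutputStream(
                new FileOutputStream("batched.log", true), 64 * 1024)) { // 64 KB buffer
            for (int i = 0; i < 10_000; i++) {
                // Each call is a small logical write; the buffer coalesces
                // them into physical writes of up to 64 KB each.
                out.write(("message-" + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } // close() flushes whatever is still buffered
    }
}
```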
Operating System Disk Caching:
Modern operating systems use the main memory for aggressive disk caching, where all disk reads and writes go through a unified cache. This cache can flexibly allocate its total capacity based on the demand of disk I/O operations and user processes. Kafka, instead of maintaining an in-process cache, relies on this OS page cache. This decision avoids the inefficiency of storing data twice and reduces the burden on the Java Virtual Machine (JVM), on which Kafka is built.
JVM and Garbage Collection:
Garbage collection in the JVM can introduce latency through "stop-the-world" pauses. Kafka is therefore designed to minimize the amount of garbage it creates, reducing the impact of garbage collection. Here's how:
- Minimized Object Creation: Kafka aims to minimize the creation of temporary objects. The fewer objects created, the less work the garbage collector has to do, reducing the frequency and impact of garbage collection pauses.
- Zero-Copy Method: Kafka uses a zero-copy optimization to transfer data directly from the disk file to the network socket, bypassing user-space buffers in the JVM (and consequently the garbage collector). This significantly reduces memory copying and lowers garbage collection pressure (see the sketch after this list).
- Use of Page Cache: As described above, Kafka relies heavily on the OS page cache for its data instead of loading it into the JVM heap. This further reduces heap usage and the amount of work the garbage collector has to do.
- Tuning JVM and GC Settings: Kafka allows you to tune JVM and garbage collection settings for optimal performance. For instance, you can use collectors designed for short pause times (such as Concurrent Mark Sweep or G1), or tune parameters like heap size, depending on your specific workload and requirements.
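Here is the zero-copy sketch referenced in the list above. On the JVM, the mechanism is FileChannel.transferTo, which on Linux is backed by the sendfile system call; Kafka's documentation describes this as the path used to move log data to consumers. The file name and socket endpoint below are placeholders, and a real server would need to be listening for the connect to succeed:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Zero-copy transfer: transferTo asks the kernel to move bytes from the
// page cache straight to the socket, so the data never enters the JVM
// heap and creates no garbage for the collector.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Path.of("segment-00000.log"),
                     StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("localhost", 9999))) { // hypothetical endpoint
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo may send fewer bytes than requested, so loop.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```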
Together, these measures ensure high-throughput, low-latency performance despite the challenges posed by the JVM's garbage collection.
Kafka's Log Writing Design Approach:
Kafka's design approach involves writing all data immediately to a persistent log on the filesystem, effectively transferring it to the kernel's pagecache. This cache, which is a portion of the main memory (RAM), holds frequently accessed data and metadata for the system. Because reading from and writing to RAM is much faster than doing so from a disk, this process can significantly speed up data operations.
As a result, if the Kafka service is restarted, the cache remains "warm": the data in the pagecache lives in RAM managed by the kernel, so it persists across restarts of the Kafka process (unless the whole machine is rebooted), allowing quick access to data and good performance right from the start. This design also simplifies the code by delegating responsibility for coherency between the cache and the filesystem to the OS.
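A small sketch of this distinction, with an invented file name: write() returns as soon as the bytes reach the kernel's pagecache, which is why it is fast and why the data survives a process restart; only an explicit force() (an fsync) guarantees the bytes are on the physical disk:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// write() lands in the OS page cache; force() is the explicit flush to disk.
public class PageCacheDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("demo.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap("hello\n".getBytes(StandardCharsets.UTF_8)));
            // At this point the data survives a process restart (it lives in
            // the kernel's page cache) but not necessarily a machine crash.
            ch.force(true); // now it is durable on the physical disk
        }
    }
}
```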
Persistent Data Structure:
Traditional messaging systems often use a per-consumer queue backed by random-access data structures such as BTrees. BTree operations are O(log N), and on disk each of those operations can incur an expensive seek.
Kafka instead builds its storage on simple reads from and appends to log files. This gives O(1) operations with non-blocking reads and writes, and delivers performance that is completely decoupled from the data size.
Here's how it works:
- Writes (Appends): Kafka stores all records in append-only logs. When new messages come into Kafka, they are appended to the end of these logs. Because the system is merely adding new information at the end of the existing log, the time it takes to write new messages does not grow with the size of the data. This is why writing to Kafka is considered to have O(1) complexity.
- Reads: When reading data, Kafka consumers keep track of an offset, the position of the next record they'll read. A consumer needs only a single positioned disk access at that offset to fetch the next message. Since the position of the next record is known, the time it takes to read a message is constant, regardless of the size of the data or the record's position within the log. This makes reading from Kafka an O(1) operation as well (a simplified sketch follows this list).
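Here is the simplified sketch referenced above. It assumes fixed-size records so that an offset maps to a byte position with one multiplication; real Kafka records are variable-length, so Kafka instead keeps a sparse offset index per log segment, but the lookup remains effectively constant-time:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified offset-based read: with fixed-size records, offset -> byte
// position is a single multiplication, so a read is one positioned disk
// access no matter how large the log has grown.
public class OffsetReader {
    static final int RECORD_SIZE = 128; // illustrative fixed record size

    public static String readAt(Path log, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
            ch.read(buf, offset * RECORD_SIZE); // positioned read: one access
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }
}
```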
The O(1) time complexity for reads and writes in Kafka is one of the reasons why Kafka can handle high-throughput and low-latency delivery of real-time data feeds.
Virtually Unlimited Disk Space:
Kafka's design allows it to use virtually unlimited disk space without any performance penalty. This feature enables Kafka to retain messages for a long period, providing flexibility for consumers.
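Retention is governed by broker-level (and overridable per-topic) settings. These are real configuration keys from server.properties; the values below are illustrative rather than recommendations:

```properties
# server.properties - how long (or how much) data each broker retains
log.retention.hours=168        # keep messages for 7 days
log.retention.bytes=1073741824 # ...or cap each partition's log at 1 GB
log.segment.bytes=1073741824   # roll to a new segment file at 1 GB
```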
Kafka's Durability:
Three aspects play a critical role in Kafka's durability and data availability.
- Kafka's Distributed Commit Log: At the heart of Kafka's architecture is its distributed commit log. A commit log is a data structure that captures the history of all actions executed on a system. In Kafka, each message that's written is appended to the end of the log and given a unique sequential identifier known as an offset. Kafka retains all messages for a configurable amount of time, allowing the system to recover after a crash. The commit log is distributed, meaning it's spread across multiple servers or nodes. Each message published to a Kafka topic is written to this distributed log and assigned an offset, and Kafka guarantees that messages within a partition are stored in the order they arrive, providing strong ordering and durability guarantees.
- Kafka's Log Flush Policies: Kafka's log flush policy controls when data is moved out of the operating system's disk buffer (pagecache) onto persistent disk storage. Kafka exposes two settings: 'log.flush.interval.messages' and 'log.flush.interval.ms'. The former flushes the log after a certain number of messages have been appended, while the latter flushes it after a certain period of time (the configuration snippet after this list shows both). These settings dictate the trade-off between latency and durability: flushing more frequently increases latency but makes the data more durable, because it reaches the disk more often; flushing less frequently lowers latency but raises the risk of data loss if the system crashes before the data is flushed.
- Replication in Kafka: Replication is a key aspect of Kafka's architecture that ensures data availability and fault tolerance. In Kafka, topics are divided into partitions, and these partitions are replicated across multiple brokers in a Kafka cluster. Each partition has one broker designated as the leader, and the rest are followers. All writes (producers) and reads (consumers) for a partition go through the leader, and the followers replicate the leader's log. If the leader fails, one of the followers automatically takes over as the leader. The number of replicas for a partition and the time Kafka waits before declaring a message committed are configurable and play a crucial role in determining the durability and availability of data. More replicas mean higher durability but also increased storage requirements.
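The configuration snippet referenced in the list above. These are real broker settings from server.properties with illustrative values; note that by default Kafka leaves fsync to the OS's background flush and relies on replication for durability:

```properties
# server.properties - durability-related knobs (values are illustrative)
log.flush.interval.messages=10000  # fsync after this many appended messages...
log.flush.interval.ms=1000         # ...or after this many milliseconds
default.replication.factor=3       # replicas per partition for auto-created topics
min.insync.replicas=2              # acks=all writes require this many in-sync replicas
```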
In summary, these mechanisms work together to ensure fault tolerance, durability, and high availability of data in Kafka. The distributed commit log allows for data recovery and ensures the order of messages, the log flush policies control the trade-off between latency and durability, and replication guarantees data availability even in the case of a system failure. Understanding these intricate details is key to leveraging Kafka's capabilities fully.
Conclusion:
As we conclude this in-depth look at Kafka Persistence, we hope it serves as a valuable resource in understanding Kafka system design. Kafka’s robust persistence layer makes it a popular choice for real-time data processing.
But our exploration into Kafka doesn't end here. My upcoming articles will cover more aspects of Kafka system design, from Kafka's producer and consumer architectures to its stream processing capabilities.
Subscribe Now:
Before signing off, I invite you to subscribe to my newsletter. By subscribing, you'll receive updates on future discussions surrounding Kafka and other fascinating topics in the realm of system design. Stay tuned for a deeper understanding of the technologies shaping our world.
Make sure you take advantage of these insightful articles. Subscribe today and join our community of future-forward thinkers!
Thank you for your time, and we look forward to bringing you more explorations into the world of system design.