How Kafka achieves its design goals (Part I)

In recent years, most of us have been using Kafka for use cases such as message brokering, activity tracking, and event sourcing. Have you ever wondered why so many companies, including very large ones, choose Kafka?

This series explores the motivations behind Apache Kafka's design and how that design addresses them. In this first article, we focus on Kafka's approach to data storage, caching, and transfer.

Motivations

According to the Apache Kafka documentation, Kafka is designed as a unified platform for handling all the real-time data feeds a large company might have. This goal implies the following requirements:

  • High throughput to support high-volume event streams such as real-time log aggregation.
  • Graceful handling of large data backlogs to support periodic data loads from offline systems.
  • Low-latency delivery for more traditional messaging use cases.
  • Fault tolerance in the presence of machine failures.

Solutions

Data storing and caching

Kafka stores messages in append-only logs whose contents are immutable. This approach improves throughput and copes gracefully with large data backlogs.

  • Sequential Disk Access: Modern disks are extremely efficient at sequential reads and writes (hundreds of MB/s), so Kafka uses a log-structured approach instead of random seeks. For example, the Kafka documentation cites a six-disk 7200 rpm SATA RAID-5 array that sustains about 600 MB/sec of sequential writes but only about 100 KB/sec of random writes, a difference of over 6000x.
  • Pagecache and Memory: Kafka writes data to a persistent log immediately and relies on the operating system's page cache rather than maintaining large in-process caches. This ensures that data persists across restarts, avoids slow rebuilds of an in-memory cache, and leaves cache coherency to the OS.
  • Message Retention: Kafka's persistent-log design allows it to retain messages for extended periods (e.g., a week) instead of deleting them immediately after consumption.
  • Simple Data Structures: Kafka's append-only log offers O(1) appends, avoiding the seek-time costs of more complex on-disk structures such as B-trees, and its performance stays stable even as data volumes grow.
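To make the append-only idea concrete, here is a minimal, illustrative sketch in plain Python (not Kafka code) of a length-prefixed log where appends are O(1) and records are addressed by byte offset, a simplified stand-in for Kafka's offsets:

```python
import os
import tempfile

class AppendOnlyLog:
    """Toy length-prefixed log: O(1) appends, reads by byte offset."""

    def __init__(self, path):
        self.f = open(path, "ab+")

    def append(self, record: bytes) -> int:
        """Append one record at the tail of the file; return its offset."""
        offset = self.f.seek(0, os.SEEK_END)
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()
        return offset

    def read(self, offset: int) -> bytes:
        """Read back the record that starts at the given offset."""
        self.f.seek(offset)
        size = int.from_bytes(self.f.read(4), "big")
        return self.f.read(size)

path = os.path.join(tempfile.mkdtemp(), "segment.log")
log = AppendOnlyLog(path)
first = log.append(b"event-1")   # offset 0
second = log.append(b"event-2")  # offset 11 (4-byte length + 7-byte payload)
```

Real Kafka segments also carry CRCs, timestamps, and batch headers; the point here is only that writes always go to the tail, so ingestion never forces the disk to seek.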

Data transfer

Kafka uses several techniques to speed up data transfer: batching, compression, a common binary format, and especially zero-copy. These help Kafka achieve high throughput and low-latency delivery.
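As a rough sketch of the batching idea (illustrative Python, not Kafka's implementation), grouping records and flushing them together turns many small writes into a few large ones:

```python
import io

class BatchingWriter:
    """Toy writer that groups records and flushes them in one write call."""

    def __init__(self, sink, batch_size=4):
        self.sink = sink
        self.batch = []
        self.batch_size = batch_size
        self.write_calls = 0  # how many times we actually hit the sink

    def append(self, record: bytes):
        self.batch.append(record)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            self.sink.write(b"".join(self.batch))  # one large write
            self.write_calls += 1
            self.batch = []

sink = io.BytesIO()
writer = BatchingWriter(sink, batch_size=4)
for i in range(8):
    writer.append(f"msg-{i}\n".encode())
writer.flush()
# 8 records reached the sink in only 2 write calls
```

In a real producer the sink would be a network connection to the broker, and the batch size would also be bounded by a time limit (linger) so that records are not delayed indefinitely.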

  • Batching I/O: Grouping messages together into sets reduces the overhead of numerous small I/O operations and network roundtrips. It leads to larger network packets, larger sequential disk operations, and contiguous memory blocks.
  • Minimizing Byte Copies: A standardized binary message format shared by producers, brokers, and consumers avoids unnecessary copying. Data chunks can be transferred without modification.
  • Zero-Copy Optimization: Kafka applies zero-copy optimization to transfer data directly from the OS page cache to a network socket, avoiding redundant copies through user-space buffers and socket buffers and the extra system calls those copies require.
  • End-to-End Batch Compression: Kafka optimizes network bandwidth by supporting batch compression. Instead of compressing individual messages, Kafka compresses batches of messages together, improving compression ratios and reducing network load.
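On Linux, the zero-copy transfer described above is typically done with the sendfile system call, which Java exposes as FileChannel.transferTo. Here is a minimal illustration in Python, assuming a Linux-like OS; the socket pair merely stands in for a broker-to-consumer connection:

```python
import os
import socket
import tempfile

# Fake "log segment" on disk, standing in for a Kafka partition file.
with tempfile.NamedTemporaryFile(delete=False) as seg:
    seg.write(b"record-0\nrecord-1\n" * 100)  # 1800 bytes
    segment_path = seg.name

# A socket pair stands in for the broker -> consumer connection.
broker_side, consumer_side = socket.socketpair()

with open(segment_path, "rb") as segment:
    size = os.fstat(segment.fileno()).st_size
    sent = 0
    # os.sendfile copies file pages to the socket inside the kernel,
    # never surfacing the bytes into a user-space buffer.
    while sent < size:
        sent += os.sendfile(broker_side.fileno(), segment.fileno(),
                            sent, size - sent)
broker_side.close()

received = bytearray()
while True:
    chunk = consumer_side.recv(4096)
    if not chunk:
        break
    received += chunk
consumer_side.close()
os.unlink(segment_path)
```

Because consumers mostly read recently written data, these pages are usually still in the page cache, so the broker can serve fetches without touching the disk or its own heap.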

Keynote

Kafka's design prioritizes high throughput, persistence, low latency, and fault tolerance. Its log-centric architecture and sequential disk access exploit what modern disks and the OS page cache do best. Combined with transfer techniques such as zero-copy and batch compression, Kafka can handle enormous streams of events efficiently and reliably.

In the next article, we will explore Kafka's solutions for message brokers, distribution, replication, and fault tolerance.
