
Kafka Oversimplified

Vitalii Kozlovskyi

Geek | DevOps | SRE | Software Engineer (Golang/Python)

Published: July 14, 2022

So, a Kafka topic is a queue… Kind of a queue… More like a distributed set of queues, with routing and some consistency guarantees.

Partitioning

In Kafka terms, a partition is a separate queue. A topic is a group of partitions, and that predefined number of partitions is spread across nodes.

5 partitions across 3 nodes.
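A minimal sketch of that layout, assuming a plain round-robin placement. The function name `spread_partitions` and the node names are hypothetical, and a real Kafka cluster also places replicas and uses its own assignment logic, so treat this only as an illustration of the idea.

```python
def spread_partitions(num_partitions: int, nodes: list[str]) -> dict[str, list[int]]:
    """Assign partition ids to nodes round-robin (illustrative, not Kafka's algorithm)."""
    placement = {node: [] for node in nodes}
    for partition_id in range(num_partitions):
        node = nodes[partition_id % len(nodes)]
        placement[node].append(partition_id)
    return placement


placement = spread_partitions(5, ["node-1", "node-2", "node-3"])
print(placement)
# {'node-1': [0, 3], 'node-2': [1, 4], 'node-3': [2]}
```

With only 5 partitions, one node ends up holding fewer partitions than the others.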

The real number of partitions is usually much higher, to allow even load balancing.

How do we know the partition for each specific message?

Each message is sent to a random (or specifically selected) partition.

Log Segments and Offsets

Each partition is stored as an append-only, immutable log file.

New messages are appended, replicated, and read by offsets. Old log files are deleted later.

Offsets:

Current position, where new messages are written;
“HighWatermark” position, up to which messages have been replicated (if needed);
Read position, where your custom software receives, processes, and commits messages;
Commit position.

An old log file is deleted after the retention window.
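The offsets above can be sketched with a toy in-memory log. This is a simplified model under stated assumptions, not Kafka's implementation: the class `PartitionLog` and its methods are hypothetical names, replication is reduced to moving a counter, and retention is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition: an append-only list plus the offsets that track it."""
    records: list = field(default_factory=list)
    high_watermark: int = 0  # offsets below this are replicated and visible to readers
    committed: int = 0       # offset a restarted consumer would resume from

    @property
    def log_end_offset(self) -> int:
        # Current position: where the next message will be written.
        return len(self.records)

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the appended record

    def replicate(self, up_to: int) -> None:
        # Pretend followers have copied everything below `up_to`.
        self.high_watermark = min(up_to, len(self.records))

    def poll(self, position: int, max_records: int = 10) -> list:
        # Readers only see records below the high watermark.
        end = min(self.high_watermark, position + max_records)
        return self.records[position:end]

    def commit(self, offset: int) -> None:
        self.committed = offset


log = PartitionLog()
for value in ("a", "b", "c"):
    log.append(value)
log.replicate(2)                        # "c" is written but not yet replicated
batch = log.poll(log.committed)         # read position starts at the commit position
log.commit(log.committed + len(batch))  # commit after processing the batch
print(batch, log.committed, log.log_end_offset)
# ['a', 'b'] 2 3
```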

Once again, Kafka creates log files for each partition separately, so

Message Order

Understanding the internals, one may ask: how does Kafka keep the messages in a topic ordered?

It does not. Messages are ordered ONLY within a partition. If you need them ordered, make sure they land in the same partition. You can set the partition manually, but it’s easier to use a message key.

partition = hash(key) % len(partitions)

Usually user_id, transaction_id, content_id, or similar fields are used as the message key.
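A minimal runnable version of that formula might look like the following. CRC32 stands in for the hash here purely as an assumption for the sketch (Kafka's Java client uses murmur2 for keyed messages), and `partition_for` is a hypothetical helper name. The point is only that the hash must be stable, so that the same key always maps to the same partition.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index with a stable hash."""
    # zlib.crc32 gives the same result in every process, unlike Python's
    # built-in hash(), which is randomized per process for str and bytes.
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 5
for user_id in ("user-1", "user-2", "user-1"):
    # Both "user-1" messages print the same partition, so their order is preserved.
    print(user_id, "->", partition_for(user_id.encode(), NUM_PARTITIONS))
```

Note that changing the number of partitions changes the result of the modulo, so existing keys may start landing in different partitions.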

Consistency

Usually you have to choose between duplicated messages (at least once) and lost messages (at most once).

There is no magic: you either lose a message or get it duplicated.

Kafka claims to support an exactly-once mode via two-phase commit, but that’s a separate topic of discussion.
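The trade-off comes down to whether the consumer commits its offset before or after processing a message. Here is a small simulation of that, with no Kafka client involved: the function `run_consumer` and its parameters are hypothetical, and the "crash" is modeled as abandoning a message between the two steps and restarting from the committed offset.

```python
def run_consumer(messages: list, commit_first: bool, crash_at: int) -> list:
    """Process all messages, crashing once between commit and process at offset `crash_at`."""
    committed = 0
    processed = []
    crashed = False
    while committed < len(messages):
        offset = committed  # after a restart we resume from the committed offset
        steps = ("commit", "process") if commit_first else ("process", "commit")
        for index, step in enumerate(steps):
            if step == "commit":
                committed = offset + 1
            else:
                processed.append(messages[offset])
            if index == 0 and offset == crash_at and not crashed:
                crashed = True
                break  # simulated crash: the second step never happens
    return processed


messages = ["m0", "m1", "m2"]
print(run_consumer(messages, commit_first=True, crash_at=1))
# ['m0', 'm2']              at most once: m1 is lost
print(run_consumer(messages, commit_first=False, crash_at=1))
# ['m0', 'm1', 'm1', 'm2']  at least once: m1 is duplicated
```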
