Practical Tips for Kafka Partitioning, Compression, Retention, and Swimlanes
Kafka is widely used in distributed systems, and a few of its configuration decisions deserve careful attention. Here are some practical notes on each.
1) How many partitions should a Kafka topic have?
In Kafka, a partition is the unit of parallelism. When creating a topic, the first question that comes to mind is how many partitions it should have. Confluent suggests a formula for this:
max(t/p, t/c) partitions, where:
t = target throughput
p = measured production throughput on a single partition
c = measured consumption throughput on a single partition
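As a quick sketch, the formula can be applied like this (the throughput numbers below are hypothetical, in MB/s):

```python
import math

def partition_count(target, per_partition_produce, per_partition_consume):
    """Confluent's rule of thumb: max(t/p, t/c), rounded up to a whole partition."""
    return math.ceil(max(target / per_partition_produce,
                         target / per_partition_consume))

# Hypothetical numbers: target 100 MB/s, a single partition produces at
# 10 MB/s and a single consumer drains at 20 MB/s -> producer side dominates.
print(partition_count(100, 10, 20))  # → 10
```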
In practice, though, it is difficult to estimate target throughput at scale, and the formula is hard to apply because many factors affect throughput: synchronous replication lowers it, and so does the choice of whether (and with which codec) to compress, and so on. Generally, I take a few basic things into consideration and then run a load test.
I get the replication factor and broker sizing used in production, then load test in a dev environment with the same configuration. I usually start with 8 partitions and observe: am I getting near-real-time updates, and can the system sustain the load? If not, I increase the partition count and apply other optimisations. But we should not have too many partitions, because more partitions require more open file handles.
Each partition maps to a directory in the broker's file system, so the more partitions there are, the higher the open file handle limit must be configured in the underlying operating system. If you are using a managed cloud service, it may not be possible to change this limit.
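On Linux, the file handle limits a process is subject to can be inspected from Python's standard library (a quick sanity check on a broker host, not a Kafka API):

```python
import resource

# Soft limit = what the process is allowed right now; hard limit = the
# ceiling up to which an unprivileged process may raise the soft limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open file handles: soft={soft}, hard={hard}")
```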
2) Compression: If you have high message volume, large message sizes, and limited network bandwidth, enable compression so that network bandwidth can be optimised. In most cases compression works well, but it does add CPU overhead on producers and brokers.
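Kafka producers support gzip, snappy, lz4, and zstd via the `compression.type` setting. The bandwidth/CPU trade-off can be illustrated with Python's built-in gzip (the payload below is a made-up repetitive JSON-like record, typical of event streams):

```python
import gzip
import time

# Repetitive payloads (typical of JSON events) compress very well.
payload = b'{"user_id": 42, "event": "page_view", "page": "/home"}' * 1000

start = time.perf_counter()
compressed = gzip.compress(payload)
elapsed = time.perf_counter() - start

print(f"raw={len(payload)} bytes, gzip={len(compressed)} bytes "
      f"({len(compressed) / len(payload):.1%} of original), "
      f"took {elapsed * 1000:.2f} ms of CPU")
```

The ratio is excellent here precisely because the data is repetitive; already-compressed payloads (images, encrypted blobs) gain little and only pay the CPU cost.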
3) Retention period: The retention period determines how long Kafka retains messages in a topic before they become eligible for deletion. Don't forget to set an appropriate value for "retention.ms" at the topic level. I have seen cases where an incorrect value filled the brokers' disks, bringing brokers down and destabilising the entire cluster, which resulted in production escalations.
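Since retention.ms is specified in milliseconds, it helps to compute the value explicitly rather than hard-coding a magic number (7 days here is just an example value, not a recommendation):

```python
from datetime import timedelta

# Retain messages for 7 days (example value; pick what your disks allow).
retention_ms = int(timedelta(days=7).total_seconds() * 1000)
print(retention_ms)  # → 604800000

# This value is then set at topic level, e.g. with the Kafka CLI:
#   kafka-configs.sh --alter --entity-type topics --entity-name my-topic \
#       --add-config retention.ms=604800000
```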
4) Kafka swimlanes: This term does not come from the Kafka open source community; HubSpot introduced it to handle imbalanced traffic.
At HubSpot, multiple customers shared the same topic, so a sudden burst of traffic from one customer would build up lag and introduce delays for everyone. The fundamental problem is that all traffic, for all customers, is produced to the same queue.
If a burst of traffic comes in faster than the real-time swimlane can accommodate, the excess is sent to an overflow swimlane. The decision of when to route to overflow is made at the source, based on a rate limiter.
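A minimal sketch of that source-side decision: a token-bucket rate limiter per customer, routing to the real-time topic while the bucket has capacity and spilling the excess to the overflow topic. The topic names, rates, and class are hypothetical illustrations, not HubSpot's actual implementation:

```python
import time
from collections import defaultdict

class SwimlaneRouter:
    """Route to the real-time lane while a per-customer token bucket
    has capacity; spill the excess to the overflow lane."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec          # steady-state tokens added per second
        self.burst = burst                # bucket capacity (max burst size)
        self.tokens = defaultdict(lambda: burst)
        self.last = defaultdict(time.monotonic)

    def choose_topic(self, customer_id: str) -> str:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at the burst size.
        self.tokens[customer_id] = min(
            self.burst,
            self.tokens[customer_id] + (now - self.last[customer_id]) * self.rate)
        self.last[customer_id] = now
        if self.tokens[customer_id] >= 1:
            self.tokens[customer_id] -= 1
            return "events-realtime"   # hypothetical topic name
        return "events-overflow"       # hypothetical topic name

# A burst of 15 messages from one customer: the first 10 (the burst size)
# stay in the real-time lane, the rest spill to overflow.
router = SwimlaneRouter(rate_per_sec=5, burst=10)
lanes = [router.choose_topic("customer-a") for _ in range(15)]
print(lanes.count("events-realtime"), lanes.count("events-overflow"))
```

Because the bucket is keyed by customer, one customer exhausting its tokens does not affect routing for the others, which is exactly the isolation the swimlane pattern is after.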