How is Kafka so efficient?
Tushar Goel
Lead Member Of Technical Staff @ Salesforce | Staff Software Engineer, Technical Lead
What is Apache Kafka?
Apache Kafka is a distributed streaming platform. It provides a high-throughput, highly distributed, fault-tolerant platform with low-latency delivery of messages. Here, we are going to focus on the low-latency delivery aspect.
Common causes of inefficiency
- Disk access pattern
- Too many small I/O operations
- Excessive byte copying
Disk access pattern:
Kafka relies on the filesystem for storage and caching. Writing data to random locations is very slow compared with writing in append mode, i.e., writing records one after another in a linear way. Disks are slow mainly because of seek operations: a seek moves the disk head to the particular sector where the data lives, and every random access pays that cost.
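The two access patterns can be sketched in a few lines of Python. The file paths, record size, and record count below are illustrative, not Kafka's actual layout; the point is only the shape of each pattern:

```python
import os
import random
import tempfile

def sequential_append(path, chunks):
    """Append each chunk at the tail: the write position moves
    linearly, so no per-write seek cost is paid."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)

def random_writes(path, chunks, file_size):
    """Write each chunk at a random offset: on a spinning disk,
    every write may pay a head seek, which is what makes random
    I/O slow."""
    with open(path, "wb") as f:
        f.truncate(file_size)
        for chunk in chunks:
            f.seek(random.randrange(file_size - len(chunk)))
            f.write(chunk)

chunks = [b"x" * 4096 for _ in range(256)]  # 1 MiB in 4 KiB records

seq_path = tempfile.mktemp()
sequential_append(seq_path, chunks)

rnd_path = tempfile.mktemp()
random_writes(rnd_path, chunks, 1 << 20)

print(os.path.getsize(seq_path))  # 1048576
```

Both functions move the same number of bytes; on rotating media the sequential version is typically orders of magnitude faster because the disk never has to reposition its head between writes.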
A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group small logical writes into large physical writes. Relying on this is much better than maintaining a cache inside a JVM application because:
1. After a restart, an in-process cache is empty and all data must be fetched again, while the OS page cache stays warm
2. The memory overhead of JVM objects is very high, often doubling the size of the stored data
3. JVM garbage collection gets slower as the amount of heap data increases
4. Data can be stored compactly (and compressed) on disk instead of as expanded objects on the heap
For the above reasons, using the filesystem and relying on the page cache is a better option than an in-memory cache. So instead of buffering data in memory and flushing it to the filesystem when space runs out, all data is immediately written to the filesystem. More details about this page-centric design are explained in this article.
Don't use trees:
Persistent storage used by most messaging systems maintains metadata in a BTree. A BTree guarantees O(log n) lookups, but those lookups translate into disk operations: they depend on seek time, and each disk can perform only one seek at a time, so parallelism is limited. As a result, performance degrades as the data grows.
A persistent queue can instead be maintained in a log-structured format, where writes are done one after another in append mode. This gives O(1) writes, and readers can read independently of write operations.
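A toy sketch of such a log-structured queue follows. The length-prefixed record format and file path are illustrative only; Kafka's real segment format is considerably richer:

```python
import tempfile

class AppendOnlyLog:
    """Toy log-structured queue: each record is length-prefixed and
    appended at the tail. Readers keep their own byte offset, so they
    never block writers (illustrative, not Kafka's on-disk format)."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, record: bytes) -> int:
        # O(1): no index to rebalance, just write at the current tail.
        with open(self.path, "ab") as f:
            offset = f.tell()
            f.write(len(record).to_bytes(4, "big") + record)
            return offset

    def read(self, offset: int):
        # Readers address records by byte offset, independent of writes.
        with open(self.path, "rb") as f:
            f.seek(offset)
            size = int.from_bytes(f.read(4), "big")
            record = f.read(size)
            return record, offset + 4 + size

log = AppendOnlyLog(tempfile.mktemp())
off = log.append(b"hello")
log.append(b"world")

rec, nxt = log.read(off)
print(rec)               # b'hello'
print(log.read(nxt)[0])  # b'world'
```

Note how a reader's position is just an integer offset it carries itself, which is essentially how Kafka consumer offsets decouple reading from writing.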
Too many small I/O operations
Instead of sending one message at a time, Kafka lets network requests group messages together to reduce network overhead. The server appends this chunk of messages to its log in one go, and a consumer can then fetch a large chunk at a time. This improves performance multi-fold: batching lets the producer send larger packets, gives the broker sequential disk access, and lets the data flow to the consumer as a stream of messages.
To make this even more efficient, Kafka compresses whole batches (and not individual messages) using the LZ4, Snappy, or GZIP codecs. Compressing a batch exposes redundancy across messages and therefore yields better compression ratios.
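The batch-versus-individual effect is easy to demonstrate with Python's gzip module. The message payloads below are made up; any set of small, similar records behaves the same way:

```python
import gzip

# 100 small, similar records, like typical event messages.
messages = [f'{{"user_id": {i}, "event": "click"}}'.encode() for i in range(100)]

# Compressing each message alone: the codec sees too little data to
# find redundancy, and pays per-message header overhead every time.
individual = sum(len(gzip.compress(m)) for m in messages)

# Compressing the whole batch at once, as Kafka does per record batch:
# repeated field names and values compress away across messages.
batched = len(gzip.compress(b"".join(messages)))

print(individual, batched)  # batched is much smaller
```

The same reasoning applies to LZ4 and Snappy; gzip is used here only because it ships with the standard library.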
Excessive byte copying
One of the major inefficiencies of data processing systems is the serialization and deserialization of data during transfers. This can be made faster by using better binary data formats, such as Protocol Buffers or FlatBuffers, instead of JSON. But how can you avoid serialization/deserialization altogether? Kafka handles this in two ways:
- Use a standardized binary data format for producers, brokers, and consumers (so data can be passed without modification)
- Don't copy the data in the application ("zero-copy")
For the second point, we need to understand how a common data transfer from file to socket works:
1. The OS reads data from the disk into the page cache in kernel space
2. The application reads the data from kernel space into a user-space buffer
3. The application writes the data back into kernel space, into a socket buffer
4. The OS copies the data from the socket buffer to the NIC buffer, where it is sent over the network
However, if we use the same standardized format end to end, so the data needs no modification in the application, then steps 2 and 3 (copying the data between kernel space and user space) become unnecessary.
If we keep data in the same format in which it will be sent over the network, we can copy it directly from the page cache to the NIC buffer. This is done through the OS sendfile system call. More details on the zero-copy approach can be found in this article.
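On Linux, Python exposes this system call as os.sendfile, so the kernel-side copy can be sketched directly. The file contents, and the local socket pair standing in for a consumer connection, are illustrative:

```python
import os
import socket
import tempfile

# Create a file to serve (its contents stand in for a log segment).
path = tempfile.mktemp()
with open(path, "wb") as f:
    f.write(b"log-segment-bytes" * 1024)

# A local socket pair stands in for a broker-to-consumer connection.
server, client = socket.socketpair()

with open(path, "rb") as f:
    size = os.fstat(f.fileno()).st_size
    sent = 0
    while sent < size:
        # sendfile copies page cache -> socket inside the kernel:
        # the bytes never enter user space.
        sent += os.sendfile(server.fileno(), f.fileno(), sent, size - sent)
server.close()

# Drain the "consumer" side to confirm everything arrived.
received = b""
while True:
    chunk = client.recv(65536)
    if not chunk:
        break
    received += chunk
print(len(received))  # 17408
```

Compare this with the four-step path above: the user-space buffer and the extra copies around it are gone, and the application only orchestrates offsets.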