Kafka Producer: Secrets to Efficient Real-Time Data Processing

In modern architecture, there are many reasons why an application might need to write messages to Kafka: recording user activities for auditing or analysis, recording information from smart devices, asynchronous communication with other applications, and many more.

Each architecture has its own requirements. How important is each message? Can we tolerate losing some messages? Can we accept occasional duplicates? Are there strict latency or throughput requirements?

In financial transactions or e-commerce inventory management, the requirement is never to lose a single message and never to process a message twice. Low latency (for example, at most 500 ms) and high throughput are also crucial. On the other hand, tasks like sending customer notifications can tolerate the loss of some messages.

To meet these requirements, we first need to understand the architecture of Kafka. My previous post on the high-level design of Kafka gives a comprehensive overview. Once we understand the architecture, we can break it down into components for detailed analysis and control, tune the parameters with confidence, and make accurate assessments when applying Kafka to our own systems.

In today's post, I will cover the “Producer,” a very important component that is often overlooked and under-analyzed. There are two main topics:

  • High-Level Design
  • Configuration

These two topics will be covered in separate posts; today’s post will focus on the “High-Level Design” of the producer.



Let's dive straight into the producer's processing flow, which you can follow in the attached diagram. To start producing messages to Kafka, we first create a ProducerRecord object. The topic and value are required; the key and partition are optional. The producer then sends the ProducerRecord. In this step, the producer performs the following tasks:

  • Serialize the key and value into byte[] so they can be sent over the network.
  • Pass the record to the partitioner. If the ProducerRecord already specifies a partition, the partitioner does nothing and returns that partition. Otherwise, it selects a partition, typically by hashing the key. At this point the producer knows exactly which topic and partition the record will go to.
  • Add the record to a batch of records bound for the same topic and partition. A separate thread is responsible for sending these batches to the corresponding Kafka broker.
  • Handle the broker's response. If the message is successfully written to Kafka, the broker returns RecordMetadata, which includes the topic, partition, and offset of the record within the partition. If the broker fails to write the message, it returns an error; the producer retries a few times before giving up and throwing an error.
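
The flow above can be sketched as a small toy model. This is not the Kafka client: it uses only the Python standard library, and every name in it (`ToyAccumulator`, `choose_partition`, `send_with_retries`, the `batch_size` record count, the use of CRC32 as the hash, and `ConnectionError` as the retriable error) is an assumption made for illustration. It only mirrors the order of the steps: serialize, pick a partition, append to a per-partition batch, then send with a few retries.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProducerRecord:
    """Toy stand-in for a producer record: topic and value are required."""
    topic: str
    value: str
    key: Optional[str] = None        # optional
    partition: Optional[int] = None  # optional


def serialize(text: Optional[str]) -> Optional[bytes]:
    """Turn a key or value into bytes so it could travel over the network."""
    return None if text is None else text.encode("utf-8")


def choose_partition(record, key_bytes, num_partitions):
    """Return the explicit partition if set, otherwise derive one from the key."""
    if record.partition is not None:
        return record.partition
    if key_bytes is not None:
        # CRC32 is illustrative only; it is not the hash the real client uses.
        return zlib.crc32(key_bytes) % num_partitions
    # Simplification: a real client spreads keyless records across partitions.
    return 0


class ToyAccumulator:
    """Groups serialized records into batches per (topic, partition)."""

    def __init__(self, num_partitions, batch_size=2):
        self.num_partitions = num_partitions
        self.batch_size = batch_size  # assumption: threshold counted in records
        self.batches = {}             # (topic, partition) -> [(key, value), ...]

    def append(self, record):
        key_bytes = serialize(record.key)
        value_bytes = serialize(record.value)
        partition = choose_partition(record, key_bytes, self.num_partitions)
        topic_partition = (record.topic, partition)
        self.batches.setdefault(topic_partition, []).append((key_bytes, value_bytes))
        return topic_partition

    def drain_full_batches(self):
        """Hand over the batches a sender thread would pick up next."""
        ready = {tp: batch for tp, batch in self.batches.items()
                 if len(batch) >= self.batch_size}
        for tp in ready:
            del self.batches[tp]
        return ready


def send_with_retries(send, batch, max_retries=3):
    """Call send(batch); on ConnectionError retry up to max_retries times, then re-raise."""
    attempt = 0
    while True:
        try:
            return send(batch)  # on success, metadata such as topic/partition/offset
        except ConnectionError:
            if attempt >= max_retries:
                raise
            attempt += 1
```

In this sketch, two records with the same key always land in the same batch, because the partition is derived from the key alone. A record with an explicit partition skips the hashing step entirely. The `send` callable is a placeholder for whatever actually talks to the broker.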

You should now understand the producer's processing flow and be able to identify potential bottlenecks. When low latency and high throughput are required, the first thing to look at is the producer, because any issue at the source affects every subsequent flow. In the next post, I will explain the producer's configuration parameters and how to fine-tune them for various requirements.
