Kafka Producer: Secrets to Efficient Real-Time Data Processing

Nguyen Trung Nam

Senior Software Architect @ VPS Securities JSC

Published: July 24, 2024

In modern architecture, there are many reasons why an application might need to write messages to Kafka: recording user activities for auditing or analysis, recording information from smart devices, asynchronous communication with other applications, and many more.

Each architecture has its own requirements. How important is each message? Can we tolerate losing some messages? Can we accept accidental duplicates? Are there strict latency or throughput requirements?

In financial transactions involving money, or in inventory management for e-commerce, we can never lose a single message and must not process duplicates. Low latency (for example, at most 500 ms) and high throughput are also crucial. On the other hand, tasks like sending customer notifications can tolerate the loss of some messages.

To meet these requirements, we first need to understand Kafka's architecture. My previous post on the high-level design of Kafka gives a comprehensive overview. Once we understand the architecture, we can break it down into components for detailed analysis and control, tune parameters with confidence, and make accurate assessments when applying Kafka to our own systems.

In today's post, I will cover the “Producer,” a very important component that is often overlooked and under-analyzed. Regarding the “Producer,” I have two main topics:

High-Level Design
Configuration

These two topics will be covered in separate posts; today’s post will focus on the “High-Level Design” of the producer.

Let's dive straight into the producer's processing flow. You can follow the flow in the attached diagram. To start producing messages to Kafka, the first step is to create a ProducerRecord object. The topic and value are required; the key and partition are optional. The producer then sends the ProducerRecord. In this step, the producer performs the following tasks:

Serialize the key and value into byte[] so they can be sent over the network.
Next, the data is passed to the partitioner. If the ProducerRecord already specifies a partition, the partitioner does nothing and returns that partition. If no partition is specified, the partitioner selects one based on the key. Once the partition is selected, the producer knows which topic and partition the record will go to. It then adds the record to a batch of records bound for the same topic and partition. A separate thread is responsible for sending these batches to the corresponding Kafka broker.
When the Kafka broker receives the message, it sends back a response. If the message is successfully written to Kafka, the broker returns RecordMetadata, which includes the topic, partition, and offset of the record in the partition. If the broker fails to write the message, it returns an error. The producer receives the error, retries a few times, and if the write still fails, gives up and throws an error.
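The flow above can be sketched as a small toy model in plain Python. This is not the Kafka client: `ToyProducer`, `serialize`, `choose_partition`, and the `broker_write` callback are hypothetical names introduced only for illustration. The sketch assumes a CRC32 hash as a stand-in for the client's real key hashing, sends records with neither key nor partition to partition 0, and flushes batches synchronously instead of on a separate sender thread.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProducerRecord:
    # Topic and value are required; key and partition are optional.
    topic: str
    value: str
    key: Optional[str] = None
    partition: Optional[int] = None


def serialize(data: Optional[str]) -> Optional[bytes]:
    # Step 1: turn the key/value into bytes so they can travel over the network.
    return None if data is None else data.encode("utf-8")


def choose_partition(record, key_bytes, num_partitions):
    # Step 2: an explicit partition wins; otherwise hash the key.
    if record.partition is not None:
        return record.partition
    if key_bytes is not None:
        # Assumption: CRC32 is only a stand-in for the real client's hash.
        return zlib.crc32(key_bytes) % num_partitions
    # Assumption: with no key and no partition, this toy model uses partition 0.
    return 0


class ToyProducer:
    def __init__(self, num_partitions, max_retries=3):
        self.num_partitions = num_partitions
        self.max_retries = max_retries
        # (topic, partition) -> list of serialized (key, value) pairs
        self.batches = defaultdict(list)

    def send(self, record):
        key_bytes = serialize(record.key)
        value_bytes = serialize(record.value)
        partition = choose_partition(record, key_bytes, self.num_partitions)
        # Step 3: append to the batch bound for the same topic and partition.
        self.batches[(record.topic, partition)].append((key_bytes, value_bytes))
        return partition

    def flush(self, broker_write):
        # Step 4: send each batch. broker_write(topic, partition, batch) is a
        # hypothetical callback that returns an offset or raises ConnectionError.
        results = {}
        for (topic, partition), batch in list(self.batches.items()):
            for attempt in range(1, self.max_retries + 1):
                try:
                    results[(topic, partition)] = broker_write(topic, partition, batch)
                    break
                except ConnectionError:
                    # Retry a few times, then give up and surface the error.
                    if attempt == self.max_retries:
                        raise
            del self.batches[(topic, partition)]
        return results
```

Two records sent with the same key land in the same batch, and a record with an explicit partition bypasses the hashing step entirely. In this model, `flush` either returns the offsets reported by `broker_write` or re-raises the last error once the retries are exhausted.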

By now you should understand the producer's processing flow and be able to identify potential bottlenecks. When low latency and high throughput are required, the first thing to consider is the producer, because any issue at the source affects every subsequent flow. In the next post, I will explain the producer's configuration parameters and how to fine-tune them for various requirements.
