Exploring Kafka's Architecture: The Foundation for Real-Time Data Processing

Nguyen Trung Nam

Senior Software Architect @ VPS Securities JSC

Published: July 22, 2024

Real-time data processing is now a core requirement for many systems. Apache Kafka, a distributed event streaming platform, has become a key tool for transmitting and processing large data streams. With its scalability, reliability, and high throughput, Kafka helps businesses collect, manage, and analyze data efficiently.

Case Studies Using Kafka:

In the Financial Sector:

Conditional Orders (Stream of market price signals)
Notification Systems (Creating multiple stream channels for millions of notifications, emails, SMS)
Synchronization between cache data sources and persistent data sources
Asset Information Synchronization
Corporate Bond Robo Trader
Price Board
And more...

In the E-commerce Sector:

Inventory Management
ETL for Data Warehouses
And more...

Across the system architectures I have designed and deployed, from simple to complex, including high-load systems with strict consistency requirements, Kafka has proven highly reliable and has become an indispensable component.

Throughout the implementation process, there are many interesting aspects that I will gradually share in upcoming articles, from usage tips to important configurations for Kafka clients and servers tailored to different use cases.

In this article, I will summarize Kafka's architecture to help you understand its key components:

Kafka Clients:

Producer: Pushes messages to specific topics.
Consumer Group: Subscribes to topics and consumes messages.
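
To make the two client roles concrete, here is a minimal in-memory sketch, not a real Kafka client. It assumes a topic with three partitions, uses `zlib.crc32` as a stand-in for the client's key hashing, and spreads partitions round-robin over the group members. The names `choose_partition`, `assign_partitions`, and `NUM_PARTITIONS` are hypothetical and exist only for this illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition with a stable hash (stand-in only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(members, num_partitions=NUM_PARTITIONS):
    """Spread partitions round-robin over the members of one consumer group."""
    assignment = {member: [] for member in members}
    for partition in range(num_partitions):
        owner = members[partition % len(members)]
        assignment[owner].append(partition)
    return assignment


# Producer side: messages with the same key always land in the same partition,
# so their relative order is preserved inside that partition.
topic = defaultdict(list)
for key, value in [("acct-1", "buy"), ("acct-2", "sell"), ("acct-1", "cancel")]:
    topic[choose_partition(key)].append((key, value))

# Consumer side: each partition is owned by exactly one member of the group.
assignment = assign_partitions(["consumer-a", "consumer-b"])
```

The point of the sketch is the division of labor: the producer decides which partition a message goes to, while the consumer group decides which member reads which partition.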

Kafka Servers:

Core Service and Storage:

Broker: Holds multiple partitions. A partition holds a subset of messages for a topic.
Storage:

- Data Storage: Messages are persisted in data storage in partitions.

- State Storage: Consumer states, such as the offsets each consumer group has committed, are managed by state storage.

- Metadata Storage: Configuration and properties of topics are persisted in metadata storage.
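
The split between data storage and state storage can be sketched as follows. This is a simplified model under my own assumptions, not Kafka's storage engine: a partition is treated as an append-only list where a message's index is its offset, and consumer state is a map from (group, partition) to the next offset to read. `PartitionLog` and `OffsetStore` are hypothetical names.

```python
class PartitionLog:
    """Data storage: an append-only list; a message's index is its offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the message just written

    def read_from(self, offset, max_messages=10):
        return self._messages[offset:offset + max_messages]


class OffsetStore:
    """State storage: the next offset to read, per (group, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, next_offset):
        self._committed[(group, partition)] = next_offset

    def fetch(self, group, partition):
        # A group that has never committed starts from offset 0 in this sketch.
        return self._committed.get((group, partition), 0)


log = PartitionLog()
for price in ["100.5", "100.7", "101.0"]:
    log.append(price)

offsets = OffsetStore()
batch = log.read_from(offsets.fetch("pricing", 0), max_messages=2)
offsets.commit("pricing", 0, 2)  # two messages processed, resume at offset 2
remaining = log.read_from(offsets.fetch("pricing", 0))
```

Because reading never removes messages from the log, a second consumer group can read the same partition from its own offset without affecting the first.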

Coordination Service:

Service Discovery: Identifies which brokers are alive.
Leader Election: One of the brokers is selected as the active controller. There is only one active controller in the cluster. The active controller is responsible for assigning partitions.
Apache ZooKeeper has traditionally been used to elect the controller; newer Kafka versions can use the built-in KRaft (Kafka Raft) quorum instead.
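
The two coordination duties above can be illustrated with a toy model. This is a sketch under stated assumptions, not the API of any real coordination service: the `Coordinator` class is hypothetical, liveness is tracked in a plain set, and the first live broker to claim an empty controller slot wins. When the controller dies, survivors are tried in sorted order purely to keep the example deterministic.

```python
class Coordinator:
    """Toy stand-in for the coordination service (hypothetical, in-memory)."""

    def __init__(self):
        self.live_brokers = set()  # service discovery: who is alive
        self.controller = None     # leader election: at most one controller

    def register(self, broker_id):
        self.live_brokers.add(broker_id)
        self.try_become_controller(broker_id)

    def try_become_controller(self, broker_id):
        # The first live broker to claim the empty slot becomes controller.
        if self.controller is None and broker_id in self.live_brokers:
            self.controller = broker_id
        return self.controller == broker_id

    def mark_dead(self, broker_id):
        self.live_brokers.discard(broker_id)
        if self.controller == broker_id:
            self.controller = None
            # Survivors race for the slot; sorted order is an assumption
            # made here so the outcome is reproducible.
            for candidate in sorted(self.live_brokers):
                if self.try_become_controller(candidate):
                    break


coord = Coordinator()
for broker_id in (1, 2, 3):
    coord.register(broker_id)
first_controller = coord.controller

coord.mark_dead(1)
second_controller = coord.controller
```

In this run broker 1 registers first and becomes the controller; once it is marked dead, one of the remaining live brokers takes over, so the cluster never has more than one active controller at a time.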

