Apache Kafka

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect and provides the Kafka Streams library for stream-processing applications. Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."
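To make that batching concrete, here is a minimal producer sketch; the broker address, topic name, and the "linger.ms"/"batch.size" values are illustrative assumptions rather than recommended settings:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed local broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Batching knobs: wait up to 10 ms so many small records are grouped
        // into one larger message set before hitting the network and disk.
        props.put("linger.ms", "10");
        props.put("batch.size", "65536"); // up to 64 KB per batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000; i++) {
                producer.send(new ProducerRecord<>("events",
                        Integer.toString(i), "payload-" + i));
            }
        } // close() flushes any batches still in flight
    }
}
```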

Kafka stores key-value messages that come from arbitrarily many processes called producers. The data can be divided into "partitions" within different "topics". Within a partition, messages are strictly ordered by their offsets (the position of a message within the partition) and are indexed and stored together with a timestamp. Other processes, called "consumers", can read messages from partitions. For stream processing, Kafka offers the Streams API, which allows writing Java applications that consume data from Kafka and write results back to Kafka. Apache Kafka also works with external stream-processing systems such as Apache Apex, Apache Beam, Apache Flink, Apache Spark, Apache Storm, and Apache NiFi.
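On the reading side, a consumer subscribes to one or more topics and receives records in offset order within each partition. A minimal sketch, assuming the "events" topic from the previous example and a hypothetical consumer group id:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed local broker
        props.put("group.id", "demo-group");                // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest");         // start from the oldest offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // Each record carries its partition, offset, and timestamp.
                    System.out.printf("partition=%d offset=%d ts=%d key=%s value=%s%n",
                            r.partition(), r.offset(), r.timestamp(), r.key(), r.value());
                }
            }
        }
    }
}
```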

Kafka runs on a cluster of one or more servers (called brokers), and the partitions of all topics are distributed across the cluster nodes. Additionally, partitions are replicated to multiple brokers. This architecture allows Kafka to deliver massive streams of messages in a fault-tolerant fashion and has allowed it to replace some conventional messaging systems such as Java Message Service (JMS) and the Advanced Message Queuing Protocol (AMQP). Since the 0.11.0.0 release, Kafka offers transactional writes, which provide exactly-once stream processing using the Streams API.
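A rough sketch of those transactional writes using the plain producer API; the transactional id and topic name are illustrative assumptions (Kafka Streams enables the same machinery through its "processing.guarantee" setting):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
        props.put("transactional.id", "demo-tx-1");        // hypothetical; unique per producer
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("events", "k1", "v1"));
                producer.send(new ProducerRecord<>("events", "k2", "v2"));
                producer.commitTransaction();  // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();   // read-committed consumers see neither record
                throw e;
            }
        }
    }
}
```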

Kafka supports two types of topics: regular and compacted. Regular topics can be configured with a retention time or a space bound. If there are records older than the specified retention time, or if the space bound is exceeded for a partition, Kafka is allowed to delete old data to free storage space. By default, topics are configured with a retention time of 7 days, but it is also possible to store data indefinitely. For compacted topics, records do not expire based on time or space bounds. Instead, Kafka treats later messages as updates to earlier messages with the same key and guarantees never to delete the latest message per key. Users can delete a message entirely by writing a so-called tombstone message with a null value for the specific key.
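A sketch of how a compacted topic might be created and a key later deleted with a tombstone, using the Admin API listed below; the topic name, partition count, and replication factor are illustrative assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompactionSketch {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.put("bootstrap.servers", "localhost:9092");   // assumed local broker

        try (AdminClient admin = AdminClient.create(conf)) {
            // Hypothetical compacted topic: only the latest value per key is retained.
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(List.of(topic)).all().get();
        }

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user-profiles", "user-42", "{\"plan\":\"pro\"}"));
            // Tombstone: a null value marks the key for removal during compaction.
            producer.send(new ProducerRecord<>("user-profiles", "user-42", null));
        }
    }
}
```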

There are five major APIs in Kafka:

  • Producer API – Permits an application to publish streams of records.
  • Consumer API – Permits an application to subscribe to topics and process streams of records.
  • Connector API – Allows building and running reusable producers and consumers that connect Kafka topics to existing applications and data systems.
  • Streams API – Transforms input streams from topics into output streams written back to topics (see the sketch after this list).
  • Admin API – Used to manage Kafka topics, brokers, and other Kafka objects.
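As a minimal illustration of the Streams API, the sketch below reads one topic, transforms each value, and writes to another; the application id and topic names are hypothetical:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-streams-app"); // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Consume from an input topic, transform each value, produce to an output topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic"); // assumed topic names
        source.mapValues(v -> v.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The same decoupling applies here as with the producer and consumer: the topology is just a description of dataflow, and the Streams runtime handles partition assignment, state, and failover.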

The consumer and producer APIs are decoupled from the core functionality of Kafka through an underlying messaging protocol. This allows compatible API layers to be written in any programming language, with efficiency comparable to the Java APIs bundled with Kafka. The Apache Kafka project maintains a list of such third-party APIs.
