- A topic is a named place to store messages. Each topic is divided into partitions, and each message within a partition is identified by a unique offset.
- When a producer sends a new message to a topic, Apache Kafka appends it to the end of a partition and assigns it the next offset in order.
- A consumer that subscribes to a topic pulls messages from each partition in the same order in which they are stored.
- By default, Kafka retains messages for 7 days before deleting them. Deletion does not change the order or the offsets of the remaining messages.
- Topics can have multiple partitions, allowing for horizontal scaling and efficient distribution of data. Producers and consumers from different applications can interact with these partitions simultaneously, enabling a flexible and versatile messaging system.
- Messages are stored at offsets within partitions, which are in turn stored within topics.
- These topics, along with their partitions and messages, are stored on disk and managed by servers. In Kafka, these servers are known as brokers and form a Kafka cluster.
- Deploying everything on a single server creates a single point of failure. If that server fails, the entire system becomes unavailable.
- That's why it's common practice to deploy Kafka with multiple brokers across multiple servers, creating a replicated cluster that provides reliability and fault tolerance.
- When replication is enabled, each partition has multiple copies (replicas). Kafka designates one of them, known as the "active copy", as the leader, while the other copies are designated as "followers". When a producer writes new data to a topic, it is first written to the partition's leader and then replicated to the followers.
- Apache Kafka organizes messages into topics that are divided into partitions; each message is identified by a unique offset within its partition.
- Producers add messages to topics, while consumers retrieve them in sequence from partitions.
- Messages are retained on disk for 7 days by default.
- Kafka's distributed architecture with multiple brokers ensures fault tolerance, employing replication to designate leaders and followers for data redundancy.
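
The ideas above (append-only partitions, offsets, retention, and leader/follower replication) can be illustrated with a small in-memory model. This is only a sketch and not how Kafka is implemented: the names `Record`, `PartitionLog`, and `ReplicatedPartition` are hypothetical, replication is simplified to an immediate copy to every follower, and retention is applied per record by timestamp.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    timestamp: float
    value: str


@dataclass
class PartitionLog:
    """Toy append-only log standing in for one partition (hypothetical model)."""
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, value, timestamp=None):
        """Append a message at the end and return the offset assigned to it."""
        ts = time.time() if timestamp is None else timestamp
        offset = self.next_offset
        self.records.append(Record(offset, ts, value))
        self.next_offset += 1
        return offset

    def read_from(self, offset):
        """Return records with offset >= the given one, in stored order."""
        return [r for r in self.records if r.offset >= offset]

    def expire(self, now, retention_seconds):
        """Drop records older than the retention window; survivors keep their offsets."""
        cutoff = now - retention_seconds
        self.records = [r for r in self.records if r.timestamp >= cutoff]


class ReplicatedPartition:
    """One leader log plus follower logs; every write goes to the leader first."""

    def __init__(self, follower_count):
        self.leader = PartitionLog()
        self.followers = [PartitionLog() for _ in range(follower_count)]

    def produce(self, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        offset = self.leader.append(value, ts)
        # Simplification: copy to each follower right away, in the same order,
        # so each follower assigns the same offset as the leader.
        for follower in self.followers:
            follower.append(value, ts)
        return offset


SEVEN_DAYS = 7 * 24 * 60 * 60  # default retention period, in seconds

partition = ReplicatedPartition(follower_count=2)
partition.produce("a", timestamp=0)
partition.produce("b", timestamp=SEVEN_DAYS)
partition.produce("c", timestamp=SEVEN_DAYS + 10)

# A consumer reading from offset 0 sees messages in the order they were stored.
print([r.value for r in partition.leader.read_from(0)])

# After retention runs, the oldest message is gone; the rest keep their offsets.
partition.leader.expire(now=SEVEN_DAYS + 10, retention_seconds=SEVEN_DAYS)
print([(r.offset, r.value) for r in partition.leader.records])
```

Explicit timestamps are passed in the usage lines so that the retention step is deterministic; omitting them falls back to the current time.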