Kafka Simplified
Abhishek Gaddhyan
What is Apache Kafka?
Apache Kafka is a distributed messaging system used to handle large volumes of real-time data streams. It is widely used for building real-time data pipelines and streaming applications. Kafka works on a publish-subscribe model, where producers send data (messages) to topics and consumers read from those topics.
This guide will walk you through the basic concepts of Kafka, Zookeeper, Kafka brokers, producers, consumers, partitions, and replicas. We will cover how to set up Kafka and work through two scenarios: a topic with 1 partition and a replication factor of 1, and a topic with 4 partitions and a replication factor of 3.
Step 1: Set Up Zookeeper
Kafka uses Zookeeper to maintain cluster metadata and to coordinate leader election. In the Zookeeper-based setup used in this guide (Kafka 2.8.0), the broker cannot start until Zookeeper is running, so let's set it up first.
Go to the extracted Kafka directory:
cd kafka_2.13-2.8.0
Start Zookeeper: Kafka ships with a ready-made Zookeeper configuration file. Run the following command to start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Zookeeper will start on the default port 2181.
Step 2: Start Kafka Broker
Now that Zookeeper is running, you can start the Kafka broker in a new terminal. The broker receives messages from producers, stores them, and serves them to consumers.
bin/kafka-server-start.sh config/server.properties
By default, Kafka will run on port 9092.
Check that both ports are listening (the arrows and names on the right are annotations, not netstat output):
bash-3.2$ netstat -an | grep -E '2181|9092'
tcp46 0 0 *.2181 *.* LISTEN -> ZooKeeper
tcp46 0 0 *.9092 *.* LISTEN -> KafkaServer
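The grep pattern above matches any line that contains the digits 2181 or 9092, including unrelated connections. A slightly stricter filter, sketched below, keeps only LISTEN sockets whose local port is one of the given ports. The function name listening_on is hypothetical, and the pattern assumes netstat prints the local address in the fourth column as host.port or host:port, as in the output above.

```shell
# listening_on PORT...: read `netstat -an` output on stdin and print the
# LISTEN lines whose local address ends in one of the given ports.
# Assumes the local address is the 4th column (host.port or host:port).
listening_on() {
  ports=$(echo "$*" | tr ' ' '|')
  awk -v re="[.:](${ports})\$" '/LISTEN/ && $4 ~ re'
}
# Usage: netstat -an | listening_on 2181 9092
```

If the command prints two lines, both Zookeeper and the Kafka broker are accepting connections.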
Step 3: Creating Topics (Partitions and Replicas)
In Kafka, topics are split into partitions. A partition allows Kafka to scale and distribute the data across brokers. Each partition can have replicas, which are copies of the data to ensure availability and fault tolerance.
Scenario 1: Create a Topic with 1 Partition and 1 Replication
In this example, we’ll create a topic with 1 partition and 1 replica.
bin/kafka-topics.sh --create --topic test-topic-1 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
This command creates a topic named test-topic-1 with 1 partition and 1 replica. Since there is only 1 replica, the data will only be stored on the leader and there won’t be any backup copies.
List the topics to confirm that it was created:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Step 4: Produce Messages to the Topic
Now, let's produce some messages to this topic. In a new terminal, use the following command to start producing messages:
bash-3.2$ bin/kafka-console-producer.sh --topic test-topic-1 --bootstrap-server localhost:9092
>Hello Kafka!
>Test Message 1
>Test Message 2
>Test Message 3
>Test Message 4
>This will be consumed by Kafka consumer
Each line you type at the > prompt is sent as one message to the test-topic-1 topic.
Step 5: Consume Messages from the Topic
In another terminal, use the following command to consume messages from the test-topic-1 topic:
bash-3.2$ bin/kafka-console-consumer.sh --topic test-topic-1 --from-beginning --bootstrap-server localhost:9092
Hello Kafka!
Test Message 1
Test Message 2
Test Message 3
Test Message 4
This will be consumed by Kafka consumer
You should now see the messages you produced earlier, because --from-beginning makes the consumer read the topic from its first message.
Scenario 2: Create a Topic with 4 Partitions and 3 Replicas
When a topic has several partitions (here, 4), Kafka spreads them across the available brokers, so you need more than one broker for the load to actually be distributed. The replication factor adds a hard constraint: every replica of a partition must live on a different broker, so the number of brokers must be at least equal to the replication factor.
Why Do We Need Multiple Brokers?
Scenario Without Enough Brokers
If you attempt to create a topic with a replication factor of 3 but only have 1 broker running, you’ll see an error like:
Error while executing topic command : Replication factor: 3 larger than available brokers: 1.
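The rule behind this error can be written as a simple pre-flight check. The sketch below is illustrative only: can_replicate is a hypothetical helper, and the broker count is passed in by hand rather than queried from the cluster.

```shell
# can_replicate REPLICATION_FACTOR BROKER_COUNT
# Succeeds only if there are enough brokers to place every replica of a
# partition on a different broker, which is the condition Kafka enforces.
can_replicate() {
  if [ "$1" -le "$2" ]; then
    echo "ok: replication factor $1 fits on $2 broker(s)"
  else
    echo "error: replication factor $1 needs at least $1 brokers, found $2"
    return 1
  fi
}

can_replicate 3 1 || true   # the failing case described above
can_replicate 3 3           # the setup built in the following steps
```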
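Steps to Add Brokers and Use Multiple Brokers
Create a configuration file for each additional broker by copying the default one:
bash-3.2$ cp config/server.properties config/server-1.properties
bash-3.2$ cp config/server.properties config/server-2.properties
Each copy must then be edited so that its broker.id, its listeners port, and its log.dirs directory differ from those of every other broker. The commands later in this guide expect the three brokers on ports 9091, 9092, and 9093.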
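Since the copies start out identical to the original, one way to make them distinct is to append overriding settings to each file; when a key appears more than once in a properties file, the last value is the one that takes effect. The sketch below assumes broker ids 1 and 2, ports 9091 and 9093 (to match the bootstrap addresses used later in this guide), and log directories under /tmp. The helper name add_broker_overrides and all of these values are illustrative choices, not requirements.

```shell
# add_broker_overrides FILE BROKER_ID PORT
# Appends per-broker settings to a copied server.properties file.
# All values are illustrative assumptions for a single-machine cluster.
add_broker_overrides() {
  cat >> "$1" <<EOF

# Overrides for broker $2 (appended, so they win over earlier entries)
broker.id=$2
listeners=PLAINTEXT://localhost:$3
log.dirs=/tmp/kafka-logs-$2
EOF
}
# Usage, from the Kafka directory:
#   add_broker_overrides config/server-1.properties 1 9091
#   add_broker_overrides config/server-2.properties 2 9093
```

Editing the three keys by hand in each file works just as well; appending simply avoids having to find the existing entries.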
Now, start the new brokers, each in its own terminal, since each command keeps running in the foreground:
bash-3.2$ bin/kafka-server-start.sh config/server-1.properties
bash-3.2$ bin/kafka-server-start.sh config/server-2.properties
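Then confirm that all three brokers have joined the cluster:
bash-3.2$ bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092
This should print one entry per broker in the cluster, each followed by the API versions that broker supports.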
Create the Topic with Multiple Brokers
Now that you have multiple brokers running, you can create the topic with 4 partitions and 3 replicas. Kafka will automatically distribute the partitions across the brokers.
bash-3.2$ bin/kafka-topics.sh --create --topic test-topic-2 --bootstrap-server localhost:9091,localhost:9092,localhost:9093 --partitions 4 --replication-factor 3
Describe the topic to check the result:
bash-3.2$ bin/kafka-topics.sh --describe --topic test-topic-2 --bootstrap-server localhost:9091
The output should show, for each partition, which broker is the leader and which brokers hold its replicas.
Kafka Producers
A Kafka producer sends data to a Kafka topic. Let’s create a simple producer to send some messages to test-topic-2.
bash-3.2$ bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic-2
> Hello Kafka!
> This is a test message.
> Kafka is awesome.
Kafka Consumers
A Kafka consumer reads data from a Kafka topic. Let's consume the messages we sent in the previous step.
bash-3.2$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic-2 --from-beginning
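This will start consuming messages from the beginning of the topic. You should see the messages you sent using the producer. Kafka only guarantees ordering within a single partition, so with 4 partitions the messages may arrive in a different order than shown here: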
Hello Kafka!
This is a test message.
Kafka is awesome.
Understanding Partitions and Replicas
Now, let’s dive into partitions and replicas.
For example, if you have 4 partitions and a replication factor of 3, the data is spread across 3 brokers, with each partition having 1 leader replica and 2 follower replicas.
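The arithmetic behind this layout is small enough to check directly. The sketch below just multiplies the numbers out for the topic above; the variable names are illustrative, and it assumes Kafka spreads the replicas evenly across the brokers.

```shell
partitions=4
replication_factor=3
brokers=3

# Every partition is stored replication_factor times.
total_replicas=$((partitions * replication_factor))
# With an even spread, each broker hosts this many partition replicas.
replicas_per_broker=$((total_replicas / brokers))
# Each partition has one leader; the rest of its replicas are followers.
followers_per_partition=$((replication_factor - 1))

echo "total partition replicas: $total_replicas"
echo "replicas per broker:      $replicas_per_broker"
echo "followers per partition:  $followers_per_partition"
```

With 3 brokers and a replication factor of 3, every broker ends up holding a copy of all 4 partitions, which is why the topic's data remains available when one broker goes down.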
You can see the partition distribution and replicas by describing the topic:
bash-3.2$ bin/kafka-topics.sh --describe --topic test-topic-2 --bootstrap-server localhost:9092
This will give you details about how the partitions and replicas are distributed across the brokers.
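To summarize the describe output, you can count how many partitions each broker currently leads. The sketch below is a rough helper, not part of Kafka: the name leaders_per_broker is hypothetical, and it assumes each partition line contains a "Leader: <broker id>" field, which is how kafka-topics.sh in Kafka 2.8 prints it.

```shell
# leaders_per_broker: read `kafka-topics.sh --describe` output on stdin
# and print how many partitions each broker leads.
# Assumes partition lines contain a "Leader: <id>" pair of fields.
leaders_per_broker() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "Leader:") count[$(i + 1)]++
  }
  END {
    for (b in count) print "broker " b " leads " count[b] " partition(s)"
  }' | sort
}
# Usage:
#   bin/kafka-topics.sh --describe --topic test-topic-2 \
#     --bootstrap-server localhost:9092 | leaders_per_broker
```

Running it before and after stopping a broker makes the change in leadership easy to spot.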
Verifying Fault Tolerance
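If you stop a broker, Kafka will automatically elect a new leader for any partitions whose leader was on the stopped broker. To test this, stop one of the two added brokers by pressing Ctrl+C in its terminal, and leave the broker on port 9092 running, because the commands below connect to it. Note that bin/kafka-server-stop.sh is not suitable here: with all three brokers on one machine, it signals every Kafka broker process it finds, not just one.
Then describe the topic again:
bash-3.2$ bin/kafka-topics.sh --describe --topic test-topic-2 --bootstrap-server localhost:9092
You’ll notice that Kafka has promoted one of the remaining replicas to leader for each affected partition, and that the stopped broker no longer appears in the Isr (in-sync replicas) list.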
If you now run the consumer from the beginning again, it still receives all the messages, because the surviving replicas hold a full copy of each partition:
bash-3.2$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic-2 --from-beginning
Hello Kafka!
This is a test message.
Kafka is awesome.
With this basic understanding of Kafka, you can explore online courses to go deeper. Thank you!