Kafka Simplified

Apache Kafka is a distributed messaging system used to handle large volumes of real-time data streams. It is widely used for building real-time data pipelines and streaming applications. Kafka works on a publish-subscribe model, where producers send data (messages) to topics and consumers read from those topics.

This guide will walk you through the basic concepts of Kafka, Zookeeper, Kafka brokers, producers, consumers, partitions, and replicas. We will cover how to set up Kafka and demonstrate two setups: a topic with 1 partition and a replication factor of 1, and a topic with 4 partitions and a replication factor of 3.


What is Apache Kafka?


  • Kafka Topics: A topic is a named stream of messages or events, stored as an append-only log that producers write to and consumers read from.
  • Kafka Producers: Producers write data to Kafka topics.
  • Kafka Consumers: Consumers read data from Kafka topics.
  • Kafka Brokers: Kafka brokers are servers that store and manage messages in Kafka.
  • Zookeeper: Zookeeper is a distributed coordination service that Kafka relies on for managing its metadata and leader election between brokers.
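
The publish-subscribe flow above can be sketched as a tiny in-memory model (illustrative only; real Kafka persists messages on broker disks, splits each topic into partitions, and tracks per-consumer-group offsets):

```python
class Topic:
    """Toy stand-in for a Kafka topic: an append-only log of messages."""

    def __init__(self, name):
        self.name = name
        self.log = []  # messages live here in arrival order

    def produce(self, message):
        # Producers only ever append; existing messages are never changed.
        self.log.append(message)

    def consume(self, offset=0):
        # Consumers read from a position (offset) onward; reading does not
        # remove messages, which is how many consumers can share one topic.
        return self.log[offset:]


topic = Topic("test-topic-1")
topic.produce("Hello Kafka!")
topic.produce("Test Message 1")
print(topic.consume())          # ['Hello Kafka!', 'Test Message 1']
print(topic.consume(offset=1))  # ['Test Message 1']
```

Note that unlike a traditional queue, consuming does not delete a message; each consumer simply advances its own offset through the log.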



Step 1: Set Up Zookeeper

Kafka uses Zookeeper for maintaining metadata and leadership election. In the 2.x line used here, Kafka's default configuration depends on Zookeeper (newer releases can run without it in KRaft mode). Let's start by setting up Zookeeper first.


  1. Download Apache Kafka from the official Kafka website.
  2. Extract the archive and navigate to the Kafka directory:


cd kafka_2.13-2.8.0        

  3. Start Zookeeper: Kafka comes with a built-in Zookeeper configuration file. Run the following command to start Zookeeper:

bin/zookeeper-server-start.sh config/zookeeper.properties        

Zookeeper will start on the default port 2181.


Step 2: Start Kafka Broker

Now that Zookeeper is running, you can start the Kafka broker. The broker stores messages on disk and serves produce and consume requests.


  1. Start Kafka Broker:


bin/kafka-server-start.sh config/server.properties        

By default, Kafka will run on port 9092.

Both ports should now be listening:

bash-3.2$ netstat -an | grep -E '2181|9092'
tcp46      0      0  *.2181                 *.*                    LISTEN  -> ZooKeeper
tcp46      0      0  *.9092                 *.*                    LISTEN  -> KafkaServer        

Step 3: Creating Topics (Partitions and Replicas)

In Kafka, topics are split into partitions. A partition allows Kafka to scale and distribute the data across brokers. Each partition can have replicas, which are copies of the data to ensure availability and fault tolerance.

Scenario 1: Create a Topic with 1 Partition and 1 Replica

In this example, we’ll create a topic with 1 partition and a replication factor of 1.


  • Create the Topic:


bin/kafka-topics.sh --create --topic test-topic-1 --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1        

This command creates a topic named test-topic-1 with 1 partition and 1 replica. Since there is only 1 replica, the data will only be stored on the leader and there won’t be any backup copies.


  • List the Topics: To verify that the topic was created, run:


bin/kafka-topics.sh --list --bootstrap-server localhost:9092        

Step 4: Produce Messages to the Topic

Now, let's produce some messages to this topic. In a new terminal, use the following command to start producing messages:

bash-3.2$ bin/kafka-console-producer.sh --topic test-topic-1 --bootstrap-server localhost:9092
>Hello Kafka!
>Test Message 1
>Test Message 2
>Test Message 3
>Test Message 4
>This will be consumed by Kafka consumer        

After running this, you can type messages into the console, and each line you enter is sent to the test-topic-1 topic.


Step 5: Consume Messages from the Topic

In another terminal, use the following command to consume messages from the test-topic-1 topic:

bash-3.2$ bin/kafka-console-consumer.sh --topic test-topic-1 --from-beginning --bootstrap-server localhost:9092
Hello Kafka!
Test Message 1
Test Message 2
Test Message 3
Test Message 4
This will be consumed by Kafka consumer        

You should now see the messages you produced earlier, as the consumer reads the test-topic-1 topic from the beginning.


Scenario 2: Create a Topic with 4 Partitions and 3 Replicas


When you create multiple partitions in a Kafka topic (like 4 partitions), it's important to have multiple brokers in your Kafka cluster to distribute those partitions properly. Kafka can distribute partitions across available brokers, and the number of brokers must be at least equal to the replication factor to ensure proper data replication.

Why Do We Need Multiple Brokers?


  • Partitions: When you create a topic with 4 partitions, Kafka will try to distribute those partitions across the available brokers. If you have only 1 broker, all the partitions for the topic will be stored on that broker, which would limit scalability and fault tolerance.
  • Replicas: When you set a replication factor of 3, Kafka keeps 3 copies of each partition in total: 1 leader and 2 followers. For this, you'll need at least 3 brokers, so that each copy lives on a different broker. If there are fewer brokers than the replication factor, Kafka will refuse to create the topic.
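
As a rough sketch of how replicas end up on distinct brokers (a simplification: Kafka's actual assignment starts from a random broker, staggers follower placement, and can be rack-aware), imagine spreading each partition's replica list round-robin over the broker list:

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    # Simplified placement: replica r of partition p goes to broker
    # (p + r) mod len(brokers), so the copies of any one partition
    # always land on distinct brokers.
    if replication_factor > len(brokers):
        raise ValueError(
            f"Replication factor: {replication_factor} larger than "
            f"available brokers: {len(brokers)}"
        )
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


# 4 partitions, replication factor 3, brokers 0-2; the first broker in
# each replica list plays the role of the partition leader.
print(assign_replicas(4, 3, [0, 1, 2]))
# {0: [0, 1, 2], 1: [1, 2, 0], 2: [2, 0, 1], 3: [0, 1, 2]}
```

With only one broker, the same check fails with a message much like the error shown below for the real cluster.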


Scenario Without Enough Brokers

If you attempt to create a topic with a replication factor of 3 but only have 1 broker running, you’ll see an error like:

Error while executing topic command : Replication factor: 3 larger than available brokers: 1.        

Steps to Add Brokers and Use Multiple Brokers


  • Start Multiple Kafka Brokers: To properly distribute partitions and replicas, you’ll need to start multiple Kafka brokers. You can start as many Kafka brokers as your system resources allow, but for simplicity, let’s start 3 brokers in this example.
  • Copy the Existing Kafka Configuration: To create additional brokers, copy the configuration file (server.properties) and modify the necessary settings for each broker. For example:


bash-3.2$ cp config/server.properties config/server-1.properties
bash-3.2$ cp config/server.properties config/server-2.properties        


  • Edit the Configuration for Each Broker: In each copied file, set a unique broker.id, a unique listener port, and a separate log.dirs directory. For example, in config/server-1.properties:

broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1

And in config/server-2.properties:

broker.id=2
listeners=PLAINTEXT://localhost:9094
log.dirs=/tmp/kafka-logs-2


  • Start the Additional Brokers:


Now, start the new brokers with the following commands:

bash-3.2$ bin/kafka-server-start.sh config/server-1.properties
bash-3.2$ bin/kafka-server-start.sh config/server-2.properties        


  • Verify the Brokers: You can verify that all brokers are running by using the following command:


bash-3.2$ bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092        

This should list all brokers in the cluster.

Create the Topic with Multiple Brokers

Now that you have multiple brokers running, you can create the topic with 4 partitions and 3 replicas. Kafka will automatically distribute the partitions across the brokers.


  • Create the Topic with 4 partitions and 3 replicas:


bash-3.2$ bin/kafka-topics.sh --create --topic test-topic-2 --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --partitions 4 --replication-factor 3


  • bootstrap-server: You provide the list of Kafka brokers (in this case, localhost:9092, localhost:9093, and localhost:9094).
  • Kafka will now distribute the 4 partitions across the 3 brokers, with 1 leader and 2 follower replicas for each partition.
  • Verify the Topic: After creating the topic, you can verify that it has 4 partitions and 3 replicas using:


bash-3.2$ bin/kafka-topics.sh --describe --topic test-topic-2 --bootstrap-server localhost:9092

The output should show how the partitions and replicas are distributed across brokers.


Kafka Producers

A Kafka producer sends data to a Kafka topic. Let’s create a simple producer to send some messages to test-topic-2.


  • Run the Producer:


bash-3.2$ bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic-2


  • Send Messages: Once the producer starts, you can send some messages to the topic. For example:


> Hello Kafka!
> This is a test message.
> Kafka is awesome.        

Kafka Consumers

A Kafka consumer reads data from a Kafka topic. Let's consume the messages we sent in the previous step.


  • Run the Consumer:


bash-3.2$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic-2 --from-beginning        

This will start consuming messages from the beginning of the topic. You should see the messages you sent using the producer:

Hello Kafka!
This is a test message.
Kafka is awesome.        

Understanding Partitions and Replicas

Now, let’s dive into partitions and replicas.


  • Partitions: Kafka splits data into partitions to allow parallel processing. Each partition is like a separate queue where messages are stored.
  • Replicas: Replicas are copies of each partition to ensure that data is still available even if a broker fails.


For example, if you have 4 partitions and a replication factor of 3, the data is spread across 3 brokers: each partition has 1 leader and 2 follower replicas, with each copy on a different broker.
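
How does a message pick its partition? For keyed messages, Kafka's default producer hashes the key and takes it modulo the partition count. Kafka actually uses murmur2 on the serialized key; the CRC32 below is just an illustrative stand-in:

```python
import zlib


def partition_for(key, num_partitions):
    # Hash the key, then map the hash onto one of the partitions.
    # Same key -> same hash -> same partition, every time.
    return zlib.crc32(key.encode()) % num_partitions


# All messages keyed "user-42" land in one partition, so their relative
# order is preserved for any consumer reading that partition.
p = partition_for("user-42", 4)
assert p == partition_for("user-42", 4)
assert 0 <= p < 4
```

Messages sent without a key are instead spread across partitions (round-robin in older clients, sticky batching in newer ones), so per-key ordering only holds when you set a key.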


  1. Leader Election: Kafka automatically chooses a leader for each partition. The leader handles all reads and writes for that partition.
  2. Replication: Kafka ensures that each partition has copies (replicas) stored on different brokers for fault tolerance. If the leader fails, one of the replicas is promoted to become the new leader.
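
Failover can be sketched in a few lines. This is simplified: real Kafka only promotes replicas from the in-sync replica set (the ISR), while this sketch treats every replica as in-sync:

```python
def elect_leader(replicas, live_brokers):
    # replicas is the partition's replica list, current leader first.
    # Walk it in order and promote the first replica that is still alive.
    for broker in replicas:
        if broker in live_brokers:
            return broker
    raise RuntimeError("no live replica available for this partition")


replicas = [1, 2, 0]  # broker 1 leads; brokers 2 and 0 hold follower copies
print(elect_leader(replicas, live_brokers={0, 1, 2}))  # 1 (leader healthy)
print(elect_leader(replicas, live_brokers={0, 2}))     # 2 (broker 1 died)
```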


You can see the partition distribution and replicas by describing the topic:

bash-3.2$ bin/kafka-topics.sh --describe --topic test-topic-2 --bootstrap-server localhost:9092        

This will give you details about how the partitions and replicas are distributed across the brokers.


Verifying Fault Tolerance

If you stop a broker, Kafka will automatically elect a new leader for any partitions whose leader was on the stopped broker. To test this:


  • Stop a Kafka Broker: Press Ctrl+C in one broker's terminal, or run the stop script (note that kafka-server-stop.sh stops all Kafka brokers running on the machine):


bash-3.2$ bin/kafka-server-stop.sh        


  • Check the Topic Again: If you check the topic using:


bash-3.2$ bin/kafka-topics.sh --describe --topic test-topic-2 --bootstrap-server localhost:9092        

You’ll notice that Kafka promotes one of the replicas to become the leader for the affected partition.

Now if you run the consumer to fetch the messages from the beginning, it will still receive all of them:

bash-3.2$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic-2 --from-beginning
Hello Kafka!
This is a test message.
Kafka is awesome.
bash-3.2$        


Now that you have a basic understanding of Kafka, you can explore online courses to build on it. Thank you!
