Role of Apache ZooKeeper in Kafka

What is ZooKeeper?

Apache ZooKeeper plays an important role in system architecture, working in the shadow of more visible Big Data tools such as Apache Spark or Apache Kafka. ZooKeeper is a distributed, open-source configuration and synchronization service, along with a naming registry, for distributed applications.

Originally, the ZooKeeper framework was built at Yahoo! to make it easier to access their applications. It later became a standard for organized service used by Hadoop, HBase, and other distributed frameworks.

Now, let’s discuss the role of ZooKeeper in Kafka in detail:

ZooKeeper in Kafka 

In a Kafka deployment, ZooKeeper stores a lot of shared information about Kafka consumers and Kafka brokers. Let’s look at each in detail:

a. Kafka Brokers

ZooKeeper plays the following roles for Kafka brokers:

i. State

ZooKeeper determines the state of each broker. It considers a Kafka broker alive as long as the broker regularly sends heartbeat requests. In addition, a broker that is responsible for handling replication must be able to keep up with replication needs.
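
The heartbeat idea can be illustrated with a small sketch. This is a toy model and not ZooKeeper's actual session code: the SessionTracker class, the broker ids, and the 6 second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SessionTracker:
    """Toy model of heartbeat-based liveness (not ZooKeeper's real code)."""
    session_timeout: float                          # seconds allowed between heartbeats
    last_seen: dict = field(default_factory=dict)   # broker id -> time of last heartbeat

    def heartbeat(self, broker_id: int, now: float) -> None:
        # Record that the broker was heard from at time `now`.
        self.last_seen[broker_id] = now

    def alive(self, now: float) -> list:
        # A broker counts as alive if its last heartbeat is recent enough.
        return sorted(
            b for b, t in self.last_seen.items()
            if now - t <= self.session_timeout
        )


tracker = SessionTracker(session_timeout=6.0)
tracker.heartbeat(1, now=0.0)
tracker.heartbeat(2, now=0.0)
tracker.heartbeat(1, now=5.0)    # broker 2 stops sending heartbeats
print(tracker.alive(now=8.0))    # broker 1 is 3 s old, broker 2 is 8 s old
```

With these numbers only broker 1 is still reported as alive, because broker 2 has been silent for longer than the timeout.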

ii. Quotas

Kafka brokers allow some clients to have different producing and consuming quotas. These values are set in ZooKeeper under the /config/clients path, and they can be changed with the bin/kafka-configs.sh script.
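
As a rough illustration, the sketch below only builds the command line for such a change and does not run it. It assumes the --zookeeper, --alter, --add-config, --entity-type and --entity-name options and the producer_byte_rate and consumer_byte_rate keys of ZooKeeper-based Kafka releases; the client id clientA, the rates, and the quota_command helper are hypothetical.

```python
import shlex


def quota_command(client_id, producer_rate, consumer_rate, zk="localhost:2181"):
    """Build (but do not run) a kafka-configs.sh call that sets client quotas."""
    config = f"producer_byte_rate={producer_rate},consumer_byte_rate={consumer_rate}"
    return [
        "bin/kafka-configs.sh",
        "--zookeeper", zk,              # assumed ZooKeeper address
        "--alter",
        "--add-config", config,
        "--entity-type", "clients",
        "--entity-name", client_id,     # hypothetical client id
    ]


cmd = quota_command("clientA", 1024, 2048)
print(shlex.join(cmd))
```

Check the options against the documentation of the Kafka version in use before running the printed command.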

iii. Replicas

For each topic, ZooKeeper in Kafka keeps a set of in-sync replicas (ISR). If the previously selected leader node fails, a new leader is elected on the basis of the currently live nodes, using the state kept in ZooKeeper. In practice the election is carried out by the Kafka controller, which is itself chosen through ZooKeeper.
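
A minimal sketch of the selection rule, assuming the new leader is simply the first assigned replica that is both in the ISR and alive. The elect_leader function and the broker ids are hypothetical, and real Kafka handles more cases than this.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Pick the first assigned replica that is both in sync and alive."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None  # no eligible replica, so no leader can be chosen


replicas = [1, 2, 3]     # replica assignment for one partition
isr = {1, 2}             # replicas currently in sync
new_leader = elect_leader(replicas, isr, live_brokers={2, 3})  # broker 1 has failed
print(new_leader)
```

Here broker 3 is alive but not in sync, so it is skipped and broker 2 becomes the leader.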

iv. Nodes and Topics Registry

ZooKeeper in Kafka stores the node and topic registries. All available brokers in Kafka and, more precisely, which Kafka topics are held by each broker can be found there, stored under the /brokers/ids and /brokers/topics zNodes. A Kafka broker creates its registration automatically when it starts.
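
To make the layout concrete, here is a sketch that reads such a registry from an in-memory dictionary instead of a live ZooKeeper. The host names, topic names, and the JSON fields are hypothetical sample data; only the two parent paths come from the text above.

```python
import json

# Hypothetical snapshot of the registry, keyed by zNode path.
znodes = {
    "/brokers/ids/1": json.dumps({"host": "kafka-1.example", "port": 9092}),
    "/brokers/ids/2": json.dumps({"host": "kafka-2.example", "port": 9092}),
    "/brokers/topics/orders": json.dumps({"partitions": {"0": [1, 2], "1": [2, 1]}}),
    "/brokers/topics/logs": json.dumps({"partitions": {"0": [2]}}),
}


def children(parent):
    """Return the names of zNodes directly under `parent`."""
    prefix = parent.rstrip("/") + "/"
    names = (p[len(prefix):] for p in znodes if p.startswith(prefix))
    return sorted(n for n in names if "/" not in n)


def topics_by_broker():
    """Map each registered broker id to the topics it holds replicas for."""
    result = {int(b): set() for b in children("/brokers/ids")}
    for topic in children("/brokers/topics"):
        data = json.loads(znodes[f"/brokers/topics/{topic}"])
        for replicas in data["partitions"].values():
            for broker_id in replicas:
                result.setdefault(broker_id, set()).add(topic)
    return result


print(children("/brokers/ids"))
print(topics_by_broker())
```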

b. Kafka Consumers

i. Offsets

In Kafka’s 0.9.1 release, ZooKeeper is the default storage engine for consumer offsets. Information about how many messages each consumer has consumed is stored in ZooKeeper.
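
The sketch below shows how such an offset could be addressed. It assumes the /consumers/[group]/offsets/[topic]/[partition] layout used by the older ZooKeeper-based consumer and replaces ZooKeeper with a plain dictionary; the group and topic names are hypothetical.

```python
def offset_path(group, topic, partition):
    """Build the zNode path that holds one partition's committed offset."""
    return f"/consumers/{group}/offsets/{topic}/{partition}"


store = {}  # in-memory stand-in for ZooKeeper: path -> data


def commit(group, topic, partition, offset):
    # The offset is kept as text in this sketch, one value per partition.
    store[offset_path(group, topic, partition)] = str(offset)


def fetch(group, topic, partition):
    # Returns None when the group has never committed for this partition.
    value = store.get(offset_path(group, topic, partition))
    return None if value is None else int(value)


commit("billing", "orders", 0, 42)
print(offset_path("billing", "orders", 0))
print(fetch("billing", "orders", 0), fetch("billing", "orders", 1))
```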

ii. Registry

Consumers in Kafka also have their own registry, as Kafka brokers do. The same rules apply: the registration is an ephemeral zNode, so it is destroyed once the consumer goes down, and the consumer performs the registration automatically.
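
The behavior of an ephemeral zNode can be sketched with an in-memory stand-in. FakeZooKeeper, the session id, and the consumer path are hypothetical and only model the rule that a node disappears together with the session that created it.

```python
class FakeZooKeeper:
    """In-memory stand-in used only to illustrate ephemeral zNodes."""

    def __init__(self):
        self.nodes = {}   # path -> owning session id, or None if persistent

    def create(self, path, session_id=None):
        # Passing a session id makes the node ephemeral.
        self.nodes[path] = session_id

    def close_session(self, session_id):
        # Ephemeral nodes vanish with the session that created them.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session_id}

    def exists(self, path):
        return path in self.nodes


zk = FakeZooKeeper()
zk.create("/consumers/group-a/ids")                            # persistent parent
zk.create("/consumers/group-a/ids/consumer-1", session_id=42)  # ephemeral registration
before = zk.exists("/consumers/group-a/ids/consumer-1")
zk.close_session(42)                                           # the consumer goes down
after = zk.exists("/consumers/group-a/ids/consumer-1")
print(before, after)
```

The persistent parent node survives the session, while the consumer's own registration is removed.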

How does Kafka talk to ZooKeeper?

Here, we will see which Kafka classes are responsible for working with ZooKeeper. The Scala class representing a Kafka server is KafkaServer. Its startup() method contains a call to initZk(), the method that initializes the ZooKeeper connection. This method performs several steps. First, it creates a temporary connection to ZooKeeper. This session is responsible for creating the zNodes corresponding to the chroot if it’s missing. Afterwards, this connection is closed and the final connection held by the server is created.
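
The sequence described here can be sketched in Python. This is not Kafka's Scala code: split_connect_string and init_zk are hypothetical helpers, the host names are invented, and the "connections" are only recorded as text so the order of steps is visible.

```python
def split_connect_string(connect):
    """Split 'host1:2181,host2:2181/chroot' into (hosts, chroot)."""
    slash = connect.find("/")
    if slash == -1:
        return connect, ""              # no chroot configured
    return connect[:slash], connect[slash:]


def init_zk(connect, existing_paths):
    """Return the steps taken to prepare the chroot and connect."""
    steps = []
    hosts, chroot = split_connect_string(connect)
    if chroot:
        # The temporary session talks to the root so it can create the chroot.
        steps.append(f"open temporary connection to {hosts}")
        if chroot not in existing_paths:
            existing_paths.add(chroot)
            steps.append(f"create {chroot}")
        steps.append("close temporary connection")
    steps.append(f"open final connection to {connect}")
    return steps


paths = set()
steps = init_zk("zk1:2181,zk2:2181/kafka", paths)
for step in steps:
    print(step)
```

Calling init_zk a second time with the same paths set skips the create step, since the chroot already exists.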

Next, still inside initZk(), Kafka initializes all the persistent zNodes that the server uses. Among others, we can find there: /consumers, /brokers/ids, /brokers/topics, /config, /admin/delete_topics, /brokers/seqid, /isr_change_notification, /config/topics, /config/clients.
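
A minimal sketch of this initialization, assuming each path is created together with any missing parents and that existing nodes are left untouched. The ensure_path helper and the dictionary standing in for ZooKeeper are hypothetical; the path list is the one given above.

```python
PERSISTENT_PATHS = [
    "/consumers", "/brokers/ids", "/brokers/topics", "/config",
    "/admin/delete_topics", "/brokers/seqid", "/isr_change_notification",
    "/config/topics", "/config/clients",
]


def ensure_path(tree, path):
    """Create `path` and any missing parents; existing nodes are left alone."""
    current = ""
    for part in path.strip("/").split("/"):
        current += "/" + part
        tree.setdefault(current, b"")   # empty data for newly created nodes


tree = {}                                # in-memory stand-in for ZooKeeper
for p in PERSISTENT_PATHS:
    ensure_path(tree, p)
print(sorted(tree))
```

Because setdefault never overwrites an entry, running the loop again on a restart leaves the existing nodes and their data unchanged.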

The ZooKeeper instance created this way is then used to initialize other members that rely on synchronization:

  • Replica manager
  • Config manager
  • Coordinator, and controller

ZooKeeper Production Deployment

Kafka uses ZooKeeper to store persistent cluster metadata. If the Kafka data in ZooKeeper were lost, the mapping of replicas to Kafka brokers and the topic configurations would be lost as well, making the Kafka cluster no longer functional and potentially resulting in total data loss.

Stable Version of ZooKeeper 

At the time of writing, the current stable branch is 3.4 and the latest release of that branch is 3.4.9.

We can also use the “four letter word” command envi to find the current version of a running server.

For example:

echo envi | nc localhost 2181

It shows all of the environment information for the ZooKeeper server, including the version.
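
The same check can be scripted. The sketch below is one possible Python equivalent of the nc command: it sends the four letter word over a plain TCP socket and parses key=value lines from the reply. The helper names, the default host and port, and the sample reply are assumptions; a real server prints many more lines.

```python
import socket


def four_letter_word(command, host="localhost", port=2181, timeout=5.0):
    """Send a four letter word to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                 # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_envi(reply):
    """Turn the key=value lines of an envi reply into a dict."""
    env = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("=")
        if sep:                          # skip lines without '='
            env[key.strip()] = value.strip()
    return env


# Hypothetical, shortened reply used so the example runs without a server.
sample = "Environment:\nzookeeper.version=3.4.9\nhost.name=localhost\n"
print(parse_envi(sample)["zookeeper.version"])

# Against a running server: parse_envi(four_letter_word("envi"))
```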

Note: The ZooKeeper start script and the functionality of ZooKeeper are tested only with this version of ZooKeeper.

Hardware of ZooKeeper Server

Here are some guidelines for choosing proper hardware for a cluster of ZooKeeper servers.

a. Memory

ZooKeeper is not a memory-intensive application when handling only data stored by Kafka. In a typical production use case, a minimum of 8 GB of RAM should be dedicated to ZooKeeper.

b. CPU

As a Kafka metadata store, ZooKeeper does not heavily consume CPU resources. However, ZooKeeper does provide a latency-sensitive function, so if it must compete for CPU with other processes, we should consider providing a dedicated CPU core to ensure that context switching is not an issue.

c. Disks

Disk performance is vital to maintaining a healthy ZooKeeper cluster. We recommend solid-state drives (SSDs), because ZooKeeper needs low-latency disk writes to perform optimally.
