Role of Apache ZooKeeper in Kafka

Malini Shukla

Senior Data Scientist || Hiring || 6M+ impressions || Trainer || Top Data Scientist || Speaker || Top content creator on LinkedIn || Tech Evangelist

Published: July 24, 2018

What is ZooKeeper?

Apache ZooKeeper plays an important role in system architecture, although it works in the shadow of more visible Big Data tools such as Apache Spark or Apache Kafka. It is a distributed, open-source configuration and synchronization service that also provides a naming registry for distributed applications.

The ZooKeeper framework was originally built at Yahoo! to make its applications easier to access. It later became a standard for organized service used by Hadoop, HBase, and other distributed frameworks.

Now, let’s discuss the role of ZooKeeper in Kafka in detail:

ZooKeeper in Kafka

Kafka uses ZooKeeper to store a lot of shared information about Kafka brokers and Kafka consumers. Let's look at each in detail:

a. Kafka Brokers

ZooKeeper plays the following roles for Kafka brokers:

i. State

ZooKeeper tracks the state of each broker. It considers a Kafka broker alive as long as the broker regularly sends heartbeat requests. In addition, a broker that is responsible for replication must be able to keep up with its replication needs.
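The liveness check described above can be sketched as a small simulation. This is an illustrative model only, not Kafka or ZooKeeper code: the class name SessionTracker, the 6-second timeout, and the broker ids are assumptions made for the example. It mirrors the idea that a broker stays registered only while its heartbeats keep arriving within a session timeout.

```python
class SessionTracker:
    """Toy model of heartbeat-based liveness tracking (hypothetical, for illustration)."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat = {}  # broker id -> time of last heartbeat (ms)

    def heartbeat(self, broker_id, now_ms):
        # A heartbeat registers the broker or refreshes its session.
        self.last_heartbeat[broker_id] = now_ms

    def live_brokers(self, now_ms):
        # Drop sessions whose last heartbeat is older than the timeout,
        # much as an ephemeral zNode disappears when its session expires.
        expired = [b for b, t in self.last_heartbeat.items()
                   if now_ms - t > self.session_timeout_ms]
        for b in expired:
            del self.last_heartbeat[b]
        return sorted(self.last_heartbeat)


tracker = SessionTracker(session_timeout_ms=6000)
tracker.heartbeat(1, now_ms=0)
tracker.heartbeat(2, now_ms=0)
tracker.heartbeat(1, now_ms=5000)            # broker 1 keeps sending heartbeats
alive = tracker.live_brokers(now_ms=8000)    # broker 2 has been silent for 8 s
print(alive)
```

In this run only broker 1 is still reported as alive, because broker 2 stopped sending heartbeats for longer than the assumed timeout.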

ii. Quotas

Kafka brokers allow some clients to have different producing and consuming quotas. These values are stored in ZooKeeper under the /config/clients path, and we can change them with the bin/kafka-configs.sh script.
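As a rough illustration of what such a quota override might look like, the sketch below builds a zNode path and a JSON payload for one client. The helper name quota_znode, the client id, and the byte rates are hypothetical, and the payload layout (a version field plus a config map with producer_byte_rate and consumer_byte_rate) is an assumption about how the override is serialized, so check it against your Kafka version before relying on it.

```python
import json


def quota_znode(client_id, producer_byte_rate, consumer_byte_rate):
    """Return (path, payload) for a per-client quota override (assumed layout)."""
    path = "/config/clients/" + client_id
    payload = {
        "version": 1,
        "config": {
            # Values are stored as strings in this assumed layout.
            "producer_byte_rate": str(producer_byte_rate),
            "consumer_byte_rate": str(consumer_byte_rate),
        },
    }
    return path, json.dumps(payload, sort_keys=True)


quota_path, quota_data = quota_znode("clientA", 1024, 2048)
print(quota_path)
print(quota_data)
```

In practice the bin/kafka-configs.sh script writes this entry for us; the sketch only shows where the value ends up.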

iii. Replicas

For each topic partition, ZooKeeper in Kafka keeps the set of in-sync replicas (ISR). If the previously selected leader fails, a new leader is elected from the replicas that are still live. Strictly speaking, the Kafka controller carries out this election using the state kept in ZooKeeper, rather than ZooKeeper electing the leader itself.
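The selection rule can be sketched in a few lines. This is a simplified illustration, not Kafka's actual election code: the function name elect_leader and the broker ids are assumptions, and the rule shown (take the first assigned replica that is both in the ISR and alive) leaves out details such as unclean leader election.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Pick the first assigned replica that is in-sync and alive, or None."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None  # no eligible replica: the partition stays offline


assigned = [1, 2, 3]   # replica assignment for one partition
isr = {1, 3}           # broker 2 has fallen behind
new_leader = elect_leader(assigned, isr, live_brokers={2, 3})  # broker 1 has failed
print(new_leader)
```

Broker 2 is alive but not in sync, and broker 1 is in sync but down, so broker 3 becomes the leader in this example.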

iv. Nodes and Topics Registry

ZooKeeper in Kafka also stores the node and topic registries. There we can find all available brokers and, more precisely, which topics each broker holds; this information is stored under the /brokers/ids and /brokers/topics zNodes. A Kafka broker creates its registration automatically when it starts.
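To make the registry layout concrete, here is a minimal sketch that lists brokers from /brokers/ids. It assumes a client object exposing get_children(path) and get(path), which is the shape offered by the kazoo library's KazooClient; to stay self-contained, the example runs against a small in-memory stand-in (FakeZk) instead of a real ensemble. The host names and the exact registration fields are illustrative assumptions.

```python
import json


def list_brokers(zk):
    """Return {broker_id: (host, port)} read from the /brokers/ids registry."""
    brokers = {}
    for broker_id in zk.get_children("/brokers/ids"):
        data, _stat = zk.get("/brokers/ids/" + broker_id)
        info = json.loads(data.decode("utf-8"))
        brokers[int(broker_id)] = (info["host"], info["port"])
    return brokers


class FakeZk:
    """In-memory stand-in for a ZooKeeper client (hypothetical, for the example)."""

    def __init__(self, nodes):
        self.nodes = nodes  # full path -> bytes

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0]
                       for p in self.nodes if p.startswith(prefix)})

    def get(self, path):
        return self.nodes[path], None  # (data, stat), stat omitted here


zk = FakeZk({
    "/brokers/ids/0": b'{"host": "kafka-0.example", "port": 9092}',
    "/brokers/ids/1": b'{"host": "kafka-1.example", "port": 9092}',
})
brokers = list_brokers(zk)
print(brokers)
```

Against a real cluster, the same list_brokers function could be handed a started client in place of FakeZk.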

b. Kafka Consumers

i. Offsets

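In the Kafka 0.9 line, ZooKeeper is the default storage engine for the offsets of the older (Scala) consumer; the newer Java consumer keeps its offsets in an internal Kafka topic instead. With ZooKeeper-based storage, the information about how many messages each consumer has consumed is stored in ZooKeeper.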

ii. Registry

Like brokers, consumers in Kafka have their own registry, and the same rules apply: each consumer registers itself automatically, as an ephemeral zNode that is destroyed once the consumer goes down.
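The two consumer-side entries can be pictured as paths under /consumers. The sketch below assumes the layout used by the older ZooKeeper-based consumer, with offsets under /consumers/[group]/offsets/[topic]/[partition] and registrations under /consumers/[group]/ids/[consumer_id]; the helper names, group, topic, and consumer id are made up for the example.

```python
def offset_path(group, topic, partition):
    """zNode assumed to hold the committed offset for one partition."""
    return "/consumers/{}/offsets/{}/{}".format(group, topic, partition)


def consumer_id_path(group, consumer_id):
    """Ephemeral zNode assumed to register one live consumer in a group."""
    return "/consumers/{}/ids/{}".format(group, consumer_id)


print(offset_path("billing", "orders", 0))
print(consumer_id_path("billing", "billing_host1-1"))
```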

How does Kafka talk to ZooKeeper?

Here, we will see which Kafka classes are responsible for working with ZooKeeper. The Scala class representing a Kafka server is KafkaServer. Its startup() method calls initZk(), which initializes the ZooKeeper connection. When a chroot is configured, initZk() first creates a temporary connection to ZooKeeper. This session is responsible for creating the zNodes corresponding to the chroot if they are missing. Afterwards, this connection is closed and the final connection, held by the server, is created.

Then, still inside initZk(), Kafka initializes all the persistent zNodes that the server uses. Among others, we can find there: /consumers, /brokers/ids, /brokers/topics, /config, /admin/delete_topics, /brokers/seqid, /isr_change_notification, /config/topics, /config/clients.
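The chroot handling and the list of persistent zNodes can be sketched as follows. This is not the KafkaServer code, only an illustration of the idea: the function names, host names, and the /kafka chroot are assumptions, and the path list is copied from the text above.

```python
PERSISTENT_PATHS = [
    "/consumers", "/brokers/ids", "/brokers/topics", "/config",
    "/admin/delete_topics", "/brokers/seqid", "/isr_change_notification",
    "/config/topics", "/config/clients",
]


def split_connect_string(connect):
    """Split 'host1:2181,host2:2181/chroot' into (hosts, chroot)."""
    idx = connect.find("/")
    if idx == -1:
        return connect, ""          # no chroot configured
    return connect[:idx], connect[idx:]


def absolute_persistent_paths(connect):
    """Full paths that would exist once the chroot and zNodes are created."""
    _hosts, chroot = split_connect_string(connect)
    return [chroot + path for path in PERSISTENT_PATHS]


hosts, chroot = split_connect_string("zk1:2181,zk2:2181/kafka")
print(hosts, chroot)
print(absolute_persistent_paths("zk1:2181,zk2:2181/kafka")[:2])
```

The temporary connection described above corresponds to connecting to hosts alone in order to create chroot, before reconnecting with the full string.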

The ZooKeeper client instance created this way is then used to initialize other members of the server:

Replica manager
Config manager
Coordinator and controller

ZooKeeper Production Deployment

Kafka uses ZooKeeper to store persistent cluster metadata. If the Kafka data in ZooKeeper were lost, the mapping of replicas to Kafka brokers and the topic configurations would be lost as well, making the Kafka cluster no longer functional and potentially resulting in total data loss.

Stable Version of ZooKeeper

At the time of writing, the current stable branch is 3.4 and the latest release of that branch is 3.4.9.

We can also use the "four letter word" command envi to find the version of a running server.

For example:

echo envi | nc localhost 2181

It shows all of the environment information for the ZooKeeper server, including the version.
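The same check can be scripted. The sketch below is a minimal, hedged example: four_letter_word sends the command over a plain TCP socket, as the nc command above does, and parse_envi turns key=value lines into a dictionary. Both function names are assumptions, and the sample output is illustrative rather than captured from a real server, so the exact fields may differ on your installation.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four letter word (e.g. 'envi') and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # server closes the connection when done
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_envi(text):
    """Parse 'key=value' lines from envi output into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:                    # skip lines without '=' such as the header
            env[key.strip()] = value.strip()
    return env


# Illustrative sample; with a running server use:
#   parse_envi(four_letter_word("localhost", 2181, "envi"))
sample = (
    "Environment:\n"
    "zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT\n"
    "host.name=localhost\n"
)
env = parse_envi(sample)
version = env["zookeeper.version"].split(",")[0]
print(version)
```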

Note: The ZooKeeper start script and the functionality of ZooKeeper are tested only with this version of ZooKeeper.

Hardware of ZooKeeper Server

Here are some guidelines for choosing proper hardware for a cluster of ZooKeeper servers.

a. Memory

ZooKeeper is not a memory-intensive application when it handles only the data stored by Kafka. In a typical production use case, a minimum of 8 GB of RAM should be dedicated to ZooKeeper.

b. CPU

As a Kafka metadata store, ZooKeeper does not heavily consume CPU resources. However, ZooKeeper provides a latency-sensitive function. If it must compete for CPU with other processes, we should consider providing a dedicated CPU core to ensure that context switching is not an issue.

c. Disks

Disk performance is vital to maintaining a healthy ZooKeeper cluster. Because ZooKeeper must have low-latency disk writes to perform optimally, we recommend using solid state drives (SSDs).
