Building a Hadoop Cluster with the Powerful Automation Tool: Ansible

Hello everyone,

Here's my new blog where I am going to show you how to configure a whole Hadoop cluster, that is, configuring one node as the Namenode and another as the Datanode.
What's new...?

This whole cluster setup will not be done by me manually; instead, one of the most powerful automation tools on the market, Ansible, will do it for me.

Sounds cool....

Firstly, we have to create one virtual machine as the Ansible controller node and two more VMs, one for the Namenode/Master and the other for the Datanode/Slave. These two VMs will work as the target nodes for Ansible.


Now we will install Ansible on our controller node using pip, because Ansible is written in Python. Use this command...

#pip3 install ansible
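
To check that the installation worked, we can print the installed version:

#ansible --version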

As Ansible is agentless, we don't need to install any Ansible software on our target nodes.

Now, the controller node doesn't have any information about the targets, so how will it do the configuration on them? For that, we have to give both target nodes' IPs, user names and login passwords to the controller node in the /etc/ip.txt file so that the controller node can do the configuration. This file is known as the inventory, where we list the IP of every target node we want to configure, with one node's information on each line.
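
Since the screenshot isn't reproduced here, this is a minimal sketch of what /etc/ip.txt might look like, assuming the group names Master and Slave that the ad-hoc commands below use; the IPs and the password are placeholders you should replace with your own:

[Master]
192.168.43.10  ansible_user=root  ansible_ssh_pass=redhat

[Slave]
192.168.43.20  ansible_user=root  ansible_ssh_pass=redhat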


And now we have to tell Ansible about this file by defining its name in the Ansible configuration file. By default, no configuration file is provided when Ansible is installed via pip, so we have to create the folder and the file ourselves using the following commands:

#mkdir /etc/ansible

#vim /etc/ansible/ansible.cfg

Now write the following in this file...
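
A minimal sketch of what ansible.cfg might contain, pointing Ansible to the inventory file created above (host_key_checking = False is an optional extra that skips the SSH host-key prompt on the first connection):

[defaults]
inventory = /etc/ip.txt
host_key_checking = False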


It is always good practice to check that all the managed nodes are reachable by pinging them.
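
The check can be done with Ansible's ping module; a SUCCESS reply from each node means the controller can reach and log in to it:

#ansible all -m ping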


Now we will configure the Namenode and the Datanode using the Ansible automation tool.

Before using the Ansible tool, it is always good practice to make a hard-copy or soft-copy note of all the steps we need for this configuration.

Ansible can be used in two ways: firstly, via the CLI method, basically known as ad-hoc commands, and secondly, by creating a playbook.

First we will run the ad-hoc commands step by step.

Note: If you find any of the ad-hoc commands difficult, then go for the playbook.

Step 1: Copying the JDK and Hadoop software to all nodes

Both packages, which I have downloaded into my Hadoop folder, will be copied to the Master and Slave nodes.

#ansible all -m copy -a "src=/hadoop/hadoop-1.2.1-1.x86_64.rpm dest=/root"


#ansible all -m copy -a "src=/hadoop/jdk-8u171-linux-x64.rpm dest=/root"

Step 2: Clearing the caches on all nodes

In some cases, after copying these two packages, if the base RedHat OS has too little free RAM we will not be able to do the installation. So in this step we are going to clear the caches.

#ansible all -m shell -a "echo 3 > /proc/sys/vm/drop_caches"

Step 3: Installing the JDK and Hadoop on all nodes

#ansible all -m shell -a "rpm -i jdk-8u171-linux-x64.rpm"


#ansible all -m shell -a "rpm -i hadoop-1.2.1-1.x86_64.rpm"

Step 4: Configuring the Master node

Now we are going to configure the Namenode, and these are the steps:

Creating a Namenode Directory

# ansible Master -m file -a "name='nn' state=directory"

Copying hdfs-site.xml

#ansible Master -m copy -a "src=/hadoop/nn_hdfs-site.xml dest=/etc/hadoop/hdfs-site.xml"


Copying core-site.xml

#ansible Master -m copy -a "src=/hadoop/nn_core-site.xml dest=/etc/hadoop/core-site.xml"


Format the Master node

#ansible Master -m shell -a "echo Y | hadoop namenode -format"

Start the services

#ansible Master -m shell -a "hadoop-daemon.sh start namenode"

Step 5: Configuring the Slave node

Now we are going to configure the Datanode, and these are the steps:

Creating a Datanode Directory

# ansible Slave -m file -a "name='dn' state=directory"

Copying hdfs-site.xml

#ansible Slave -m copy -a "src=/hadoop/dn_hdfs-site.xml dest=/etc/hadoop/hdfs-site.xml"

Copying core-site.xml

#ansible Slave -m copy -a "src=/hadoop/dn_core-site.xml dest=/etc/hadoop/core-site.xml"

Starting the Datanode services

#ansible Slave -m shell -a "hadoop-daemon.sh start datanode"

Here is the playbook for the same steps. (You can also download it from my GitHub: https://github.com/24-komal/Ansible_Hadoop_Cluster_Configuration)
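
The playbook screenshot isn't reproduced here, so below is a minimal sketch of what cluster.yml might look like, assembled from the ad-hoc steps above. The group names Master and Slave come from the inventory; the absolute /nn and /dn paths and the ignore_errors flags (so that re-runs don't stop at the rpm steps, as mentioned below) are my own assumptions:

- hosts: all
  tasks:
    - name: Copy the JDK and Hadoop packages
      copy:
        src: "/hadoop/{{ item }}"
        dest: /root
      loop:
        - jdk-8u171-linux-x64.rpm
        - hadoop-1.2.1-1.x86_64.rpm

    - name: Clear the caches
      shell: echo 3 > /proc/sys/vm/drop_caches

    - name: Install the JDK
      shell: rpm -i /root/jdk-8u171-linux-x64.rpm
      ignore_errors: yes

    - name: Install Hadoop
      shell: rpm -i /root/hadoop-1.2.1-1.x86_64.rpm
      ignore_errors: yes

- hosts: Master
  tasks:
    - name: Create the Namenode directory
      file:
        path: /nn
        state: directory

    - name: Copy the Namenode configuration files
      copy:
        src: "/hadoop/nn_{{ item }}"
        dest: "/etc/hadoop/{{ item }}"
      loop:
        - hdfs-site.xml
        - core-site.xml

    - name: Format the Namenode
      shell: echo Y | hadoop namenode -format

    - name: Start the Namenode service
      shell: hadoop-daemon.sh start namenode

- hosts: Slave
  tasks:
    - name: Create the Datanode directory
      file:
        path: /dn
        state: directory

    - name: Copy the Datanode configuration files
      copy:
        src: "/hadoop/dn_{{ item }}"
        dest: "/etc/hadoop/{{ item }}"
      loop:
        - hdfs-site.xml
        - core-site.xml

    - name: Start the Datanode service
      shell: hadoop-daemon.sh start datanode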

To run this Playbook use the following command...

#ansible-playbook cluster.yml


Here my packages are already installed and I am running this playbook again and again, which is why it shows this error. It can be ignored.


As we can see, our Namenode service has been started.


Now the Datanode service is also started.

After running the playbook on the controller node, we can check from the target nodes whether everything has launched successfully by using the "jps" command.
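
As a quick sketch of that check, run on each target node:

#jps

On the Master the list should include a NameNode process, and on the Slave a DataNode process. Optionally, running "hadoop dfsadmin -report" on the Master shows whether the Datanode has joined the cluster.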

Hope you find this article interesting.
Thank you!!