Automating Hadoop Using Ansible
Surayya Shaikh
1x RedHat Certified | ARTH LEARNER | RHCE | Kubernetes | DevOps | Docker | Linux | Python | AWS
Hello everyone, back with another article. In this one you will see how we can automate a Hadoop cluster using the Linux automation tool Red Hat Ansible, on top of AWS...
What is Hadoop?
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
What is a Hadoop Cluster?
A Hadoop cluster is a collection of computers, known as nodes, that are networked together to perform parallel computations on big data sets.
What is a NameNode?
The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. The NameNode is a single point of failure for the HDFS cluster.
What is a DataNode?
A DataNode stores data in the Hadoop Distributed File System (HDFS). A functional filesystem has more than one DataNode, with data replicated across them. It responds to requests from the NameNode for filesystem operations. Client applications can talk directly to a DataNode once the NameNode has provided the location of the data.
What is Ansible?
Ansible is a software tool that provides simple but powerful automation for cross-platform computer support. It is primarily intended for IT professionals, who use it for application deployment, updates on workstations and servers, cloud provisioning, configuration management, intra-service orchestration, and nearly anything a systems administrator does on a weekly or daily basis. Ansible doesn't depend on agent software and has no additional security infrastructure, so it's easy to deploy.
How Ansible works
In Ansible, there are two categories of computers: the control node and managed nodes. The control node is a computer that runs Ansible. There must be at least one control node, although a backup control node may also exist. A managed node is any device being managed by the control node.
Ansible works by connecting to nodes (clients, servers, or whatever you're configuring) on a network, and then sending a small program called an Ansible module to that node. Ansible executes these modules over SSH and removes them when finished. The only requirement for this interaction is that your Ansible control node has login access to the managed nodes. SSH Keys are the most common way to provide access, but other forms of authentication are also supported.
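For example, once the control node can SSH into the managed nodes, a quick ad-hoc run of the ping module confirms connectivity (the group names here are just illustrative placeholders):

    # Ad-hoc check: run the "ping" module on every host in the inventory
    ansible all -m ping

    # Or target a single inventory group (group name is illustrative)
    ansible namenode -m ping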
Ansible playbooks
While modules provide the means of accomplishing a task, the way you use them is through an Ansible playbook. A playbook is a configuration file written in YAML that provides instructions for what needs to be done in order to bring a managed node into the desired state. Playbooks are meant to be simple, human-readable, and self-documenting. They are also idempotent, meaning that a playbook can be run on a system at any time without having a negative effect upon it. If a playbook is run on a system that's already properly configured and in its desired state, then that system should still be properly configured after a playbook runs.
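As a tiny illustration of idempotency (a generic example, not part of this article's Hadoop setup), the following playbook ensures a package is present; running it a second time reports no changes because the system is already in the desired state:

    - name: "Idempotency demo"
      hosts: all
      tasks:
        - name: "Ensure the tree package is installed"
          yum:
            name: tree
            state: present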
So, let’s carry out this practical!
To carry out the above task, I have launched two EC2 virtual machines on top of the AWS cloud, and my controller node runs on Oracle VirtualBox...
Here is the Ansible configuration file (ansible.cfg):
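The original screenshot is not reproduced here, but a minimal ansible.cfg along these lines works for this setup; the inventory path, remote user and key file are assumptions based on the EC2 instances being used:

    [defaults]
    # Path to the inventory file (assumed location)
    inventory         = /root/ip.txt
    # EC2 instances are reached as ec2-user with the instance key pair
    remote_user       = ec2-user
    private_key_file  = /root/mykey.pem
    host_key_checking = False

    [privilege_escalation]
    # Become root on the managed nodes to install packages and edit /etc/hadoop
    become          = true
    become_method   = sudo
    become_user     = root
    become_ask_pass = false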
Here is the Ansible inventory file:
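Again, a sketch of what the inventory looks like; the IP addresses are placeholders for the public IPs of the two EC2 instances:

    [namenode]
    # Public IP of the EC2 instance acting as the namenode (placeholder)
    13.233.xx.xx

    [datanode]
    # Public IP of the EC2 instance acting as the datanode (placeholder)
    65.0.xx.xx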
Configuring hdfs-site
Instead of editing the existing files on the managed nodes through the playbook, a more efficient approach is to prepare the file on the controller with the required changes and copy it over. Create the hdfs-site.xml file on the controller and write it as follows so that it can be parsed with Jinja:
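A minimal version of that template looks like this (the exact layout in the original screenshot may differ; the point is that the property name and directory come from the playbook variables):

    <!-- /AnsibleWS/hdfs-site.xml (Jinja2 template) -->
    <configuration>
      <property>
        <!-- "node" is prompted in the playbook: "name" on the namenode, "data" on the datanode -->
        <name>dfs.{{ node }}.dir</name>
        <!-- "hdfs_dir" is the directory prompted in the playbook -->
        <value>{{ hdfs_dir }}</value>
      </property>
    </configuration>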
Here, node and hdfs_dir are the variables we created in the main playbook (through vars_prompt). They let us reuse the same template for both the master and slave nodes.
We use the template module (rather than copy) so that Ansible parses the file and substitutes the Jinja variables during task execution.
Configuring core-site
Similarly, we configure the core-site.xml file as follows...
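A corresponding core-site.xml template, again as a sketch; the port shown is the one commonly used in Hadoop 1.x setups like this and is an assumption here:

    <!-- /AnsibleWS/core-site.xml (Jinja2 template) -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <!-- "ip_addr" is the namenode IP prompted in the playbook; port 9001 is assumed -->
        <value>hdfs://{{ ip_addr }}:9001</value>
      </property>
    </configuration>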
Here is the complete playbook for configuring the target nodes as namenode and datanode:
hadoop.yml
- name: "Namenode configuration" hosts: namenode vars_prompt: - name: "hdfs_dir" prompt: "Enter Namenode Directory" private: no - name: "node" prompt: "Enter node" private: no - name: "ip_addr" prompt: "Enter the Ip Address" private: no tasks: - name: "Copying JDK" copy: src: "/root/jdk-8u171-linux-x64.rpm" dest: /home/ec2-user/ register: jdk - name: "Copying Hadoop" copy: src: "/root/hadoop-1.2.1-1.x86_64.rpm" dest: /home/ec2-user/ register: hadoop - name: "Installing JDK" yum: name: "/home/ec2-user/jdk-8u171-linux-x64.rpm" state: present when: jdk.failed==false register: ijdk - name: "Installing Hadoop" command: "rpm -i /home/ec2-user/hadoop-1.2.1-1.x86_64.rpm --force" when: hadoop.failed=false register: ihadoop when: ijdk.failed==false - name: "Deleting Directory" shell: "rm -rf {{ hdfs_dir }}" ignore_errors: yes - name: "Creating directory" file: state: directory path: "{{ hdfs_dir }}" - name: "Configuring hdfs-site" template: src: "/AnsibleWS/hdfs-site.xml" dest: "/etc/hadoop/hdfs-site.xml" when: ihadoop.failed==false - name: "Configuring core-site" template: src: "/AnsibleWS/core-site.xml" dest: "/etc/hadoop/core-site.xml" when: ihadoop.failed==false - name: "Formatting the Namenode" shell: "echo Y | hadoop namenode -format" register: format - debug: var: format.stdout - name: "stopping the namenode" command: hadoop-daemon.sh stop namenode ignore_errors: yes - name: "starting the namenode server" command: hadoop-daemon.sh start namenode when: format.failed==false register: startnn - debug: var: startnn.stdout - name: "checking status" command: jps register: jps when: format.failed==false and startnn.failed==false - debug: var: jps ############################################################ - name: "Datanode configuration" hosts: datanode vars_prompt: - name: "hdfs_dir" prompt: "Enter Datanode Directory" private: no - name: "node" prompt: "Enter node" private: no - name: "ip_addr" prompt: "Enter the namenode Ip Address" private: no tasks: - name: "Copying JDK" copy: src: "/root/jdk-8u171-linux-x64.rpm" dest: /home/ec2-user/ register: jdk - name: "Copying Hadoop" copy: src: "/root/hadoop-1.2.1-1.x86_64.rpm" dest: /home/ec2-user/ register: hadoop - name: "Installing JDK" yum: name: "/home/ec2-user/jdk-8u171-linux-x64.rpm" state: present when: jdk.failed==false register: ijdk - name: "Deleting Directory" shell: "rm -rf {{ hdfs_dir }}" ignore_errors: yes - name: "Creating directory" file: state: directory path: "{{ hdfs_dir }}" - name: "Installing Hadoop" command: "rpm -i /home/ec2-user/hadoop-1.2.1-1.x86_64.rpm --force" when: hadoop.failed=false register: ihadoop when: ijdk.failed==false - name: "Configuring hdfs-site" template: src: hdfs-site.xml dest: /etc/hadoop/hdfs-site.xml when: ihadoop.failed==false register: hdfs - name: "Configuring core-site" template: src: core-site.xml dest: /etc/hadoop/core-site.xml when: ihadoop.failed==false register: core - name: "stopping the datanode server" command: hadoop-daemon.sh stop datanode ignore_errors: yes - name: "starting the datanode server" command: hadoop-daemon.sh start datanode when: ihadoop.failed==false register: startdn - debug: var: startdn.stdout - name: "checking status" command: jps register: jps when: ihadoop.failed==false and startdn.failed==false - debug: var: jps.stdout - name: Pause for 15 seconds to build cache pause: seconds: 15 - name: "Checking Report" shell: "hadoop dfsadmin -report" register: report - debug: var: report.stdout
We now have our playbook ready. Run it to automate the configuration:
ansible-playbook hadoop.yml
Now, from the namenode, we can check the cluster report for confirmation:
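The screenshot of the report is not reproduced here, but the command behind it is the same one the playbook runs at the end:

    # Run on the namenode (or any node with the Hadoop client configured)
    hadoop dfsadmin -report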
We can also check it from the Hadoop Web UI (for Hadoop 1.x this is served by the namenode, on port 50070 by default)...
Thank you... Keep learning, keep sharing!!!
Have a good day!!