Automating Hadoop Using Ansible
Amit Sharma
CKA || 1xAWS || 4xGCP || 1xAzure || 2xRedHat Certified || DevOps Engineer @Searce Inc || Freelancer || Terraform || Ansible || GitLab || Jenkins || Kubernetes || Docker || Openshift || AWS || GCP || Azure
Hello guys, I'm back with another article. In this one you will see how we can automate Hadoop using the Linux automation tool Ansible.
To install Ansible, refer to the installation section below.
What is Hadoop?
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. It was originally built as infrastructure for the Apache Nutch web search engine project and is now an Apache Hadoop subproject.
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
What is a Hadoop Cluster?
A Hadoop cluster is a collection of computers, known as nodes, that are networked together to perform parallel computations on big data sets.
What is a NameNode?
The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept, but it does not store the data of those files itself. The NameNode is a single point of failure for the HDFS cluster.
What is a DataNode?
A DataNode stores data in the Hadoop file system (HDFS). A functional file system has more than one DataNode, with data replicated across them. A DataNode responds to requests from the NameNode for file system operations, and client applications can talk directly to a DataNode once the NameNode has provided the location of the data.
What is a Client Node?
Client nodes are in charge of loading the data into the cluster. Client nodes first submit MapReduce jobs describing how data needs to be processed, and then fetch the results once the processing is finished.
What is Ansible?
Ansible is a software tool that provides simple but powerful automation for cross-platform computer support. It is primarily intended for IT professionals, who use it for application deployment, updates on workstations and servers, cloud provisioning, configuration management, intra-service orchestration, and nearly anything a systems administrator does on a weekly or daily basis. Ansible doesn't depend on agent software and has no additional security infrastructure, so it's easy to deploy.
How Ansible works
In Ansible, there are two categories of computers: the control node and managed nodes. The control node is a computer that runs Ansible. There must be at least one control node, although a backup control node may also exist. A managed node is any device being managed by the control node.
Ansible works by connecting to nodes (clients, servers, or whatever you're configuring) on a network, and then sending a small program called an Ansible module to that node. Ansible executes these modules over SSH and removes them when finished. The only requirement for this interaction is that your Ansible control node has login access to the managed nodes. SSH Keys are the most common way to provide access, but other forms of authentication are also supported.
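For reference, a minimal inventory is sketched below. It assumes three RHEL-style managed nodes reachable as root over SSH; the IP addresses are placeholders, but the group names (namenode, datanode, client) are the ones targeted by the playbooks later in this article.

[namenode]
192.168.1.10

[datanode]
192.168.1.11
192.168.1.12

[client]
192.168.1.13

[all:vars]
ansible_user=root

Save this as /etc/ansible/hosts (the default inventory location) or point Ansible at it with the -i flag.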
Ansible playbooks
While modules provide the means of accomplishing a task, the way you use them is through an Ansible playbook. A playbook is a configuration file written in YAML that provides instructions for what needs to be done in order to bring a managed node into the desired state. Playbooks are meant to be simple, human-readable, and self-documenting. They are also idempotent, meaning that a playbook can be run on a system at any time without having a negative effect upon it. If a playbook is run on a system that's already properly configured and in its desired state, then that system should still be properly configured after a playbook runs.
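As a quick illustration of idempotence, here is a minimal sketch (reusing the /nn directory from the NameNode setup later in this article): the file module only creates the directory if it is missing, so running the play a second time reports no change.

- hosts: namenode
  tasks:
    - name: "Ensure the NameNode directory exists"
      file:
        path: /nn
        state: directory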
Modules in Ansible
Modules (also referred to as “task plugins” or “library plugins”) are discrete units of code that can be used from the command line or in a playbook task. Ansible executes each module, usually on the remote managed node, and collects return values.
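For example, modules can also be invoked ad hoc from the control node without writing a playbook. The commands below (group names assumed from the inventory sketched above) use the built-in ping and command modules:

ansible namenode -m ping
ansible all -m command -a "uptime"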
In this article you will find the complete end-to-end automation of a Hadoop HDFS cluster using Ansible.
You need to write a playbook for each type of node with its respective configuration. We first configure the NameNode and start it; the NameNode playbook is given below.
Variables in Ansible:
Ansible uses variables to manage differences between systems. With Ansible, you can execute tasks and playbooks on multiple different systems with a single command. You can define these variables in your playbooks, in your inventory, in re-usable files or roles, or at the command line.
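For instance, any variable defined in a vars file can be overridden at the command line with -e. The value /nn2 below is only a hypothetical alternative to the directory_path defined in the NameNode variables later on:

ansible-playbook namenode.yml -e "directory_path=/nn2"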
Ansible Installation:
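As a rough sketch, on a RHEL/CentOS control node Ansible can be installed either from the EPEL repository or with pip (package names and repository availability depend on your distribution version):

yum install epel-release -y
yum install ansible -y

# or, with Python 3 available:
pip3 install ansible

# verify the installation
ansible --version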
NameNode Variables:
# namenode_var.yml
hadoop_path: "/root/hadoop-1.2.1-1.x86_64.rpm"
jdk_path: "/root/jdk-8u171-linux-x64.rpm"
hadoop_software: "hadoop-1.2.1-1.x86_64.rpm"
jdk_software: "jdk-8u171-linux-x64.rpm"
core_site: "/root/namenode_files/core-site.xml"
hdfs_site: "/root/namenode_files/hdfs-site.xml"
directory_path: "/nn"
start_namenode: "hadoop-daemon.sh start namenode"
run_jps: "jps"
directory_delete: "rm -rf /nn"
stop_namenode: "hadoop-daemon.sh stop namenode"
hadoop_report: "hadoop dfsadmin -report"
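The playbook below copies core-site.xml and hdfs-site.xml from the namenode_files directory into /etc/hadoop/. Their exact contents are not shown in this article, but for a typical Hadoop 1.x NameNode they would look roughly like the sketch below; the address and port are placeholders, and dfs.name.dir matches the /nn directory created by the playbook.

<!-- namenode_files/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

<!-- namenode_files/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>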
Now the NameNode playbook is given below, using the variables above.
NameNode Playbook:
- hosts: namenode
  vars_files:
    - namenode_var.yml
  tasks:
    - name: "Copying the Hadoop file"
      copy:
        src: "{{ hadoop_path }}"
        dest: "/root/"
    - name: "Copying the JDK file"
      copy:
        src: "{{ jdk_path }}"
        dest: "/root/"
    - name: "Installing JDK"
      shell: "rpm -ivh {{ jdk_software }}"
      register: Java
      ignore_errors: yes
    - name: "Java installation output"
      debug:
        var: Java.stdout
    - name: "Installing Hadoop"
      shell: "rpm -ivh {{ hadoop_software }} --force"
      register: Hadoop
      ignore_errors: yes
    - name: "Hadoop installation output"
      debug:
        var: Hadoop.stdout
    - name: "Copying the core-site.xml file"
      copy:
        src: "{{ core_site }}"
        dest: "/etc/hadoop/"
    - name: "Copying the hdfs-site.xml file"
      copy:
        src: "{{ hdfs_site }}"
        dest: "/etc/hadoop/"
    - name: "Deleting the old NameNode directory"
      shell: "{{ directory_delete }}"
      ignore_errors: yes
    - name: "Creating the NameNode directory"
      file:
        state: directory
        path: "{{ directory_path }}"
    - name: "Formatting the NameNode directory"
      shell: "echo Y | hadoop namenode -format"
      register: format
    - name: "NameNode format output"
      debug:
        var: format.stdout
    - name: "Stopping the NameNode"
      shell: "{{ stop_namenode }}"
      ignore_errors: yes
      register: hadoop_stopped
    - name: "NameNode stop output"
      debug:
        var: hadoop_stopped.stdout
    - name: "Starting the NameNode"
      shell: "{{ start_namenode }}"
      ignore_errors: yes
      register: hadoop_started
    - name: "NameNode start output"
      debug:
        var: hadoop_started.stdout
    - name: "Java processes"
      shell: "{{ run_jps }}"
      register: jps
    - name: "Java processes output"
      debug:
        var: jps.stdout
The command to run the NameNode playbook is given below.
ansible-playbook namenode.yml
Running the NameNode Playbook
DataNode Variables:
# datanode_var.yml
hadoop_path: "/root/hadoop-1.2.1-1.x86_64.rpm"
jdk_path: "/root/jdk-8u171-linux-x64.rpm"
hadoop_software: "hadoop-1.2.1-1.x86_64.rpm"
jdk_software: "jdk-8u171-linux-x64.rpm"
core_site: "/root/datanode_files/core-site.xml"
hdfs_site: "/root/datanode_files/hdfs-site.xml"
directory_path: "/dn1"
start_datanode: "hadoop-daemon.sh start datanode"
run_jps: "jps"
directory_delete: "rm -rf /dn1"
stop_datanode: "hadoop-daemon.sh stop datanode"
hadoop_report: "hadoop dfsadmin -report"
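As with the NameNode, the files under datanode_files are not shown in the article. For a Hadoop 1.x DataNode they would typically point core-site.xml at the NameNode and set dfs.data.dir to the /dn1 directory created by the playbook; the NameNode IP below is a placeholder matching the inventory sketch earlier.

<!-- datanode_files/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>

<!-- datanode_files/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>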
Now the DataNode playbook is given below, using the variables above.
DataNode Playbook:
- hosts: datanode
  vars_files:
    - datanode_var.yml
  tasks:
    - name: "Copying the Hadoop file"
      copy:
        src: "{{ hadoop_path }}"
        dest: "/root/"
    - name: "Copying the JDK file"
      copy:
        src: "{{ jdk_path }}"
        dest: "/root/"
    - name: "Installing JDK"
      shell: "rpm -ivh {{ jdk_software }}"
      register: Java
      ignore_errors: yes
    - name: "Java installation output"
      debug:
        var: Java.stdout
    - name: "Installing Hadoop"
      shell: "rpm -ivh {{ hadoop_software }} --force"
      register: Hadoop
      ignore_errors: yes
    - name: "Hadoop installation output"
      debug:
        var: Hadoop.stdout
    - name: "Copying the core-site.xml file"
      copy:
        src: "{{ core_site }}"
        dest: "/etc/hadoop/"
    - name: "Copying the hdfs-site.xml file"
      copy:
        src: "{{ hdfs_site }}"
        dest: "/etc/hadoop/"
    - name: "Deleting the old data directory"
      shell: "{{ directory_delete }}"
      ignore_errors: yes
    - name: "Creating the data directory"
      file:
        state: directory
        path: "{{ directory_path }}"
    - name: "Formatting the directory"
      shell: "echo Y | hadoop namenode -format"
      ignore_errors: yes
      register: format
    - name: "Format output"
      debug:
        var: format.stdout
    - name: "Stopping the DataNode"
      shell: "{{ stop_datanode }}"
      ignore_errors: yes
      register: hadoop_stopped
    - name: "DataNode stop output"
      debug:
        var: hadoop_stopped.stdout
    - name: "Starting the DataNode"
      shell: "{{ start_datanode }}"
      ignore_errors: yes
      register: hadoop_started
    - name: "DataNode start output"
      debug:
        var: hadoop_started.stdout
    - name: "Java processes"
      shell: "{{ run_jps }}"
      register: jps
    - name: "Java processes output"
      debug:
        var: jps.stdout
    - name: "Running the Hadoop report"
      shell: "{{ hadoop_report }}"
      register: report
    - name: "Showing the Hadoop report"
      debug:
        var: report.stdout
The command to run the DataNode playbook is given below.
ansible-playbook datanode.yml
Running the DataNode Playbook
ClientNode Variables:
# client_var.yml
hadoop_path: "/root/hadoop-1.2.1-1.x86_64.rpm"
jdk_path: "/root/jdk-8u171-linux-x64.rpm"
hadoop_software: "hadoop-1.2.1-1.x86_64.rpm"
jdk_software: "jdk-8u171-linux-x64.rpm"
core_site: "/root/client_files/core-site.xml"
hadoop_report: "hadoop dfsadmin -report"
client_report: "hadoop fs -ls /"
file_name: "file.txt"
put_file: "hadoop fs -put /root/{{ file_name }} /"
client_file_src: "/root/client_files/{{ file_name }}"
remove_file: "hadoop fs -rm /{{ file_name }}"
client_file_dest: "/root/"
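The client only needs core-site.xml so that Hadoop commands know which NameNode to talk to. A typical client_files/core-site.xml would look like the sketch below; again, the NameNode IP is a placeholder.

<!-- client_files/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>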
Now the client node playbook is given below, using the variables above.
ClientNode Playbook:
- hosts: client
  vars_files:
    - client_var.yml
  tasks:
    - name: "Copying the Hadoop file"
      copy:
        src: "{{ hadoop_path }}"
        dest: "/root/"
    - name: "Copying the JDK file"
      copy:
        src: "{{ jdk_path }}"
        dest: "/root/"
    - name: "Installing JDK"
      shell: "rpm -ivh {{ jdk_software }}"
      register: Java
      ignore_errors: yes
    - name: "Java installation output"
      debug:
        var: Java.stdout
    - name: "Installing Hadoop"
      shell: "rpm -ivh {{ hadoop_software }} --force"
      register: Hadoop
      ignore_errors: yes
    - name: "Hadoop installation output"
      debug:
        var: Hadoop.stdout
    - name: "Copying the core-site.xml file"
      copy:
        src: "{{ core_site }}"
        dest: "/etc/hadoop/"
    - name: "Listing the files available in HDFS"
      shell: "{{ client_report }}"
      register: files
    - name: "Showing the files"
      debug:
        var: files.stdout
    - name: "Deleting the previously uploaded file"
      shell: "{{ remove_file }}"
      ignore_errors: yes
    - name: "Copying the file to the client node"
      copy:
        src: "{{ client_file_src }}"
        dest: "{{ client_file_dest }}"
    - name: "Uploading the file from the client"
      shell: "{{ put_file }}"
    - name: "Listing the files available in HDFS again"
      shell: "{{ client_report }}"
      register: files
    - name: "Showing the files"
      debug:
        var: files.stdout
    - name: "Running the Hadoop report"
      shell: "{{ hadoop_report }}"
      register: report
    - name: "Showing the Hadoop report"
      debug:
        var: report.stdout
The command to run the client node playbook is given below.
ansible-playbook clientnode.yml
Running the ClientNode Playbook
Now that the client node playbook has run successfully, we can open the NameNode dashboard to verify that the file the client uploaded actually reached the cluster.
As you can see, the file has been stored in the HDFS cluster.
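If you prefer the command line over the dashboard, the same checks can be run from the client node (the Hadoop 1.x NameNode web UI is normally served on port 50070, e.g. http://<namenode-ip>:50070):

hadoop fs -ls /            # file.txt should appear in the listing
hadoop dfsadmin -report    # shows the DataNodes and the capacity they contribute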
Ansible Playbook Code link: