Configuring Hadoop (NN/DN) via Ansible


Before getting hands-on with any practical implementation, it is always good to know the terminology first.

What is Apache Hadoop?

* Hadoop is a framework from the Apache community for solving big data problems by pooling the RAM/CPU and storage of many computers that act as DataNodes and work under a master computer, the NameNode.

* DataNodes provide the NameNode with a configured amount of storage and contribute it by sharing it over the network.

* Big data challenges such as Volume, Velocity and Veracity can be addressed by making use of a big data handling tool like Hadoop.

What is Ansible?

* Ansible is a DevOps tool made by Red Hat with which we can push almost every piece of configuration we will ever need onto a networked device. It is important to note that Ansible is primarily meant for configuration management; although it can perform other tasks, such as provisioning an OS, that capability exists to automate configuration rather than to launch the OS itself.

* Ansible is a great management tool that works on a push mechanism, which means we don't need to set up any agent on the managed nodes for Ansible to work.

* Ansible uses a declarative approach: we only have to tell Ansible what to do, and how to do it is taken care of by the smart modules Ansible ships with.

* Ansible also provides idempotency.

Idempotency means Ansible doesn't blindly rerun a task every time we trigger the playbook. It first goes to the managed node, checks the current state of the system, and only then decides whether something needs to change or whether the desired state is already achieved.
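For example, a task like the following (a minimal sketch of the pattern used later in this playbook) creates /dvd only if it is missing; rerunning it reports no change because the desired state is already met:

- name: Ensuring a directory exists
  file:
          path: /dvd
          state: directory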

Why do we need Ansible here?

The use of Ansible in configuring a Hadoop cluster is to achieve automation, which matters especially when Hadoop is used in bigger environments.

Thus a lot of manual work, which often also leads to errors, can be avoided by using automation scripts.

Therefore, assuming the reader has at least a slight knowledge of both Ansible and Hadoop, let's see the Ansible playbook.

Assumptions:

* It is assumed that you want to set up your Hadoop cluster with the basic properties dfs.name.dir on the namenode and dfs.data.dir on the datanodes.

* The variables nndir, nnport, dndir and nnip can be changed as needed.

* The folder named files contains the templates used to copy the basic layout of the hdfs-site.xml and core-site.xml files (a sketch of this layout follows the list below).

* The playbook is intended for Red Hat Enterprise Linux 8; it may be used in different environments after some modification.
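For reference, the layout assumed on the controller node looks roughly like this; the skeleton contents of the two copied files are an assumption (just the standard XML header lines), since the playbook appends its own <configuration> block to them:

playbook.yml
files/
    hdfs-site.xml
    core-site.xml

files/hdfs-site.xml and files/core-site.xml (assumed skeleton):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>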

- hosts: namenode
  vars:
          nndir: "/nn"
          nnport: 9001
  tasks: 
          - name: Making folder for Redhat DVD
            file:
                    path: /dvd
                    state: directory
                    #mode: 0755
          - name: Mounting Redhat DVD
            mount:
                    src: /dev/cdrom
                    path: /dvd
                    fstype: iso9660
                    state: mounted
          - name: Making Repository for Redhat Disk AppStream
            yum_repository:
                    name: App1
                    description: "Redhat DVD App List 1"
                    baseurl: "file:///dvd/AppStream"
                    file: redhatdvd
                    gpgcheck: no
          - name: Making Repository for Redhat Disk BaseOS
            yum_repository:
                    name: App2
                    description: "Redhat DVD App List 2"
                    baseurl: "file:///dvd/BaseOS"           
                    file: redhatdvd
                    gpgcheck: no
          - name: Installing wget if not Available
            package:
                    name: wget
                    state: present
          - name: Installing Java JDK 
            package:
                    name: jdk
                    state: present
          - name: Downloading Hadoop Software
            command: "wget -c https://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm"
          - name: Installing Hadoop Software
            command: "rpm -i --force hadoop-1.2.1-1.x86_64.rpm "
          - name: Deleting preexisting folder for NameNode if present
            file:
                    path: "{{ nndir }}"
                    state: absent


          - name: Making folder for NameNode
            file:
                    path: "{{ nndir }}"
                    state: directory


          - name: Copying hdfs-site.xml file from Controller Node
            copy:
                    src: files/hdfs-site.xml
                    dest: /etc/hadoop/
          - name: Copying core-site.xml file from Controller Node
            copy:
                    src: files/core-site.xml
                    dest: /etc/hadoop/ 
          - name: Adding dfs.name.dir property
            shell: |
                    echo '<configuration>
                    <property>
                    <name>dfs.name.dir</name>
                    <value>{{ nndir }}</value>
                    </property>
                    </configuration>' >> /etc/hadoop/hdfs-site.xml
          - name: Adding fs.default.name property
            shell: |
                    echo '<configuration>
                    <property>
                    <name>fs.default.name</name>
                    <value>hdfs://{{ ansible_facts['default_ipv4']['address'] }}:{{ nnport }}</value>
                    </property>
                    </configuration>' >> /etc/hadoop/core-site.xml 
          - name: Checking for a running Java process
            command: "pidof /usr/java/default/bin/java"
            register: x
            failed_when: false
            ignore_errors: yes


          - name: Killing the existing NameNode/Java process if running
            shell: "kill `pidof /usr/java/default/bin/java`"
            when: x.rc == 0 
      
          - name: Formatting the namenode directory
            shell: "echo 'Y' | hadoop namenode -format"


          - name: Starting Namenode
            command: "hadoop-daemon.sh start namenode"

One important step in the above tasks is checking for an already running Java process, which may be left over from previously installed Hadoop software or from an earlier run. This check also keeps our setup idempotent: the existing daemon is killed only when such a process is actually found, so the playbook can be rerun safely.
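Once this play finishes, a quick manual check on the namenode (an optional suggestion, assuming the JDK's jps utility is on the PATH) is:

jps

which should list a NameNode process among the running JVMs.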

- hosts: datanodes
  vars:
          dndir: "/dn"
          nnip: "{{ groups.namenode[0] }}"
          nnport: 9001
  tasks:
          - name: Making folder for Redhat DVD
            file:
                    path: /dvd
                    state: directory
          - name: Mounting Redhat DVD
            mount:
                    src: /dev/cdrom
                    path: /dvd
                    fstype: iso9660
                    state: mounted
          - name: Making Repository for Redhat Disk AppStream           
            yum_repository:
                    name: App1       
                    description: "Redhat DVD App List 1"
                    baseurl: "file:///dvd/AppStream"
                    file: redhatdvd
                    gpgcheck: no
          - name: Making Repository for Redhat Disk BaseOS
            yum_repository:
                    name: App2
                    description: "Redhat DVD App List 2"
                    baseurl: "file:///dvd/BaseOS"
                    file: redhatdvd
                    gpgcheck: no
          - name: Installing wget if not Available
            package:
                    name: wget
                    state: present
          - name: Installing Java JDK
            package:
                    name: jdk
                    state: present
          - name: Downloading Hadoop Software
            command: "wget -c https://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm"
          - name: Installing Hadoop Software
            command: "rpm -i --force hadoop-1.2.1-1.x86_64.rpm "
          - name: Deleting preexisting folder for DataNode if present
            file:
                    path: "{{ dndir }}"
                    state: absent
          - name: Making folder for DataNode
            file:
                    path: "{{ dndir }}"
                    state: directory
          - name: Copying hdfs-site.xml file from Controller Node
            copy:
                    src: files/hdfs-site.xml
                    dest: /etc/hadoop/
          - name: Copying core-site.xml file from Controller Node
            copy:
                    src: files/core-site.xml
                    dest: /etc/hadoop/ 
          - name: Adding dfs.data.dir property
            shell: |
                    echo '<configuration>
                    <property>
                    <name>dfs.data.dir</name>
                    <value>{{ dndir }}</value>
                    </property>
                    </configuration>' >> /etc/hadoop/hdfs-site.xml
          - name: Adding fs.default.name property
            shell: |
                    echo '<configuration>
                    <property>
                    <name>fs.default.name</name>
                    <value>hdfs://{{ nnip }}:{{ nnport }}</value>
                    </property>
                    </configuration>' >> /etc/hadoop/core-site.xml 
          - name: Checking if a Java process is already running
            shell: "pidof /usr/java/default/bin/java"
            register: x
            failed_when: false
            ignore_errors: yes


          - name: Killing previously running Java process
            shell: "kill `pidof /usr/java/default/bin/java`"
            when: x.rc == 0


          - name: Starting DataNode
            command: "hadoop-daemon.sh start datanode"

Time to apply the playbook

To run the playbook, we first have to create an inventory file and put the IPs of the namenode and datanodes into their respective groups (a sketch is shown below).
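A minimal inventory sketch; the IP addresses and credentials below are placeholders and have to be replaced with your own:

[namenode]
192.168.1.10

[datanodes]
192.168.1.11
192.168.1.12

[all:vars]
ansible_user=root
ansible_ssh_pass=yourpassword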


*Note: It is not recommended to use the root user; a sudo-capable user with privilege escalation is preferred (see the sketch below).
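If you connect as a non-root user instead, the usual approach (a sketch, assuming sudo is configured for that user on the managed nodes) is to enable privilege escalation at the play level:

- hosts: namenode
  become: yes
  # ...same vars and tasks as in the playbook above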


It is always a good practice to check connectivity to the managed nodes before running Ansible commands, and then finally run our playbook (example commands below).
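A minimal sketch of the commands, assuming the playbook is saved as hadoop.yml and the inventory as inventory.txt (both file names are placeholders):

ansible all -i inventory.txt -m ping
ansible-playbook -i inventory.txt hadoop.yml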


While the playbook runs, Ansible prints the status (ok/changed/failed) of every task for each host in the namenode and datanodes groups.


Finally, we can use the Hadoop NameNode web UI (on Hadoop 1.x it is typically served on port 50070 of the namenode) to see the connected nodes as well as their last contact time.
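The same information can also be pulled from the namenode's command line (assuming the hadoop binary is on the PATH, as it is after the RPM install above):

hadoop dfsadmin -report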


So that's the start after so many days of getting back to work; let us continue and keep on learning day by day.
