Configuring a Multinode Hadoop Cluster on AWS using Ansible

Hello everyone!

Here is my new article. In it we are going to see how we can build a multinode Hadoop cluster on AWS with the help of Ansible.

PROBLEM: Launching multiple instances and performing every setup step manually is difficult and time consuming, and it becomes even more so when the cluster needs many datanodes, e.g. 100.

So to overcome this issue I used Ansible and wrote a playbook that configures the whole cluster in a single go. For this playbook I created three ROLES: one for the Namenode, one for the Datanodes, and one for the Client. Each role performs all the steps needed for its part of the multinode Hadoop cluster.

What are ROLES?

 In Ansible, a role is the primary mechanism for breaking a playbook into multiple files. This simplifies writing complex playbooks and makes them easier to reuse.

Now, here is the playbook for the whole setup:

- hosts: "localhost"
  vars_files:
          - "secret.yml"
  vars_prompt:
          - name: "num_item"
            prompt: "enter the number of Datanodes"
            private: no 
  tasks:
          - name: "provisioning the nodes"
            include_tasks: "/root/arth_ws/ansible/test2.yml"
            loop:
                    - { name: 'Datanode', num: "{{ num_item }}" }
                    - { name: 'Namenode', num: 1 }
                    - { name: 'Client', num: 1 }
          - name: "waiting for the instances to ready for the use "
            pause:
                  seconds: 60
          - name: "refreshing the inventory"
            meta: "refresh_inventory"
- hosts: "tag_Name_Namenode"
  vars_prompt:
          - name: "dr"
            prompt: "enter the namenode's directory name"
            private: no


  roles:
          - role: "hadoop"
- hosts: "tag_Name_Datanode"
  roles:
          - role: "hadoopslave"
- hosts: "tag_Name_Client"
  roles:
          - role: "hadoopclient"

As you can see in the playbook above, I attached a secret file. This file contains my AWS credentials in encrypted form, for which I used Ansible's Vault feature. The playbook will prompt us three times: for the vault password, the number of datanodes, and the name of the Namenode's directory.
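For reference, the encrypted secret.yml can be created with `ansible-vault create secret.yml`. Shown decrypted, it only needs the two credential variables that test2.yml reads; the names `u` and `p` match the playbook, and the values below are placeholders, not real credentials:

```yaml
# secret.yml (shown decrypted) -- create or edit it with:
#   ansible-vault create secret.yml
# u and p are the variable names test2.yml consumes;
# the values here are placeholders only.
u: "YOUR_AWS_ACCESS_KEY_ID"
p: "YOUR_AWS_SECRET_ACCESS_KEY"
```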

The playbook contains three roles:

  1. hadoop = used for Namenode
  2. hadoopslave = used for Datanodes
  3. hadoopclient = used for Client

The playbook also includes a task file, test2.yml, which launches the instances on AWS. Here I used Jinja templating to make the playbook more dynamic: when the task runs, it launches one Namenode, one Client, and as many Datanodes as we provide at the prompt.

Here is the test2.yml file:

- name: "provisioning the instances "
  ec2:
          zone: "ap-south-1a"
          assign_public_ip: "yes"
          ec2_access_key: "{{ u }}"
          ec2_secret_key: "{{ p }}"
          count: "{{ item.num }}"
          image: "ami-04b1ddd35fd71475a"
          instance_tags:
                  Name: "{{ item.name }}" 
          instance_type: "t2.micro"
          key_name: "hadoop_slave_key"
          region: "ap-south-1"
          state: "present"
          wait: yes
          vpc_subnet_id: "subnet-678b820f" 
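One caveat with the task above: with the plain `count` option, every re-run of the play launches a fresh batch of instances. As far as I know, the same `ec2` module also supports `exact_count` together with `count_tag`, which makes re-runs idempotent; a sketch of that variant (same parameters as above, untested here):

```yaml
# Sketch only: exact_count + count_tag make the task idempotent --
# Ansible counts running instances that carry the tag and launches
# (or terminates) only enough to reach the requested number.
- name: "provisioning the instances (idempotent variant)"
  ec2:
          zone: "ap-south-1a"
          assign_public_ip: "yes"
          ec2_access_key: "{{ u }}"
          ec2_secret_key: "{{ p }}"
          exact_count: "{{ item.num }}"
          count_tag:
                  Name: "{{ item.name }}"
          image: "ami-04b1ddd35fd71475a"
          instance_tags:
                  Name: "{{ item.name }}"
          instance_type: "t2.micro"
          key_name: "hadoop_slave_key"
          region: "ap-south-1"
          state: "present"
          wait: yes
          vpc_subnet_id: "subnet-678b820f"
```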

Every instance gets tagged with a Name tag, and Ansible uses that tag (as the group tag_Name_&lt;value&gt;) to run the later plays. For this we need a dynamic inventory, which lets Ansible discover the hosts at run time.
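My exact inventory setup isn't shown above; one common way to get these tag_Name_* groups (assuming the amazon.aws collection and boto3 are installed -- the legacy ec2.py script produces equivalent groups) is an aws_ec2 plugin inventory file along these lines:

```yaml
# aws_ec2.yml -- a sketch of a dynamic inventory config, not my exact file.
plugin: amazon.aws.aws_ec2
regions:
        - ap-south-1
keyed_groups:
        # builds groups such as tag_Name_Namenode from each instance's Name tag
        - key: tags.Name
          prefix: tag_Name
```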

Now we create the three Ansible roles, one per use case, with this command:

ansible-galaxy init <role_name>

Each role contains a set of folders in which we place its tasks, templates, variables, etc.
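For example, `ansible-galaxy init hadoop` scaffolds the standard role skeleton, roughly like this (the tasks below go in tasks/main.yml):

```
hadoop/
├── defaults/main.yml
├── files/
├── handlers/main.yml
├── meta/main.yml
├── tasks/main.yml
├── templates/
├── tests/
└── vars/main.yml
```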

Here are the tasks for each role:

  1. hadoop: this role configures every instance tagged tag_Name_Namenode as the Namenode. It also checks whether the namenode is already formatted; if so, it does not format it again.
# tasks file for hadoop master


- name: "copying the jdk file "
  copy:
          src: "/root/jdk-8u171-linux-x64.rpm"
          dest: "/root"
- name: "copying the hadoop file"
  copy:
          src: "/root/hadoop-1.2.1-1.x86_64.rpm"
          dest: "/root"
- name: "checking the java file is already installed or not "
  shell: "java -version"
  ignore_errors: yes
  register: C
- name: "installing jdk file"
  shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: "C.rc != 0"
- name: "checking the hadoop software is installed or not"
  shell: "hadoop version"
  ignore_errors: yes
  register: D
- name: "installing hadoop file"
  shell:  "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
  when: "D.rc != 0"
- debug:
        msg: "{{ dr }}"
- name: "creating a new directory for namenode "
  file:
          state: directory
          path: "/{{ dr }}"
- name: "updating the hdfs conf file"
  blockinfile:
          state: "present"
          path: "/etc/hadoop/hdfs-site.xml"
          block: |
                  <property>
                  <name>dfs.name.dir</name>
                  <value>/{{ dr }}</value>
                  </property>  
          insertafter: "<configuration>"
- name: "updating core conf file"
  blockinfile:
          state: "present"
          path: "/etc/hadoop/core-site.xml"
          block: |
                  <property>
                  <name>fs.default.name</name>
                  <value>hdfs://0.0.0.0:9001</value>
                  </property>
          insertbefore: "</configuration>" 
- name: "checking if the namenode is already formatted"
  stat:
          path: "/{{ dr }}/current/VERSION"
  register: check
- name: "formating the namenode"
  shell: "echo Y | hadoop namenode -format"
  ignore_errors: yes
  when: "check.stat.exists == false"
- name: "starting the namenode"
  command: "hadoop-daemon.sh start namenode"
  ignore_errors: yes
  register: X 
- debug:
        msg: "namenode is already running"
  when: X.rc != 0
- name: "showing the report of namenode"
  shell: "hadoop dfsadmin -report"
  register: E
- debug:
        var: E

2. hadoopslave: this role configures every instance tagged tag_Name_Datanode as a datanode.

# tasks file for hadoopslave
- name: "copying the jdk file "
  copy:
          src: "/root/jdk-8u171-linux-x64.rpm"
          dest: "/root"
- name: "copying the hadoop file"
  copy:
          src: "/root/hadoop-1.2.1-1.x86_64.rpm"
          dest: "/root"
- name: "checking the java file is already installed or not "
  shell: "java -version"
  ignore_errors: yes
  register: C
- debug:
        msg: "jdk is already installed"
  when: "C.rc == 0"
- name: "installing jdk file"
  shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: "C.rc != 0"
- name: "checking the hadoop software is installed or not"
  shell: "hadoop version"
  ignore_errors: yes
  register: D
- debug:
        msg: "hadoop is already installed"
  when: "D.rc == 0"
- name: "installing hadoop file"
  shell:  "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
  when: "D.rc != 0"
- name: "creating a new directory for datanode "
  file:
          state: directory
          path: "/DN1"
- name: "updating the hdfs conf file"
  blockinfile:
          state: "present"
          path: "/etc/hadoop/hdfs-site.xml"
          block: |
                  <property>
                  <name>dfs.data.dir</name>
                  <value>/DN1</value>
                  </property>  
          insertafter: "<configuration>"
- name: "updating core conf file"
  blockinfile:
          state: "present"
          path: "/etc/hadoop/core-site.xml"
          block: |
                  <property>
                  <name>fs.default.name</name>
                  {% for ip in groups['tag_Name_Namenode'] %}
                  <value>hdfs://{{ ip }}:9001</value>
                  {% endfor %}
                  </property>
          insertbefore: "</configuration>" 
- name: "starting the datanode"
  command: "hadoop-daemon.sh start datanode"
  ignore_errors: yes
  register: X 
- debug:
        msg: "datanode is already running"
  when: X.rc != 0
- name: "showing the report of cluster"
  shell: "hadoop dfsadmin -report"
  register: E
- debug:
        var: E

3. hadoopclient: this role configures the instance tagged tag_Name_Client as the Hadoop client.

# tasks file for hadoopclient


- name: "copying the jdk file "
  copy:
          src: "/root/jdk-8u171-linux-x64.rpm"
          dest: "/root"
- name: "copying the hadoop file"
  copy:
          src: "/root/hadoop-1.2.1-1.x86_64.rpm"
          dest: "/root"
- name: "checking the java file is already installed or not "
  shell: "java -version"
  ignore_errors: yes
  register: C
- debug:
        msg: "jdk is already installed"
  when: "C.rc == 0"
- name: "installing jdk file"
  shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: "C.rc != 0"
- name: "checking the hadoop software is installed or not"
  shell: "hadoop version"
  ignore_errors: yes
  register: D
- debug:
        msg: "hadoop is already installed"
  when: "D.rc == 0"
- name: "installing hadoop file"
  shell:  "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
  when: "D.rc != 0"
- name: "updating the hdfs conf file"
  blockinfile:
          state: "present"
          path: "/etc/hadoop/hdfs-site.xml"
          block: |
                  <property>
                  <name>dfs.replication</name>
                  <value>replication_factor</value>
                  </property>  
          insertafter: "<configuration>"
- name: "updating core conf file"
  blockinfile:
          state: "present"
          path: "/etc/hadoop/core-site.xml"
          block: |
                  <property>
                  <name>fs.default.name</name>
                  {% for ip in groups['tag_Name_Namenode'] %}
                  <value>hdfs://{{ ip }}:9001</value>
                  {% endfor %}
                  </property>
          insertbefore: "</configuration>" 
- name: "showing the report of cluster"
  shell: "hadoop dfsadmin -report"
  register: E
- debug:
        var: E


Now we run the playbook with this command:

ansible-playbook --ask-vault-pass hadoop.yml

While running, the playbook first prompts for the vault password; once it is verified, the run proceeds:

Vault password: 
enter the number of Datanodes: 2


PLAY [localhost] ***************************************************************


TASK [Gathering Facts] *********************************************************
ok: [localhost]


TASK [provisioning the nodes] **************************************************
included: /root/arth_ws/ansible/test2.yml for localhost => (item={'name': 'Datanode', 'num': '2'})
included: /root/arth_ws/ansible/test2.yml for localhost => (item={'name': 'Namenode', 'num': 1})
included: /root/arth_ws/ansible/test2.yml for localhost => (item={'name': 'Client', 'num': 1})


TASK [provisioning the instances] **********************************************
changed: [localhost]


TASK [provisioning the instances] **********************************************
changed: [localhost]


TASK [provisioning the instances] **********************************************
changed: [localhost]


TASK [waiting for the instances to ready for the use] **************************
Pausing for 60 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]
[WARNING]: Invalid characters were found in group names but not replaced, use
-vvvv to see details
enter the namenode's directory name: NN1


PLAY [tag_Name_Namenode] *******************************************************


TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host 13.233.204.206 is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change the meaning of that path. See https://docs.ansible.com
/ansible/2.10/reference_appendices/interpreter_discovery.html for more
information.
ok: [13.233.204.206]


TASK [hadoop : copying the jdk file] *******************************************
changed: [13.233.204.206]


TASK [copying the hadoop file] *************************************************
changed: [13.233.204.206]


TASK [hadoop : checking the java file is already installed or not] *************
fatal: [13.233.204.206]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.039109", "end": "2021-02-19 15:44:31.896784", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:44:31.857675", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring


TASK [hadoop : installing jdk file] ********************************************
changed: [13.233.204.206]


TASK [checking the hadoop software is installed or not] ************************
fatal: [13.233.204.206]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.039096", "end": "2021-02-19 15:44:42.577900", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:44:42.538804", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring


TASK [installing hadoop file] **************************************************
changed: [13.233.204.206]


TASK [hadoop : debug] **********************************************************
ok: [13.233.204.206] => {
    "msg": "NN1"
}


TASK [hadoop : creating a new directory for namenode] **************************
changed: [13.233.204.206]


TASK [hadoop : updating the hdfs conf file] ************************************
changed: [13.233.204.206]


TASK [hadoop : updating core conf file] ****************************************
changed: [13.233.204.206]


TASK [hadoop : checking if the namenode is already formatted] ******************
ok: [13.233.204.206]


TASK [hadoop : formating the namenode] *****************************************
changed: [13.233.204.206]


TASK [hadoop : starting the namenode] ******************************************
changed: [13.233.204.206]


TASK [hadoop : debug] **********************************************************
skipping: [13.233.204.206]


TASK [hadoop : showing the report of namenode] *********************************
changed: [13.233.204.206]


TASK [hadoop : debug] **********************************************************
ok: [13.233.204.206] => {
    "E": {
        "changed": true,
        "cmd": "hadoop dfsadmin -report",
        "delta": "0:00:01.289421",
        "end": "2021-02-19 15:45:06.713983",
        "failed": false,
        "rc": 0,
        "start": "2021-02-19 15:45:05.424562",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "Configured Capacity: 0 (0 KB)\nPresent Capacity: 0 (0 KB)\nDFS Remaining: 0 (0 KB)\nDFS Used: 0 (0 KB)\nDFS Used%: ?%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 0 (0 total, 0 dead)",
        "stdout_lines": [
            "Configured Capacity: 0 (0 KB)",
            "Present Capacity: 0 (0 KB)",
            "DFS Remaining: 0 (0 KB)",
            "DFS Used: 0 (0 KB)",
            "DFS Used%: ?%",
            "Under replicated blocks: 0",
            "Blocks with corrupt replicas: 0",
            "Missing blocks: 0",
            "",
            "-------------------------------------------------",
            "Datanodes available: 0 (0 total, 0 dead)"
        ]
    }
}


PLAY [tag_Name_Datanode] *******************************************************


TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host 13.233.233.54 is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change the meaning of that path. See https://docs.ansible.com
/ansible/2.10/reference_appendices/interpreter_discovery.html for more
information.
ok: [13.233.233.54]
[WARNING]: Platform linux on host 13.127.175.173 is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change the meaning of that path. See https://docs.ansible.com
/ansible/2.10/reference_appendices/interpreter_discovery.html for more
information.
ok: [13.127.175.173]


TASK [hadoopslave : copying the jdk file] **************************************
changed: [13.127.175.173]
changed: [13.233.233.54]


TASK [hadoopslave : copying the hadoop file] ***********************************
changed: [13.127.175.173]
changed: [13.233.233.54]


TASK [hadoopslave : checking the java file is already installed or not] ********
fatal: [13.233.233.54]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.038867", "end": "2021-02-19 15:49:20.004663", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:19.965796", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [13.127.175.173]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.038734", "end": "2021-02-19 15:49:20.113537", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:20.074803", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring


TASK [hadoopslave : debug] *****************************************************
skipping: [13.233.233.54]
skipping: [13.127.175.173]


TASK [hadoopslave : installing jdk file] ***************************************
changed: [13.127.175.173]
changed: [13.233.233.54]


TASK [hadoopslave : checking the hadoop software is installed or not] **********
fatal: [13.233.233.54]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.038406", "end": "2021-02-19 15:49:31.406020", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:31.367614", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [13.127.175.173]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.039494", "end": "2021-02-19 15:49:31.542342", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:31.502848", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring


TASK [hadoopslave : debug] *****************************************************
skipping: [13.233.233.54]
skipping: [13.127.175.173]


TASK [hadoopslave : installing hadoop file] ************************************
changed: [13.233.233.54]
changed: [13.127.175.173]


TASK [hadoopslave : creating a new directory for datanode] *********************
changed: [13.233.233.54]
changed: [13.127.175.173]


TASK [hadoopslave : updating the hdfs conf file] *******************************
changed: [13.233.233.54]
changed: [13.127.175.173]


TASK [hadoopslave : updating core conf file] ***********************************
changed: [13.233.233.54]
changed: [13.127.175.173]


TASK [hadoopslave : starting the datanode] *************************************
changed: [13.233.233.54]
changed: [13.127.175.173]


TASK [hadoopslave : debug] *****************************************************
skipping: [13.233.233.54]
skipping: [13.127.175.173]


TASK [hadoopslave : showing the report of cluster] *****************************
changed: [13.127.175.173]
changed: [13.233.233.54]


TASK [hadoopslave : debug] *****************************************************
ok: [13.233.233.54] => {
    "E": {
        "changed": true,
        "cmd": "hadoop dfsadmin -report",
        "delta": "0:00:01.255826",
        "end": "2021-02-19 15:49:50.604401",
        "failed": false,
        "rc": 0,
        "start": "2021-02-19 15:49:49.348575",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "Configured Capacity: 17154662400 (15.98 GB)\nPresent Capacity: 12884541440 (12 GB)\nDFS Remaining: 12884525056 (12 GB)\nDFS Used: 16384 (16 KB)\nDFS Used%: 0%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 2 (2 total, 0 dead)\n\nName: 13.233.233.54:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135027712 (1.99 GB)\nDFS Remaining: 6442295296(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021\n\n\nName: 13.127.175.173:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135093248 (1.99 GB)\nDFS Remaining: 6442229760(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021",
        "stdout_lines": [
            "Configured Capacity: 17154662400 (15.98 GB)",
            "Present Capacity: 12884541440 (12 GB)",
            "DFS Remaining: 12884525056 (12 GB)",
            "DFS Used: 16384 (16 KB)",
            "DFS Used%: 0%",
            "Under replicated blocks: 0",
            "Blocks with corrupt replicas: 0",
            "Missing blocks: 0",
            "",
            "-------------------------------------------------",
            "Datanodes available: 2 (2 total, 0 dead)",
            "",
            "Name: 13.233.233.54:50010",
            "Decommission Status : Normal",
            "Configured Capacity: 8577331200 (7.99 GB)",
            "DFS Used: 8192 (8 KB)",
            "Non DFS Used: 2135027712 (1.99 GB)",
            "DFS Remaining: 6442295296(6 GB)",
            "DFS Used%: 0%",
            "DFS Remaining%: 75.11%",
            "Last contact: Fri Feb 19 15:49:50 UTC 2021",
            "",
            "",
            "Name: 13.127.175.173:50010",
            "Decommission Status : Normal",
            "Configured Capacity: 8577331200 (7.99 GB)",
            "DFS Used: 8192 (8 KB)",
            "Non DFS Used: 2135093248 (1.99 GB)",
            "DFS Remaining: 6442229760(6 GB)",
            "DFS Used%: 0%",
            "DFS Remaining%: 75.11%",
            "Last contact: Fri Feb 19 15:49:50 UTC 2021"
        ]
    }
}
ok: [13.127.175.173] => {
    "E": {
        "changed": true,
        "cmd": "hadoop dfsadmin -report",
        "delta": "0:00:01.234052",
        "end": "2021-02-19 15:49:50.522919",
        "failed": false,
        "rc": 0,
        "start": "2021-02-19 15:49:49.288867",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "Configured Capacity: 17154662400 (15.98 GB)\nPresent Capacity: 12884541440 (12 GB)\nDFS Remaining: 12884525056 (12 GB)\nDFS Used: 16384 (16 KB)\nDFS Used%: 0%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 2 (2 total, 0 dead)\n\nName: 13.233.233.54:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135027712 (1.99 GB)\nDFS Remaining: 6442295296(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021\n\n\nName: 13.127.175.173:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135093248 (1.99 GB)\nDFS Remaining: 6442229760(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021",
        "stdout_lines": [
            "Configured Capacity: 17154662400 (15.98 GB)",
            "Present Capacity: 12884541440 (12 GB)",
            "DFS Remaining: 12884525056 (12 GB)",
            "DFS Used: 16384 (16 KB)",
            "DFS Used%: 0%",
            "Under replicated blocks: 0",
            "Blocks with corrupt replicas: 0",
            "Missing blocks: 0",
            "",
            "-------------------------------------------------",
            "Datanodes available: 2 (2 total, 0 dead)",
            "",
            "Name: 13.233.233.54:50010",
            "Decommission Status : Normal",
            "Configured Capacity: 8577331200 (7.99 GB)",
            "DFS Used: 8192 (8 KB)",
            "Non DFS Used: 2135027712 (1.99 GB)",
            "DFS Remaining: 6442295296(6 GB)",
            "DFS Used%: 0%",
            "DFS Remaining%: 75.11%",
            "Last contact: Fri Feb 19 15:49:50 UTC 2021",
            "",
            "",
            "Name: 13.127.175.173:50010",
            "Decommission Status : Normal",
            "Configured Capacity: 8577331200 (7.99 GB)",
            "DFS Used: 8192 (8 KB)",
            "Non DFS Used: 2135093248 (1.99 GB)",
            "DFS Remaining: 6442229760(6 GB)",
            "DFS Used%: 0%",
            "DFS Remaining%: 75.11%",
            "Last contact: Fri Feb 19 15:49:50 UTC 2021"
        ]
    }
}


PLAY [tag_Name_Client] *********************************************************


TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host 13.127.94.119 is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change the meaning of that path. See https://docs.ansible.com
/ansible/2.10/reference_appendices/interpreter_discovery.html for more
information.
ok: [13.127.94.119]


TASK [hadoopclient : copying the jdk file] *************************************
changed: [13.127.94.119]


TASK [hadoopclient : copying the hadoop file] **********************************
changed: [13.127.94.119]


TASK [hadoopclient : checking the java file is already installed or not] *******
fatal: [13.127.94.119]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.039097", "end": "2021-02-19 15:52:11.869482", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:52:11.830385", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring


TASK [hadoopclient : debug] ****************************************************
skipping: [13.127.94.119]


TASK [hadoopclient : installing jdk file] **************************************
changed: [13.127.94.119]


TASK [hadoopclient : checking the hadoop software is installed or not] *********
fatal: [13.127.94.119]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.038448", "end": "2021-02-19 15:52:22.414704", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:52:22.376256", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring


TASK [hadoopclient : debug] ****************************************************
skipping: [13.127.94.119]


TASK [hadoopclient : installing hadoop file] ***********************************
changed: [13.127.94.119]


TASK [hadoopclient : updating the hdfs conf file] ******************************
changed: [13.127.94.119]


TASK [hadoopclient : updating core conf file] **********************************
changed: [13.127.94.119]


TASK [hadoopclient : showing the report of cluster] ****************************
changed: [13.127.94.119]


TASK [hadoopclient : debug] ****************************************************
ok: [13.127.94.119] => {
    "E": {
        "changed": true,
        "cmd": "hadoop dfsadmin -report",
        "delta": "0:00:01.277961",
        "end": "2021-02-19 15:52:34.016109",
        "failed": false,
        "rc": 0,
        "start": "2021-02-19 15:52:32.738148",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "Configured Capacity: 17154662400 (15.98 GB)\nPresent Capacity: 12871831552 (11.99 GB)\nDFS Remaining: 12871815168 (11.99 GB)\nDFS Used: 16384 (16 KB)\nDFS Used%: 0%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 2 (2 total, 0 dead)\n\nName: 13.233.233.54:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2141470720 (1.99 GB)\nDFS Remaining: 6435852288(5.99 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.03%\nLast contact: Fri Feb 19 15:52:32 UTC 2021\n\n\nName: 13.127.175.173:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2141360128 (1.99 GB)\nDFS Remaining: 6435962880(5.99 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.03%\nLast contact: Fri Feb 19 15:52:32 UTC 2021",
        "stdout_lines": [
            "Configured Capacity: 17154662400 (15.98 GB)",
            "Present Capacity: 12871831552 (11.99 GB)",
            "DFS Remaining: 12871815168 (11.99 GB)",
            "DFS Used: 16384 (16 KB)",
            "DFS Used%: 0%",
            "Under replicated blocks: 0",
            "Blocks with corrupt replicas: 0",
            "Missing blocks: 0",
            "",
            "-------------------------------------------------",
            "Datanodes available: 2 (2 total, 0 dead)",
            "",
            "Name: 13.233.233.54:50010",
            "Decommission Status : Normal",
            "Configured Capacity: 8577331200 (7.99 GB)",
            "DFS Used: 8192 (8 KB)",
            "Non DFS Used: 2141470720 (1.99 GB)",
            "DFS Remaining: 6435852288(5.99 GB)",
            "DFS Used%: 0%",
            "DFS Remaining%: 75.03%",
            "Last contact: Fri Feb 19 15:52:32 UTC 2021",
            "",
            "",
            "Name: 13.127.175.173:50010",
            "Decommission Status : Normal",
            "Configured Capacity: 8577331200 (7.99 GB)",
            "DFS Used: 8192 (8 KB)",
            "Non DFS Used: 2141360128 (1.99 GB)",
            "DFS Remaining: 6435962880(5.99 GB)",
            "DFS Used%: 0%",
            "DFS Remaining%: 75.03%",
            "Last contact: Fri Feb 19 15:52:32 UTC 2021"
        ]
    }
}


PLAY RECAP *********************************************************************
13.127.175.173             : ok=13   changed=11   unreachable=0    failed=0    skipped=3    rescued=0    ignored=2   
13.127.94.119              : ok=11   changed=9    unreachable=0    failed=0    skipped=2    rescued=0    ignored=2   
13.233.204.206             : ok=16   changed=12   unreachable=0    failed=0    skipped=1    rescued=0    ignored=2   
13.233.233.54              : ok=13   changed=11   unreachable=0    failed=0    skipped=3    rescued=0    ignored=2   
localhost                  : ok=8    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

As you can see, I entered two datanodes, so the playbook configured one instance as the master (Namenode), two instances as slaves (Datanodes), and one instance as the client.

We can also check the Hadoop cluster report in the web portal, using the master's public IP on port 50070.


Everything is working fine and the Hadoop cluster is ready to use.

Hope you find this article interesting.
Thank you !!

