Configuring a Multinode Hadoop Cluster over AWS using Ansible
Hello everyone!
In this article we are going to see how we can build a multinode Hadoop cluster on AWS with the help of Ansible.
PROBLEM: It is quite difficult and time-consuming to launch multiple instances and perform every setup step manually when building a Hadoop cluster, and it becomes even more tedious if the cluster needs many Datanodes, e.g. 100.
To overcome this, I used Ansible and wrote a playbook that configures the whole cluster in a single go. For this playbook I created three roles: one for the Namenode, one for the Datanodes, and one for the client. Each role performs all the steps required for its part of the multinode Hadoop cluster.
What are roles?
In Ansible, the role is the primary mechanism for breaking a playbook into multiple files. This simplifies writing complex playbooks, and it makes them easier to reuse.
Now, here is the main playbook (hadoop.yml) for the whole setup:
- hosts: "localhost"
  vars_files:
    - "secret.yml"
  vars_prompt:
    - name: "num_item"
      prompt: "enter the number of Datanodes"
      private: no
  tasks:
    - name: "provisioning the nodes"
      include_tasks: "/root/arth_ws/ansible/test2.yml"
      loop:
        - { name: 'Datanode', num: "{{ num_item }}" }
        - { name: 'Namenode', num: 1 }
        - { name: 'Client', num: 1 }

    - name: "waiting for the instances to ready for the use"
      pause:
        seconds: 60

    - name: "refreshing the inventory"
      meta: "refresh_inventory"

- hosts: "tag_Name_Namenode"
  vars_prompt:
    - name: "dr"
      prompt: "enter the namenode's directory name"
      private: no
  roles:
    - role: "hadoop"

- hosts: "tag_Name_Datanode"
  roles:
    - role: "hadoopslave"

- hosts: "tag_Name_Client"
  roles:
    - role: "hadoopclient"
As you can see in the playbook above, I attached a secret file (secret.yml). It contains my AWS credentials in encrypted form, using the Ansible Vault concept. The playbook prompts us three times: for the vault password, for the number of Datanodes, and for the name of the Namenode's directory.
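For reference, the secret file is just a vaulted variables file. Below is a minimal sketch of how it could be created and what it might contain, assuming it defines the two variable names (u and p) that the ec2 task later expects; the values shown are placeholders, not real keys:

ansible-vault create secret.yml

# contents of secret.yml (stored encrypted on disk; placeholder values shown)
u: "YOUR_AWS_ACCESS_KEY_ID"
p: "YOUR_AWS_SECRET_ACCESS_KEY"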
This playbook uses three roles:
- hadoop = used for Namenode
- hadoopslave = used for Datanodes
- hadoopclient = used for Client
The playbook also includes a task file, test2.yml, which launches the instances on AWS. I used Jinja templating here to make the playbook more dynamic: when it runs, it launches one Namenode, one client, and however many Datanodes we specify at the prompt.
Here is the test2.yml file:
- name: "provisioning the instances"
  ec2:
    zone: "ap-south-1a"
    assign_public_ip: "yes"
    ec2_access_key: "{{ u }}"
    ec2_secret_key: "{{ p }}"
    count: "{{ item.num }}"
    image: "ami-04b1ddd35fd71475a"
    instance_tags:
      Name: "{{ item.name }}"
    instance_type: "t2.micro"
    key_name: "hadoop_slave_key"
    region: "ap-south-1"
    state: "present"
    wait: yes
    vpc_subnet_id: "subnet-678b820f"
Every instance gets tagged with a Name tag, and Ansible uses the corresponding tag_Name_* group to target the later plays. For this we need a dynamic inventory, which resolves the hosts of each group at run time.
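How the dynamic inventory is wired up is not shown above, so here is a minimal sketch, assuming the classic ec2.py/ec2.ini script is placed in an inventory directory and the key pair is available on the controller. All paths, the remote user, and the privilege escalation setting below are assumptions that depend on the AMI and your own layout:

# /etc/ansible/ansible.cfg (sketch, values are assumptions)
[defaults]
inventory = /opt/ansible/inventory          # directory holding ec2.py and ec2.ini (assumed path)
remote_user = ec2-user                      # depends on the AMI being used
private_key_file = /root/hadoop_slave_key.pem
host_key_checking = False

[privilege_escalation]
become = true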
Next, we create the three Ansible roles, one per use case, with this command:
ansible-galaxy init <role_name>
Each role contains a set of folders in which we place its tasks, templates, variables, and so on.
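For example, ansible-galaxy init hadoop produces a skeleton roughly like the one below; the task lists shown in the next sections go into each role's tasks/main.yml:

hadoop/
├── README.md
├── defaults/main.yml
├── files/
├── handlers/main.yml
├── meta/main.yml
├── tasks/main.yml
├── templates/
├── tests/
└── vars/main.yml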
Here are the tasks of each role:
1. hadoop: this role configures every instance in the tag_Name_Namenode group as the Namenode. It also checks whether the Namenode has already been formatted, and skips formatting it again if so.
# tasks file for hadoop master
- name: "copying the jdk file"
  copy:
    src: "/root/jdk-8u171-linux-x64.rpm"
    dest: "/root"

- name: "copying the hadoop file"
  copy:
    src: "/root/hadoop-1.2.1-1.x86_64.rpm"
    dest: "/root"

- name: "checking the java file is already installed or not"
  shell: "java -version"
  ignore_errors: yes
  register: C

- name: "installing jdk file"
  shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: "C.rc != 0"

- name: "checking the hadoop software is installed or not"
  shell: "hadoop version"
  ignore_errors: yes
  register: D

- name: "installing hadoop file"
  shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
  when: "D.rc != 0"

- debug:
    msg: "{{ dr }}"

- name: "creating a new directory for namenode"
  file:
    state: directory
    path: "/{{ dr }}"

- name: "updating the hdfs conf file"
  blockinfile:
    state: "present"
    path: "/etc/hadoop/hdfs-site.xml"
    block: |
      <property>
      <name>dfs.name.dir</name>
      <value>/{{ dr }}</value>
      </property>
    insertafter: "<configuration>"

- name: "updating core conf file"
  blockinfile:
    state: "present"
    path: "/etc/hadoop/core-site.xml"
    block: |
      <property>
      <name>fs.default.name</name>
      <value>hdfs://0.0.0.0:9001</value>
      </property>
    insertbefore: "</configuration>"

- name: "checking if the namenode is already formatted"
  stat:
    path: "/{{ dr }}/current/VERSION"
  register: check

- name: "formating the namenode"
  shell: "echo Y | hadoop namenode -format"
  ignore_errors: yes
  when: "check.stat.exists == false"

- name: "starting the namenode"
  command: "hadoop-daemon.sh start namenode"
  ignore_errors: yes
  register: X

- debug:
    msg: "namenode is already running"
  when: X.rc != 0

- name: "showing the report of namenode"
  shell: "hadoop dfsadmin -report"
  register: E

- debug:
    var: E
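As a quick manual sanity check (not part of the role), you can also SSH into the Namenode and confirm the daemon is running with jps, which lists the Java processes; the PIDs below are just illustrative:

$ jps
1234 NameNode
5678 Jps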
2. hadoopslave: this role configures every instance in the tag_Name_Datanode group as a Datanode.
# tasks file for hadoopslave
- name: "copying the jdk file"
  copy:
    src: "/root/jdk-8u171-linux-x64.rpm"
    dest: "/root"

- name: "copying the hadoop file"
  copy:
    src: "/root/hadoop-1.2.1-1.x86_64.rpm"
    dest: "/root"

- name: "checking the java file is already installed or not"
  shell: "java -version"
  ignore_errors: yes
  register: C

- debug:
    msg: "jdk is already installed"
  when: "C.rc == 0"

- name: "installing jdk file"
  shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: "C.rc != 0"

- name: "checking the hadoop software is installed or not"
  shell: "hadoop version"
  ignore_errors: yes
  register: D

- debug:
    msg: "hadoop is already installed"
  when: "D.rc == 0"

- name: "installing hadoop file"
  shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
  when: "D.rc != 0"

- name: "creating a new directory for datanode"
  file:
    state: directory
    path: "/DN1"

- name: "updating the hdfs conf file"
  blockinfile:
    state: "present"
    path: "/etc/hadoop/hdfs-site.xml"
    block: |
      <property>
      <name>dfs.data.dir</name>
      <value>/DN1</value>
      </property>
    insertafter: "<configuration>"

- name: "updating core conf file"
  blockinfile:
    state: "present"
    path: "/etc/hadoop/core-site.xml"
    block: |
      <property>
      <name>fs.default.name</name>
      {% for ip in groups['tag_Name_Namenode'] %}
      <value>hdfs://{{ ip }}:9001</value>
      {% endfor %}
      </property>
    insertbefore: "</configuration>"

- name: "starting the datanode"
  command: "hadoop-daemon.sh start datanode"
  ignore_errors: yes
  register: X

- debug:
    msg: "datanode is already running"
  when: X.rc != 0

- name: "showing the report of cluster"
  shell: "hadoop dfsadmin -report"
  register: E

- debug:
    var: E
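For reference, after this role runs, the block that blockinfile renders into the Datanode's core-site.xml would look roughly like the snippet below, assuming the default ANSIBLE MANAGED BLOCK markers and a Namenode registered in the dynamic inventory as 13.233.204.206 (the public IP from the run shown later):

# BEGIN ANSIBLE MANAGED BLOCK
<property>
<name>fs.default.name</name>
<value>hdfs://13.233.204.206:9001</value>
</property>
# END ANSIBLE MANAGED BLOCK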
3. hadoopclient: this role configures the instance in the tag_Name_Client group as a client.
# tasks file for hadoopclient
- name: "copying the jdk file"
  copy:
    src: "/root/jdk-8u171-linux-x64.rpm"
    dest: "/root"

- name: "copying the hadoop file"
  copy:
    src: "/root/hadoop-1.2.1-1.x86_64.rpm"
    dest: "/root"

- name: "checking the java file is already installed or not"
  shell: "java -version"
  ignore_errors: yes
  register: C

- debug:
    msg: "jdk is already installed"
  when: "C.rc == 0"

- name: "installing jdk file"
  shell: "rpm -i /root/jdk-8u171-linux-x64.rpm"
  ignore_errors: yes
  when: "C.rc != 0"

- name: "checking the hadoop software is installed or not"
  shell: "hadoop version"
  ignore_errors: yes
  register: D

- debug:
    msg: "hadoop is already installed"
  when: "D.rc == 0"

- name: "installing hadoop file"
  shell: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
  when: "D.rc != 0"

- name: "updating the hdfs conf file"
  blockinfile:
    state: "present"
    path: "/etc/hadoop/hdfs-site.xml"
    block: |
      <property>
      <name>dfs.replication</name>
      <value>replication_factor</value>
      </property>
    insertafter: "<configuration>"

- name: "updating core conf file"
  blockinfile:
    state: "present"
    path: "/etc/hadoop/core-site.xml"
    block: |
      <property>
      <name>fs.default.name</name>
      {% for ip in groups['tag_Name_Namenode'] %}
      <value>hdfs://{{ ip }}:9001</value>
      {% endfor %}
      </property>
    insertbefore: "</configuration>"

- name: "showing the report of cluster"
  shell: "hadoop dfsadmin -report"
  register: E

- debug:
    var: E
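Note that replication_factor above is left as a literal placeholder. In a real run you would likely template it from a variable instead, for example (rep_factor is a hypothetical variable name you would define in the role's vars or pass at run time):

<property>
<name>dfs.replication</name>
<value>{{ rep_factor }}</value>
</property>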
Now we run the playbook with this command:
ansible-playbook --ask-vault-pass hadoop.yml
While running, the playbook first prompts for the vault password; once it is accepted, the remaining prompts appear and the plays run:
Vault password:
enter the number of Datanodes: 2

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [provisioning the nodes] **************************************************
included: /root/arth_ws/ansible/test2.yml for localhost => (item={'name': 'Datanode', 'num': '2'})
included: /root/arth_ws/ansible/test2.yml for localhost => (item={'name': 'Namenode', 'num': 1})
included: /root/arth_ws/ansible/test2.yml for localhost => (item={'name': 'Client', 'num': 1})

TASK [provisioning the instances] **********************************************
changed: [localhost]

TASK [provisioning the instances] **********************************************
changed: [localhost]

TASK [provisioning the instances] **********************************************
changed: [localhost]

TASK [waiting for the instances to ready for the use] **************************
Pausing for 60 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
enter the namenode's directory name: NN1

PLAY [tag_Name_Namenode] *******************************************************

TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host 13.233.204.206 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible/2.10/reference_appendices/interpreter_discovery.html for more information.
ok: [13.233.204.206]

TASK [hadoop : copying the jdk file] *******************************************
changed: [13.233.204.206]

TASK [copying the hadoop file] *************************************************
changed: [13.233.204.206]

TASK [hadoop : checking the java file is already installed or not] *************
fatal: [13.233.204.206]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.039109", "end": "2021-02-19 15:44:31.896784", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:44:31.857675", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [hadoop : installing jdk file] ********************************************
changed: [13.233.204.206]

TASK [checking the hadoop software is installed or not] ************************
fatal: [13.233.204.206]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.039096", "end": "2021-02-19 15:44:42.577900", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:44:42.538804", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [installing hadoop file] **************************************************
changed: [13.233.204.206]

TASK [hadoop : debug] **********************************************************
ok: [13.233.204.206] => {
    "msg": "NN1"
}

TASK [hadoop : creating a new directory for namenode] **************************
changed: [13.233.204.206]

TASK [hadoop : updating the hdfs conf file] ************************************
changed: [13.233.204.206]

TASK [hadoop : updating core conf file] ****************************************
changed: [13.233.204.206]

TASK [hadoop : checking if the namenode is already formatted] ******************
ok: [13.233.204.206]

TASK [hadoop : formating the namenode] *****************************************
changed: [13.233.204.206]

TASK [hadoop : starting the namenode] ******************************************
changed: [13.233.204.206]

TASK [hadoop : debug] **********************************************************
skipping: [13.233.204.206]

TASK [hadoop : showing the report of namenode] *********************************
changed: [13.233.204.206]

TASK [hadoop : debug] **********************************************************
ok: [13.233.204.206] => {"E": {"changed": true, "cmd": "hadoop dfsadmin -report", "delta": "0:00:01.289421", "end": "2021-02-19 15:45:06.713983", "failed": false, "rc": 0, "start": "2021-02-19 15:45:05.424562", "stderr": "", "stderr_lines": [], "stdout": "Configured Capacity: 0 (0 KB)\nPresent Capacity: 0 (0 KB)\nDFS Remaining: 0 (0 KB)\nDFS Used: 0 (0 KB)\nDFS Used%: ?%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 0 (0 total, 0 dead)", "stdout_lines": ["Configured Capacity: 0 (0 KB)", "Present Capacity: 0 (0 KB)", "DFS Remaining: 0 (0 KB)", "DFS Used: 0 (0 KB)", "DFS Used%: ?%", "Under replicated blocks: 0", "Blocks with corrupt replicas: 0", "Missing blocks: 0", "", "-------------------------------------------------", "Datanodes available: 0 (0 total, 0 dead)"]}}

PLAY [tag_Name_Datanode] *******************************************************

TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host 13.233.233.54 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible/2.10/reference_appendices/interpreter_discovery.html for more information.
ok: [13.233.233.54]
[WARNING]: Platform linux on host 13.127.175.173 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible/2.10/reference_appendices/interpreter_discovery.html for more information.
ok: [13.127.175.173]

TASK [hadoopslave : copying the jdk file] **************************************
changed: [13.127.175.173]
changed: [13.233.233.54]

TASK [hadoopslave : copying the hadoop file] ***********************************
changed: [13.127.175.173]
changed: [13.233.233.54]

TASK [hadoopslave : checking the java file is already installed or not] ********
fatal: [13.233.233.54]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.038867", "end": "2021-02-19 15:49:20.004663", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:19.965796", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [13.127.175.173]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.038734", "end": "2021-02-19 15:49:20.113537", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:20.074803", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [hadoopslave : debug] *****************************************************
skipping: [13.233.233.54]
skipping: [13.127.175.173]

TASK [hadoopslave : installing jdk file] ***************************************
changed: [13.127.175.173]
changed: [13.233.233.54]

TASK [hadoopslave : checking the hadoop software is installed or not] **********
fatal: [13.233.233.54]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.038406", "end": "2021-02-19 15:49:31.406020", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:31.367614", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [13.127.175.173]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.039494", "end": "2021-02-19 15:49:31.542342", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:49:31.502848", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [hadoopslave : debug] *****************************************************
skipping: [13.233.233.54]
skipping: [13.127.175.173]

TASK [hadoopslave : installing hadoop file] ************************************
changed: [13.233.233.54]
changed: [13.127.175.173]

TASK [hadoopslave : creating a new directory for datanode] *********************
changed: [13.233.233.54]
changed: [13.127.175.173]

TASK [hadoopslave : updating the hdfs conf file] *******************************
changed: [13.233.233.54]
changed: [13.127.175.173]

TASK [hadoopslave : updating core conf file] ***********************************
changed: [13.233.233.54]
changed: [13.127.175.173]

TASK [hadoopslave : starting the datanode] *************************************
changed: [13.233.233.54]
changed: [13.127.175.173]

TASK [hadoopslave : debug] *****************************************************
skipping: [13.233.233.54]
skipping: [13.127.175.173]

TASK [hadoopslave : showing the report of cluster] *****************************
changed: [13.127.175.173]
changed: [13.233.233.54]

TASK [hadoopslave : debug] *****************************************************
ok: [13.233.233.54] => {"E": {"changed": true, "cmd": "hadoop dfsadmin -report", "delta": "0:00:01.255826", "end": "2021-02-19 15:49:50.604401", "failed": false, "rc": 0, "start": "2021-02-19 15:49:49.348575", "stderr": "", "stderr_lines": [], "stdout": "Configured Capacity: 17154662400 (15.98 GB)\nPresent Capacity: 12884541440 (12 GB)\nDFS Remaining: 12884525056 (12 GB)\nDFS Used: 16384 (16 KB)\nDFS Used%: 0%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 2 (2 total, 0 dead)\n\nName: 13.233.233.54:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135027712 (1.99 GB)\nDFS Remaining: 6442295296(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021\n\n\nName: 13.127.175.173:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135093248 (1.99 GB)\nDFS Remaining: 6442229760(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021", "stdout_lines": ["Configured Capacity: 17154662400 (15.98 GB)", "Present Capacity: 12884541440 (12 GB)", "DFS Remaining: 12884525056 (12 GB)", "DFS Used: 16384 (16 KB)", "DFS Used%: 0%", "Under replicated blocks: 0", "Blocks with corrupt replicas: 0", "Missing blocks: 0", "", "-------------------------------------------------", "Datanodes available: 2 (2 total, 0 dead)", "", "Name: 13.233.233.54:50010", "Decommission Status : Normal", "Configured Capacity: 8577331200 (7.99 GB)", "DFS Used: 8192 (8 KB)", "Non DFS Used: 2135027712 (1.99 GB)", "DFS Remaining: 6442295296(6 GB)", "DFS Used%: 0%", "DFS Remaining%: 75.11%", "Last contact: Fri Feb 19 15:49:50 UTC 2021", "", "", "Name: 13.127.175.173:50010", "Decommission Status : Normal", "Configured Capacity: 8577331200 (7.99 GB)", "DFS Used: 8192 (8 KB)", "Non DFS Used: 2135093248 (1.99 GB)", "DFS Remaining: 6442229760(6 GB)", "DFS Used%: 0%", "DFS Remaining%: 75.11%", "Last contact: Fri Feb 19 15:49:50 UTC 2021"]}}
ok: [13.127.175.173] => {"E": {"changed": true, "cmd": "hadoop dfsadmin -report", "delta": "0:00:01.234052", "end": "2021-02-19 15:49:50.522919", "failed": false, "rc": 0, "start": "2021-02-19 15:49:49.288867", "stderr": "", "stderr_lines": [], "stdout": "Configured Capacity: 17154662400 (15.98 GB)\nPresent Capacity: 12884541440 (12 GB)\nDFS Remaining: 12884525056 (12 GB)\nDFS Used: 16384 (16 KB)\nDFS Used%: 0%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 2 (2 total, 0 dead)\n\nName: 13.233.233.54:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135027712 (1.99 GB)\nDFS Remaining: 6442295296(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021\n\n\nName: 13.127.175.173:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2135093248 (1.99 GB)\nDFS Remaining: 6442229760(6 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.11%\nLast contact: Fri Feb 19 15:49:50 UTC 2021", "stdout_lines": ["Configured Capacity: 17154662400 (15.98 GB)", "Present Capacity: 12884541440 (12 GB)", "DFS Remaining: 12884525056 (12 GB)", "DFS Used: 16384 (16 KB)", "DFS Used%: 0%", "Under replicated blocks: 0", "Blocks with corrupt replicas: 0", "Missing blocks: 0", "", "-------------------------------------------------", "Datanodes available: 2 (2 total, 0 dead)", "", "Name: 13.233.233.54:50010", "Decommission Status : Normal", "Configured Capacity: 8577331200 (7.99 GB)", "DFS Used: 8192 (8 KB)", "Non DFS Used: 2135027712 (1.99 GB)", "DFS Remaining: 6442295296(6 GB)", "DFS Used%: 0%", "DFS Remaining%: 75.11%", "Last contact: Fri Feb 19 15:49:50 UTC 2021", "", "", "Name: 13.127.175.173:50010", "Decommission Status : Normal", "Configured Capacity: 8577331200 (7.99 GB)", "DFS Used: 8192 (8 KB)", "Non DFS Used: 2135093248 (1.99 GB)", "DFS Remaining: 6442229760(6 GB)", "DFS Used%: 0%", "DFS Remaining%: 75.11%", "Last contact: Fri Feb 19 15:49:50 UTC 2021"]}}

PLAY [tag_Name_Client] *********************************************************

TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host 13.127.94.119 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible/2.10/reference_appendices/interpreter_discovery.html for more information.
ok: [13.127.94.119]

TASK [hadoopclient : copying the jdk file] *************************************
changed: [13.127.94.119]

TASK [hadoopclient : copying the hadoop file] **********************************
changed: [13.127.94.119]

TASK [hadoopclient : checking the java file is already installed or not] *******
fatal: [13.127.94.119]: FAILED! => {"changed": true, "cmd": "java -version", "delta": "0:00:00.039097", "end": "2021-02-19 15:52:11.869482", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:52:11.830385", "stderr": "/bin/sh: java: command not found", "stderr_lines": ["/bin/sh: java: command not found"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [hadoopclient : debug] ****************************************************
skipping: [13.127.94.119]

TASK [hadoopclient : installing jdk file] **************************************
changed: [13.127.94.119]

TASK [hadoopclient : checking the hadoop software is installed or not] *********
fatal: [13.127.94.119]: FAILED! => {"changed": true, "cmd": "hadoop version", "delta": "0:00:00.038448", "end": "2021-02-19 15:52:22.414704", "msg": "non-zero return code", "rc": 127, "start": "2021-02-19 15:52:22.376256", "stderr": "/bin/sh: hadoop: command not found", "stderr_lines": ["/bin/sh: hadoop: command not found"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [hadoopclient : debug] ****************************************************
skipping: [13.127.94.119]

TASK [hadoopclient : installing hadoop file] ***********************************
changed: [13.127.94.119]

TASK [hadoopclient : updating the hdfs conf file] ******************************
changed: [13.127.94.119]

TASK [hadoopclient : updating core conf file] **********************************
changed: [13.127.94.119]

TASK [hadoopclient : showing the report of cluster] ****************************
changed: [13.127.94.119]

TASK [hadoopclient : debug] ****************************************************
ok: [13.127.94.119] => {"E": {"changed": true, "cmd": "hadoop dfsadmin -report", "delta": "0:00:01.277961", "end": "2021-02-19 15:52:34.016109", "failed": false, "rc": 0, "start": "2021-02-19 15:52:32.738148", "stderr": "", "stderr_lines": [], "stdout": "Configured Capacity: 17154662400 (15.98 GB)\nPresent Capacity: 12871831552 (11.99 GB)\nDFS Remaining: 12871815168 (11.99 GB)\nDFS Used: 16384 (16 KB)\nDFS Used%: 0%\nUnder replicated blocks: 0\nBlocks with corrupt replicas: 0\nMissing blocks: 0\n\n-------------------------------------------------\nDatanodes available: 2 (2 total, 0 dead)\n\nName: 13.233.233.54:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2141470720 (1.99 GB)\nDFS Remaining: 6435852288(5.99 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.03%\nLast contact: Fri Feb 19 15:52:32 UTC 2021\n\n\nName: 13.127.175.173:50010\nDecommission Status : Normal\nConfigured Capacity: 8577331200 (7.99 GB)\nDFS Used: 8192 (8 KB)\nNon DFS Used: 2141360128 (1.99 GB)\nDFS Remaining: 6435962880(5.99 GB)\nDFS Used%: 0%\nDFS Remaining%: 75.03%\nLast contact: Fri Feb 19 15:52:32 UTC 2021", "stdout_lines": ["Configured Capacity: 17154662400 (15.98 GB)", "Present Capacity: 12871831552 (11.99 GB)", "DFS Remaining: 12871815168 (11.99 GB)", "DFS Used: 16384 (16 KB)", "DFS Used%: 0%", "Under replicated blocks: 0", "Blocks with corrupt replicas: 0", "Missing blocks: 0", "", "-------------------------------------------------", "Datanodes available: 2 (2 total, 0 dead)", "", "Name: 13.233.233.54:50010", "Decommission Status : Normal", "Configured Capacity: 8577331200 (7.99 GB)", "DFS Used: 8192 (8 KB)", "Non DFS Used: 2141470720 (1.99 GB)", "DFS Remaining: 6435852288(5.99 GB)", "DFS Used%: 0%", "DFS Remaining%: 75.03%", "Last contact: Fri Feb 19 15:52:32 UTC 2021", "", "", "Name: 13.127.175.173:50010", "Decommission Status : Normal", "Configured Capacity: 8577331200 (7.99 GB)", "DFS Used: 8192 (8 KB)", "Non DFS Used: 2141360128 (1.99 GB)", "DFS Remaining: 6435962880(5.99 GB)", "DFS Used%: 0%", "DFS Remaining%: 75.03%", "Last contact: Fri Feb 19 15:52:32 UTC 2021"]}}

PLAY RECAP *********************************************************************
13.127.175.173             : ok=13   changed=11   unreachable=0    failed=0    skipped=3    rescued=0    ignored=2
13.127.94.119              : ok=11   changed=9    unreachable=0    failed=0    skipped=2    rescued=0    ignored=2
13.233.204.206             : ok=16   changed=12   unreachable=0    failed=0    skipped=1    rescued=0    ignored=2
13.233.233.54              : ok=13   changed=11   unreachable=0    failed=0    skipped=3    rescued=0    ignored=2
localhost                  : ok=8    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
As you can see, I asked for two Datanodes, so the playbook configured one instance as the master (Namenode), two instances as slaves (Datanodes), and one instance as the client.
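As an extra sanity check (not part of the playbook), you could log in to the client instance and push a small file into HDFS, then list it; the file and path names here are just examples:

hadoop fs -put /etc/hosts /test-hosts.txt
hadoop fs -ls /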
We can also check the Hadoop cluster report through the web portal, using the master's public IP and port 50070.
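For example, with the Namenode's public IP from the run above, the Hadoop 1.x NameNode status page would be at:

http://13.233.204.206:50070/dfshealth.jsp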
Everything is working fine and the Hadoop cluster is now ready to use.
Hope you find this article interesting.
Thank you !!