Configure Hadoop and start cluster services using Ansible Playbook
ARTH - Task 11.1
Task Description
Configure Hadoop and start cluster services using Ansible Playbook
Introduction To Hadoop -
* Big Data is not a technology; it is an umbrella term for the problems that arise from storing and processing huge amounts of data in many different formats.
* Visit this article to learn more about Big Data -
* Hadoop is used to solve the Big Data problem. Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
* To learn how to configure a Hadoop cluster, visit this article -
* In this practical, I will create all Ansible playbooks in "/root/HadoopCluster/" -
* For this practical, I will use three virtual machines on my local system: the Ansible controller node, the NameNode (IP - 192.168.43.129) and the DataNode (IP - 192.168.43.94).
* Ansible configuration file "/etc/ansible/ansible.cfg" at the controller node -
* Inventory file "/root/HadoopCluster/hosts.txt" at the controller node (in my case, I created two groups of managed nodes: first --> NameNode, second --> DataNode) -
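The same two groups can also be written in Ansible's YAML inventory format. The sketch below is a hypothetical "hosts.yml", not the actual "hosts.txt" used here; it only lists the two IPs from this practical and deliberately leaves out connection variables (user, password or SSH key), which you would add for your own setup. Note that Ansible's YAML inventory plugin only picks up files ending in .yml, .yaml or .json by default, so the file name matters.

```yaml
# hosts.yml - hypothetical YAML equivalent of the INI inventory hosts.txt
# Connection settings (ansible_user, SSH key, etc.) are intentionally omitted.
all:
  children:
    NameNode:            # group targeted by "ansible NameNode ..."
      hosts:
        192.168.43.129:
    DataNode:            # group targeted by "ansible DataNode ..."
      hosts:
        192.168.43.94:
```

It can be checked the same way as the INI file, e.g. `# ansible all -i hosts.yml --list-hosts`.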
Visit this article of mine to learn about Ansible playbooks and grouping of inventories -
* Check the list of managed nodes with the ansible command -
To get the list of all managed node IPs -
# ansible all --list-hosts
To get the list of "NameNode" group IPs -
# ansible NameNode --list-hosts
To get the list of "DataNode" group IPs -
# ansible DataNode --list-hosts
* Check connectivity between the controller node and all managed nodes -
# ansible all -m ping
* For this practical, I will create several files, but the main file (playbook) is "hadoopcluster.yml". We will create the files "hadoopcluster.yml", "install_package.yml", "Configure_Node.yml", "namenode.yml" and "datanode.yml". Overview of our Ansible playbooks -
Step - 1 Create "hadoopcluster.yml" file -
1(A) Task 1 -
1(B) Task 2 -
Visit this article to learn more about loops -
1(C) Task 3 -
1(D) Task 4 -
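A minimal sketch of how a main playbook like "hadoopcluster.yml" could tie the other files together. It assumes one play for all managed nodes plus one play per inventory group, and uses include_tasks with a loop; the task names and overall structure are illustrative assumptions, not a copy of the four tasks above.

```yaml
# hadoopcluster.yml - hypothetical sketch of the main playbook
- hosts: all
  vars_files:
    - hadoop_var.yml
  tasks:
    # Run the shared task files on every node, one after the other
    - name: Install packages and write configuration on every node
      include_tasks: "{{ item }}"
      loop:
        - install_package.yml
        - Configure_Node.yml

- hosts: NameNode
  vars_files:
    - hadoop_var.yml
  tasks:
    - name: Format and start the NameNode
      include_tasks: namenode.yml

- hosts: DataNode
  vars_files:
    - hadoop_var.yml
  tasks:
    - name: Start the DataNode
      include_tasks: datanode.yml
```

Splitting the NameNode and DataNode work into separate plays means each host only runs the task file for its own role, and because plays run in order, the DataNode play starts after the NameNode play has finished.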
Step - 2 Gather All Data for Setting Up the Hadoop Cluster -
* I created a separate variable file "hadoop_var.yml".
2(A) "Hadoop_Package_Requirement" is a variable that holds the required package names and their install commands -
2(B) . "Node_MetaData" is a variable that holds some metadata about the nodes -
2(C) . "Node" is a variable that holds the Hadoop configuration properties for the NameNode and DataNodes -
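A minimal sketch of what "hadoop_var.yml" could look like. Only the three top-level variable names come from this article; every key under them (name, command, creates, namenode_ip, port, dir_property, dir), the port 9001, the /nn and /dn directories, the /root/ copy destination, the --force flag and the installed-file paths used for creates are assumptions for illustration.

```yaml
# hadoop_var.yml - hypothetical sketch; only the three top-level names are from the article
Hadoop_Package_Requirement:
  - name: jdk-8u171-linux-x64.rpm
    command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
    creates: /usr/java/jdk1.8.0_171-amd64        # assumed JDK install path
  - name: hadoop-1.2.1-1.x86_64.rpm
    command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force   # --force is an assumption
    creates: /usr/bin/hadoop                     # assumed Hadoop binary path

Node_MetaData:
  namenode_ip: 192.168.43.129    # NameNode IP from this practical
  port: 9001                     # assumed NameNode RPC port

Node:
  NameNode:
    dir_property: dfs.name.dir   # Hadoop 1.x property for NameNode metadata
    dir: /nn                     # assumed directory
  DataNode:
    dir_property: dfs.data.dir   # Hadoop 1.x property for DataNode block storage
    dir: /dn                     # assumed directory
```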
Step - 3 Write tasks in a separate file "install_package.yml" to install the packages required to establish the Hadoop cluster -
3(A) . Task 1
3(B) . Task 2
I also have "hadoop-1.2.1-1.x86_64.rpm" and "jdk-8u171-linux-x64.rpm" in the "/root/HadoopCluster" directory.
3(C) . Task 3
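A minimal sketch of "install_package.yml" as a plain task file (no play header), assuming each entry of Hadoop_Package_Requirement has hypothetical keys name, command and creates, and that the RPMs are copied to /root/ on the managed node. The creates argument makes Ansible skip the command when the given path already exists, so re-running the playbook does not try to install the same RPM twice.

```yaml
# install_package.yml - hypothetical sketch of the package installation tasks
# Assumes entries like {name: ..., command: ..., creates: ...} in Hadoop_Package_Requirement.
- name: Copy the JDK and Hadoop RPMs from the controller node
  copy:
    src: "/root/HadoopCluster/{{ item.name }}"
    dest: "/root/{{ item.name }}"
  loop: "{{ Hadoop_Package_Requirement }}"

- name: Install each package with its own command (skipped if already installed)
  command: "{{ item.command }}"
  args:
    creates: "{{ item.creates }}"
  loop: "{{ Hadoop_Package_Requirement }}"
```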
Step - 4 Create "Configure_Node.yml" -
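A minimal sketch of "Configure_Node.yml" for Hadoop 1.x, assuming the configuration directory is /etc/hadoop and the hypothetical variable keys namenode_ip, port, dir_property and dir. It picks the node's role from Ansible's built-in group_names variable and writes hdfs-site.xml and core-site.xml with the copy module's content option; the {{ }} expressions inside content are rendered by Ansible before the file is written.

```yaml
# Configure_Node.yml - hypothetical sketch; runs on both node groups
- name: Decide whether this host is the NameNode or a DataNode
  set_fact:
    node_role: "{{ 'NameNode' if 'NameNode' in group_names else 'DataNode' }}"

- name: Create the storage directory for this node
  file:
    path: "{{ Node[node_role].dir }}"
    state: directory

- name: Write hdfs-site.xml (dfs.name.dir or dfs.data.dir)
  copy:
    dest: /etc/hadoop/hdfs-site.xml
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>{{ Node[node_role].dir_property }}</name>
          <value>{{ Node[node_role].dir }}</value>
        </property>
      </configuration>

- name: Write core-site.xml pointing every node at the NameNode
  copy:
    dest: /etc/hadoop/core-site.xml
    content: |
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://{{ Node_MetaData.namenode_ip }}:{{ Node_MetaData.port }}</value>
        </property>
      </configuration>
```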
Step - 5 Create "namenode.yml" file -
5(A) . Task 1
5(B) . Task 2
5(C) . Task 3
5(D) . Task 4
5(E) . Task 5
5(F) . Task 6
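A minimal sketch of "namenode.yml", assuming the hypothetical Node.NameNode.dir key and that hadoop and hadoop-daemon.sh are on the remote PATH. The format step uses creates on the "current" subdirectory that formatting produces, so an already formatted NameNode is not formatted again, and the jps output is checked before starting the daemon.

```yaml
# namenode.yml - hypothetical sketch; runs only on the NameNode group
- name: Format the NameNode directory (only if it was never formatted)
  command: hadoop namenode -format
  args:
    creates: "{{ Node.NameNode.dir }}/current"

- name: List running Java processes
  command: jps
  register: jps_out
  changed_when: false

- name: Start the NameNode daemon if it is not already running
  command: hadoop-daemon.sh start namenode
  when: "'NameNode' not in jps_out.stdout"
```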
Step - 6 Create "datanode.yml" -
6(A) . Task 1
6(B) . Task 2
6(C) . Task 3
6(D) . Task 4
6(E) . Task 5
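A minimal sketch of "datanode.yml", assuming the hypothetical Node_MetaData.namenode_ip and Node_MetaData.port keys and that hadoop-daemon.sh is on the remote PATH. The wait_for task simply waits until the NameNode port accepts connections before the DataNode daemon is started.

```yaml
# datanode.yml - hypothetical sketch; runs only on the DataNode group
- name: Wait until the NameNode is reachable on its RPC port
  wait_for:
    host: "{{ Node_MetaData.namenode_ip }}"
    port: "{{ Node_MetaData.port }}"
    timeout: 60

- name: List running Java processes
  command: jps
  register: jps_out
  changed_when: false

- name: Start the DataNode daemon if it is not already running
  command: hadoop-daemon.sh start datanode
  when: "'DataNode' not in jps_out.stdout"
```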
Step - 7 Run "hadoopcluster.yml" playbook -
# ansible-playbook hadoopcluster.yml
Step - 8 Check whether the Hadoop cluster setup is done -
At NameNode -
# jps
# hadoop dfsadmin -report
At DataNode -
# jps
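The same checks can also be run from the controller node. A minimal sketch of an extra playbook (the file name "check_cluster.yml" is hypothetical and not part of this practical) that runs jps on every node and the dfsadmin report on the NameNode, then prints the output.

```yaml
# check_cluster.yml - hypothetical verification playbook
- hosts: all
  tasks:
    - name: List running Java processes (expect NameNode or DataNode)
      command: jps
      register: jps_out
      changed_when: false

    - name: Show jps output
      debug:
        var: jps_out.stdout_lines

- hosts: NameNode
  tasks:
    - name: Ask the NameNode for the cluster report
      command: hadoop dfsadmin -report
      register: report
      changed_when: false

    - name: Show the report
      debug:
        var: report.stdout_lines
```

Run it like the main playbook: `# ansible-playbook check_cluster.yml`.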
* Upload a file from the NameNode to the Hadoop cluster (to check) -
# hadoop fs -put home.html /
# hadoop fs -ls /
Our Hadoop cluster is working well.
The task is done successfully.