Ansible Automation for Hadoop Cluster

Hadoop:

Apache Hadoop is an open-source framework used to efficiently store and process large datasets, ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers together so that massive datasets can be analyzed in parallel, and therefore more quickly.

  1. NameNode: NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes/slave nodes.
  2. DataNode: Data Nodes are the slave nodes in HDFS. DataNodes are responsible for storing actual data.

Ansible:

Ansible is a software tool that provides simple but powerful automation for cross-platform computer support. Ansible doesn't depend on agent software and has no additional security infrastructure, so it's easy to deploy.

Because Ansible is all about automation, it requires instructions to accomplish each job, with everything written down in simple, script-like form (playbooks).

Task:

Configure Hadoop and start the cluster services.

Solution:

To configure Hadoop, I'll use one NameNode and one DataNode, so Ansible has two managed nodes. Accordingly, we list both managed nodes in the inventory file.
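For example, an inventory with one NameNode and one DataNode might look like this (the IP addresses, user, and password below are placeholders, not values from the article):

```ini
# Hypothetical inventory: two managed nodes, grouped by role
[namenode]
192.168.43.10  ansible_user=root  ansible_ssh_pass=redhat

[datanode]
192.168.43.11  ansible_user=root  ansible_ssh_pass=redhat
```

Grouping the hosts by role lets the playbook target `namenode` and `datanode` separately.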

Now we can configure the Ansible configuration file, ansible.cfg. We can use the pre-created /etc/ansible/ansible.cfg, but we can also create one in our workspace; when a playbook is run from the workspace, Ansible considers that local file first.
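A minimal workspace ansible.cfg could look like the following (the inventory path is an illustrative assumption):

```ini
# Hypothetical ansible.cfg in the project workspace;
# Ansible reads this before /etc/ansible/ansible.cfg
[defaults]
inventory = ./inventory        # path to the inventory file
host_key_checking = false      # skip SSH host-key prompts on first connect
```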

To test connectivity to the managed nodes, we can run:

#ansible all -m ping

Now it's time to write the Ansible playbook.

GitHub URL:- https://github.com/yashdwi05/ansible-hadoop.git
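A rough sketch of what such a playbook might contain is shown below; the host group, file paths, and commands are assumptions for illustration, and the repository above holds the actual script:

```yaml
# Hypothetical sketch of the NameNode portion of the playbook.
# Paths, templates, and commands are assumptions, not the repo's exact tasks.
- hosts: namenode
  tasks:
    - name: Deploy core-site.xml with the NameNode address
      template:
        src: core-site.xml.j2
        dest: /etc/hadoop/core-site.xml

    - name: Deploy hdfs-site.xml with the NameNode storage directory
      template:
        src: hdfs-site.xml.j2
        dest: /etc/hadoop/hdfs-site.xml

    - name: Format the NameNode (first run only)
      command: hdfs namenode -format -force

    - name: Start the NameNode daemon
      command: hadoop-daemon.sh start namenode
```

A similar play would target the `datanode` group and start the DataNode daemon instead.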

The GitHub repository above contains the full Ansible script; when the playbook runs, all tasks complete successfully.

After the playbook has run completely, we can also manually check that the NameNode and DataNode processes are running. Note that Hadoop runs on top of Java, so Java must be installed first; once Java is installed, Hadoop works fine.
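The Java prerequisite could be handled by a task like the following (the package name is an assumption and varies by distribution):

```yaml
# Hypothetical task: install a JDK on all nodes before Hadoop,
# since Hadoop runs on top of Java. Package name is an assumption.
- hosts: all
  tasks:
    - name: Install OpenJDK
      package:
        name: java-1.8.0-openjdk
        state: present
```

After the services start, running `jps` on each node should list NameNode or DataNode among the Java processes.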

Thank You!
