Ansible Automation for Hadoop Cluster

Hadoop:

Apache Hadoop is an open-source framework used to efficiently store and process large datasets, ranging in size from gigabytes to petabytes. Instead of using one large computer to store and process the data, Hadoop clusters multiple computers together so that massive datasets can be analyzed in parallel, and therefore more quickly.

  1. NameNode: NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes/slave nodes.
  2. DataNode: Data Nodes are the slave nodes in HDFS. DataNodes are responsible for storing actual data.

Ansible:

Ansible is a software tool that provides simple but powerful automation for cross-platform computer support. Ansible doesn't depend on agent software and has no additional security infrastructure, so it's easy to deploy.

Because Ansible is all about automation, it requires instructions to accomplish each job, with everything written down in simple, script-like form (playbooks).

Task:

Configure Hadoop and start the cluster services.

Solution:

To configure Hadoop, I'll use one NameNode and one DataNode, so Ansible has two managed nodes. Accordingly, we list both managed nodes in the inventory file.
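For example, an inventory with one NameNode and one DataNode might look like this (the IP addresses, user, and password below are placeholders, not values from the article):

```ini
# Hypothetical inventory: two managed nodes, grouped by role
[namenode]
192.168.43.10  ansible_user=root  ansible_ssh_pass=redhat

[datanode]
192.168.43.11  ansible_user=root  ansible_ssh_pass=redhat
```

Grouping the hosts by role lets the playbook target `namenode` and `datanode` separately.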

Now we can configure the Ansible configuration file, ansible.cfg. We can use the pre-created /etc/ansible/ansible.cfg, but we can also create one in our workspace; when a playbook is run from the workspace, Ansible considers that local file first.
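A minimal workspace ansible.cfg could look like the following (the inventory path is an illustrative assumption):

```ini
# Hypothetical ansible.cfg in the project workspace;
# Ansible reads this before /etc/ansible/ansible.cfg
[defaults]
inventory = ./inventory        # path to the inventory file
host_key_checking = false      # skip SSH host-key prompts on first connect
```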

To test connectivity to the managed nodes, we can run:

#ansible all -m ping

Now it's time to write the Ansible playbook.

GitHub URL:- https://github.com/yashdwi05/ansible-hadoop.git
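A rough sketch of what such a playbook might contain is shown below; the host group, file paths, and commands are assumptions for illustration, and the repository above holds the actual script:

```yaml
# Hypothetical sketch of the NameNode portion of the playbook.
# Paths, templates, and commands are assumptions, not the repo's exact tasks.
- hosts: namenode
  tasks:
    - name: Deploy core-site.xml with the NameNode address
      template:
        src: core-site.xml.j2
        dest: /etc/hadoop/core-site.xml

    - name: Deploy hdfs-site.xml with the NameNode storage directory
      template:
        src: hdfs-site.xml.j2
        dest: /etc/hadoop/hdfs-site.xml

    - name: Format the NameNode (first run only)
      command: hdfs namenode -format -force

    - name: Start the NameNode daemon
      command: hadoop-daemon.sh start namenode
```

A similar play would target the `datanode` group and start the DataNode daemon instead.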

The GitHub repository above contains the full Ansible script; when the playbook runs, all tasks complete successfully.

After the playbook has run completely, we can also manually check that the NameNode and DataNode processes are running. Note that Hadoop runs on top of Java, so Java must be installed first; once Java is installed, Hadoop works fine.
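The Java prerequisite could be handled by a task like the following (the package name is an assumption and varies by distribution):

```yaml
# Hypothetical task: install a JDK on all nodes before Hadoop,
# since Hadoop runs on top of Java. Package name is an assumption.
- hosts: all
  tasks:
    - name: Install OpenJDK
      package:
        name: java-1.8.0-openjdk
        state: present
```

After the services start, running `jps` on each node should list NameNode or DataNode among the Java processes.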

Thank You!
