Hadoop Cluster Availability
What is Hadoop?
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. Hadoop is written in Java and developed by the Apache Software Foundation.
Hadoop Cluster
The figure above shows a simple Hadoop cluster. Let's start with the client: the client is anyone who uploads, deletes, reads, or writes files on the Hadoop cluster.
NameNode: It is also known as the master node and is the main component of a Hadoop cluster. All DataNodes connect to the single master node.
DataNode: It is also known as a slave node. Each slave node shares its own storage with the cluster through the master node.
Availability means the system and its data remain accessible in the wake of a component failure in the system.
My team members:
- Hritik Kumar (Me - Team Leader)
- Ekansh (Learner success head)
- Satyansh Srivastava (Team member)
- Pravat Kumar Nath Sharma (Team member)
- Vijay (Team member)
We used the AWS EC2 service to launch 6 instances: 1 for the master node, 1 for the client, and 4 for slave nodes. All of these instances run Red Hat Enterprise Linux 8. We installed Hadoop 1.2.1 and JDK 8u171 on all 6 instances and used PuTTY to connect to them.
Once the installation was complete, we configured the hdfs-site.xml and core-site.xml files on every instance except the client; on the client we only configured core-site.xml. (All these files are under the /etc/hadoop folder.)
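Below is a rough sketch of the installation steps on one instance; the exact RPM file names are assumptions based on the versions we used:
rpm -ivh jdk-8u171-linux-x64.rpm       # install the JDK (file name is an assumption)
rpm -ivh hadoop-1.2.1-1.x86_64.rpm     # install Hadoop 1.2.1 from the Apache RPM
java -version                          # verify Java is installed
hadoop version                         # verify Hadoop is installed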
hdfs-site.xml configuration for the NameNode:
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/nn</value>
    </property>
</configuration>
core-site.xml configuration for the NameNode:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://0.0.0.0:9001</value>
    </property>
</configuration>
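With these two files in place on the master node, the NameNode can be formatted and started. A minimal sketch of the commands (the /nn directory matches dfs.name.dir above):
hadoop namenode -format          # creates the metadata store under /nn
hadoop-daemon.sh start namenode  # starts the NameNode service
jps                              # should now list a NameNode process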
hdfs-site.xml configuration for all the DataNodes:
<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/dn</value>
    </property>
</configuration>
core-site.xml configuration for all the DataNodes and client:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://54.145.79.242:9001</value>
    </property>
</configuration>
Here 54.145.79.242 was the public IP of our NameNode at the time of testing.
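Once core-site.xml on every slave points at the NameNode, each DataNode is started the same way. A sketch of bringing up a slave and confirming from the master that all four have joined:
hadoop-daemon.sh start datanode   # run on each slave node
hadoop dfsadmin -report           # run on the master; should report 4 live DataNodes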
For testing, we uploaded a 140 MB text file from the client. This file was stored in 3 blocks because the default block size in Hadoop version 1 is 64 MB; since 140 MB is more than two full blocks (2 × 64 MB = 128 MB), the file needs three blocks.
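A sketch of the upload and of checking the block layout afterwards; the file name testfile.txt is just an example:
hadoop fs -put testfile.txt /testfile.txt             # run on the client
hadoop fsck /testfile.txt -files -blocks -locations   # run on the master; lists the 3 blocks and the DataNodes holding them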
The first tool we used for this task is tcpdump.
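A sketch of the kind of filter we can run on the client to see which DataNode it is actually talking to; port 50010 is the default DataNode data-transfer port in Hadoop 1.x, and eth0 is an assumption about the instance's network interface:
tcpdump -i eth0 -n tcp port 50010   # shows the client's connections to DataNodes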
We observed that when we stopped one DataNode instance, another instance connected and kept serving the data. When we stopped a second instance, a third one connected and continued serving the client.
In this practical we saw seamless availability because of replication: replicas of each block are stored on different DataNodes, and the master node handles DataNode failures.
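As a simple availability check, the file can still be read from the client even after a DataNode holding one of its replicas is stopped; a sketch, reusing the example file name from above:
hadoop fs -cat /testfile.txt | wc -c   # still prints the full byte count of the 140 MB file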
I would like to thank all my team members for actively participating and completing this task.