Hadoop Cluster Availability

What is Hadoop?

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. Hadoop is written in Java and developed by the Apache Software Foundation.

Hadoop Cluster

[Figure: a simple Hadoop cluster showing a client, one NameNode, and multiple DataNodes]

This figure shows a simple Hadoop cluster. First, the client: the client is whoever uploads, deletes, reads, or writes files on the cluster.

NameNode: Also known as the master node, it is the main component of a Hadoop cluster. All DataNodes connect to the one master node.

DataNode: Also known as a slave node. Each slave node shares its own storage with the master node.

Availability means the system (or its data) remains accessible even in the wake of a component failure in the system.

My team members:

  1. Hritik Kumar (Me - Team Leader)
  2. Ekansh (Learner success head)
  3. Satyansh Srivastava (Team member)
  4. Pravat Kumar Nath Sharma (Team member)
  5. Vijay (Team member)

Here we used the AWS EC2 service to launch 6 instances: 1 for the master node, 1 for the client, and 4 for slave nodes. All instances run Red Hat Enterprise Linux 8. We installed Hadoop 1.2.1 and JDK 8u171 on all 6 instances, and used PuTTY to connect to them.

Once installation was complete, we configured the hdfs-site.xml and core-site.xml files on every instance except the client; on the client node we configured only core-site.xml. (All these files are under the /etc/hadoop folder.)

hdfs-site.xml configuration for the NameNode:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>

core-site.xml configuration for the NameNode:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>

hdfs-site.xml configuration for all the DataNodes:

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>

core-site.xml configuration for all the DataNodes and client:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://54.145.79.242:9001</value>
  </property>
</configuration>

Here, 54.145.79.242 was the public IP of our NameNode at the time of testing.
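With the configuration in place, the cluster can be brought up. A minimal sketch of the commands involved, assuming Hadoop 1.2.1's scripts are on the PATH (run each command on the node indicated in its comment):

```shell
# On the NameNode: format the metadata directory (/nn) once, then start the daemon
hadoop namenode -format
hadoop-daemon.sh start namenode

# On each of the 4 DataNodes: start the DataNode daemon
hadoop-daemon.sh start datanode

# From the NameNode or client: confirm all 4 DataNodes have registered
hadoop dfsadmin -report
```

The dfsadmin report lists each live DataNode with its configured and used capacity, which is a quick sanity check before uploading any files.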

For testing purposes, we uploaded a 140 MB text file from the client side. This file was stored in 3 blocks, because the default block size in Hadoop version 1 is 64 MB: a 140 MB file fills two 64 MB blocks and spills 12 MB into a third.
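The block arithmetic can be sketched in a few lines of Python. The file and block sizes come from the setup above; the replication factor of 3 is HDFS's default, which we assume the cluster used:

```python
import math

file_size_mb = 140   # size of the uploaded text file
block_size_mb = 64   # default HDFS block size in Hadoop 1.x
replication = 3      # default HDFS replication factor (assumed)

# Number of blocks the file is split into: two full 64 MB blocks + one 12 MB tail
blocks = math.ceil(file_size_mb / block_size_mb)

# Total block replicas stored across the DataNodes
total_replicas = blocks * replication

print(blocks, total_replicas)  # → 3 9
```

So with 4 DataNodes and 9 block replicas to place, every block lives on 3 different nodes, which is what makes the failover below possible.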

The first tool we used for this task was tcpdump:

We observed that when we stopped one DataNode instance, another instance got connected and provided the service. When we stopped the second instance, a third instance got connected and served the client.
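A sketch of the capture we ran on the client to see which DataNode was serving the blocks. The interface name eth0 is an assumption; 50010 is the default DataNode data-transfer port in Hadoop 1.x:

```shell
# Watch block-transfer traffic between the client and whichever DataNode responds.
# Each source IP in the output identifies the DataNode currently serving the read.
sudo tcpdump -n -i eth0 port 50010
```

Re-running the read while stopping DataNodes one by one shows the source IP in the capture change, which is exactly the failover behavior described above.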

In this practical we saw seamless availability because of replication: replicas of each block are stored on different nodes, and the master node handles DataNode failure by redirecting the client to a surviving replica.
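The replication factor behind this behavior can be set explicitly in hdfs-site.xml. An illustrative fragment (3 is already the HDFS default, so our cluster most likely used this value implicitly):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```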

I would like to thank all my team members for actively participating and completing this task.

