Let's research and let the world know about the Myths of Hadoop

A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. Such clusters run Hadoop's open source distributed processing software on low-cost commodity computers.

Task 4.1 :- Individual/Team task:

In a Hadoop cluster, how do you contribute a limited/specific amount of storage as a slave (DataNode) to the cluster?

Task 4.2 :- Team task:

According to popular articles, Hadoop uses the concept of parallelism to upload the split data while addressing the Velocity problem.

Research with your team and evaluate this statement with proper proof.

Solution:

In a Hadoop cluster, how do you contribute a limited/specific amount of storage as a slave (DataNode) to the cluster?

First of all, we have to set up a Hadoop cluster. Here I am using 1 NameNode and 1 DataNode.

Configure the NameNode:

We need the JDK and Hadoop packages to set up the Hadoop cluster. I have already transferred both packages to the EC2 instance.


Now install the JDK and Hadoop packages. The JDK is installed first because Hadoop requires it.

rpm -ivh jdk-8u171-linux-x64.rpm

java -version      //for checking that Java is installed

rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force

hadoop version      //for checking that Hadoop is installed

Now we update the hdfs-site.xml and core-site.xml files in /etc/hadoop.

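The screenshots of these files are not available here, so below is a minimal sketch of what the two NameNode files typically look like in Hadoop 1.x. The port 9001 and the 0.0.0.0 bind address are assumptions; /nn is the metadata directory created in the next step.

/etc/hadoop/hdfs-site.xml:

<configuration>
   <property>
      <name>dfs.name.dir</name>
      <value>/nn</value>   <!-- NameNode metadata directory, created below -->
   </property>
</configuration>

/etc/hadoop/core-site.xml:

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://0.0.0.0:9001</value>   <!-- assumed port; 0.0.0.0 accepts connections on any interface -->
   </property>
</configuration>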

Now we have to create a directory named /nn, which the NameNode will use for its metadata, and format the NameNode.

mkdir /nn

hadoop namenode -format

Now start the NameNode by typing hadoop-daemon.sh start namenode, and see the report of DataNodes connected to the NameNode by typing hadoop dfsadmin -report.

hadoop-daemon.sh start namenode  //for starting the namenode

hadoop dfsadmin -report  //for showing information about the connected datanodes

Configure the DataNode:

On the DataNode, I attached an additional 4 GiB EBS volume to the instance; this is the storage I want to contribute to the cluster.


Now type the fdisk -l command to see all the disks attached to the instance.

fdisk -l

Now create a partition on this 4 GiB volume. fdisk is the command used to create partitions on a block device; the interactive session is sketched after the command below.

fdisk /dev/xvdf   //for creating the partition
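The interactive session was shown in a screenshot; a typical sequence to create one primary partition spanning the whole 4 GiB disk looks like this (the keystrokes follow standard fdisk prompts, not the original output):

n         //new partition
p         //primary partition
1         //partition number 1
(Enter)   //accept the default first sector
(Enter)   //accept the default last sector, using the whole disk
w         //write the partition table and exit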

Now check the disks again by typing the fdisk -l command; the new partition /dev/xvdf1 should be listed.

fdisk -l

Now format this partition:

mkfs.ext4  /dev/xvdf1

Now I create a directory named /dn and mount the 4 GiB partition on it; this directory is the storage that the DataNode will contribute to the cluster.

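The exact commands were in the screenshot; a minimal sketch of this step is:

mkdir /dn               //create the directory for datanode storage
mount /dev/xvdf1 /dn    //mount the 4GiB partition on /dn
df -h /dn               //verify that /dn now sits on the 4GiB partition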

Now we come to the DataNode setup. For this I again need the JDK and Hadoop packages; I have already transferred both to the EC2 instance.


Now install the JDK and Hadoop packages. As before, the JDK is installed first because Hadoop requires it.

rpm -ivh jdk-8u171-linux-x64.rpm

java -version      //for checking that Java is installed

rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force

hadoop version      //for checking that Hadoop is installed

Now we update the hdfs-site.xml and core-site.xml files in /etc/hadoop so that the DataNode contributes the /dn directory mounted above and connects to the NameNode.

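Again the screenshots are unavailable, so here is a minimal sketch of the two DataNode files for Hadoop 1.x. <NameNode-IP> is a placeholder for the NameNode's public IP, and the port must match the NameNode's core-site.xml.

/etc/hadoop/hdfs-site.xml:

<configuration>
   <property>
      <name>dfs.data.dir</name>
      <value>/dn</value>   <!-- the mounted 4GiB partition; caps the storage this node can contribute -->
   </property>
</configuration>

/etc/hadoop/core-site.xml:

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://<NameNode-IP>:9001</value>   <!-- placeholder IP; port must match the NameNode -->
   </property>
</configuration>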

Now start the DataNode by typing hadoop-daemon.sh start datanode, and see the report of DataNodes connected to the NameNode by typing hadoop dfsadmin -report.

hadoop-daemon.sh start datanode  //for starting the datanode

hadoop dfsadmin -report  //the report should now show this datanode with roughly 4GiB of configured capacity

Because the DataNode stores its blocks in /dn, which is a mount point on the 4 GiB partition, the node can contribute at most 4 GiB to the cluster. In this way, the task of contributing a limited/specific amount of storage from a DataNode to a Hadoop cluster is successfully completed.

Solution:

According to popular articles, Hadoop uses the concept of parallelism to upload the split data while addressing the Velocity problem.

I will prove here that this is a wrong statement/assumption.

For this I set up a Hadoop cluster with 1 NameNode, 4 DataNodes, and 1 client, configured everything, and made it ready for use.

Monitor port 50010, the port DataNodes use to transfer block data, for data packet transfer.

tcpdump -i eth0 port 50010 shows the incoming and outgoing traffic on port 50010.

tcpdump -i eth0 port 50010

Here I set up the complete cluster and ran the tcpdump command on all the DataNodes as well as the NameNode. I have already created a file a.txt (182 MiB in size). Now I put this file into the Hadoop cluster through the client.

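The upload itself is a standard HDFS put from the client; the screenshot with the exact command is not available, and the target path / is an assumption:

hadoop fs -put a.txt /   //upload the 182MiB file from the client into HDFS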

In the tcpdump output, you can see that the data is received by DataNode 2.


From the tcpdump output you can see the sequence: only after DataNode 2 has saved a block does the next block get stored on DataNode 1, and the block after that is again stored on DataNode 2. Assuming the default Hadoop 1.x block size of 64 MB, the 182 MiB file splits into three blocks, and the client writes them one after another rather than simultaneously.

Hence I proved that Hadoop does not use the concept of parallelism to upload the split data: the blocks are written serially, one at a time.

Thanks for reading the article!!!

