Replication in Hadoop

Replication of data is extremely important in today's world. It ensures durability of data, eliminates single points of failure, and makes your setup fault tolerant.

The replication factor is the number of DataNodes a block of data is copied to.

It is up to the client to decide the replication factor for a file, depending on the file's importance; by default its value is 3. The client can set it while uploading the file:

# hadoop fs -Ddfs.replication=4 -put t3.txt /

Or configure it in /etc/hadoop/hdfs-site.xml:

...
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>4</value>
  </property>
</configuration>
...
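
The replication factor of a file that is already in HDFS can also be changed afterwards with setrep. A minimal sketch, reusing the t3.txt file from above (-w waits until the new replication level is reached):

# hadoop fs -setrep -w 4 /t3.txt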

I explained earlier how data transfer takes place directly between the client and DataNodes: https://www.dhirubhai.net/pulse/hadoop-breaking-myths-proof-ishan-singhal


How does replication happen?

The client receives the IPs of the target DataNodes from the NameNode and copies the data block to the first DataNode. That DataNode then forwards the block to the next DataNode, and so on, until all replicas are written.
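
One way to verify where the replicas actually landed is to ask the NameNode directly with fsck. A quick check, reusing the t3.txt file uploaded above:

# hdfs fsck /t3.txt -files -blocks -locations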


Reading a file: what happens if a DataNode shuts down or crashes during a read?

# hadoop fs -cat /t3.txt

By capturing traffic on port 9001 with tcpdump, we can observe the client fetching the block locations from the NameNode.
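
A sketch of the capture used here, assuming the NameNode RPC service is on port 9001 and the network interface is eth0:

# tcpdump -n -i eth0 tcp port 9001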


By observing port 50010 (the DataNode data-transfer port), I see that the client is reading the data from DataNode 3, so I decide to terminate that node midway through the read.
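
The corresponding capture on the data-transfer side, under the same interface assumption:

# tcpdump -n -i eth0 tcp port 50010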


The data read stops.


After a while, I see that the data transfer continues from DataNode 2, which holds another replica of the block.
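
To confirm the cluster's view of the terminated node, one can list the DataNodes the NameNode currently knows about, along with their last-contact times (a sketch, assuming admin access on the NameNode):

# hdfs dfsadmin -report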


Hence we see how replicas provide fault tolerance.
