How to provide elasticity to the storage of a Data Node in an HDFS cluster?
Krushna Prasad Sahoo

Have you ever wondered what to do next when the storage of the Data Nodes in a Hadoop distributed storage cluster gets exhausted or completely utilized?

How do we solve this challenge?

  • We could simply add more Data Nodes to the cluster and get more storage, right? That is pretty straightforward, but it has some disadvantages. The cluster gradually grows larger and becomes harder to manage, and it consumes more and more resources as well as power. And the Data Nodes that are already full can contribute nothing further in terms of storage. So what do we do?
  • Confused? Let me tell you the solution. Linux has a concept known as Logical Volume Management (LVM). Think of it this way: two physical hard disks, or any number of physical storage devices, can be pooled together and contribute their storage logically. This logical storage, or volume, behaves like a single storage device. Can you imagine what we can achieve with this?
Say my HDFS cluster has Data Nodes of 20GB each and is running smoothly. Suddenly, because of some use case, I need 15 more GB in a Data Node. With LVM I can increase the storage of that Data Node on the fly. Isn't that cool?

Now let's understand it step by step.

  • I'm assuming your Name Node and Data Node are already configured (a minimal sample of the Data Node's storage configuration is sketched after the commands below). So let's start them.
#  hadoop-daemon.sh start namenode       // to start name node

#  hadoop-daemon.sh start datanode       // to start data node
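
Before starting the daemons, the Data Node needs to know where to keep its blocks. As a minimal sketch (assuming Hadoop 2.x/3.x property names, and /dnode as the storage directory, the same path we mount the LVM volume on later), hdfs-site.xml on the Data Node would contain something like:

<!-- hdfs-site.xml on the Data Node (sketch) -->
<property>
    <name>dfs.datanode.data.dir</name>     <!-- named dfs.data.dir on old Hadoop 1.x setups -->
    <value>/dnode</value>
</property>

core-site.xml points the node at the Name Node through fs.defaultFS (fs.default.name on Hadoop 1.x).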

  • Now, if all of your configuration is correct and the cluster is up, we can see the cluster report from any of the nodes (a small filter sketch follows the command).
#  hadoop dfsadmin -report
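
If you only want to eyeball the capacity numbers, you can filter the report (a small sketch; on newer Hadoop versions the equivalent command is hdfs dfsadmin -report, and the label text may vary slightly between versions):

#  hadoop dfsadmin -report | grep -E 'Configured Capacity|DFS Remaining'       // just the capacity lines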

  • Now a requirement has come up for more storage, so we add two hard disks (/dev/sdb and /dev/sdc) of 30GB each to the Data Node. Here we'll first create Physical Volumes. Using the Physical Volumes we'll create a Volume Group, then we'll carve a Logical Volume out of it. Finally, to use it, we'll have to format and mount it; along the way we can verify each step with the LVM reporting commands sketched below. Later we'll see how to increase the volume on the fly. Let's get into the commands.
#  pvcreate  /dev/sdb        

#  pvcreate  /dev/sdc        

Physical Volumes created.
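
As an optional check (not part of the original flow), you can confirm that both disks are now registered as LVM Physical Volumes:

#  pvs        // or pvdisplay ; lists /dev/sdb and /dev/sdc with their sizes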

#  vgcreate myhadoopvg  /dev/sdb  /dev/sdc

Volume Group created, named "myhadoopvg".
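
Before carving out the Logical Volume, it's worth checking the Volume Group's total and free space; here we expect roughly 60GB in total from the two 30GB disks:

#  vgs myhadoopvg        // or vgdisplay myhadoopvg ; shows VSize and VFree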

#  lvcreate --size 40G  --name myhadooplv  myhadoopvg

Logical Volume created, named "myhadooplv", 40GB in size.

#  mkfs.ext4  /dev/myhadoopvg/myhadooplv

Logical Volume formatted with the ext4 filesystem.

#  mount  /dev/myhadoopvg/myhadooplv   /dnode

Logical Volume mounted on the Data Node's storage directory, i.e. "/dnode".
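
At this point the Data Node directory is backed by the Logical Volume. A couple of optional checks, plus a persistence note (the fstab entry below is a sketch; adjust the options to your own setup):

#  df -h /dnode        // should report roughly 40GB on /dev/mapper/myhadoopvg-myhadooplv
#  lsblk               // shows the myhadooplv volume sitting on top of sdb and sdc

To keep the mount across reboots, a line like this can be added to /etc/fstab:

/dev/myhadoopvg/myhadooplv   /dnode   ext4   defaults   0   0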

  • Now the total size of the Volume Group is 30 + 30 = 60GB, but we have allocated only 40GB to the Logical Volume. Assume we have to increase the capacity of the Data Node, say by another 12GB. Let's add some more to the volume.
#  lvextend  --size +12G  /dev/myhadoopvg/myhadooplv

#  resize2fs  /dev/myhadoopvg/myhadooplv

  • The first command extends the Logical Volume by 12GB on the fly, and the second command resizes the ext4 filesystem so the Data Node can actually use the extra space.
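
You can verify the result immediately and confirm that HDFS sees the extra space (a quick sketch; lvextend also accepts -r/--resizefs to run the filesystem resize in the same step):

#  lvs myhadoopvg            // myhadooplv should now show about 52G
#  df -h /dnode              // the mounted filesystem grows to roughly 52GB
#  hadoop dfsadmin -report   // the Data Node's Configured Capacity goes up accordingly
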
So finally we solved the challenge and achieved storage elasticity in the HDFS cluster.

One last piece of information: this LVM topic belongs to the Linux OS, and it takes a fixed sequence of steps to finally create and use a Logical Volume. I have automated this complete task using Python scripting. You can visit the YouTube link below to check it out.

Hope this helps you.
Thank you so much, guys!






