How to provide elasticity to the storage of a Data Node in an HDFS cluster?
Krushna Prasad Sahoo

Have you ever wondered what to do next when the storage of the Data Nodes in a Hadoop distributed storage cluster gets exhausted or completely utilized?

How do we solve this challenge?

  • We could simply add more Data Nodes to the cluster and get more storage, right? That is pretty straightforward, but it has some disadvantages. The cluster gradually grows larger and becomes harder to manage, and it consumes more and more resources as well as power. And the Data Nodes that are already full can contribute nothing further in terms of storage. So what do we do?
  • Confused? Let me tell you the solution. Linux has a concept known as Logical Volume Management (LVM). Think of it this way: two physical hard disks, or any number of physical storage devices, can be pooled together and contribute their storage logically. This logical storage, or volume, behaves like a single storage device. Can you imagine what we can achieve with this?
Say my HDFS cluster has Data Nodes of 20GB each and is running smoothly. Suddenly, because of some use case, I need 15 more GB in a Data Node. With LVM I can increase the storage of that Data Node on the fly. Isn't that cool?

Now let's understand it step by step.

  • I'm assuming your Name Node and Data Node are already configured (a minimal sample of the Data Node's storage configuration is sketched after the commands below). So let's start them.
#  hadoop-daemon.sh start namenode       // to start name node

#  hadoop-daemon.sh start datanode       // to start data node
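
Before starting the daemons, the Data Node needs to know where to keep its blocks. As a minimal sketch (assuming Hadoop 2.x/3.x property names, and /dnode as the storage directory, the same path we mount the LVM volume on later), hdfs-site.xml on the Data Node would contain something like:

<!-- hdfs-site.xml on the Data Node (sketch) -->
<property>
    <name>dfs.datanode.data.dir</name>     <!-- named dfs.data.dir on old Hadoop 1.x setups -->
    <value>/dnode</value>
</property>

core-site.xml points the node at the Name Node through fs.defaultFS (fs.default.name on Hadoop 1.x).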

  • Now, if all of your configuration is correct and the cluster is up, we can see the cluster report from any of the nodes (a small filter sketch follows the command).
#  hadoop dfsadmin -report
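
If you only want to eyeball the capacity numbers, you can filter the report (a small sketch; on newer Hadoop versions the equivalent command is hdfs dfsadmin -report, and the label text may vary slightly between versions):

#  hadoop dfsadmin -report | grep -E 'Configured Capacity|DFS Remaining'       // just the capacity lines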

  • Now a requirement has come up for more storage, so we add two hard disks (/dev/sdb and /dev/sdc) of 30GB each to the Data Node. Here we'll first create Physical Volumes. Using the Physical Volumes we'll create a Volume Group, then we'll carve a Logical Volume out of it. Finally, to use it, we'll have to format and mount it; along the way we can verify each step with the LVM reporting commands sketched below. Later we'll see how to increase the volume on the fly. Let's get into the commands.
#  pvcreate  /dev/sdb        

#  pvcreate  /dev/sdc        

Physical Volumes created.
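
As an optional check (not part of the original flow), you can confirm that both disks are now registered as LVM Physical Volumes:

#  pvs        // or pvdisplay ; lists /dev/sdb and /dev/sdc with their sizes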

#  vgcreate myhadoopvg  /dev/sdb  /dev/sdc

Volume Group created, named "myhadoopvg".
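
Before carving out the Logical Volume, it's worth checking the Volume Group's total and free space; here we expect roughly 60GB in total from the two 30GB disks:

#  vgs myhadoopvg        // or vgdisplay myhadoopvg ; shows VSize and VFree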

#  lvcreate --size 40G  --name myhadooplv  myhadoopvg

Logical Volume created, named "myhadooplv", 40GB in size.

#  mkfs.ext4  /dev/myhadoopvg/myhadooplv

Logical Volume formatted with the ext4 filesystem.

#  mount  /dev/myhadoopvg/myhadooplv   /dnode

Logical Volume mounted on the Data Node's storage directory, i.e. "/dnode".
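
At this point the Data Node directory is backed by the Logical Volume. A couple of optional checks, plus a persistence note (the fstab entry below is a sketch; adjust the options to your own setup):

#  df -h /dnode        // should report roughly 40GB on /dev/mapper/myhadoopvg-myhadooplv
#  lsblk               // shows the myhadooplv volume sitting on top of sdb and sdc

To keep the mount across reboots, a line like this can be added to /etc/fstab:

/dev/myhadoopvg/myhadooplv   /dnode   ext4   defaults   0   0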

  • Now the total size of the Volume Group is 30 + 30 = 60GB, but we have allocated only 40GB to the Logical Volume. Assume we have to increase the capacity of the Data Node, say by another 12GB. Let's add some more to the volume.
#  lvextend  --size +12G  /dev/myhadoopvg/myhadooplv

#  resize2fs  /dev/myhadoopvg/myhadooplv

  • The first command extends the Logical Volume by 12GB on the fly, and the second command resizes the ext4 filesystem so the Data Node can actually use the extra space.
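
You can verify the result immediately and confirm that HDFS sees the extra space (a quick sketch; lvextend also accepts -r/--resizefs to run the filesystem resize in the same step):

#  lvs myhadoopvg            // myhadooplv should now show about 52G
#  df -h /dnode              // the mounted filesystem grows to roughly 52GB
#  hadoop dfsadmin -report   // the Data Node's Configured Capacity goes up accordingly
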
So finally we solved the challenge and achieved storage elasticity in the HDFS cluster.

One last piece of information: this LVM topic belongs to the Linux OS, and it takes a fixed sequence of steps to finally create and use a Logical Volume. I have automated this complete task using Python scripting. You can visit the YouTube link below to check it out.

Hope this helps you.
Thank you so much, guys!






