Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
29 Oct 2020


Introduction

In this article I am going to show how to use LVM (Logical Volume Manager) storage for a DataNode, which lets us increase the size of the DataNode on the fly and makes our Hadoop cluster elastic.

In storage terms, elastic means being able to change the size of a disk (increase or decrease it) without losing existing data and without taking the storage offline (making it unavailable to users for some period of time).

For this practical I am going to use an AWS EC2 instance as the master and a VirtualBox virtual machine as the slave: a simple one-master, one-slave architecture. The EC2 instance runs Red Hat Enterprise Linux (RHEL 8), and the slave is a RHEL 8 guest OS running on a Windows 10 host.

In this article I am assuming that everyone already knows how to configure a Hadoop cluster.

In this article I am going to:

  • Create an LV from one 10GB hard disk inside VirtualBox and use it as the storage directory for the DataNode.
  • Then attach one more 10GB hard disk and extend the already created LV with it.
  • Finally, end up with one DataNode providing nearly 19GB of storage to the master.

Setting up the LV as the DataNode Directory:

Step 1: Attaching Hard Disks in VirtualBox

Our first task is to create an LVM setup inside VirtualBox. For this we first need hard disks; in the case of VirtualBox, this means attaching virtual hard disks.

  • Go to the virtual machine's settings.
  • Click on Storage.
  • Click on the add hard disk option.
  • Create a new hard disk.


  • Click Next, then Next again.
  • Give any size; in my case I am giving 10GB. Then click Create.
  • Scroll down to see the not-attached hard disk, select it, then click Choose.
  • Repeat the same steps to create one more hard disk of any size; in my case it is also 10GB.
  • Now start the OS; we are ready to create the LVM setup.
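The GUI clicks above can also be scripted with VBoxManage, VirtualBox's command-line tool. This is a hypothetical sketch: the VM name slave-vm and the controller name SATA are assumptions (check yours with VBoxManage showvminfo), and run() only prints each command so the sketch is safe to review; drop the echo to execute for real.

```shell
# Dry-run sketch of attaching a new 10GB virtual disk; names are assumptions.
run() { echo "$@"; }   # print the command instead of executing it
run VBoxManage createmedium disk --filename slave-disk1.vdi --size 10240   # size is in MB
run VBoxManage storageattach slave-vm --storagectl SATA --port 1 --device 0 --type hdd --medium slave-disk1.vdi
```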

Step 2: Creating an LV

First open a terminal to check whether the new hard disks are successfully attached to the OS; for this, use the fdisk -l command.

Two hard disks, both of size 10GB and named /dev/sdb and /dev/sdc, are connected. We are going to use /dev/sdb first to create the LVM setup.

  • First we need to create a PV (physical volume): use the pvcreate /dev/sdb command.
  • Now create a VG (volume group): use the vgcreate myvg /dev/sdb command (myvg is just a name for our purposes; you can give any name, but it is good practice to pick a relevant one).
  • Now let's find out how much space we can allocate to our LV (logical volume): use the vgdisplay myvg command to show information about the VG.
  • We have slightly less than 10GB available.
  • Now it's time to create the LV from the VG created above: use the lvcreate --size 9G --name slave_lv myvg command.

--size: the size of the LV; in my case 9GB (G = GB, M = MB, K = KB).

--name: a user-friendly name for the LV.

myvg: the name of the VG from which the LV takes its storage.
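The three commands above can be collected into one dry-run sketch; the device and names (/dev/sdb, myvg, slave_lv) are the article's examples. run() only echoes each command, so the sketch prints what would run; remove the echo and run as root, with /dev/sdb as a spare disk, to do it for real.

```shell
# Dry-run sketch of the PV -> VG -> LV creation sequence.
run() { echo "$@"; }   # print the command instead of executing it
run pvcreate /dev/sdb                        # mark the disk as a physical volume
run vgcreate myvg /dev/sdb                   # pool it into a volume group
run lvcreate --size 9G --name slave_lv myvg  # carve out a 9GB logical volume
```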

  • This next command is not always needed, but sometimes after creating or attaching a hard disk the device nodes are not yet ready; the udevadm settle command waits for pending device events to finish processing.
  • Now use the lvdisplay /dev/myvg/slave_lv command to display information about the LV we just created.

Let's format the LV with ext4 (the fourth extended filesystem), a standard filesystem on Linux: use the mkfs.ext4 /dev/myvg/slave_lv command.
  • Now mount it. For this we create a directory in the root folder named slave_dir: use the mkdir /slave_dir command.
  • Now mount the LV: use the mount /dev/myvg/slave_lv /slave_dir/ command.
  • To check whether the LV is successfully mounted, use the df -hT command.
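The format-and-mount steps above, as the same kind of dry-run sketch (echo only; run as root for real). Note that mkfs.ext4 wipes whatever is already on the LV, so it is a one-time step on a fresh volume.

```shell
# Dry-run sketch of formatting and mounting the LV.
run() { echo "$@"; }   # print the command instead of executing it
run mkfs.ext4 /dev/myvg/slave_lv          # one-time format of the fresh LV
run mkdir -p /slave_dir                   # mount point for the DataNode
run mount /dev/myvg/slave_lv /slave_dir/  # attach the LV to the directory
run df -hT /slave_dir                     # verify the ext4 LV is mounted
```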

Step 3: Creating the Hadoop Slave Node

Now let's create the slave node. Use the cd /etc/hadoop/ command to go to the Hadoop configuration directory.

  • Use the vi hdfs-site.xml command and add the required lines.
  • Use the vi core-site.xml command and add the required lines.
  • Now start the DataNode with the hadoop-daemon.sh start datanode command; after this, use the jps command to check whether the DataNode is running.

In my case it is running successfully.
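The entries added to the two files above typically look like the following. This is a sketch in the classic Hadoop 1.x configuration style this walkthrough uses; MASTER_IP and port 9001 are placeholders for your master's actual address, not values from the article.

```xml
<!-- hdfs-site.xml: point the DataNode's storage at the mounted LV -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/slave_dir</value>
  </property>
</configuration>

<!-- core-site.xml: MASTER_IP is a placeholder for the master's public IP -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER_IP:9001</value>
  </property>
</configuration>
```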

  • Now go to the master node and run the hadoop dfsadmin -report command to check the storage the DataNode is providing to the master.
  • In my case it is providing 8.8GB of storage, nearly the size of the LV we created earlier (a little is lost to filesystem overhead).

Now let's extend the DataNode's size on the fly.

  • First let's find the total space left in the VG we created earlier: use the vgdisplay myvg command.
  • We have nearly 1GB of free space available in myvg.
  • To extend the VG we first need a new PV, and for a new PV we need a new hard disk; we already have /dev/sdc, so use the pvcreate /dev/sdc command.
  • Now use the vgextend myvg /dev/sdc command.
  • Now use the vgdisplay myvg command to see the space available after extending the VG.
  • Now we have about 11GB free.
  • Use the lvextend --size +10G /dev/myvg/slave_lv command to extend the size of the LV by 10GB.
  • Now run df -hT to see the size of our LV.
  • Wait a second: why is it still showing 8.8GB as the size of our storage? Maybe the command didn't work; let's run the vgdisplay myvg command to see the space available after extending the LV.

It shows we have nearly 1GB left in the VG, so the extension did happen. What, then, is the problem?

We extended the LV, but we never created a filesystem on the new space, and users can't store data on an unformatted region. That is why df -hT appears to show that our LV was not extended: it reports only the portion of the LV in which users can store data, not the actual size of the LV.

Can't we just unmount and format the LV again? Yes, but then all our data would be lost. To solve this we use the resize2fs command, which reads the existing ext4 filesystem and grows it in place to fill the newly added blocks, without touching the data already stored.

Note: resize2fs only works with the ext2/ext3/ext4 family; it resizes the existing filesystem in place, it does not convert one filesystem type into another.
  • Now use the resize2fs /dev/myvg/slave_lv command.
  • Now run the df -hT command to see whether it worked.
  • Now it's working: we have about 19GB of storage available.
  • Now go to the master and run the hadoop dfsadmin -report command to check how much storage the DataNode is providing after all this.
  • Finally, our DataNode automatically starts providing nearly 19GB.
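The entire on-the-fly extension can be collected into one dry-run sketch in the same style as before (echo only; run as root, with /dev/sdc as the spare disk, to do it for real). Because resize2fs works on a mounted ext4 filesystem, the DataNode keeps running throughout.

```shell
# Dry-run sketch of extending the VG, the LV, and the filesystem.
run() { echo "$@"; }   # print the command instead of executing it
run pvcreate /dev/sdc                        # new physical volume from the second disk
run vgextend myvg /dev/sdc                   # grow the volume group with it
run lvextend --size +10G /dev/myvg/slave_lv  # grow the LV by 10GB
run resize2fs /dev/myvg/slave_lv             # grow ext4 online into the new space
```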

Conclusion

In the end we achieved elasticity in the DataNode by increasing its capacity without using any Hadoop-specific command; we simply used the concepts of LVM to do this task.

Thanks to everyone for reading my article till the end. If you have any doubts or suggestions, please feel free to connect with me and leave them in the comment section; all feedback, both positive and negative, is more than welcome.
