Providing elasticity to DataNode Storage in Hadoop using LVM
Priyanka Bharti
Software Engineer @ Samsung | C++ | Android Development | Kotlin | Linux
Hello folks! Back with another article, in which you'll see how to integrate LVM with Hadoop and provide elasticity to DataNode storage in a cluster.
Before we start with the practical, let's explore what LVM is and why a Hadoop cluster needs it.
Logical Volume Management (LVM), as the name suggests, is a tool for managing logical volumes, which includes allocating disks and striping, mirroring, and resizing logical volumes. With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes, and LVM physical volumes can be placed on other block devices which might span two or more disks.
If a file system needs more space, it can be added to its logical volumes from the free space in its volume group, and the file system can be resized as we wish. If a disk starts to fail, a replacement disk can be registered as a physical volume with the volume group, and the logical volume's extents can be migrated to the new disk without data loss.
Why do we need elasticity in Hadoop?
Consider a Hadoop cluster having 'n' DataNodes contributing their storage to the cluster. When the DataNodes' storage gets full, we would normally have to attach more DataNodes, which in turn means more RAM and CPU requirements. Instead, we can solve the issue by using the LVM concept to provide elasticity to DataNode storage in the cluster, so that we can increase or decrease the size of the partitions as the requirement arises.
So, let's move to the practical part.
I already have an HDFS cluster running. We can check the report of the HDFS cluster using the command: hadoop dfsadmin -report
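Here is a minimal sketch of that check (on newer Hadoop versions the equivalent command is hdfs dfsadmin -report):

hadoop dfsadmin -report    # prints the configured capacity, DFS used and remaining space for each DataNode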
Here, the storage attached to the DataNodes is static, i.e., we can't increase or decrease it. So, let's implement the logical volume concept to provide dynamic storage to the DataNodes.
Implementing the LVM concept to create an elastic volume for the DataNodes:
Let's attach an external physical volume (a hard disk or pen drive) to each DataNode. We can check the list of volumes attached to the system using the 'fdisk -l' command:
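For example (the device names you see will depend on your own setup):

fdisk -l    # lists every block device; a newly attached disk typically shows up as an extra device such as /dev/sdb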
Step 1. Creating a physical volume from the attached hard disk:
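A minimal sketch, assuming the new disk appeared as /dev/sdb (adjust the device name to your system):

pvcreate /dev/sdb     # initialize the attached disk as an LVM physical volume
pvdisplay /dev/sdb    # verify that the physical volume was created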
Step 2. Creating a volume group from the physical volume, from which the logical volume will get its storage:
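A sketch with a hypothetical volume group name 'dnvg':

vgcreate dnvg /dev/sdb    # create a volume group on top of the physical volume
vgdisplay dnvg            # verify the volume group and check its free space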
Step 3. Creating a logical volume from the above volume group:
We can create as many logical volumes as we want from the volume group, until the volume group is fully consumed.
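A sketch using the hypothetical names from the previous steps, carving out a 15 GB logical volume:

lvcreate --size 15G --name dnlv dnvg    # create a 15 GB logical volume 'dnlv' inside volume group 'dnvg'
lvdisplay dnvg/dnlv                     # verify the logical volume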
Step 4. Formatting the logical volume we created:
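For example, with an ext4 file system:

mkfs.ext4 /dev/dnvg/dnlv    # format the logical volume (the device path follows /dev/<vg-name>/<lv-name>)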
Step 5. Mounting the partition on the Hadoop DataNode directory:
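A sketch, assuming /dn1 is the directory configured as dfs.datanode.data.dir (dfs.data.dir on older versions) in hdfs-site.xml:

mkdir -p /dn1                # create the DataNode directory if it doesn't exist
mount /dev/dnvg/dnlv /dn1    # mount the logical volume on the DataNode directory
df -h /dn1                   # confirm the mount and its size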
Do a similar setup on the other DataNodes of the cluster. Now, we'll contribute this DataNode storage to the cluster.
Let's restart the HDFS service on the DataNodes. Now, we can see that each DataNode is contributing approximately 15 GB to the cluster.
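The restart and the check look roughly like this (the daemon script name and path can vary with the Hadoop version and install location):

hadoop-daemon.sh stop datanode     # stop the DataNode daemon
hadoop-daemon.sh start datanode    # start it again so it serves the newly mounted storage
hadoop dfsadmin -report            # the configured capacity per DataNode should now reflect the logical volume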
Now, suppose a situation arises where the DataNodes of the cluster get full and we need to increase their storage. We can do so easily by extending the size of the contributed logical volume, thus providing elasticity to Hadoop DataNode storage on the fly.
Let's demonstrate how we can do so!
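A sketch of the resize, again using the hypothetical names from above:

lvextend --size +3G /dev/dnvg/dnlv    # grow the logical volume by 3 GB, from about 15 GB to about 18 GB
resize2fs /dev/dnvg/dnlv              # grow the ext4 file system online, without unmounting it
df -h /dn1                            # the mounted DataNode directory now shows the larger size
hadoop dfsadmin -report               # and the cluster report reflects the extra capacity

Because resize2fs can grow a mounted ext4 file system online, the DataNode keeps serving data throughout, which is exactly what gives us elasticity on the fly.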
Here, we just increased the size of a DataNode's storage from 15 GB to 18 GB on the fly.
So, with the use of the LVM concept, we are able to increase the DataNode storage dynamically!
Thanks for reading!
:):)