Integrating LVM with Hadoop and Providing Elasticity to DataNode Storage
Umesh Tyagi
DevOps Engineer | Terraform, Kubernetes, CI/CD, Ansible, Python, Podman, AWS, Github Actions |
What is Hadoop?
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
What is LVM?
Logical volume management provides a higher-level view of the disk storage on a computer system than the traditional view of disks and partitions. This gives the system administrator much more flexibility in allocating storage to applications and users.
The logical volume manager also allows management of storage volumes in user-defined groups, allowing the system administrator to deal with sensibly named volume groups such as “development” and “sales” rather than physical disk names such as “sda” and “sdb”.
Problem Statement
In a Hadoop distributed cluster we have a NameNode (which manages all the processes of the cluster), DataNodes (which are responsible for sharing their storage with the NameNode), and clients. In simple terms, DataNodes are used as storage nodes. But a DataNode's storage is static, and its filesystem is not smart enough to increase its size when the storage limit is exceeded.
This is exactly the problem that LVM solves.
Let’s implement LVM in Hadoop DataNode to provide Elasticity.
Requirements:
- RHEL8
- LVM2 installed
- A configured Hadoop cluster
Attaching Hard Disks to the DataNode System
We have one DataNode connected to the NameNode. Our goal is to make this DataNode elastic, so that it can increase its storage on the fly, without any data loss, when its storage limit is exceeded.
To perform this practical I am using RHEL8 and will attach two more hard disks to this OS. We need to follow the steps below to attach them.
First, we go to Oracle VirtualBox and select the VM in which we configured the DataNode.
- Click on the DataNode VM.
- Select the Settings option.
- After that, select Storage.
- At last, attach two hard disks.
So far we have attached two hard disks to our operating system. To check the available hard disks in the operating system, we can run the fdisk -l command.
Setting up the Logical Volume Manager
Let's understand some commands that we will use to set up the Logical Volume Manager.
pvcreate: converts a disk into a physical volume (PV).
vgcreate: creates a volume group (VG) from one or more physical volumes.
vgdisplay: lists all volume groups.
lvcreate: creates a logical volume (LV) from a volume group.
lvdisplay: lists logical volumes.
lvextend: extends the size of a logical volume.
Converting Disks into Physical Volumes
First of all, we need to check the device names of the disks. To list the available disks in Linux, we run the following command:
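A minimal example; on my setup the two new disks show up as /dev/sdb and /dev/sdc (device names may differ on your system):

```
# List all disks and partitions visible to the OS
fdisk -l
```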
In the output, we can see the two disks that were attached in the previous steps.
Then we need to convert them into physical volumes using the pvcreate command:
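Using the device names we saw above:

```
# Initialize both new disks as LVM physical volumes (PVs)
pvcreate /dev/sdb /dev/sdc
```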
Here /dev/sdb and /dev/sdc are the disk names; both are 10 GB in size.
Now, run the following command to confirm whether the physical volumes were created:
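For example:

```
# Display details of all physical volumes
pvdisplay
```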
Creating Volume Group of Physical Volumes
To create a volume group from the physical volumes, we need to run the following command:
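With the names used in this article:

```
# Create a volume group named "taskseven" spanning both PVs
vgcreate taskseven /dev/sdb /dev/sdc
```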
In this command, "taskseven" is the desired volume group name, and /dev/sdb and /dev/sdc are the physical volumes.
We can also verify whether it was created by running the following command:
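For example:

```
# Show the volume group's details, including its total size
vgdisplay taskseven
```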
That's all! Now the volume group "taskseven" has a size of 19.99 GiB.
Creating Logical Volume from Volume Group
Now, the next step is to create a logical volume from the volume group that was created in the previous step. To do this, we need to run the following command:
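A sketch of the command, using the size and names explained just below:

```
# Carve a 10 GB logical volume named "taskksevenlv1" out of the "taskseven" VG
lvcreate --size 10G --name taskksevenlv1 taskseven
```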
In this command, I've used a size of 10 GB, "taskksevenlv1" as the desired name for the logical volume, and "taskseven" as the volume group from which the logical volume will be created.
We can verify it by running the following command:
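For example:

```
# List all logical volumes and their attributes
lvdisplay
```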
This command shows all the information about the created and available logical volumes.
Formatting the Logical Volume
To format the logical volume, we need to run the following command:
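I'm assuming the ext4 filesystem here, which supports being grown while mounted (we'll rely on that later):

```
# Format the logical volume with ext4
mkfs.ext4 /dev/mapper/taskseven-taskksevenlv1
```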
When the command completes without errors, the logical volume has been formatted successfully.
Mounting the Logical Volume to the Hadoop DataNode Directory
This is a very important section. Before doing any operation, remember that all volumes and disks have their own path or directory, like a folder, and the Hadoop DataNode also uses a folder as the Hadoop File System directory. The conclusion is that we will mount the logical volume to the Hadoop DataNode directory, which in our case is /dn1.
Before mounting, we'll check our NameNode status. To do this, we run the following command and check whether the NameNode service has started:
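One way to check is the jps command, which lists the running Hadoop daemons:

```
# A healthy NameNode should appear in this list
jps
```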
Oops! It is not started…
To start the NameNode, we run the following command:
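A sketch, assuming a classic Hadoop 1.x installation where daemons are managed with hadoop-daemon.sh:

```
# Start the NameNode daemon
hadoop-daemon.sh start namenode
```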
Now we check again whether the NameNode service has started.
Yeah! Now the NameNode service has started successfully.
After successfully starting the NameNode, we will check with the jps command whether the DataNode service is running.
Oops! It is also stopped.
To start the DataNode service, we will run the following command:
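Following the same hadoop-daemon.sh convention as for the NameNode:

```
# Start the DataNode daemon
hadoop-daemon.sh start datanode
```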
Now we check the DataNode service again with the jps command.
Yeah! Now we can see the DataNode service has started successfully.
Now we check how many DataNodes are connected to the NameNode with the following command:
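On Hadoop 1.x this is the dfsadmin report:

```
# Print a cluster report: total capacity and the DataNodes attached to the NameNode
hadoop dfsadmin -report
```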
We'll also check how much storage is shared before mounting the LV.
Amazing! So far we have one DataNode connected, sharing approximately 47 GB.
Now we mount the logical volume to the Hadoop DataNode directory:
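Using the device path and directory named below:

```
# Mount the LV on the DataNode's data directory
mount /dev/mapper/taskseven-taskksevenlv1 /dn1
```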
To verify whether it was mounted successfully, we need to run this command:
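For example:

```
# List mounted filesystems with their sizes and mount points
df -h
```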
Great! The device “/dev/mapper/taskseven-taskksevenlv1” is successfully mounted to the Hadoop DataNode folder “/dn1”.
We can check the Hadoop report again, in which we can verify how much storage is being shared and how many DataNodes are connected to the NameNode.
Providing Elasticity to Hadoop DataNode using LVM “on the fly”
So far we have shared a 10 GB LV with the DataNode, while the "taskseven" volume group behind it has a total size of about 20 GB. That means the LV can fill up at any time, and when it does, we can easily extend the size of the LV partition on the fly using just two commands.
To do this, we first need to run the following command:
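A sketch, adding the 5 GB described below:

```
# Grow the LV by 5 GB, taken from the free extents in the VG
lvextend --size +5G /dev/mapper/taskseven-taskksevenlv1
```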
This will extend the "taskksevenlv1" logical volume from 10 GB to 15 GB in size by adding 5 GB of unallocated space from the "taskseven" volume group.
Now we resize the filesystem to cover the extended space.
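Assuming the ext4 filesystem from earlier, this can be done online with resize2fs:

```
# Grow the mounted ext4 filesystem to fill the enlarged LV
resize2fs /dev/mapper/taskseven-taskksevenlv1
```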
This command automatically grows the filesystem into the unallocated space, extending the filesystem metadata (such as the inode tables) as needed, without unmounting the volume.
Great! Now we can verify it again by using the following commands:
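For example:

```
# Check the new size of the mounted volume and the updated cluster capacity
df -h /dn1
hadoop dfsadmin -report
```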
Finally, it is now 15 GB instead of 10 GB, changed on the fly without stopping Hadoop or any other service and without reformatting the partition.
Thank You:)
Keep Learning
Keep safe