Integrate LVM with Hadoop and providing Elasticity to DataNode Storage

What is Hadoop?

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

What is LVM?

Logical volume management provides a higher-level view of the disk storage on a computer system than the traditional view of disks and partitions. This gives the system administrator much more flexibility in allocating storage to applications and users.

The logical volume manager also allows management of storage volumes in user-defined groups, allowing the system administrator to deal with sensibly named volume groups such as “development” and “sales” rather than physical disk names such as “sda” and “sdb”.

Problem Statement

In a Hadoop distributed cluster we have a NameNode (which manages and coordinates the whole cluster), DataNodes (which contribute their storage to the cluster), and clients. In simple terms, a DataNode is a storage node. But a DataNode’s storage is static: its filesystem cannot grow on its own when the storage limit is about to be exceeded.

This is exactly the problem LVM solves.

Let’s implement LVM in the Hadoop DataNode to provide elasticity.

Requirements:

  1. RHEL8
  2. LVM2 installed
  3. A configured Hadoop cluster

Attaching Hard Disks to the DataNode System

We have one DataNode connected to the NameNode. The goal is to make this DataNode elastic, so that it can increase its storage on the fly, without any data loss, whenever its storage limit is about to be exceeded.

To perform this practical I am using RHEL8 and attaching two more hard disks to this OS, following the steps below.

First, we go to Oracle VirtualBox and select the VM in which we configured the DataNode.

  1. Click on the DataNode VM.
  2. Select the Settings option.
  3. After that, select Storage.
  4. At last, attach the two new hard disks.

So far we have attached two hard disks to our operating system. To check the available disks, we can run the fdisk -l command.

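For example:

  # list all disks and partitions; the two new 10 GB disks
  # should show up as /dev/sdb and /dev/sdc
  fdisk -l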

Setting up the Logical Volume Manager

Let’s understand the commands we will use to set up the Logical Volume Manager.

pvcreate: converts a disk into a physical volume (PV).

vgcreate: creates a volume group (VG) from one or more physical volumes.

vgdisplay: lists all VGs.

lvcreate: creates a logical volume (LV) from a volume group.

lvdisplay: lists LVs.

lvextend: extends the size of an LV.

Converting Disk into Physical Volume

First of all, we need to check the disk names. To list the block devices in Linux, we run the following command.

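Either fdisk -l again or lsblk will do the job; for example:

  # show all block devices with their sizes and mount points
  lsblk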

In the output, we can see the two disks that were attached in the previous steps.

Then, we need to convert them into physical volumes using the “pvcreate” command.

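With the two new disks, the command should look like this:

  # initialize both disks as LVM physical volumes
  pvcreate /dev/sdb /dev/sdc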

Here /dev/sdb and /dev/sdc are the disk names; both are 10 GB in size.

Now, run the following command to confirm that the physical volumes were created.

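pvdisplay shows the details (pvs gives a shorter summary):

  # display the newly created physical volumes
  pvdisplay /dev/sdb /dev/sdc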

Creating a Volume Group from Physical Volumes

To create a volume group from the physical volumes, we need to run the following command.

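Using the names explained below, it should look like this:

  # group both physical volumes into a volume group named taskseven
  vgcreate taskseven /dev/sdb /dev/sdc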

In this command, “taskseven” is the desired volume group name, and /dev/sdb and /dev/sdc are the physical volumes.

We can verify that it was created by running the following command.

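For example:

  # verify the volume group and its total size
  vgdisplay taskseven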

That’s all! Now the volume group “taskseven” has a size of 19.99 GiB.

Creating a Logical Volume from the Volume Group

Now, the next step is to create a logical volume from the volume group created in the previous steps. To do this, we need to run the following command.

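With the size and names described below, the command should be:

  # create a 10 GiB logical volume named taskksevenlv1
  # from the taskseven volume group
  lvcreate --size 10G --name taskksevenlv1 taskseven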

In this command, I’ve used a size of 10 GB, “taskksevenlv1” as the desired name for the logical volume, and “taskseven” as the volume group from which the logical volume will be created.

We can verify it by running the following command.

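For example:

  # list the details of the new logical volume
  lvdisplay /dev/taskseven/taskksevenlv1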

This command shows all the information about the created and available logical volumes.

Formatting the Logical Volume

To format the logical volume, we need to run the following command.

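Assuming an ext4 filesystem (any filesystem that supports online growing will work):

  # create an ext4 filesystem on the logical volume
  mkfs.ext4 /dev/mapper/taskseven-taskksevenlv1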

Once the command completes without errors, the logical volume has been formatted successfully.

Mounting the Logical Volume to the Hadoop DataNode Directory

This is a very important step. Before doing any operation, remember that every volume and disk has its own path or directory, much like a folder, and the Hadoop DataNode also serves its storage from a folder. So all we need to do is mount the Logical Volume on the Hadoop DataNode directory, which in our case is /dn1.

Before mounting, we’ll check whether the NameNode service is running. To do this, we run the following command.

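jps lists the running Hadoop Java processes:

  # NameNode should appear in the output if the service is up
  jps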

Oops! It is not started…

To start the NameNode, we run the following command.

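Assuming the classic Hadoop daemon scripts are on the PATH:

  # start the NameNode daemon
  hadoop-daemon.sh start namenode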

Now we run jps again to check whether the NameNode service has started.


Yeah! Now the NameNode service has started successfully.

After successfully starting the NameNode, we check whether the DataNode service is running, again with the jps command.


Oops! It is also stopped.

To start the DataNode service, we run the following command.

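Again assuming the classic daemon scripts:

  # start the DataNode daemon
  hadoop-daemon.sh start datanode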

Now we check the DataNode service again with the jps command.


Yeah! Now we can see the DataNode service has started successfully.

Now we check how many DataNodes are connected to the NameNode with the following command.

We’ll also check how much storage is being shared before mounting the LV.

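The classic report command should do it (in newer Hadoop releases it is hdfs dfsadmin -report):

  # show cluster capacity and the list of connected DataNodes
  hadoop dfsadmin -report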

Amazing! So far we have one DataNode connected, sharing approximately 47 GB.

Now we mount the Logical Volume on the Hadoop DataNode directory.

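Using the device path and directory named below:

  # mount the logical volume on the DataNode directory
  mount /dev/mapper/taskseven-taskksevenlv1 /dn1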

To verify that it mounted successfully, we need to run this command.

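For example:

  # the logical volume should be listed with /dn1 as its mount point
  df -h /dn1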

Great! The device “/dev/mapper/taskseven-taskksevenlv1” is successfully mounted to the Hadoop DataNode folder “/dn1”.

We can also check the Hadoop report to verify how much storage is being shared and how many DataNodes are connected to the NameNode.

Providing Elasticity to Hadoop DataNode using LVM “on the fly”

So far the DataNode shares the 10 GB LV we mounted, while the “taskseven” volume group holds about 20 GiB in total, leaving roughly 10 GiB unallocated. That means that when the 10 GB fills up, we can easily extend the size of the LV partition on the fly using just two commands.

To do this, we first need to run the following command.

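It should look like this:

  # grow the logical volume by 5 GiB from the volume group's free space
  lvextend --size +5G /dev/mapper/taskseven-taskksevenlv1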

This extends the “taskksevenlv1” logical volume from 10 GB to 15 GB by adding 5 GB of the unallocated space from the “taskseven” volume group.


Now we resize the filesystem to cover the extended space.

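Assuming the ext4 filesystem created earlier, resize2fs grows it online:

  # resize the ext4 filesystem to fill the extended logical volume,
  # while it stays mounted
  resize2fs /dev/mapper/taskseven-taskksevenlv1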

This command grows the filesystem online into the newly added space (creating inode tables for the new block groups) without disturbing the existing data.

Great! Now we can verify it again with the following commands.

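For example:

  # /dn1 should now show roughly 15 GB
  df -h /dn1

  # and the Hadoop report should show the increased capacity
  hadoop dfsadmin -report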

Finally, the storage has grown from 10 GB to 15 GB on the fly, without stopping Hadoop or any other service and without reformatting the partition.

Thank You:)

Keep Learning

Keep safe
