Integrating LVM with Hadoop and Providing Elasticity to DataNode Storage
Umesh Tyagi
DevOps Engineer | Terraform, Kubernetes, CI/CD, Ansible, Python, Podman, AWS, Github Actions |
What is Hadoop?
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
What is LVM?
Logical volume management provides a higher-level view of the disk storage on a computer system than the traditional view of disks and partitions. This gives the system administrator much more flexibility in allocating storage to applications and users.
The logical volume manager also allows management of storage volumes in user-defined groups, allowing the system administrator to deal with sensibly named volume groups such as “development” and “sales” rather than physical disk names such as “sda” and “sdb”.
Problem Statement
In a Hadoop distributed cluster we have a NameNode (which manages all the processes of the cluster), DataNodes (which are responsible for sharing their storage with the NameNode), and clients. In simple terms, DataNodes are used as storage nodes. But a DataNode's storage is static, and its filesystem is not smart enough to increase its size when the storage limit is exceeded.
This is exactly the problem that LVM solves.
Let’s implement LVM in Hadoop DataNode to provide Elasticity.
Requirements:
- RHEL8
- LVM2 installed
- A configured Hadoop cluster
Attaching Hard Disks to the DataNode System
We have one DataNode connected to the NameNode. Our goal is to make this DataNode elastic, so that it can increase its storage on the fly, without any data loss, when its storage limit is exceeded.
To perform this practical I am using RHEL8 and will attach two more hard disks to this OS. We need to follow the steps below to attach them.
First, we go to Oracle VirtualBox and select the VM in which we configured the DataNode.
- Click on the DataNode VM.
- Select the Settings option.
- After that, select Storage.
- At last, attach two hard disks.
So far we have attached two hard disks to our operating system. To check the available hard disks in the operating system, we can run the fdisk -l command.
Setting up the Logical Volume Manager
Let's understand some commands that we will use to set up the Logical Volume Manager.
pvcreate: converts a disk into a physical volume (PV).
vgcreate: creates a volume group (VG) from one or more physical volumes.
vgdisplay: lists all volume groups.
lvcreate: creates a logical volume (LV) from a volume group.
lvdisplay: lists logical volumes.
lvextend: extends the size of a logical volume.
Converting Disks into Physical Volumes
First of all, we need to check the device names of the disks. To list the available disks in Linux, we run the following command:
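A minimal example; on my setup the two new disks show up as /dev/sdb and /dev/sdc (device names may differ on your system):

```
# List all disks and partitions visible to the OS
fdisk -l
```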
In the output, we can see the two disks that were attached in the previous steps.
Then we need to convert them into physical volumes using the pvcreate command:
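Using the device names we saw above:

```
# Initialize both new disks as LVM physical volumes (PVs)
pvcreate /dev/sdb /dev/sdc
```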
Here /dev/sdb and /dev/sdc are the disk names; both are 10 GB in size.
Now, run the following command to confirm whether the physical volumes were created:
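For example:

```
# Display details of all physical volumes
pvdisplay
```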
Creating Volume Group of Physical Volumes
To create a volume group from the physical volumes, we need to run the following command:
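With the names used in this article:

```
# Create a volume group named "taskseven" spanning both PVs
vgcreate taskseven /dev/sdb /dev/sdc
```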
In this command, "taskseven" is the desired volume group name, and /dev/sdb and /dev/sdc are the physical volumes.
We can also verify whether it was created by running the following command:
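For example:

```
# Show the volume group's details, including its total size
vgdisplay taskseven
```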
That's all! Now the volume group "taskseven" has a size of 19.99 GiB.
Creating Logical Volume from Volume Group
Now, the next step is to create a logical volume from the volume group that was created in the previous step. To do this, we need to run the following command:
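A sketch of the command, using the size and names explained just below:

```
# Carve a 10 GB logical volume named "taskksevenlv1" out of the "taskseven" VG
lvcreate --size 10G --name taskksevenlv1 taskseven
```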
In this command, I've used a size of 10 GB, "taskksevenlv1" as the desired name for the logical volume, and "taskseven" as the volume group from which the logical volume will be created.
We can verify it by running the following command:
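For example:

```
# List all logical volumes and their attributes
lvdisplay
```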
This command shows all the information about the created and available logical volumes.
Formatting the Logical Volume
To format the logical volume, we need to run the following command:
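I'm assuming the ext4 filesystem here, which supports being grown while mounted (we'll rely on that later):

```
# Format the logical volume with ext4
mkfs.ext4 /dev/mapper/taskseven-taskksevenlv1
```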
When the command completes without errors, the logical volume has been formatted successfully.
Mounting the Logical Volume to the Hadoop DataNode Directory
This is a very important section. Before doing any operation, remember that all volumes and disks have their own path or directory, like a folder, and the Hadoop DataNode also uses a folder as the Hadoop File System directory. The conclusion is that we will mount the logical volume to the Hadoop DataNode directory, which in our case is /dn1.
Before mounting, we'll check our NameNode status. To do this, we run the following command and check whether the NameNode service has started:
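One way to check is the jps command, which lists the running Hadoop daemons:

```
# A healthy NameNode should appear in this list
jps
```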
Oops! It is not started…
To start the NameNode, we run the following command:
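A sketch, assuming a classic Hadoop 1.x installation where daemons are managed with hadoop-daemon.sh:

```
# Start the NameNode daemon
hadoop-daemon.sh start namenode
```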
Now we check again whether the NameNode service has started.
Yeah! Now the NameNode service has started successfully.
After successfully starting the NameNode, we will check with the jps command whether the DataNode service is running.
Oops! It is also stopped.
To start the DataNode service, we will run the following command:
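Following the same hadoop-daemon.sh convention as for the NameNode:

```
# Start the DataNode daemon
hadoop-daemon.sh start datanode
```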
Now we check the DataNode service again with the jps command.
Yeah! Now we can see the DataNode service has started successfully.
Now we check how many DataNodes are connected to the NameNode with the following command:
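On Hadoop 1.x this is the dfsadmin report:

```
# Print a cluster report: total capacity and the DataNodes attached to the NameNode
hadoop dfsadmin -report
```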
We'll also check how much storage is shared before mounting the LV.
Amazing! So far we have one DataNode connected, sharing approximately 47 GB.
Now we mount the logical volume to the Hadoop DataNode directory:
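Using the device path and directory named below:

```
# Mount the LV on the DataNode's data directory
mount /dev/mapper/taskseven-taskksevenlv1 /dn1
```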
To verify whether it was mounted successfully, we need to run this command:
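For example:

```
# List mounted filesystems with their sizes and mount points
df -h
```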
Great! The device “/dev/mapper/taskseven-taskksevenlv1” is successfully mounted to the Hadoop DataNode folder “/dn1”.
We can check the Hadoop report again, in which we can verify how much storage is being shared and how many DataNodes are connected to the NameNode.
Providing Elasticity to Hadoop DataNode using LVM “on the fly”
So far we have shared a 10 GB LV with the DataNode, while the "taskseven" volume group behind it has a total size of about 20 GB. That means the LV can fill up at any time, and when it does, we can easily extend the size of the LV partition on the fly using just two commands.
To do this, we first need to run the following command:
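A sketch, adding the 5 GB described below:

```
# Grow the LV by 5 GB, taken from the free extents in the VG
lvextend --size +5G /dev/mapper/taskseven-taskksevenlv1
```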
This will extend the "taskksevenlv1" logical volume from 10 GB to 15 GB in size by adding 5 GB of unallocated space from the "taskseven" volume group.
Now we resize the filesystem to cover the extended space.
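Assuming the ext4 filesystem from earlier, this can be done online with resize2fs:

```
# Grow the mounted ext4 filesystem to fill the enlarged LV
resize2fs /dev/mapper/taskseven-taskksevenlv1
```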
This command automatically grows the filesystem into the unallocated space, extending the filesystem metadata (such as the inode tables) as needed, without unmounting the volume.
Great! Now we can verify it again by using the following commands:
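For example:

```
# Check the new size of the mounted volume and the updated cluster capacity
df -h /dn1
hadoop dfsadmin -report
```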
Finally, it is now 15 GB instead of 10 GB, changed on the fly without stopping Hadoop or any other service and without reformatting the partition.
Thank You:)
Keep Learning
Keep safe