Leveraging The Power of Logical Volume Manager (LVM) In Hadoop-DFS Cluster
Chetan Vyas
MLOps | DevOps | Hybrid MultiCloud | Ansible | Flutter | Red Hat Linux | OpenStack
In this article, we are going to practically integrate Logical Volume Manager (LVM) concepts with the Data-Nodes of a Hadoop Distributed File System (HDFS) cluster to provide elasticity to Data-Node storage, so that we can dynamically scale the storage capacity of our Data-Nodes and, with it, the storage capacity of the cluster.
Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. HDFS employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.
Hadoop itself is an open-source distributed processing framework that manages data processing and storage for big data applications. HDFS is a key part of the Hadoop ecosystem of technologies. It provides a reliable means of managing pools of big data and supporting related big data analytics applications.
HDFS architecture, NameNodes and DataNodes
HDFS uses a primary/secondary architecture. The NameNode is the primary server and the central component of the cluster: it manages the file system namespace, controls client access to files and enforces the right access permissions. The system's DataNodes manage the storage that is attached to the nodes they run on.
HDFS exposes a file system namespace and enables user data to be stored in files. A file is split into one or more blocks, which are stored in a set of DataNodes. The NameNode performs file system namespace operations, including opening, closing and renaming files and directories, and also governs the mapping of blocks to DataNodes. The DataNodes serve read and write requests from the clients of the file system. In addition, they perform block creation, deletion and replication when the NameNode instructs them to do so.
HDFS supports a traditional hierarchical file organization. An application or user can create directories and then store files inside these directories. The file system namespace hierarchy is like most other file systems -- a user can create, remove, rename or move files from one directory to another.
The NameNode records any change to the file system namespace or its properties. An application can stipulate the number of replicas of a file that the HDFS should maintain. The NameNode stores the number of copies of a file, called the replication factor of that file.
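For example, the replication factor of an existing file can be changed from the command line (a small sketch; the path /data/sample.txt is hypothetical):
hdfs dfs -setrep -w 3 /data/sample.txt   # set this file's replication factor to 3 and wait for re-replication to finish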
LOGICAL VOLUME MANAGER (LVM)
LVM is a tool for logical volume management which includes allocating disks, striping, mirroring and resizing logical volumes.
With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can also be placed on other block devices, such as a software RAID device, which might span two or more disks.
The physical volumes are combined into volume groups, with the exception of the /boot partition. The /boot partition cannot be on a logical volume because the boot loader cannot read it. If the root (/) partition is on a logical volume, create a separate /boot partition which is not a part of a volume group.
Since a physical volume cannot span over multiple drives, to span over more than one drive, create one or more physical volumes per drive.
A volume group can be divided into logical volumes, which are assigned mount points, such as /home and /, and file system types, such as ext2 or ext3. When these "partitions" reach their full capacity, free space from the volume group can be added to the logical volume to increase the size of the partition. When a new hard drive is added to the system, it can be added to the volume group, and the partitions that are logical volumes can be increased in size.
On the other hand, if a system is partitioned with the ext3 file system, the hard drive is divided into partitions of defined sizes. If a partition becomes full, it is not easy to expand the size of the partition. Even if the partition is moved to another hard drive, the original hard drive space has to be reallocated as a different partition or not used.
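Before we start, note that the current LVM layout can be inspected at any time with LVM's reporting commands; we will rely on these to verify each of the steps below:
pvs   # list physical volumes
vgs   # list volume groups and their free space
lvs   # list logical volumes and their sizes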
Integrating LVM (LOGICAL VOLUME) with Hadoop DFS to provide Elasticity to DataNode Storage
We can integrate LVM with Hadoop DFS by using a Logical Volume (LV) as the storage of the Data-Nodes of the HDFS cluster, so that in the future we can increase (extend) the storage capacity of the Data-Nodes by attaching more block devices and extending the Logical Volume. (We can also reduce the storage size of the Data-Nodes as per our need.)
Implementation:
Step 1: Attaching Block Device (Hard Drive) to the Data-Node
I have a 3 GB hard disk (/dev/sda) attached to the Data-Node system.
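We can confirm that the disk is visible to the operating system (the device name may differ on your system):
lsblk   # lists all block devices with their sizes
fdisk -l /dev/sda   # prints the details of this particular disk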
Step 2: Creating Physical Volume ( PV )
pvcreate /dev/sda
This creates a physical volume (PV) on the 3 GB disk (/dev/sda).
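We can verify the PV with:
pvdisplay /dev/sda   # shows the size and attributes of the new physical volume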
Step 3: Creating Volume Group ( VG )
Create the Volume Group (VG) using the previously created physical volume (PV), /dev/sda:
vgcreate HadoopVG /dev/sda
A Volume Group 'HadoopVG' of just under 3 GB is created (LVM reports it as <3 GiB because some space is reserved for metadata).
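We can verify the VG with:
vgdisplay HadoopVG   # shows the VG size and its free physical extents (PE)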
Step 4: Create Logical Volume ( LV )
Create a Logical Volume (LV) of the desired size using the previously created Volume Group (HadoopVG). I am going to use the complete size of our Volume Group:
lvcreate --name DatanodeLV -l 100%FREE HadoopVG
A Logical Volume 'DatanodeLV' of just under 3 GB is created.
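We can verify the LV with:
lvdisplay /dev/HadoopVG/DatanodeLV   # shows the LV path, size and status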
Step 5: Format LV and Mount to Data-Node Directory
To use this LV, we first have to format it and then mount it on the directory that we are going to use as the Data-Node directory.
Formatting the LV
mkfs.ext4 /dev/mapper/HadoopVG-DatanodeLV
Mounting LV
mkdir /lvdir
mount /dev/mapper/HadoopVG-DatanodeLV /lvdir
Our Logical Volume is now mounted on the /lvdir directory.
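We can confirm the mount with:
df -h /lvdir   # shows the backing device, its size and the available space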
Step 6: Configuring Data-Node for HDFS Cluster
Now configure the Data-Node to use /lvdir as the Data-Node directory (see the configuration sketch below) and start the DataNode daemon.
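A minimal sketch of the relevant hdfs-site.xml property is shown below; the property name assumes Hadoop 2.x/3.x (on Hadoop 1.x it is dfs.data.dir), and the rest of the cluster configuration, such as the NameNode address in core-site.xml, is assumed to be in place already:
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/lvdir</value>
</property>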
Starting the Data-Node Daemon
hadoop-daemon.sh start datanode
Our Data-Node is now configured, and if we check our HDFS cluster we can see the Data-Node with just under 3 GB of storage capacity.
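We can verify this from any node in the cluster with the HDFS admin report:
hdfs dfsadmin -report   # shows the configured capacity of each Data-Node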
Extending the Storage Capacity of the Data-Node Dynamically While the Cluster Is Up
Because we use a Logical Volume in our Data-Node, we can increase its storage capacity dynamically, without unmounting anything and while the HDFS cluster is up, by attaching more block devices to the Data-Node and extending the VG and LV.
Step 1: Attach Another Hard Drive to the Data-Node
We could directly extend the Logical Volume (LV), but I don't have any more free space in my Volume Group (VG), so I first have to extend the VG. To extend the VG we need a Physical Volume, and for that I am going to attach one more 2 GB hard drive to the Data-Node.
Step 2: Create PV so we can extend our VG
To extend our Volume Group we need a Physical Volume, so I am going to create a PV from the recently attached 2 GB disk, /dev/sdb:
pvcreate /dev/sdb
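The new PV now shows up alongside the first one:
pvs   # /dev/sdb appears as a new ~2 GB PV, not yet assigned to any VG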
Step 3: Extend Volume Group (VG)
Extending our VG (HadoopVG) using the PV (/dev/sdb):
vgextend HadoopVG /dev/sdb
The VG is extended and now has just under 2 GB of free space.
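We can confirm the free space with:
vgdisplay HadoopVG   # 'Free PE / Size' should now show just under 2 GB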
Step 4: Extend Logical Volume (LV)
We extended our VG and it now contains just under 2 GB of free space (free physical extents), so now we can finally extend our Logical Volume (DatanodeLV) into that free space:
lvextend -l +100%FREE /dev/HadoopVG/DatanodeLV
Now we can see that our LV has been extended to roughly 5 GB (4.99 GB).
Step 5: Update FileSystem Using resize2fs
After extending the LV, we need to grow the ext4 file system to match the new LV size using resize2fs. resize2fs supports online growing, so this works while /lvdir stays mounted:
resize2fs /dev/HadoopVG/DatanodeLV
We can see that the volume backing our Data-Node directory (/lvdir) has been extended.
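We can confirm with df -h; as a side note, lvextend can also resize the file system in the same step via its -r (--resizefs) option:
df -h /lvdir   # now reports roughly 5 GB
lvextend -r -l +100%FREE /dev/HadoopVG/DatanodeLV   # alternative for future extensions: grow the LV and the file system in one command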
Checking our Hadoop-DFS Cluster Report
Now, finally, we can see that the storage capacity of our Data-Node has been extended to roughly 5 GB (4.85 GB).
And from now on, using LVM, we can scale the storage capacity of our Data-Nodes, and with it the storage capacity of the HDFS cluster, on demand.