Providing elasticity to DataNode Storage in Hadoop using LVM
Priyanka Bharti
Software Engineer @ Samsung | C++ | Android Development | Kotlin | Linux
Hello folks! Back with another article, in which you'll see how to integrate LVM with Hadoop and provide elasticity to DataNode storage in a cluster.
Before we start with the practical, let's explore what LVM is and why a Hadoop cluster needs it.
Logical Volume Management (LVM), as the name suggests, is a tool for managing logical volumes, which includes allocating disks and striping, mirroring, and resizing logical volumes. With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes, and LVM physical volumes can be placed on other block devices which might span two or more disks.
If a file system needs more space, it can be added to its logical volumes from the free space in its volume group, and the file system can be resized as we wish. If a disk starts to fail, a replacement disk can be registered as a physical volume with the volume group, and the logical volume's extents can be migrated to the new disk without data loss.
Why do we need elasticity in Hadoop?
Consider a Hadoop cluster having 'n' DataNodes contributing their storage to the cluster. When the DataNodes' storage gets full, we would normally have to attach more DataNodes, which in turn means more RAM and CPU requirements. Instead, we can solve the issue by using the LVM concept to provide elasticity to DataNode storage in the cluster, so that we can increase or decrease the size of the partitions as the requirement arises.
So, let's move to the practical part.
I already have an HDFS cluster running. We can check the report of the HDFS cluster using the command: hadoop dfsadmin -report
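Here is a minimal sketch of that check (on newer Hadoop versions the equivalent command is hdfs dfsadmin -report):

hadoop dfsadmin -report    # prints the configured capacity, DFS used and remaining space for each DataNode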
Here, the storage attached to the DataNodes is static, i.e., we can't increase or decrease it. So, let's implement the logical volume concept to provide dynamic storage to the DataNodes.
Implementing the LVM concept to create an elastic volume for the DataNodes:
Let's attach an external physical volume (a hard disk or pen drive) to each DataNode. We can check the list of volumes attached to the system using the 'fdisk -l' command:
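For example (the device names you see will depend on your own setup):

fdisk -l    # lists every block device; a newly attached disk typically shows up as an extra device such as /dev/sdb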
Step 1. Creating a physical volume from the attached hard disk:
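A minimal sketch, assuming the new disk appeared as /dev/sdb (adjust the device name to your system):

pvcreate /dev/sdb     # initialize the attached disk as an LVM physical volume
pvdisplay /dev/sdb    # verify that the physical volume was created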
Step 2. Creating a volume group from the physical volume, from which the logical volume will get its storage:
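A sketch with a hypothetical volume group name 'dnvg':

vgcreate dnvg /dev/sdb    # create a volume group on top of the physical volume
vgdisplay dnvg            # verify the volume group and check its free space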
Step 3. Creating a logical volume from the above volume group:
We can create as many logical volumes as we want from the volume group, until the volume group is fully consumed.
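A sketch using the hypothetical names from the previous steps, carving out a 15 GB logical volume:

lvcreate --size 15G --name dnlv dnvg    # create a 15 GB logical volume 'dnlv' inside volume group 'dnvg'
lvdisplay dnvg/dnlv                     # verify the logical volume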
Step 4. Formatting the logical volume we created:
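For example, with an ext4 file system:

mkfs.ext4 /dev/dnvg/dnlv    # format the logical volume (the device path follows /dev/<vg-name>/<lv-name>)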
Step 5. Mounting the partition on the Hadoop DataNode directory:
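A sketch, assuming /dn1 is the directory configured as dfs.datanode.data.dir (dfs.data.dir on older versions) in hdfs-site.xml:

mkdir -p /dn1                # create the DataNode directory if it doesn't exist
mount /dev/dnvg/dnlv /dn1    # mount the logical volume on the DataNode directory
df -h /dn1                   # confirm the mount and its size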
Do a similar setup on the other DataNodes of the cluster. Now, we'll contribute this DataNode storage to the cluster.
Let's restart the HDFS service on the DataNodes. Now, we can see that each DataNode is contributing approximately 15 GB to the cluster.
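The restart and the check look roughly like this (the daemon script name and path can vary with the Hadoop version and install location):

hadoop-daemon.sh stop datanode     # stop the DataNode daemon
hadoop-daemon.sh start datanode    # start it again so it serves the newly mounted storage
hadoop dfsadmin -report            # the configured capacity per DataNode should now reflect the logical volume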
Now, suppose a situation arises where the DataNodes of the cluster get full and we need to increase their storage. We can do so easily by extending the size of the contributed logical volume, thus providing elasticity to Hadoop DataNode storage on the fly.
Let's demonstrate how we can do so!
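A sketch of the resize, again using the hypothetical names from above:

lvextend --size +3G /dev/dnvg/dnlv    # grow the logical volume by 3 GB, from about 15 GB to about 18 GB
resize2fs /dev/dnvg/dnlv              # grow the ext4 file system online, without unmounting it
df -h /dn1                            # the mounted DataNode directory now shows the larger size
hadoop dfsadmin -report               # and the cluster report reflects the extra capacity

Because resize2fs can grow a mounted ext4 file system online, the DataNode keeps serving data throughout, which is exactly what gives us elasticity on the fly.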
Here, we just increased the size of a DataNode's storage from 15 GB to 18 GB on the fly.
So, with the use of the LVM concept, we are able to increase the DataNode storage dynamically!
Thanks for reading!
:):)