Integrating LVM with Hadoop and Providing Elasticity to the DataNode with the LVM Concept
Naga phani
Computer Science Engineer || Software Engineer || Cloud & DevOps Enthusiast || AI & Machine Learning Practitioner
What is Hadoop?
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
What is LVM?
LVM, or Logical Volume Management, is a storage device management technology that gives users the power to pool and abstract the physical layout of component storage devices for easier and more flexible administration. Utilizing the device mapper Linux kernel framework, the current iteration, LVM2, can be used to gather existing storage devices into groups and allocate logical units from the combined space as needed.
The main advantages of LVM are increased abstraction, flexibility, and control. Logical volumes can have meaningful names like “databases” or “root-backup”. Volumes can be resized dynamically as space requirements change and migrated between physical devices within the pool on a running system or exported easily. LVM also offers advanced features like snapshotting, striping, and mirroring.
The diagram above shows how powerful a tool LVM is: physical volumes are pooled into a volume group and carved up into flexible logical volumes.
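As a small taste of that flexibility, a point-in-time snapshot of a mounted logical volume can be taken online with a single command (the volume group vg0 and volume databases here are illustrative, not part of this task):
command: lvcreate --snapshot --size 500M --name databases-snap /dev/vg0/databases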
Task: Attaching an LV to the Hadoop DataNode
For this task I am using AWS Cloud throughout:
- An AWS Linux instance (Ubuntu 20.04).
- Hadoop (3.3.0) installed on this instance.
- An EBS volume.
First, log in to the AWS instance.
After logging in, I created a new user for Hadoop named hduser and logged in as that user.
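The exact user-creation commands are not shown here; on Ubuntu, creating the user and switching to it looks roughly like this:
command: sudo adduser hduser
command: su - hduser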
After that, check the disks on the system with the fdisk command:
command: fdisk -l
Creating an EBS volume and attaching it to the instance
We can clearly see that the instance already has one 8 GiB disk. Now add a new 2 GiB EBS volume named fordatanode.
Warning: the EBS volume must be created in the same Availability Zone as the instance, because a volume cannot be attached to an instance in a different Availability Zone.
Attach this 2 GiB EBS volume to our Ubuntu instance.
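If you prefer the AWS CLI over the console, the equivalent calls look roughly like this (the Availability Zone, volume ID, and instance ID below are placeholders):
command: aws ec2 create-volume --size 2 --availability-zone <same-AZ-as-instance>
command: aws ec2 attach-volume --volume-id <vol-id> --instance-id <instance-id> --device /dev/xvdf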
Now check the disks on the instance again:
command: fdisk -l
We can see a new disk has appeared with the name /dev/xvdf.
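For a more compact tree view of the same disks, lsblk also works:
command: lsblk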
Creating Physical Volume
- Create a physical volume on the newly attached disk.
Command: pvcreate /dev/xvdf
Display the physical volume with:
command: pvdisplay /dev/xvdf
Creating a volume group (lucky) and attaching the physical volume to it
Create the volume group:
Command: vgcreate lucky /dev/xvdf
Display the volume group to verify whether the physical volume is attached:
command: vgdisplay lucky
The physical volume is successfully attached to the volume group.
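For quick one-line summaries instead of the full display output, LVM also provides:
command: pvs
command: vgs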
Start Namenode
Now we have to start our Hadoop NameNode.
command: hdfs namenode   # I am using 3.3.0, so some commands vary between versions
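Note that hdfs namenode runs the daemon in the foreground and occupies the terminal; in Hadoop 3.x you can also detach it into the background:
command: hdfs --daemon start namenode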
Now we have to look at the report of the Hadoop cluster:
command: hdfs dfsadmin -report -live
We can see that no DataNodes are running yet.
Create a logical volume of 1 GiB
Create a 1 GiB logical volume named logvol from the volume group lucky:
command: lvcreate --size 1G --name logvol lucky
We can see the logical volume details with the command lvdisplay /dev/lucky/logvol
Formatting the LV with ext4 filesystem
Format the logical volume with the ext4 filesystem so data can be stored on the LV.
Command: mkfs.ext4 /dev/lucky/logvol
Mount LV to Hadoop Datanode
After creating the filesystem on the logical volume (logvol), we have to mount this LV on the Hadoop DataNode directory:
command: mount /dev/lucky/logvol /usr/local/hadoop_store/hdfs/datanode
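This works because the DataNode writes its blocks to whatever directory dfs.datanode.data.dir points at. A minimal sketch of that property in hdfs-site.xml, assuming the same path as above:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/hadoop_store/hdfs/datanode</value>
</property>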
After mounting the LV storage on the DataNode directory, we can check whether it is mounted with the df -h command.
Wow, we can see that it is successfully mounted on the DataNode directory.
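Note that a plain mount does not survive a reboot; to make it persistent, an /etc/fstab entry along these lines (same device and mount point as above) can be added:
/dev/lucky/logvol /usr/local/hadoop_store/hdfs/datanode ext4 defaults,nofail 0 2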
Start Datanode
Now we have to start the DataNode.
command: hdfs datanode   # hadoop datanode also works but is deprecated in 3.x
After running the command, the DataNode is up. Check whether it is running with the jps command.
The jps output clearly shows that the NameNode and DataNode are running.
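The jps output looks roughly like this (the PIDs will differ on your machine):
12345 NameNode
12467 DataNode
12581 Jps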
Now we have to check the Hadoop cluster info again:
command: hdfs dfsadmin -report -live
One DataNode is now running, and the LV has been successfully shared with the Hadoop DataNode.
Extend the LV by 0.5 GiB without shutting down the DataNode
Now we extend the logical volume by 0.5 GiB:
command: lvextend --size +0.5G /dev/lucky/logvol
It is successfully extended. Next we have to grow the filesystem over the extended space; since ext4 on LVM can be resized online, we don't need to unmount the LV.
To grow the filesystem over the extended space we use the resize2fs command:
command: resize2fs /dev/lucky/logvol
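As a shortcut, lvextend can grow the filesystem in the same step via its --resizefs (-r) flag, which would make the separate resize2fs call unnecessary:
command: lvextend --resizefs --size +0.5G /dev/lucky/logvol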
After resizing the filesystem we can check whether it has grown with the df -h command.
Wow! Yes, logvol is successfully extended. We should also check whether the DataNode's storage has increased:
command: hdfs dfsadmin -report -live
We made it! We can clearly see that the DataNode's capacity has increased from about 1 GB to 1.45 GB.
Thanks for reading, and for your patience :)