Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
To understand the term 'Big Data', we first need to understand what data is. Data is a collection of facts, such as numbers, words, measurements, observations, or simply descriptions of things. In a more technical sense, data is a set of values of qualitative or quantitative variables about one or more persons or objects.
What is Big Data?
Big Data is still data, but when its volume exceeds our storage capability, we call it Big Data. Traditionally, we store data on hard disks.
For example, if we want to download a 100 GB file but our system can store only 64 GB of data, then relative to our system that file is Big Data.
Take Facebook: it has revealed some big stats on Big Data, including that its systems process 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half-hour. As you can see, 500+ TB per day is a very large volume of data to receive from users, and it adds up to about 15 petabytes in a month. Facebook has to store this data permanently if it wants to run its business, because users can ask to see their content at any time.
And if we talk about Google, it now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. Google currently processes over 20 petabytes of data per day, more than 40 times what Facebook handles.
In this task, I'm going to discuss the integration of LVM with Hadoop and provide elasticity to the DataNode storage.
- First, I create a Hadoop cluster with two RHEL 8 systems. I'm performing this task on my VMs.
- I configure one system as the Master Node (NameNode) and the other as the DataNode, which shares its storage with the master (a configuration sketch follows this list).
- Next, on the DataNode I create one logical volume named lv1, with path /dev/demovg/lv1 and an LV size of 50G (see the LVM commands after this list).
- After that, I format the LV, mount it at the mountpoint /lv, and give the mountpoint's path inside the /etc/hadoop/hdfs-site.xml file as <value>/lv</value> (sketched below).
- Now I check whether the DataNode is successfully sharing its storage by running the command 'hadoop dfsadmin -report'.
- Finally, I increase the size of the LV by +5G on the fly, and the size of the shared storage increases with it. After increasing the size, I check the DataNode's shared storage again with 'hadoop dfsadmin -report' (the exact commands are shown after this list).
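
For the first two steps, here is a minimal configuration sketch, assuming a Hadoop 1.x install (which matches the /etc/hadoop config path used in this task); the master IP 192.168.1.10 and port 9001 are placeholders I chose for illustration, not values from the task:

```bash
# --- On the Master Node (NameNode) ---
# Tell Hadoop which host and port the filesystem lives on.
# 192.168.1.10:9001 is an assumed address -- substitute your master's IP.
cat > /etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>
EOF
hadoop namenode -format           # one-time format of the NameNode metadata
hadoop-daemon.sh start namenode   # start the NameNode daemon

# --- On the DataNode ---
# Write the same core-site.xml, pointing at the master's IP, so the
# DataNode knows which NameNode to register with; its own daemon is
# started once its storage directory is configured (later steps).
```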
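The logical volume from the third step can be created with the standard LVM commands; /dev/sdb below is an assumed spare disk attached to the DataNode VM:

```bash
# Initialize the spare disk as an LVM physical volume (device name assumed).
pvcreate /dev/sdb
# Create the volume group 'demovg' backed by that physical volume.
vgcreate demovg /dev/sdb
# Carve out a 50 GiB logical volume named lv1 -> /dev/demovg/lv1
lvcreate --size 50G --name lv1 demovg
# Verify the LV's size and path.
lvdisplay /dev/demovg/lv1
```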
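Formatting, mounting, and handing the mountpoint to HDFS (the fourth step) looks roughly like this; ext4 and the Hadoop 1.x property name dfs.data.dir are my assumptions:

```bash
mkfs.ext4 /dev/demovg/lv1        # put a filesystem on the LV (ext4 assumed)
mkdir -p /lv                     # create the mountpoint
mount /dev/demovg/lv1 /lv        # mount the LV there

# Point the DataNode's storage directory at the mountpoint.
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/lv</value>
  </property>
</configuration>
EOF

hadoop-daemon.sh start datanode  # start the DataNode; it now contributes /lv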
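```

And the elasticity itself (the last two steps): check the report, extend the LV, grow the filesystem online, and check again. Note that resize2fs applies because the LV was formatted as ext4 above; an XFS filesystem would need xfs_growfs instead:

```bash
hadoop dfsadmin -report               # note the DataNode's configured capacity

lvextend --size +5G /dev/demovg/lv1   # grow the LV by 5 GiB, on the fly
resize2fs /dev/demovg/lv1             # grow the ext4 filesystem to fill the LV
                                      # (no unmount, no DataNode restart needed)

hadoop dfsadmin -report               # capacity should now be ~5 GiB larger
```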
Here you can see that the size of the shared storage increased on the fly from 49 GiB to 54 GiB.
And with that, we are done with the integration of LVM with Hadoop.
Thank You!!!