Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

To understand the term 'Big Data', we first need to ask what data is. Data is a collection of facts, such as numbers, words, measurements, observations, or descriptions of things. In a more technical sense, data is a set of values of qualitative or quantitative variables about one or more persons or objects.

What is Big Data?

Big Data is still data; the term applies when the volume of data exceeds our storage capacity. Data is normally stored on hard disks.

For example, if we want to download a 100 GB file but our system can store only 64 GB, then for that system the file is Big Data.

Facebook, for example, has revealed some striking Big Data statistics: its systems process 2.5 billion pieces of content and more than 500 terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour. More than 500 TB per day is a huge volume of data to receive from users, and it adds up to about 15 petabytes in a month (500 TB × 30 days). Facebook has to store this data permanently to run its business, because users can ask to see their content at any time.

Google, similarly, processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. Google processes over 20 petabytes of data per day, roughly 40 times Facebook's daily volume.

In this task, I discuss the integration of LVM with Hadoop and use it to provide elasticity to the DataNode storage.

  • First, I create a Hadoop cluster with two RHEL 8 systems. I am performing this task on my VMs.
  • I configure one system as the master node and the other as the data node, and share the data node's storage with the master node.
  • Now, on the data node, I create a logical volume named lv1 (path /dev/demovg/lv1) with a size of 50G.
  • After that, I mount this logical volume at the mount point /lv and then set that mount point as the data directory in the /etc/hadoop/hdfs-site.xml file: <value>/lv</value>.
  • Now, I check whether the data node is sharing its storage successfully by running the command 'hadoop dfsadmin -report'.
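  • Now, I increase the size of the LV by +5G, and the size of the shared storage increases on the fly as well. (For the new space to become visible, the filesystem on the LV has to be grown along with the LV.) After increasing the size, I check the shared storage of the data node again using the command 'hadoop dfsadmin -report'.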
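
The data node steps above can be sketched as a small shell script. This is only a sketch under stated assumptions, not the exact commands used in this task: the backing disk /dev/sdb, the ext4 filesystem, and the property name dfs.data.dir (used by Hadoop 1.x; newer releases call it dfs.datanode.data.dir) are not given in the article. The script only prints the commands unless DRY_RUN=0 is set, in which case it must run as root.

```shell
#!/bin/sh
# Sketch of the data node steps. Commands are only printed unless DRY_RUN=0.
set -eu

DISK="${DISK:-/dev/sdb}"   # assumption: spare disk backing the volume group
VG="demovg"                # taken from the LV path /dev/demovg/lv1
LV="lv1"
MOUNTPOINT="/lv"
LV_PATH="/dev/$VG/$LV"

# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the 50G logical volume, format it, and mount it at /lv.
provision_lv() {
  run pvcreate "$DISK"
  run vgcreate "$VG" "$DISK"
  run lvcreate --size 50G --name "$LV" "$VG"
  run mkfs.ext4 "$LV_PATH"   # assumption: ext4 filesystem
  run mkdir -p "$MOUNTPOINT"
  run mount "$LV_PATH" "$MOUNTPOINT"
}

# Print the property to place inside <configuration> in hdfs-site.xml.
# Assumption: Hadoop 1.x property name.
datanode_property() {
  cat <<EOF
<property>
  <name>dfs.data.dir</name>
  <value>$MOUNTPOINT</value>
</property>
EOF
}

# Grow the LV by 5G, then grow the ext4 filesystem to fill it (works online).
extend_lv() {
  run lvextend --size +5G "$LV_PATH"
  run resize2fs "$LV_PATH"
}

provision_lv
datanode_property
extend_lv
```

The volume group needs free space beyond the initial 50G for the extension to succeed, so the assumed disk should be larger than 55G.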

Here you can see that the size of the shared storage increases on the fly from 49 GiB to 54 GiB.
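
To compare the capacity before and after the resize without reading the whole report, a small helper can pull out the value. This is a minimal sketch that assumes the report contains a line of the form "Configured Capacity: <bytes> (...)"; the helper name capacity_gib and the sample line below are illustrative, not output captured from this cluster.

```shell
#!/bin/sh
# Read `hadoop dfsadmin -report` text on stdin and print the first
# "Configured Capacity" value, converted from bytes to GiB.
capacity_gib() {
  awk -F': *' '/^Configured Capacity:/ {
    split($2, parts, " ")
    printf "%.2f\n", parts[1] / (1024 * 1024 * 1024)
    exit
  }'
}

# Illustrative input only: 52613349376 bytes is exactly 49 GiB.
printf 'Configured Capacity: 52613349376 (49 GB)\n' | capacity_gib

# On a real cluster the helper would be used as:
#   hadoop dfsadmin -report | capacity_gib
```

The first matching line in the report is the cluster-wide total, which equals the data node's capacity when there is only one data node, as in this setup.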

With that, the integration of LVM with Hadoop is done.

Thank You!!!

