HDFS Architecture (Basic concepts)

HDFS is a block-structured file system: each file is split into blocks of a predefined size, and these blocks are stored across a cluster of one or more machines. The file system follows a master/slave architecture, in which a cluster consists of a single NameNode (the master node) and all of the other nodes are DataNodes (the slave nodes). HDFS can be deployed on any machine that supports Java. Although it is possible to run several DataNodes on one machine, in real-world deployments these DataNodes are spread across different machines.
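
To make the client's view of this master/slave layout concrete, here is a minimal sketch using Hadoop's Java FileSystem API. The NameNode address hdfs://namenode:9000 and the path /user/example/example.txt are placeholders for illustration, not values from this article.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // The client only needs the NameNode address; the NameNode then
        // directs reads and writes to the appropriate DataNodes.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder host:port

        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf)) {
            // A simple existence check: this is a metadata question,
            // answered by the NameNode alone.
            System.out.println(fs.exists(new Path("/user/example/example.txt")));
        }
    }
}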


NameNode

The NameNode is the master node in HDFS; it maintains and manages the blocks that reside on the DataNodes. The NameNode is a highly available server that manages the file system namespace and controls clients' access to files. The HDFS architecture is designed so that user data never resides on the NameNode; the data itself is stored only on the DataNodes.
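
As a small illustration of the NameNode's role as metadata server, the sketch below lists a directory through the FileSystem API; a call like this is answered from the namespace the NameNode maintains, without reading any file contents from the DataNodes. The address and directory path are assumed placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListNamespace {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            // listStatus is a pure metadata call served by the NameNode;
            // it never touches the blocks stored on the DataNodes.
            for (FileStatus status : fs.listStatus(new Path("/user/example"))) {
                System.out.printf("%s\t%d bytes\treplication=%d%n",
                        status.getPath(), status.getLen(), status.getReplication());
            }
        }
    }
}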

DataNode

DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode is commodity hardware, that is, an inexpensive machine that does not have to be of high quality. The DataNode is a block server that stores the data in a local ext3 or ext4 file system.
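
To see which DataNodes actually hold a file's blocks, a client can ask the NameNode for the block locations. The following sketch uses the standard getFileBlockLocations call; the file path and NameNode address are assumptions made for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus status = fs.getFileStatus(new Path("/user/example/example.txt"));
            // Ask the NameNode which DataNodes hold each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }
}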

Blocks

Blocks are nothing but the smallest contiguous locations on your hard disk where data is stored. In general, every file system stores data as a set of blocks. Similarly, HDFS stores each file as blocks that are scattered throughout the Apache Hadoop cluster. The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x), and you can configure it as needed.
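
The block size can be inspected and, if needed, overridden per file through the same Java API. The sketch below reads the cluster's default block size and then creates a file with an assumed 256 MB block size and a replication factor of 3; the address, path, and values are illustrative, not prescribed by the article.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/example/example.txt"); // hypothetical path

            // Default block size configured on the cluster (128 MB on Hadoop 2.x).
            System.out.println("default block size: " + fs.getDefaultBlockSize(file));

            // A file can also be created with a per-file block size,
            // here 256 MB, overriding the cluster default.
            long blockSize = 256L * 1024 * 1024;
            try (FSDataOutputStream out =
                     fs.create(file, true, 4096, (short) 3, blockSize)) {
                out.writeUTF("hello HDFS");
            }
        }
    }
}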


It is not necessary for every file in HDFS to occupy an exact multiple of the configured block size. Take, for example, an example.txt file of 514 MB. Suppose we use the default block size setting of 128 MB. How many blocks will be created? Five: the first four blocks are 128 MB each, but the last block is only 2 MB.
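
The arithmetic behind that answer can be spelled out in a few lines; this is just a sketch of the calculation, nothing HDFS-specific.

public class BlockCount {
    public static void main(String[] args) {
        long fileSizeMb = 514;   // example.txt from the text
        long blockSizeMb = 128;  // default block size in Hadoop 2.x

        long fullBlocks = fileSizeMb / blockSizeMb;   // 4 blocks of 128 MB
        long lastBlockMb = fileSizeMb % blockSizeMb;  // 2 MB remainder
        long totalBlocks = fullBlocks + (lastBlockMb > 0 ? 1 : 0);

        System.out.println(totalBlocks + " blocks, last block = " + lastBlockMb + " MB");
        // prints: 5 blocks, last block = 2 MB
    }
}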

Well, whenever we talk about HDFS, we are talking about very large data sets, that is, terabytes and petabytes of data. So if we used a block size of 4 KB, as in a Linux file system, we would end up with an enormous number of blocks and, with them, an enormous amount of metadata. Managing that many blocks and that much metadata would create huge overhead, which is exactly what we want to avoid.
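
A rough back-of-the-envelope comparison shows why small blocks are a problem: for a single terabyte of data, 4 KB blocks produce hundreds of millions of metadata entries on the NameNode, while 128 MB blocks produce only a few thousand.

public class MetadataOverhead {
    public static void main(String[] args) {
        long oneTerabyte = 1024L * 1024 * 1024 * 1024;

        long smallBlock = 4L * 1024;           // 4 KB, like a typical Linux file system
        long hdfsBlock  = 128L * 1024 * 1024;  // 128 MB HDFS default

        // Each block needs a metadata entry on the NameNode, so the block
        // count is a rough proxy for NameNode memory overhead.
        System.out.println("4 KB blocks:   " + oneTerabyte / smallBlock); // 268,435,456
        System.out.println("128 MB blocks: " + oneTerabyte / hdfsBlock);  // 8,192
    }
}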

