An overview of HDFS
HDFS, the Hadoop Distributed File System, follows a distributed file system design and runs on commodity hardware. Unlike some other distributed file systems, HDFS is highly fault tolerant and is designed to run on low-cost machines.
HDFS stores very large amounts of data and provides easy access to it. To hold such large volumes, files are spread across several machines. The files are stored redundantly so that the system is protected from data loss if a machine crashes. HDFS also makes this data available to applications for parallel processing.
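To make this concrete, here is a minimal sketch of writing a file to HDFS and reading it back through the Java FileSystem API. It assumes a cluster whose settings are on the classpath (core-site.xml and hdfs-site.xml); the path /tmp/hello.txt is purely illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteRead {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster described by the configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits it into blocks and replicates them across datanodes.
        Path file = new Path("/tmp/hello.txt");   // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("Hello, HDFS\n");
        }

        // Read the file back; the client fetches its blocks from the datanodes.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}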
HDFS Architecture
Below is the architecture of the Hadoop file system:
HDFS follows a master-slave architecture and has the following components:
Namenode
The namenode is commodity hardware that runs the GNU/Linux operating system and the namenode software, which is designed to run on such low-cost machines. The system hosting the namenode acts as the master server and does the following:
- Manages the file system namespace.
- Regulates clients' access to files.
- Executes file system operations such as renaming, closing, and opening files and directories, as in the sketch after this list.
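As a rough illustration, the namespace operations below are all answered by the namenode, since they touch only metadata, not file contents. The directory names are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Namespace changes are recorded by the namenode.
        Path dir = new Path("/user/demo");              // hypothetical directory
        fs.mkdirs(dir);
        fs.rename(dir, new Path("/user/demo_renamed"));

        // Directory listings come from the namenode's view of the namespace.
        for (FileStatus status : fs.listStatus(new Path("/user"))) {
            System.out.println(status.getPath() + "  " + status.getLen());
        }
    }
}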
Datanode
The datanode is commodity hardware that runs the GNU/Linux operating system and the datanode software. Every node in the cluster runs a datanode, and these nodes manage the data storage of their system.
- Datanodes perform read and write operations on the file system when requested by clients.
- They also perform block creation, deletion, and replication according to the namenode's instructions; the sketch after this list shows how a client can see which datanodes hold a file's blocks.
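A minimal sketch of inspecting where a file's blocks are stored, assuming the hypothetical file /tmp/hello.txt from the earlier example already exists. Each BlockLocation names the datanodes holding one replicated block.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/hello.txt");     // hypothetical file written earlier

        FileStatus status = fs.getFileStatus(file);
        // One entry per block of the file, listing the datanodes that store it.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block at offset " + loc.getOffset()
                    + " on " + String.join(", ", loc.getHosts()));
        }
    }
}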
Block
Generally, user data is stored in the files of HDFS. A file is divided into one or more segments, which are stored in individual data nodes. These segments are called blocks; a block is the minimum amount of data that HDFS can read or write. The block size is 64 MB by default, but it can be increased as needed by changing the HDFS configuration, as in the sketch below.
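The cluster-wide default block size is set in hdfs-site.xml (the property is dfs.block.size in older Hadoop releases and dfs.blocksize in newer ones). The sketch below instead overrides the block size for a single file at creation time; the 128 MB value and the path are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 128L * 1024 * 1024;        // 128 MB instead of the 64 MB default
        Path file = new Path("/tmp/bigfile.dat");   // hypothetical path
        // create(path, overwrite, bufferSize, replication, blockSize) lets a single
        // file use a larger block size than the cluster default.
        try (FSDataOutputStream out = fs.create(file, true,
                conf.getInt("io.file.buffer.size", 4096),
                fs.getDefaultReplication(), blockSize)) {
            out.writeBytes("data written with a 128 MB block size\n");
        }
    }
}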