An overview of HDFS
HDFS, the Hadoop Distributed File System, follows a distributed file system design and runs on commodity hardware. Unlike some other distributed file systems, HDFS is highly fault tolerant and is designed to run on low-cost machines.
HDFS stores very large amounts of data and provides easy access to it. To hold such large volumes, files are spread across several machines. The files are stored redundantly so that the system is protected from data loss if a machine crashes. HDFS also makes this data available to applications for parallel processing.
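To make this concrete, here is a minimal sketch of writing a file to HDFS and reading it back through the Java FileSystem API. It assumes a cluster whose settings are on the classpath (core-site.xml and hdfs-site.xml); the path /tmp/hello.txt is purely illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteRead {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster described by the configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits it into blocks and replicates them across datanodes.
        Path file = new Path("/tmp/hello.txt");   // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("Hello, HDFS\n");
        }

        // Read the file back; the client fetches its blocks from the datanodes.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}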
HDFS Architecture
Below is the architecture of the Hadoop file system:
HDFS follows a master-slave architecture and has the following components:
Namenode
The namenode is commodity hardware that runs the GNU/Linux operating system and the namenode software, which is designed to run on such low-cost machines. The system hosting the namenode acts as the master server and does the following:
- Manages the file system namespace.
- Regulates clients' access to files.
- Executes file system operations such as renaming, closing, and opening files and directories, as in the sketch after this list.
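As a rough illustration, the namespace operations below are all answered by the namenode, since they touch only metadata, not file contents. The directory names are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Namespace changes are recorded by the namenode.
        Path dir = new Path("/user/demo");              // hypothetical directory
        fs.mkdirs(dir);
        fs.rename(dir, new Path("/user/demo_renamed"));

        // Directory listings come from the namenode's view of the namespace.
        for (FileStatus status : fs.listStatus(new Path("/user"))) {
            System.out.println(status.getPath() + "  " + status.getLen());
        }
    }
}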
Datanode
The datanode is commodity hardware that runs the GNU/Linux operating system and the datanode software. Every node in the cluster runs a datanode, and these nodes manage the data storage of their system.
- Datanodes perform read and write operations on the file system when requested by clients.
- They also perform block creation, deletion, and replication according to the namenode's instructions; the sketch after this list shows how a client can see which datanodes hold a file's blocks.
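A minimal sketch of inspecting where a file's blocks are stored, assuming the hypothetical file /tmp/hello.txt from the earlier example already exists. Each BlockLocation names the datanodes holding one replicated block.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/hello.txt");     // hypothetical file written earlier

        FileStatus status = fs.getFileStatus(file);
        // One entry per block of the file, listing the datanodes that store it.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block at offset " + loc.getOffset()
                    + " on " + String.join(", ", loc.getHosts()));
        }
    }
}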
Block
Generally, user data is stored in the files of HDFS. A file is divided into one or more segments, which are stored in individual data nodes. These segments are called blocks; a block is the minimum amount of data that HDFS can read or write. The block size is 64 MB by default, but it can be increased as needed by changing the HDFS configuration, as in the sketch below.
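The cluster-wide default block size is set in hdfs-site.xml (the property is dfs.block.size in older Hadoop releases and dfs.blocksize in newer ones). The sketch below instead overrides the block size for a single file at creation time; the 128 MB value and the path are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 128L * 1024 * 1024;        // 128 MB instead of the 64 MB default
        Path file = new Path("/tmp/bigfile.dat");   // hypothetical path
        // create(path, overwrite, bufferSize, replication, blockSize) lets a single
        // file use a larger block size than the cluster default.
        try (FSDataOutputStream out = fs.create(file, true,
                conf.getInt("io.file.buffer.size", 4096),
                fs.getDefaultReplication(), blockSize)) {
            out.writeBytes("data written with a 128 MB block size\n");
        }
    }
}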