HDFS Architecture


Master Node (Name Node):

- The Name Node holds the namespace information, or metadata (in the form of a table), recording what is kept where.

- The Name Node runs on high-quality, reliable hardware, so there is very little chance of the Name Node failing.

Slave Node (Data Node):

- Data Nodes hold the actual data in the form of “blocks”.

- Data Nodes run on commodity (cheap) hardware to keep the cost of the cluster low, so the chances of a Data Node failing are very high.

Working of HDFS:

- When someone sitting at a computer makes a request to read “file1”, the request always goes to the Name Node first (the Name Node holds complete information on what is kept where).

- The Name Node internally checks its metadata table to see where exactly “file1” is kept.

- The Name Node sends the location of “file1” back to the client node.

- Once the client gets the location of “file1”, it reads the data directly from the Data Nodes at that location, as sketched below.
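A minimal sketch of this read flow using the standard Hadoop FileSystem client API. The file path /data/file1 is a hypothetical example, and the usual cluster configuration files are assumed to be on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);     // client entry point to HDFS
            // open() first asks the Name Node for the block locations of the file;
            // the returned stream then pulls the actual bytes from the Data Nodes.
            try (FSDataInputStream in = fs.open(new Path("/data/file1"));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

Note that file data never flows through the Name Node; it serves only metadata.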

Failure Management of a Data Node:

If a Data Node fails, the data present on that node is lost. To avoid this problem, Hadoop comes with a solution called the “Replication Factor”.

Replication Factor: The default replication factor is “3”. That means each block has 3 copies, and these 3 copies of the same data are kept on three different nodes.

- If a Data Node fails, the data is not lost because the other two copies are present on other Data Nodes.

[Note: The default replication factor is ‘3’. We can change it as per our requirements, as shown in the sketch below.]
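A small sketch of changing the replication factor for one file from a client, using the real FileSystem.setReplication call; the path is hypothetical. (Cluster-wide, the default lives in the dfs.replication property of hdfs-site.xml.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplicationSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Ask HDFS to keep 2 copies of this file instead of the default 3.
            boolean accepted = fs.setReplication(new Path("/data/file1"), (short) 2);
            System.out.println("Replication change accepted: " + accepted);
        }
    }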

Q). How does the Name Node know that a Data Node has failed?

Ans: For this, there is a concept of the “Heartbeat” mechanism:

- Each Data Node sends a heartbeat to the Name Node every 3 seconds (by default).

- If the Name Node misses 10 consecutive heartbeats from a Data Node, it assumes that Data Node is dead or running very slowly and marks it as dead. The idea is sketched below.
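A toy sketch of this detection logic, not Hadoop's actual implementation; the 3-second interval and the 10-miss limit come from the text above:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class HeartbeatSketch {
        static final Duration INTERVAL = Duration.ofSeconds(3); // heartbeat every 3 s
        static final int MISSED_LIMIT = 10;                     // dead after 10 misses

        final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

        void onHeartbeat(String dataNodeId) {          // a Data Node reports in
            lastSeen.put(dataNodeId, Instant.now());
        }

        boolean isDead(String dataNodeId) {            // the Name Node's check
            Instant last = lastSeen.get(dataNodeId);
            if (last == null) return true;
            Duration silence = Duration.between(last, Instant.now());
            return silence.compareTo(INTERVAL.multipliedBy(MISSED_LIMIT)) > 0;
        }

        public static void main(String[] args) {
            HeartbeatSketch monitor = new HeartbeatSketch();
            monitor.onHeartbeat("datanode-7");
            System.out.println(monitor.isDead("datanode-7")); // false: just reported in
        }
    }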

Q). What is Fault Tolerance?

Ans: If a Data Node goes down, the replication factor of its blocks comes down to less than 3. So the Name Node creates one more copy (on the remaining Data Nodes) to maintain the replication factor. This self-healing is what makes HDFS fault tolerant.

Failure Management of a Name Node:

- In Hadoop v1, the Name Node was a single point of failure: if the Name Node failed, there was downtime with no access to metadata, which meant the loss of block mapping information. [No metadata means no access to the cluster.]

To avoid this problem, Hadoop 2 comes with a solution.

There are 2 things that help us make sure there is no downtime involved:

1. Two important metadata files [fsimage, edit logs (edits)].

2. Secondary Name Node.

Two important metadata files [fsimage, edit logs (edits)]:

- fsimage: A snapshot of the in-memory file system at a given moment. [Contains the actual information related to each block of each file.]

Ex: A cricket scorecard at a given point in time => 40 overs, 240 runs, 6 wickets.

- Edit logs: Record all recent changes or modifications made to the file system after that snapshot was taken.

Continuously merging the fsimage and edit logs gives you the latest fsimage. This is a compute-heavy process, so the Name Node should not take on the job of merging these 2 files (fsimage + edit logs) itself, as it is busy doing a lot of other things. A toy illustration of the merge follows.
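A toy illustration of what the merge conceptually does: replay the edit-log entries on top of the snapshot to get an up-to-date image. (Real HDFS stores these as binary transaction logs; the file names and operations here are made up for illustration.)

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CheckpointSketch {
        public static void main(String[] args) {
            // fsimage: file -> block location (snapshot at a point in time)
            Map<String, String> fsimage = new HashMap<>();
            fsimage.put("/data/file1", "blk_1 @ datanode-2");

            // edit log: changes made after the snapshot was taken
            List<String[]> editLog = List.of(
                    new String[]{"ADD", "/data/file2", "blk_2 @ datanode-5"},
                    new String[]{"DELETE", "/data/file1", ""});

            // Replaying the edits on the snapshot yields the "new fsimage";
            // the edit log can then be reset to empty.
            for (String[] edit : editLog) {
                if (edit[0].equals("ADD")) fsimage.put(edit[1], edit[2]);
                else if (edit[0].equals("DELETE")) fsimage.remove(edit[1]);
            }
            System.out.println(fsimage);
        }
    }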

Secondary Name Node (SNN):

The merging of these 2 files (fsimage + edit logs) is taken care of by the SNN.

How it works:

- The Name Node puts these 2 files (fsimage + edit logs) in a shared folder. Both the Name Node and the Secondary Name Node have access to this shared folder location.

- The SNN reads the fsimage and edit logs from the shared folder location and merges them. After merging, it places the new fsimage back in the shared folder, overwriting the old fsimage data, and the edit logs are reset to empty. This process repeats at the configured checkpoint interval (see the configuration sketch after this section).

Where is this shared folder located?

Ans: It is on some other machine or server. It is a shared place where the Name Node keeps writing (fsimage & edit logs) and the Secondary Name Node keeps reading.
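The cadence of this loop is governed by the dfs.namenode.checkpoint.period property. The stock Hadoop default is 3600 seconds (one hour), though it can be tuned down, e.g. to a 30-second cycle. A hypothetical sketch of setting it in code; in practice this property normally lives in hdfs-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // dfs.namenode.checkpoint.period: seconds between checkpoints
            // (the stock default is 3600, i.e. one hour).
            conf.set("dfs.namenode.checkpoint.period", "30");
            System.out.println(conf.get("dfs.namenode.checkpoint.period"));
        }
    }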

Failure Management of the Primary Name Node:

- If the Primary Name Node fails, the Secondary Name Node becomes the Primary Name Node: it loads the latest fsimage into memory, and things start working as usual.

- It is then the responsibility of the Hadoop admin to introduce a new Secondary Name Node that can do “checkpointing”, because checkpointing won't happen without an SNN.

Q). What is Checkpointing?

Ans: The Name Node writes these two files to a shared location; the Secondary Name Node reads them, keeps merging them at every checkpoint interval, and puts the new file system image back in the shared location. This entire process is called “checkpointing”.

Q). How does the SNN know that the PNN has failed?

Ans: By using the “checkpointing” mechanism.

Important Points:

- If we increase the block size, it can lead to “underutilization of the cluster”: there will, of course, be less parallelism.

- If we decrease the block size, it leads to the creation of millions of blocks, which overburdens the Name Node (more block metadata to track). A rough calculation is sketched below.
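A back-of-the-envelope sketch of how block size drives the Name Node's metadata load, using a hypothetical 1 TB file (128 MB is the HDFS default block size):

    public class BlockCountSketch {
        public static void main(String[] args) {
            long fileSize = 1024L * 1024 * 1024 * 1024; // 1 TB of data
            long[] blockSizes = {
                    128L * 1024 * 1024, // 128 MB (the HDFS default)
                    1L * 1024 * 1024    // 1 MB (far too small)
            };
            for (long blockSize : blockSizes) {
                long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
                System.out.printf("block size %4d MB -> %,10d blocks to track%n",
                        blockSize / (1024 * 1024), blocks);
            }
        }
    }

With 128 MB blocks the Name Node tracks 8,192 blocks for this file; with 1 MB blocks it tracks over a million.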

Rack Awareness Mechanism:

Rack: A rack is a group of systems or servers; the racks are placed in different geographical locations.

Q). Why do we not want to group all the computers or all the nodes together?

Ans: Because if there is a natural calamity and everything goes for a toss, we will lose our data. That's why we want to spread it out across geographical locations.

Choosing Single Rack:

Suppose we keep all three copies on the same rack; then there is one advantage:

- Writing takes very little time because everything is kept within the same rack, so little write bandwidth is consumed; data doesn't have to travel far between nodes.

- But there is a big downside: if some natural calamity comes and this rack goes for a toss, we lose all three replicas, and the whole point of keeping replicas is in vain.

- That's why keeping all the replicas in the same place is never the right choice.

Choosing Multiple Racks:

Advantage: Very little chance of data loss.

Disadvantage: It takes a lot of time to write from rack 1 to rack 2 to rack 3, because a lot of write bandwidth is consumed.

Balanced Approach:

- The balanced approach is to place the replicas in two different racks: one rack holds one copy, and the second rack holds the other two copies. This is the default approach used in Hadoop, and the mechanism is called the “Rack Awareness Mechanism”. A toy sketch of this placement follows.
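A toy sketch of this default placement for replication factor 3. Real HDFS also factors in the writer's own node, node load, and free space; the rack names here are made up:

    import java.util.List;

    public class RackPlacementSketch {
        public static void main(String[] args) {
            List<String> racks = List.of("rack1", "rack2", "rack3"); // cluster racks
            String localRack = racks.get(0);   // replica 1 stays near the writer
            String remoteRack = racks.get(1);  // replicas 2 and 3 share one other rack
            System.out.println("replica 1 -> " + localRack);
            System.out.println("replica 2 -> " + remoteRack);
            System.out.println("replica 3 -> " + remoteRack);
        }
    }

Only one copy has to cross between racks during a write, yet losing a single rack can never destroy all three replicas.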

Block Report:

Each Data Node sends a block report to the Name Node at a fixed frequency, indicating whether any of its blocks are corrupted. If any block is corrupted, it is mentioned in that block report; the Name Node then deletes that block and creates another copy of it.

Two Ways to Achieve High Availability of the Name Node:

1. Using QJM (Quorum Journal Manager).

2. Using a Shared Location.

Q). What is the difference between QJM and a Shared Location?

- Instead of one shared location, we have a quorum of Journal Nodes, so there is far less chance of the metadata getting lost (it protects against the loss of edit logs).

- Instead of a Secondary Name Node, we have a Standby Name Node, which does the checkpointing.

Q). How does the Standby Name Node know the Primary Name Node is dead?

Ans: There is a component called “ZooKeeper”. The Primary Name Node holds a lock in ZooKeeper. Whenever the Primary Name Node goes down, this lock is released. The Standby Name Node then acquires that lock and becomes active. A toy sketch of the handoff follows.
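A toy analogy of that lock handoff in plain Java. This is not the real ZooKeeper API; in ZooKeeper the “lock” is an ephemeral znode that disappears when the active Name Node's session dies:

    import java.util.concurrent.atomic.AtomicReference;

    public class FailoverLockSketch {
        // null means the lock is open (no active Name Node holds it).
        private static final AtomicReference<String> lockHolder = new AtomicReference<>();

        static boolean tryAcquire(String nameNode) {
            return lockHolder.compareAndSet(null, nameNode);
        }

        static void release(String nameNode) {
            lockHolder.compareAndSet(nameNode, null);
        }

        public static void main(String[] args) {
            tryAcquire("primary-nn");              // primary becomes active
            System.out.println(lockHolder.get());  // primary-nn

            release("primary-nn");                 // primary dies; the lock opens up
            tryAcquire("standby-nn");              // standby grabs the lock
            System.out.println(lockHolder.get());  // standby-nn is now active
        }
    }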



