HDFS Architecture


Master Node (Name Node):

- The Name Node holds the namespace information, or metadata (in the form of a table), recording what is kept where.

- The Name Node runs on high-quality, reliable hardware, so there is very little chance of the Name Node failing.

Slave Node (Data Node):

- Data Nodes hold the actual data in the form of “blocks”.

- Data Nodes run on commodity (cheap) hardware to keep the cost of the cluster low, so the chances of a Data Node failing are very high.

Working of HDFS:

- When someone sitting at a computer makes a request to read “file1”, the request always goes to the Name Node first (the Name Node holds complete information on what is kept where).

- The Name Node internally checks its metadata table to see where exactly “file1” is kept.

- The Name Node sends the location of “file1” back to the client node.

- Once the client gets the location of “file1”, it reads the data directly from the Data Nodes at that location, as sketched below.
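A minimal sketch of this read flow using the standard Hadoop FileSystem client API. The file path /data/file1 is a hypothetical example, and the usual cluster configuration files are assumed to be on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);     // client entry point to HDFS
            // open() first asks the Name Node for the block locations of the file;
            // the returned stream then pulls the actual bytes from the Data Nodes.
            try (FSDataInputStream in = fs.open(new Path("/data/file1"));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

Note that file data never flows through the Name Node; it serves only metadata.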

Failure Management of a Data Node:

If a Data Node fails, the data present on that node is lost. To avoid this problem, Hadoop comes with a solution called the “Replication Factor”.

Replication Factor: The default replication factor is “3”. That means each block has 3 copies, and these 3 copies of the same data are kept on three different nodes.

- If a Data Node fails, the data is not lost because the other two copies are present on other Data Nodes.

[Note: The default replication factor is ‘3’. We can change it as per our requirements, as shown in the sketch below.]
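A small sketch of changing the replication factor for one file from a client, using the real FileSystem.setReplication call; the path is hypothetical. (Cluster-wide, the default lives in the dfs.replication property of hdfs-site.xml.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplicationSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Ask HDFS to keep 2 copies of this file instead of the default 3.
            boolean accepted = fs.setReplication(new Path("/data/file1"), (short) 2);
            System.out.println("Replication change accepted: " + accepted);
        }
    }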

Q). How does the Name Node know that a Data Node has failed?

Ans: For this, there is a concept of the “Heartbeat” mechanism:

- Each Data Node sends a heartbeat to the Name Node every 3 seconds (by default).

- If the Name Node misses 10 consecutive heartbeats from a Data Node, it assumes that Data Node is dead or running very slowly and marks it as dead. The idea is sketched below.
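A toy sketch of this detection logic, not Hadoop's actual implementation; the 3-second interval and the 10-miss limit come from the text above:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class HeartbeatSketch {
        static final Duration INTERVAL = Duration.ofSeconds(3); // heartbeat every 3 s
        static final int MISSED_LIMIT = 10;                     // dead after 10 misses

        final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

        void onHeartbeat(String dataNodeId) {          // a Data Node reports in
            lastSeen.put(dataNodeId, Instant.now());
        }

        boolean isDead(String dataNodeId) {            // the Name Node's check
            Instant last = lastSeen.get(dataNodeId);
            if (last == null) return true;
            Duration silence = Duration.between(last, Instant.now());
            return silence.compareTo(INTERVAL.multipliedBy(MISSED_LIMIT)) > 0;
        }

        public static void main(String[] args) {
            HeartbeatSketch monitor = new HeartbeatSketch();
            monitor.onHeartbeat("datanode-7");
            System.out.println(monitor.isDead("datanode-7")); // false: just reported in
        }
    }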

Q). What is Fault Tolerance?

Ans: If a Data Node goes down, the replication factor of its blocks comes down to less than 3. So the Name Node creates one more copy (on the remaining Data Nodes) to maintain the replication factor. This self-healing is what makes HDFS fault tolerant.

Failure Management of a Name Node:

- In Hadoop v1, the Name Node was a single point of failure: if the Name Node failed, there was downtime with no access to metadata, which meant the loss of block mapping information. [No metadata means no access to the cluster.]

To avoid this problem, Hadoop 2 comes with a solution.

There are 2 things that help us make sure there is no downtime involved:

1. Two important metadata files [fsimage, edit logs (edits)].

2. Secondary Name Node.

Two important metadata files [fsimage, edit logs (edits)]:

- fsimage: A snapshot of the in-memory file system at a given moment. [Contains the actual information related to each block of each file.]

Ex: A cricket scorecard at a given point in time => 40 overs, 240 runs, 6 wickets.

- Edit logs: Record all recent changes or modifications made to the file system after that snapshot was taken.

Continuously merging the fsimage and edit logs gives you the latest fsimage. This is a compute-heavy process, so the Name Node should not take on the job of merging these 2 files (fsimage + edit logs) itself, as it is busy doing a lot of other things. A toy illustration of the merge follows.
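A toy illustration of what the merge conceptually does: replay the edit-log entries on top of the snapshot to get an up-to-date image. (Real HDFS stores these as binary transaction logs; the file names and operations here are made up for illustration.)

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CheckpointSketch {
        public static void main(String[] args) {
            // fsimage: file -> block location (snapshot at a point in time)
            Map<String, String> fsimage = new HashMap<>();
            fsimage.put("/data/file1", "blk_1 @ datanode-2");

            // edit log: changes made after the snapshot was taken
            List<String[]> editLog = List.of(
                    new String[]{"ADD", "/data/file2", "blk_2 @ datanode-5"},
                    new String[]{"DELETE", "/data/file1", ""});

            // Replaying the edits on the snapshot yields the "new fsimage";
            // the edit log can then be reset to empty.
            for (String[] edit : editLog) {
                if (edit[0].equals("ADD")) fsimage.put(edit[1], edit[2]);
                else if (edit[0].equals("DELETE")) fsimage.remove(edit[1]);
            }
            System.out.println(fsimage);
        }
    }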

Secondary Name Node (SNN):

The merging of these 2 files (fsimage + edit logs) is taken care of by the SNN.

How it works:

- The Name Node puts these 2 files (fsimage + edit logs) in a shared folder. Both the Name Node and the Secondary Name Node have access to this shared folder location.

- The SNN reads the fsimage and edit logs from the shared folder location and merges them. After merging, it places the new fsimage back in the shared folder, overwriting the old fsimage data, and the edit logs are reset to empty. This process repeats at the configured checkpoint interval (see the configuration sketch after this section).

Where is this shared folder located?

Ans: It is on some other machine or server. It is a shared place where the Name Node keeps writing (fsimage & edit logs) and the Secondary Name Node keeps reading.
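The cadence of this loop is governed by the dfs.namenode.checkpoint.period property. The stock Hadoop default is 3600 seconds (one hour), though it can be tuned down, e.g. to a 30-second cycle. A hypothetical sketch of setting it in code; in practice this property normally lives in hdfs-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // dfs.namenode.checkpoint.period: seconds between checkpoints
            // (the stock default is 3600, i.e. one hour).
            conf.set("dfs.namenode.checkpoint.period", "30");
            System.out.println(conf.get("dfs.namenode.checkpoint.period"));
        }
    }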

Failure Management of the Primary Name Node:

- If the Primary Name Node fails, the Secondary Name Node becomes the Primary Name Node: it loads the latest fsimage into memory, and things start working as usual.

- It is then the responsibility of the Hadoop admin to introduce a new Secondary Name Node that can do “checkpointing”, because checkpointing won't happen without an SNN.

Q). What is Checkpointing?

Ans: The Name Node writes these two files to a shared location; the Secondary Name Node reads them, keeps merging them at every checkpoint interval, and puts the new file system image back in the shared location. This entire process is called “checkpointing”.

Q). How does the SNN know that the PNN has failed?

Ans: By using the “checkpointing” mechanism.

Important Points:

- If we increase the block size, it can lead to “underutilization of the cluster”: there will, of course, be less parallelism.

- If we decrease the block size, it leads to the creation of millions of blocks, which overburdens the Name Node (more block metadata to track). A rough calculation is sketched below.
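A back-of-the-envelope sketch of how block size drives the Name Node's metadata load, using a hypothetical 1 TB file (128 MB is the HDFS default block size):

    public class BlockCountSketch {
        public static void main(String[] args) {
            long fileSize = 1024L * 1024 * 1024 * 1024; // 1 TB of data
            long[] blockSizes = {
                    128L * 1024 * 1024, // 128 MB (the HDFS default)
                    1L * 1024 * 1024    // 1 MB (far too small)
            };
            for (long blockSize : blockSizes) {
                long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
                System.out.printf("block size %4d MB -> %,10d blocks to track%n",
                        blockSize / (1024 * 1024), blocks);
            }
        }
    }

With 128 MB blocks the Name Node tracks 8,192 blocks for this file; with 1 MB blocks it tracks over a million.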

Rack Awareness Mechanism:

Rack: A rack is a group of systems or servers; the racks are placed in different geographical locations.

Q). Why do we not want to group all the computers or all the nodes together?

Ans: Because if there is a natural calamity and everything goes for a toss, we will lose our data. That's why we want to spread it out across geographical locations.

Choosing Single Rack:

Suppose we keep all three copies on the same rack; then there is one advantage:

- Writing takes very little time because everything is kept within the same rack, so little write bandwidth is consumed; data doesn't have to travel far between nodes.

- But there is a big downside: if some natural calamity comes and this rack goes for a toss, we lose all three replicas, and the whole point of keeping replicas is in vain.

- That's why keeping all the replicas in the same place is never the right choice.

Choosing Multiple Racks:

Advantage: Very little chance of data loss.

Disadvantage: It takes a lot of time to write from rack 1 to rack 2 to rack 3, because a lot of write bandwidth is consumed.

Balanced Approach:

- The balanced approach is to place the replicas in two different racks: one rack holds one copy, and the second rack holds the other two copies. This is the default approach used in Hadoop, and the mechanism is called the “Rack Awareness Mechanism”. A toy sketch of this placement follows.
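A toy sketch of this default placement for replication factor 3. Real HDFS also factors in the writer's own node, node load, and free space; the rack names here are made up:

    import java.util.List;

    public class RackPlacementSketch {
        public static void main(String[] args) {
            List<String> racks = List.of("rack1", "rack2", "rack3"); // cluster racks
            String localRack = racks.get(0);   // replica 1 stays near the writer
            String remoteRack = racks.get(1);  // replicas 2 and 3 share one other rack
            System.out.println("replica 1 -> " + localRack);
            System.out.println("replica 2 -> " + remoteRack);
            System.out.println("replica 3 -> " + remoteRack);
        }
    }

Only one copy has to cross between racks during a write, yet losing a single rack can never destroy all three replicas.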

Block Report:

Each Data Node sends a block report to the Name Node at a fixed frequency, indicating whether any of its blocks are corrupted. If any block is corrupted, it is mentioned in that block report; the Name Node then deletes that block and creates another copy of it.

Two Ways to Achieve High Availability of the Name Node:

1. Using QJM (Quorum Journal Manager).

2. Using a Shared Location.

Q). What is the difference between QJM and a Shared Location?

- Instead of one shared location, we have a quorum of Journal Nodes, so there is far less chance of the metadata getting lost (it protects against the loss of edit logs).

- Instead of a Secondary Name Node, we have a Standby Name Node, which does the checkpointing.

Q). How does the Standby Name Node know the Primary Name Node is dead?

Ans: There is a component called “ZooKeeper”. The Primary Name Node holds a lock in ZooKeeper. Whenever the Primary Name Node goes down, this lock is released. The Standby Name Node then acquires that lock and becomes active. A toy sketch of the handoff follows.
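A toy analogy of that lock handoff in plain Java. This is not the real ZooKeeper API; in ZooKeeper the “lock” is an ephemeral znode that disappears when the active Name Node's session dies:

    import java.util.concurrent.atomic.AtomicReference;

    public class FailoverLockSketch {
        // null means the lock is open (no active Name Node holds it).
        private static final AtomicReference<String> lockHolder = new AtomicReference<>();

        static boolean tryAcquire(String nameNode) {
            return lockHolder.compareAndSet(null, nameNode);
        }

        static void release(String nameNode) {
            lockHolder.compareAndSet(nameNode, null);
        }

        public static void main(String[] args) {
            tryAcquire("primary-nn");              // primary becomes active
            System.out.println(lockHolder.get());  // primary-nn

            release("primary-nn");                 // primary dies; the lock opens up
            tryAcquire("standby-nn");              // standby grabs the lock
            System.out.println(lockHolder.get());  // standby-nn is now active
        }
    }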



