#10 Key features of HDFS

NameNode Federation:

  • In newer versions of Hadoop (2.x onward), NameNode Federation was introduced: a cluster can run more than one NameNode, each managing its own portion of the namespace, to handle growing metadata.

  • It prevents performance bottlenecks by distributing the metadata workload across multiple NameNodes.
  • This approach helps manage growing metadata more efficiently.
  • It helps achieve horizontal scalability of the namespace (a short sketch follows this list).
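
A minimal sketch (Java, Hadoop client API) of what federation looks like from the client side: each NameNode serves an independent namespace, so a client simply targets different NameNode URIs. The host names nn1.example.com / nn2.example.com and the paths are hypothetical placeholders, not values from this article.

  // Federation sketch: two independent namespaces, two NameNodes.
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class FederationSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Namespace 1: e.g. user data, managed by the first NameNode (hypothetical host).
          FileSystem userFs = FileSystem.get(URI.create("hdfs://nn1.example.com:8020"), conf);
          System.out.println(userFs.exists(new Path("/user/data")));

          // Namespace 2: e.g. logs, managed by a second, independent NameNode.
          FileSystem logFs = FileSystem.get(URI.create("hdfs://nn2.example.com:8020"), conf);
          System.out.println(logFs.exists(new Path("/logs")));
      }
  }

Because each NameNode holds only its own slice of the metadata, adding a NameNode adds namespace capacity without the two coordinating with each other.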


Fault Tolerance:

What if a DataNode fails?

  • The replication factor (3 by default) ensures that copies of each block are stored on multiple DataNodes.
  • If a DataNode fails, the data can still be read from the replicated copies on other nodes, and the NameNode re-replicates the lost blocks.
  • This redundancy keeps data available and prevents loss when nodes fail (a short sketch follows this list).
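
A minimal sketch (Java, Hadoop client API) of working with the replication factor. The file path and the factor of 4 are illustrative assumptions; setReplication and getFileStatus are standard FileSystem calls.

  // Replication sketch: how many DataNodes hold a copy of each block.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Default replication for new files written by this client (3 is the stock default).
          conf.set("dfs.replication", "3");

          FileSystem fs = FileSystem.get(conf);
          Path file = new Path("/data/events.log"); // illustrative path, must already exist

          // Raise the replication factor of an existing file to 4;
          // the NameNode schedules the extra copies on other DataNodes.
          fs.setReplication(file, (short) 4);

          // Current replication factor as reported by the NameNode.
          System.out.println(fs.getFileStatus(file).getReplication());
      }
  }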


What if the NameNode fails?

  • The Secondary NameNode comes into the picture to reduce the time needed to bring the system back up.
  • It periodically takes a checkpoint by merging the NameNode's edit log into the fsimage.
  • In case of a failure, this checkpoint can be used to restore the NameNode's state up to the last checkpoint, reducing downtime (a short sketch follows this list).
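
A minimal sketch (Java) of the two settings that drive how often those checkpoints happen. The property names dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns are standard hdfs-site.xml keys; the defaults shown are assumed to be the stock Hadoop defaults.

  // Checkpoint-tuning sketch: read the values the cluster would use.
  import org.apache.hadoop.conf.Configuration;

  public class CheckpointSettingsSketch {
      public static void main(String[] args) {
          Configuration conf = new Configuration();
          // Checkpoint at least every N seconds (default 3600 = 1 hour)...
          long periodSecs = conf.getLong("dfs.namenode.checkpoint.period", 3600);
          // ...or sooner, once this many uncheckpointed transactions accumulate.
          long txns = conf.getLong("dfs.namenode.checkpoint.txns", 1_000_000);
          System.out.println("checkpoint period (s): " + periodSecs
                  + ", checkpoint txns: " + txns);
      }
  }

The shorter the checkpoint interval, the less edit-log replay is needed after a failure, at the cost of more frequent merge work on the Secondary NameNode.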
