#10 Key features of HDFS

NameNode Federation:

  • In newer versions of Hadoop (2.x onward), NameNode Federation was introduced: a cluster can run more than one NameNode, each managing its own portion of the namespace, to handle growing metadata.

  • It prevents performance bottlenecks by distributing the metadata workload across multiple NameNodes.
  • This approach helps manage growing metadata more efficiently.
  • It helps achieve horizontal scalability of the namespace (a short sketch follows this list).
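
A minimal sketch (Java, Hadoop client API) of what federation looks like from the client side: each NameNode serves an independent namespace, so a client simply targets different NameNode URIs. The host names nn1.example.com / nn2.example.com and the paths are hypothetical placeholders, not values from this article.

  // Federation sketch: two independent namespaces, two NameNodes.
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class FederationSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Namespace 1: e.g. user data, managed by the first NameNode (hypothetical host).
          FileSystem userFs = FileSystem.get(URI.create("hdfs://nn1.example.com:8020"), conf);
          System.out.println(userFs.exists(new Path("/user/data")));

          // Namespace 2: e.g. logs, managed by a second, independent NameNode.
          FileSystem logFs = FileSystem.get(URI.create("hdfs://nn2.example.com:8020"), conf);
          System.out.println(logFs.exists(new Path("/logs")));
      }
  }

Because each NameNode holds only its own slice of the metadata, adding a NameNode adds namespace capacity without the two coordinating with each other.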


Fault Tolerance:

What if a DataNode fails?

  • The replication factor (3 by default) ensures that copies of each block are stored on multiple DataNodes.
  • If a DataNode fails, the data can still be read from the replicated copies on other nodes, and the NameNode re-replicates the lost blocks.
  • This redundancy keeps data available and prevents loss when nodes fail (a short sketch follows this list).
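
A minimal sketch (Java, Hadoop client API) of working with the replication factor. The file path and the factor of 4 are illustrative assumptions; setReplication and getFileStatus are standard FileSystem calls.

  // Replication sketch: how many DataNodes hold a copy of each block.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Default replication for new files written by this client (3 is the stock default).
          conf.set("dfs.replication", "3");

          FileSystem fs = FileSystem.get(conf);
          Path file = new Path("/data/events.log"); // illustrative path, must already exist

          // Raise the replication factor of an existing file to 4;
          // the NameNode schedules the extra copies on other DataNodes.
          fs.setReplication(file, (short) 4);

          // Current replication factor as reported by the NameNode.
          System.out.println(fs.getFileStatus(file).getReplication());
      }
  }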


What if the NameNode fails?

  • The Secondary NameNode comes into the picture to reduce the time needed to bring the system back up.
  • It periodically takes a checkpoint by merging the NameNode's edit log into the fsimage.
  • In case of a failure, this checkpoint can be used to restore the NameNode's state up to the last checkpoint, reducing downtime (a short sketch follows this list).
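
A minimal sketch (Java) of the two settings that drive how often those checkpoints happen. The property names dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns are standard hdfs-site.xml keys; the defaults shown are assumed to be the stock Hadoop defaults.

  // Checkpoint-tuning sketch: read the values the cluster would use.
  import org.apache.hadoop.conf.Configuration;

  public class CheckpointSettingsSketch {
      public static void main(String[] args) {
          Configuration conf = new Configuration();
          // Checkpoint at least every N seconds (default 3600 = 1 hour)...
          long periodSecs = conf.getLong("dfs.namenode.checkpoint.period", 3600);
          // ...or sooner, once this many uncheckpointed transactions accumulate.
          long txns = conf.getLong("dfs.namenode.checkpoint.txns", 1_000_000);
          System.out.println("checkpoint period (s): " + periodSecs
                  + ", checkpoint txns: " + txns);
      }
  }

The shorter the checkpoint interval, the less edit-log replay is needed after a failure, at the cost of more frequent merge work on the Secondary NameNode.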
