Hadoop – Architecture

Hadoop is a framework written in Java that uses a large cluster of commodity hardware to store and process big data. Hadoop works on the MapReduce programming model introduced by Google. Today many big-brand companies use Hadoop in their organizations to deal with big data, e.g. Facebook, Yahoo, Netflix, eBay, etc. The Hadoop architecture mainly consists of 4 components:


  • MapReduce
  • HDFS (Hadoop Distributed File System)
  • YARN (Yet Another Resource Negotiator)
  • Common Utilities or Hadoop Common

1. MapReduce

MapReduce is a programming model, backed by a simple data structure of key-value pairs, that runs on top of the YARN framework. Its major feature is distributed, parallel processing across a Hadoop cluster, which is what makes Hadoop so fast; when you are dealing with big data, serial processing is no longer practical. MapReduce has mainly 2 tasks, divided phase-wise:

In the first phase, Map is used, and in the next phase, Reduce is used, as in the word-count sketch below.
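
As a concrete illustration, here is a minimal word-count job written against the standard Hadoop MapReduce Java API (the class name and input/output paths are placeholders for this sketch): the map phase emits a (word, 1) pair for every word in its input split, and the reduce phase sums the counts for each word in parallel across the cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // optional local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, this would typically be submitted with something like `hadoop jar wordcount.jar WordCount /input /output`, and YARN distributes the map and reduce tasks across the cluster.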

2. HDFS

HDFS (Hadoop Distributed File System) is used as the storage layer of a Hadoop cluster. It is mainly designed to work on commodity hardware (inexpensive devices), following a distributed file system design. HDFS is designed to favor storing data in large blocks rather than in many small blocks.

HDFS provides fault tolerance and high availability to the storage layer and the other devices present in the Hadoop cluster. The data storage nodes in HDFS are the NameNode (master) and the DataNodes (slaves).
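
A small sketch of how a client talks to HDFS through the Hadoop FileSystem Java API is shown below; the NameNode URI and file path are hypothetical and would need to match your cluster. It writes a file, then reads back the block size and replication factor that govern how HDFS spreads and duplicates the data across DataNodes.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsBlockDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/demo/sample.txt");

        // Write a small file; HDFS stores files as large blocks (128 MB by default)
        // and replicates each block (replication factor 3 by default).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Inspect how the file is stored.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("block size : " + status.getBlockSize());
        System.out.println("replication: " + status.getReplication());

        // Read the file back and copy it to stdout.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```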

3. YARN(Yet Another Resource Negotiator)

YARN is the framework on which MapReduce works. YARN performs 2 operations: job scheduling and resource management. The purpose of the job scheduler is to divide a big task into small jobs so that each job can be assigned to different slaves in the Hadoop cluster and processing can be maximized. The job scheduler also keeps track of which job is important, which job has higher priority, the dependencies between jobs, and other information such as job timing. The resource manager manages all the resources made available for running the Hadoop cluster, as in the sketch below.
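
As a hedged sketch (assuming a reachable ResourceManager configured through yarn-site.xml on the classpath, and Hadoop 2.8+ for getMemorySize), the YarnClient API can be used to ask the ResourceManager what memory and vcores each running NodeManager offers; this is the resource inventory the Resource Manager schedules jobs against.

```java
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterResources {
    public static void main(String[] args) throws Exception {
        // Picks up yarn-site.xml from the classpath to locate the ResourceManager.
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a report on every running NodeManager.
        for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId()
                    + "  memory=" + node.getCapability().getMemorySize() + "MB"
                    + "  vcores=" + node.getCapability().getVirtualCores());
        }

        yarnClient.stop();
    }
}
```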

4. Hadoop Common or Common Utilities

Hadoop Common, or the common utilities, is the set of Java libraries and files required by all the other components in a Hadoop cluster. These utilities are used by HDFS, YARN, and MapReduce to run the cluster. Hadoop Common assumes that hardware failure in a Hadoop cluster is common, so failures need to be handled automatically in software by the Hadoop framework.
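
As a small illustration, the Configuration class shipped in the hadoop-common library (and reused by HDFS, YARN, and MapReduce alike) loads the shared core-default.xml and core-site.xml files; the sketch below simply prints a few well-known properties, falling back to a stated default where a property is not set on the classpath.

```java
import org.apache.hadoop.conf.Configuration;

public class CommonConfigDemo {
    public static void main(String[] args) {
        // hadoop-common's Configuration loads core-default.xml and core-site.xml
        // from the classpath; every HDFS, YARN, and MapReduce client builds on it.
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS        = " + conf.get("fs.defaultFS"));
        System.out.println("dfs.replication     = " + conf.get("dfs.replication", "3"));
        System.out.println("io.file.buffer.size = " + conf.get("io.file.buffer.size"));
    }
}
```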
