Hadoop vs Hive

Darshika Srivastava

Associate Project Manager @ HuQuo | MBA,Amity Business School

Published: February 15, 2024

Difference Between Hadoop and Hive

Hadoop is a framework for storing and processing very large data sets (Big Data). It stores and processes data distributed across a cluster of commodity servers: the data is stored in the Hadoop Distributed File System (HDFS) and processed or queried with the MapReduce programming model. Hive is an application that runs on top of the Hadoop framework and provides an SQL-like interface for processing and querying that data; it translates queries into jobs, such as MapReduce jobs, that run on the cluster. Hive was designed and developed at Facebook before becoming part of the Apache Hadoop project. Hive queries are written in HiveQL (Hive Query Language, sometimes abbreviated HQL). Hive organizes data into databases, tables, and columns much like an RDBMS, and most HiveQL statements closely resemble standard SQL. Hive can also define external tables over data it does not manage, and table data can reside on other Hadoop-compatible file systems (for example, Amazon S3), so HDFS is not strictly required. Hive supports file formats such as ORC, Avro, SequenceFile, and plain text.

Hadoop’s Major Components

Figure 1: Basic architecture of Hadoop's major components.

Hadoop Base/Common: Hadoop Common provides the shared libraries and utilities that all the other Hadoop modules depend on, giving them a common platform.

HDFS (Hadoop Distributed File System): HDFS is a major part of the Hadoop framework. It manages all the data stored in the Hadoop cluster, follows a master/slave architecture, and stores data reliably through replication.

Master/Slave Architecture & Replication

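Master Node/Name Node: The NameNode stores the metadata for every file and block in HDFS, including which DataNodes hold each block. An HDFS namespace has one active NameNode; in a high-availability (HA) setup, a Standby NameNode takes over if the active one fails. The Standby is different from the Secondary NameNode, which only checkpoints metadata and cannot take over as master.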
Slave Node/Data Node: Data nodes contain actual data files in blocks. HDFS can have multiple Data Nodes.
Replication: HDFS stores each file by dividing it into blocks. The default block size is 128 MB in Hadoop 2.x and later (it was 64 MB in Hadoop 1.x) and is configurable through dfs.blocksize. By default each block is replicated to 3 different DataNodes; the replication factor (dfs.replication) can be raised or lowered as required. Because every block exists on several nodes, the failure of a single node is very unlikely to cause data loss.
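To make the block and replication arithmetic concrete, here is a minimal Python sketch, assuming a 128 MB block size and a replication factor of 3. The helpers `split_into_blocks`, `place_replicas`, and `survives_failure` are hypothetical illustrations, not Hadoop APIs, and the placement is a simple rotation rather than HDFS's real rack-aware policy. The resulting block-to-DataNode map is roughly the kind of metadata the NameNode keeps.

```python
# Assumptions for illustration: Hadoop 2.x+ default block size and replication factor.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return block sizes for a file; the last block may be smaller than block_size_mb."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Toy placement: put each block on `replication` distinct DataNodes, rotating the start node.

    Real HDFS uses a rack-aware placement policy; this only guarantees distinct nodes per block.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor cannot exceed the number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + i) % len(datanodes)]
                               for i in range(replication)]
    return placement


def survives_failure(placement: dict[int, list[str]], failed_node: str) -> bool:
    """True if every block still has at least one replica outside the failed node."""
    return all(any(node != failed_node for node in nodes) for nodes in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # hypothetical 300 MB file
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(blocks, placement)
    print("raw storage (MB):", sum(blocks) * REPLICATION)
    print("survives dn3 failure:", survives_failure(placement, "dn3"))
```

Under these assumptions a 300 MB file becomes three blocks (128, 128, and 44 MB) and consumes about 900 MB of raw disk, which is the storage cost paid for tolerating node failures.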

YARN (Yet Another Resource Negotiator): It manages the cluster's compute resources, such as memory and CPU, and plays a vital role in scheduling users' applications onto them.
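The scheduling idea can be pictured with a deliberately simplified Python model, assuming each application asks for containers described only by memory and vcores. NodeManagers and containers are real YARN concepts, but the classes and the `first_fit_schedule` function below are hypothetical; real YARN schedulers (such as the Capacity Scheduler or Fair Scheduler) also handle queues, priorities, and data locality.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    app_id: str
    memory_mb: int
    vcores: int


def first_fit_schedule(requests: list[ContainerRequest], nodes: list[NodeManager]):
    """Grant requests in arrival order to the first node with enough free memory and vcores.

    Returns (assignments, pending): assignments is a list of (app_id, node_name),
    pending holds requests that no node could satisfy.
    """
    assignments: list[tuple[str, str]] = []
    pending: list[ContainerRequest] = []
    for req in requests:
        for node in nodes:
            if node.free_memory_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                node.free_memory_mb -= req.memory_mb  # reserve the container's resources
                node.free_vcores -= req.vcores
                assignments.append((req.app_id, node.name))
                break
        else:
            pending.append(req)  # no node had room; a real scheduler would retry later
    return assignments, pending


if __name__ == "__main__":
    nodes = [NodeManager("nm1", 4096, 4), NodeManager("nm2", 2048, 2)]
    requests = [
        ContainerRequest("app1", 2048, 1),
        ContainerRequest("app1", 2048, 1),
        ContainerRequest("app2", 2048, 2),
        ContainerRequest("app3", 4096, 1),
    ]
    granted, waiting = first_fit_schedule(requests, nodes)
    print(granted)
    print([r.app_id for r in waiting])
```

Here the first three requests fit on the two nodes, while `app3` waits because no single node has 4096 MB left.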

MR (Map Reduce): This is the primary programming model of Hadoop. It is used to process/query the data within the Hadoop framework.
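As a rough illustration, consider a HiveQL query such as `SELECT dept, SUM(salary) FROM emp GROUP BY dept`. The minimal Python sketch below simulates, in memory, the map, shuffle, and reduce steps such a query roughly corresponds to. The `emp` table, its columns, and the function names are hypothetical, and this is not the Hadoop MapReduce Java API; on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(rows: Iterable[dict]) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (key, value) pair, here (dept, salary), for every input row."""
    for row in rows:
        yield row["dept"], row["salary"]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: aggregate each key's values; SUM in this example."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(rows: Iterable[dict]) -> dict[str, int]:
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    emp = [  # hypothetical rows of the emp table
        {"dept": "eng", "salary": 100},
        {"dept": "ops", "salary": 70},
        {"dept": "eng", "salary": 120},
    ]
    print(run_job(emp))
```

Writing the three functions by hand is what plain MapReduce asks of a developer; with Hive, the one-line query above is enough.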

Hive’s Major Components

Figure 2: Hive’s Architecture & Its Major Components

Hive Clients: Besides SQL, Hive supports client applications written in languages such as Java, C, and Python through drivers and interfaces such as JDBC, ODBC, and Thrift. A client application written in any of these languages can submit queries to Hive through these interfaces.

Hive Services: Commands and queries are executed within Hive Services, which consist of five sub-components.

CLI: Default command line interface provided by Hive for the execution of Hive queries/commands.
Hive Web Interfaces: It is a simple graphical user interface. This provides an alternative to the Hive command line and enables running queries and commands within the Hive application.
Hive Server: It is also called the Thrift Server, because it is built on Apache Thrift (current releases use HiveServer2). It accepts commands and queries from different clients, submits them to Hive for execution, and returns the final results.
Apache Hive Driver: It receives the queries a client submits through the CLI, the web UI, or the JDBC, ODBC, and Thrift interfaces, and manages their execution: it passes each query to the compiler, which looks up table and partition metadata in the Metastore, and then hands the resulting plan to the execution engine.
Metastore: The Metastore is the repository for all Hive metadata, such as table structures, partitions, and column types.
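To make the Metastore's role concrete, here is a toy in-memory version, assuming a table partitioned by date and stored under Hive's default warehouse directory. `TableMeta`, `ToyMetastore`, and `partition_paths` are hypothetical names; a real Metastore is backed by a relational database (for example Derby or MySQL) and exposes its own service API. The sketch shows how table and partition metadata lets the compiler read only the directories a query needs (partition pruning).

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    name: str
    columns: dict[str, str]  # column name -> Hive type, e.g. "string", "int"
    location: str  # base directory holding the table's data
    partition_keys: list[str] = field(default_factory=list)
    partitions: list[dict[str, str]] = field(default_factory=list)  # one dict per partition


class ToyMetastore:
    """In-memory stand-in for the Hive Metastore: table name -> metadata."""

    def __init__(self) -> None:
        self._tables: dict[str, TableMeta] = {}

    def create_table(self, meta: TableMeta) -> None:
        self._tables[meta.name] = meta

    def get_table(self, name: str) -> TableMeta:
        if name not in self._tables:
            raise LookupError(f"Table not found: {name}")
        return self._tables[name]


def partition_paths(meta: TableMeta, **filters: str) -> list[str]:
    """Return data directories for partitions whose key values match all filters."""
    paths = []
    for part in meta.partitions:
        if all(part.get(key) == value for key, value in filters.items()):
            # Hive lays out partitions as key=value subdirectories of the table location.
            suffix = "/".join(f"{key}={part[key]}" for key in meta.partition_keys)
            paths.append(f"{meta.location}/{suffix}")
    return paths


if __name__ == "__main__":
    store = ToyMetastore()
    store.create_table(TableMeta(
        name="sales",  # hypothetical table
        columns={"item": "string", "amount": "int"},
        location="/user/hive/warehouse/sales",
        partition_keys=["dt"],
        partitions=[{"dt": "2024-02-14"}, {"dt": "2024-02-15"}],
    ))
    sales = store.get_table("sales")
    print(sales.columns)
    print(partition_paths(sales, dt="2024-02-15"))
```

A query filtered on `dt = '2024-02-15'` would then need to scan only one of the two partition directories.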
