Introduction to Hive

  • The term ‘Big Data’ refers to collections of large datasets characterized by huge volume, high velocity, and a wide variety of data, all of which keep growing day by day.
  • Using traditional data management systems, it is difficult to process Big Data.
  • Therefore, the Apache Software Foundation introduced a framework called Hadoop to solve Big Data management and processing challenges.

Hadoop

Hadoop is an open-source framework to store and process Big Data in a distributed environment.

At its core, it is a combination of two modules: MapReduce and the Hadoop Distributed File System (HDFS).

  • HDFS: The Hadoop Distributed File System is the part of the Hadoop framework used to store the datasets. It provides a fault-tolerant file system that runs on commodity hardware. (This is where your data gets distributed.)
  • MapReduce: A parallel programming model for processing large amounts of structured, semi-structured, and unstructured data on large clusters of commodity hardware. (This is where you write transformation jobs in Java; MapReduce then runs your code in a distributed fashion, reading the data from HDFS.)
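The MapReduce model described above can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real job would run the map and reduce steps in parallel on many machines over data read from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustration only)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

The developer supplies only the map and reduce logic; grouping by key and distributing the work are the framework's job, which is what makes the model scale across a cluster.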

Now imagine you have to write a join query. In MapReduce, that means writing 10-15 lines of Java code, whereas in a SQL database such as Oracle you can express the same join in a single line.

SQL developers liked Hadoop and wanted to use it, but there was one problem: MapReduce jobs are written in Java. They needed a way to communicate with MapReduce through SQL, and this is where Hive comes in.
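To make the contrast concrete, here is a hedged sketch that computes the same inner join twice: once by hand in the map/group/reduce style, and once as a single SQL statement. It is written in Python rather than Java, the sample tables are invented for illustration, and SQLite (from Python's standard library) is used only as a stand-in for a SQL engine; it is not Hive and the query is not run on Hadoop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (emp_id, name, dept_id) and (dept_id, dept_name).
EMPLOYEES = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meera", 10)]
DEPARTMENTS = [(10, "Sales"), (20, "Finance")]


def join_by_hand(employees, departments):
    """Reduce-side join: group both inputs by key, then pair up the sides."""
    grouped = defaultdict(lambda: {"emp": [], "dept": []})
    for _emp_id, name, dept_id in employees:      # "map" over employees
        grouped[dept_id]["emp"].append(name)
    for dept_id, dept_name in departments:        # "map" over departments
        grouped[dept_id]["dept"].append(dept_name)
    rows = []
    for sides in grouped.values():                # "reduce" per join key
        for name in sides["emp"]:
            for dept_name in sides["dept"]:
                rows.append((name, dept_name))
    return sorted(rows)


def join_with_sql(employees, departments):
    """The same inner join as one declarative statement (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (emp_id INTEGER, name TEXT, dept_id INTEGER)")
    conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", employees)
    conn.executemany("INSERT INTO dept VALUES (?, ?)", departments)
    rows = conn.execute(
        "SELECT e.name, d.dept_name FROM emp e JOIN dept d ON e.dept_id = d.dept_id"
    ).fetchall()
    conn.close()
    return sorted(rows)


if __name__ == "__main__":
    print(join_by_hand(EMPLOYEES, DEPARTMENTS))
    print(join_with_sql(EMPLOYEES, DEPARTMENTS))
```

Both functions return the same rows. The hand-written version has to spell out the grouping and pairing logic, while the SQL version states only what result is wanted.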

What is Hive?

  • Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy.
  • If Hive is the vehicle, MapReduce is the engine that runs it.
  • Hive is a query engine: it has no storage of its own and uses HDFS to store the data.
  • Hive is an abstraction over MapReduce: it lets you express in SQL-like queries what would otherwise be written as MapReduce jobs.
  • Hive was initially developed by Facebook. Later, the Apache Software Foundation took it up and developed it further as open-source software under the name Apache Hive.
  • It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.

Hive is not

  • A relational database
  • A system designed for Online Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates

