Introduction to Hive
Ankit Singh
Research Scholar @Glbimr || 5★ in SQL @Hackerrank || Data Analyst || SQL || Python || Machine Learning || Data Scientist || Poem Writing skill
Hadoop
Hadoop is an open-source framework to store and process Big Data in a distributed environment.
Its two core modules are MapReduce, which processes the data, and the Hadoop Distributed File System (HDFS), which stores it.
Now imagine you have to write a join query. In MapReduce that means writing 10-15 lines of Java code, whereas in a SQL database such as Oracle you can express the same join in a single line.
SQL developers therefore say: we want to move to Hadoop because it is really good, but the only problem is that MapReduce jobs are written in Java. There should be a way to talk to MapReduce in SQL, and this is where Hive comes in.
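To make the contrast concrete, here is a minimal sketch in Python rather than Java or HiveQL. The tables `employees` and `departments`, their columns, and the helper names `reduce_side_join` and `sql_join` are hypothetical. SQLite stands in for a SQL engine only so that the example runs on its own; it is not Hive, and the hand-written version only imitates the map, shuffle, and reduce steps in a single process.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (emp_id, name, dept_id) and (dept_id, dept_name).
employees = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meera", 10)]
departments = [(10, "Sales"), (20, "Engineering")]

def reduce_side_join(emps, depts):
    """Hand-written join in the MapReduce style: map, shuffle, reduce."""
    # Map phase: tag each record with its source and emit it under the join key.
    mapped = [(dept_id, ("E", name)) for _, name, dept_id in emps]
    mapped += [(dept_id, ("D", dept_name)) for dept_id, dept_name in depts]

    # Shuffle phase: group all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: pair every employee with every department in the same group.
    joined = []
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "E"]
        dept_names = [v for tag, v in groups[key] if tag == "D"]
        joined += [(n, d) for n in names for d in dept_names]
    return sorted(joined)

def sql_join(emps, depts):
    """The same join expressed declaratively as one SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER)")
    conn.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", emps)
    conn.executemany("INSERT INTO departments VALUES (?, ?)", depts)
    rows = conn.execute(
        "SELECT e.name, d.dept_name FROM employees e "
        "JOIN departments d ON e.dept_id = d.dept_id"
    ).fetchall()
    conn.close()
    return sorted(rows)

manual_rows = reduce_side_join(employees, departments)
sql_rows = sql_join(employees, departments)
print(manual_rows == sql_rows)
```

Both functions return the same rows. The difference is that the join logic takes a whole function in the first case and a single `SELECT ... JOIN` statement in the second, which is the convenience Hive aims to bring to Hadoop.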
What is Hive?
• Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy.
• It was developed by Facebook.
• Hive is a vehicle that runs on an engine, and that engine is MapReduce.
• Hive is a query engine: it has no storage of its own and uses HDFS to store the data.
• Hive is an abstraction over MapReduce.
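The idea of Hive as an abstraction over MapReduce can be sketched as follows. This is a conceptual illustration in Python, not how Hive actually plans or executes queries: the table `page_views`, its columns, and the three phase functions are all hypothetical, and the query is shown only as a string for reference.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user, page).
page_views = [("u1", "home"), ("u2", "home"), ("u1", "cart"), ("u3", "home")]

# The SQL-style query being illustrated (not executed here).
QUERY = "SELECT page, COUNT(*) FROM page_views GROUP BY page"

def map_phase(rows):
    # Emit (group key, 1) for each input row.
    for _user, page in rows:
        yield page, 1

def shuffle_phase(pairs):
    # Collect values by key, as happens between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Aggregate each group; COUNT(*) is the sum of the emitted ones.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle_phase(map_phase(page_views)))
print(counts)
```

The person writing the query only states what result is wanted (`GROUP BY page`), while the three functions show the kind of work that has to happen underneath to produce it.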
Hive is not