Writing Queries on Hadoop using Hive (Bigdata)
Abhishek Choudhary
Data Infrastructure Engineering in RWE/RWD | Healthtech DhanvantriAI
Hive is a data warehouse infrastructure built on Hadoop.
It simply means you need not write code in a programming language you don't know; you can use basic SQL queries to fetch data instead. The benefit is obvious for non-programmers, and for programmers as well :-)
It is mainly focused on analytical querying of data over time.
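As a rough illustration of the kind of analytical query meant here, the sketch below counts page views per month. The page_views table and its rows are hypothetical, and Python's built-in sqlite3 is used purely as a local stand-in so the snippet runs without a Hadoop cluster. The query sticks to basic SQL of the sort HiveQL also accepts, but it has not been run against Hive here.

```python
import sqlite3

# Hypothetical table contents, used only for illustration.
ROWS = [
    ("2015-01", "home"),
    ("2015-01", "search"),
    ("2015-02", "home"),
    ("2015-02", "home"),
    ("2015-02", "search"),
]

# Basic SQL: count page views per month.
QUERY = """
SELECT view_month, COUNT(*) AS view_count
FROM page_views
GROUP BY view_month
ORDER BY view_month
"""


def views_per_month(rows):
    """Load rows into an in-memory table and run the aggregation."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not a Hive connection
    try:
        conn.execute("CREATE TABLE page_views (view_month TEXT, page TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(views_per_month(ROWS))
```

On a real cluster the same query text would be submitted through the Hive shell or one of the drivers instead of SQLite.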
Hive suits data warehouse style analytic processing, including over data stored in HBase, when fast response times are not required.
Hive can even be used to bulk load data into a table.
HBase as a Hive source and sink
Main components of Hive are -
- An interactive command line shell, similar to a SQL shell
- JDBC and ODBC drivers, provided to access Hive like any traditional database
- An Apache Thrift client, provided to use Hive from other languages such as Java, Python, etc.
- The Driver, which moves HQL statements through all phases until they become a series of MapReduce jobs, and returns the result
- The Metastore, which holds the metadata of Hive tables: the table definition, table name, columns, and data types
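To make the Metastore bullet a little more concrete, here is a toy model of the kind of record it keeps: a table name plus its columns and data types. The class names (ColumnDef, TableDef, ToyMetastore) are made up for this sketch and do not reflect Hive's actual metastore schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnDef:
    name: str
    data_type: str  # e.g. "int" or "string"


@dataclass
class TableDef:
    name: str
    columns: list = field(default_factory=list)


class ToyMetastore:
    """Keeps table definitions by name; a stand-in, not Hive's real metastore."""

    def __init__(self):
        self._tables = {}

    def register(self, table):
        """Store a table definition under its table name."""
        self._tables[table.name] = table

    def describe(self, table_name):
        """Return (column name, data type) pairs for one table."""
        table = self._tables[table_name]
        return [(col.name, col.data_type) for col in table.columns]


if __name__ == "__main__":
    store = ToyMetastore()
    store.register(TableDef("page_views", [ColumnDef("view_month", "string"),
                                           ColumnDef("page", "string")]))
    print(store.describe("page_views"))
```

The point is only that table definitions live apart from the data itself, so a query can be checked against column names and types before any job runs.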
Ways to use Hive -
- Hive managed table - use Hive as both source and sink
- Hive external table - map an external HBase table to Hive
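For the external table case, the mapping is declared in the table DDL through Hive's HBase storage handler. The helper below is a minimal sketch that only builds such a statement as a string. The function name, the table names, and the column mapping are hypothetical, and it does no quoting or validation of its inputs.

```python
def hbase_external_table_ddl(hive_table, hbase_table, columns):
    """Build a CREATE EXTERNAL TABLE statement mapping an HBase table into Hive.

    columns is a list of (hive_column, hive_type, hbase_mapping) tuples, where
    hbase_mapping is ":key" for the row key or "family:qualifier" otherwise.
    """
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    mapping = ",".join(hbase_mapping for _, _, hbase_mapping in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


if __name__ == "__main__":
    # Hypothetical tables: Hive table "user_stats" over HBase table "users".
    print(hbase_external_table_ddl(
        "user_stats",
        "users",
        [("user_id", "string", ":key"), ("visits", "int", "stats:visits")],
    ))
```

The generated statement would then be run from the Hive shell. Because the table is external, dropping it in Hive removes only the Hive definition and leaves the HBase table in place.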
What are Hive queries -
They are just a series of MapReduce jobs.
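To see what that means, take a query such as SELECT page, COUNT(*) FROM page_views GROUP BY page. The sketch below imitates in plain Python how it could break down into map, shuffle, and reduce steps. The function names and sample rows are invented for illustration, and the plan Hive really generates is more involved, for example it may use combiners or several stages.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["page"], 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which gives COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Chain the three phases for the GROUP BY query described above."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    sample = [{"page": "home"}, {"page": "search"}, {"page": "home"}]
    print(count_by_page(sample))
```

Every query pays the cost of launching such jobs, which is why response times are measured in seconds or minutes rather than milliseconds.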
Keep one thing in mind: if you want real-time analytics, Hive is NOT recommended; better to check out Apache Spark or Apache Ignite. Even so, Hive can be used with Apache Spark for analytics, so explore more...