Writing Queries on Hadoop using Hive (Bigdata)

Hive is a data warehouse infrastructure built on top of Hadoop.

It simply means you do not need to write MapReduce code in any programming language; basic SQL queries are enough to fetch the data. The obvious beneficiaries are non-programmers, though programmers benefit as well :-)
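For example, a typical analytical query in Hive looks just like standard SQL (the `page_views` table and its columns here are hypothetical, for illustration only):

```sql
-- Count daily page views per country from a (hypothetical) logs table
SELECT country, to_date(view_time) AS view_date, COUNT(*) AS views
FROM page_views
GROUP BY country, to_date(view_time);
```

Behind the scenes, Hive compiles this statement into MapReduce jobs and runs them on the cluster.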

It is mainly focused on analytical querying of data over time.

Hive is mainly used with HBase for data warehouse analytical processing when fast response times are not required. Hive can even be used to bulk-load data into an HBase table; in other words, HBase can serve as both a Hive source and a Hive sink.
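A bulk load into an HBase-backed table can be sketched like this (`hbase_page_views` is a hypothetical Hive table assumed to be already mapped onto HBase, and `page_views` a plain Hive table):

```sql
-- Bulk-load aggregated rows from a plain Hive table into the HBase-backed table
INSERT OVERWRITE TABLE hbase_page_views
SELECT country, COUNT(*)
FROM page_views
GROUP BY country;
```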

The main components of Hive are -

  • An interactive command-line shell, similar to a SQL shell
  • JDBC and ODBC drivers, provided to access Hive like any other traditional database
  • An Apache Thrift client, provided to use Hive from other languages such as Java and Python
  • The Driver, which moves HQL statements through all compilation phases, turns them into a series of MapReduce jobs, and returns the result
  • The Metastore, which holds the metadata of Hive tables: the table definition, table name, columns, and data types
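The metadata kept in the Metastore can be inspected directly from the Hive shell (the table name below is hypothetical):

```sql
SHOW TABLES;                    -- list tables registered in the Metastore
DESCRIBE FORMATTED page_views;  -- columns, types, storage location, table properties
```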

 

Ways to use Hive -

  • Hive managed table - Hive acts as both source and sink
  • Hive external table - an external HBase table is mapped into Hive
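Mapping an existing HBase table into Hive goes through the HBase storage handler; the table, column, and column-family names below are hypothetical:

```sql
-- Expose the (hypothetical) HBase table 'page_views' as a Hive external table
CREATE EXTERNAL TABLE hbase_page_views (rowkey STRING, views BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,stats:views')
TBLPROPERTIES ('hbase.table.name' = 'page_views');
```

Dropping an external table removes only the Hive metadata; the underlying HBase table is left untouched.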

What are Hive queries?

They are just a series of MapReduce jobs: the Driver parses, plans, and compiles each HQL statement into one or more MapReduce stages that run on the cluster.
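You can see this compilation with `EXPLAIN` (the query and table are hypothetical):

```sql
EXPLAIN
SELECT country, COUNT(*)
FROM page_views
GROUP BY country;
```

The resulting plan lists the stages of the job, with map and reduce operator trees showing how the aggregation is split across the cluster.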

 

Keep one thing in mind: if you want real-time analytics, Hive is NOT recommended; better to check Apache Spark or Apache Ignite. That said, Hive can also be used from within Apache Spark for analytics, so explore more...

 

More articles by Abhishek Choudhary

  • Slack New Architecture
  • Unit Testing Apache Spark Applications in Scala or Python
  • Spark On YARN cluster, Some Observations
  • Apache Spark (Big Data) Cache - Something Nice to Know
  • Apache Airflow - if you are bored of Oozie & style
  • Apache Spark Serialization issue
  • Few points On Apache Spark 2.0 Streaming Over cluster
  • Facebook Architecture (Technical)
  • Apache Flink, From a Developer point of View
  • Apache Spark (big Data) DataFrame - Things to know