Apache Hive

Sejal Baweja

Teller at HDFC Bank

Published: January 5, 2023

Apache Hive is open-source data warehouse software for reading, writing, and managing large data sets stored directly in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase. Hive lets SQL developers write Hive Query Language (HQL) statements, which are similar to standard SQL statements, for data query and analysis. It is designed to make MapReduce programming easier because you don't have to write lengthy Java code. Instead, you write queries more simply in HQL, and Hive then creates the map and reduce functions.
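
To make the map and reduce idea concrete, here is a minimal sketch in plain Python of the two phases that an aggregation such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` corresponds to conceptually. The `employees` table, its columns, the sample rows, and all function names are hypothetical. Hive's real execution plan is more involved than this.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows of an "employees" table: (name, dept).
ROWS = [
    ("asha", "sales"),
    ("ben", "engineering"),
    ("chen", "sales"),
    ("dana", "engineering"),
    ("eli", "sales"),
]


def map_phase(row):
    """Emit a (key, 1) pair keyed on the GROUP BY column."""
    _name, dept = row
    yield dept, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def count_by_dept(rows):
    """Run map, shuffle, and reduce in sequence over the input rows."""
    mapped = chain.from_iterable(map_phase(row) for row in rows)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `count_by_dept(ROWS)` on the sample rows yields a count of 3 for `sales` and 2 for `engineering`, which is the result the HQL query above would be expected to produce for the same data.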

Included with the installation of Hive is the Hive metastore, which enables you to apply a table structure to large amounts of unstructured data. Once you create a Hive table and define its columns, rows, data types, and so on, all of this information is stored in the metastore and becomes part of the Hive architecture. Other tools such as Apache Spark and Apache Pig can then access the table definitions in the metastore.
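
The idea of applying a table structure to raw data can be sketched in plain Python. This is only a toy illustration, not Hive's metastore API: the `METASTORE` dictionary, the `page_views` table, its columns, and the `read_table` function are all hypothetical names chosen for the example.

```python
# Toy "metastore": table name -> column definitions and field delimiter.
# The raw files themselves carry no schema; it is applied when they are read.
METASTORE = {
    "page_views": {
        "columns": [("user_id", int), ("url", str), ("duration_s", float)],
        "delimiter": "\t",
    }
}


def read_table(table, raw_lines, metastore=METASTORE):
    """Yield one dict per raw line, typed according to the stored schema."""
    schema = metastore[table]
    for line in raw_lines:
        fields = line.rstrip("\n").split(schema["delimiter"])
        yield {
            name: cast(value)
            for (name, cast), value in zip(schema["columns"], fields)
        }
```

Because the schema lives apart from the data, any tool that can read the same definitions can interpret the same files in the same way, which is the role the paragraph above describes for the metastore.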

As with any database management system (DBMS), you can run your Hive queries from a command-line interface (known as the Hive shell) or from a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) application, using the Hive JDBC/ODBC drivers. You can also run a Hive Thrift client within applications written in C++, Java, PHP, Python, or Ruby, similar to using these client-side languages with embedded SQL to access a database such as IBM Db2 or IBM Informix.
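
As one possible illustration of the Python client route, the sketch below assumes the third-party PyHive package, which the article does not name, and a HiveServer2 instance reachable at a placeholder host and port. The host, port, user name, and database values are assumptions, as are the function names. `run_query` relies only on the standard Python DB-API cursor interface, so it is not specific to Hive.

```python
def connect_to_hive(host="localhost", port=10000,
                    username="hive_user", database="default"):
    """Open a HiveServer2 connection. All argument values are placeholders."""
    # Imported lazily so the rest of the module works without PyHive installed.
    from pyhive import hive  # third-party package, assumed to be available
    return hive.connect(host=host, port=port,
                        username=username, database=database)


def run_query(connection, hql):
    """Run one statement through a DB-API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(hql)
        return cursor.fetchall()
    finally:
        cursor.close()
```

A call might look like `run_query(connect_to_hive(), "SHOW TABLES")`, given a running server and suitable credentials.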

Hive looks like traditional database code with SQL access. However, because Hive is built on Apache Hadoop and its operations, there are key differences. First, Hadoop is intended for long sequential scans, so Hive queries have very high latency (many minutes). This makes Hive less appropriate for applications that need very fast response times. Second, Hive is read-based and therefore not appropriate for transaction processing, which typically involves a high percentage of write operations. It is better suited to data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis, and it includes tools that enable easy access to data via SQL.

If you're interested in SQL on Hadoop, in addition to Hive, IBM offers IBM Db2 Big SQL, which makes accessing Hive data sets faster and more secure.
