
How to configure HUE in Cloudera CDH for JDBC compatible databases integration

Irfan Elahi

Specialist Solutions Architect @ Databricks | ex Deloitte | Author | Blogger | Instructor

Published: May 8, 2017

If you are a data scientist or a big data engineer, chances are you have heard of HUE (which stands for Hadoop User Experience). HUE is the tool of choice for many because it provides a GUI for interacting with your Hadoop cluster. Using HUE, you can do quite a lot in your cluster, for instance:

HDFS: browse, upload, and delete files; create directories; set permissions
Interact with Hadoop data stores (e.g. Hive, Impala)
Configure Oozie jobs
Use the Spark shell
Execute MapReduce jobs

among many other tasks. It comes pre-packaged with the Cloudera Distribution of Hadoop (CDH) and also with Hortonworks (though Hortonworks tends to deprecate it in favor of Apache Zeppelin or Ambari).

HUE can also be integrated with LDAP for user management, and it works with Kerberos for authentication and with Sentry for enforcing the corresponding authorization.

In short, HUE is great when it comes to interacting with Hadoop. Fair enough.

This post isn't intended to be a thorough introduction to or walk-through of HUE. What I want to cover is another interesting use case that I came across, which looked something like this:

Can HUE be used to query non-Hadoop datastores or databases? More specifically, can it be used to query Microsoft SQL Server (or, if you are on Azure, Azure SQL Data Warehouse)?

At first sight, this seems like quite a stretch, as HUE seems to be meant strictly for Hadoop, right? Interestingly, after extensive research and effort, I was able to use HUE to query an Azure SQL Data Warehouse database. Cool, isn't it? Not only that, I found that it can be used to query almost any database that understands JDBC (which I think most of them do). So I thought I would share the steps for configuring HUE to query non-Hadoop databases, e.g. SQL Server or Azure SQL Data Warehouse.
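To give a rough idea of what such a configuration can look like, here is a minimal sketch in Python that renders a hue.ini interpreter entry using Hue's JDBC interface. It is not the exact set of steps from the full article. It assumes the notebook interpreter layout described in the Hue documentation (an entry with interface=jdbc and a JSON options string), and the helper name, host, database name, and credentials are all hypothetical placeholders. It also assumes the vendor's JDBC driver JAR has already been made available to Hue, and that on CDH the resulting snippet would typically be pasted into Hue's advanced configuration (safety valve) in Cloudera Manager.

```python
import json


def jdbc_interpreter_block(key, name, url, driver, user, password):
    """Render a hue.ini interpreter entry that uses Hue's JDBC interface.

    Hypothetical helper for illustration; only the rendered text matters.
    """
    # Hue expects the JDBC connection details as a JSON string in `options`.
    options = json.dumps(
        {"url": url, "driver": driver, "user": user, "password": password}
    )
    lines = [
        "[notebook]",
        "  [[interpreters]]",
        f"    [[[{key}]]]",
        f"      name={name}",
        "      interface=jdbc",
        f"      options='{options}'",
    ]
    return "\n".join(lines)


# Placeholder values: replace host, database, and credentials with your own.
block = jdbc_interpreter_block(
    key="sqlserver",
    name="SQL Server (JDBC)",
    url="jdbc:sqlserver://example.database.windows.net:1433;databaseName=exampledb",
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
    user="example_user",
    password="example_password",
)
print(block)
```

The same pattern should carry over to other JDBC-compatible databases by swapping the url and driver values for the ones documented by that database's JDBC driver.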

Read the rest of the article here.

