How to configure HUE in Cloudera CDH for integration with JDBC-compatible databases

If you are a Data Scientist or a Big Data Engineer, chances are you have heard of HUE (which stands for Hadoop User Experience). HUE is a tool of choice for many because it provides a GUI for interacting with your Hadoop cluster. Using HUE, you can do quite a lot in your Hadoop cluster, for instance:

  • HDFS: Browse, add/upload and delete files, create directories, set permissions
  • Interact with Hadoop Data Stores (e.g. Hive, Impala)
  • Configure Oozie jobs
  • Use Spark-Shell
  • Execute MapReduce jobs

among many other tasks. It comes pre-packaged with the Cloudera Distribution of Hadoop and also with Hortonworks (though Hortonworks tends to deprecate it in favor of Apache Zeppelin or Ambari).

HUE can also be integrated with LDAP for user management, and it works with Kerberos for authentication and with Sentry for enforcing the corresponding authorization.

In short, HUE is great when it comes to interacting with Hadoop. Fair enough.

This post isn't intended to be a thorough introduction or walk-through of HUE. What I want to cover is another interesting use case that I came across, which looked something like this:

Can HUE be used to query non-Hadoop datastores or databases? More specifically, can it be used to query Microsoft SQL Server (or, if you are on Azure, Azure SQL Data Warehouse)?

At first sight, this seems like quite a stretch, since HUE appears to belong strictly to Hadoop, right? Interestingly, after a fair amount of research and effort, I was able to use HUE to query an Azure SQL Data Warehouse database. Cool, isn't it? Not only that, I found that it can be used to query almost any database that supports JDBC (which most of them do). So I thought I would share the steps for configuring HUE to query non-Hadoop databases, e.g. SQL Server or Azure SQL Data Warehouse.
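To give a rough idea of what such a configuration can look like, here is a minimal sketch in Python that renders a JDBC interpreter entry for hue.ini. It assumes the notebook interpreter mechanism with interface=jdbc described in Hue's own configuration documentation; it is not necessarily the exact setup from the full article. The helper name, host, database and credentials are placeholders I made up for illustration, and the SQL Server JDBC driver jar is assumed to be available on Hue's classpath already.

```python
import json


def jdbc_interpreter_snippet(key, display_name, url, driver, user, password):
    """Render a hue.ini fragment for one JDBC interpreter (hypothetical helper)."""
    # Hue expects the connection details as a JSON string in `options`.
    options = json.dumps(
        {"url": url, "driver": driver, "user": user, "password": password}
    )
    return "\n".join(
        [
            "[notebook]",
            "[[interpreters]]",
            f"[[[{key}]]]",
            f"name={display_name}",
            "interface=jdbc",
            # The JSON is wrapped in single quotes, so values containing a
            # single quote would need extra care.
            f"options='{options}'",
        ]
    )


# Placeholder values: replace host, database and credentials with your own.
snippet = jdbc_interpreter_snippet(
    key="sqlserver",
    display_name="SQL Server JDBC",
    url="jdbc:sqlserver://example-host:1433;database=exampledb",
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
    user="example_user",
    password="example_password",
)
print(snippet)
```

On a CDH cluster, a fragment like this would typically be pasted into the Hue advanced configuration snippet (safety valve) in Cloudera Manager, followed by a Hue restart; treat that placement as an assumption to verify against your CDH version.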

Read the rest of the article here.

