Unnoticed Gem - HBASE
The last decade in the technology and data world was dominated by Hadoop and NoSQL. Organizations raced to adopt these technologies and did their best to set up big data and data science environments. Open source communities fueled this rat race by frequently adding new products to the Hadoop ecosystem. This cycle of hype and disruption might continue for the next few years. There is a lot to be said about each Hadoop component and its rise and fall. In this article I would like to focus on an unnoticed gem: HBASE.
It was all triggered by Google's papers on the Google File System (GFS) and Bigtable. HDFS, modeled on GFS, is still the center of attention. Surprisingly, HBASE, modeled on Bigtable, could not capture the crowd's imagination.
What went wrong with HBASE?
NoSQL databases like MongoDB and Cassandra provided alternatives to HBASE. Commercial support and marketing for these databases gave them an edge. Many organizations adopted Hive on HDFS as a better option for data exploration. The technical expertise that HBASE requires was scarce, and that hurt its popularity. Slowly, this gem was ignored and, in many cases, forgotten.
Is HBASE required at all?
In my view, adopting HDFS without a layer of HBASE on top of it looks like a big mistake. HDFS files are write-once: data can be appended but not updated in place, which makes plain HDFS a poor fit for warehouse and ETL use cases that need record-level updates. Ironically, organizations built their own clever workarounds for this. Few realized that a superior, ready-made solution was available in the form of HBASE.
HBASE is among the cheapest options for storing billions of records that can be updated and retrieved randomly. Its retrieval of individual records by row key is quite fast too. It can handle a very high rate of inserts and updates (transactions per second). These features can make it a perfect backend system for many use cases.
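To make the idea of updatable records on top of write-once storage more concrete, here is a minimal sketch in Python. It is a toy model, not HBASE's actual implementation: the class name `ToyStore`, the `flush_threshold` parameter, and the row keys are all made up for illustration. It only shows the general pattern of buffering writes in memory, flushing them as new immutable files, and letting reads prefer the newest copy of a row.

```python
from typing import Dict, List, Optional


class ToyStore:
    """Toy model (hypothetical, not HBASE code) of updates over write-once files."""

    def __init__(self, flush_threshold: int = 2) -> None:
        # Mutable in-memory buffer that absorbs new writes.
        self.memstore: Dict[str, str] = {}
        # Flushed files: each one is written once and never modified afterwards.
        self.store_files: List[Dict[str, str]] = []
        self.flush_threshold = flush_threshold

    def put(self, row_key: str, value: str) -> None:
        """Insert or update a row; an update simply shadows older copies."""
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the buffer out as a new file, sorted by row key."""
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row_key: str) -> Optional[str]:
        """Return the newest value for a row key, or None if it is absent."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        # Search flushed files from newest to oldest.
        for store_file in reversed(self.store_files):
            if row_key in store_file:
                return store_file[row_key]
        return None
```

In this sketch an update never rewrites an old file. The newer value wins at read time, which is why random updates remain possible even though the underlying files are never changed.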
What's next? What makes me call it a gem?
Apache Phoenix is trying to make HBASE much simpler for SQL users. HBASE with Phoenix can act as a platform for low-latency queries and data discovery.
HBASE comes with a REST API that makes it a ready-to-plug-in backend for digital channels. I was able to build a webpage in a few minutes to retrieve and display data from HBASE.
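As a rough illustration of what such a lookup can look like, here is a small Python sketch that reads one row through the HBASE REST server and decodes the JSON reply. It is not the code behind the webpage mentioned above. The base URL, table name, and row key are hypothetical, and the sketch assumes the REST server is running and returns its usual JSON layout, in which row keys, column names, and values (the `$` field) are base64-encoded.

```python
import base64
import json
from typing import Dict
from urllib.parse import quote
from urllib.request import Request, urlopen


def b64(text: str) -> str:
    """Base64-encode a string the way the REST JSON format represents it."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def row_url(base_url: str, table: str, row_key: str) -> str:
    """Build the URL for a single row: <base>/<table>/<row key>."""
    return f"{base_url.rstrip('/')}/{quote(table, safe='')}/{quote(row_key, safe='')}"


def decode_row(payload: dict) -> Dict[str, Dict[str, str]]:
    """Turn the base64-encoded JSON reply into {row key: {column: value}}."""
    rows: Dict[str, Dict[str, str]] = {}
    for row in payload.get("Row", []):
        key = base64.b64decode(row["key"]).decode("utf-8")
        cells: Dict[str, str] = {}
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            cells[column] = base64.b64decode(cell["$"]).decode("utf-8")
        rows[key] = cells
    return rows


def fetch_row(base_url: str, table: str, row_key: str) -> Dict[str, Dict[str, str]]:
    """Fetch one row as JSON; needs a reachable REST server (assumed here)."""
    request = Request(row_url(base_url, table, row_key),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return decode_row(json.load(response))
```

A call such as `fetch_row("http://localhost:8080", "customers", "user#1")` would return a plain dictionary that a web page can render directly. The host, port, and table in that call are placeholders, and a missing row surfaces as an HTTP error from `urlopen`.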
I have always been a fan of AWS :). Recently I tried an Amazon EMR cluster with HBASE. AWS has made HBASE installation and configuration quite simple. The most remarkable feature is the possibility of using S3 as storage for HBASE. I am pretty sure that in the coming years we will see HBASE rising in popularity.
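For readers who want to try the S3 option, the sketch below builds the kind of configuration that EMR accepts at cluster creation to point HBASE at S3. It is an outline under assumptions, not my exact setup: the bucket path is a placeholder, and the helper name `hbase_on_s3_configurations` is made up for this example. The two classifications, `hbase` and `hbase-site`, and their properties follow the pattern described in the EMR documentation.

```python
import json
from typing import Dict, List


def hbase_on_s3_configurations(root_dir: str) -> List[Dict[str, object]]:
    """Build EMR configuration classifications that store HBASE data in S3.

    root_dir is a hypothetical S3 location, e.g. "s3://my-bucket/hbase".
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// location")
    return [
        # Tells EMR to run HBASE in S3 storage mode.
        {"Classification": "hbase",
         "Properties": {"hbase.emr.storageMode": "s3"}},
        # Points the HBASE root directory at the S3 location.
        {"Classification": "hbase-site",
         "Properties": {"hbase.rootdir": root_dir}},
    ]


if __name__ == "__main__":
    # Print JSON that can be supplied as the cluster's software configuration.
    print(json.dumps(hbase_on_s3_configurations("s3://my-bucket/hbase"), indent=2))
```

The printed JSON can be pasted into the software settings when creating the cluster, or saved to a file and passed to the AWS CLI.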