Swimming in a lake of confusion: Does the Hadoop data lake make sense?

All hail the data lake, destroyer of enterprise data warehouses and the solution to all our enterprise data access problems! Ok – well, maybe not. In this post I want to talk about the confusion in the market I am seeing around the data lake phrase, including a look at how the term seems to be evolving within organizations based on my recent interactions.

In previous blog posts, I discussed three common use cases for deploying Hadoop alongside existing enterprise data warehouse infrastructures, along with some of the benefits and reasons for doing so. All three cases either required no change to existing approaches, or they required only small changes in where the data should be flowing when Hadoop was introduced. Effectively, they are less disruptive approaches.

Over the past six months, however, I've started hearing more questions around something that many refer to as the data lake (you might also hear it called the enterprise data hub or other derivatives of that term). As you'll commonly find during emerging phases of a popular new technology, there's great confusion about the meaning of this term, and I think it is evolving.

Click the link to read more: https://blogs.sas.com/content/sascom/2014/10/20/swimming-in-a-lake-of-confusion-does-the-hadoop-data-lake-make-sense/

Avik Chakrabarti

Author, Digital Technologist

9 years ago

True, but in your article you said that orgs are operating two systems in parallel: the EDW and a separate Hadoop deployment (which may be any platform, such as Greenplum). If, going forward, they want to reach the sweet spot of a single system (say, a Hadoop file system with a SQL interface layer such as Cloudera Impala or Splice Machine), then they will have to plan the migration, including tools such as BI and marketing automation, along with the workforce (the larger analytics workforce, not only data science). I understand your point about structured and unstructured data, but what I am seeing is that orgs (I mean the world's largest companies here) are struggling with their big data adoption strategy because they think it is a unique discipline they need to build, which is not the case. It is about incorporating extra dimensions into the existing data. Let the entire analytics workforce, with just a knowledge of basic SQL and data manipulation, get on it. Only then will you be successful, not with a handful of data scientists. I hope this explains it, if pretty crudely.

Avik Chakrabarti

Author, Digital Technologist

9 years ago

Whatever it is called, "data lake", data hub, or enterprise information hub, the one thing it needs to do is be compatible with the existing BI, marketing, and analytics tools in the org, and the workforce should be able to access it and work on it (not ONLY "data scientists").


More articles by Mark Torr
