Same but Different, Different but Same

Have you ever been confused by terms like DataProc, EMR, Hadoop, and HDInsight? If that makes you feel bad, don't. I get confused a lot too, especially when I first hear a new term. I only recently realized that confusion is a necessary step toward understanding.

Recently I was asked exactly this question about DataProc and EMR. They are the same but different, and different but the same. In this article I try to clarify that, using DataProc, EMR, and the basic cloud services as examples. I hope it helps a bit.

A brief history of Hadoop and different distributions

When a new solution comes to market, whether open source or commercial, other players adopt it as soon as it becomes reasonably successful. Take Hadoop as an example.

Hadoop was first released on April 1, 2006. For the first time, it made big data processing possible on commodity hardware, and it soon became popular. The big 3 cloud providers then started offering Hadoop services one after another.

AWS started offering EMR (Elastic MapReduce), Amazon's version of Hadoop, in 2009.

Azure started offering HDInsight, Microsoft's version of Hadoop, in 2013.

Google Cloud Platform started offering its version, named DataProc, in 2015.

Hadoop then reached the peak of its hype, and quite a few companies deployed their own Hadoop solutions.

In recent years, Hadoop's popularity has fallen sharply, mainly because of the wide adoption of cloud computing. Leave a comment below if you are interested in why.

Different but Same

AWS EMR, Azure HDInsight, and Google DataProc are all based on the same underlying technology, Hadoop. They share most of their components, and each adds a few extras of its own.

So they are conceptually the same.

Hadoop is falling sharply, and the same applies to AWS EMR, Azure HDInsight, and Google DataProc. They are all falling.

If you expect them to work exactly the same, though, that expectation is too high. They are different.

Same but Different

When each company packages Hadoop, it adds a bit of its own flavor. It is like one recipe in the hands of different chefs: each cooks it a bit differently, for better or worse.

In the technology space, different means you cannot just click a button and move from HDInsight to DataProc. Quite a few detailed differences need to be handled.
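As one small illustration of such a detail, storage paths use different URI schemes on each platform: EMR jobs typically read from `s3://` paths, while DataProc jobs read from `gs://` paths. The sketch below rewrites a path from one to the other. It is only an assumption-laden example, not a migration tool: the function name `s3_to_gcs`, the bucket names, and the bucket mapping are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from S3 bucket names to Cloud Storage bucket names.
# These names are made up for illustration only.
BUCKET_MAP = {"my-emr-data": "my-dataproc-data"}


def s3_to_gcs(uri, bucket_map=None):
    """Rewrite an s3://bucket/key style path as a gs://bucket/key path."""
    if bucket_map is None:
        bucket_map = BUCKET_MAP
    parts = urlsplit(uri)
    if parts.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError("not an S3 path: " + uri)
    # Fall back to the same bucket name when no mapping is configured.
    bucket = bucket_map.get(parts.netloc, parts.netloc)
    return "gs://" + bucket + parts.path


if __name__ == "__main__":
    print(s3_to_gcs("s3://my-emr-data/logs/2020/part-0000"))
```

Paths are just one of the differences; cluster configuration, permissions, and bundled component versions would each need similar attention.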

Relationship of EMR, HDInsight, DataProc

Here is a simplified illustration of their relationship.

For simplicity, the extra layer of distributions such as Hortonworks is omitted.

Basic cloud services of the big 3 cloud providers

Why list so few services? Because powerful systems can be built with just the following basic services.

| AWS | Azure | GCP |
| --- | --- | --- |
| S3: Simple Storage Service | ADL: Azure Data Lake | GCS: Google Cloud Storage |
| EC2: Elastic Compute Cloud | Virtual Machine | Compute Engine |
| RDS: Relational Database Service | Postgres | Cloud SQL |
| Route 53 | Azure DNS | Cloud DNS |
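The same table can be written down as a small lookup, which is a handy way to translate a service name from one provider to another. This is only a sketch: the row labels ("object storage" and so on) and the function name `equivalent` are my own assumptions and do not come from the providers.

```python
# Each row of the table above, keyed by a category label.
# The category labels are assumptions added for readability.
SERVICES = {
    "object storage": {"AWS": "S3", "Azure": "ADL", "GCP": "GCS"},
    "virtual machines": {"AWS": "EC2", "Azure": "Virtual Machine", "GCP": "Compute Engine"},
    "relational database": {"AWS": "RDS", "Azure": "Postgres", "GCP": "Cloud SQL"},
    "DNS": {"AWS": "Route 53", "Azure": "Azure DNS", "GCP": "Cloud DNS"},
}


def equivalent(service, target_provider):
    """Return the target provider's counterpart of a service from the table."""
    for row in SERVICES.values():
        if service in row.values():
            return row[target_provider]
    raise KeyError("unknown service: " + service)


if __name__ == "__main__":
    print(equivalent("S3", "GCP"))
```

The lookup only says which services play the same role. As with the Hadoop offerings, it does not mean they behave identically.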

Summary

I hope you enjoyed reading.

* Same but different

* Different but the same

* For better or worse

* For the growth of the ecosystem

Just a recommendation: don't use DataProc, EMR, or HDInsight if you are starting a brand new project. Leave a comment below if you want to hear why, and why Hadoop is fading out.

Thanks to Ruben Laguna for proofreading.

This article was originally published at https://knockdata.com/blog/same-but-different-but-same.html
