Unbiased view of bringing Synapse Analytics and Azure Databricks together

Disclaimer: This article reflects the personal experience and understanding of the authors. It does not represent the official position of Microsoft.

There are many challenges that prevent organizations from realizing their advanced analytics mission:

  • There are many advanced analytics solutions and offerings out there, and many of them are difficult to understand and implement.
  • Siloed data across teams and departments inhibits the development of unified data pipelines.
  • Scaling challenges and performance constraints often represent a cost and implementation barrier for advanced analytics teams.

As Azure Synapse brings the worlds of data warehousing, big data, and data integration into a single unified analytics platform, there is continued investment in improving performance for Apache Spark workloads in Azure Synapse.

Spark in Azure Synapse Analytics is the OSS Apache Spark distribution with additional Microsoft proprietary optimizations. It is also deeply integrated into Azure Synapse, benefits from a unified security, networking, monitoring, CI/CD, and management experience, and meets strict JEDI compliance requirements.

Azure Databricks provides the premium Spark experience targeting data engineering, data science, and data analysis on Azure, and contains unique Databricks IP that is not available in the OSS Apache Spark distribution. Capabilities unique to Azure Databricks include a Databricks-optimized high-performance Spark engine, managed Delta Lake, and, with MLflow, an enterprise data science workspace with collaborative notebooks.

We have made our first attempt to create a decision tree that gives an unbiased view of bringing Synapse and Azure Databricks together. You can access this interactive Decision Tree by following this link: Azure Synapse And Azure Databricks

Below are some of the things we took into consideration while creating this decision tree:

  • Differences and preferences originate in the technology itself: the two services were built for different things.
  • Synapse is meant to solve the “stitching together” problem, but its core is nevertheless a data warehouse.
  • Databricks is built for massive processing on read (applying transformations while reading).

Write path:

  • Synapse – ELT
  • Databricks – large ETL volumes

Read path:

  • Synapse – lots of “smallish” queries with a substantial number of joins
  • Databricks – large volumes of data with few joins and processing on read
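
The write-path and read-path preferences above can be sketched as a small helper. This is only an illustration of the heuristics as listed, not the logic of the actual decision tree: the `Workload` class, its field names, and the `suggest_service` function are all hypothetical, and the rule that anything not matching a listed case has "no clear preference" is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description; all field names are illustrative."""
    path: str                     # "write" or "read"
    pattern: str = ""             # write path only: "ELT" or "ETL"
    large_volumes: bool = False   # True when the data volumes are large
    many_joins: bool = False      # read path only: queries with many joins


def suggest_service(w: Workload) -> str:
    """Return the service the listed heuristics lean toward, if any."""
    if w.path == "write":
        if w.pattern == "ELT":
            return "Synapse"          # write path: Synapse for ELT
        if w.pattern == "ETL" and w.large_volumes:
            return "Databricks"       # write path: Databricks for large ETL volumes
        return "no clear preference"  # assumption: unlisted cases stay open
    if w.path == "read":
        if w.many_joins and not w.large_volumes:
            return "Synapse"          # many smallish, join-heavy queries
        if w.large_volumes and not w.many_joins:
            return "Databricks"       # large volumes, few joins, processing on read
        return "no clear preference"  # assumption: unlisted cases stay open
    raise ValueError(f"unknown path: {w.path!r}")
```

For example, `suggest_service(Workload(path="read", many_joins=True))` leans toward Synapse, while a read workload with `large_volumes=True` and few joins leans toward Databricks. Real decisions involve more criteria than these two dimensions, which is why the interactive decision tree exists.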

A feature-by-feature comparison generally doesn’t make a lot of sense, but:

  • Time travel is only available in Databricks.
  • Databricks provides a more sophisticated security model on Spark than Synapse.
  • Native Column-Level Security, Row-Level Security, and Dynamic Data Masking (without building views, and with full integration with AAD) are only available in Synapse Dedicated SQL Pool.
  • Synapse provides some sort of DR on top of Storage (DDL / definition-wise).
  • Note: Synapse storage is the same price as storage accounts.

You can access this interactive Decision Tree by following this link: Azure Synapse And Azure Databricks. You can also provide feedback or submit questions in a public GitHub repository. Thank you and have a very pleasant day!

@Elizabeth and @Andrei

Simplicity is an ultimate sophistication. -- Leonardo Da Vinci

Pradeep Dadlani

Data Strategy and Architecture | Data Platforms | Data Management

2 years ago

Interesting article this one Elizabeth Antoine. I have always read/seen the two tech stacks along the lines as described. Good to see someone who is closely involved present an unbiased view.

Elizabeth Antoine

Regional Analytics Leader @ Microsoft | Board Director @ Avivo: Live Life | Executive MBA

3 years ago

We (Andrei Zaichikov, Elizabeth Antoine, Eleni Santorinaiou) are often asked a question: how up-to-date is https://albero.cloud/? We saw the same question in the comments. We are happy to announce that our post-Ignite review is finished. We have added all updates from Ignite and improved loads of things for you. 21 issues are closed, and major updates now include a rework of our Synapse & Databricks Decision Tree, Auto-Scaling capabilities added directly to the Main DT, a rework of the Modern Data Analytics DT, and more. In addition, we have added a list of all major public datasets available in Azure (look in the Useful Materials section).

Andrei Zaichikov

Director, Enterprise Technology Strategy, EMEA at Pure Storage

3 years ago

Ramki, Rodney: As for the cost, this is indeed a thing we are considering, but as Elizabeth mentioned it is extremely complex product-wise. And it has one more dimension, which is people. Insufficient qualification can lead to excessive consumption, which doesn’t have anything to do with the functional fit of the service. On the contrary, if the service fits the purpose, it will be used more effectively and optimally. Unfortunately, quite often the fake “simplicity” of the picture hides actual complexity and future troubles. This is what we would like to avoid by providing more robust and comparable criteria. Thank you once again for your feedback.
