Day 3 of Databricks vs Snowflake vs Fabric: Which One Should I Pick?

While I do believe Databricks is, generally speaking, the best of the three platforms, this series is written with the intent of being as objective as possible. It aims to provide valuable feedback to individuals and businesses evaluating their data stack choices, and to give vendors feedback, both from what I say and from what others write in the comments, that helps them develop better products.

Things to consider: I've spent a good amount of time over the years with these platforms (and their components) in different capacities. Additional knowledge on the different areas has been primarily gained from reading vendor marketing and documentation websites, watching videos from experts in the platforms, as well as spending hands-on time.

This is the last post in this series.

Intro

I get asked fairly often what my thoughts are across these platforms. In today's post, I've gone through a few different scenarios where each of the tools would be suitable. It is entirely possible for two tools to cover the same need, though I did try to pick a single one when I felt it had a significant enough edge over a close competitor.

Last but not least, this is not an exhaustive list, but it will hopefully paint a picture of how to think about stack choices from a technical perspective. I've listed them in order from what I believe to be the best all the way to the worst.

Suitable Scenarios For Each Platform

(1) Databricks is very good when the following conditions are true:

Team(s):

A) Experienced SQL developers.

B) Mid level and above Python developers.

C) Any level Scala developers.

Any of the following needs:

A) Processing and storing data for BI tools, ad-hoc reporting, and web-based applications.

B) Any ML/AI related uses.

C) Working with structured, semi-structured, and unstructured data.

Scale:

A) Any scale. All else being equal, I would favor Databricks over other options if I expected the size of data to grow significantly over time.

Why?

Databricks is the jack of all trades and the master of some in the data platform world. Right out of the box, it offers best-in-class AI/ML and data engineering features, along with strong analytical capabilities through its SQL engine, AI/BI Genie, and even its own built-in Dashboards. It is built in a way that plays well with small and large workloads, and any kind of data.

Considerations:

1) Databricks offers a lot of flexibility and features right out of the box, and with those good things come a steeper learning curve and complexity that is amplified by the inconsistent quality of its documentation.

2) Cost controls historically have not been as robust as the ones offered by Snowflake, and it is an area I've openly criticized. In the last few months, I've seen more and more features that are helping close the gap.

3) In testing that I did roughly 2-3 months ago in the low billion-rows scale, when it came to analytical queries, Snowflake performed better, though Databricks' results were still acceptable and not a deal breaker.


(2) Snowflake is very good when the following conditions are true:

Team(s):

A) Experienced SQL developers.

B) Individuals new to SQL but willing to learn.

Any of the following needs:

A) Processing and storing data for BI tools, ad-hoc reporting, and web-based applications.

B) Data and application monetization through the Marketplace.

Scale:

A) While it can support more than 10B row tables, I personally would not use it after that row count for cost purposes. See point 3 under "Considerations".

Why Snowflake?

Snowflake does SQL right. The UI is fairly straightforward, the cost management options are solid, and it offers a very robust and mature partner ecosystem that allows great levels of customization to suit your developers' needs. The documentation it offers might also make it less intimidating to those who are new to SQL and cloud platforms.

Considerations:

1) Vanilla Snowflake generally lacks depth when it comes to the development experience and richness of features, and is best paired with additional tools that may or may not increase the total cost of ownership. Note that this does not apply to the richness of the SQL functions it supports, where it does a fantastic job, and there are other exceptions as well.

2) Though it offers AI/ML capabilities and supports other languages besides SQL (Python, Scala, Java), Snowflake is not known as the industry leader in either of these categories, but it can be used for these purposes and many individuals do use it for such.

3) In testing that I did roughly 2-3 months ago in the low billion-rows scale, when it came to ETL, Databricks generally performed significantly better, though Snowflake's results were still acceptable and not a deal breaker.


(3) Fabric is very good when the following conditions are true:

Team(s):

A) Experienced Power BI developers with no SQL experience or analytical type SQL experience only.

Any of the following needs:

A) Internal reporting and customer-facing reporting.

B) Reporting-centric applications.

Scale:

A) Tables ranging from thousands of rows to the ~1B row mark. It can support more, but costs are likely to scale disproportionately to your data needs.

Why?

An experienced PBI developer is able to utilize a lot of the powerful features built into Power BI. Since Fabric is built around PBI, this gives these developers a strong head start in being able to leverage what the Fabric ecosystem has to offer.

Considerations:

1) Power BI is easy to get started with, but tough to master. If the team is not experienced with Power BI, it will be easy to build inefficient data models that will result in decreased reporting performance and increased Fabric capacity costs.

2) Doing just about anything outside of the business needs listed above is likely to require make-shift solutions.

3) Proper architecture in Fabric is still up in the air. Experienced Power BI developers without architecture experience will be more prone to building spaghetti architecture that will be hard to maintain for the existing team and be painful to manage with employee attrition.
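For readers who prefer to see the conditions above in one place, here is a minimal sketch of them as a rule-based filter. It is only an illustration: the names `Scenario` and `suitable_platforms`, and the skill and need labels, are hypothetical and made up for this example. The row thresholds come from the "Scale" notes, and the sketch assumes that at least one listed skill and one listed need must match. It ignores everything under "Considerations", so treat it as a starting point rather than a decision tool.

```python
from dataclasses import dataclass

# Row thresholds come from the "Scale" notes above. All names and labels
# below are hypothetical and exist only for this illustration.
SNOWFLAKE_MAX_ROWS = 10_000_000_000  # ~10B rows
FABRIC_MAX_ROWS = 1_000_000_000      # ~1B rows


@dataclass
class Scenario:
    skills: set              # e.g. {"sql", "learning_sql", "python", "scala", "power_bi"}
    needs: set               # e.g. {"bi_and_apps", "ml_ai", "mixed_data", "marketplace", "reporting"}
    largest_table_rows: int  # row count of the largest expected table


def suitable_platforms(s: Scenario) -> list:
    """Return the platforms whose listed conditions match, in the post's order."""
    matches = []

    # Databricks: SQL, Python, or Scala skills; any listed need; any scale.
    if s.skills & {"sql", "python", "scala"} and s.needs & {"bi_and_apps", "ml_ai", "mixed_data"}:
        matches.append("Databricks")

    # Snowflake: SQL skills or willingness to learn; BI or Marketplace needs; up to ~10B rows.
    if (s.skills & {"sql", "learning_sql"}
            and s.needs & {"bi_and_apps", "marketplace"}
            and s.largest_table_rows <= SNOWFLAKE_MAX_ROWS):
        matches.append("Snowflake")

    # Fabric: Power BI skills; reporting needs; up to ~1B rows.
    if ("power_bi" in s.skills
            and "reporting" in s.needs
            and s.largest_table_rows <= FABRIC_MAX_ROWS):
        matches.append("Fabric")

    return matches


# A SQL team with BI needs and a 500M-row table matches two platforms.
print(suitable_platforms(Scenario({"sql"}, {"bi_and_apps"}, 500_000_000)))
# → ['Databricks', 'Snowflake']
```

When more than one platform comes back, the "Considerations" for each are what should break the tie.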

Closing Words

Hope you enjoyed this series! I welcome and encourage respectful debate on what I shared here! Two things that are important to call out:

A) Bad architecture will be costly, no matter the platform.

B) It is a fantastic thing to have a choice when it comes to platforms, and I truly appreciate what each of these platforms does for the data world.

Tarun Kumar

Builder @ Snowcap-datalab | Fractional Data Architect | AI & LLM Ops Expert

2 months ago

As someone who loves both Snowflake and Databricks, here's the thing: both are incredible in their own right, and it's exciting to see advocates championing their preferred platform. That's how innovation grows. Let's celebrate the strengths of both. To add some perspective, I'm sharing this insightful article: https://select.dev/posts/snowflake-vs-databricks Let's focus on building solutions, not silos. What's your take?

Peter Wilkerson

Data guides; Knowledge directs.

3 months ago

Love your observations. Getting into MS Fabric myself. I find that people who are already familiar with parts of the Microsoft data ecosystem sometimes seem to have a tendency to let the familiar obscure the new. If you think about it, the pieces that made up Gutenberg's printing press weren't new. Movable type, presses, paper printing had existed before. The revolutionary part was bringing what was known together in a new way. Not saying MS Fabric is the same as the printing press...too early to tell. BTW, I had quite the chuckle about death from a thousand clicks!

Simone Nogara

Cloud Security Architect | Securing the Future of Multi-Cloud | Reformed Black Hat Computer Hacker

4 months ago

Great wrap-up to the series!

Josue A. Bogran

VP of Data + AI @ zeb | Advisor to Sigma, Kythera Labs, and Lumel | Databricks Product Advisory Board Member & Databricks MVP

5 months ago

To any of the vendors/PMs that disagree with what I shared: As long as you do so respectfully, I'll happily hear what you have to say if you reach out to me privately or set a 1 on 1 with me. I'll even happily write about your point of view in a respectful way.

Philipp Schwade

Director Strategic Accounts at Snowflake | Pioneering the Digital Frontier | Specialist in Manufacturing & Automotive Industries

5 months ago

How objective can your evaluation and comparison be….??
