Data Gravity & AI flare up EDP wars
Images from the recent Snowflake, Databricks and Chill Data Summit conferences, featuring speakers and attendees, mashed up with the old “Mac vs. PC” ad campaign



Do you remember the “I’m a Mac” and “I’m a PC” ad campaign at the height of the Microsoft vs. Apple tech rivalry?

Well, attending recent data conferences and watching the latest Snowflake vs. Databricks market moves sent me down memory lane, back to the bygone era of the personal computer OS wars of the early 2000s.

What do I mean by that? I’ll get to that story later in this newsletter, but if you are in a hurry, here’s the TL;DR:

“Data has gravity, and the fight for Enterprise Data Platform (EDP) supremacy has AI applications at its core, with Databricks and Snowflake heating up that market.”

The fight for enterprise data platform supremacy has AI at its core

With the rapid pace of innovation in AI over the past decade, and the fast-paced experimentation with GenAI since ChatGPT rolled out less than two years ago, one thing is clear: AI will eat all software and disrupt industries and our society in unimaginable ways.

Organizations of all sizes are adopting impactful use cases that leverage their foundational data assets to better serve customers with AI, and the results are quite promising. Per McKinsey and PwC research studies, AI could add a staggering $13-15.7 trillion to global GDP by 2030.

If we zero in on enterprise data, and how it is stored and used to make the decisions that help organizations thrive, we can see how data and AI are interrelated through a concept called data gravity.

Data gravity is the tendency of large datasets to attract smaller datasets, applications and services. This is similar to how a planet’s gravity pulls objects toward it, with the accumulation of data increasing its “gravitational pull”. Benefits of data gravity include full visibility and greater data volume. More data can help teams paint a fuller picture and make informed decisions, and it gives AI a bigger dataset to train on and hence better predictions. However, data gravity also presents challenges, such as the difficulty and cost of moving unwieldy data. As datasets grow, they become “heavy”, making it more complicated and expensive to migrate data from one location to another and making management even harder. Data warehouses and data lakes are prime examples of data gravity.
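A back-of-the-envelope calculation shows why “heavy” data resists moving. The Python sketch below is illustrative only: it assumes decimal units, a network link that sustains its full nominal bandwidth, and a flat per-GB egress price. The function name and the $0.09/GB figure are hypothetical assumptions, not any cloud provider’s actual pricing.

```python
def migration_estimate(size_tb: float, bandwidth_gbps: float, egress_usd_per_gb: float) -> tuple[float, float]:
    """Return (hours, usd) to move `size_tb` terabytes over a network link.

    Assumptions (illustrative only): decimal units (1 TB = 1,000 GB = 8e12 bits),
    the link sustains its full nominal bandwidth, and egress is a flat per-GB price.
    """
    bits = size_tb * 8e12
    hours = bits / (bandwidth_gbps * 1e9) / 3600
    usd = size_tb * 1_000 * egress_usd_per_gb
    return hours, usd


if __name__ == "__main__":
    # Hypothetical 1 Gbps link and $0.09/GB egress price.
    for size_tb in (1, 100, 1_000):
        hours, usd = migration_estimate(size_tb, bandwidth_gbps=1, egress_usd_per_gb=0.09)
        print(f"{size_tb:>5} TB -> {hours:8.1f} h, ${usd:,.0f}")
```

Both time and cost grow linearly with size: under these assumptions, 1 PB (1,000 TB) over a 1 Gbps link takes roughly 2,222 hours, or about 93 days, before counting retries, validation, or re-pointing the applications that depend on the data.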

The backstory: The elusive search for a “single source of truth” in the enterprise

I started my career journey in data three decades ago, so let me share what I observed from the trenches. Data warehouses first emerged in the 1980s as a solution for organizing structured business data in enterprises, and as the company’s repository of the “single source of truth”. However, by 2010, organizations had begun accumulating significant amounts of unstructured data to support more varied use cases, such as big data and predictive and prescriptive analytics. To address this, data lakes were introduced as an open, scalable system for any type of data, and the distributed data processing software Hadoop became synonymous with that hype. The 4 V’s of data (Volume, Velocity, Variety and Veracity) came up in every presentation I saw or made. By 2015, it had become common for most organizations to operate both data warehouses and data lakes. This dual-platform approach, however, presented significant challenges in governance, security, reliability and management, because the fundamentals of building a solid data strategy and a layered data foundation architecture were often skipped.

Hyperscaler cloud platforms like AWS, GCP and Azure gathered steam in the “move to cloud” wave, and soon thereafter (~10 years ago) Snowflake emerged as the leading choice of “data warehouse in the cloud”. Snowflake built its early success around a SQL-centric architecture and tight integration with BI tools, catering to data analysts and traditional IT departments with a closed, “it just works” solution.

Conceived at UC Berkeley’s AMPLab and developed as open-source software under the Apache Software Foundation, Spark (later commercialized by Databricks) initially stepped in to overcome the shortcomings of the Hadoop ecosystem. It was an efficient in-memory distributed data processing engine, up to 100X faster than Hadoop MapReduce on some in-memory workloads, and it supported the iterative experiments that machine learning and data science use cases demanded.

Rise of the Enterprise Data Architecture and EDPs

A few years later, Databricks introduced the concept of the lakehouse, which combines and unifies the best of data lakes and data warehouses. Lakehouses store and govern all data in open formats and natively support workloads ranging from BI to AI. They offer a unified system to (1) query all data sources in an organization together and (2) govern all the workloads that use data (BI, AI, etc.) in a unified way. The lakehouse became its own category of data platform and is now widely adopted by enterprises and incorporated into most vendors’ stacks.
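To make the “one governed copy, many workloads” idea concrete, here is a minimal Python sketch that uses the standard library’s sqlite3 module as a stand-in for a lakehouse table. The table, the GRANTS mapping and the governed_query helper are hypothetical illustrations, not any vendor’s API; real lakehouses keep data in open file formats (for example Parquet with Delta Lake or Apache Iceberg) and govern it through a catalog.

```python
import sqlite3

# Hypothetical grants: the single place that governs every workload, BI or AI.
GRANTS = {
    "analyst": {"bookings"},
    "data_scientist": {"bookings"},
    "intern": set(),
}


def make_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for one governed lakehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings (customer TEXT, destination TEXT, price REAL, cancelled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bookings VALUES (?, ?, ?, ?)",
        [("a", "SFO", 300.0, 0), ("b", "SFO", 450.0, 1), ("c", "JFK", 200.0, 0)],
    )
    return conn


def governed_query(conn: sqlite3.Connection, user: str, table: str, sql: str) -> list:
    """Run `sql` only if `user` holds a grant on `table` (simplified: the caller names the table)."""
    if table not in GRANTS.get(user, set()):
        raise PermissionError(f"{user} has no access to {table}")
    return conn.execute(sql).fetchall()


conn = make_table()

# BI workload: revenue per destination, read from the one shared table.
bi_rows = governed_query(
    conn, "analyst", "bookings",
    "SELECT destination, SUM(price) FROM bookings GROUP BY destination ORDER BY destination",
)

# AI workload: (feature, label) pairs for a cancellation model, same copy, same rules.
ml_rows = governed_query(
    conn, "data_scientist", "bookings",
    "SELECT price, cancelled FROM bookings ORDER BY customer",
)
```

Both workloads read the same rows through the same grant check, so there is no second copy to keep in sync and no second set of permissions to reconcile, which is the gist of points (1) and (2) above.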

Despite the progress, data platforms on the market still face several major challenges: a steep learning curve and technical skill barriers; data quality; skyrocketing costs and poor performance when complex platforms are mismanaged; concerns about the lineage, privacy, security and governance of globally distributed data, amplified by compliance mandates from regulations such as GDPR, HIPAA, CCPA and now the recent EU AI Act; and finally, the iterative tuning and engineering demands of emerging AI applications, which need deep, domain-specific knowledge of the data.

Many of these issues arise because current data platforms do not fundamentally understand the data in organizations and how it is used. Fortunately, generative AI presents a powerful new way to address exactly these challenges.

In essence, the impact of AI on data platforms will not be incremental, but fundamental: massively democratizing access to data, automating manual administration, and enabling turnkey creation of custom AI applications. All this will be enabled by a new wave of unified platforms that deeply understand an organization's data.

Besides the leaders, Databricks and Snowflake, cloud and legacy tech vendors alike, including IBM, Oracle, Google, Amazon, Microsoft and Salesforce, are vying for a stronger foothold in the Enterprise Data Platform market.


Market Moves (some public and some stealthy) by the two leaders

Recent high-profile conferences held by Snowflake and Databricks over the past couple of months showcased their strategies as well as the outcomes of their aggressive market moves, including last year’s M&A activity. Here are some market round-up observations:

  • Snowflake has a "Spark Attack" initiative targeting Spark workloads.
  • Databricks has a "Snowmelt" initiative that gives sellers bonuses for displacing Snowflake and offers customers discounts and credits if they switch.

  • Driven by the applied AI turf war, Databricks paid $1.3B to acquire MosaicML last year, while Snowflake acquired generative AI search engine startup Neeva as an acquihire and held talks to buy Reka AI for a reported $1 billion (talks that reportedly ended without a deal). The Neeva deal eventually paved the way for CEO Frank Slootman to hand the reins to Google veteran and Neeva co-founder Sridhar Ramaswamy, who is driving an AI makeover for Snowflake.

  • Having AI model development inside the data platform allows Snowflake or Databricks to let customers securely use their enterprise data to build, fine-tune and augment machine learning and generative AI models. This approach keeps sensitive data and intellectual property under the organization’s control, enhancing privacy and security and, importantly for Snowflake and Databricks, strengthening their moat and lock-in.

  • Snowflake recently fired the first shot in open source by open-sourcing Polaris, its catalog for Apache Iceberg, a popular open-source table format that works with many compute engines. Databricks countered by announcing its acquisition of Tabular, a managed Iceberg service created by the project’s original creators, right in the middle of Snowflake’s conference. The following week, at its own conference, Databricks upped the ante further by open-sourcing Unity Catalog in front of a live audience.

  • The insider scoop is that Snowflake had offered $600M for Tabular and Databricks outbid it with $2B. This is considered a key win that reinforces Databricks’ position in the open-source community, while Snowflake is trying hard to break free from its “closed black box” image.
  • Reportedly, 8 of Snowflake's top 10 customers have moved workloads to Databricks.
  • Canadian customer Bond Brand Loyalty saved money by standardizing its data work on Snowflake, since less-technical users were able to use the platform.
  • The cloud providers are viewed as the largest competitive risk to both platforms. Snowflake noted that Google BigQuery is its biggest competitor, with Microsoft an emerging threat, and Microsoft added Databricks as a competitor in its most recent SEC filing.
  • Databricks has reached out to existing investors for detailed ownership records, a step often viewed as a precursor to an IPO filing.
  • While both companies were running neck and neck in valuation, hovering around $43-45B until July, as of this publication date (Aug 31) Snowflake’s market capitalization of $38.2B trails Databricks’ valuation of $46.1B, and Databricks looks to be on a nice trajectory for an IPO, most likely in 2025 (my speculation).

Key technology & strategy announcements by the leaders and their ecosystem impact

Each platform can be used to ingest and analyze huge amounts of data, for example an airline trying to understand which customers are most likely to cancel their flights based on ticket price, destination and weather patterns. The market for this kind of software is growing rapidly and is not entirely zero-sum: according to data from market research firms, many companies use both Databricks and Snowflake for different types of work, while countless others are still using older-generation tools that are traditional replacement targets.

The strategic maneuvers highlighted in the previous section underscore how AI is redrawing the battle lines in enterprise data infrastructure. Enterprises are increasingly demanding interoperability and portable compute.

For Databricks, with its open-source roots, this is a natural evolution. For Snowflake, it marks a major shift from its traditionally closed approach. Both are racing to adapt as value migrates up the stack toward dynamic systems of models and tools built on top of their offerings.

Let’s look at the implications of some recent technology and strategy moves and announcements by Databricks:

1. Unity Catalog metrics enhancements and open-sourced Unity Catalog: Open-sourcing simplifies data governance, opens the platform up to broader developer collaboration, and accelerates development and innovation, leading to enhanced customization, flexibility and faster advancements. Key features include a unified data view, fine-grained access controls, automated data lineage tracking and comprehensive auditing. Unity now offers improved lineage capture and customization, arguably surpassing Snowflake's Polaris Iceberg catalog. Unity metrics centralize metric definitions so that business metrics stay consistent and governed; they are accessible from various Databricks interfaces and integrate seamlessly with third-party tools.

2. GA of Lakehouse Federation and Lakehouse Monitoring: This improves data integration and real-time insights, streamlining data management and enhancing governance, operational efficiency and cross-platform integration.

3. Attribute-Based Access Control (ABAC): Fine-grained access permissions based on user and data attributes, with dynamic policy adjustments and simplified security management. Benefits include enhanced security, compliance and scalability, easy integration with existing systems, improved regulatory compliance and better data governance.

4. Enhanced Serverless Offerings: Targeting Snowflake's ease-of-use customers with streamlined deployment and management. Some features will be serverless-only in 2025, highlighting a shift despite Databricks’ open-source ethos.

5. Metadata and AI Integration: LakehouseIQ, an AI-driven data catalog capability that enhances querying, search and documentation; and Mosaic integration, which embeds AI into data workflows for advanced analytics.
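Item 3 is easiest to picture in code. Below is a minimal, vendor-neutral Python sketch of attribute-based access control: each request is checked against policies that compare attributes of the user, the data and the requested action, and access is denied by default when no policy is configured. The attribute names, policies and example users are hypothetical, and this is not how Databricks implements ABAC; it only illustrates the general pattern.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class User:
    department: str
    clearance: int   # higher means more trusted (hypothetical scale)
    region: str


@dataclass(frozen=True)
class Dataset:
    owner_department: str
    sensitivity: int  # data tag: minimum clearance needed to access it
    region: str       # where the data must stay, e.g. for GDPR-style residency


# A policy is any predicate over (user, dataset, action).
Policy = Callable[[User, Dataset, str], bool]


def same_department(u: User, d: Dataset, action: str) -> bool:
    return u.department == d.owner_department


def enough_clearance(u: User, d: Dataset, action: str) -> bool:
    return u.clearance >= d.sensitivity


def same_region(u: User, d: Dataset, action: str) -> bool:
    return u.region == d.region


def read_only(u: User, d: Dataset, action: str) -> bool:
    return action == "read"


def is_allowed(u: User, d: Dataset, action: str, policies: list[Policy]) -> bool:
    """Allow only if at least one policy exists and every policy passes (deny by default)."""
    return bool(policies) and all(p(u, d, action) for p in policies)


# Policies are plain data, so they can be changed at runtime without editing per-user grants.
POLICIES: list[Policy] = [same_department, enough_clearance, same_region, read_only]

ledger = Dataset(owner_department="finance", sensitivity=2, region="EU")
alice = User(department="finance", clearance=3, region="EU")
bob = User(department="finance", clearance=1, region="EU")
carol = User(department="finance", clearance=3, region="US")
```

Because access follows attributes rather than per-user grants, if Carol relocated to the EU, updating her region attribute alone would give her read access to the ledger, with no grant changes. That is the “dynamic policies, simplified security management” benefit described in item 3.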

Impact of GenAI on convergence of Data, Analytics and EDPs

As the generative AI revolution has accelerated, the lines between the once distinct domains of data processing and iterative modeling have blurred. Building generative AI applications requires the ability to manage and process data (a traditional analytics skill) along with the ability to experiment with and fine-tune models (a data science skill). The worlds of analytics and AI are rapidly converging.

Databricks anticipated this convergence early and bet big on its "lakehouse" architecture, as discussed previously. This AI-friendly approach can efficiently store and process massive amounts of structured and unstructured data. Snowflake, despite its success in BI, was slower to adapt to the rising importance of AI. As the market shifted toward AI-centric use cases, it found itself falling behind, with its strength historically centered on structured and semi-structured data.

Conclusion: What's in it for the CDOs & CIOs?

In short, it is getting interesting to watch these two EDP leaders fight it out in the market, just like the “I’m a Mac, I'm a PC” era I mentioned at the beginning. Competition in EDPs is intensifying both between Snowflake and Databricks and with first-party products from the mega cloud providers. While Databricks has a smaller revenue base, it is growing faster than Snowflake and is outperforming in net-new enterprise IT spend (SQL workloads, AI, legacy migrations, etc.).

As AI reshapes the software world, it is a common belief that the leaders in every industry will be those who leverage data and AI deeply to power their organizations. EDPs will be a cornerstone for these organizations, enabling them to create the next generation of data and AI applications with quality, speed and agility.

Having been deeply embedded in the data and applied AI world for the past three decades, working with clients big and small around the world, I have plenty of war stories, “in the trenches” wounds and success stories from the frontlines of customer implementations. It's an exciting time to watch the transformation from the front seat and compare it with the view in the rear-view mirror.


Parting thoughts & key questions:

Do get in touch if you wish to discuss further, especially if #TheDigitalAgenda could help with anything from strategy to blueprinting to organizational enablement for the data- and AI-driven future.

1. Dear reader, what are your thoughts on this?

2. Do you have similar experiences, or do you wish to collaborate?

(Please do share your comments, reshare with your network and subscribe to this newsletter if you have not already done so.)

Piyush Malik

LinkedIn Top Voice 2023 | Data, Applied AI, Technology & Strategy | CXO | BOD Advisor | Entrepreneur | Analytics | Cloud

1 month ago

In related news, Apache Iceberg cemented its position further in the industry this week. More here: https://www.theregister.com/2024/10/14/apache_iceberg_feature_announcements/

Kumar B Goel

CEO @ Lighted Road AI | Insurtech | Data | AI/ML | Drive profitable growth in Medicare

2 months ago

Excellent article.

Erica Brown

GenAI Research Scientist

2 months ago

How to process 1 trillion rows in mere seconds? Can Databricks do it? It is now a reality, see https://mltblog.com/3z71oeP

Christian Walenta

Program and Operations Manager * Data Governance * Data Quality * Supply Chain Management * Business Intelligence *

2 months ago

Very insightful, thanks Piyush.

Great article Piyush Malik, this is the perfect read for a Monday morning over a coffee! Delighted to hear you found the Chill Data Summit in San Francisco helpful. Thank you for sharing your photos and giving us a mention.
