Why Databases Won't Charge for Storage in the Future

The database is being unbundled. Historically, a database like Snowflake sold both data storage & a query engine (& the computing power to execute the query). That’s step 1 above.

But, customers are pushing for a deeper separation of compute & storage. The recent Snowflake earnings call highlighted the trend. Larger customers prefer open formats for interoperability (step 2 & 3).

A lot of big customers want to have open file formats to give them the options…So data interoperability is very much a thing and our AI products can generally act on data that is sitting in cloud storage as well.
We do expect a number of our large customers are going to adopt Iceberg formats and move their data out of Snowflake where we lose that storage revenue and also the compute revenue associated with moving that data into Snowflake.

Instead of locking the data in one database, customers prefer to keep it in open formats like Apache Arrow, Apache Parquet, & Apache Iceberg.

As data use inside of an enterprise has expanded, so has the diversity of demands on that data.

Rather than copying it each time for a different purpose, whether that's exploratory analytics, business intelligence, or AI workloads, why not centralize the data and have many different systems access it?
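The "one copy, many engines" pattern can be sketched in a few lines of standard-library Python. Here two toy "engines" (a BI-style aggregator and an ML-style feature extractor) read the same shared dataset rather than each maintaining its own copy. The data, engine names, and logic are purely illustrative, not any vendor's API:

```python
import csv
import io
import statistics

# One shared copy of the data, as it would sit in cloud object storage
# (an in-memory CSV stands in for a Parquet file on S3).
SHARED_DATA = """region,revenue
us,120
eu,80
us,200
eu,50
"""

def bi_engine(raw: str) -> dict:
    """BI-style workload: revenue totals per region, read from the shared copy."""
    totals = {}
    for row in csv.DictReader(io.StringIO(raw)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["revenue"])
    return totals

def ml_engine(raw: str) -> float:
    """ML-style workload: a feature (mean revenue) computed from the same copy."""
    rows = list(csv.DictReader(io.StringIO(raw)))
    return statistics.mean(int(r["revenue"]) for r in rows)

print(bi_engine(SHARED_DATA))  # {'us': 320, 'eu': 130}
print(ml_engine(SHARED_DATA))  # 112.5
```

Neither engine mutates or duplicates the data; each brings its own compute to the single stored copy, which is the essence of the decoupling described above.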

This saves money: storage is about $280-300M of revenue for Snowflake overall.

As a reminder, about 10% to 11% of our overall revenue is associated with storage.
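As a back-of-envelope check (my arithmetic, not Snowflake's disclosure): if storage is 10-11% of revenue and amounts to roughly $280-300M, the implied total revenue is in the $2.5-3.0B range:

```python
# Back-of-envelope: implied total revenue from the storage share quoted above.
# The $280-300M and 10-11% figures come from the article; the rest is arithmetic.
storage_revenue_low, storage_revenue_high = 280e6, 300e6
share_low, share_high = 0.10, 0.11

implied_total_low = storage_revenue_low / share_high    # smallest consistent total
implied_total_high = storage_revenue_high / share_low   # largest consistent total

print(f"${implied_total_low / 1e9:.1f}B - ${implied_total_high / 1e9:.1f}B")
# $2.5B - $3.0B
```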

But it also simplifies architectures.

It also ushers in an epoch where query engines will compete for different workloads on price & performance. Snowflake may be better for large-scale BI; Databricks' Spark for AI data pipelines; MotherDuck for interactive analytics.
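One way to picture that competition: once storage is shared, each workload can route to whichever engine is cheapest for it. The engine names and per-query costs below are entirely hypothetical, chosen only to illustrate the routing logic:

```python
# With storage decoupled, engines compete per workload on price & performance.
# All engine names and dollar figures are hypothetical, for illustration only.
cost_per_query = {
    "large_scale_bi": {"EngineA": 0.90, "EngineB": 1.40, "EngineC": 2.10},
    "ai_pipelines":   {"EngineA": 1.60, "EngineB": 0.80, "EngineC": 1.90},
    "interactive":    {"EngineA": 1.20, "EngineB": 1.10, "EngineC": 0.40},
}

# Each workload picks the cheapest engine -- no data migration required,
# because every engine reads the same open-format copy.
best = {workload: min(costs, key=costs.get)
        for workload, costs in cost_per_query.items()}
print(best)
# {'large_scale_bi': 'EngineA', 'ai_pipelines': 'EngineB', 'interactive': 'EngineC'}
```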

Data warehouse vendors have marketed the separation of storage & compute in the past. But, that message was about scaling the system to handle bigger data within their own product.

Customers demand a deeper separation - a world in which databases don’t charge for storage.

Marco Ullasci

Data Solutions Architect, Singapore PEP

5 months ago

The market is right even when it's (environmentally) wrong: no one can win against it. In the process, R.I.P. computational efficiency, sacrificed on the altar of open formats. Additional caching mechanisms will come to the rescue once the tradeoffs of the open formats become unacceptable, just as happened with many data federation solutions in the past.

Matthew Birdsall

Experienced Data Leader | Builder | Father

11 months ago

This has been one of the intentions of the big data revolution, so why are we saying that we're inching toward such an era in cloud data warehousing? Presto, Trino, even Redshift to some extent, have had independently scalable compute and external storage layers. I guess I'm just confused? Maybe this is specific to Snowflake's strategy and not its technology. I worked on a Snowflake implementation in 2019-2020 that I didn't design. The design called for Snowflake-managed storage, but I made extensive use of external schemas pointed at S3 and saved on storage costs. Using Athena right now is extremely cost-effective, and its compute is completely decoupled from its storage layer.


Tomasz Tunguz, a question: is the main reason customers are calling for the decoupling the cost of storage, interoperability across use cases (as in 3), or making the current combination of storage and compute more efficient (as in 2)? The reason for the question is that Databricks and Snowflake have indeed made a lot of changes and have been shifting their architectures toward these open formats. As you mention, this is to handle more data more efficiently within their own platforms. But this might be its own reward for customers, since more efficiency brings down cost in current operations (where storage is not the main cost). So can you tell us a bit more about the motivations you see?


This assumes that all data must be moved to "storage" first. What about keeping data in the source system and querying it when needed? Right now we are already making one copy of all the data we think we need. Storage is cheaper now and fast enough for most uses; streaming data might be the exception. But there is still duplication, and it can create inconsistent data (between storage and source).

An interesting future would be one where data (mostly) remains in source systems. The data with the strictest speed requirements would be moved to storage; the rest remains. Everything gets a metadata tag, queries are run on the metadata, and results are retrieved from either location.

What would be needed for this? Faster transfer? A different type of indexing and tagging of source-system data? Better search? Different source systems? Snowflake's invention was to hash everything in a different way, which meant they could distribute data differently while keeping query and retrieval time the same as before (or faster). Could something similar be done with a hybrid storage approach?


Great article Tomasz Tunguz! This is the biggest trend in the data space right now and is going to have a huge impact. It solves one of the biggest challenges that enterprises face: data lock-in. Freeing your core data from the clutches of vendors enables better value generation from it, via the variety of tools that can process that single copy of data. As an analytics vendor, open table formats allow us to employ specialized compute engines tailored to specialized workloads such as event data, time series, and graphs, instead of being forced to work with a lowest-common-denominator SQL engine; and we can do that without having to make copies of the data into proprietary stores. Further, it lets us monetize better by being able to own and charge for compute too. Game changing!
