A Casual Chat on Data Access
Jose Morales
Innovative Technology Strategist | Transforming Challenges into Opportunities through Smart Technology Solutions
From Application Intimacy to AI Pipelines
Earlier today, I had an engaging chat with a friend, colleague, and even a customer who asked a simple yet critical question: “How do you choose the right storage technology?” What started as a casual conversation quickly evolved into an in-depth exploration of the four major storage approaches—DAS, SAN, NAS, and Object Storage. In our discussion, we dissected each option, weighing their benefits and trade-offs. Here’s a condensed summary of everything we talked about, taking you on a journey from direct, application-managed storage all the way to modern, AI-driven data pipelines.
We started at the closest level – where the application itself is in charge of handling files – and then expanded outward into layers of abstraction that add convenience (and sometimes complexity) to the mix.
Application and Storage: Intimacy and Performance
At the most intimate level, the application interacts directly with file storage. This is like having a personal librarian who knows exactly where every book is located. The application “opens” a file and communicates directly with the lowest-level I/O primitives – no intermediaries, no extra metadata juggling.
This approach is super fast because there’s minimal overhead. However, raw device access has its trade-offs. While it offers peak performance, dedicating a set of resources solely to this task can drive up the total cost of ownership (TCO). With the high cost of SSD capacity these days, many organizations have moved toward sharing resources rather than giving each application its own dedicated hardware. In practice, fewer and fewer applications implement raw device support, given the complexity of the media and the high-availability scenarios involved, which I will not cover in this talk.
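To make that intimacy concrete, here’s a minimal sketch (Python, with a hypothetical file name) of an application driving the low-level I/O primitives itself:

```python
import os

# Hypothetical path; a true raw-device setup would open something like
# /dev/sdX with os.O_DIRECT, which bypasses the page cache but requires
# block-aligned buffers -- omitted here to keep the sketch runnable.
path = "data.bin"

# Open with low-level primitives: the application, not a storage layer,
# decides exactly where and how bytes land.
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
try:
    os.write(fd, b"record-0001:payload")
    os.fsync(fd)                  # force the write down to stable media
    chunk = os.pread(fd, 19, 0)   # read 19 bytes back from offset 0
    print(chunk)
finally:
    os.close(fd)
```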
Block Storage (SAN or Direct Attached): Performance Meets Complexity
When raw device access isn’t feasible, the next logical step is to introduce a file system—ushering in the world of block storage. Whether deployed as part of a Storage Area Network (SAN) or through direct-attached storage, block storage offers a structured approach where the application still maintains a degree of control over the data. In this setup, the application can manage parts of the metadata embedded in the binary structure, giving it more familiarity with the underlying data compared to fully abstracted storage solutions.
However, layering in components such as the operating system and file system inevitably adds complexity. Managing I/O operations now requires additional memory and CPU cycles, as the OS intermediates every data call. In SAN environments, the situation becomes even more intricate because you must account for dedicated networks and a comprehensive supportability matrix. This matrix brings together everyone from OS and application developers to network gear manufacturers and storage array providers.
Leading the conversation in this arena are companies like #DellEMC, #NetApp, and #PureStorage. These industry giants have earned their stripes by consistently delivering high-performance, reliable block storage solutions that integrate seamlessly into modern IT infrastructures. Their products not only address the inherent challenges of managing layered storage but also offer innovative features—such as advanced management software, high throughput, and low latency—that help enterprises optimize their storage performance while mitigating operational complexity.
The strong market adoption of these solutions is underscored by industry forecasts, which estimate that the Total Addressable Market (TAM) for the SAN sector could reach between USD 28–30 billion by 2024. This robust TAM reflects a high level of acceptance and trust from enterprises around the globe, who recognize that the benefits of over-provisioning and concentrated resource allocation can outweigh the additional complexities involved. In essence, while block storage introduces layers that may seem cumbersome at first glance, the economies of scale, improved performance, and operational efficiencies it brings make it a preferred choice for many large-scale storage deployments.
NAS: A Hybrid Approach Balancing Intimacy and Centralized Management
NAS stands out as a storage solution that blends the intimacy of direct file access with the benefits of centralized management. At its core, NAS leverages protocols like NFS or SMB to enable file sharing over a network, offloading much of the metadata management to dedicated servers. This design originally catered to human users collaborating via shared directories, offering an intuitive and accessible experience.
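From the application’s point of view, the charm of NAS is that once the share is mounted, the code looks just like local file access; the NFS or SMB server does the metadata work behind the scenes. A minimal sketch, assuming a hypothetical share already mounted at /mnt/shared:

```python
from pathlib import Path

# Hypothetical mount point; an admin would have mounted it beforehand, e.g.
#   mount -t nfs filer01:/export/projects /mnt/shared
share = Path("/mnt/shared/projects")

# Plain POSIX calls -- the NAS server resolves paths, permissions,
# and metadata on the other side of the wire.
share.mkdir(parents=True, exist_ok=True)
report = share / "q3-report.txt"
report.write_text("draft v1\n")
print(report.read_text())
```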
Over time, however, NAS has evolved far beyond its early roots. Its capabilities now extend to a broad range of use cases, including virtualization and database management. In virtualized environments, NAS provides a stable and manageable platform where features like snapshots and single-file restores are essential. These data management tools allow administrators to quickly roll back to a previous state, ensuring minimal downtime and robust disaster recovery (DR) strategies. For databases—such as those running on #Postgres—these snapshot and restore capabilities help maintain data integrity while simplifying backup and recovery operations.
Additionally, modern NAS solutions integrate replication features that, although not typically deployed in an active-active configuration, offer a reliable foundation for DR success. When combined with TCP/IP networking protocols, NAS facilitates efficient data replication across geographically separated sites. This replication, paired with advanced snapshot technologies, ensures that organizations can achieve near-continuous data protection and rapid recovery in the event of a failure or disaster.
Virtualization is another domain where NAS truly shines. In virtual environments, the ability to quickly capture snapshots and perform single-file restores is critical—not only for backing up entire virtual machines but also for protecting individual application data. NAS’s flexibility in handling these tasks makes it a preferred choice for many IT infrastructures seeking to balance performance with ease of management.
NAS still offers a compelling mix of simplicity, versatility, and powerful data management features. Its adaptability means it can support a wide array of applications—from collaborative file sharing and database management to virtualization and disaster recovery. With features like snapshots, single-file restores, and robust replication over TCP/IP, NAS has cemented its role as a key player in modern data storage architectures, providing a platform that meets both performance demands and operational efficiency.
Object Storage (S3): The Machine-to-Machine File System
Finally, let’s dive into Object Storage—the realm of S3—and explore why it’s often called the machine-to-machine file system. Object storage takes a radically different approach to data management. Unlike traditional file systems, it stores data as discrete objects, each with its own rich metadata, and makes them accessible over simple HTTP/TCP protocols. This simplicity in design makes object storage a perfect interface for modern applications where direct machine-to-machine communication is key.
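To show just how simple that machine-to-machine interface is, here’s a minimal sketch using the common boto3 client against a hypothetical bucket and key; any S3-compatible endpoint, cloud or on-premise, speaks the same dialect:

```python
import boto3

# Hypothetical bucket/key names; credentials and endpoint come from the
# usual AWS/S3-compatible configuration (env vars, config files, etc.).
s3 = boto3.client("s3")

# PUT: the object travels as a simple HTTP request, metadata riding along.
s3.put_object(
    Bucket="example-bucket",
    Key="sensors/2024/reading-0001.json",
    Body=b'{"temp": 21.4}',
    Metadata={"source": "edge-node-7"},
)

# GET: another HTTP call retrieves the object and its metadata.
obj = s3.get_object(Bucket="example-bucket", Key="sensors/2024/reading-0001.json")
print(obj["Body"].read(), obj["Metadata"])
```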
Over the past four years, the Total Addressable Market (TAM) for object storage has experienced remarkable growth. In 2022, industry estimates valued the global object storage market at around USD 10 billion, and recent forecasts project it will approach nearly USD 17 billion by 2024—reflecting a robust compound annual growth rate in the range of 15–18%. This surge in TAM underscores a high level of market acceptance driven by the rapid expansion of cloud computing and big data analytics. It highlights the increasing reliance on scalable, cost-effective storage solutions, not only among major cloud providers but also within on-premise environments where performance and affordability are critical.
Major players in the cloud space, such as #AmazonS3, #GoogleCloudStorage, and #AzureBlobStorage, have set the benchmark by delivering robust, highly available object storage services. On the non-cloud side, solutions like #DellEMC ECS, #Ceph, and #MinIO offer enterprises the flexibility to deploy object storage within their own data centers. These platforms are designed to bridge the gap between high-performance SSDs and cost-efficient tapes, providing a unified storage interface that works seamlessly across different media types.
The versatility of object storage extends to a wide range of use cases. Initially, it wasn’t designed for human interaction—attempting a traditional NAS-like interface with S3 can be cumbersome. However, modern applications have embraced its flexibility for everything from data lakes built on formats like Apache Iceberg, Avro, or Parquet, to backup and disaster recovery solutions that leverage its inherent scalability. Object storage is especially attractive for archival purposes and for environments where data needs to be accessed and processed in a highly distributed, machine-to-machine fashion.
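For the data-lake case, the pattern is typically an analytics engine reading columnar files straight off the object store. A small sketch, assuming a hypothetical Parquet file in a bucket and the optional s3fs dependency that lets pandas resolve s3:// URLs:

```python
import pandas as pd

# Hypothetical location; requires the s3fs package so pandas can treat
# s3:// URLs like paths. Iceberg and Avro readers follow the same
# "point at the bucket" idea.
df = pd.read_parquet("s3://example-bucket/lake/events/part-00000.parquet")
print(df.head())
```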
One of the hottest areas driving object storage adoption today is the AI pipeline. In these scenarios, S3 isn’t merely a repository for static files; it’s an active, integral component of the AI ecosystem. Whether feeding data into deep learning training processes or handling real-time inference, the ability to scale seamlessly and tier storage options based on performance or cost (from SSD to tape) makes object storage the backbone of many AI-driven workflows. Its API simplicity not only appeals to developers but also ensures interoperability across a diverse range of platforms and services.
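In practice that often looks like the sketch below: a small generator that walks a hypothetical prefix in a bucket and streams each object into a training or inference loop, so S3 acts as the live feed rather than a static archive:

```python
import boto3

def stream_training_samples(bucket: str, prefix: str):
    """Yield raw object bytes one at a time from an S3 prefix."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=item["Key"])["Body"]
            yield item["Key"], body.read()

# Hypothetical bucket/prefix; a real pipeline would decode, batch, and
# hand these samples to the training framework of choice.
for key, payload in stream_training_samples("example-bucket", "datasets/images/"):
    print(key, len(payload))
```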
Furthermore, object storage’s design makes it an ideal interface for media like tape archives. By providing a unified system that can sit in front of any media type, object storage enables organizations to optimize both performance and cost. It allows for the rapid retrieval of hot data on SSDs while also managing long-term, cold data on tape in a cost-effective manner. Want a backup? Just find the object you need. What could be easier to integrate with an RMAN script than an HTTP target to write a backup to?
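The tiering side can be expressed through the very same API. Here’s a sketch (hypothetical bucket, key, and backup file) that pushes a backup artifact to a colder storage class; the exact RMAN wiring depends on your backup tooling, so treat this as an illustration of the target side only:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical backup artifact; StorageClass steers the object toward
# cheaper, colder media (e.g. STANDARD_IA, GLACIER, DEEP_ARCHIVE).
with open("db_backup_20240101.dmp", "rb") as backup:
    s3.put_object(
        Bucket="example-backups",
        Key="oracle/db_backup_20240101.dmp",
        Body=backup,
        StorageClass="DEEP_ARCHIVE",
    )
```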
In my personal opinion, object storage is much more than a file repository—it’s a flexible, scalable, and efficient platform that caters to a variety of modern computing needs. From powering AI pipelines and big data analytics to providing robust backup and disaster recovery solutions, its machine-to-machine architecture ensures that data is always just a few API calls away, regardless of the underlying hardware. This unique blend of performance, versatility, and cost efficiency is why object storage continues to redefine how we think about data management in today’s digital landscape.
Wrapping Up: The Balancing Act of Performance, Cost, and Flexibility
So, where do we land? It really depends on your priorities:
Application Storage offers unmatched performance when the application knows best, but at a high cost.
Block Storage (SAN/Direct Attached) strikes a balance, giving you direct control with a little extra complexity.
NAS provides an excellent compromise with built-in collaboration and robust metadata management.
Object Storage (S3), while less “human friendly,” opens up a world of possibilities – especially for modern AI pipelines and large-scale, streaming data applications. It’s the machine-to-machine file system.
Each approach has its place in the modern data center, and as your needs evolve, so does the way you access your data. The journey from raw, direct access to a fully abstracted, cloud-scale object storage system isn’t just about technology – it’s about finding the right balance between performance, cost, and operational complexity.
I hope this provides some insight into storage. While it isn’t a deep dive into each technology, it offers a casual look at how to approach each of the options. This was a blast for me to talk about and write. Feel free to engage and counter or extend any part of this fun conversation.
Business Technologist | CMC | Lieutenant (Navy, reserve) | Maker
Really enjoyed your article, Jose. It’s a fantastic overview of how storage options evolve from application-level control to AI pipelines! I was wondering about something: how latency might play a role in picking the right tech. Like, object storage is amazing for scalability and AI, but its HTTP access could feel a bit slow for real-time stuff compared to SAN’s block storage (even with "fast" object storage). I’d love to hear your thoughts on how that factors in.
If it’s easy, it ain’t for me
I would add in the checks and balances the total cost of maintenance over time for each option. Moving from one NAS to another (for more performance or to manage obsolescence) is fairly trivial, SAN more complicated or more sensitive at least, and moving from one Object Store to another could be a daunting task if you have to rename every single object and change it in your application. As always, mileage may vary.