Data Singularity: how all-inclusive data platforms are conquering the entire data landscape
By "data singularity", I mean a defined, controlled space brimming with all the essentials for managing and exploiting data, one that continually evolves and grows. In physics, a singularity is a single point of infinite density and gravity, where the laws of physics and time as we know them cease to apply; by analogy, a data singularity can hold an entire universe in one place.
Just as a singularity lies at the core of a black hole, the data singularity stands at the heart of our data platforms. It’s not merely a repository; it’s a dynamic environment where data, features and applications don’t just accumulate - they expand and interconnect in complex, yet harmonious ways. This intricate web of interconnections offers boundless possibilities for those who navigate it wisely.
Let’s dive into the world of today's data platforms, reflecting on deeper insights that matured after Snowflake’s Data Cloud Summit. After some time to organize my thoughts post-summit, I'd like to share more than just the day-to-day highlights, which, while interesting, offered a localized view (take a look at my previous articles if you're interested). The timing couldn’t be better: Databricks’ Data + AI Summit has also just finished, at the same Moscone Center - just swap out the blue banners for red!
Data gravity and platforms' evolution
Data gravity and vendor lock-in aren’t new phenomena. What’s changed, I guess, is the extent of lock-in: it’s more limited now. Most SaaS platforms transparently support multiple cloud providers and open formats for interoperability. This means that with the right architectural design and decoupling, you can still make incremental changes without excessive effort.
Historically, data gravity led to oversized, complex vertical applications that were too costly to extract data from, triggering a vicious cycle often resulting from Shadow IT. Nowadays, we’re seeing a different kind of data gravity that feeds all-inclusive platforms where data is enriched, organized, and shared to drive the entire organization - a positive trend of cooperation and centralized, interoperable views. This phenomenon is so strong that it may not even require data to leave the platform, thanks to all the capabilities available and connectors for data movement, eliminating the need for additional products.
This situation is reminiscent of the concept of singularity in physics - hence the inspiration for the title “data singularity”. Additionally, this metaphor helps to emphasize that data is centralized and shared without making useless copies. Drawing a parallel between a modern data platform and a singularity isn't about labeling it as good or bad. It's a neutral stance, as the impact - positive or negative - largely depends on the platform's internal organization and the operational model of the organization using it.
These platforms have evolved from their original focus on data warehousing to encompass data lakes, lakehouses, machine learning, operational workloads, AI, and beyond. The negative connotation arises if it's all managed as a monolith. But if you treat data as a product, adhere to best practices, and maintain an efficient and effective operational model, then having a rich toolbox with everything accessible near the data is a boon. Therefore, the essence lies in a modular organization aimed at synergy and interoperability.
We’re nearing the maturation of all-inclusive SaaS data platforms, which are increasingly feature-rich but still easy to manage, reducing margins of error and simplifying operational management. This trend has been ongoing for years, but we may be approaching saturation. Solutions now offer out-of-the-box resources to cover all necessary data management capabilities, or integrate easily with vertical products to fill any gaps until those are absorbed. For instance, think about the power of Snowflake's native apps: they enable vendors to offer products within the data cloud, close to the data, so that data remains on the platform for enhanced control and reduced customer effort.
The success of solutions like Snowflake’s Data Cloud lies in their simplicity (one product versus many to integrate and manage separately) and effectiveness (data at the center with functionalities around it, rather than the reverse). They offer a single, unified development, management, and usage experience with reduced cognitive load, in contrast with a more heterogeneous and customizable “bag of tools” approach.
The road to data management saturation
The saturation point is linked to the introduction of more self-service features, particularly to facilitate development (especially data transformation) and governance, with an increasing focus on integrated catalog, lineage, and data quality.
Platforms expand as they cover new capabilities, increasing their maturity level and staying competitive through the all-inclusive formula, making life easier for consumers by providing what they would otherwise have to seek elsewhere.
With saturation (no uncovered capabilities and platforms very close to each other), the focus shifts to strengthening the data supply chain where it’s weakest:
There's less attention on data movement, which is increasingly seen as a commodity, limited to automating the transfer of raw data for enrichment within the platforms.
The role of AI
AI acts as an accelerator for many functionalities, particularly in interfacing with developers, curators, and consumers. However, it hasn’t yet significantly impacted data management practices or opened new scenarios.
While AI plays a central role in the arms race and convergence, the competition in data management is broad, with many shadow areas. In contrast, AI competition is concentrated on a limited range of use cases that have not changed over the past year (categorization, summarization, translation, chatbots for text processing).
Competition among tech giants increasingly relies on sharing and opening solutions to the public, as seen from Polaris Catalog to Unity Catalog. This parallels the AI race, as with Meta's Llama 3.
The evolving role of data professionals
Data professionals’ roles are far from endangered. Their work ranges from analysis, to the ideation of new initiatives and their execution within a data strategy, to designing tailored solutions that consider internal and external contexts, to efficient development following best practices. And then there’s the uncharted territory with much potential: managing organizational and cultural aspects!
Platforms are like increasingly advanced cars, offering more comfort, performance, and services, culminating in self-driving cars (as I’ve seen in San Francisco) - although this doesn’t seem feasible yet with data platforms.
As data professionals, we help clients choose, optimize, and steer the chosen vehicle: the managed platform on which data products are developed as atomic and interoperable units. We also help them plan and manage the journey: the data strategy, planned according to needs and priorities, with contingencies, responses, and costs evaluated to build a tailored experience.